Classifiers of Data Sharing Statements in Clinical Trial Records.
BERT in Healthcare Text Analysis
Clinical Trial Data Classification
IPD Sharing Statement Evaluation
NLP Applications in Medicine
Journal
Studies in health technology and informatics
ISSN: 1879-8365
Abbreviated title: Stud Health Technol Inform
Country: Netherlands
NLM ID: 9214582
Publication information
Publication date:
22 Aug 2024
History:
medline: 23 Aug 2024
pubmed: 23 Aug 2024
entrez: 23 Aug 2024
Status:
ppublish
Abstract
Digital individual participant data (IPD) from clinical trials are increasingly distributed for potential scientific reuse. The identification of available IPD, however, requires interpretations of textual data-sharing statements (DSS) in large databases. Recent advancements in computational linguistics include pre-trained language models that promise to simplify the implementation of effective classifiers based on textual inputs. In a subset of 5,000 textual DSS from ClinicalTrials.gov, we evaluate how well classifiers based on domain-specific pre-trained language models reproduce original availability categories as well as manually annotated labels. Typical metrics indicate that classifiers that predicted manual annotations outperformed those that learned to output the original availability categories. This suggests that the textual DSS descriptions contain applicable information that the availability categories do not, and that such classifiers could thus aid the automatic identification of available IPD in large trial databases.
Identifiers
pubmed: 39176922
pii: SHTI240541
doi: 10.3233/SHTI240541
Publication types
Journal Article
Languages
eng
Citation subsets
IM