The automation of relevant trial registration screening for systematic review updates: an evaluation study on a large dataset of ClinicalTrials.gov registrations.
Keywords
Document similarity
Hierarchical clustering
Systematic reviews
Trial registrations
Journal
BMC medical research methodology
ISSN: 1471-2288
Abbreviated title: BMC Med Res Methodol
Country: England
NLM ID: 100968545
Publication information
Publication date: 18 Dec 2021
History:
received: 11 May 2021
accepted: 22 Nov 2021
entrez: 19 Dec 2021
pubmed: 20 Dec 2021
medline: 27 Jan 2022
Status: epublish
Abstract
BACKGROUND
Clinical trial registries can be used as sources of clinical evidence for systematic review synthesis and updating. Our aim was to evaluate methods for identifying clinical trial registrations that should be screened for inclusion in updates of published systematic reviews.
METHODS
A set of 4644 clinical trial registrations (ClinicalTrials.gov) included in 1089 systematic reviews (PubMed) was used to evaluate two methods (document similarity and hierarchical clustering) and three representations (L2-normalised TF-IDF, Latent Dirichlet Allocation, and Doc2Vec) for ranking 163,501 completed clinical trials by relevance. Clinical trial registrations were ranked for each systematic review using seeding clinical trials, simulating how new relevant clinical trials could be automatically identified for an update. Performance was measured by the number of clinical trials that needed to be screened to identify all relevant clinical trials.
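As a rough illustration of the document similarity idea described above, the following Python sketch ranks candidate registrations by Euclidean distance to seed trials in an L2-normalised TF-IDF space, then counts how many ranked trials must be screened before every relevant trial has been seen. It is not the authors' implementation: the whitespace tokenisation, the smoothed IDF weighting, the nearest-seed aggregation, the function names, and the toy registrations are all assumptions made for the example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one L2-normalised TF-IDF vector (a dict) per tokenised document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed IDF: an assumption, the abstract does not state the weighting.
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def euclidean(a, b):
    """Euclidean distance between two sparse vectors stored as dicts."""
    keys = a.keys() | b.keys()
    return math.sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))


def rank_candidates(vectors, seed_ids, candidate_ids):
    """Order candidates by distance to their nearest seed trial, closest first.

    Using the nearest seed is an assumption; a centroid of the seeds is
    another plausible choice.
    """
    def nearest_seed_distance(cid):
        return min(euclidean(vectors[cid], vectors[s]) for s in seed_ids)
    return sorted(candidate_ids, key=nearest_seed_distance)


def screening_burden(ranking, relevant_ids):
    """Position in the ranking at which the last relevant trial is reached."""
    relevant = set(relevant_ids)
    return max(i for i, cid in enumerate(ranking, start=1) if cid in relevant)


# Hypothetical registrations, invented for illustration only.
registrations = {
    "T1": "metformin type 2 diabetes glycaemic control",
    "T2": "metformin diabetes adults glycaemic outcomes",
    "T3": "knee replacement physiotherapy recovery",
    "T4": "insulin type 2 diabetes glycaemic control",
    "T5": "statin cholesterol cardiovascular events",
}
vectors = dict(zip(registrations,
                   tfidf_vectors([text.split() for text in registrations.values()])))

ranking = rank_candidates(vectors, seed_ids=["T1"],
                          candidate_ids=["T2", "T3", "T4", "T5"])
burden = screening_burden(ranking, relevant_ids=["T2", "T4"])
print(ranking, burden)
```

Because the vectors are L2-normalised, ordering by Euclidean distance gives the same ranking as ordering by cosine similarity, so the choice between the two affects only the reported distance values.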
RESULTS
Using the document similarity method with TF-IDF feature representation and Euclidean distance metric, all relevant clinical trials for half of the systematic reviews were identified after screening 99 trials (IQR 19 to 491). The best-performing hierarchical clustering configuration used Ward agglomerative clustering (with TF-IDF representation and Euclidean distance) and required screening 501 clinical trials (IQR 43 to 4363) to achieve the same result.
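The figures above read as a median and interquartile range of per-review screening counts. A minimal sketch of that summary step, using Python's standard library, might look as follows. The function name and the list of counts are hypothetical, and the quartile interpolation method is an assumption, since the abstract does not say how the IQR was computed.

```python
import statistics


def summarise_burden(burdens):
    """Return the median and (Q1, Q3) of per-review screening counts."""
    # "inclusive" treats the data as the full population of reviews;
    # the interpolation rule used in the study is not stated.
    q1, median, q3 = statistics.quantiles(burdens, n=4, method="inclusive")
    return median, (q1, q3)


# Hypothetical screening counts for seven reviews, not data from the study.
burdens = [12, 19, 40, 99, 150, 491, 2000]
median, (q1, q3) = summarise_burden(burdens)
print(f"median {median}, IQR {q1} to {q3}")
```

A median is a natural summary here because screening counts are heavily right-skewed: a few reviews with hard-to-find trials would dominate a mean.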
CONCLUSION
An evaluation using a large set of mined links between published systematic reviews and clinical trial registrations showed that document similarity outperformed hierarchical clustering for identifying relevant clinical trials to include in systematic review updates.
Identifiers
pubmed: 34922458
doi: 10.1186/s12874-021-01485-6
pii: 10.1186/s12874-021-01485-6
pmc: PMC8684229
Publication types
Journal Article
Research Support, N.I.H., Extramural
Languages
eng
Citation subsets
IM
Pagination
281
Grants
Agency: NLM NIH HHS
ID: R01 LM012976
Country: United States
Copyright information
© 2021. The Author(s).