Machine learning algorithms for systematic review: reducing workload in a preclinical review of animal studies and reducing human screening error.


Journal

Systematic reviews
ISSN: 2046-4053
Abbreviated title: Syst Rev
Country: England
NLM ID: 101580575

Publication information

Publication date:
15 01 2019
History:
received: 01 02 2018
accepted: 03 01 2019
entrez: 17 1 2019
pubmed: 17 1 2019
medline: 24 3 2020
Status: epublish

Abstract

BACKGROUND
Here, we outline a method of applying existing machine learning (ML) approaches to aid citation screening in an on-going broad and shallow systematic review of preclinical animal studies. The aim is to achieve a high-performing algorithm comparable to human screening that can reduce human resources required for carrying out this step of a systematic review.
METHODS
We applied ML approaches to a broad systematic review of animal models of depression at the citation screening stage. We tested two independently developed ML approaches which used different classification models and feature sets. We recorded the performance of the ML approaches on an unseen validation set of papers using sensitivity, specificity and accuracy. We aimed to achieve 95% sensitivity and to maximise specificity. The classification model providing the most accurate predictions was applied to the remaining unseen records in the dataset and will be used in the next stage of the preclinical biomedical sciences systematic review. We used a cross-validation technique to assign ML inclusion likelihood scores to the human screened records, to identify potential errors made during the human screening process (error analysis).
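The stated goal of reaching 95% sensitivity while maximising specificity can be read as a threshold choice on the model's inclusion scores. The following is a minimal sketch of that idea, not the authors' implementation: the function names (`confusion_counts`, `screening_metrics`, `pick_threshold`) are hypothetical, and the sketch assumes each validation record has a human label (1 = include, 0 = exclude) and a numeric ML inclusion score.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(labels: Sequence[int], scores: Sequence[float],
                     threshold: float) -> Tuple[int, int, int, int]:
    """Count TP, FP, TN, FN when records scoring >= threshold are included."""
    tp = fp = tn = fn = 0
    for label, score in zip(labels, scores):
        predicted_include = score >= threshold
        if predicted_include and label == 1:
            tp += 1
        elif predicted_include and label == 0:
            fp += 1
        elif not predicted_include and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def screening_metrics(labels: Sequence[int], scores: Sequence[float],
                      threshold: float) -> Dict[str, float]:
    """Sensitivity, specificity and accuracy at a given score threshold."""
    tp, fp, tn, fn = confusion_counts(labels, scores, threshold)
    return {
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
        "accuracy": (tp + tn) / len(labels) if labels else 0.0,
    }


def pick_threshold(labels: Sequence[int], scores: Sequence[float],
                   target_sensitivity: float = 0.95) -> float:
    """Return the highest observed score that still meets the sensitivity target.

    Raising the threshold can only lower sensitivity and raise specificity,
    so the highest qualifying threshold gives the best specificity available
    at that sensitivity (assumption: candidate thresholds are observed scores).
    """
    for candidate in sorted(set(scores), reverse=True):
        if screening_metrics(labels, scores, candidate)["sensitivity"] >= target_sensitivity:
            return candidate
    raise ValueError("no threshold reaches the target sensitivity")
```

In use, the threshold would be chosen on the validation set only, for example `t = pick_threshold(val_labels, val_scores)`, and then applied unchanged to the remaining unseen records.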
RESULTS
ML approaches reached 98.7% sensitivity based on learning from a training set of 5749 records, with an inclusion prevalence of 13.2%. The highest level of specificity reached was 86%. Performance was assessed on an independent validation dataset. Human errors in the training and validation sets were successfully identified using the assigned inclusion likelihood from the ML model to highlight discrepancies. Training the ML algorithm on the corrected dataset improved the specificity of the algorithm without compromising sensitivity. Error analysis correction leads to a 3% improvement in sensitivity and specificity, which increases precision and accuracy of the ML algorithm.
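The error analysis described above can be sketched in two steps: score every human-screened record with a model that did not see that record during training, then flag records where the score strongly contradicts the human decision. This is an illustrative sketch under stated assumptions, not the authors' code: `train_and_score` is a hypothetical callable standing in for whichever classifier is used, the folds are assigned by simple index striping, and the `low` and `high` cut-offs are placeholder values that the source does not specify.

```python
from typing import Callable, List, Sequence

# Hypothetical signature: (train_records, train_labels, test_records) -> test scores
Scorer = Callable[[List[str], List[int], List[str]], List[float]]


def out_of_fold_scores(records: Sequence[str], labels: Sequence[int],
                       train_and_score: Scorer, k: int = 5) -> List[float]:
    """Give each record a score from a model trained without that record."""
    n = len(records)
    scores = [0.0] * n
    for fold in range(k):
        test_idx = [i for i in range(n) if i % k == fold]
        train_idx = [i for i in range(n) if i % k != fold]
        if not test_idx:
            continue  # fewer records than folds
        fold_scores = train_and_score(
            [records[i] for i in train_idx],
            [labels[i] for i in train_idx],
            [records[i] for i in test_idx],
        )
        for i, score in zip(test_idx, fold_scores):
            scores[i] = score
    return scores


def flag_discrepancies(labels: Sequence[int], scores: Sequence[float],
                       low: float = 0.1, high: float = 0.9) -> List[int]:
    """Indices where the human decision and the ML score disagree strongly.

    A human exclusion with a very high score, or a human inclusion with a
    very low score, is a candidate screening error for manual re-checking.
    """
    flagged = []
    for i, (label, score) in enumerate(zip(labels, scores)):
        if (label == 0 and score >= high) or (label == 1 and score <= low):
            flagged.append(i)
    return flagged
```

The flagged records are only candidates: a human reviewer would re-screen them, and the corrected labels could then be used to retrain the model.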
CONCLUSIONS
This work has confirmed the performance and application of ML algorithms for screening in systematic reviews of preclinical animal studies. It has highlighted the novel use of ML algorithms to identify human error. This needs to be confirmed in other reviews with different inclusion prevalence levels, but represents a promising approach to integrating human decisions and automation in systematic review methodology.

Identifiers

pubmed: 30646959
doi: 10.1186/s13643-019-0942-7
pii: 10.1186/s13643-019-0942-7
pmc: PMC6334440

Publication types

Journal Article
Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Pagination

23

Grants

Agency: Wellcome Trust
ID: MR/N015665/1
Country: United Kingdom
Agency: Medical Research Council
ID: MR/N015665/1
Country: United Kingdom
Agency: Wellcome Trust
Country: United Kingdom
Agency: Medical Research Council
ID: MR/J005037/1
Country: United Kingdom
Agency: National Centre for the Replacement, Refinement and Reduction of Animals in Research
ID: NC/L000970/1
Country: United Kingdom

Authors

Alexandra Bannach-Brown (A)

Centre for Clinical Brain Sciences, University of Edinburgh, Edinburgh, Scotland. a.bannach-brown@ed.ac.uk.
Translational Neuropsychiatry Unit, Aarhus University, Aarhus, Denmark. a.bannach-brown@ed.ac.uk.
Present Address: Centre for Research in Evidence-Based Practice, Bond University, Gold Coast, Australia. a.bannach-brown@ed.ac.uk.

Piotr Przybyła (P)

National Centre for Text Mining, School of Computer Science, University of Manchester, Manchester, England.

James Thomas (J)

EPPI-Centre, Department of Social Science, University College London, London, England.

Andrew S C Rice (ASC)

Pain Research, Department of Surgery and Cancer, Imperial College, London, England.

Sophia Ananiadou (S)

National Centre for Text Mining, School of Computer Science, University of Manchester, Manchester, England.

Jing Liao (J)

Centre for Clinical Brain Sciences, University of Edinburgh, Edinburgh, Scotland.

Malcolm Robert Macleod (MR)

Centre for Clinical Brain Sciences, University of Edinburgh, Edinburgh, Scotland.
