Data extraction for evidence synthesis using a large language model: A proof-of-concept study.
accuracy
artificial intelligence
data extraction
evidence synthesis
large language models
proof of concept
Journal
Research Synthesis Methods
ISSN: 1759-2887
Abbreviated title: Res Synth Methods
Country: England
NLM ID: 101543738
Publication information
Publication date:
03 Mar 2024
History:
revised: 18 Dec 2023
received: 02 Oct 2023
accepted: 26 Jan 2024
medline: 4 Mar 2024
pubmed: 4 Mar 2024
entrez: 3 Mar 2024
Status:
aheadofprint
Abstract
Data extraction is a crucial, yet labor-intensive and error-prone part of evidence synthesis. To date, efforts to harness machine learning for enhancing efficiency of the data extraction process have fallen short of achieving sufficient accuracy and usability. With the release of large language models (LLMs), new possibilities have emerged to increase efficiency and accuracy of data extraction for evidence synthesis. The objective of this proof-of-concept study was to assess the performance of an LLM (Claude 2) in extracting data elements from published studies, compared with human data extraction as employed in systematic reviews. Our analysis utilized a convenience sample of 10 English-language, open-access publications of randomized controlled trials included in a single systematic review. We selected 16 distinct types of data, posing varying degrees of difficulty (160 data elements across 10 studies). We used the browser version of Claude 2 to upload the portable document format of each publication and then prompted the model for each data element. Across 160 data elements, Claude 2 demonstrated an overall accuracy of 96.3% with a high test-retest reliability (replication 1: 96.9%; replication 2: 95.0% accuracy). Overall, Claude 2 made 6 errors on 160 data items. The most common errors (n = 4) were missed data items. Importantly, Claude 2's ease of use was high; it required no technical expertise or labeled training data for effective operation (i.e., zero-shot learning). Based on findings of our proof-of-concept study, leveraging LLMs has the potential to substantially enhance the efficiency and accuracy of data extraction for evidence syntheses.
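The headline figures in the abstract can be checked with a few lines of arithmetic. The sketch below is illustrative only and is not part of the study: the variable names are hypothetical, and the only inputs are the counts stated in the abstract (16 data types, 10 studies, 6 errors).

```python
# Counts taken from the abstract; variable names are illustrative assumptions.
data_types = 16   # distinct types of data selected
studies = 10      # publications in the convenience sample
errors = 6        # errors reported across all data items

total_elements = data_types * studies               # 160 data elements
correct = total_elements - errors                   # 154 correct extractions
accuracy_percent = 100 * correct / total_elements   # 96.25

print(f"{correct}/{total_elements} correct = {accuracy_percent}%")
```

The result, 96.25%, matches the overall accuracy that the abstract reports as 96.3% after rounding to one decimal place.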
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Grants
Agency: RTI International Innovation Office
Copyright information
© 2024 The Authors. Research Synthesis Methods published by John Wiley & Sons Ltd.