Data extraction for evidence synthesis using a large language model: A proof-of-concept study.

Keywords: accuracy; artificial intelligence; data extraction; evidence synthesis; large language models; proof of concept

Journal

Research synthesis methods

ISSN: 1759-2887

Abbreviated title: Res Synth Methods

Country: England

NLM ID: 101543738

Publication information

Publication date:
03 Mar 2024

History:

received: 02 Oct 2023

revised: 18 Dec 2023

accepted: 26 Jan 2024

entrez: 03 Mar 2024

pubmed: 04 Mar 2024

medline: 04 Mar 2024

Status: aheadofprint

Abstract

Data extraction is a crucial, yet labor-intensive and error-prone part of evidence synthesis. To date, efforts to harness machine learning for enhancing efficiency of the data extraction process have fallen short of achieving sufficient accuracy and usability. With the release of large language models (LLMs), new possibilities have emerged to increase efficiency and accuracy of data extraction for evidence synthesis. The objective of this proof-of-concept study was to assess the performance of an LLM (Claude 2) in extracting data elements from published studies, compared with human data extraction as employed in systematic reviews. Our analysis utilized a convenience sample of 10 English-language, open-access publications of randomized controlled trials included in a single systematic review. We selected 16 distinct types of data, posing varying degrees of difficulty (160 data elements across 10 studies). We used the browser version of Claude 2 to upload the portable document format of each publication and then prompted the model for each data element. Across 160 data elements, Claude 2 demonstrated an overall accuracy of 96.3% with a high test-retest reliability (replication 1: 96.9%; replication 2: 95.0% accuracy). Overall, Claude 2 made 6 errors on 160 data items. The most common errors (n = 4) were missed data items. Importantly, Claude 2's ease of use was high; it required no technical expertise or labeled training data for effective operation (i.e., zero-shot learning). Based on findings of our proof-of-concept study, leveraging LLMs has the potential to substantially enhance the efficiency and accuracy of data extraction for evidence syntheses.
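
The headline figures in the abstract can be checked with simple arithmetic. The following is a minimal sketch, not the authors' analysis code: the function name `accuracy_percent` and the choice of half-up rounding to one decimal place are assumptions made for illustration, and only the counts (16 data types, 10 studies, 6 errors) come from the abstract.

```python
from decimal import Decimal, ROUND_HALF_UP


def accuracy_percent(n_items: int, n_errors: int) -> Decimal:
    """Percentage of correctly extracted items, rounded half-up to one decimal.

    Hypothetical helper for illustration; not taken from the study.
    """
    if n_items <= 0 or not 0 <= n_errors <= n_items:
        raise ValueError("need n_items > 0 and 0 <= n_errors <= n_items")
    pct = Decimal(n_items - n_errors) / Decimal(n_items) * 100
    # Decimal avoids the round-half-to-even behaviour of the built-in round(),
    # which would turn 96.25 into 96.2 rather than 96.3.
    return pct.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


# Counts reported in the abstract.
N_DATA_TYPES = 16
N_STUDIES = 10
N_ERRORS = 6

n_items = N_DATA_TYPES * N_STUDIES
overall = accuracy_percent(n_items, N_ERRORS)
print(f"{n_items} data elements, {N_ERRORS} errors -> {overall}% accuracy")
```

Under these assumptions the result matches the reported overall accuracy: 154 of 160 correct is 96.25%, which rounds to 96.3%.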

Identifiers

DOI: 10.1002/jrsm.1710 PMID: 38432227

Publication types

Journal Article

Languages

eng

Grants

Agency: RTI International Innovation Office

References

Institute of Medicine. Finding What Works in Health Care: Standards for Systematic Reviews. National Academies Press; 2011.

Higgins J, Thomas J, Chandler J, et al. Cochrane Handbook for Systematic Reviews of Interventions Version 6.4 (updated August 2023). Cochrane; 2023. Available from: www.training.cochrane.org/handbook Accessed September 27, 2023.

Nussbaumer-Streit B, Ellen M, Klerings I, et al. Resource use during systematic review production varies widely: a scoping review. J Clin Epidemiol. 2021;139:287-296.

Li T, Saldanha IJ, Jap J, et al. A randomized trial provided new evidence on the accuracy and efficiency of traditional vs. electronically annotated abstraction approaches in systematic reviews. J Clin Epidemiol. 2019;115:77-89.

Mathes T, Klassen P, Pieper D. Frequency of data extraction errors and methods to increase data extraction quality: a methodological review. BMC Med Res Methodol. 2017;17(1):152.

Jonnalagadda SR, Goyal P, Huffman MD. Automating data extraction in systematic reviews: a systematic review. Syst Rev. 2015;4:78.

Marshall IJ, Wallace BC. Toward systematic review automation: a practical guide to using machine learning tools in research synthesis. Syst Rev. 2019;8:1-10.

Blaizot A, Veettil SK, Saidoung P, et al. Using artificial intelligence methods for systematic review in health sciences: a systematic review. Res Synth Methods. 2022;13(3):353-362.

Schmidt L, Finnerty Mutlu AN, Elmore R, Olorisade BK, Thomas J, Higgins JPT. Data extraction methods for systematic review (semi)automation: update of a living systematic review. F1000Res. 2021;10:401.

Bonin F, Gleize M, Hou Y, et al., eds. Knowledge extraction and prediction from behavior science randomized controlled trials: a case study in smoking cessation. AMIA Annual Symposium Proceedings. American Medical Informatics Association; 2020:253.

OpenAI. GPT-4 technical report. arXiv preprint arXiv:2303.08774. 2023.

Anthropic. Claude 2. Available from: https://www.anthropic.com/index/claude-2 Accessed September 27, 2023.

Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need. Advances in Neural Information Processing Systems. Curran Associates; 2017:30. Available from: https://www.proceedings.com/39083.html

Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G. Pre-train, prompt, and predict: a systematic survey of prompting methods in natural language processing. ACM Comput Surv. 2023;55(9):1-35.

Brown T, Mann B, Ryder N, et al. Language models are few-shot learners. Advances in Neural Information Processing Systems. Vol 33. Curran Associates; 2020:1877-1901. Available from: https://www.proceedings.com/59066.html

Wei J, Bosma M, Zhao VY, et al. Finetuned language models are zero-shot learners. arXiv preprint arXiv:2109.01652. 2021.

Ouyang L, Wu J, Jiang X, et al. Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems. Vol 35. Curran Associates; 2022:27730-27744. Available from: https://www.proceedings.com/68431.html

Liang P, Bommasani R, Lee T, et al. Holistic evaluation of language models. arXiv preprint arXiv:2211.09110. 2022.

Zheng L, Chiang W-L, Sheng Y, et al. Judging LLM-as-a-judge with MT-bench and Chatbot arena. arXiv preprint arXiv:2306.05685. 2023.

Chen H, Jiao F, Li X, et al. ChatGPT's one-year anniversary: are open-source large language models catching up? arXiv preprint arXiv:2311.16989. 2023.

Blauvelt A, Papp K, Gottlieb A, et al. A head-to-head comparison of ixekizumab vs. guselkumab in patients with moderate-to-severe plaque psoriasis: 12-week efficacy, safety and speed of response from a randomized, double-blinded trial. Br J Dermatol. 2020;182(6):1348-1358.

Lebwohl M, Blauvelt A, Paul C, et al. Certolizumab pegol for the treatment of chronic plaque psoriasis: results through 48 weeks of a phase 3, multicenter, randomized, double-blind, etanercept- and placebo-controlled study (CIMPACT). J Am Acad Dermatol. 2018;79(2):266-276.e5.

Reich K, Pinter A, Lacour JP, et al. Comparison of ixekizumab with ustekinumab in moderate-to-severe psoriasis: 24-week results from IXORA-S, a phase III study. Br J Dermatol. 2017;177(4):1014-1023.

Papp KA, Merola JF, Gottlieb AB, et al. Dual neutralization of both interleukin 17A and interleukin 17F with bimekizumab in patients with psoriasis: results from BE ABLE 1, a 12-week randomized, double-blinded, placebo-controlled phase 2b trial. J Am Acad Dermatol. 2018;79(2):277-286.e10.

Reich K, Armstrong AW, Foley P, et al. Efficacy and safety of guselkumab, an anti-interleukin-23 monoclonal antibody, compared with adalimumab for the treatment of patients with moderate to severe psoriasis with randomized withdrawal and retreatment: results from the phase III, double-blind, placebo- and active comparator-controlled VOYAGE 2 trial. J Am Acad Dermatol. 2017;76(3):418-431.

Warren RB, Blauvelt A, Poulin Y, et al. Efficacy and safety of risankizumab vs. secukinumab in patients with moderate-to-severe plaque psoriasis (IMMerge): results from a phase III, randomized, open-label, efficacy-assessor-blinded clinical trial. Br J Dermatol. 2021;184(1):50-59.

Glatt S, Helmer E, Haier B, et al. First-in-human randomized study of bimekizumab, a humanized monoclonal antibody and selective dual inhibitor of IL-17A and IL-17F, in mild psoriasis. Br J Clin Pharmacol. 2017;83(5):991-1001.

Bagel J, Nia J, Hashim PW, et al. Secukinumab is superior to ustekinumab in clearing skin in patients with moderate to severe plaque psoriasis (16-week CLARITY results). Dermatol Ther. 2018;8(4):571-579.

Thaci D, Blauvelt A, Reich K, et al. Secukinumab is superior to ustekinumab in clearing skin of subjects with moderate to severe plaque psoriasis: CLEAR, a randomized controlled trial. J Am Acad Dermatol. 2015;73(3):400-409.

Reich K, Gooderham M, Green L, et al. The efficacy and safety of apremilast, etanercept and placebo in patients with moderate-to-severe plaque psoriasis: 52-week results from a phase IIIb, randomized, placebo-controlled trial (LIBERATE). J Eur Acad Dermatol Venereol. 2017;31(3):507-517.

Trikalinos TA, Balion CM. Chapter 9: options for summarizing medical test performance in the absence of a "gold standard". J Gen Intern Med. 2012;27(Suppl 1):S67-S75.

Restificar A, Ananiadou S, eds. Inferring appropriate eligibility criteria in clinical trial protocols without labeled data. Proceedings of the ACM Sixth International Workshop on Data and Text Mining in Biomedical Informatics. Association for Computing Machinery (ACM); 2012.

Authors

Gerald Gartlehner (G)

Leila Kahwati (L)

Rainer Hilscher (R)

Ian Thomas (I)

Shannon Kugley (S)

Karen Crotty (K)

Meera Viswanathan (M)

Barbara Nussbaumer-Streit (B)

Graham Booth (G)

Nathaniel Erskine (N)

Amanda Konet (A)

Robert Chew (R)