A study of generative large language model for medical research and healthcare.
Journal
NPJ digital medicine
ISSN: 2398-6352
Abbreviated title: NPJ Digit Med
Country: England
ID NLM: 101731738
Publication information
Publication date: 16 Nov 2023
History:
received: 05 Jun 2023
accepted: 01 Nov 2023
medline: 17 Nov 2023
pubmed: 17 Nov 2023
entrez: 17 Nov 2023
Status: epublish
Abstract
There is enormous enthusiasm about, and concern over, applying large language models (LLMs) to healthcare. Yet current assumptions are based on general-purpose LLMs such as ChatGPT, which were not developed for medical use. This study develops a generative clinical LLM, GatorTronGPT, using 277 billion words of text, including (1) 82 billion words of clinical text from 126 clinical departments and approximately 2 million patients at the University of Florida Health and (2) 195 billion words of diverse general English text. We train GatorTronGPT using a GPT-3 architecture with up to 20 billion parameters and evaluate its utility for biomedical natural language processing (NLP) and healthcare text generation. GatorTronGPT improves biomedical NLP. We apply GatorTronGPT to generate 20 billion words of synthetic text. Synthetic NLP models trained on synthetic text generated by GatorTronGPT outperform models trained on real-world clinical text. A physicians' Turing test using a 1 (worst) to 9 (best) scale shows no significant differences between GatorTronGPT and human text in linguistic readability (p = 0.22; 6.57 for GatorTronGPT vs. 6.93 for human) or clinical relevance (p = 0.91; 7.0 for GatorTronGPT vs. 6.97 for human), and that physicians cannot differentiate them (p < 0.001). This study provides insights into the opportunities and challenges of LLMs for medical research and healthcare.
Identifiers
pubmed: 37973919
doi: 10.1038/s41746-023-00958-w
pii: 10.1038/s41746-023-00958-w
pmc: PMC10654385
Publication types
Journal Article
Languages
eng
Pagination
210
Grants
Agency: NIA NIH HHS
ID: R56 AG069880
Country: United States
Agency: NIA NIH HHS
ID: R01 AG080624
Country: United States
Agency: NCI NIH HHS
ID: R01 CA246418
Country: United States
Copyright information
© 2023. The Author(s).