Use of Large Language Models to Assess the Likelihood of Epidemics From the Content of Tweets: Infodemiology Study.

MeSH terms: Humans; United States; Infodemiology; Epidemics; Disease Outbreaks; Conjunctivitis; Language; Social Media

Keywords: GPT-3.5; GPT-4; Generative Pre-trained Transformers; Twitter; X (formerly known as Twitter); conjunctivitis; epidemic detection; generative large language model; infectious eye disease; microblog; social media

Journal

Journal of medical Internet research

ISSN: 1438-8871

Abbreviated title: J Med Internet Res

Country: Canada

NLM ID: 100959882

Publication information

Publication date:
01 Mar 2024

History:

received: 19 May 2023

accepted: 19 Jan 2024

revised: 20 Dec 2023

medline: 4 Mar 2024

pubmed: 1 Mar 2024

entrez: 1 Mar 2024

Status: epublish

Abstract

Previous work suggests that Google searches could be useful in identifying conjunctivitis epidemics. Assessing the content of social media posts may provide additional value as an early indicator of conjunctivitis and other systemic infectious diseases. We investigated whether large language models, specifically GPT-3.5 and GPT-4 (OpenAI), can provide probabilistic assessments of whether social media posts about conjunctivitis could indicate a regional outbreak. A total of 12,194 conjunctivitis-related tweets were obtained using a targeted Boolean search in multiple languages from India, Guam (United States), Martinique (France), the Philippines, American Samoa (United States), Fiji, Costa Rica, Haiti, and the Bahamas, covering the time frame from January 1, 2012, to March 13, 2023. By providing these tweets via prompts to GPT-3.5 and GPT-4, we obtained probabilistic assessments that were validated by 2 human raters. We then calculated Pearson correlations of these time series with tweet volume and with the occurrence of known outbreaks in these 9 locations, using a time series bootstrap to compute CIs. Probabilistic assessments derived from GPT-3.5 showed correlations of 0.60 (95% CI 0.47-0.70) and 0.53 (95% CI 0.40-0.65) with the 2 human raters, with higher correlations for GPT-4. The weekly averages of GPT-3.5 probabilities showed substantial correlations with weekly tweet volume for 44% (4/9) of the countries, with correlations ranging from 0.10 (95% CI 0.0-0.29) to 0.53 (95% CI 0.39-0.89); correlations were larger for GPT-4. Correlations with known epidemics were more modest, with substantial correlation only in American Samoa (0.40, 95% CI 0.16-0.81). These findings suggest that GPT prompting can efficiently assess the content of social media posts and indicate possible disease outbreaks with accuracy comparable to that of humans. Furthermore, we found that automated content analysis of tweets is related to tweet volume for conjunctivitis-related posts in some locations and to the occurrence of actual epidemics. Future work may improve the sensitivity and specificity of these methods for disease outbreak detection.
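The abstract describes the analysis only at a high level: per-tweet GPT probabilities are averaged by week, and the weekly series is correlated (Pearson) with weekly tweet volume or with a known-outbreak indicator, with CIs from a time series bootstrap. The Python sketch below illustrates that kind of computation under stated assumptions; it is not the authors' code. The ISO-week binning, the choice of a moving block bootstrap, the 4-week block length, the 2,000 resamples, and all function names (`weekly_series`, `pearson`, `block_bootstrap_ci`) are hypothetical choices made for illustration. The same `pearson` function would also apply to a 0/1 outbreak indicator or to human-rater scores.

```python
import math
import random
import statistics
from collections import defaultdict
from datetime import date, timedelta


def weekly_series(tweets):
    """Aggregate (date, probability) pairs into weekly series.

    Assumption: weeks are keyed by ISO (year, week); weeks with no tweets
    are simply absent, since a mean probability is undefined for them.
    Returns (weeks, mean probability per week, tweet count per week).
    """
    probs = defaultdict(list)
    for day, p in tweets:
        iso = day.isocalendar()
        probs[(iso[0], iso[1])].append(p)
    weeks = sorted(probs)
    mean_prob = [statistics.fmean(probs[w]) for w in weeks]
    volume = [len(probs[w]) for w in weeks]
    return weeks, mean_prob, volume


def pearson(x, y):
    """Pearson correlation; returns NaN if either series is constant."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return math.nan
    return sxy / math.sqrt(sxx * syy)


def block_bootstrap_ci(x, y, block_len=4, n_boot=2000, seed=0):
    """95% percentile CI for pearson(x, y) via a moving block bootstrap.

    Assumption: contiguous blocks of weeks (not single weeks) are resampled
    to preserve short-range autocorrelation; the exact time series bootstrap
    variant used in the study is not stated in the abstract.
    """
    n = len(x)
    if n < block_len:
        raise ValueError("series shorter than block length")
    rng = random.Random(seed)
    starts = range(n - block_len + 1)
    rs = []
    for _ in range(n_boot):
        idx = []
        while len(idx) < n:
            s = rng.choice(starts)
            idx.extend(range(s, s + block_len))
        idx = idx[:n]  # trim the last block so each resample has length n
        r = pearson([x[i] for i in idx], [y[i] for i in idx])
        if not math.isnan(r):  # skip resamples where a series is constant
            rs.append(r)
    cuts = statistics.quantiles(rs, n=40)  # cut points at 2.5%, 5%, ..., 97.5%
    return cuts[0], cuts[-1]


if __name__ == "__main__":
    # Synthetic, made-up data for illustration only (not from the study).
    rng = random.Random(1)
    tweets = []
    for week in range(30):
        day = date(2022, 1, 3) + timedelta(weeks=week)
        for _ in range(1 + week % 5):
            tweets.append((day, rng.random()))
    weeks, mean_prob, volume = weekly_series(tweets)
    r = pearson(mean_prob, volume)
    lo, hi = block_bootstrap_ci(mean_prob, volume)
    print(f"r = {r:.2f}, 95% CI {lo:.2f} to {hi:.2f}")
```

A block bootstrap is shown because weekly counts and averages are typically autocorrelated, so resampling individual weeks independently would tend to understate uncertainty; other time series bootstrap schemes (for example, stationary or circular block bootstraps) would be equally plausible readings of the abstract.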

Identifiers

DOI: 10.2196/49139; PMID: 38427404; PMC: PMC10943433

pii: v26i1e49139

Publication types

Journal Article

Languages

English

Pagination

e49139

Grants

Agency: NEI NIH HHS

ID: P30 EY002162

Country: United States

Agency: NEI NIH HHS

ID: R01 EY024608

Country: United States

Copyright information

©Michael S Deiner, Natalie A Deiner, Vagelis Hristidis, Stephen D McLeod, Thuy Doan, Thomas M Lietman, Travis C Porco. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 01.03.2024.

Authors

Michael S Deiner

Natalie A Deiner

Vagelis Hristidis

Stephen D McLeod

Thuy Doan

Thomas M Lietman

Travis C Porco
