Use of Large Language Models to Assess the Likelihood of Epidemics From the Content of Tweets: Infodemiology Study.
GPT-3.5
GPT-4
Generative Pre-trained Transformers
Twitter
X formerly known as Twitter
conjunctivitis
epidemic detection
generative large language model
infectious eye disease
microblog
social media
Journal
Journal of Medical Internet Research
ISSN: 1438-8871
Abbreviated title: J Med Internet Res
Country: Canada
NLM ID: 100959882
Publication information
Publication date:
01 Mar 2024
History:
received: 19 May 2023
revised: 20 Dec 2023
accepted: 19 Jan 2024
entrez: 01 Mar 2024
pubmed: 01 Mar 2024
medline: 04 Mar 2024
Status:
epublish
Abstract
BACKGROUND
Previous work suggests that Google searches could be useful in identifying conjunctivitis epidemics. Content-based assessment of social media posts may provide additional value as an early indicator of conjunctivitis and other systemic infectious diseases.
OBJECTIVE
We investigated whether large language models, specifically GPT-3.5 and GPT-4 (OpenAI), can provide probabilistic assessments of whether social media posts about conjunctivitis could indicate a regional outbreak.
METHODS
A total of 12,194 conjunctivitis-related tweets were obtained using a targeted Boolean search in multiple languages from India, Guam (United States), Martinique (France), the Philippines, American Samoa (United States), Fiji, Costa Rica, Haiti, and the Bahamas, covering the time frame from January 1, 2012, to March 13, 2023. By providing these tweets via prompts to GPT-3.5 and GPT-4, we obtained probabilistic assessments that were validated by 2 human raters. We then calculated Pearson correlations of these time series with tweet volume and the occurrence of known outbreaks in these 9 locations, with time series bootstrap used to compute CIs.
RESULTS
Probabilistic assessments derived from GPT-3.5 showed correlations of 0.60 (95% CI 0.47-0.70) and 0.53 (95% CI 0.40-0.65) with the 2 human raters, with higher results for GPT-4. The weekly averages of GPT-3.5 probabilities showed substantial correlations with weekly tweet volume for 44% (4/9) of the locations, with correlations ranging from 0.10 (95% CI 0.0-0.29) to 0.53 (95% CI 0.39-0.89), and larger correlations for GPT-4. Correlations with known epidemics were more modest, with substantial correlation only in American Samoa (0.40, 95% CI 0.16-0.81).
CONCLUSIONS
These findings suggest that GPT prompting can efficiently assess the content of social media posts and indicate possible disease outbreaks to a degree of accuracy comparable to that of humans. Furthermore, we found that automated content analysis of tweets is related to tweet volume for conjunctivitis-related posts in some locations and to the occurrence of actual epidemics. Future work may improve the sensitivity and specificity of these methods for disease outbreak detection.
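The analysis described in the abstract (weekly averages of per-tweet probabilities, Pearson correlation with weekly tweet volume, and a time series bootstrap for CIs) can be sketched roughly as follows. This is a minimal illustration, not the authors' code. The abstract does not say which bootstrap variant was used, so the moving block bootstrap here is an assumption, as are the function names, the block length of 4 weeks, the percentile interval, and the synthetic data in the demo.

```python
import math
import random
from collections import defaultdict
from datetime import date, timedelta


def weekly_series(records):
    """Group (date, probability) pairs by ISO week.

    Returns week keys, mean probability per week, and tweet count per week.
    Weeks with no tweets are omitted, since they have no mean probability.
    """
    buckets = defaultdict(list)
    for day, prob in records:
        iso = day.isocalendar()
        buckets[(iso[0], iso[1])].append(prob)
    weeks = sorted(buckets)
    means = [sum(buckets[w]) / len(buckets[w]) for w in weeks]
    counts = [len(buckets[w]) for w in weeks]
    return weeks, means, counts


def pearson(x, y):
    """Pearson correlation coefficient; NaN if either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def block_bootstrap_ci(x, y, block_len=4, n_boot=2000, alpha=0.05, seed=0):
    """Percentile CI for Pearson r from a moving block bootstrap.

    Resampling contiguous blocks of paired weeks (an assumed variant)
    preserves short-range autocorrelation within each block.
    """
    rng = random.Random(seed)
    n = len(x)
    starts = range(n - block_len + 1)
    n_blocks = math.ceil(n / block_len)
    stats = []
    for _ in range(n_boot):
        idx = []
        for _ in range(n_blocks):
            s = rng.choice(starts)
            idx.extend(range(s, s + block_len))
        idx = idx[:n]  # trim to the original series length
        r = pearson([x[i] for i in idx], [y[i] for i in idx])
        if not math.isnan(r):
            stats.append(r)
    if not stats:
        return float("nan"), float("nan")
    stats.sort()
    lo = stats[int((alpha / 2) * (len(stats) - 1))]
    hi = stats[int((1 - alpha / 2) * (len(stats) - 1))]
    return lo, hi


if __name__ == "__main__":
    # Synthetic stand-in data: 30 weeks with 1 to 5 scored tweets each.
    rng = random.Random(1)
    records = []
    for week in range(30):
        day = date(2022, 1, 3) + timedelta(weeks=week)
        for _ in range(rng.randint(1, 5)):
            records.append((day, rng.random()))
    _, means, counts = weekly_series(records)
    r = pearson(means, counts)
    lo, hi = block_bootstrap_ci(means, counts)
    print(f"r = {r:.2f}, 95% CI {lo:.2f} to {hi:.2f}")
```

Resampling whole blocks rather than individual weeks is one common way to respect serial dependence in weekly surveillance series. A different block length or bootstrap scheme would give somewhat different intervals.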
Identifiers
pubmed: 38427404
pii: v26i1e49139
doi: 10.2196/49139
pmc: PMC10943433
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
e49139
Grants
Agency: NEI NIH HHS
ID: P30 EY002162
Country: United States
Agency: NEI NIH HHS
ID: R01 EY024608
Country: United States
Copyright information
©Michael S Deiner, Natalie A Deiner, Vagelis Hristidis, Stephen D McLeod, Thuy Doan, Thomas M Lietman, Travis C Porco. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 01.03.2024.