Towards scaling Twitter for digital epidemiology of birth defects.

Data mining; Epidemiology

Journal

NPJ digital medicine
ISSN: 2398-6352
Abbreviated title: NPJ Digit Med
Country: England
NLM ID: 101731738

Publication information

Publication date:
2019
History:
received: 22 05 2019
accepted: 12 08 2019
entrez: 5 10 2019
pubmed: 5 10 2019
medline: 5 10 2019
Status: epublish

Abstract

Social media has recently been used to identify and study a small cohort of Twitter users whose pregnancies with birth defect outcomes (the leading cause of infant mortality) could be observed via their publicly available tweets. In this study, we exploit social media on a larger scale by developing natural language processing (NLP) methods to automatically detect, among thousands of users, a cohort of mothers reporting that their child has a birth defect. We used 22,999 annotated tweets to train and evaluate supervised machine learning algorithms (feature-engineered and deep learning-based classifiers) that automatically distinguish tweets referring to the user's pregnancy outcome from tweets that merely mention birth defects. Because 90% of the tweets merely mention birth defects, we experimented with under-sampling and over-sampling approaches to address this class imbalance. An SVM classifier achieved the best performance for the two positive classes: an F
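
The under-sampling and over-sampling idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the label names (`mention`, `defect`, `possible_defect`), the toy data, and the helper functions are hypothetical, and only the Python standard library is used. It shows plain random resampling, one common way to balance classes before training a classifier.

```python
import random
from collections import Counter, defaultdict


def group_by_label(examples):
    """Group (text, label) pairs by their label."""
    groups = defaultdict(list)
    for text, label in examples:
        groups[label].append((text, label))
    return groups


def under_sample(examples, seed=0):
    """Randomly shrink every class to the size of the smallest class."""
    rng = random.Random(seed)
    groups = group_by_label(examples)
    target = min(len(items) for items in groups.values())
    balanced = []
    for items in groups.values():
        # sample() draws without replacement, so no example is duplicated
        balanced.extend(rng.sample(items, target))
    rng.shuffle(balanced)
    return balanced


def over_sample(examples, seed=0):
    """Randomly duplicate examples until every class matches the largest class."""
    rng = random.Random(seed)
    groups = group_by_label(examples)
    target = max(len(items) for items in groups.values())
    balanced = []
    for items in groups.values():
        balanced.extend(items)
        # choices() draws with replacement to fill the gap (k may be 0)
        balanced.extend(rng.choices(items, k=target - len(items)))
    rng.shuffle(balanced)
    return balanced


# Hypothetical toy data: 90% of examples belong to the majority class.
toy = (
    [(f"mention tweet {i}", "mention") for i in range(90)]
    + [(f"defect tweet {i}", "defect") for i in range(6)]
    + [(f"possible defect tweet {i}", "possible_defect") for i in range(4)]
)

under_counts = Counter(label for _, label in under_sample(toy))
over_counts = Counter(label for _, label in over_sample(toy))
print(under_counts)
print(over_counts)
```

In practice, resampling of this kind is normally applied to the training split only, so that evaluation still reflects the original class distribution.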

Identifiers

pubmed: 31583284
doi: 10.1038/s41746-019-0170-5
pii: 170
pmc: PMC6773753

Publication types

Journal Article

Languages

eng

Pagination

96

Copyright information

© The Author(s) 2019.

Conflict of interest statement

Competing interests: The authors declare no competing interests.

Authors

Ari Z Klein (AZ)

Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.

Abeed Sarker (A)

Department of Biomedical Informatics, Emory University School of Medicine, Atlanta, GA, USA.

Davy Weissenbacher (D)

Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.

Graciela Gonzalez-Hernandez (G)

Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.

MeSH classifications