RedMed: Extending drug lexicons for social media applications.


Journal

Journal of biomedical informatics
ISSN: 1532-0480
Abbreviated title: J Biomed Inform
Country: United States
NLM ID: 100970413

Publication information

Publication date:
11 2019
History:
received: 11 06 2019
revised: 02 10 2019
accepted: 11 10 2019
pubmed: 19 10 2019
medline: 6 10 2020
entrez: 19 10 2019
Status: ppublish

Abstract

Social media has been identified as a promising potential source of information for pharmacovigilance. The adoption of social media data has been hindered by the massive and noisy nature of the data. Initial attempts to use social media data have relied on exact text matches to drugs of interest, and therefore suffer from the gap between formal drug lexicons and the informal nature of social media. The Reddit comment archive represents an ideal corpus for bridging this gap. We trained a word embedding model, RedMed, to facilitate the identification and retrieval of health entities from Reddit data. We compare the performance of our model trained on a consumer-generated corpus against publicly available models trained on expert-generated corpora. Our automated classification pipeline achieves an accuracy of 0.88 and a specificity of >0.9 across four different term classes. Of all drug mentions, an average of 79% (±0.5%) were exact matches to a generic or trademark drug name, 14% (±0.5%) were misspellings, 6.4% (±0.3%) were synonyms, and 0.13% (±0.05%) were pill marks. We find that our system captures an additional 20% of mentions; these would have been missed by approaches that rely solely on exact string matches. We provide a lexicon of misspellings and synonyms for 2978 drugs and a word embedding model trained on a health-oriented subset of Reddit.
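The general idea of using embedding neighbors to extend a drug lexicon can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the toy vectors, the thresholds `min_sim` and `max_edits`, and the function names are all hypothetical, and the edit-distance rule used to separate misspellings from synonyms is a stand-in heuristic rather than the classifier described in the abstract.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def edit_distance(a, b):
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def candidate_terms(drug, vectors, min_sim=0.8, max_edits=2):
    """Label embedding neighbors of `drug` as misspellings or synonyms.

    Assumption: a neighbor within `max_edits` character edits is treated
    as a misspelling; any other sufficiently similar neighbor is treated
    as a synonym candidate.
    """
    target = vectors[drug]
    labels = {}
    for token, vec in vectors.items():
        if token == drug or cosine(target, vec) < min_sim:
            continue
        close = edit_distance(drug, token) <= max_edits
        labels[token] = "misspelling" if close else "synonym"
    return labels


# Hypothetical 3-dimensional vectors, invented purely for illustration.
TOY_VECTORS = {
    "alprazolam": [0.90, 0.10, 0.05],
    "alprazolan": [0.88, 0.12, 0.06],
    "xanax": [0.85, 0.20, 0.05],
    "pizza": [0.05, 0.10, 0.95],
}

if __name__ == "__main__":
    print(candidate_terms("alprazolam", TOY_VECTORS))
```

In practice the vectors would come from a trained embedding model rather than a hand-written dictionary, and the candidate labels would still need validation before being added to a lexicon.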

Identifiers

pubmed: 31627020
pii: S1532-0464(19)30226-6
doi: 10.1016/j.jbi.2019.103307
pmc: PMC6874884
mid: NIHMS1542315

Chemical substances

Pharmaceutical Preparations 0

Publication types

Journal Article
Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Pagination

103307

Grants

Agency: NIGMS NIH HHS
ID: R01 GM102365
Country: United States
Agency: NLM NIH HHS
ID: R01 LM005652
Country: United States
Agency: NLM NIH HHS
ID: T32 LM012409
Country: United States

Copyright information

Copyright © 2019 Elsevier Inc. All rights reserved.

Authors

Adam Lavertu (A)

Biomedical Informatics Training Program, Stanford University, Stanford, CA 94305, USA.

Russ B Altman (RB)

Department of Bioengineering, Stanford University, Stanford, CA 94305, USA. Electronic address: rbaltman@stanford.edu.
