The Adverse Drug Reactions From Patient Reports in Social Media Project: Protocol for an Evaluation Against a Gold Standard.

MedDRA; Racine Pharma; data mining; drug-related side effects and adverse reactions; natural language processing; social media

Journal

JMIR research protocols
ISSN: 1929-0748
Abbreviated title: JMIR Res Protoc
Country: Canada
NLM ID: 101599504

Publication information

Publication date:
07 May 2019
History:
received: 29 June 2018
revised: 16 November 2018
accepted: 21 December 2018
entrez: 09 May 2019
pubmed: 09 May 2019
medline: 09 May 2019
Status: epublish

Abstract



BACKGROUND
Social media is a potential source of information for postmarketing drug safety surveillance that still remains unexploited. Information technology solutions that extract adverse drug reactions (ADRs) from posts on health forums require a rigorous evaluation methodology if their results are to be used for decision making. First, a gold standard, consisting of manual ADR annotations made by human experts on a corpus extracted from social media, must be created and its quality must be assessed. Second, as in clinical research protocols, the sample size must rely on statistical arguments. Finally, the extraction methods must target the relation between the drug and the disease (which may be either treated or caused by the drug) rather than simple co-occurrences in the posts.
OBJECTIVE
We propose a standardized protocol for evaluating software that extracts ADRs from messages posted on health forums. The study is conducted as part of the Adverse Drug Reactions from Patient Reports in Social Media project.
METHODS
Messages were extracted from French health forums. Entity recognition relied on the Racine Pharma lexicon for drugs and on the Medical Dictionary for Regulatory Activities (MedDRA) terminology for potential adverse events (AEs). Natural language processing techniques automated the extraction of ADR information, that is, the relation between drug and AE entities. The evaluation corpus was a random sample of messages containing drug and/or AE concepts corresponding to recent pharmacovigilance alerts. Two annotators experienced in medical terminology manually annotated the corpus according to an annotation guideline, thereby creating the gold standard. We will evaluate our tool against the gold standard using recall, precision, and F-measure. Interannotator agreement, which reflects the quality of the gold standard, will be assessed with a hierarchical kappa. The different levels of granularity in the terminologies will be explored further.
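The evaluation compares the tool's extracted ADRs with the gold standard and measures agreement between the two annotators. The Python sketch below illustrates these metrics under stated assumptions: each ADR is represented as a hypothetical (message_id, drug, adverse_event) triple, and only an exact match counts as a true positive. The agreement function computes plain Cohen's kappa, which is a swapped-in simpler measure, not the hierarchical kappa named in the protocol (whose exact definition is not given here). All function and variable names are illustrative.

```python
from collections import Counter
from typing import Hashable, Iterable, Sequence, Tuple

# Hypothetical representation of one ADR: (message_id, drug, adverse_event).
Annotation = Tuple[str, str, str]


def precision_recall_f1(
    predicted: Iterable[Annotation], gold: Iterable[Annotation]
) -> Tuple[float, float, float]:
    """Exact-match precision, recall, and F-measure against a gold standard."""
    pred_set, gold_set = set(predicted), set(gold)
    true_positives = len(pred_set & gold_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def cohens_kappa(labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]) -> float:
    """Plain (unweighted, non-hierarchical) Cohen's kappa for two annotators."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("annotators must label the same non-empty set of items")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both annotators.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's marginal label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        # Both annotators used one identical label everywhere; kappa is undefined,
        # and returning 1.0 here is only a convention.
        return 1.0
    return (observed - expected) / (1 - expected)


gold = {("m1", "drugA", "nausea"), ("m2", "drugB", "rash"), ("m3", "drugA", "headache")}
predicted = {("m1", "drugA", "nausea"), ("m2", "drugB", "rash"), ("m4", "drugC", "fatigue")}
p, r, f1 = precision_recall_f1(predicted, gold)
print(round(p, 3), round(r, 3), round(f1, 3))  # 0.667 0.667 0.667

annotator_1 = ["ADR", "ADR", "none", "none"]
annotator_2 = ["ADR", "none", "none", "none"]
print(cohens_kappa(annotator_1, annotator_2))  # 0.5
```

Exact triple matching is the strictest choice; matching at a coarser level of the terminology would raise both precision and recall, which is one way the granularity question raised in the protocol could be examined.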
RESULTS
The necessary and sufficient sample size was calculated to ensure statistical confidence in the results. Because we expected a global recall of 0.5, at least 384 identified ADR concepts were needed to obtain a 95% CI with a total width of 0.10 around 0.5. Automated ADR information extraction on the evaluation corpus has been completed, and the 2 annotators have completed the annotation process. The analysis of the performance of the ADR information extraction module compared with the gold standard is ongoing.
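The figure of 384 concepts is consistent with the normal-approximation (Wald) formula for a confidence interval around a proportion, n = z² · p(1 − p) / (w/2)², where p is the expected recall, w the total CI width, and z the standard normal quantile for the chosen confidence level. The sketch below assumes this formula; the function name is hypothetical and not taken from the protocol.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(
    expected_p: float, total_width: float, confidence: float = 0.95
) -> float:
    """Wald sample size so that a CI around expected_p has the given total width."""
    if not 0.0 < expected_p < 1.0 or total_width <= 0.0:
        raise ValueError("expected_p must be in (0, 1) and total_width positive")
    # Two-sided standard normal quantile, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = total_width / 2
    return z ** 2 * expected_p * (1 - expected_p) / half_width ** 2


n = sample_size_for_proportion(expected_p=0.5, total_width=0.10)
print(round(n, 1))   # 384.1
print(math.ceil(n))  # 385
```

With p = 0.5 the term p(1 − p) reaches its maximum of 0.25, so this is the most conservative assumption for the expected recall; the raw value of about 384.1 matches the 384 reported above, and rounding up, as is often done, gives 385.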
CONCLUSIONS
This protocol relies on standardized statistical methods from clinical research to create the corpus, thus ensuring the necessary statistical power for the results. Such an evaluation methodology is required for ADR information extraction software to be useful in postmarketing drug safety surveillance.
INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID)
RR1-10.2196/11448

Identifiers

pubmed: 31066711
pii: v8i5e11448
doi: 10.2196/11448
pmc: PMC6528435

Publication types

Journal Article

Languages

eng

Pagination

e11448

Copyright information

©Armelle Arnoux-Guenegou, Yannick Girardeau, Xiaoyi Chen, Myrtille Deldossi, Rim Aboukhamis, Carole Faviez, Badisse Dahamna, Pierre Karapetiantz, Sylvie Guillemin-Lanne, Agnès Lillo-Le Louët, Nathalie Texier, Anita Burgun, Sandrine Katsahian. Originally published in JMIR Research Protocols (http://www.researchprotocols.org), 07.05.2019.


Authors

Armelle Arnoux-Guenegou (A)

INSERM U1138 - Team 22, Information Sciences to Support Personalized Medicine, Centre de Recherche des Cordeliers, Paris, France.

Yannick Girardeau (Y)

INSERM U1138 - Team 22, Information Sciences to Support Personalized Medicine, Centre de Recherche des Cordeliers, Paris, France.
Département d'Informatique Médicale, Hôpital Européen Georges-Pompidou, Assistance Publique - Hôpitaux de Paris, Paris, France.

Xiaoyi Chen (X)

INSERM U1138 - Team 22, Information Sciences to Support Personalized Medicine, Centre de Recherche des Cordeliers, Paris, France.

Myrtille Deldossi (M)

Innovative Projects - Text Mining, Expert System, Paris, France.

Rim Aboukhamis (R)

Centre Régional de Pharmacovigilance, Hôpital Européen Georges-Pompidou, Assistance Publique - Hôpitaux de Paris, Paris, France.

Carole Faviez (C)

Kappa Santé, Paris, France.

Badisse Dahamna (B)

Service d'Informatique Biomédicale, D2IM, Centre Hospitalier Universitaire de Rouen, Rouen, France.

Pierre Karapetiantz (P)

INSERM U1138 - Team 22, Information Sciences to Support Personalized Medicine, Centre de Recherche des Cordeliers, Paris, France.

Sylvie Guillemin-Lanne (S)

Innovative Projects - Text Mining, Expert System, Paris, France.

Agnès Lillo-Le Louët (A)

Centre Régional de Pharmacovigilance, Hôpital Européen Georges-Pompidou, Assistance Publique - Hôpitaux de Paris, Paris, France.

Anita Burgun (A)

INSERM U1138 - Team 22, Information Sciences to Support Personalized Medicine, Centre de Recherche des Cordeliers, Paris, France.
Département d'Informatique Médicale, Hôpital Européen Georges-Pompidou, Assistance Publique - Hôpitaux de Paris, Paris, France.
INSERM U1138 - Team 22, Information Sciences to Support Personalized Medicine, Paris Descartes University, Sorbonne Paris Cité, Paris, France.

Sandrine Katsahian (S)

INSERM U1138 - Team 22, Information Sciences to Support Personalized Medicine, Centre de Recherche des Cordeliers, Paris, France.
INSERM U1138 - Team 22, Information Sciences to Support Personalized Medicine, Paris Descartes University, Sorbonne Paris Cité, Paris, France.
Clinical Research Unit Hôpitaux Universitaires Paris Ouest, Hôpital Européen Georges-Pompidou, Assistance Publique - Hôpitaux de Paris, Paris, France.
INSERM CIC1418, Clinical Epidemiology, Hôpital Européen Georges-Pompidou, Paris, France.

MeSH classifications