Multiple sampling schemes and deep learning improve active learning performance in drug-drug interaction information retrieval analysis from the literature.

Deep Learning; Information Storage and Retrieval; Algorithms; Drug Interactions; PubMed

Active learning; Deep learning; Drug-drug interaction; Information retrieval; Positive sampling; Random negative sampling; Similarity sampling; Uncertainty sampling

Journal

Journal of Biomedical Semantics

ISSN: 2041-1480

Abbreviated title: J Biomed Semantics

Country: England

NLM ID: 101531992

Publication information

Publication date:
30 05 2023

History:

received: 09 03 2022

accepted: 29 04 2023

medline: 31 5 2023

pubmed: 30 5 2023

entrez: 29 5 2023

Status: epublish

Abstract

Drug-drug interaction (DDI) information retrieval (IR) from the PubMed literature is an important natural language processing (NLP) task. This work studies active learning (AL) in DDI IR analysis for the first time. DDI IR analysis of PubMed abstracts faces the challenge of relatively few positive DDI samples among an overwhelmingly large number of negative samples. Random negative sampling and positive sampling are designed specifically to improve the efficiency of AL analysis, and the paper shows the consistency of both. PubMed abstracts are divided into two pools: the screened pool contains all abstracts that pass the DDI keyword query in PubMed, and the unscreened pool contains all other abstracts. Precision of the DDI IR analysis is evaluated and compared at a prespecified recall of 0.95. In screened-pool IR analysis using a support vector machine (SVM), similarity sampling plus uncertainty sampling improves precision over uncertainty sampling alone, from 0.89 to 0.92. In unscreened-pool IR analysis, integrating random negative sampling, positive sampling, and similarity sampling improves precision over uncertainty sampling alone, from 0.72 to 0.81. When the SVM is replaced with a deep learning method, all sampling schemes consistently improve DDI AL analysis in both the screened and unscreened pools. Deep learning gives a significant improvement in precision over SVM: 0.96 vs. 0.92 in the screened pool and 0.90 vs. 0.81 in the unscreened pool. Integrating these sampling schemes and deep learning algorithms into AL significantly improves DDI IR analysis from the literature. Random negative sampling and positive sampling are highly effective for improving AL analysis when positive and negative samples are extremely imbalanced.
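The two quantities the abstract relies on, precision at a fixed recall and uncertainty sampling, can be illustrated with a minimal Python sketch. This is not the authors' implementation, which this record does not describe. The function names `precision_at_recall` and `uncertainty_sample` are hypothetical, the classifier is assumed to output a probability of DDI relevance per abstract, and tied scores are ranked arbitrarily for simplicity.

```python
import math


def precision_at_recall(scores, labels, target_recall=0.95):
    """Precision at the shortest ranked prefix whose recall reaches target_recall.

    scores: predicted relevance per abstract (higher = more likely DDI-relevant).
    labels: 1 for a true DDI-relevant abstract, 0 otherwise.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    # Rank abstracts from most to least likely relevant.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    # Number of positives that must be retrieved to reach the target recall.
    needed = math.ceil(target_recall * n_pos)
    true_pos = 0
    for retrieved, (_, label) in enumerate(ranked, start=1):
        true_pos += label
        if true_pos >= needed:
            return true_pos / retrieved


def uncertainty_sample(probabilities, batch_size):
    """Indices of the pool items whose predicted probability is closest to 0.5.

    Assumption: a binary probabilistic classifier; for an SVM, the distance
    to the decision boundary would play the same role.
    """
    order = sorted(range(len(probabilities)),
                   key=lambda i: abs(probabilities[i] - 0.5))
    return order[:batch_size]
```

In an AL loop of this kind, `uncertainty_sample` would pick the next abstracts to send for manual labeling, and `precision_at_recall` would score the retrained model on a held-out labeled set.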

Identifiers

DOI: 10.1186/s13326-023-00287-7 PMID: 37248476 PMC: PMC10228061

pubmed: 37248476

doi: 10.1186/s13326-023-00287-7

pii: 10.1186/s13326-023-00287-7

pmc: PMC10228061

Publication types

Journal Article; Research Support, N.I.H., Extramural

Languages

eng

Grants

Agency: NICHD NIH HHS

ID: P30 HD106451

Country: United States

Agency: NIDA NIH HHS

ID: R01 DA048001

Country: United States

Agency: NCI NIH HHS

ID: U01 CA248240

Country: United States

Agency: NLM NIH HHS

ID: R01 LM011945

Country: United States

Authors

Weixin Xie

Kunjie Fan

Shijun Zhang

Lang Li
