Multiple sampling schemes and deep learning improve active learning performance in drug-drug interaction information retrieval analysis from the literature.
Active learning
Deep learning
Drug-drug interaction
Information retrieval
Positive sampling
Random negative sampling
Similarity sampling
Uncertainty sampling
Journal
Journal of biomedical semantics
ISSN: 2041-1480
Abbreviated title: J Biomed Semantics
Country: England
NLM ID: 101531992
Publication information
Publication date:
30 05 2023
History:
received: 09 03 2022
accepted: 29 04 2023
medline: 31 5 2023
pubmed: 30 5 2023
entrez: 29 5 2023
Status:
epublish
Abstract
Drug-drug interaction (DDI) information retrieval (IR) from the PubMed literature is an important natural language processing (NLP) task. This study is the first to examine active learning (AL) for DDI IR analysis. DDI IR analysis of PubMed abstracts faces the challenge of a relatively small number of positive DDI samples among an overwhelmingly large number of negative samples. Random negative sampling and positive sampling are designed specifically to improve the efficiency of AL analysis, and the paper shows the consistency of both schemes. PubMed abstracts are divided into two pools: the screened pool contains all abstracts that pass the DDI keyword query in PubMed, while the unscreened pool includes all other abstracts. Precision of the DDI IR analysis is evaluated and compared at a prespecified recall rate of 0.95. In the screened pool, using a support vector machine (SVM), similarity sampling plus uncertainty sampling improves precision over uncertainty sampling alone, from 0.89 to 0.92. In the unscreened pool, integrating random negative sampling, positive sampling, and similarity sampling improves precision over uncertainty sampling alone, from 0.72 to 0.81. When the SVM is replaced with a deep learning method, all sampling schemes consistently improve DDI AL analysis in both the screened and unscreened pools. Deep learning significantly improves precision over SVM: 0.96 vs. 0.92 in the screened pool and 0.90 vs. 0.81 in the unscreened pool. Integrating various sampling schemes and deep learning algorithms into AL significantly improves DDI IR analysis from the literature. Random negative sampling and positive sampling are highly effective methods for improving AL analysis when positive and negative samples are extremely imbalanced.
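The evaluation criterion named in the abstract (precision at a prespecified recall) and the uncertainty-sampling step can be illustrated with a minimal sketch. This is not the authors' implementation. The function names `precision_at_recall` and `most_uncertain`, the assumption that the classifier returns a score in [0, 1] for each abstract, and the toy data are all hypothetical and chosen only for illustration.

```python
def precision_at_recall(scores, labels, target_recall=0.95):
    """Precision at the smallest top-k cut-off whose recall reaches target_recall.

    scores: classifier scores, higher means more likely DDI-relevant (assumption).
    labels: 1 for a positive (DDI-relevant) abstract, 0 for a negative one.
    """
    if not 0.0 < target_recall <= 1.0:
        raise ValueError("target_recall must be in (0, 1]")
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("at least one positive label is required")

    # Rank abstracts from highest to lowest score (ties keep input order).
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    true_pos = 0
    for k, (_, label) in enumerate(ranked, start=1):
        true_pos += label
        if true_pos / total_pos >= target_recall:
            # Precision among the top-k retrieved abstracts.
            return true_pos / k


def most_uncertain(scores, batch_size):
    """Indices of the batch_size items whose score is closest to 0.5.

    This is a generic form of uncertainty sampling; the paper's exact
    uncertainty measure is not given in the abstract.
    """
    order = sorted(range(len(scores)), key=lambda i: abs(scores[i] - 0.5))
    return order[:batch_size]


# Toy data (hypothetical): six abstracts, four of them positive.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
toy_labels = [1, 1, 0, 1, 0, 1]
print(precision_at_recall(toy_scores, toy_labels, target_recall=0.75))
print(most_uncertain([0.95, 0.52, 0.10, 0.40, 0.70], batch_size=2))
```

In an AL loop of the kind the abstract describes, the items returned by `most_uncertain` would be sent for labeling and added to the training set before the classifier is refit. The other sampling schemes (similarity, random negative, positive) would contribute additional items to each batch.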
Identifiers
pubmed: 37248476
doi: 10.1186/s13326-023-00287-7
pii: 10.1186/s13326-023-00287-7
pmc: PMC10228061
Publication types
Journal Article
Research Support, N.I.H., Extramural
Languages
eng
Citation subsets
IM
Pagination
5
Grants
Agency: NICHD NIH HHS
ID: P30 HD106451
Country: United States
Agency: NIDA NIH HHS
ID: R01 DA048001
Country: United States
Agency: NCI NIH HHS
ID: U01 CA248240
Country: United States
Agency: NLM NIH HHS
ID: R01 LM011945
Country: United States
Copyright information
© 2023. The Author(s).