Overview of the BioCreative VI Precision Medicine Track: mining protein interactions and mutations for precision medicine.


Journal

Database : the journal of biological databases and curation
ISSN: 1758-0463
Abbreviated title: Database (Oxford)
Country: England
NLM ID: 101517697

Publication information

Publication date:
2019-01-01
History:
received: 2018-03-02
accepted: 2018-12-19
entrez: 2019-01-29
pubmed: 2019-01-29
medline: 2019-06-05
Status: epublish

Abstract

The Precision Medicine Initiative is a multicenter effort that aims to formulate personalized treatments by leveraging individual patient data (clinical, genome sequence and functional genomic data) together with the information in large knowledge bases (KBs) that integrate genome annotation, disease association studies, electronic health records and other data types. The biomedical literature provides a rich foundation for populating these KBs, reporting genetic and molecular interactions that provide the scaffold for the cellular regulatory systems and detailing the influence of genetic variants in these interactions. The goal of the BioCreative VI Precision Medicine Track was to extract this particular type of information. The track was organized into two tasks: (i) a document triage task, focused on identifying scientific literature containing experimentally verified protein-protein interactions (PPIs) affected by genetic mutations, and (ii) a relation extraction task, focused on extracting the affected interactions (protein pairs). To assist system developers and task participants, a large-scale corpus of PubMed documents was manually annotated for this task. Ten teams worldwide contributed 22 distinct text-mining models for the document triage task, and six teams worldwide contributed 14 different text-mining systems for the relation extraction task. When comparing the text-mining system predictions with human annotations, for the triage task, the best F-score was 69.06%, the best precision was 62.89%, the best recall was 98.0% and the best average precision was 72.5%. For the relation extraction task, when taking homologous genes into account, the best F-score was 37.73%, the best precision was 46.5% and the best recall was 54.1%. Submitted systems explored a wide range of methods, from traditional rule-based, statistical and machine learning systems to state-of-the-art deep learning methods.
Given the level of participation and the individual team results, we find the precision medicine track to be successful in engaging the text-mining research community. The track also produced a manually annotated corpus of 5509 PubMed documents, developed by BioGRID curators and relevant for precision medicine. The data set is freely available to the community, and the specific interactions have been integrated into the BioGRID data set. In addition, this challenge provided the first results of automatically identifying PubMed articles that describe PPIs affected by mutations, as well as extracting the affected relations from those articles. Still, much progress is needed for computer-assisted precision medicine text mining to become mainstream. Future work should focus on addressing the remaining technical challenges and incorporating the practical benefits of text-mining tools into real-world precision medicine information-related curation.

Identifiers

pubmed: 30689846
pii: 5303240
doi: 10.1093/database/bay147
pmc: PMC6348314

Publication types

Journal Article; Research Support, N.I.H., Extramural; Research Support, N.I.H., Intramural

Languages

eng

Citation subsets

IM

Grants

Agency: NCI NIH HHS
ID: P30 CA177558
Country: United States
Agency: NIGMS NIH HHS
ID: R01 GM126558
Country: United States
Agency: NIH HHS
ID: R24 OD011194
Country: United States
Agency: NIH HHS
ID: R01 OD010929
Country: United States

Authors

Rezarta Islamaj Dogan (R)

National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA.

Sun Kim (S)

National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA.

Andrew Chatr-Aryamontri (A)

Institute for Research in Immunology and Cancer, Université de Montréal, Montréal, Canada.

Chih-Hsuan Wei (CH)

National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA.

Donald C Comeau (DC)

National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA.

Rui Antunes (R)

Department of Electronics, Telecommunications and Informatics (DETI)/Institute of Electronics and Informatics Engineering of Aveiro (IEETA), University of Aveiro, Aveiro, Portugal.

Sérgio Matos (S)

Department of Electronics, Telecommunications and Informatics (DETI)/Institute of Electronics and Informatics Engineering of Aveiro (IEETA), University of Aveiro, Aveiro, Portugal.

Qingyu Chen (Q)

School of Computing and Information Systems, The University of Melbourne, Melbourne, VIC, Australia.

Aparna Elangovan (A)

School of Computing and Information Systems, The University of Melbourne, Melbourne, VIC, Australia.

Nagesh C Panyam (NC)

School of Computing and Information Systems, The University of Melbourne, Melbourne, VIC, Australia.

Karin Verspoor (K)

School of Computing and Information Systems, The University of Melbourne, Melbourne, VIC, Australia.

Hongfang Liu (H)

Department of Health Science Research, Mayo Clinic, Rochester, MN, USA.

Yanshan Wang (Y)

Department of Health Science Research, Mayo Clinic, Rochester, MN, USA.

Zhuang Liu (Z)

School of Computer Science and Technology, Dalian University of Technology, Dalian, China.

Berna Altinel (B)

Department of Computer Engineering, Marmara University, Istanbul, Turkey.

Zehra Melce Hüsünbeyi (ZM)

Department of Computer Engineering, Bogaziçi University, Istanbul, Turkey.

Aris Fergadis (A)

School of Electrical and Computer Engineering, National Technical University of Athens, Zografou, Athens, Greece.

Chen-Kai Wang (CK)

Graduate Institute of Biomedical Informatics, Taipei Medical University, Taipei, Taiwan.

Hong-Jie Dai (HJ)

Department of Electrical Engineering, National Kaohsiung University of Science and Technology, Kaohsiung, Taiwan.

Tung Tran (T)

Department of Computer Science, University of Kentucky, Lexington, KY, USA.

Ramakanth Kavuluru (R)

Division of Biomedical Informatics, Department of Internal Medicine, University of Kentucky, Lexington, KY, USA.

Ling Luo (L)

College of Computer Science and Technology, Dalian University of Technology, Dalian, China.

Albert Steppi (A)

Department of Statistics, Florida State University, Florida, USA.

Jinfeng Zhang (J)

Department of Statistics, Florida State University, Florida, USA.

Jinchan Qu (J)

Department of Statistics, Florida State University, Florida, USA.

Zhiyong Lu (Z)

National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA.
