Chemical identification and indexing in full-text articles: an overview of the NLM-Chem track at BioCreative VII.

United States; Humans; COVID-19; National Library of Medicine (U.S.); Data Mining; Databases, Factual; MEDLINE

Journal

Database: the journal of biological databases and curation

ISSN: 1758-0463

Abbreviated title: Database (Oxford)

Country: England

NLM ID: 101517697

Publication information

Publication date:
2023-03-07

History:

received: 2022-08-19

revised: 2023-01-06

accepted: 2023-02-15

entrez: 2023-03-07

pubmed: 2023-03-08

medline: 2023-03-10

Status: ppublish

Abstract

The BioCreative National Library of Medicine (NLM)-Chem track calls for a community effort to fine-tune automated recognition of chemical names in the biomedical literature. Chemicals are one of the most searched biomedical entities in PubMed, and, as highlighted during the coronavirus disease 2019 pandemic, their identification may significantly advance research in multiple biomedical subfields. While previous community challenges focused on identifying chemical names mentioned in titles and abstracts, the full text contains valuable additional detail. We therefore organized the BioCreative NLM-Chem track as a community effort to address automated chemical entity recognition in full-text articles. The track consisted of two tasks: (i) chemical identification and (ii) chemical indexing. The chemical identification task required predicting all chemicals mentioned in recently published full-text articles, both span [i.e. named entity recognition (NER)] and normalization (i.e. entity linking), using Medical Subject Headings (MeSH). The chemical indexing task required identifying which chemicals reflect topics for each article and should therefore appear in the listing of MeSH terms for the document in the MEDLINE article indexing. This manuscript summarizes the BioCreative NLM-Chem track and post-challenge experiments. We received a total of 85 submissions from 17 teams worldwide. The highest performance achieved for the chemical identification task was 0.8672 F-score (0.8759 precision and 0.8587 recall) for strict NER performance and 0.8136 F-score (0.8621 precision and 0.7702 recall) for strict normalization performance. The highest performance achieved for the chemical indexing task was 0.6073 F-score (0.7417 precision and 0.5141 recall).
This community challenge demonstrated that (i) the current substantial achievements in deep learning technologies can be utilized to improve automated prediction accuracy further and (ii) the chemical indexing task is substantially more challenging. We look forward to further developing biomedical text-mining methods to respond to the rapid growth of biomedical literature. The NLM-Chem track dataset and other challenge materials are publicly available at https://ftp.ncbi.nlm.nih.gov/pub/lu/BC7-NLM-Chem-track/. Database URL: https://ftp.ncbi.nlm.nih.gov/pub/lu/BC7-NLM-Chem-track/
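
The F-scores quoted in the abstract can be checked against the reported precision and recall. The sketch below assumes the F-score is the balanced F1 measure, i.e. the harmonic mean of precision and recall; the abstract does not state this explicitly. The function name `f_score` and the `REPORTED` table are illustrative names, not part of the challenge materials.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-score (F1): harmonic mean of precision and recall.

    Assumption: the track reports F1; the abstract only says "F-score".
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F-score), values copied from the abstract.
REPORTED = {
    "identification, strict NER": (0.8759, 0.8587, 0.8672),
    "identification, strict normalization": (0.8621, 0.7702, 0.8136),
    "indexing": (0.7417, 0.5141, 0.6073),
}

for task, (p, r, reported) in REPORTED.items():
    computed = f_score(p, r)
    print(f"{task}: computed {computed:.4f}, reported {reported:.4f}")
```

Under that assumption, each computed value agrees with the reported figure to four decimal places. The gap between precision and recall is largest for the indexing task, which is what pulls its F-score down.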

Identifiers

DOI: 10.1093/database/baad005 PMID: 36882099 PMC: PMC9991492

pubmed: 36882099

pii: 7071696

doi: 10.1093/database/baad005

pmc: PMC9991492

Publication types

Journal Article; Research Support, N.I.H., Intramural; Research Support, Non-U.S. Gov't

Languages

eng

Copyright information

Published by Oxford University Press 2023. This work is written by (a) US Government employee(s) and is in the public domain in the US.

Authors

Robert Leaman (R)

Rezarta Islamaj (R)

Virginia Adams (V)

Mohammed A Alliheedi (MA)

João Rafael Almeida (JR)

Rui Antunes (R)

Robert Bevan (R)

Yung-Chun Chang (YC)

Arslan Erdengasileng (A)

Matthew Hodgskiss (M)

Ryuki Ida (R)

Hyunjae Kim (H)

Keqiao Li (K)

Robert E Mercer (RE)

Lukrécia Mertová (L)

Ghadeer Mobasher (G)

Hoo-Chang Shin (HC)

Mujeen Sung (M)

Tomoki Tsujimura (T)

Wen-Chao Yeh (WC)

Zhiyong Lu (Z)
