NLM-Chem, a new resource for chemical entity recognition in PubMed full text literature.


Journal

Scientific data
ISSN: 2052-4463
Abbreviated title: Sci Data
Country: England
NLM ID: 101640192

Publication information

Publication date:
25 03 2021
History:
received: 26 11 2020
accepted: 19 01 2021
entrez: 26 3 2021
pubmed: 27 3 2021
medline: 28 5 2021
Status: epublish

Abstract

Automatically identifying chemical and drug names in scientific publications advances information access for this important class of entities in a variety of biomedical disciplines by enabling improved retrieval and linkage to related concepts. While current methods for tagging chemical entities were developed for the article title and abstract, their performance in the full article text is substantially lower. However, the full text frequently contains more detailed chemical information, such as the properties of chemical compounds, their biological effects and interactions with diseases, genes and other chemicals. We therefore present the NLM-Chem corpus, a full-text resource to support the development and evaluation of automated chemical entity taggers. The NLM-Chem corpus consists of 150 full-text articles, doubly annotated by ten expert NLM indexers, with ~5000 unique chemical name annotations, mapped to ~2000 MeSH identifiers. We also describe a substantially improved chemical entity tagger, with automated annotations for all of PubMed and PMC freely accessible through the PubTator web-based interface and API. The NLM-Chem corpus is freely available.

Identifiers

pubmed: 33767203
doi: 10.1038/s41597-021-00875-1
pii: 10.1038/s41597-021-00875-1
pmc: PMC7994842

Chemical substances

Organic Chemicals 0
Pharmaceutical Preparations 0

Publication types

Journal Article
Research Support, N.I.H., Intramural

Languages

eng

Citation subsets

IM

Pagination

91

Grants

Agency: NLM NIH HHS
ID: R00 LM013001
Country: United States
Agency: U.S. Department of Health & Human Services | NIH | U.S. National Library of Medicine (NLM)
ID: Intramural Research Program


Authors

Rezarta Islamaj (R)

National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA.

Robert Leaman (R)

National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA.

Sun Kim (S)

National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA.

Dongseop Kwon (D)

National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA.

Chih-Hsuan Wei (CH)

National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA.

Donald C Comeau (DC)

National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA.

Yifan Peng (Y)

National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA.

David Cissel (D)

National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA.

Cathleen Coss (C)

National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA.

Carol Fisher (C)

National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA.

Rob Guzman (R)

National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA.

Preeti Gokal Kochar (PG)

National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA.

Stella Koppel (S)

National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA.

Dorothy Trinh (D)

National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA.

Keiko Sekiya (K)

National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA.

Janice Ward (J)

National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA.

Deborah Whitman (D)

National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA.

Susan Schmidt (S)

National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA.

Zhiyong Lu (Z)

National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA. Zhiyong.Lu@nih.gov.
