NLM-Chem, a new resource for chemical entity recognition in PubMed full text literature.
Journal
Scientific data
ISSN: 2052-4463
Abbreviated title: Sci Data
Country: England
NLM ID: 101640192
Publication information
Publication date: 2021-03-25
History:
received: 2020-11-26
accepted: 2021-01-19
entrez: 2021-03-26
pubmed: 2021-03-27
medline: 2021-05-28
Status: epublish
Abstract
Automatically identifying chemical and drug names in scientific publications advances information access for this important class of entities in a variety of biomedical disciplines by enabling improved retrieval and linkage to related concepts. While current methods for tagging chemical entities were developed for the article title and abstract, their performance in the full article text is substantially lower. However, the full text frequently contains more detailed chemical information, such as the properties of chemical compounds, their biological effects and interactions with diseases, genes and other chemicals. We therefore present the NLM-Chem corpus, a full-text resource to support the development and evaluation of automated chemical entity taggers. The NLM-Chem corpus consists of 150 full-text articles, doubly annotated by ten expert NLM indexers, with ~5000 unique chemical name annotations, mapped to ~2000 MeSH identifiers. We also describe a substantially improved chemical entity tagger, with automated annotations for all of PubMed and PMC freely accessible through the PubTator web-based interface and API. The NLM-Chem corpus is freely available.
Identifiers
pubmed: 33767203
doi: 10.1038/s41597-021-00875-1
pii: 10.1038/s41597-021-00875-1
pmc: PMC7994842
Chemical substances
Organic Chemicals (registry number: 0)
Pharmaceutical Preparations (registry number: 0)
Publication types
Journal Article
Research Support, N.I.H., Intramural
Languages
eng
Citation subsets
IM
Pagination
91
Grants
Agency: NLM NIH HHS
ID: R00 LM013001
Country: United States
Agency: U.S. Department of Health & Human Services | NIH | U.S. National Library of Medicine (NLM)
ID: Intramural Research Program