Identification of the Core Chemical Structure in SureChEMBL Patents.


Journal

Journal of chemical information and modeling
ISSN: 1549-960X
Abbreviated title: J Chem Inf Model
Country: United States
NLM ID: 101230060

Publication information

Publication date:
24 May 2021
History:
pubmed: 1 May 2021
medline: 29 June 2021
entrez: 30 April 2021
Status: ppublish

Abstract

The SureChEMBL database provides open access to 17 million chemical entities mentioned in 14 million patents published since 1970. However, alongside molecules covered by patent claims, the database contains many starting materials and intermediate products of little pharmacological relevance. Herein, we introduce a new filtering protocol to automatically select the core chemical structures best representing a congeneric series of pharmacologically relevant molecules in patents. The protocol is first validated against a selection of 890 SureChEMBL patents for which a total of 51,738 manually curated molecules are deposited in ChEMBL. Our protocol was able to select 92.5% of the molecules in ChEMBL from all 270,968 molecules in SureChEMBL for those patents. Subsequently, the protocol was applied to all 240,988 US pharmacological patents for which 9,111,706 molecules are available in SureChEMBL. The unsupervised filtering process selected 5,949,214 molecules (65.3% of the total number of molecules) that form highly congeneric chemical series in 188,795 of those patents (78.3% of the total number of patents). A SureChEMBL version enriched with molecules of pharmacological relevance is available for download at https://ftp.ebi.ac.uk/pub/databases/chembl/SureChEMBLccs.
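
As a quick consistency check on the figures quoted in the abstract, the two reported percentages can be recomputed from the raw counts. The sketch below is illustrative only: the variable names and the `percent` helper are assumptions introduced here, not part of the authors' protocol, and the code does not reproduce the filtering itself.

```python
# Counts as quoted in the abstract (US pharmacological patents in SureChEMBL).
total_molecules = 9_111_706      # molecules available for those patents
selected_molecules = 5_949_214   # molecules kept by the filtering protocol
total_patents = 240_988          # patents the protocol was applied to
patents_with_series = 188_795    # patents with a selected congeneric series


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place.

    Hypothetical helper for this check; not taken from the paper.
    """
    return round(100 * part / whole, 1)


molecule_share = percent(selected_molecules, total_molecules)
patent_share = percent(patents_with_series, total_patents)

print(f"Molecules selected: {molecule_share}%")   # abstract reports 65.3%
print(f"Patents covered:    {patent_share}%")     # abstract reports 78.3%
```

Both recomputed values agree with the percentages stated in the abstract.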

Identifiers

pubmed: 33929850
doi: 10.1021/acs.jcim.1c00151

Publication types

Journal Article; Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Pagination

2241-2247

Authors

Maria J Falaguera (MJ)

Research Group on Systems Pharmacology, Research Program on Biomedical Informatics (GRIB), IMIM Hospital del Mar Medical Research Institute and University Pompeu Fabra, Parc de Recerca Biomèdica (PRBB), Doctor Aiguader 88, 08003 Barcelona, Catalonia, Spain.

Jordi Mestres (J)

Research Group on Systems Pharmacology, Research Program on Biomedical Informatics (GRIB), IMIM Hospital del Mar Medical Research Institute and University Pompeu Fabra, Parc de Recerca Biomèdica (PRBB), Doctor Aiguader 88, 08003 Barcelona, Catalonia, Spain.
