Methodology of aiQSAR: a group-specific approach to QSAR modelling.

Keywords: Acute oral toxicity; Ames test; Applicability domain measure; BCF; Local models; Machine learning; QSAR

Journal

Journal of cheminformatics
ISSN: 1758-2946
Abbreviated title: J Cheminform
Country: England
NLM ID: 101516718

Publication information

Publication date:
03 Apr 2019
History:
received: 08 Jan 2019
accepted: 25 Mar 2019
entrez: 05 Apr 2019
pubmed: 05 Apr 2019
medline: 05 Apr 2019
Status: epublish

Abstract

BACKGROUND
Several QSAR methodology developments have shown promise in recent years. These include the consensus approach to generating a model's final prediction, the use of new, advanced machine learning algorithms, and the streamlining, standardization and automation of various QSAR steps. One approach that seems under-explored is the generation, at runtime, of local models specific to individual compounds. This approach was probably limited by its computational requirements, but with current increases in processing power and the widespread availability of cluster-computing infrastructure, this limitation is far less severe.
RESULTS
We propose a new QSAR methodology, aiQSAR, which generates endpoint predictions directly from the input dataset by building, at runtime, an array of local models specific to each compound in the dataset. The local group of each compound is selected on the basis of fingerprint similarities, and the final prediction is calculated by integrating the results of a number of autonomous mathematical models. The method is applicable to regression, binary classification and multi-class classification, and was tested on one dataset for each endpoint type: bioconcentration factor (BCF) for regression, the Ames test for binary classification, and the Environmental Protection Agency (EPA) acute rat oral toxicity ranking for multi-class classification. As part of the method, the applicability domain of each prediction is assessed through the applicability domain measure, which is calculated from the fingerprint similarities within each local group of compounds.
CONCLUSIONS
We outline the methodology for a new QSAR-based predictive tool whose advantages are automation, a group-specific approach to modelling and simplicity of execution. Our next aim is to develop this method into a stand-alone software tool. We hope that eventual adoption of the tool will make QSAR modelling more accessible and transparent. The methodology could be used as an initial modelling step, predicting new compounds simply by loading the training dataset as input. Predictions could then be evaluated and refined further, either with other tools or by optimizing aiQSAR parameters.
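The core idea described in the abstract (select a local group for each compound by fingerprint similarity, combine several independent models into one prediction, and derive an applicability domain measure from the group's similarities) can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the aiQSAR implementation: fingerprints are represented as sets of "on" bit indices, similarity is the Tanimoto coefficient, and the group size `k`, the three stand-in models (similarity-weighted mean, mean and median of the neighbours' values) and the use of the group's mean similarity as the applicability domain measure are all hypothetical choices made for illustration. Only the regression case is shown.

```python
from statistics import mean, median


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def local_group(query, training, k):
    """Return the k most similar training compounds as (similarity, value) pairs."""
    scored = [(tanimoto(query, fp), value) for fp, value in training]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Hypothetical stand-in "models": each maps a local group to a numeric prediction.
def weighted_mean_model(group):
    total = sum(sim for sim, _ in group)
    if total == 0:
        # No bit overlap with any neighbour: fall back to an unweighted mean.
        return mean(value for _, value in group)
    return sum(sim * value for sim, value in group) / total


def mean_model(group):
    return mean(value for _, value in group)


def median_model(group):
    return median(value for _, value in group)


MODELS = (weighted_mean_model, mean_model, median_model)


def predict(query, training, k=5):
    """Consensus regression prediction plus an applicability-domain score for one query."""
    group = local_group(query, training, k)  # local group is rebuilt for every query
    consensus = mean(model(group) for model in MODELS)  # average of the model outputs
    adm = mean(sim for sim, _ in group)  # assumed ADM: mean similarity to the local group
    return consensus, adm


if __name__ == "__main__":
    training = [
        (frozenset({1, 2, 3, 4}), 2.0),
        (frozenset({1, 2, 3}), 2.5),
        (frozenset({1, 2, 5}), 3.0),
        (frozenset({7, 8, 9}), 5.0),
    ]
    consensus, adm = predict(frozenset({1, 2, 3}), training, k=3)
    print(round(consensus, 3), adm)  # → 2.481 0.75
```

Because the local group is recomputed for every query, each prediction requires a similarity pass over the whole training set plus one fit per model, which is the kind of runtime cost the abstract identifies as the historical limitation of local modelling. In practice the stand-in models would be replaced by trained machine learning learners, and a classification endpoint would combine class votes instead of averaging numbers.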


Identifiers

pubmed: 30945010
doi: 10.1186/s13321-019-0350-y
pii: 10.1186/s13321-019-0350-y
pmc: PMC6446381

Publication types

Journal Article

Languages

eng

Pagination

27

Grants

Agency: H2020 Marie Skłodowska-Curie Actions
ID: MSCA-ITN-2016
Agency: H2020 Marie Skłodowska-Curie Actions
ID: 721975


Authors

Kristijan Vukovic

Istituto di Ricerche Farmacologiche Mario Negri-IRCCS, Via Mario Negri 2, 20156, Milan, Italy. kristijan.vukovic@marionegri.it.
Jozef Stefan International Postgraduate School, Jamova cesta 39, 1000, Ljubljana, Slovenia.

Domenico Gadaleta

Istituto di Ricerche Farmacologiche Mario Negri-IRCCS, Via Mario Negri 2, 20156, Milan, Italy.

Emilio Benfenati

Istituto di Ricerche Farmacologiche Mario Negri-IRCCS, Via Mario Negri 2, 20156, Milan, Italy.

MeSH classifications