QSAR modelling of a large imbalanced aryl hydrocarbon activation dataset by rational and random sampling and screening of 80,086 REACH pre-registered and/or registered substances.


Journal

PloS one
ISSN: 1932-6203
Abbreviated title: PLoS One
Country: United States
NLM ID: 101285081

Publication information

Publication date:
2019
History:
received: 2018-09-26
accepted: 2019-03-01
entrez: 2019-03-15
pubmed: 2019-03-15
medline: 2019-12-18
Status: epublish

Abstract

The aryl hydrocarbon receptor (AhR) plays important roles in many physiological and pathological processes, including endocrine homeostasis, foetal development, cell cycle regulation, cellular oxidation/antioxidation, immune regulation, metabolism of endogenous and exogenous substances, and carcinogenesis. An experimental data set for human in vitro AhR activation comprising 324,858 substances, of which 1,982 were confirmed actives, was used to test an in-house-developed approach for rationally selecting Quantitative Structure-Activity Relationship (QSAR) training set substances from an unbalanced data set. In the first iteration, active and inactive substances were selected at random to build QSAR models. Then, in two further iterations, more inactive substances were added to the training set based on incorrect or out-of-domain predictions, producing larger models. The resulting 'rational' model, comprising 832 actives and four times as many inactives, i.e. 3,328, was compared to a model with a training set of the same size and proportion, with inactives chosen entirely at random. Both models underwent robust cross-validation and external validation, showing good statistical performance: the rational model had an external validation sensitivity of 85.1% and specificity of 97.1%, compared to a sensitivity of 89.1% and specificity of 91.3% for the random model. Furthermore, we integrated the training sets for both models with the 93 external validation test set actives and 372 randomly selected inactives to make two final models. These also underwent cross-validation and external validation for specificity, which confirmed that good predictivity was maintained. All developed models were applied to predict 80,086 EU REACH substances. The rational and random final models had 63.1% and 56.9% coverage of the REACH set, respectively, and predicted 1,256 and 3,214 substances as actives.
The final models as well as predictions for AhR activation for 650,000 substances will be published in the Danish (Q)SAR Database and can, for example, be used for priority setting, in read-across predictions and in weight-of-evidence assessments of chemicals.
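The iterative selection described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `train` callback, the per-iteration inactive-to-active ratios `(1, 2, 4)` and the fallback to correctly predicted inactives are all assumptions made for illustration. Only the final 4:1 ratio, the random first iteration and the use of incorrect or out-of-domain predictions come from the abstract.

```python
import random
from typing import Callable, Hashable, List, Optional, Sequence, Tuple

# A prediction is True (active), False (inactive) or None (out of domain).
Prediction = Optional[bool]
Model = Callable[[Hashable], Prediction]
Trainer = Callable[[Sequence[Hashable], Sequence[Hashable]], Model]


def rational_selection(
    actives: Sequence[Hashable],
    inactive_pool: Sequence[Hashable],
    train: Trainer,
    ratios: Tuple[int, ...] = (1, 2, 4),  # assumed inactive:active ratio per iteration
    seed: int = 0,
) -> Tuple[List[Hashable], Model]:
    """Grow the inactive part of a training set over several iterations.

    Hypothetical sketch: iteration 1 draws inactives at random; later
    iterations prefer pool inactives that the current model predicts as
    active (incorrect) or cannot predict (out of domain).
    """
    rng = random.Random(seed)
    pool = list(inactive_pool)
    rng.shuffle(pool)

    n_first = ratios[0] * len(actives)
    chosen, pool = pool[:n_first], pool[n_first:]
    model = train(actives, chosen)

    for ratio in ratios[1:]:
        needed = ratio * len(actives) - len(chosen)
        hard = [s for s in pool if model(s) is not False]  # wrong or out of domain
        easy = [s for s in pool if model(s) is False]      # already predicted correctly
        picked = (hard + easy)[:needed]  # assumed fallback if too few hard cases
        picked_set = set(picked)
        chosen = chosen + picked
        pool = [s for s in pool if s not in picked_set]
        model = train(actives, chosen)

    return chosen, model


def sensitivity_specificity(
    truth: Sequence[bool], predicted: Sequence[Prediction]
) -> Tuple[float, float]:
    """Sensitivity and specificity over in-domain predictions only.

    Assumes at least one in-domain active and one in-domain inactive.
    """
    pairs = [(t, p) for t, p in zip(truth, predicted) if p is not None]
    tp = sum(1 for t, p in pairs if t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    return tp / (tp + fn), tn / (tn + fp)
```

With 832 actives and a final ratio of 4, this sketch would stop at 3,328 inactives, matching the training set size reported above. Out-of-domain predictions are excluded from the statistics here, which is one common convention and not necessarily the one used in the study.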

Identifiers

pubmed: 30870500
doi: 10.1371/journal.pone.0213848
pii: PONE-D-18-28085
pmc: PMC6417725

Chemical substances

Hydrocarbons, Aromatic 0
Receptors, Aryl Hydrocarbon 0

Publication types

Journal Article
Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Pagination

e0213848

Conflict of interest statement

The authors have declared that no competing interests exist.

Authors

Kyrylo Klimenko (K)

Division of Diet, Disease Prevention and Toxicology, National Food Institute, Technical University of Denmark, Kongens Lyngby, Denmark.

Sine A Rosenberg (SA)

Division of Diet, Disease Prevention and Toxicology, National Food Institute, Technical University of Denmark, Kongens Lyngby, Denmark.

Marianne Dybdahl (M)

Division of Diet, Disease Prevention and Toxicology, National Food Institute, Technical University of Denmark, Kongens Lyngby, Denmark.

Eva B Wedebye (EB)

Division of Diet, Disease Prevention and Toxicology, National Food Institute, Technical University of Denmark, Kongens Lyngby, Denmark.

Nikolai G Nikolov (NG)

Division of Diet, Disease Prevention and Toxicology, National Food Institute, Technical University of Denmark, Kongens Lyngby, Denmark.
