Multi-label classification and label dependence in in silico toxicity prediction.

Biological Assay; Decision Trees; Hazardous Substances / classification; Logistic Models; Machine Learning; Models, Theoretical; Toxicity Tests

Label dependence; Multi-label classification; Tox21; Toxicity prediction

Journal

Toxicology in vitro : an international journal published in association with BIBRA

ISSN: 1879-3177

Abbreviated title: Toxicol In Vitro

Country: England

NLM ID: 8712158

Publication information

Publication date:
Aug 2021

History:

received: 30 Jan 2020

revised: 04 Dec 2020

accepted: 01 Apr 2021

pubmed: 12 Apr 2021

medline: 19 Nov 2021

entrez: 11 Apr 2021

Status: ppublish

Abstract

Most computational predictive models are trained for a single toxicity endpoint and cannot learn dependencies between endpoints, such as those targeting similar biological pathways. In this study, we compare the performance of three multi-label classification (MLC) models, namely Classifier Chains (CC), Label Powersets (LP) and Stacking (SBR), against independent classifiers (Binary Relevance) on Tox21 challenge data. We also develop a novel label dependence measure that shows a full range of values, even at low prior probabilities, for the purpose of data-driven label partitioning. Using Logistic Regression as the base classifier and random label partitioning (k = 3), CC show statistically significant improvements in model performance on Hamming and multi-label accuracy scores (p < 0.05), while SBR show significant improvements in multi-label accuracy scores. The weights in the Logistic Regression and Stacking models are positively associated with label dependencies, suggesting that learning label dependence is a key contributor to improving model performance. This original quantitative measure of label dependency is combined with the Louvain community detection method to learn label partitioning through a data-driven process. The resulting MLCs with learned label partitioning were generally found to be non-inferior to their counterparts with random or no label partitioning. Additionally, using the Random Forest classifier in a Stacking model with 10-fold stratified cross-validation, we find that the top-performing stacking model outperforms the corresponding base model in 11 out of 12 Tox21 labels. Taken together, these results suggest that MLC models could potentially boost the performance of current single-endpoint predictive models and that learned label partitioning may be used in place of random label partitioning.
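The abstract names two evaluation scores (Hamming and multi-label accuracy), random label partitioning with k = 3, and the Label Powerset transformation, but does not spell out their definitions. The following is a minimal sketch under commonly used definitions, which are assumptions here: the Hamming score as the fraction of correctly predicted label entries, and multi-label accuracy as the mean per-sample Jaccard overlap. All function names and the toy label matrix are hypothetical and are not taken from the paper or from Tox21.

```python
import random

def hamming_score(y_true, y_pred):
    """Fraction of individual (sample, label) entries predicted correctly."""
    total = sum(len(row) for row in y_true)
    correct = sum(t == p for rt, rp in zip(y_true, y_pred) for t, p in zip(rt, rp))
    return correct / total

def multilabel_accuracy(y_true, y_pred):
    """Mean per-sample Jaccard overlap of true and predicted label sets.

    Assumption: a sample with no true and no predicted labels scores 1.0.
    """
    scores = []
    for rt, rp in zip(y_true, y_pred):
        inter = sum(1 for t, p in zip(rt, rp) if t and p)
        union = sum(1 for t, p in zip(rt, rp) if t or p)
        scores.append(1.0 if union == 0 else inter / union)
    return sum(scores) / len(scores)

def random_label_partition(n_labels, k, seed=0):
    """Shuffle label indices and cut them into disjoint groups of size k."""
    idx = list(range(n_labels))
    random.Random(seed).shuffle(idx)
    return [sorted(idx[i:i + k]) for i in range(0, n_labels, k)]

def label_powerset_encode(y, group):
    """Map each distinct label combination within `group` to one class id."""
    classes = {}
    encoded = []
    for row in y:
        combo = tuple(row[j] for j in group)
        encoded.append(classes.setdefault(combo, len(classes)))
    return encoded, classes

# Toy data: 4 compounds x 3 labels (hypothetical values, not Tox21 data).
y_true = [[1, 0, 1], [0, 0, 0], [1, 1, 0], [0, 1, 1]]
y_pred = [[1, 0, 0], [0, 0, 0], [1, 1, 0], [1, 1, 1]]

hs = hamming_score(y_true, y_pred)          # 10 of 12 entries match
acc = multilabel_accuracy(y_true, y_pred)   # mean of 1/2, 1, 1, 2/3
groups = random_label_partition(12, 3)      # 12 labels split into 4 groups of 3
lp_ids, lp_classes = label_powerset_encode(y_true, [0, 1, 2])
```

In this sketch, each group returned by `random_label_partition` would be handed to its own multi-label model. A Label Powerset model would then train one multi-class classifier per group on the ids produced by `label_powerset_encode`. The data-driven partitioning described in the abstract would replace the random shuffle with communities detected on a label dependence graph, which is not reproduced here.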

Identifiers

DOI: 10.1016/j.tiv.2021.105157 PMID: 33839234

pubmed: 33839234

pii: S0887-2333(21)00082-5

doi: 10.1016/j.tiv.2021.105157

Chemical substances

Hazardous Substances 0

Publication types

Journal Article

Languages

eng

Pagination

105157

Authors

Xiu Huan Yap (XH)

Michael Raymer (M)
