Multi-label classification and label dependence in in silico toxicity prediction.


Journal

Toxicology in vitro : an international journal published in association with BIBRA
ISSN: 1879-3177
Abbreviated title: Toxicol In Vitro
Country: England
NLM ID: 8712158

Publication information

Publication date:
Aug 2021
History:
received: 30 01 2020
revised: 04 12 2020
accepted: 01 04 2021
pubmed: 12 4 2021
medline: 19 11 2021
entrez: 11 4 2021
Status: ppublish

Abstract

Most computational predictive models are trained for a single toxicity endpoint and cannot learn dependencies between endpoints, such as those targeting similar biological pathways. In this study, we compare the performance of three multi-label classification (MLC) models, namely Classifier Chains (CC), Label Powersets (LP) and Stacking (SBR), against independent classifiers (Binary Relevance) on Tox21 challenge data. We also develop a novel label dependence measure that spans its full range of values, even at low prior probabilities, for the purpose of data-driven label partitioning. Using Logistic Regression as the base classifier and random label partitioning (k = 3), CC show statistically significant improvements in model performance on Hamming and multi-label accuracy scores (p < 0.05), while SBR show significant improvements in multi-label accuracy scores. The weights in the Logistic Regression and Stacking models are positively associated with label dependencies, suggesting that learning label dependence is a key contributor to improving model performance. An original quantitative measure of label dependency is combined with the Louvain community detection method to learn label partitioning through a data-driven process. The resulting MLCs with learned label partitioning were generally found to be non-inferior to their counterparts with random or no label partitioning. Additionally, using the Random Forest classifier in a Stacking model with 10-fold stratified cross-validation, we find that the top-performing stacking model outperforms the corresponding base model in 11 out of 12 Tox21 labels. Taken together, these results suggest that MLC models could boost the performance of current single-endpoint predictive models and that learned label partitioning may be used in place of random label partitioning.
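The abstract reports results as Hamming and multi-label accuracy scores without giving formulas. The sketch below assumes the commonly used definitions: the Hamming score as the fraction of individual (sample, label) cells predicted correctly, and multi-label accuracy as the mean per-sample Jaccard overlap between true and predicted label sets. The function names, the handling of samples with no labels, and the toy data are illustrative assumptions, not taken from the paper.

```python
from typing import Sequence

LabelMatrix = Sequence[Sequence[int]]  # rows = samples, columns = binary labels


def hamming_score(y_true: LabelMatrix, y_pred: LabelMatrix) -> float:
    """Fraction of (sample, label) cells predicted correctly (1 - Hamming loss)."""
    total = 0
    correct = 0
    for t_row, p_row in zip(y_true, y_pred):
        for t, p in zip(t_row, p_row):
            total += 1
            correct += int(t == p)
    return correct / total


def multilabel_accuracy(y_true: LabelMatrix, y_pred: LabelMatrix) -> float:
    """Mean per-sample Jaccard overlap of the true and predicted label sets.

    Assumption: a sample with no true and no predicted labels scores 1.0.
    """
    scores = []
    for t_row, p_row in zip(y_true, y_pred):
        inter = sum(1 for t, p in zip(t_row, p_row) if t == 1 and p == 1)
        union = sum(1 for t, p in zip(t_row, p_row) if t == 1 or p == 1)
        scores.append(1.0 if union == 0 else inter / union)
    return sum(scores) / len(scores)


# Toy example: 3 compounds, 3 hypothetical toxicity endpoints.
y_true = [[1, 0, 1], [0, 0, 0], [1, 1, 0]]
y_pred = [[1, 0, 0], [0, 0, 0], [1, 1, 1]]

print(hamming_score(y_true, y_pred))        # 7 of 9 cells match
print(multilabel_accuracy(y_true, y_pred))  # mean of 1/2, 1 and 2/3
```

The two scores can disagree on sparse data such as toxicity labels: the Hamming score rewards every correctly predicted negative, while the Jaccard-based accuracy only looks at labels that are present in either the true or the predicted set.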

Identifiers

pubmed: 33839234
pii: S0887-2333(21)00082-5
doi: 10.1016/j.tiv.2021.105157

Chemical substances

Hazardous Substances 0

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

105157

Copyright information

Copyright © 2021 Elsevier Ltd. All rights reserved.

Authors

Xiu Huan Yap (XH)

Biomedical Sciences PhD Program, Wright State University, Dayton, OH, USA. Electronic address: yap.4@wright.edu.

Michael Raymer (M)

Department of Computer Science and Engineering, Wright State University, Dayton, OH, USA.
