Multi-label classification and label dependence in in silico toxicity prediction.
Label dependence
Multi-label classification
Tox21
Toxicity prediction
Journal
Toxicology in vitro : an international journal published in association with BIBRA
ISSN: 1879-3177
Abbreviated title: Toxicol In Vitro
Country: England
NLM ID: 8712158
Publication information
Publication date:
Aug 2021
History:
received: 2020-01-30
revised: 2020-12-04
accepted: 2021-04-01
pubmed: 2021-04-12
medline: 2021-11-19
entrez: 2021-04-11
Status:
ppublish
Abstract
Most computational predictive models are trained for a single toxicity endpoint and cannot learn dependencies between endpoints, such as those targeting similar biological pathways. In this study, we compare the performance of three multi-label classification (MLC) models, namely Classifier Chains (CC), Label Powersets (LP) and Stacking (SBR), against independent classifiers (Binary Relevance) on Tox21 challenge data. We also develop a novel label dependence measure that shows a full range of values, even at low prior probabilities, for the purpose of data-driven label partitioning. Using Logistic Regression as the base classifier and random label partitioning (k = 3), CC show statistically significant improvements in model performance on Hamming and multi-label accuracy scores (p < 0.05), while SBR show significant improvements in multi-label accuracy scores. The weights in the Logistic Regression and Stacking models are positively associated with label dependencies, suggesting that learning label dependence is a key contributor to improving model performance. An original quantitative measure of label dependency is combined with the Louvain community detection method to learn label partitioning using a data-driven process. The resulting MLCs with learned label partitioning were generally found to be non-inferior to their counterparts with random or no label partitioning. Additionally, using the Random Forest classifier in a 10-fold stratified cross-validation Stacking model, we find that the top-performing stacking model outperforms the corresponding base model in 11 out of 12 Tox21 labels. Taken together, these results suggest that MLC models could boost the performance of current single-endpoint predictive models and that learned label partitioning may be used in place of random label partitioning.
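The abstract refers to random label partitioning (k = 3) and to Hamming and multi-label accuracy scores without defining them. The following is a minimal sketch, not the authors' code, under these assumptions: k is read as the size of each disjoint label group, the Hamming score is taken as 1 minus the Hamming loss, and multi-label accuracy is taken as the mean per-sample Jaccard index (scored 1.0 when both label sets are empty). The function names and the toy data are hypothetical; the label names are the 12 Tox21 challenge endpoints.

```python
import random

# The 12 Tox21 challenge assay endpoints, used here only as label names.
TOX21_LABELS = [
    "NR-AR", "NR-AR-LBD", "NR-AhR", "NR-Aromatase", "NR-ER", "NR-ER-LBD",
    "NR-PPAR-gamma", "SR-ARE", "SR-ATAD5", "SR-HSE", "SR-MMP", "SR-p53",
]


def random_label_partition(labels, k, seed=0):
    """Shuffle the labels and split them into disjoint groups of at most k.

    Assumption: k is the group size, as in disjoint random k-labelsets.
    """
    rng = random.Random(seed)
    shuffled = list(labels)
    rng.shuffle(shuffled)
    return [shuffled[i:i + k] for i in range(0, len(shuffled), k)]


def hamming_score(y_true, y_pred):
    """Fraction of individual label assignments that are correct."""
    total = sum(len(row) for row in y_true)
    correct = sum(
        t == p
        for row_t, row_p in zip(y_true, y_pred)
        for t, p in zip(row_t, row_p)
    )
    return correct / total


def multilabel_accuracy(y_true, y_pred):
    """Mean per-sample Jaccard index between true and predicted label sets.

    Assumed convention: a sample with no true and no predicted labels
    counts as a perfect match (1.0).
    """
    scores = []
    for row_t, row_p in zip(y_true, y_pred):
        inter = sum(1 for t, p in zip(row_t, row_p) if t == 1 and p == 1)
        union = sum(1 for t, p in zip(row_t, row_p) if t == 1 or p == 1)
        scores.append(inter / union if union else 1.0)
    return sum(scores) / len(scores)


# Toy example: 3 compounds, 3 labels (made-up values, not Tox21 data).
y_true = [[1, 0, 1], [0, 0, 0], [1, 1, 0]]
y_pred = [[1, 0, 0], [0, 0, 0], [1, 1, 1]]

groups = random_label_partition(TOX21_LABELS, k=3)
print(groups)
print(hamming_score(y_true, y_pred))
print(multilabel_accuracy(y_true, y_pred))
```

In a setup like the one described, each label group would be handed to its own multi-label model (for example a classifier chain), and the two scores would be computed on the recombined predictions.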
Identifiers
pubmed: 33839234
pii: S0887-2333(21)00082-5
doi: 10.1016/j.tiv.2021.105157
Chemical substances
Hazardous Substances
0
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
105157
Copyright information
Copyright © 2021 Elsevier Ltd. All rights reserved.