Multi-assignment clustering: Machine learning from a biological perspective.
Annotation enrichment
Clustering
K-means
Multiple cluster assignment
Pathways
Transcriptomics
Journal
Journal of biotechnology
ISSN: 1873-4863
Abbreviated title: J Biotechnol
Country: Netherlands
NLM ID: 8411927
Publication information
Publication date:
20 Jan 2021
History:
received: 17 Jun 2020
accepted: 03 Dec 2020
pubmed: 8 Dec 2020
medline: 25 Sep 2021
entrez: 7 Dec 2020
Status:
ppublish
Abstract
A common approach for analyzing large-scale molecular data is to cluster objects sharing similar characteristics. This assumes that genes with highly similar expression profiles are likely participating in a common molecular process. Biological systems are extremely complex and challenging to understand, with proteins having multiple functions that sometimes need to be activated or expressed in a time-dependent manner. Thus, the strategies applied for clustering of these molecules into groups are of key importance for translation of data to biologically interpretable findings. Here we implemented a multi-assignment clustering (MAsC) approach that allows molecules to be assigned to multiple clusters, rather than single ones as in commonly used clustering techniques. When applied to high-throughput transcriptomics data, MAsC increased power of the downstream pathway analysis and allowed identification of pathways with high biological relevance to the experimental setting and the biological systems studied. Multi-assignment clustering also reduced noise in the clustering partition by excluding genes with a low correlation to all of the resulting clusters. Together, these findings suggest that our methodology facilitates translation of large-scale molecular data into biological knowledge. The method is made available as an R package on GitLab (https://gitlab.com/wolftower/masc).
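The core idea in the abstract, assigning a gene to every cluster it correlates with and dropping genes that correlate with none, can be illustrated with a minimal Python sketch. This is not the MAsC R package or its API: the function names, the use of Pearson correlation against cluster centroids, and the 0.7 cutoff are all assumptions made for illustration, and the centroids are assumed to come from an ordinary k-means run that is not shown.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def multi_assign(profiles, centroids, min_corr=0.7):
    """Map each gene to every cluster whose centroid it correlates with.

    profiles  : dict of gene name -> expression profile (list of floats)
    centroids : list of cluster centroid profiles (assumed precomputed, e.g. by k-means)
    min_corr  : hypothetical correlation cutoff, not taken from the paper

    A gene with an empty list correlates poorly with all clusters and
    would be excluded from downstream analysis.
    """
    return {
        gene: [
            idx
            for idx, centroid in enumerate(centroids)
            if pearson(profile, centroid) >= min_corr
        ]
        for gene, profile in profiles.items()
    }


# Toy data: two overlapping expression patterns across four time points.
toy_centroids = [[1, 1, 0, 0], [0, 1, 1, 0]]
toy_profiles = {
    "g1": [2, 2, 0, 0],  # follows the first pattern only
    "g2": [1, 2, 1, 0],  # resembles both patterns
    "g3": [0, 0, 0, 1],  # resembles neither
}
assignments = multi_assign(toy_profiles, toy_centroids)
print(assignments)  # {'g1': [0], 'g2': [0, 1], 'g3': []}
```

In this toy case g2 lands in both clusters and g3 in none, which mirrors the two behaviours the abstract describes: multiple assignment and exclusion of poorly correlated genes. How the published method sets its cutoff and measures similarity should be checked against the package itself.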
Identifiers
pubmed: 33285150
pii: S0168-1656(20)30324-2
doi: 10.1016/j.jbiotec.2020.12.002
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
1-10
Copyright information
Copyright © 2020 The Authors. Published by Elsevier B.V. All rights reserved.