Multi-assignment clustering: Machine learning from a biological perspective.

Keywords: Annotation enrichment; Clustering; K-means; Multiple cluster assignment; Pathways; Transcriptomics

Journal

Journal of biotechnology
ISSN: 1873-4863
Abbreviated title: J Biotechnol
Country: Netherlands
NLM ID: 8411927

Publication information

Publication date:
20 Jan 2021
History:
received: 17 06 2020
accepted: 03 12 2020
pubmed: 8 12 2020
medline: 25 9 2021
entrez: 7 12 2020
Status: ppublish

Abstract

A common approach for analyzing large-scale molecular data is to cluster objects sharing similar characteristics. This assumes that genes with highly similar expression profiles are likely participating in a common molecular process. Biological systems are extremely complex and challenging to understand, with proteins having multiple functions that sometimes need to be activated or expressed in a time-dependent manner. Thus, the strategies applied for clustering of these molecules into groups are of key importance for translation of data to biologically interpretable findings. Here we implemented a multi-assignment clustering (MAsC) approach that allows molecules to be assigned to multiple clusters, rather than single ones as in commonly used clustering techniques. When applied to high-throughput transcriptomics data, MAsC increased power of the downstream pathway analysis and allowed identification of pathways with high biological relevance to the experimental setting and the biological systems studied. Multi-assignment clustering also reduced noise in the clustering partition by excluding genes with a low correlation to all of the resulting clusters. Together, these findings suggest that our methodology facilitates translation of large-scale molecular data into biological knowledge. The method is made available as an R package on GitLab (https://gitlab.com/wolftower/masc).
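The multi-assignment idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' MAsC implementation (which is distributed as an R package) and does not reproduce its algorithm; it only shows one plausible reading of the idea, under stated assumptions: cluster centroids are taken as given (for example from a preceding k-means run), each gene is assigned to every cluster whose centroid it correlates with at or above a threshold, and genes below the threshold for all clusters are excluded. The function names, the use of Pearson correlation, and the `min_corr` cutoff of 0.9 are all hypothetical choices made for illustration.

```python
import numpy as np


def correlation_to_centroids(profiles, centroids):
    """Pearson correlation of each profile (row) with each centroid (row)."""
    p = profiles - profiles.mean(axis=1, keepdims=True)
    c = centroids - centroids.mean(axis=1, keepdims=True)
    p_norm = np.linalg.norm(p, axis=1, keepdims=True)
    c_norm = np.linalg.norm(c, axis=1, keepdims=True)
    # Flat profiles have zero norm; leave their correlation at 0 instead of NaN.
    p_norm[p_norm == 0] = 1.0
    c_norm[c_norm == 0] = 1.0
    return (p / p_norm) @ (c / c_norm).T


def multi_assign(profiles, centroids, min_corr=0.9):
    """Assign each profile to every cluster it correlates with (hypothetical rule).

    Returns a boolean membership matrix (genes x clusters) and a boolean
    vector marking genes that were excluded from all clusters.
    """
    corr = correlation_to_centroids(profiles, centroids)
    membership = corr >= min_corr
    excluded = ~membership.any(axis=1)
    return membership, excluded


# Toy data: two centroids and three gene expression profiles (made-up values).
centroids = np.array([[0.0, 1.0, 2.0, 3.0],
                      [0.0, 2.0, 1.0, 3.0]])
profiles = np.array([[0.0, 1.0, 2.0, 3.0],    # matches the first centroid only
                     [0.0, 1.5, 1.5, 3.0],    # close to both centroids
                     [1.0, -1.0, -1.0, 1.0]])  # uncorrelated with both

membership, excluded = multi_assign(profiles, centroids, min_corr=0.9)
print(membership)
print(excluded)
```

In this toy example the second profile lands in both clusters and the third is dropped as noise, mirroring the two behaviours the abstract attributes to multi-assignment clustering. The threshold controls the trade-off: lowering it yields more multiple assignments and fewer exclusions.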

Identifiers

pubmed: 33285150
pii: S0168-1656(20)30324-2
doi: 10.1016/j.jbiotec.2020.12.002

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

1-10

Copyright information

Copyright © 2020 The Authors. Published by Elsevier B.V. All rights reserved.

Authors

Benjamin Ulfenborg (B)

School of Bioscience, University of Skövde, Skövde, Sweden. Electronic address: benjamin.ulfenborg@his.se.

Alexander Karlsson (A)

School of Informatics, University of Skövde, Skövde, Sweden.

Maria Riveiro (M)

School of Informatics, University of Skövde, Skövde, Sweden; Department of Computer Science and Informatics, School of Engineering, Jönköping University, Jönköping, Sweden.

Christian X Andersson (CX)

Takara Bio Europe AB, Gothenburg, Sweden.

Peter Sartipy (P)

School of Bioscience, University of Skövde, Skövde, Sweden.

Jane Synnergren (J)

School of Bioscience, University of Skövde, Skövde, Sweden.
