Spectral clustering of single cells using Siamese neural network combined with improved affinity matrix.


Journal

Briefings in bioinformatics
ISSN: 1477-4054
Abbreviated title: Brief Bioinform
Country: England
NLM ID: 100912837

Publication information

Publication date:
13 05 2022
History:
received: 13 01 2022
revised: 02 03 2022
accepted: 08 03 2022
pubmed: 15 4 2022
medline: 24 5 2022
entrez: 14 4 2022
Status: ppublish

Abstract

The development of single-cell RNA sequencing (scRNA-seq) has pushed back the limitations of bulk sequencing techniques in analyzing cell heterogeneity and diversity. Detecting clusters of cells is a key step in the analysis of scRNA-seq data. However, high dimensionality and imbalances in the number of different subcellular types are ubiquitous in real scRNA-seq data sets, which poses a huge challenge to single-cell-type detection.

We propose a meta-learning-based model, SiaClust, which combines a Siamese Convolutional Neural Network (CNN) with improved spectral clustering to achieve scRNA-seq cell type detection. Specifically, with the help of the constrained Sigmoid kernel, the raw high-dimensional data is mapped to a low-dimensional space, and the Siamese CNN learns the differences between the cell types in the low-dimensional feature space. The similarity matrix learned by the Siamese CNN is used in combination with improved spectral clustering and t-distributed Stochastic Neighbor Embedding (t-SNE) for visualization. By comparing the similarity of samples, SiaClust highlights the differences between cell types while blurring the differences within cell types, which makes it better at processing high-dimensional and imbalanced data. In an extensive evaluation on data generated from nine different species and tissues through different scRNA-seq protocols, and in comparisons with state-of-the-art single-cell clustering models, SiaClust significantly improves clustering accuracy. More importantly, SiaClust accurately locates the exact site of dropout genes and is more flexible with respect to data size and cell type.
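
As a rough illustration of the spectral clustering step only, the sketch below clusters a small, imbalanced toy data set from an affinity matrix. It is not the SiaClust implementation: the Gaussian affinity is a stand-in for the similarity matrix that the abstract says is learned by the Siamese CNN, the normalized-Laplacian procedure is the textbook variant rather than the paper's "improved" one, and all function names, parameters and the toy data are assumptions made for this example.

```python
import numpy as np


def rbf_affinity(X, sigma=1.0):
    """Stand-in affinity (assumption): Gaussian kernel on squared distances."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    A = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(A, 0.0)  # no self-loops
    return A


def kmeans(Z, k, n_iter=50):
    """Tiny deterministic k-means with farthest-point initialisation."""
    centers = [Z[0]]
    for _ in range(1, k):
        d = np.min([np.sum((Z - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Z[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


def spectral_clustering(A, k):
    """Textbook normalized spectral clustering on an affinity matrix A."""
    deg = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    M = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]  # D^-1/2 A D^-1/2
    # eigh returns eigenvalues in ascending order; keep the k largest.
    _, vecs = np.linalg.eigh(M)
    U = vecs[:, -k:]
    # Row-normalize the spectral embedding before k-means.
    U = U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)
    return kmeans(U, k)


# Toy, imbalanced data (hypothetical): 40 "cells" of one type, 8 of another,
# each described by 5 already-reduced features.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 0.3, size=(40, 5)),
    rng.normal(3.0, 0.3, size=(8, 5)),
])
labels = spectral_clustering(rbf_affinity(X), k=2)
print(np.bincount(labels))  # cluster sizes
```

In a pipeline like the one described, the call to `rbf_affinity` would be replaced by whatever learned similarity matrix is available; the clustering function only requires a symmetric, non-negative matrix.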

Identifiers

pubmed: 35419595
pii: 6567703
doi: 10.1093/bib/bbac113

Publication types

Journal Article
Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Copyright information

© The Author(s) 2022. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

Authors

Hanjing Jiang (H)

School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, 430074, Wuhan, China.

Yabing Huang (Y)

Renmin Hospital of Wuhan University, Department of Pathology, 430060, Wuhan, China.

Qianpeng Li (Q)

Chinese Academy of Sciences, Institute of Automation, 100190, Beijing, China.
