Non-negative Independent Factor Analysis disentangles discrete and continuous sources of variation in scRNA-seq data.


Journal

Bioinformatics (Oxford, England)
ISSN: 1367-4811
Abbreviated title: Bioinformatics
Country: England
NLM ID: 9808944

Publication information

Publication date:
13 May 2022
History:
received: 22 Mar 2021
revised: 25 Feb 2022
accepted: 17 Mar 2022
entrez: 13 May 2022
pubmed: 14 May 2022
medline: 18 May 2022
Status: ppublish

Abstract

Single-cell RNA-seq analysis has emerged as a powerful tool for understanding inter-cellular heterogeneity. Because the data are inherently noisy, computational techniques often rely on dimensionality reduction (DR) both as a pre-processing step and as an analysis tool. Ideally, DR should preserve the biological information while discarding the noise. However, if DR is to be used directly to gain biological insight, it must also be interpretable. That is, the individual dimensions of the reduction should correspond to specific biological variables such as cell-type identity or pathway activity. Maximizing biological interpretability requires making assumptions about the structure of the data, so the choice of model is critical. We present a new probabilistic single-cell factor analysis model, Non-negative Independent Factor Analysis (NIFA), that incorporates different interpretability-inducing assumptions into a single modeling framework. The key advantage of NIFA is that it simultaneously models uni- and multi-modal latent factors, and thus isolates discrete cell-type identity and continuous pathway activity into separate components. We apply our approach to a range of datasets where cell-type identity is known, and we show that NIFA-derived factors outperform ICA, PCA, NMF and scCoGAPS (an NMF method designed for single-cell data) at disentangling biological sources of variation. Studying an immunotherapy dataset in detail, we show that NIFA reproduces and refines previous findings within a single analysis framework and enables the discovery of new clinically relevant cell states. NIFA is an R package, freely available on GitHub (https://github.com/wgmao/NIFA). The test dataset is archived at https://zenodo.org/record/6286646. Supplementary data are available at Bioinformatics online.
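The key modeling idea above is that some latent factors are multi-modal (discrete cell-type identity) and others are unimodal (continuous pathway activity). The Python sketch below illustrates only that distinction and is not NIFA. NIFA is the R package linked above, and it fits a probabilistic factor model. The sketch rests on several assumptions, all hypothetical:

- It simulates a toy cells-by-genes matrix as non-negative factor scores times non-negative loadings plus noise.
- The factor names `s_discrete` and `s_continuous` and the function `bimodality_coefficient` are illustrative, not part of NIFA.
- To label each factor as uni- or multi-modal, it uses the sample bimodality coefficient. This is a standard, generic heuristic (values above 5/9 suggest bimodality) and is not NIFA's inference procedure.

```python
import numpy as np


def bimodality_coefficient(x):
    """Sample bimodality coefficient, a generic heuristic (not NIFA's method).

    BC = (G1^2 + 1) / (G2 + 3 (n-1)^2 / ((n-2)(n-3))), where G1 and G2 are the
    bias-adjusted sample skewness and excess kurtosis. BC > 5/9 is commonly
    read as evidence of bi- or multi-modality; a Gaussian gives BC close to 1/3.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    if n < 4:
        raise ValueError("need at least 4 observations")
    d = x - x.mean()
    m2, m3, m4 = np.mean(d**2), np.mean(d**3), np.mean(d**4)
    g1 = m3 / m2**1.5            # biased sample skewness
    g2 = m4 / m2**2 - 3.0        # biased sample excess kurtosis
    G1 = g1 * np.sqrt(n * (n - 1)) / (n - 2)                   # adjusted skewness
    G2 = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6.0)  # adjusted excess kurtosis
    return (G1**2 + 1.0) / (G2 + 3.0 * (n - 1) ** 2 / ((n - 2) * (n - 3)))


rng = np.random.default_rng(0)
n_cells, n_genes = 2000, 50

# Hypothetical discrete factor: two cell types, each with a small spread; non-negative.
cell_type = rng.integers(0, 2, size=n_cells)
s_discrete = np.maximum(2.0 + 4.0 * cell_type + rng.normal(0.0, 0.5, n_cells), 0.0)

# Hypothetical continuous factor: unimodal, non-negative "pathway activity".
s_continuous = rng.gamma(shape=9.0, scale=0.5, size=n_cells)

# Toy non-negative factor model: expression = scores x loadings^T + Gaussian noise.
S = np.column_stack([s_discrete, s_continuous])             # cells x factors
W = rng.gamma(shape=1.0, scale=1.0, size=(n_genes, 2))      # genes x factors, >= 0
X = S @ W.T + rng.normal(0.0, 0.1, size=(n_cells, n_genes))  # cells x genes

for name, s in [("discrete", s_discrete), ("continuous", s_continuous)]:
    bc = bimodality_coefficient(s)
    label = "multi-modal" if bc > 5 / 9 else "unimodal"
    print(f"{name:>10}: BC = {bc:.2f} ({label})")
```

Here the factor scores are known because they were simulated. A method like NIFA has to recover such factors from `X` alone. Its advantage, per the abstract, is that it allows both kinds of factor distribution within one model rather than assuming a single shape for every component.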

Identifiers

pubmed: 35561207
pii: 6585446
doi: 10.1093/bioinformatics/btac136
pmc: PMC9113312

Publication types

Journal Article
Research Support, U.S. Gov't, Non-P.H.S.
Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Pagination

2749-2756

Grants

Agency: NEI NIH HHS
ID: R01 EY030546
Country: United States
Agency: NIH HHS
ID: R01 HG009299-5
Country: United States
Agency: NIAID NIH HHS
ID: R01 AI043603
Country: United States
Agency: NIDDK NIH HHS
ID: U24 DK112331
Country: United States

Copyright information

© The Author(s) 2022. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

Authors

Weiguang Mao (W)

Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15260, USA.
Joint Carnegie Mellon-University of Pittsburgh Ph.D. Program in Computational Biology, Pittsburgh, PA 15260, USA.

Maziyar Baran Pouyan (MB)

Department of Developmental Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15260, USA.

Dennis Kostka (D)

Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15260, USA.
Joint Carnegie Mellon-University of Pittsburgh Ph.D. Program in Computational Biology, Pittsburgh, PA 15260, USA.
Department of Developmental Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15260, USA.
Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15260, USA.

Maria Chikina (M)

Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15260, USA.
Joint Carnegie Mellon-University of Pittsburgh Ph.D. Program in Computational Biology, Pittsburgh, PA 15260, USA.