Anti-correlated feature selection prevents false discovery of subpopulations in scRNAseq.


Journal

Nature communications
ISSN: 2041-1723
Titre abrégé: Nat Commun
Pays: England
ID NLM: 101528555

Informations de publication

Date de publication:
24 Jan 2024
Historique:
received: 13 12 2022
accepted: 07 11 2023
medline: 25 1 2024
pubmed: 25 1 2024
entrez: 24 1 2024
Statut: epublish

Résumé

While sub-clustering cell-populations has become popular in single cell-omics, negative controls for this process are lacking. Popular feature-selection/clustering algorithms fail the null-dataset problem, allowing erroneous subdivisions of homogenous clusters until nearly each cell is called its own cluster. Using real and synthetic datasets, we find that anti-correlated gene selection reduces or eliminates erroneous subdivisions, increases marker-gene selection efficacy, and efficiently scales to millions of cells.

Identifiants

pubmed: 38267438
doi: 10.1038/s41467-023-43406-9
pii: 10.1038/s41467-023-43406-9
doi:

Types de publication

Journal Article

Langues

eng

Sous-ensembles de citation

IM

Pagination

699

Subventions

Organisme : U.S. Department of Health & Human Services | NIH | National Human Genome Research Institute (NHGRI)
ID : K99HG011270

Informations de copyright

© 2024. The Author(s).

Références

Yang, P., Huang, H. & Liu, C. Feature selection revisited in the single-cell era. Genome Biol. 22, 321 (2021).
pubmed: 34847932 pmcid: 8638336 doi: 10.1186/s13059-021-02544-3
Stuart, T. et al. Comprehensive integration of single-cell data. Cell 177, 1888–1902.e1821 (2019).
pubmed: 31178118 pmcid: 6687398 doi: 10.1016/j.cell.2019.05.031
Tyler, S. R. et al. PyMINEr finds gene and autocrine-paracrine networks from human Islet scRNA-Seq. Cell Rep. 26, 1951–1964.e1958 (2019).
pubmed: 30759402 pmcid: 6394844 doi: 10.1016/j.celrep.2019.01.063
Brennecke, P. et al. Accounting for technical noise in single-cell RNA-seq experiments. Nat. Methods 10, 1093 (2013).
pubmed: 24056876 doi: 10.1038/nmeth.2645
Andrews, T. S. & Hemberg, M. M3Drop: dropout-based feature selection for scRNASeq. Bioinformatics 35, 2865–2867 (2018).
pmcid: 6691329 doi: 10.1093/bioinformatics/bty1044
Townes, F. W., Hicks, S. C., Aryee, M. J. & Irizarry, R. A. Feature selection and dimension reduction for single-cell RNA-Seq based on a multinomial model. Genome Biol. 20, 295 (2019).
pubmed: 31870412 pmcid: 6927135 doi: 10.1186/s13059-019-1861-6
Kim, T. H., Zhou, X. & Chen, M. Demystifying “drop-outs” in single-cell UMI data. Genome Biol. 21, 196 (2020).
pubmed: 32762710 pmcid: 7412673 doi: 10.1186/s13059-020-02096-y
Madissoon, E. et al. scRNA-seq assessment of the human lung, spleen, and esophagus tissue stability after cold preservation. Genome Biol. 21, 1 (2019).
pubmed: 31892341 pmcid: 6937944 doi: 10.1186/s13059-019-1906-x
Cui, Y. et al. Single-cell transcriptome analysis maps the developmental track of the human heart. Cell Rep. 26, 1934–1950.e1935 (2019).
pubmed: 30759401 doi: 10.1016/j.celrep.2019.01.079
Kaplan, N. et al. Single-Cell RNA transcriptome helps define the limbal/corneal epithelial stem/early transit amplifying cells and how autophagy affects this population. Investig. Ophthalmol. Vis. Sci. 60, 3570–3583 (2019).
doi: 10.1167/iovs.19-27656
Ayyaz, A. et al. Single-cell transcriptomes of the regenerating intestine reveal a revival stem cell. Nature 569, 121–125 (2019).
pubmed: 31019301 doi: 10.1038/s41586-019-1154-y
Kiselev, V. Y., Andrews, T. S. & Hemberg, M. Challenges in unsupervised clustering of single-cell RNA-seq data. Nat. Rev. Genet. 20, 273–282 (2019).
pubmed: 30617341 doi: 10.1038/s41576-018-0088-9
Kleinberg, J. An impossibility theorem for clustering. Adv. Neural Inf. Process. Syst. 15, 463–470 (2003).
Liu, H. et al. Systematically labeling developmental stage-specific genes for the study of pancreatic β-cell differentiation from human embryonic stem cells. Cell Res. 24, 1181–1200 (2014).
pubmed: 25190258 pmcid: 4185345 doi: 10.1038/cr.2014.118
Andrews, T.S. & Hemberg, M. Dropout-based feature selection for scRNASeq. bioRxiv, 065094 (2018).
Zappia, L., Phipson, B. & Oshlack, A. Splatter: simulation of single-cell RNA sequencing data. Genome Biol. 18, 174 (2017).
pubmed: 28899397 pmcid: 5596896 doi: 10.1186/s13059-017-1305-0
Habib, N. et al. Div-Seq: single-nucleus RNA-Seq reveals dynamics of rare adult newborn neurons. Science 353, 925–928 (2016).
pubmed: 27471252 pmcid: 5480621 doi: 10.1126/science.aad7038
Dibaeinia, P. & Sinha, S. SERGIO: a single-cell expression simulator guided by gene regulatory networks. Cell Syst. 11, 252–271.e211 (2020).
pubmed: 32871105 pmcid: 7530147 doi: 10.1016/j.cels.2020.08.003
Gibson, G. Perspectives on rigor and reproducibility in single cell genomics. PLOS Genet. 18, e1010210 (2022).
pubmed: 35536863 pmcid: 9122178 doi: 10.1371/journal.pgen.1010210
Guo, M., Wang, H., Potter, S. S., Whitsett, J. A. & Xu, Y. SINCERA: a pipeline for single-cell RNA-Seq profiling analysis. PLoS Comput. Biol. 11, e1004575 (2015).
pubmed: 26600239 pmcid: 4658017 doi: 10.1371/journal.pcbi.1004575
Hao, Y. et al. Integrated analysis of multimodal single-cell data. Cell 184, 3573-3587.e29 (2021).
Quah, F. X. & Hemberg, M. SC3s: efficient scaling of single cell consensus clustering to millions of cells. BMC Bioinforma. 23, 536 (2022).
doi: 10.1186/s12859-022-05085-z
Tran, B., Tran, D., Nguyen, H., Ro, S. & Nguyen, T. scCAN: single-cell clustering using autoencoder and network fusion. Sci. Rep. 12, 10267 (2022).
pubmed: 35715568 pmcid: 9206025 doi: 10.1038/s41598-022-14218-6
Tran, D. et al. Fast and precise single-cell data analysis using a hierarchical autoencoder. Nat. Commun. 12, 1029 (2021).
pubmed: 33589635 pmcid: 7884436 doi: 10.1038/s41467-021-21312-2
Wolf, F. A., Angerer, P. & Theis, F. J. SCANPY: large-scale single-cell gene expression data analysis. Genome Biol. 19, 15 (2018).
pubmed: 29409532 pmcid: 5802054 doi: 10.1186/s13059-017-1382-0
Li, J. et al. Single-cell transcriptomes reveal characteristic features of human pancreatic islet cell types. EMBO Rep. 17, 178–187 (2016).
pubmed: 26691212 doi: 10.15252/embr.201540946
Muraro, M. J. et al. A single-cell transcriptome atlas of the human pancreas. Cell Syst. 3, 385–394.e383 (2016).
pubmed: 27693023 pmcid: 5092539 doi: 10.1016/j.cels.2016.09.002
Segerstolpe, Å. et al. Single-cell transcriptome profiling of human pancreatic islets in health and Type 2 diabetes. Cell Metab. 24, 593–607 (2016).
pubmed: 27667667 pmcid: 5069352 doi: 10.1016/j.cmet.2016.08.020
Wang, Y. J. et al. Single-cell transcriptomics of the human endocrine pancreas. Diabetes 65, 3028–3038 (2016).
pubmed: 27364731 pmcid: 5033269 doi: 10.2337/db16-0405
Xin, Y. et al. RNA sequencing of single human islet cells reveals Type 2 diabetes genes. Cell Metab. 24, 608–615 (2016).
pubmed: 27667665 doi: 10.1016/j.cmet.2016.08.018
Raudvere, U. et al. g:Profiler: a web server for functional enrichment analysis and conversions of gene lists (2019 update). Nucleic Acids Res. 47, W191–W198 (2019).
pubmed: 31066453 pmcid: 6602461 doi: 10.1093/nar/gkz369
Uhlén, M. et al. Tissue-based map of the human proteome. Science 347, 1260419 (2015).
pubmed: 25613900 doi: 10.1126/science.1260419
Almanzar, N. et al. A single-cell transcriptomic atlas characterizes ageing tissues in the mouse. Nature 583, 590–595 (2020).
doi: 10.1038/s41586-020-2496-1
Brereton, M. F., Vergari, E., Zhang, Q. & Clark, A. Alpha-, Delta- and PP-cells: are they the architectural cornerstones of islet structure and co-ordination? J. Histochem. Cytochem. 63, 575–591 (2015).
pubmed: 26216135 pmcid: 4530398 doi: 10.1369/0022155415583535
Yoshida, M. et al. Local and systemic responses to SARS-CoV-2 infection in children and adults. Nature 602, 321–327 (2022).
pubmed: 34937051 doi: 10.1038/s41586-021-04345-x
Hoffman, W., Lakkis, F. G. & Chalasani, G. B Cells, antibodies, and More. Clin. J. Am. Soc. Nephrol. 11, 137–154 (2016).
pubmed: 26700440 doi: 10.2215/CJN.09430915
Li, H. et al. Identification of novel B-1 transitional progenitors by B-1 lymphocyte fate-mapping transgenic mouse model Bhlhe41dTomato-Cre. Front. Immunol. 13, https://www.frontiersin.org/articles/10.3389/fimmu.2022.946202/full (2022).
Mousset, C. M. et al. Comprehensive phenotyping of T cells using flow cytometry. Cytom. Part A 95, 647–654 (2019).
doi: 10.1002/cyto.a.23724
Kumar, B. V., Connors, T. J. & Farber, D. L. Human T cell development, localization, and function throughout life. Immunity 48, 202–213 (2018).
pubmed: 29466753 pmcid: 5826622 doi: 10.1016/j.immuni.2018.01.007
van den Broek, T., Borghans, J. A. M. & van Wijk, F. The full spectrum of human naive T cells. Nat. Rev. Immunol. 18, 363–373 (2018).
pubmed: 29520044 doi: 10.1038/s41577-018-0001-y
Abbott, R. J. M. et al. Structural and functional characterization of a Novel T cell receptor co-regulatory protein complex, CD97-CD55 *. J. Biol. Chem. 282, 22023–22032 (2007).
pubmed: 17449467 doi: 10.1074/jbc.M702588200
Paillard, F., Sterkers, G. & Vaquero, C. Transcriptional and post-transcriptional regulation of TcR, CD4 and CD8 gene expression during activation of normal human T lymphocytes. EMBO J. 9, 1867–1872 (1990).
pubmed: 2140772 pmcid: 551892 doi: 10.1002/j.1460-2075.1990.tb08312.x
Utzschneider, D. T. et al. Early precursor T cells establish and propagate T cell exhaustion in chronic infection. Nat. Immunol. 21, 1256–1266 (2020).
pubmed: 32839610 doi: 10.1038/s41590-020-0760-z
Buzzelli, A. A., McWilliams, I. L., Shin, B., Bryars, M. T. & Harrington, L. E. Intrinsic STAT4 expression controls effector CD4 T cell migration and Th17 pathogenicity. J. Immunol 210, 1667–1676 (2023).
pubmed: 37093664 doi: 10.4049/jimmunol.2200606
Mahajan, S. et al. The role of ICOS in the development of CD4 T cell help and the reactivation of memory T cells. Eur. J. Immunol. 37, 1796–1808 (2007).
pubmed: 17549732 pmcid: 2699381 doi: 10.1002/eji.200636661
Chatenoud, L. Natural and induced T CD4+CD25+FOXP3+ regulatory T cells. Methods Mol. Biol. 677, 3–13 (2011).
pubmed: 20941599 doi: 10.1007/978-1-60761-869-0_1
Tyler, S.R., Bunyavanich, S. & Schadt, E.E. PMD uncovers widespread cell-state erasure by scRNAseq batch correction methods. bioRxiv, 2021.2011.2015.468733 (2021).
Vallania, F. et al. Multicohort analysis identifies monocyte gene signatures to accurately monitor subset-specific changes in human diseases. Front. Immunol. 12, 659255 (2021).
pubmed: 34054824 pmcid: 8160521 doi: 10.3389/fimmu.2021.659255
Zhang, B. et al. Single-cell RNA sequencing reveals induction of distinct trained-immunity programs in human monocytes. J. Clin. Investig. 132, https://www.jci.org/articles/view/147719/cite (2022).
Padmos, R. C. et al. Distinct monocyte gene-expression profiles in autoimmune diabetes. Diabetes 57, 2768–2773 (2008).
pubmed: 18599519 pmcid: 2551688 doi: 10.2337/db08-0496
Martinez, F. O., Combes, T. W., Orsenigo, F. & Gordon, S. Monocyte activation in systemic Covid-19 infection: Assay and rationale. EBioMedicine 59, 102964 (2020).
pubmed: 32861199 pmcid: 7456455 doi: 10.1016/j.ebiom.2020.102964
Travelli, C., Colombo, G., Mola, S., Genazzani, A. A. & Porta, C. NAMPT: a pleiotropic modulator of monocytes and macrophages. Pharmacol. Res. 135, 25–36 (2018).
pubmed: 30031171 doi: 10.1016/j.phrs.2018.06.022
Shalova, I. N. et al. Human monocytes undergo functional re-programming during sepsis mediated by hypoxia-inducible factor-1α. Immunity 42, 484–498 (2015).
pubmed: 25746953 doi: 10.1016/j.immuni.2015.02.001
Caroline, C. B., Elisabeth, L. P., Guylaine, M. S. & Darren, E. R. Hypoxic gene activation by lipopolysaccharide in macrophages: implication of hypoxia-inducible factor 1α. Blood 103, 1124–1130 (2004).
doi: 10.1182/blood-2003-07-2427
Hagberg, A., Chult, D. S. & Swart, P. Exploring network structure, dynamics, and function using NetworkX. In Proc. 7th Python in Science conference (SciPy 2008) (eds Varoquaux, G., Vaught, T. & Millman, J.) 11–15 (SciPy, 2008).
Ellson, J., Gansner, E.R., Koutsofios, E., North, S.C. & Woodhull, G. in Graph Drawing Software. Mathematics and Visualization (eds Jünger, M. & Mutzel, P.) 127–148 (Springer, 2004).
10x.Genomics 1k Heart Cells from an E18 mouse (v3 chemistry). https://support.10xgenomics.com/single-cell-gene-expression/datasets/3.0.0/heart_1k_v3 (2018).
10x.Genomics 1k PBMCs from a Healthy Donor (v3 chemistry). https://support.10xgenomics.com/single-cell-gene-expression/datasets/3.0.0/pbmc_1k_v3 (2018).
Villani, A.-C. et al. Single-cell RNA-seq reveals new types of human blood dendritic cells, monocytes, and progenitors. Science 356, eaah4573 (2017).
pubmed: 28428369 pmcid: 5775029 doi: 10.1126/science.aah4573
Tran, V. et al. High sensitivity single cell RNA sequencing with split pool barcoding. bioRxiv, 2022.2008.2027.505512 (2022).
McCarthy, D. J., Campbell, K. R., Lun, A. T. L. & Wills, Q. F. Scater: pre-processing, quality control, normalization and visualization of single-cell RNA-seq data in R. Bioinformatics 33, 1179–1186 (2017).
pubmed: 28088763 pmcid: 5408845 doi: 10.1093/bioinformatics/btw777
Robinson, M. D. & Oshlack, A. A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biol. 11, R25 (2010).
pubmed: 20196867 pmcid: 2864565 doi: 10.1186/gb-2010-11-3-r25
Chari T, Pachter L (2023) The specious art of single-cell genomics. PLOS Computational Biology 19(8): e1011288. https://doi.org/10.1371/journal.pcbi.1011288 .
Page, L., Brin, S., Motwani, R. & Winograd, T. The PageRank Citation Ranking: Bringing Order to the Web. Technical Report SIDL-WP-1999-0120, Stanford Digital Library Technologies Project (Stanford InfoLab, 1999).
Pedregosa, F. et al. Scikit-learn: machine learning in python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
Seabold, S. & Perktold, J. in Proceedings of the 9th Python in Science Conference, Vol. 57 10-25080 (Austin, TX, 2010).
Tyler, S. R., Guccione, E. & Schadt, E. E. L. -O. D. Anti-correlated Feature Selection Prevents False Discovery of Subpopulations in scRNAseq. figshare https://doi.org/10.6084/m9.figshare.23571921 (2023).

Auteurs

Scott R Tyler (SR)

Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA. scottyler89@gmail.com.
Department of Oncological Sciences, Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA. scottyler89@gmail.com.

Daniel Lozano-Ojalvo (D)

Department of Dermatology, Icahn School of Medicine at Mount Sinai, New York, NY, USA.

Ernesto Guccione (E)

Department of Oncological Sciences, Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
Center for Therapeutics Discovery, Department of Oncological Sciences and Pharmacological Sciences, Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
Bioinformatics for Next Generation Sequencing (BiNGS) Shared Resource Facility, Icahn School of Medicine at Mount Sinai, New York, NY, USA.

Eric E Schadt (EE)

Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA. eric.schadt@mssm.edu.

Classifications MeSH