Automated single-cell omics end-to-end framework with data-driven batch inference.

Keywords: single-cell genomics; batch identification; cell type mapping; information theory; integration; scATAC-seq; scRNA-seq

Journal

bioRxiv: the preprint server for biology
Abbreviated title: bioRxiv
Country: United States
NLM ID: 101680187

Publication information

Publication date:
04 Nov 2023
History:
pubmed: 14 Nov 2023
medline: 14 Nov 2023
entrez: 14 Nov 2023
Status: epublish

Abstract

To facilitate single-cell multi-omics analysis and improve reproducibility, we present SPEEDI (Single-cell Pipeline for End to End Data Integration), a fully automated end-to-end framework for batch inference, data integration, and cell type labeling. SPEEDI introduces data-driven batch inference and transforms the often heterogeneous data matrices obtained from different samples into a uniformly annotated and integrated dataset. Without requiring user input, it automatically selects parameters and executes pre-processing, sample integration, and cell type mapping. It can also perform downstream analyses of differential signals between treatment conditions and gene functional modules. SPEEDI's data-driven batch inference method works with widely used integration and cell-typing tools. By developing data-driven batch inference, providing full end-to-end automation, and eliminating parameter selection, SPEEDI improves reproducibility and lowers the barrier to obtaining biological insight from these valuable single-cell datasets. The SPEEDI interactive web application can be accessed at https://speedi.princeton.edu/.

Identifiers

pubmed: 37961197
doi: 10.1101/2023.11.01.564815
pmc: PMC10635042

Publication types

Preprint

Languages

eng

Grants

Agency: NIDDK NIH HHS
ID: R01 DK046943
Country: United States
Agency: NIGMS NIH HHS
ID: R01 GM071966
Country: United States

Conflict of interest statement

S.C.S. is interim Chief Scientific Officer, consultant, and equity owner of GNOMX Corp. Patents related to this work were filed.

Authors

Yuan Wang (Y)

Department of Computer Science, Princeton University, Princeton, NJ, USA.
Lewis-Sigler Institute of Integrative Genomics, Princeton University, Princeton, NJ, USA.
These authors contributed equally.

William Thistlethwaite (W)

Lewis-Sigler Institute of Integrative Genomics, Princeton University, Princeton, NJ, USA.
These authors contributed equally.

Alicja Tadych (A)

Lewis-Sigler Institute of Integrative Genomics, Princeton University, Princeton, NJ, USA.

Frederique Ruf-Zamojski (F)

Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY, USA.

Daniel J Bernard (DJ)

Department of Pharmacology and Therapeutics, McGill University, Montreal, QC, H3G 1Y6, Canada.

Antonio Cappuccio (A)

Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY, USA.

Elena Zaslavsky (E)

Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY, USA.

Xi Chen (X)

Lewis-Sigler Institute of Integrative Genomics, Princeton University, Princeton, NJ, USA.
Center for Computational Biology, Flatiron Institute, New York, NY, USA.

Stuart C Sealfon (SC)

Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY, USA.

Olga G Troyanskaya (OG)

Department of Computer Science, Princeton University, Princeton, NJ, USA.
Lewis-Sigler Institute of Integrative Genomics, Princeton University, Princeton, NJ, USA.
Center for Computational Biology, Flatiron Institute, New York, NY, USA.
Lead contact.

MeSH classifications