Choice of pre-processing pipeline influences clustering quality of scRNA-seq datasets.

MeSH terms: Animals; Cluster Analysis; Gene Expression Profiling; Mice; RNA, Small Cytoplasmic; Sequence Analysis, RNA; Single-Cell Analysis; Software; Zebrafish / genetics

Keywords: 10X Genomics; Alignment; Cell Ranger; Kallisto; Opsin; Pineal gland; Single-cell RNA sequencing; Zebrafish

Journal

BMC Genomics

ISSN: 1471-2164

Abbreviated title: BMC Genomics

Country: England

NLM ID: 100965258

Publication information

Publication date:
14 Sep 2021

History:

received: 21 Dec 2020

accepted: 11 Aug 2021

entrez: 15 Sep 2021

pubmed: 16 Sep 2021

medline: 18 Sep 2021

Status: epublish

Abstract

Single-cell RNA sequencing (scRNA-seq) has quickly become one of the dominant techniques in modern transcriptome assessment. In particular, 10X Genomics' Chromium system, with its high-throughput, turnkey approach and thorough user guide, has made this cutting-edge technique accessible to many laboratories working with diverse animal models. However, standard pre-processing, including the alignment and cell-filtering pipelines, might not be ideal for every organism or tissue. Here we applied an alternative strategy, based on the pseudoaligner kallisto, to twenty-two publicly available single-cell sequencing datasets from a wide range of tissues across eight organisms and compared the results with those of the standard 10X Genomics Cell Ranger pipeline. In most of the tested samples, kallisto produced higher sequencing read alignment rates and higher total gene detection rates than Cell Ranger. Although datasets processed with Cell Ranger had higher cell counts, outside of the human and mouse datasets these additional cells were routinely of low quality, with low gene detection rates. Thorough downstream analysis of one kallisto-processed dataset, obtained from the zebrafish pineal gland, revealed clearer clustering and allowed the identification of an additional photoreceptor cell type that had previously gone undetected. The finding of this new cluster suggests that the photoreceptive pineal gland is essentially a bi-chromatic tissue containing both green and red cone-like photoreceptors, and implies that the alignment and pre-processing pipeline can affect the discovery of biologically relevant cell types. While Cell Ranger favors higher cell numbers, using kallisto results in datasets with higher median gene detection per cell. We demonstrate that cell type identification was not hampered by the lower cell count but was in fact improved, as a result of the high gene detection rate and the more stringent filtering. Depending on the acquired dataset, it can be beneficial to favor high-quality cells and accept a lower cell count, leading to improved classification of cell types.
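The abstract's central trade-off (fewer cells, but a higher median number of genes detected per cell) can be made concrete with a small calculation. The following is a minimal Python sketch on a toy, hypothetical count matrix; it is not the authors' pipeline and does not reproduce how Cell Ranger or kallisto-based workflows call or filter cells. The function names `genes_per_cell`, `filter_cells` and `summarize`, the `min_genes` threshold of 3, and the numbers in the matrix are all illustrative assumptions.

```python
from statistics import median

# Toy, hypothetical UMI count matrix: one row per cell barcode, one column per gene.
# These numbers are illustrative only and are not taken from the study.
counts = [
    [5, 0, 3, 1, 0, 2],  # 4 genes detected
    [0, 0, 1, 0, 0, 0],  # 1 gene detected (likely a low-quality barcode)
    [2, 4, 1, 3, 1, 0],  # 5 genes detected
    [0, 1, 0, 0, 0, 0],  # 1 gene detected
    [3, 0, 2, 2, 1, 1],  # 5 genes detected
]


def genes_per_cell(matrix):
    """Count the genes with a nonzero count in each cell (row)."""
    return [sum(1 for value in row if value > 0) for row in matrix]


def filter_cells(matrix, min_genes):
    """Keep only the cells that detect at least `min_genes` genes (hypothetical threshold)."""
    return [row for row, n in zip(matrix, genes_per_cell(matrix)) if n >= min_genes]


def summarize(matrix):
    """Return (number of cells, median genes detected per cell)."""
    detected = genes_per_cell(matrix)
    return len(matrix), (median(detected) if detected else 0)


print(summarize(counts))                   # (5, 4)
print(summarize(filter_cells(counts, 3)))  # (3, 5)
```

Raising `min_genes` discards sparse barcodes, so the cell count drops while the median gene detection per cell rises. Where to set such a threshold is a dataset-dependent choice.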

Identifiers

DOI: 10.1186/s12864-021-07930-6 PMID: 34521337 PMC: PMC8439043

Chemical substances

RNA, Small Cytoplasmic 0

Publication types

Journal Article

Languages

eng

Pagination

661

Authors

Inbal Shainer

Manuel Stemmer
