Choice of pre-processing pipeline influences clustering quality of scRNA-seq datasets.


Journal

BMC Genomics
ISSN: 1471-2164
Abbreviated title: BMC Genomics
Country: England
NLM ID: 100965258

Publication information

Publication date:
14 Sep 2021
History:
received: 21 Dec 2020
accepted: 11 Aug 2021
entrez: 15 Sep 2021
pubmed: 16 Sep 2021
medline: 18 Sep 2021
Status: epublish

Abstract

Single-cell RNA sequencing (scRNA-seq) has quickly become one of the dominant techniques in modern transcriptome assessment. In particular, 10X Genomics' Chromium system, with its high-throughput approach, turnkey operation, and thorough user guide, has made this cutting-edge technique accessible to many laboratories using diverse animal models. However, standard pre-processing, including the alignment and cell filtering pipelines, might not be ideal for every organism or tissue. Here we applied an alternative strategy, based on the pseudoaligner kallisto, to twenty-two publicly available single-cell sequencing datasets from a wide range of tissues of eight organisms and compared the results with the standard 10X Genomics Cell Ranger pipeline.
In most of the tested samples, kallisto produced higher sequencing read alignment rates and total gene detection rates than Cell Ranger. Although datasets processed with Cell Ranger had higher cell counts, outside of human and mouse datasets these additional cells were routinely of low quality, with low gene detection rates. Thorough downstream analysis of one kallisto-processed dataset, obtained from the zebrafish pineal gland, revealed clearer clustering, allowing the identification of an additional photoreceptor cell type that previously went undetected. The discovery of this new cluster suggests that the photoreceptive pineal gland is essentially a bi-chromatic tissue containing both green and red cone-like photoreceptors, and implies that the alignment and pre-processing pipeline can affect the discovery of biologically relevant cell types.
While Cell Ranger favors higher cell numbers, kallisto yields datasets with higher median gene detection per cell. We demonstrated that cell type identification was not hampered by the lower cell count but in fact improved as a result of the high gene detection rate and the more stringent filtering. Depending on the acquired dataset, it can be beneficial to favor high-quality cells and accept a lower cell count, leading to improved classification of cell types.
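
The trade-off described in the abstract, between keeping more cells and keeping cells with higher gene detection, can be illustrated with a minimal sketch. This is not the authors' pipeline and does not invoke kallisto or Cell Ranger; the function names, the `min_genes` threshold, and the toy count matrix are all hypothetical and chosen only to show how stricter cell filtering lowers the cell count while raising the median number of genes detected per cell.

```python
from statistics import median


def genes_per_cell(counts):
    """Return, for each cell (row), the number of genes with a nonzero count."""
    return [sum(1 for c in row if c > 0) for row in counts]


def filter_cells(counts, min_genes):
    """Keep only cells in which at least `min_genes` genes are detected."""
    return [row for row in counts if sum(1 for c in row if c > 0) >= min_genes]


# Toy cell-by-gene count matrix (hypothetical values, not data from the study).
all_cells = [
    [3, 0, 1, 2, 0],
    [0, 0, 1, 0, 0],
    [1, 0, 0, 0, 0],
    [2, 1, 0, 4, 1],
    [0, 0, 0, 1, 0],
]

# A stricter filter (assumed threshold) keeps fewer, better-covered cells.
strict_cells = filter_cells(all_cells, min_genes=3)

median_before = median(genes_per_cell(all_cells))
median_after = median(genes_per_cell(strict_cells))

print(len(all_cells), median_before)
print(len(strict_cells), median_after)
```

In a real analysis the threshold would be chosen per dataset, since an appropriate cutoff depends on the organism, tissue, and sequencing depth.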

Identifiers

pubmed: 34521337
doi: 10.1186/s12864-021-07930-6
pii: 10.1186/s12864-021-07930-6
pmc: PMC8439043

Chemical substances

RNA, Small Cytoplasmic 0

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

661

Copyright information

© 2021. The Author(s).

Authors

Inbal Shainer (I)

Max Planck Institute of Neurobiology, Am Klopferspitz 18, 82152, Martinsried, Germany.

Manuel Stemmer (M)

Max Planck Institute of Neurobiology, Am Klopferspitz 18, 82152, Martinsried, Germany. stemmer@neuro.mpg.de.
