Detecting significant expression patterns in single-cell and spatial transcriptomics with a flexible computational approach.


Journal

Scientific Reports
ISSN: 2045-2322
Abbreviated title: Sci Rep
Country: England
NLM ID: 101563288

Publication information

Publication date:
2024-10-30
History:
received: 2024-07-05
accepted: 2024-10-04
medline: 2024-10-31
pubmed: 2024-10-31
entrez: 2024-10-31
Status: epublish

Abstract

Gene expression data can shed light on multiple biological processes at once. However, analysis methods for single-cell sequencing mostly focus on finding cell clusters or the principal progression line of the data, and methods for spatial transcriptomics mostly address clustering and finding spatially variable genes. These methods are effective at finding the main features of the data, but they may miss less pronounced, albeit significant, processes, possibly involving only a subset of the samples. In this work we present SPIRAL: Significant Process InfeRence ALgorithm. SPIRAL uses Gaussian statistics to detect all statistically significant biological processes in single-cell, bulk and spatial transcriptomics data. The algorithm outputs a list of structures, each defined by a set of genes working simultaneously in a specific population of cells. SPIRAL is unique in its flexibility: structures are constructed by selecting subsets of genes and cells based on statistically significant and consistent differential expression. Every gene and every cell may belong to one structure, to several, or to none. SPIRAL also provides several visual representations of structures and pathway enrichment information. We validated the statistical soundness of SPIRAL on synthetic datasets and applied it to single-cell, spatial and bulk RNA-sequencing datasets. SPIRAL is available at https://spiral.technion.ac.il/.
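
The abstract does not spell out SPIRAL's procedure, so the following is only a toy sketch of the general idea it describes: a "structure" as a set of genes that are significantly and consistently differentially expressed between two sets of cells, judged with a Gaussian approximation. Everything here is an assumption made for illustration and not the authors' implementation: the function names (`standardize`, `significant_genes`, `make_toy_data`), the per-gene z-scoring, the one-sided test, the Bonferroni correction and the synthetic data.

```python
import random
from statistics import NormalDist, mean, pstdev


def standardize(row):
    """Z-score one gene's expression values across all cells."""
    mu, sigma = mean(row), pstdev(row)
    if sigma == 0:
        return [0.0 for _ in row]
    return [(x - mu) / sigma for x in row]


def significant_genes(expr, cells_a, cells_b, alpha=0.05):
    """Return indices of genes significantly higher in cells_a than in cells_b.

    expr is a list of genes, each a list of expression values per cell.
    The difference of group means of the standardized values is treated as
    approximately Gaussian, and a Bonferroni correction over the number of
    genes is applied (both are illustrative assumptions).
    """
    scale = (1 / len(cells_a) + 1 / len(cells_b)) ** 0.5
    threshold = NormalDist().inv_cdf(1 - alpha / len(expr))
    selected = []
    for g, row in enumerate(expr):
        z = standardize(row)
        mean_a = mean([z[c] for c in cells_a])
        mean_b = mean([z[c] for c in cells_b])
        if (mean_a - mean_b) / scale > threshold:
            selected.append(g)
    return selected


def make_toy_data(n_genes=50, n_cells=60, seed=0):
    """Gaussian noise with one planted structure: genes 0-4 raised in cells 0-19."""
    rng = random.Random(seed)
    expr = [[rng.gauss(0, 1) for _ in range(n_cells)] for _ in range(n_genes)]
    for g in range(5):
        for c in range(20):
            expr[g][c] += 3.0
    return expr


expr = make_toy_data()
cells_a = list(range(20))       # candidate cell population of the structure
cells_b = list(range(20, 60))   # remaining cells used as the comparison group
structure_genes = significant_genes(expr, cells_a, cells_b)
print("genes in the recovered structure:", structure_genes)
```

In this sketch the cell sets are given in advance; searching over candidate cell subsets, and allowing genes and cells to take part in several structures, is the harder part of the problem and is not attempted here.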

Identifiers

pubmed: 39478009
doi: 10.1038/s41598-024-75314-3
pii: 10.1038/s41598-024-75314-3

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

26121

Grants

Agency: The RESCUER project has received funding from the European Union's Horizon 2020 Research and Innovation Programme under Grant agreement No. 847912.
ID: 847912

Copyright information

© 2024. The Author(s).

Authors

Hadas Biran (H)

Computer Science Department, Technion - Israel Institute of Technology, Haifa, Israel. hadas.moriah@gmail.com.

Tamar Hashimshony (T)

Faculty of Biology, Technion - Israel Institute of Technology, Haifa, Israel.

Tamar Lahav (T)

Faculty of Biology, Technion - Israel Institute of Technology, Haifa, Israel.

Or Efrat (O)

Computer Science Department, Technion - Israel Institute of Technology, Haifa, Israel.

Yael Mandel-Gutfreund (Y)

Computer Science Department, Technion - Israel Institute of Technology, Haifa, Israel.
Faculty of Biology, Technion - Israel Institute of Technology, Haifa, Israel.

Zohar Yakhini (Z)

Computer Science Department, Technion - Israel Institute of Technology, Haifa, Israel.
Arazi School of Computer Science, Reichman University, Herzliya, Israel.