A scalable SCENIC workflow for single-cell gene regulatory network analysis.


Journal

Nature Protocols
ISSN: 1750-2799
Abbreviated title: Nat Protoc
Country: England
NLM ID: 101284307

Publication information

Publication date:
07 2020
History:
received: 25 09 2019
accepted: 17 04 2020
pubmed: 21 6 2020
medline: 12 9 2020
entrez: 21 6 2020
Status: ppublish

Abstract

This protocol explains how to perform a fast SCENIC analysis alongside standard best-practice steps on single-cell RNA-sequencing data using software containers and Nextflow pipelines. SCENIC reconstructs regulons (i.e., transcription factors and their target genes), assesses the activity of these discovered regulons in individual cells and uses these cellular activity patterns to find meaningful clusters of cells. Here we present an improved version of SCENIC with several advances. SCENIC has been refactored and reimplemented in Python (pySCENIC), resulting in a tenfold increase in speed, and has been packaged into containers for ease of use. It is now also possible to use epigenomic track databases, as well as motifs, to refine regulons. In this protocol, we explain the different steps of SCENIC: the workflow starts from the count matrix depicting the gene abundances for all cells and consists of three stages. First, coexpression modules are inferred using a per-target regression approach (GRNBoost2). Next, the indirect targets are pruned from these modules using cis-regulatory motif discovery (cisTarget). Lastly, the activity of these regulons is quantified via an enrichment score for the regulon's target genes (AUCell). Nonlinear projection methods can be used to display visual groupings of cells based on the cellular activity patterns of these regulons. The results can be exported as a loom file and visualized in the SCope web application. This protocol is illustrated on two use cases: a peripheral blood mononuclear cell data set and a panel of single-cell RNA-sequencing cancer experiments. For a data set of 10,000 genes and 50,000 cells, the pipeline runs in <2 h.
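
To make the third stage more concrete, the following is a minimal sketch of a rank-based enrichment score in the spirit of AUCell. It is not the pySCENIC implementation: the function name `toy_regulon_activity`, the `top_fraction` parameter, the alphabetical tie-breaking and the normalization are all assumptions chosen for illustration. The idea shown is that genes in one cell are ranked by expression, and the score is the area under the curve that counts how many regulon target genes are recovered within the top-ranked genes.

```python
def toy_regulon_activity(expression, regulon, top_fraction=0.05):
    """Toy per-cell enrichment score for one regulon (illustrative only).

    expression   : dict mapping gene name -> expression value in one cell
    regulon      : set of target gene names (hypothetical regulon)
    top_fraction : fraction of the ranking over which the area is computed
                   (an assumed parameter, not taken from the protocol)
    """
    # Rank genes from highest to lowest expression; ties are broken by
    # gene name so the toy example is deterministic (an assumption).
    ranked = sorted(expression, key=lambda gene: (-expression[gene], gene))

    # Only the top of the ranking contributes to the score.
    k = max(1, int(len(ranked) * top_fraction))

    # Area under the recovery curve: after each rank, add the number of
    # regulon genes recovered so far.
    hits = 0
    area = 0
    for gene in ranked[:k]:
        if gene in regulon:
            hits += 1
        area += hits

    # Normalize by the best achievable area, reached when all regulon
    # genes present in this cell occupy the very top ranks.
    n_present = min(len(regulon & set(expression)), k)
    max_area = sum(min(rank, n_present) for rank in range(1, k + 1))
    return area / max_area if max_area else 0.0


# Three toy cells and a two-gene regulon (all values are made up).
regulon = {"A", "B"}
cells = {
    "cell_high": {"A": 10, "B": 8, "C": 1, "D": 0},
    "cell_low": {"A": 0, "B": 1, "C": 9, "D": 7},
    "cell_mixed": {"A": 5, "B": 0, "C": 9, "D": 1},
}
scores = {
    name: toy_regulon_activity(expr, regulon, top_fraction=0.5)
    for name, expr in cells.items()
}
print(scores)
```

A cell whose most highly expressed genes are the regulon targets scores close to 1, while a cell in which the targets sit at the bottom of the ranking scores 0. Because the score depends only on within-cell ranks, it does not depend on the normalization applied to the count matrix, which is one reason a ranking-based score suits sparse single-cell data.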

Identifiers

pubmed: 32561888
doi: 10.1038/s41596-020-0336-2
pii: 10.1038/s41596-020-0336-2

Publication types

Journal Article; Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Pagination

2247-2276


Authors

Bram Van de Sande (B)

VIB Center for Brain & Disease Research, KU Leuven, Leuven, Belgium.
Department of Human Genetics, KU Leuven, Leuven, Belgium.

Christopher Flerin (C)

VIB Center for Brain & Disease Research, KU Leuven, Leuven, Belgium.
Department of Human Genetics, KU Leuven, Leuven, Belgium.

Kristofer Davie (K)

VIB Center for Brain & Disease Research, KU Leuven, Leuven, Belgium.

Maxime De Waegeneer (M)

VIB Center for Brain & Disease Research, KU Leuven, Leuven, Belgium.
Department of Human Genetics, KU Leuven, Leuven, Belgium.

Gert Hulselmans (G)

VIB Center for Brain & Disease Research, KU Leuven, Leuven, Belgium.
Department of Human Genetics, KU Leuven, Leuven, Belgium.

Sara Aibar (S)

VIB Center for Brain & Disease Research, KU Leuven, Leuven, Belgium.
Department of Human Genetics, KU Leuven, Leuven, Belgium.

Ruth Seurinck (R)

Data Mining and Modelling for Biomedicine, VIB Center for Inflammation Research, Ghent, Belgium.
Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, Belgium.

Wouter Saelens (W)

Data Mining and Modelling for Biomedicine, VIB Center for Inflammation Research, Ghent, Belgium.
Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, Belgium.

Robrecht Cannoodt (R)

Data Mining and Modelling for Biomedicine, VIB Center for Inflammation Research, Ghent, Belgium.
Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, Belgium.
Center for Medical Genetics, Ghent University Hospital, Ghent, Belgium.

Quentin Rouchon (Q)

Data Mining and Modelling for Biomedicine, VIB Center for Inflammation Research, Ghent, Belgium.
Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, Belgium.

Toni Verbeiren (T)

Janssen Pharmaceutica, Beerse, Belgium.
Data Intuitive, Ghent, Belgium.

Dries De Maeyer (D)

Janssen Pharmaceutica, Beerse, Belgium.

Joke Reumers (J)

Janssen Pharmaceutica, Beerse, Belgium.

Yvan Saeys (Y)

Data Mining and Modelling for Biomedicine, VIB Center for Inflammation Research, Ghent, Belgium.
Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, Belgium.

Stein Aerts (S)

VIB Center for Brain & Disease Research, KU Leuven, Leuven, Belgium. stein.aerts@kuleuven.vib.be.
Department of Human Genetics, KU Leuven, Leuven, Belgium. stein.aerts@kuleuven.vib.be.
