Representation and quantification of module activity from omics data with rROMA.


Journal

NPJ systems biology and applications
ISSN: 2056-7189
Abbreviated title: NPJ Syst Biol Appl
Country: England
NLM ID: 101677786

Publication information

Publication date:
19 Jan 2024
History:
received: 31 May 2023
accepted: 03 Jan 2024
medline: 20 Jan 2024
pubmed: 20 Jan 2024
entrez: 19 Jan 2024
Status: epublish

Abstract

Numerous studies have demonstrated the value of analyzing high-throughput data in systems biology, where molecular data such as transcriptomics and proteomics offer great opportunities for understanding the complexity of biological processes. One important aspect of data analysis in systems biology is the shift from a reductionist approach that focuses on individual components to a more integrative perspective that considers the system as a whole; accordingly, the emphasis has shifted from differential expression of individual genes to determining the activity of gene sets. Here, we present the rROMA software package for fast and accurate computation of the activity of gene sets with coordinated expression. The rROMA package incorporates significant improvements in the calculation algorithm, along with several functions for statistical analysis and visualization of results. These additions greatly expand the package's capabilities and offer valuable tools for data analysis and interpretation. rROMA is an open-source package available on GitHub at www.github.com/sysbio-curie/rROMA. Using publicly available transcriptomic datasets, we applied rROMA to cystic fibrosis, highlighting biological mechanisms potentially involved in the establishment and progression of the disease, along with the associated genes. Results indicate that rROMA can detect disease-related active signaling pathways using transcriptomic and proteomic data. Notably, the results identified a significant mechanism relevant to cystic fibrosis, raised awareness of a possible bias related to cell culture, and uncovered an intriguing gene that warrants further investigation.

Identifiers

pubmed: 38242871
doi: 10.1038/s41540-024-00331-x
pii: 10.1038/s41540-024-00331-x

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

8

Grants

Agency: Association Vaincre la Mucoviscidose (Vaincre la Mucoviscidose)
ID: 20190502488

Copyright information

© 2024. The Author(s).

Authors

Matthieu Najm (M)

INSERM U900, 75428, Paris, France.
Center for Computational Biology, Mines ParisTech, PSL Research University, 75006, Paris, France.
Institut Curie, PSL Research University, 75248, Paris, France.

Matthieu Cornet (M)

INSERM U900, 75428, Paris, France.
Center for Computational Biology, Mines ParisTech, PSL Research University, 75006, Paris, France.
Institut Curie, PSL Research University, 75248, Paris, France.

Luca Albergante (L)

INSERM U900, 75428, Paris, France.
Center for Computational Biology, Mines ParisTech, PSL Research University, 75006, Paris, France.
Institut Curie, PSL Research University, 75248, Paris, France.

Andrei Zinovyev (A)

INSERM U900, 75428, Paris, France.
Center for Computational Biology, Mines ParisTech, PSL Research University, 75006, Paris, France.
Institut Curie, PSL Research University, 75248, Paris, France.

Isabelle Sermet-Gaudelus (I)

Faculté de Médecine, Université de Paris, Paris, France.
Institut Necker Enfants Malades, INSERM U1151, Paris, France.
AP-HP. Centre - Université Paris Cité; Hôpital Necker Enfants Malades, Centre de Référence Maladie Rare - Mucoviscidose, Paris, France.

Véronique Stoven (V)

INSERM U900, 75428, Paris, France.
Center for Computational Biology, Mines ParisTech, PSL Research University, 75006, Paris, France.
Institut Curie, PSL Research University, 75248, Paris, France.

Laurence Calzone (L)

INSERM U900, 75428, Paris, France.
Center for Computational Biology, Mines ParisTech, PSL Research University, 75006, Paris, France.
Institut Curie, PSL Research University, 75248, Paris, France.

Loredana Martignetti (L)

INSERM U900, 75428, Paris, France. loredana.martignetti@curie.fr
Center for Computational Biology, Mines ParisTech, PSL Research University, 75006, Paris, France.
Institut Curie, PSL Research University, 75248, Paris, France.

MeSH classifications