GOAT: efficient and robust identification of gene set enrichment.
Journal
Communications biology
ISSN: 2399-3642
Abbreviated title: Commun Biol
Country: England
NLM ID: 101719179
Publication information
Publication date:
19 Jun 2024
History:
received: 7 Feb 2024
accepted: 14 Jun 2024
medline: 20 Jun 2024
pubmed: 20 Jun 2024
entrez: 19 Jun 2024
Status:
epublish
Abstract
Gene set enrichment analysis is foundational to the interpretation of high throughput biology. Identifying enriched Gene Ontology (GO) terms or disease-associated gene sets within a list of gene effect sizes that represent experimental outcomes is an everyday task in life science that crucially depends on robust and sensitive statistical tools. We here present GOAT, a parameter-free algorithm for gene set enrichment analysis of preranked gene lists. The algorithm can precompute null distributions from standardized gene scores, enabling enrichment testing of the GO database in one second. Validations using synthetic data show that estimated gene set p-values are well calibrated under the null hypothesis and invariant to gene list length and gene set size. Application to various real-world proteomics and gene expression studies demonstrates that GOAT identifies more significant GO terms as compared to current methods. GOAT is freely available as an R package and user-friendly online tool for gene set enrichment analyses that includes interactive data visualizations: https://ftwkoopmans.github.io/goat .
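The idea of precomputing null distributions for a preranked gene list can be illustrated with a minimal sketch. This is not the GOAT implementation: the rank-based score transform, the permutation null, the empirical p-value, and all function names below are assumptions chosen for illustration. The sketch only shows why a null distribution can be reused: once gene scores are standardized by rank, the null for a given gene set size depends only on the gene list length and the set size, not on the raw effect sizes.

```python
import random

def rank_scores(effect_sizes):
    """Hypothetical standardization: score = rank of |effect size| / n.
    The resulting scores depend only on the list length n."""
    n = len(effect_sizes)
    order = sorted(range(n), key=lambda i: abs(effect_sizes[i]))
    scores = [0.0] * n
    for rank, i in enumerate(order, start=1):
        scores[i] = rank / n  # ties get distinct ranks in input order
    return scores

def null_means(n_genes, set_size, n_draws=5000, seed=0):
    """Null distribution of mean scores for random gene sets of one size.
    It needs only n_genes and set_size, so it can be computed once and reused."""
    rng = random.Random(seed)
    pool = [r / n_genes for r in range(1, n_genes + 1)]
    return [sum(rng.sample(pool, set_size)) / set_size for _ in range(n_draws)]

def enrichment_pvalue(scores, gene_set_idx, null):
    """One-sided empirical p-value for the observed gene set mean score."""
    observed = sum(scores[i] for i in gene_set_idx) / len(gene_set_idx)
    hits = sum(1 for m in null if m >= observed)
    return (hits + 1) / (len(null) + 1)

# Usage on synthetic data: 1000 genes, one gene set of size 20.
rng = random.Random(1)
effects = [rng.gauss(0, 1) for _ in range(1000)]
scores = rank_scores(effects)
null = null_means(n_genes=1000, set_size=20)

# A set made of the 20 strongest effects should score as highly enriched.
top_set = sorted(range(1000), key=lambda i: -abs(effects[i]))[:20]
p_top = enrichment_pvalue(scores, top_set, null)

# A randomly chosen set is tested against the same precomputed null.
random_set = rng.sample(range(1000), 20)
p_random = enrichment_pvalue(scores, random_set, null)
```

Because `null` is keyed only by list length and set size, the same array serves every gene set of size 20 in this list, which is the property that makes testing a large gene set database fast in this kind of design.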
Identifiers
pubmed: 38898151
doi: 10.1038/s42003-024-06454-5
pii: 10.1038/s42003-024-06454-5
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
744
Copyright information
© 2024. The Author(s).
References
Maciejewski, H. Gene set analysis methods: statistical models and methodological differences. Brief. Bioinform. 15, 504–518 (2014).
doi: 10.1093/bib/bbt002
pubmed: 23413432
Nam, D. & Kim, S. Y. Gene-set approach for expression pattern analysis. Brief. Bioinform. 9, 189–197 (2008).
doi: 10.1093/bib/bbn001
pubmed: 18202032
Hung, J. H., Yang, T. H., Hu, Z., Weng, Z. & DeLisi, C. Gene set enrichment analysis: performance evaluation and usage guidelines. Brief. Bioinform. 13, 281–291 (2012).
doi: 10.1093/bib/bbr049
pubmed: 21900207
Maleki, F., Ovens, K., Hogan, D. J. & Kusalik, A. J. Gene set analysis: challenges, opportunities, and future research. Front. Genet. 11, 654 (2020).
doi: 10.3389/fgene.2020.00654
pubmed: 32695141
pmcid: 7339292
Huang da, W., Sherman, B. T. & Lempicki, R. A. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat. Protoc. 4, 44–57 (2009).
doi: 10.1038/nprot.2008.211
pubmed: 19131956
Kuleshov, M. V. et al. Enrichr: a comprehensive gene set enrichment analysis web server 2016 update. Nucleic Acids Res. 44, W90–W97 (2016).
doi: 10.1093/nar/gkw377
pubmed: 27141961
pmcid: 4987924
Mi, H., Poudel, S., Muruganujan, A., Casagrande, J. T. & Thomas, P. D. PANTHER version 10: expanded protein families and functions, and analysis tools. Nucleic Acids Res. 44, D336–D342 (2016).
doi: 10.1093/nar/gkv1194
pubmed: 26578592
Kolberg, L. et al. g:Profiler-interoperable web service for functional enrichment analysis and gene identifier mapping (2023 update). Nucleic Acids Res. 51, W207–W212 (2023).
doi: 10.1093/nar/gkad347
pubmed: 37144459
pmcid: 10320099
Ashburner, M. et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet. 25, 25–29 (2000).
doi: 10.1038/75556
pubmed: 10802651
pmcid: 3037419
Kanehisa, M. & Goto, S. KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 28, 27–30 (2000).
doi: 10.1093/nar/28.1.27
pubmed: 10592173
pmcid: 102409
Tarca, A. L., Bhatti, G. & Romero, R. A comparison of gene set analysis methods in terms of sensitivity, prioritization and specificity. PLoS ONE 8, e79217 (2013).
doi: 10.1371/journal.pone.0079217
pubmed: 24260172
pmcid: 3829842
Wijesooriya, K., Jadaan, S. A., Perera, K. L., Kaur, T. & Ziemann, M. Urgent need for consistent standards in functional enrichment analysis. PLoS Comput. Biol. 18, e1009935 (2022).
doi: 10.1371/journal.pcbi.1009935
pubmed: 35263338
pmcid: 8936487
Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl Acad. Sci. USA 102, 15545–15550 (2005).
doi: 10.1073/pnas.0506580102
pubmed: 16199517
pmcid: 1239896
Korotkevich, G. et al. Fast gene set enrichment analysis. Preprint at bioRxiv https://doi.org/10.1101/060012 060012 (2021).
Fang, Z., Liu, X. & Peltz, G. GSEApy: a comprehensive package for performing gene set enrichment analysis in Python. Bioinformatics 39, btac757 (2023).
Lachmann, A., Xie, Z. & Ma’ayan, A. blitzGSEA: efficient computation of gene set enrichment analysis through gamma distribution approximation. Bioinformatics 38, 2356–2357 (2022).
doi: 10.1093/bioinformatics/btac076
pubmed: 35143610
pmcid: 9004650
Ma, Y. et al. Integrative differential expression and gene set enrichment analysis using summary statistics for scRNA-seq studies. Nat. Commun. 11, 1585 (2020).
doi: 10.1038/s41467-020-15298-6
pubmed: 32221292
pmcid: 7101316
Dong, X., Hao, Y., Wang, X. & Tian, W. LEGO: a novel method for gene set over-representation analysis by incorporating network-based gene weights. Sci. Rep. 6, 18871 (2016).
doi: 10.1038/srep18871
pubmed: 26750448
pmcid: 4707541
Foroutan, M. et al. Single sample scoring of molecular phenotypes. BMC Bioinform. 19, 404 (2018).
doi: 10.1186/s12859-018-2435-4
de Leeuw, C. A., Mooij, J. M., Heskes, T. & Posthuma, D. MAGMA: generalized gene-set analysis of GWAS data. PLoS Comput. Biol. 11, e1004219 (2015).
doi: 10.1371/journal.pcbi.1004219
pubmed: 25885710
pmcid: 4401657
Taleb, N. N. Statistical consequences of fat tails: real world preasymptotics, epistemology, and applications : papers and commentary. (STEM Academic Press, 2020).
Tamayo, P., Steinhardt, G., Liberzon, A. & Mesirov, J. P. The limitations of simple gene set enrichment analysis assuming gene independence. Stat. Methods Med. Res. 25, 472–487 (2016).
doi: 10.1177/0962280212460441
pubmed: 23070592
Colameo, D. et al. Pervasive compartment-specific regulation of gene expression during homeostatic synaptic scaling. EMBO Rep. 22, e52094 (2021).
doi: 10.15252/embr.202052094
pubmed: 34396684
pmcid: 8490987
Hong, G., Zhang, W., Li, H., Shen, X. & Guo, Z. Separate enrichment analysis of pathways for up- and downregulated genes. J. R. Soc. Interface 11, 20130950 (2014).
doi: 10.1098/rsif.2013.0950
pubmed: 24352673
pmcid: 3899863
Higginbotham, L. et al. Integrated proteomics reveals brain-based cerebrospinal fluid biomarkers in asymptomatic and symptomatic Alzheimer’s disease. Sci. Adv. 6, eaaz9360 (2020).
Hondius, D. C. et al. The proteome of granulovacuolar degeneration and neurofibrillary tangles in Alzheimer’s disease. Acta Neuropathol. 141, 341–358 (2021).
doi: 10.1007/s00401-020-02261-4
pubmed: 33492460
pmcid: 7882576
Sahadevan, S. et al. Synaptic FUS accumulation triggers early misregulation of synaptic RNAs in a mouse model of ALS. Nat. Commun. 12, 3027 (2021).
doi: 10.1038/s41467-021-23188-8
pubmed: 34021139
pmcid: 8140117
Wingo, A. P. et al. Shared proteomic effects of cerebral atherosclerosis and Alzheimer’s disease on the human brain. Nat. Neurosci. 23, 696–700 (2020).
doi: 10.1038/s41593-020-0635-5
pubmed: 32424284
pmcid: 7269838
Ewing, E., Planell-Picola, N., Jagodic, M. & Gomez-Cabrero, D. GeneSetCluster: a tool for summarizing and integrating gene-set analysis results. BMC Bioinform. 21, 443 (2020).
doi: 10.1186/s12859-020-03784-z
Gu, Z. & Hubschmann, D. simplifyEnrichment: a Bioconductor package for clustering and visualizing functional enrichment results. Genom. Proteom. Bioinform. 21, 190–202 (2023).
doi: 10.1016/j.gpb.2022.04.008
Merico, D., Isserlin, R., Stueker, O., Emili, A. & Bader, G. D. Enrichment map: a network-based method for gene-set enrichment visualization and interpretation. PLoS ONE 5, e13984 (2010).
doi: 10.1371/journal.pone.0013984
pubmed: 21085593
pmcid: 2981572
Huber, W. et al. Orchestrating high-throughput genomic analysis with Bioconductor. Nat. Methods 12, 115–121 (2015).
doi: 10.1038/nmeth.3252
pubmed: 25633503
pmcid: 4509590
Koopmans, F. et al. SynGO: an evidence-based, expert-curated knowledge base for the synapse. Neuron 103, 217–234 e214 (2019).
doi: 10.1016/j.neuron.2019.05.002
pubmed: 31171447
pmcid: 6764089
Koopmans, F. GOAT R package: version 1.0. Zenodo (2024).