GOAT: efficient and robust identification of gene set enrichment.


Journal

Communications biology
ISSN: 2399-3642
Abbreviated title: Commun Biol
Country: England
NLM ID: 101719179

Publication information

Publication date:
19 Jun 2024
History:
received: 07 Feb 2024
accepted: 14 Jun 2024
medline: 20 Jun 2024
pubmed: 20 Jun 2024
entrez: 19 Jun 2024
Status: epublish

Abstract

Gene set enrichment analysis is foundational to the interpretation of high-throughput biology. Identifying enriched Gene Ontology (GO) terms or disease-associated gene sets within a list of gene effect sizes that represent experimental outcomes is an everyday task in life science that crucially depends on robust and sensitive statistical tools. We here present GOAT, a parameter-free algorithm for gene set enrichment analysis of preranked gene lists. The algorithm can precompute null distributions from standardized gene scores, enabling enrichment testing of the GO database in one second. Validations using synthetic data show that estimated gene set p-values are well calibrated under the null hypothesis and invariant to gene list length and gene set size. Application to various real-world proteomics and gene expression studies demonstrates that GOAT identifies more significant GO terms than current methods. GOAT is freely available as an R package and as a user-friendly online tool for gene set enrichment analyses that includes interactive data visualizations: https://ftwkoopmans.github.io/goat .
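To make the idea of precomputed null distributions more concrete, here is a minimal Python sketch of a generic rank-based, Monte Carlo gene set test. It is not GOAT's implementation. The rank-standardized gene scores, the mean-of-members set statistic, the random-sampling null, and all function names (`standardized_scores`, `precompute_null`, `empirical_p`, `run_enrichment`) are assumptions chosen for illustration. The sketch does show why precomputation pays off: the null depends only on the score vector and the gene set size. It is therefore computed once per distinct size and reused for every gene set of that size.

```python
import random
import statistics
from bisect import bisect_left


def standardized_scores(effect_sizes):
    """Hypothetical scoring: rank genes by effect size, then z-score the ranks."""
    order = sorted(range(len(effect_sizes)), key=lambda i: effect_sizes[i])
    ranks = [0] * len(effect_sizes)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank  # ties are broken by input order in this sketch
    mu = statistics.fmean(ranks)
    sd = statistics.pstdev(ranks)
    return [(r - mu) / sd for r in ranks]


def precompute_null(scores, set_size, n_draws=10_000, seed=0):
    """Sorted Monte Carlo null of mean scores for random gene sets of one size."""
    rng = random.Random(seed)
    null = [statistics.fmean(rng.sample(scores, set_size)) for _ in range(n_draws)]
    null.sort()
    return null


def empirical_p(observed, null_sorted):
    """One-sided p-value: share of null means >= observed, with +1 correction."""
    n_ge = len(null_sorted) - bisect_left(null_sorted, observed)
    return (n_ge + 1) / (len(null_sorted) + 1)


def run_enrichment(scores, gene_sets, n_draws=10_000):
    """Test each gene set (name -> list of gene indices), reusing nulls by size."""
    nulls = {}
    results = {}
    for name, members in gene_sets.items():
        k = len(members)
        if k not in nulls:  # one null per distinct set size, shared by all sets of that size
            nulls[k] = precompute_null(scores, k, n_draws, seed=k)
        observed = statistics.fmean(scores[i] for i in members)
        results[name] = empirical_p(observed, nulls[k])
    return results


if __name__ == "__main__":
    rng = random.Random(1)
    effects = [rng.gauss(0, 1) for _ in range(2000)]
    scores = standardized_scores(effects)
    top = sorted(range(2000), key=lambda i: effects[i])[-50:]
    sets = {"top_genes": top, "random_genes": rng.sample(range(2000), 50)}
    print(run_enrichment(scores, sets, n_draws=2000))
```

Many GO terms share the same size, so the cache in `run_enrichment` runs the expensive sampling step once per distinct size rather than once per term. The +1 correction in `empirical_p` keeps every p-value strictly above zero.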

Identifiers

pubmed: 38898151
doi: 10.1038/s42003-024-06454-5
pii: 10.1038/s42003-024-06454-5

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

744

Copyright information

© 2024. The Author(s).

Authors

Frank Koopmans

Department of Molecular and Cellular Neurobiology, Center for Neurogenomics and Cognitive Research, Amsterdam Neuroscience, VU University, 1081 HV, Amsterdam, The Netherlands. frank.koopmans@vu.nl.