Improving the power of gene set enrichment analyses.
Enrichment analysis
Gene set enrichment analysis
Statistical power
Journal
BMC bioinformatics
ISSN: 1471-2105
Abbreviated title: BMC Bioinformatics
Country: England
NLM ID: 100965194
Publication information
Publication date: 17 May 2019
History:
received: 26 October 2018
accepted: 25 April 2019
entrez: 19 May 2019
pubmed: 19 May 2019
medline: 21 June 2019
Status: epublish
Abstract
BACKGROUND
Set enrichment methods are commonly used to analyze high-dimensional molecular data and gain biological insight into molecular or clinical phenotypes. One important category of analysis methods employs an enrichment score, which is created from ranked univariate correlations between phenotype and each molecular attribute. Estimates of the significance of the associations are determined via a null distribution generated from phenotype permutation. We investigate some statistical properties of this method and demonstrate how alternative assessments of enrichment can be used to increase the statistical power of such analyses to detect associations between phenotype and biological processes and pathways.
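The enrichment score and permutation null described above can be illustrated with a minimal sketch. It assumes Pearson correlation as the univariate phenotype-gene statistic and a GSEA-style running sum in which set genes ("hits") step up by |r| normalized over the set and other genes ("misses") step down by 1/(N - N_hit). The p value here is a simplified two-sided count on |ES|, not the sign-stratified p value GSEA itself reports. The function names `pearson`, `enrichment_score`, `gene_correlations` and `permutation_test` are hypothetical illustrations, not the authors' code.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def enrichment_score(corrs, gene_set, weight=1.0):
    """Signed maximum deviation of a GSEA-style weighted running sum.

    corrs: one phenotype correlation per gene; gene_set: set of gene indices.
    Assumption: hits are weighted by |r|**weight, misses by 1/(N - N_hit).
    """
    n_miss = len(corrs) - len(gene_set)
    if n_miss <= 0:
        raise ValueError("gene set must not contain every gene")
    order = sorted(range(len(corrs)), key=lambda j: corrs[j], reverse=True)
    hit_w = {j: abs(corrs[j]) ** weight for j in gene_set}
    total = sum(hit_w.values())
    if total == 0:  # all set genes uncorrelated: fall back to equal weights
        hit_w = {j: 1.0 for j in gene_set}
        total = float(len(gene_set))
    running, best = 0.0, 0.0
    for j in order:  # walk down the list ranked by correlation
        running += hit_w[j] / total if j in gene_set else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


def gene_correlations(expr, phenotype):
    """expr: one list of per-sample values per gene."""
    return [pearson(gene, phenotype) for gene in expr]


def permutation_test(expr, phenotype, gene_set, n_perm=1000, seed=0):
    """Observed ES and a simplified two-sided permutation p value on |ES|."""
    rng = random.Random(seed)
    observed = enrichment_score(gene_correlations(expr, phenotype), gene_set)
    perm = list(phenotype)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(perm)  # permuting the phenotype breaks any true association
        null_es = enrichment_score(gene_correlations(expr, perm), gene_set)
        if abs(null_es) >= abs(observed):
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)
```

For instance, `permutation_test(expr, phenotype, {0, 1, 2})` returns the observed score for a three-gene set together with its permutation p value; `expr` and `phenotype` are user-supplied data.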
RESULTS
For this category of set enrichment analysis, the null distribution is largely independent of the number of samples with available molecular data. Hence, provided the sample cohort is not too small, we show that increased statistical power to identify associations between biological processes and phenotype can be achieved by splitting the cohort into two halves and using the average of the enrichment scores evaluated for each half as an alternative test statistic. Further, we demonstrate that this principle can be extended by averaging over multiple random splits of the cohort into halves. This enables the calculation of an enrichment statistic and associated p value of arbitrary precision, independent of the exact random splits used.
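The split-half statistic can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The enrichment score is again assumed to be a GSEA-style running sum with |r| hit weights. For an odd-sized cohort the two halves differ by one sample. The null distribution is assumed to come from passing every permuted phenotype through the same split-and-average procedure with freshly drawn splits, and the p value is a simplified two-sided count on the absolute statistic. `split_average_es` and `split_permutation_test` are hypothetical names.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / math.sqrt(sxx * syy)


def enrichment_score(corrs, gene_set):
    """GSEA-style running sum; assumed hit weights |r|, miss steps 1/N_miss."""
    n_miss = len(corrs) - len(gene_set)
    order = sorted(range(len(corrs)), key=lambda j: corrs[j], reverse=True)
    w = {j: abs(corrs[j]) for j in gene_set}
    total = sum(w.values())
    if total == 0:  # degenerate case: fall back to equal weights
        w, total = {j: 1.0 for j in gene_set}, float(len(gene_set))
    running, best = 0.0, 0.0
    for j in order:
        running += w[j] / total if j in gene_set else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


def split_average_es(expr, phenotype, gene_set, n_splits, rng):
    """Mean ES over both halves of n_splits random half-splits of the samples."""
    n = len(phenotype)
    idx = list(range(n))
    scores = []
    for _ in range(n_splits):
        rng.shuffle(idx)  # draw a new random split into two halves
        for half in (idx[: n // 2], idx[n // 2:]):
            y = [phenotype[i] for i in half]
            corrs = [pearson([gene[i] for i in half], y) for gene in expr]
            scores.append(enrichment_score(corrs, gene_set))
    return sum(scores) / len(scores)


def split_permutation_test(expr, phenotype, gene_set, n_splits=10,
                           n_perm=200, seed=0):
    """Split-averaged statistic with a phenotype-permutation null.

    Assumption: each permuted phenotype goes through the same
    split-and-average procedure, with freshly drawn splits.
    """
    rng = random.Random(seed)
    observed = split_average_es(expr, phenotype, gene_set, n_splits, rng)
    perm = list(phenotype)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(perm)
        null = split_average_es(expr, perm, gene_set, n_splits, rng)
        if abs(null) >= abs(observed):
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)
```

Raising `n_splits` averages over more random halvings, so the statistic depends less on which particular splits were drawn. The cost grows roughly in proportion to `n_splits * n_perm`.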
CONCLUSIONS
It is possible to increase the statistical power of gene set enrichment analyses that employ enrichment scores created from running sums of univariate phenotype-attribute correlations and phenotype-permutation-generated null distributions. This increase can be achieved by using alternative test statistics that average enrichment scores calculated for splits of the dataset. Apart from the special case of a close balance between up- and down-regulated genes within a gene set, statistical power can be improved, or at least maintained, by this method down to small sample sizes, where accurate assessment of univariate phenotype-gene correlations becomes unfeasible.
Identifiers
pubmed: 31101008
doi: 10.1186/s12859-019-2850-1
pii: 10.1186/s12859-019-2850-1
pmc: PMC6525372
Chemical substances
RNA, Messenger (registry number: 0)
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
257