Improving the power of gene set enrichment analyses.


Journal

BMC bioinformatics
ISSN: 1471-2105
Abbreviated title: BMC Bioinformatics
Country: England
NLM ID: 100965194

Publication information

Publication date:
17 May 2019
History:
received: 26 Oct 2018
accepted: 25 Apr 2019
entrez: 19 May 2019
pubmed: 19 May 2019
medline: 21 Jun 2019
Status: epublish

Abstract

BACKGROUND
Set enrichment methods are commonly used to analyze high-dimensional molecular data and gain biological insight into molecular or clinical phenotypes. One important category of analysis methods employs an enrichment score, which is created from ranked univariate correlations between phenotype and each molecular attribute. Estimates of the significance of the associations are determined via a null distribution generated from phenotype permutation. We investigate some statistical properties of this method and demonstrate how alternative assessments of enrichment can be used to increase the statistical power of such analyses to detect associations between phenotype and biological processes and pathways.
RESULTS
For this category of set enrichment analysis, the null distribution is largely independent of the number of samples with available molecular data. Hence, provided the sample cohort is not too small, we show that increased statistical power to identify associations between biological processes and phenotype can be achieved by splitting the cohort into two halves and using the average of the enrichment scores evaluated for each half as an alternative test statistic. Further, we demonstrate that this principle can be extended by averaging over multiple random splits of the cohort into halves. This enables the calculation of an enrichment statistic and associated p value of arbitrary precision, independent of the exact random splits used.
CONCLUSIONS
It is possible to increase the statistical power of gene set enrichment analyses that employ enrichment scores created from running sums of univariate phenotype-attribute correlations and phenotype-permutation generated null distributions. This increase can be achieved by using alternative test statistics that average enrichment scores calculated for splits of the dataset. Apart from the special case of a close balance between up- and down-regulated genes within a gene set, statistical power can be improved, or at least maintained, by this method down to small sample sizes, where accurate assessment of univariate phenotype-gene correlations becomes infeasible.
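
The split-and-average idea described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes a GSEA-style weighted running-sum enrichment score built from Pearson correlations, fresh random half-splits for every phenotype permutation, and a two-sided p value based on the absolute statistic. The function names (`enrichment_score`, `split_average_score`, `permutation_p_value`) and the data layout (a dict mapping gene name to a list of per-sample values) are hypothetical choices made for this example.

```python
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def enrichment_score(data, phenotype, gene_set, samples):
    """Running-sum enrichment score on a subset of samples (assumed GSEA-style)."""
    pheno = [phenotype[i] for i in samples]
    corr = {g: pearson([v[i] for i in samples], pheno) for g, v in data.items()}
    ranked = sorted(corr, key=corr.get, reverse=True)
    in_set = set(gene_set) & set(ranked)
    hit_total = sum(abs(corr[g]) for g in in_set)
    n_miss = len(ranked) - len(in_set)
    if hit_total == 0 or n_miss == 0:
        return 0.0
    running, best = 0.0, 0.0
    for g in ranked:
        if g in in_set:
            running += abs(corr[g]) / hit_total  # step up, weighted by |r|
        else:
            running -= 1.0 / n_miss              # step down for a miss
        if abs(running) > abs(best):
            best = running                       # keep the largest deviation
    return best


def split_average_score(data, phenotype, gene_set, n_splits, rng):
    """Average the enrichment scores of both halves over random half-splits."""
    n = len(phenotype)
    scores = []
    for _ in range(n_splits):
        idx = list(range(n))
        rng.shuffle(idx)
        first, second = idx[: n // 2], idx[n // 2:]
        es_a = enrichment_score(data, phenotype, gene_set, first)
        es_b = enrichment_score(data, phenotype, gene_set, second)
        scores.append(0.5 * (es_a + es_b))
    return mean(scores)


def permutation_p_value(data, phenotype, gene_set, n_splits=10, n_perm=200, seed=0):
    """Observed statistic and a two-sided phenotype-permutation p value."""
    rng = random.Random(seed)
    observed = split_average_score(data, phenotype, gene_set, n_splits, rng)
    exceed = 0
    for _ in range(n_perm):
        shuffled = list(phenotype)
        rng.shuffle(shuffled)  # break the phenotype-attribute association
        null_stat = split_average_score(data, shuffled, gene_set, n_splits, rng)
        if abs(null_stat) >= abs(observed):
            exceed += 1
    # add-one correction so the estimated p value is never exactly zero
    return observed, (1 + exceed) / (1 + n_perm)
```

Calling `permutation_p_value(data, phenotype, gene_set)` returns the averaged enrichment statistic and its permutation p value. Increasing `n_splits` reduces the dependence of the statistic on the particular random splits drawn, at a proportional cost in computation.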


Identifiers

pubmed: 31101008
doi: 10.1186/s12859-019-2850-1
pii: 10.1186/s12859-019-2850-1
pmc: PMC6525372

Chemical substances

RNA, Messenger 0

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

257


Authors

Joanna Roder (J)

Biodesix Inc, 2970 Wilderness Pl, Ste100, Boulder, CO, 80301, USA. joanna.roder@biodesix.com.

Benjamin Linstid (B)

Biodesix Inc, 2970 Wilderness Pl, Ste100, Boulder, CO, 80301, USA.

Carlos Oliveira (C)

Biodesix Inc, 2970 Wilderness Pl, Ste100, Boulder, CO, 80301, USA.
