Guidance for DNA methylation studies: statistical insights from the Illumina EPIC array.
DNA methylation
Epigenome-wide association study (EWAS)
Illumina EPIC array
Multiple testing
Power
Journal
BMC genomics
ISSN: 1471-2164
Abbreviated title: BMC Genomics
Country: England
NLM ID: 100965258
Publication information
Publication date:
14 May 2019
History:
received: 1 October 2018
accepted: 2 May 2019
entrez: 16 May 2019
pubmed: 16 May 2019
medline: 5 September 2019
Status:
epublish
Abstract
BACKGROUND
There has been a steady increase in the number of studies aiming to identify DNA methylation differences associated with complex phenotypes. Many of the challenges of epigenetic epidemiology regarding study design and interpretation have been discussed in detail; however, some analytical concerns remain outstanding and require further exploration. In this study we seek to address three analytical issues. First, we quantify the multiple testing burden and propose a standard statistical significance threshold for identifying DNA methylation sites that are associated with an outcome. Second, we establish whether linear regression, the statistical tool chosen in the majority of studies, is appropriate and whether it is biased by the underlying distribution of DNA methylation data. Finally, we assess the sample size required for adequately powered DNA methylation association studies.
RESULTS
We quantified DNA methylation in the Understanding Society cohort (n = 1175), a large population-based study, using the Illumina EPIC array to assess the statistical properties of DNA methylation association analyses. By simulating null DNA methylation studies, we generated the distribution of p-values expected by chance and calculated the 5% family-wise error for EPIC array studies to be 9 × 10
CONCLUSIONS
We propose that a significance threshold of P < 9 × 10
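The null-simulation idea described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the methylation matrix is toy data drawn from a beta distribution, the association test is a Pearson correlation with a Fisher z approximation (a standard-library stand-in for the linear regression used in the study), and the names `pearson_r`, `null_min_p` and `fwer_threshold` are hypothetical. For each simulated null study a random phenotype is tested against every site and the smallest p-value is recorded; the lower 5% point of those minima is taken as the 5% family-wise threshold.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def null_min_p(meth, rng):
    """Run one null study and return the smallest p-value across sites.

    The phenotype is random noise, so any association arises by chance.
    """
    n = len(meth[0])
    phenotype = [rng.gauss(0.0, 1.0) for _ in range(n)]
    min_p = 1.0
    for site in meth:
        r = pearson_r(site, phenotype)
        # Fisher z approximation (assumption: stand-in for a regression test).
        z = math.atanh(r) * math.sqrt(n - 3)
        p = 2.0 * (1.0 - STD_NORMAL.cdf(abs(z)))
        min_p = min(min_p, p)
    return min_p


def fwer_threshold(meth, n_sims=200, alpha=0.05, seed=1):
    """Empirical threshold: lower alpha point of the per-study minimum p-values."""
    rng = random.Random(seed)
    min_ps = sorted(null_min_p(meth, rng) for _ in range(n_sims))
    k = max(1, int(alpha * n_sims))
    return min_ps[k - 1]


# Toy data (assumption): 50 independent sites x 60 samples of beta values.
data_rng = random.Random(0)
meth = [[data_rng.betavariate(2.0, 5.0) for _ in range(60)] for _ in range(50)]

threshold = fwer_threshold(meth)
print(f"Empirical 5% family-wise threshold: {threshold:.2e}")
print(f"Bonferroni for comparison: {0.05 / len(meth):.2e}")
```

Because the toy sites are independent, the empirical threshold should land near the Bonferroni value; on real array data, correlation between neighbouring sites means the two can differ, which is the motivation for estimating the threshold empirically.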
Identifiers
pubmed: 31088362
doi: 10.1186/s12864-019-5761-7
pii: 10.1186/s12864-019-5761-7
pmc: PMC6518823
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
366
Grants
Agency: Medical Research Council
ID: G1001799
Country: United Kingdom
Agency: Medical Research Council
ID: MR/N01104X/2
Country: United Kingdom
Agency: Medical Research Council
ID: K013807
Country: United Kingdom
Agency: Economic and Social Research Council
ID: ES/N00812X/1
Agency: Medical Research Council
ID: MR/N01104X/1
Country: United Kingdom
Agency: Economic and Social Research Council
ID: ES/K005146/1