Estimands in epigenome-wide association studies.

Keywords: DNA methylation; Epigenome-wide association study (EWAS); Estimands; Multiple testing; Reproducible research

Journal

Clinical epigenetics
ISSN: 1868-7083
Abbreviated title: Clin Epigenetics
Country: Germany
NLM ID: 101516977

Publication information

Publication date:
2021-04-29
History:
received: 2020-06-12
accepted: 2021-04-19
entrez: 2021-04-30
pubmed: 2021-05-01
medline: 2022-01-27
Status: epublish

Abstract

In DNA methylation analyses such as epigenome-wide association studies (EWAS), effects at differentially methylated CpG sites are assessed. Two kinds of outcomes can be used for the statistical analysis: Beta-values and M-values. M-values follow a normal distribution and help to detect differentially methylated CpG sites, but as biological effect measures, differences in M-values are largely meaningless. Differences in Beta-values are of more interest because they can be interpreted directly as differences in the percentage of DNA methylation at a given CpG site, but Beta-values have poor statistical properties. Different frameworks have been proposed for reporting estimands in DNA methylation analysis, relying on Beta-values, M-values, or both.

We present and discuss four possible approaches to obtaining estimands in DNA methylation analysis. In addition, we describe the use of M-values and Beta-values in the context of bioinformatics pipelines, which often demand a predefined outcome. We show how differences in M-values relate to differences in Beta-values in two data simulations: one analysis without and one with a confounder effect. In the absence of confounder effects, M-values can be used for the statistical analysis and Beta-value statistics for the reporting. If confounder effects exist, we demonstrate the resulting deviations and correct for them with the intercept method. Finally, we illustrate the theoretical problem on two large human genome-wide DNA methylation datasets to verify the results.

Using M-values in the analysis of DNA methylation data produces effect estimates that cannot be interpreted biologically. The parallel use of Beta-value statistics ignores possible confounder effects and therefore cannot be recommended. Hence, if differences in Beta-values are the focus of the study, the intercept method is recommended; hyper- or hypomethylated CpG sites must then be evaluated carefully. If an exploratory analysis of possible CpG sites is the aim of the study, M-values can be used for inference.
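
The abstract does not spell out how the two outcome scales are linked. As a minimal sketch, assuming the commonly used definition of the M-value as the base-2 logit of the Beta-value, M = log2(Beta / (1 - Beta)), the following Python snippet illustrates why a given difference in M-values has no single interpretation on the Beta scale: the same shift in M corresponds to a different change in methylation percentage depending on the baseline level. The function names (`beta_to_m`, `m_to_beta`, `beta_difference`) are hypothetical and chosen for illustration only; this is not the authors' implementation of the intercept method.

```python
import math


def beta_to_m(beta: float) -> float:
    """Convert a Beta-value in (0, 1) to an M-value (assumed base-2 logit)."""
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie strictly between 0 and 1")
    return math.log2(beta / (1.0 - beta))


def m_to_beta(m: float) -> float:
    """Back-transform an M-value to a Beta-value (inverse of beta_to_m)."""
    return 2.0 ** m / (1.0 + 2.0 ** m)


def beta_difference(baseline_m: float, delta_m: float) -> float:
    """Beta-scale difference implied by a shift of delta_m from baseline_m."""
    return m_to_beta(baseline_m + delta_m) - m_to_beta(baseline_m)


if __name__ == "__main__":
    delta_m = 1.0  # illustrative effect size on the M-value scale
    for baseline_beta in (0.05, 0.50, 0.95):
        diff = beta_difference(beta_to_m(baseline_beta), delta_m)
        print(f"baseline Beta = {baseline_beta:.2f}: "
              f"delta M = {delta_m:.1f} -> delta Beta = {diff:.4f}")
```

Under this assumed transformation, the Beta-scale difference is largest for a baseline near 0.5 and shrinks toward the extremes, so a back-transformation needs the baseline level (for example, a model intercept on the M scale) in addition to the estimated difference.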

Identifiers

pubmed: 33926513
doi: 10.1186/s13148-021-01083-9
pii: 10.1186/s13148-021-01083-9
pmc: PMC8086103

Publication types

Journal Article; Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Pagination

98

Authors

Jochen Kruppa (J)

Charité - University Medicine, Corporate Member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, Institute of Biometry and Clinical Epidemiology, Charitéplatz 1, 10117, Berlin, Germany. jochen.kruppa@charite.de.
Berlin Institute of Health (BIH), Anna-Louisa-Karsch-Straße 2, 10178, Berlin, Germany. jochen.kruppa@charite.de.

Miriam Sieg (M)

Charité - University Medicine, Corporate Member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, Institute of Biometry and Clinical Epidemiology, Charitéplatz 1, 10117, Berlin, Germany.
Berlin Institute of Health (BIH), Anna-Louisa-Karsch-Straße 2, 10178, Berlin, Germany.

Gesa Richter (G)

Berlin Institute of Health (BIH), Anna-Louisa-Karsch-Straße 2, 10178, Berlin, Germany.
Department of Periodontology and Synoptic Dentistry, Institute of Dental, Oral and Maxillary Medicine, Charité - University Medicine, Charitéplatz 1, 10117, Berlin, Germany.

Anne Pohrt (A)

Charité - University Medicine, Corporate Member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, Institute of Biometry and Clinical Epidemiology, Charitéplatz 1, 10117, Berlin, Germany.
Berlin Institute of Health (BIH), Anna-Louisa-Karsch-Straße 2, 10178, Berlin, Germany.


MeSH classifications