Characterizing the properties of bisulfite sequencing data: maximizing power and sensitivity to identify between-group differences in DNA methylation.


Journal

BMC genomics
ISSN: 1471-2164
Abbreviated title: BMC Genomics
Country: England
NLM ID: 100965258

Publication information

Publication date:
15 Jun 2021
History:
received: 20 01 2021
accepted: 13 05 2021
entrez: 15 6 2021
pubmed: 16 6 2021
medline: 17 6 2021
Status: epublish

Abstract

The combination of sodium bisulfite treatment with highly parallel sequencing is a common method for quantifying DNA methylation across the genome. The power to detect between-group differences in DNA methylation using bisulfite sequencing approaches is influenced by both experimental (e.g. read depth, missing data and sample size) and biological (e.g. mean level of DNA methylation and difference between groups) parameters. There is, however, no consensus about the optimal thresholds for filtering bisulfite sequencing data, which has implications for the reproducibility of findings in epigenetic epidemiology. We used a large reduced representation bisulfite sequencing (RRBS) dataset to assess the distribution of read depth across DNA methylation sites and the extent of missing data. To investigate how various study variables influence power to identify DNA methylation differences between groups, we developed a framework for simulating bisulfite sequencing data. As expected, sequencing read depth, group size, and the magnitude of the DNA methylation difference between groups all affected statistical power. The influence on power did not depend on any single parameter but reflected the combination of study-specific variables. As a resource to the community, we have developed a tool, POWEREDBiSeq, which uses our simulation framework to predict study-specific power for the identification of DNA methylation differences between groups, taking into account user-defined read depth filtering parameters and the minimum sample size per group. Our data-driven approach highlights the importance of filtering bisulfite sequencing data by minimum read depth and illustrates how the choice of threshold is influenced by the specific study design and the expected differences between the groups being compared. The POWEREDBiSeq tool, which can be applied to different types of bisulfite sequencing data (e.g. RRBS, whole genome bisulfite sequencing (WGBS), targeted bisulfite sequencing and amplicon-based bisulfite sequencing), can help users identify the level of data filtering needed to optimize power and aims to improve the reproducibility of bisulfite sequencing studies.
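The kind of simulation-based power estimate described in the abstract can be illustrated with a minimal sketch. This is not the POWEREDBiSeq implementation, which the record does not describe in detail. It assumes, purely for illustration, that methylated read counts are binomial at a fixed read depth, that per-sample methylation levels vary normally around a group mean, and that groups are compared with a Welch-type statistic under a normal approximation (which overstates power for small groups). All function and parameter names are hypothetical.

```python
import random
from statistics import NormalDist, mean, variance


def simulate_site(n_per_group, read_depth, mean_meth, delta, sample_sd, rng):
    """Simulate observed methylation proportions for two groups at one site."""
    groups = []
    for group_mean in (mean_meth, mean_meth + delta):
        observed = []
        for _ in range(n_per_group):
            # Assumed model: true per-sample level is normal, clipped to [0, 1].
            p = min(1.0, max(0.0, rng.gauss(group_mean, sample_sd)))
            # Methylated reads out of `read_depth` reads (binomial sampling).
            methylated = sum(rng.random() < p for _ in range(read_depth))
            observed.append(methylated / read_depth)
        groups.append(observed)
    return groups


def welch_p_value(a, b):
    """Two-sided p-value for a Welch statistic, using a normal approximation."""
    se2 = variance(a) / len(a) + variance(b) / len(b)
    if se2 == 0.0:
        # No within-group variation: identical means give no evidence at all.
        return 1.0 if mean(a) == mean(b) else 0.0
    z = (mean(a) - mean(b)) / se2 ** 0.5
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def estimate_power(n_per_group, read_depth, mean_meth, delta,
                   sample_sd=0.05, alpha=0.05, n_sims=300, seed=1):
    """Fraction of simulated sites where the group difference is detected.

    Requires n_per_group >= 2 so that a sample variance can be computed.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        a, b = simulate_site(n_per_group, read_depth, mean_meth, delta,
                             sample_sd, rng)
        if welch_p_value(a, b) < alpha:
            hits += 1
    return hits / n_sims


if __name__ == "__main__":
    # Compare estimated power across read depths for one hypothetical design.
    for depth in (5, 20, 100):
        power = estimate_power(n_per_group=10, read_depth=depth,
                               mean_meth=0.5, delta=0.1)
        print(f"read depth {depth:>3}: estimated power {power:.2f}")
```

Under these assumptions, running the script for several read depths, group sizes and effect sizes shows the abstract's point that power depends on the combination of parameters and not on any one of them alone.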

Identifiers

pubmed: 34126923
doi: 10.1186/s12864-021-07721-z
pii: 10.1186/s12864-021-07721-z
pmc: PMC8204428

Chemical substances

Sulfites 0
hydrogen sulfite OJ9787WBLU

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

446

Grants

Agency: Medical Research Council
ID: MR/M008924/1
Country: United Kingdom

Authors

Dorothea Seiler Vellame (D)

College of Medicine and Health, University of Exeter, Royal Devon and Exeter Hospital, Exeter, EX2 5DW, UK. ds420@exeter.ac.uk.

Isabel Castanho (I)

College of Medicine and Health, University of Exeter, Royal Devon and Exeter Hospital, Exeter, EX2 5DW, UK.
Department of Pathology, Beth Israel Deaconess Medical Center, 330 Brookline-Avenue, Boston, Massachusetts, USA.
Harvard Medical School, Boston, Massachusetts, USA.

Aisha Dahir (A)

College of Medicine and Health, University of Exeter, Royal Devon and Exeter Hospital, Exeter, EX2 5DW, UK.

Jonathan Mill (J)

College of Medicine and Health, University of Exeter, Royal Devon and Exeter Hospital, Exeter, EX2 5DW, UK. j.mill@exeter.ac.uk.

Eilis Hannon (E)

College of Medicine and Health, University of Exeter, Royal Devon and Exeter Hospital, Exeter, EX2 5DW, UK. e.j.hannon@exeter.ac.uk.
