Identification of cell type-specific methylation signals in bulk whole genome bisulfite sequencing data.


Journal

Genome biology
ISSN: 1474-760X
Abbreviated title: Genome Biol
Country: England
NLM ID: 100960660

Publication information

Publication date:
2020-07-01
History:
received: 2019-10-15
accepted: 2020-05-29
entrez: 2020-07-02
pubmed: 2020-07-02
medline: 2021-07-07
Status: epublish

Abstract

The traditional approach to studying the epigenetic mechanism CpG methylation in tissue samples is to identify regions of concordant differential methylation spanning multiple CpG sites (differentially methylated regions). Variation limited to single or small numbers of CpGs has been assumed to reflect stochastic processes. To test this, we developed software, Cluster-Based analysis of CpG methylation (CluBCpG), and explored variation in read-level CpG methylation patterns in whole genome bisulfite sequencing data. Analysis of both human and mouse whole genome bisulfite sequencing datasets reveals read-level signatures associated with cell type and cell type-specific biological processes. These signatures, which are mostly orthogonal to classical differentially methylated regions, are enriched at cell type-specific enhancers and allow estimation of proportional cell composition in synthetic mixtures and improved prediction of gene expression. In tandem, we developed a machine learning algorithm, Precise Read-Level Imputation of Methylation (PReLIM), to increase coverage of existing whole genome bisulfite sequencing datasets by imputing CpG methylation states on individual sequencing reads. PReLIM both improves CluBCpG coverage and performance and enables identification of novel differentially methylated regions, which we independently validate. Our data indicate that, rather than stochastic variation, read-level CpG methylation patterns in tissue whole genome bisulfite sequencing libraries reflect cell type. Accordingly, these new computational tools should lead to an improved understanding of epigenetic regulation by DNA methylation.
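To make the idea of read-level patterns concrete, the following is a minimal sketch, not the CluBCpG implementation or its API. It assumes each sequencing read covering a genomic bin has been reduced to a tuple of 0/1 methylation calls, one per CpG; the function names and the toy data are hypothetical. It shows how two samples can have the same average methylation in a bin (so no classical differentially methylated region would be called) while their read-level patterns differ completely.

```python
from collections import Counter


def pattern_proportions(reads):
    """Return {pattern: fraction of reads} for reads covering the same CpGs.

    Each read is a tuple of 0/1 calls (0 = unmethylated, 1 = methylated),
    one entry per CpG site in the bin.
    """
    counts = Counter(reads)
    total = sum(counts.values())
    return {pattern: n / total for pattern, n in counts.items()}


def mean_methylation(reads):
    """Average methylation over all CpG calls in the bin (the bulk view)."""
    calls = [call for read in reads for call in read]
    return sum(calls) / len(calls)


# Hypothetical toy data: one bin with 4 CpGs, 4 reads per sample.
sample_a = [(1, 1, 1, 1), (1, 1, 1, 1), (0, 0, 0, 0), (0, 0, 0, 0)]
sample_b = [(1, 0, 1, 0), (0, 1, 0, 1), (1, 0, 1, 0), (0, 1, 0, 1)]

# Bulk averages are identical ...
print(mean_methylation(sample_a), mean_methylation(sample_b))

# ... but the read-level pattern groups are entirely different.
print(pattern_proportions(sample_a))
print(pattern_proportions(sample_b))
```

Grouping identical patterns with a counter is the simplest possible stand-in for clustering; a real analysis would also need to handle reads with missing CpG calls, which is the gap the abstract says PReLIM addresses by imputation.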

Identifiers

pubmed: 32605651
doi: 10.1186/s13059-020-02065-5
pii: 10.1186/s13059-020-02065-5
pmc: PMC7329512

Publication types

Evaluation Study; Journal Article; Research Support, N.I.H., Extramural; Research Support, Non-U.S. Gov't; Research Support, U.S. Gov't, Non-P.H.S.

Languages

eng

Citation subsets

IM

Pagination

156

Grants

Agency: NIDDK NIH HHS
ID: 1R01DK111522
Country: United States
Agency: NIDDK NIH HHS
ID: R01 DK111831
Country: United States

Authors

C Anthony Scott (CA)

Department of Pediatrics, Baylor College of Medicine, USDA/ARS Children's Nutrition Research Center, Houston, TX, USA.

Jack D Duryea (JD)

Department of Pediatrics, Baylor College of Medicine, USDA/ARS Children's Nutrition Research Center, Houston, TX, USA.

Harry MacKay (H)

Department of Pediatrics, Baylor College of Medicine, USDA/ARS Children's Nutrition Research Center, Houston, TX, USA.

Maria S Baker (MS)

Department of Pediatrics, Baylor College of Medicine, USDA/ARS Children's Nutrition Research Center, Houston, TX, USA.

Eleonora Laritsky (E)

Department of Pediatrics, Baylor College of Medicine, USDA/ARS Children's Nutrition Research Center, Houston, TX, USA.

Chathura J Gunasekara (CJ)

Department of Pediatrics, Baylor College of Medicine, USDA/ARS Children's Nutrition Research Center, Houston, TX, USA.

Cristian Coarfa (C)

Department of Molecular & Cell Biology, Baylor College of Medicine, Houston, TX, USA. coarfa@bcm.edu.
Dan L. Duncan Comprehensive Cancer Center, Baylor College of Medicine, Houston, TX, USA. coarfa@bcm.edu.

Robert A Waterland (RA)

Department of Pediatrics, Baylor College of Medicine, USDA/ARS Children's Nutrition Research Center, Houston, TX, USA. waterland@bcm.edu.
Department of Molecular & Human Genetics, Baylor College of Medicine, Houston, TX, USA. waterland@bcm.edu.
