DNA 5-methylcytosine detection and methylation phasing using PacBio circular consensus sequencing.
Journal
Nature communications
ISSN: 2041-1723
Abbreviated title: Nat Commun
Country: England
NLM ID: 101528555
Publication information
Publication date:
2023-07-08
History:
received: 2022-11-20
accepted: 2023-06-22
medline: 2023-07-10
pubmed: 2023-07-09
entrez: 2023-07-08
Status:
epublish
Abstract
Long single-molecule sequencing technologies, such as PacBio circular consensus sequencing (CCS) and nanopore sequencing, are advantageous for detecting DNA 5-methylcytosine in CpGs (5mCpGs), especially in repetitive genomic regions. However, existing methods for detecting 5mCpGs using PacBio CCS remain limited in accuracy and robustness. Here, we present ccsmeth, a deep-learning method to detect DNA 5mCpGs using CCS reads. We sequence polymerase-chain-reaction-treated and M.SssI-methyltransferase-treated DNA from one human sample using PacBio CCS to train ccsmeth. Using long (≥10 kb) CCS reads, ccsmeth achieves an accuracy of 0.90 and an area under the curve of 0.97 for 5mCpG detection at single-molecule resolution. At the genome-wide site level, ccsmeth achieves correlations >0.90 with bisulfite sequencing and nanopore sequencing using only 10× reads. Furthermore, we develop a Nextflow pipeline, ccsmethphase, to detect haplotype-aware methylation using CCS reads, and we sequence a Chinese family trio to validate it. ccsmeth and ccsmethphase can be robust and accurate tools for detecting DNA 5-methylcytosines.
Identifiers
pubmed: 37422489
doi: 10.1038/s41467-023-39784-9
pii: 10.1038/s41467-023-39784-9
pmc: PMC10329642
Chemical substances
5-Methylcytosine
6R795CQT4H
DNA
9007-49-2
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Research Support, U.S. Gov't, Non-P.H.S.
Languages
eng
Citation subsets
IM
Pagination
4054
Copyright information
© 2023. The Author(s).