DNA-m6A calling and integrated long-read epigenetic and genetic analysis with fibertools.
Journal
Genome research
ISSN: 1549-5469
Titre abrégé: Genome Res
Pays: United States
ID NLM: 9518021
Informations de publication
Date de publication:
07 Jun 2024
07 Jun 2024
Historique:
received:
09
02
2024
accepted:
21
05
2024
medline:
8
6
2024
pubmed:
8
6
2024
entrez:
7
6
2024
Statut:
aheadofprint
Résumé
Long-read DNA sequencing has recently emerged as a powerful tool for studying both genetic and epigenetic architectures at single-molecule and single-nucleotide resolution. Long-read epigenetic studies encompass both the direct identification of native cytosine methylation as well as the identification of exogenously placed DNA N6-methyladenine (DNA-m6A). However, detecting DNA-m6A modifications using single-molecule sequencing, as well as coprocessing single-molecule genetic and epigenetic architectures, is limited by computational demands and a lack of supporting tools. Here, we introduce fibertools, a state-of-the-art toolkit that features a semisupervised convolutional neural network for fast and accurate identification of m6A-marked bases using PacBio single-molecule long-read sequencing, as well as the coprocessing of long-read genetic and epigenetic data produced using either PacBio or Oxford Nanopore sequencing platforms. We demonstrate accurate DNA-m6A identification (>90% precision and recall) along >20 kilobase long DNA molecules with a ~1,000-fold improvement in speed. In addition, we demonstrate that fibertools can readily integrate genetic and epigenetic data at single-molecule resolution, including the seamless conversion between molecular and reference coordinate systems, allowing for accurate genetic and epigenetic analyses of long-read data within structurally and somatically variable genomic regions.
Identifiants
pubmed: 38849157
pii: gr.279095.124
doi: 10.1101/gr.279095.124
pii:
doi:
Types de publication
Journal Article
Langues
eng
Sous-ensembles de citation
IM
Informations de copyright
Published by Cold Spring Harbor Laboratory Press.