Composite Hedges Nanopores codec system for rapid and portable DNA data readout with high INDEL-Correction.


Journal

Nature communications
ISSN: 2041-1723
Abbreviated title: Nat Commun
Country: England
NLM ID: 101528555

Publication information

Publication date:
30 Oct 2024
History:
received: 11 05 2024
accepted: 11 10 2024
medline: 31 10 2024
pubmed: 31 10 2024
entrez: 31 10 2024
Status: epublish

Abstract

Reading digital information from the highly dense but lightweight DNA medium currently relies on time-consuming next-generation sequencing. Nanopore sequencing promises to overcome this efficiency problem, but its high indel error rates mean that large amounts of high-quality data are required for accurate readout. Here we introduce Composite Hedges Nanopores, capable of handling indel rates up to 15.9% and substitution rates up to 7.8%. The overall information density can be doubled from 0.59 to 1.17 by utilizing a degenerate eight-letter alphabet. We demonstrate that sequencing times of 20 and 120 minutes are sufficient for processing representative text and image files, respectively. Moreover, to achieve complete data recovery, it is estimated that text and image data require 4× and 8× physical redundancy of composite strands, respectively. Our codec system excels in both molecular design and equalized dictionary usage, laying a solid foundation for approaching real-time DNA data retrieval and encoding.
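
To illustrate why a degenerate eight-letter alphabet can raise density, the sketch below computes the raw per-position ceiling (log2 of the alphabet size) and calls a composite letter from the bases observed at one position across several reads. It is a minimal sketch under stated assumptions, not the authors' codec: the alphabet (four pure bases plus four 1:1 mixtures written with IUPAC-style codes), the function names, and the `min_fraction` threshold are all hypothetical. The ceilings it prints are upper bounds, not the densities of 0.59 and 1.17 reported in the abstract.

```python
from collections import Counter
from math import log2

# Hypothetical eight-letter alphabet: four pure bases plus four 1:1 mixtures.
# The alphabet actually used in the paper may differ.
ALPHABET = {
    "A": frozenset("A"), "C": frozenset("C"),
    "G": frozenset("G"), "T": frozenset("T"),
    "M": frozenset("AC"), "K": frozenset("GT"),
    "R": frozenset("AG"), "Y": frozenset("CT"),
}


def bits_per_position(alphabet_size):
    """Raw information ceiling of one position, before any coding overhead."""
    return log2(alphabet_size)


def call_composite(observed_bases, min_fraction=0.25):
    """Infer a composite letter from the bases seen at one position.

    A base counts as present if it makes up at least min_fraction of the
    observations (an assumed threshold). Returns None when the set of
    present bases matches no letter in the alphabet.
    """
    counts = Counter(observed_bases)
    total = sum(counts.values())
    present = frozenset(
        base for base, n in counts.items() if n / total >= min_fraction
    )
    for letter, bases in ALPHABET.items():
        if bases == present:
            return letter
    return None


if __name__ == "__main__":
    print(bits_per_position(4), bits_per_position(8))
    print(call_composite("ACCAAT"))
```

Because a mixture can only be recognized when several strands covering the same position are read, a composite alphabet needs more than one physical copy per strand, which is consistent with the physical redundancy figures quoted in the abstract.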

Identifiers

pubmed: 39477940
doi: 10.1038/s41467-024-53455-3
pii: 10.1038/s41467-024-53455-3

Chemical substances

DNA 9007-49-2

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

9395

Grants

Agency: National Natural Science Foundation of China (National Science Foundation of China)
ID: 62171211
Agency: National Natural Science Foundation of China (National Science Foundation of China)
ID: 32371526
Agency: National Natural Science Foundation of China (National Science Foundation of China)
ID: 32100021
Agency: Shenzhen Science and Technology Innovation Commission
ID: JCYJ20220814170440001
Agency: Shenzhen Science and Technology Innovation Commission
ID: JCYJ20220818100218039
Agency: Shenzhen Science and Technology Innovation Commission
ID: JCYJ20220530113013030
Agency: Shenzhen Science and Technology Innovation Commission
ID: JCYJ20230807092459028

Copyright information

© 2024. The Author(s).

Authors

Xuyang Zhao (X)

School of Microelectronics, MOE Engineering Research Center of Integrated Circuits for Next Generation Communications, Southern University of Science and Technology, Shenzhen, China.

Junyao Li (J)

School of Microelectronics, MOE Engineering Research Center of Integrated Circuits for Next Generation Communications, Southern University of Science and Technology, Shenzhen, China.

Qingyuan Fan (Q)

School of Microelectronics, MOE Engineering Research Center of Integrated Circuits for Next Generation Communications, Southern University of Science and Technology, Shenzhen, China.

Jing Dai (J)

School of Microelectronics, MOE Engineering Research Center of Integrated Circuits for Next Generation Communications, Southern University of Science and Technology, Shenzhen, China.

Yanping Long (Y)

Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen, China.

Ronghui Liu (R)

School of Microelectronics, MOE Engineering Research Center of Integrated Circuits for Next Generation Communications, Southern University of Science and Technology, Shenzhen, China.

Jixian Zhai (J)

Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen, China.

Qing Pan (Q)

College of Information Engineering, Zhejiang University of Technology, Hangzhou, China. pqpq@zjut.edu.cn.

Yi Li (Y)

School of Microelectronics, MOE Engineering Research Center of Integrated Circuits for Next Generation Communications, Southern University of Science and Technology, Shenzhen, China. liy37@sustech.edu.cn.
