Deep structure of DNA for genomic analysis.


Journal

Human molecular genetics
ISSN: 1460-2083
Abbreviated title: Hum Mol Genet
Country: England
NLM ID: 9208958

Publication information

Publication date:
21 02 2022
History:
received: 22 06 2021
revised: 06 09 2021
accepted: 07 09 2021
pubmed: 12 9 2021
medline: 28 4 2022
entrez: 11 9 2021
Status: ppublish

Abstract

Recent advances in next-generation sequencing, deep networks and other bioinformatic tools have enabled us to mine huge amounts of genomic information about living organisms in the post-microarray era. However, these tools do not explicitly factor in the role of the underlying DNA biochemistry (particularly, DNA hybridization) essential to life processes. Here, we focus more precisely on the role that DNA hybridization plays in determining properties of biological organisms at the macro-level. We illustrate its role with solutions to challenging problems in human disease. These solutions are made possible by novel structural properties of DNA hybridization landscapes revealed by a metric model of oligonucleotides of a common length that makes them reminiscent of some planets in our solar system, particularly Earth and Saturn. These properties allow a judicious selection of so-called noncrosshybridizing (nxh) bases that offer a substantial reduction of DNA sequences of arbitrary length into a few informative features. The quality of the information extracted by these bases is high because of their very low Shannon entropy, i.e. they minimize the degree of uncertainty in hybridization that makes results on standard microarrays irreproducible. For example, SNP classification (pathogenic/non-pathogenic) and pathogen identification can be solved with high sensitivity (~77% and 100%, respectively) and specificity (~92% and 100%, respectively) for combined taxa on a sample of over 264 fully coding sequences in whole bacterial genomes and fungal mitochondrial genomes using machine learning (ML) models. These methods can be applied to several other interesting research questions that could be addressed with similar genomic analyses.
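
The two quantities the abstract relies on, Shannon entropy and sensitivity/specificity, can be illustrated with a minimal Python sketch. It is not the authors' method: the function names, the probe distributions and the confusion-matrix counts are all hypothetical, chosen only to show why a probe that hybridizes to a single target carries less uncertainty than one that hybridizes equally to several.

```python
import math


def shannon_entropy(probabilities):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical probabilities that one probe hybridizes to each of four targets.
specific_probe = [1.0, 0.0, 0.0, 0.0]       # binds a single target
ambiguous_probe = [0.25, 0.25, 0.25, 0.25]  # cross-hybridizes uniformly

low_uncertainty = shannon_entropy(specific_probe)    # no uncertainty
high_uncertainty = shannon_entropy(ambiguous_probe)  # maximal for 4 targets

# Hypothetical classifier counts (not taken from the paper).
sens, spec = sensitivity_specificity(tp=8, fn=2, tn=9, fp=1)
```

Under these assumed inputs, the specific probe has zero entropy and the uniformly cross-hybridizing probe has the maximum of two bits for four targets, which is the sense in which low entropy corresponds to low uncertainty in hybridization.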

Identifiers

pubmed: 34508577
pii: 6368512
doi: 10.1093/hmg/ddab272

Chemical substances

DNA 9007-49-2

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

576-586

Copyright information

© The Author(s) 2021. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

Authors

Max Garzon (M)

The University of Memphis, Computer Science, Memphis, TN 38152, USA.

Sambriddhi Mainali (S)

The University of Memphis, Computer Science, Memphis, TN 38152, USA.
