Classifying single nucleotide polymorphisms in humans.

Digital genomic signature; Gibbs free energy; Hybridization; Machine learning; Nonpathogenic/benign SNP; Pathogenic/malign SNP; h-Distance

Journal

Molecular genetics and genomics : MGG
ISSN: 1617-4623
Abbreviated title: Mol Genet Genomics
Country: Germany
NLM ID: 101093320

Publication information

Publication date:
Sep 2021
History:
received: 09 01 2021
accepted: 16 06 2021
pubmed: 15 7 2021
medline: 13 8 2021
entrez: 14 7 2021
Status: ppublish

Abstract

Single nucleotide polymorphisms (SNPs) are the most common form of genetic variation in the human population and are key to personalized medicine. New tests are presented to distinguish pathogenic/malign SNPs (i.e., those likely to contribute to or cause a disease) from nonpathogenic/benign SNPs, regardless of whether they occur in coding (exon) or noncoding (intron) regions of the human genome. The tests are based on the nearest neighbor (NN) model of Gibbs free energy landscapes of DNA hybridization and on deep structural properties of DNA revealed by an approximating metric (the h-distance) in DNA spaces of oligonucleotides of a common size. The quality assessments show that the newly defined PNPG test can classify a SNP with an accuracy of about 73% for the required parameters. The best-performing machine learning model is a feed-forward neural network with a fivefold cross-validation accuracy of at least 73%. These results may provide valuable tools for the SNP classification problem, where tools are lacking, by assessing the likelihood that an unclassified SNP is disease-causing. These tests highlight the significance of hybridization chemistry in SNPs. They can be applied to make research in genomics and metabolomics more effective.
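
As a rough illustration of the nearest neighbor idea mentioned in the abstract, the sketch below sums stacking energies over adjacent base pairs of an oligonucleotide hybridized to its perfect complement, and reports how a single-base substitution shifts that sum. It is not the PNPG test or the h-distance, neither of which is specified in this record. The table `NN_DG`, the functions `duplex_dg` and `snp_dg_shift`, and the example oligonucleotide are all hypothetical; the energy values are illustrative placeholders that should be replaced by a vetted parameter set, and initiation and symmetry terms are omitted.

```python
# Illustrative nearest-neighbor stacking energies (kcal/mol, near 37 C).
# ASSUMPTION: placeholder values for demonstration only; they are not
# taken from the article and should be replaced by a vetted table.
NN_DG = {
    "AA": -1.00, "TT": -1.00,
    "AT": -0.88,
    "TA": -0.58,
    "CA": -1.45, "TG": -1.45,
    "GT": -1.44, "AC": -1.44,
    "CT": -1.28, "AG": -1.28,
    "GA": -1.30, "TC": -1.30,
    "CG": -2.17,
    "GC": -2.24,
    "GG": -1.84, "CC": -1.84,
}


def duplex_dg(oligo: str) -> float:
    """Sum stacking terms over adjacent bases of `oligo` (5'->3'),
    assuming it is paired with its perfect complement.
    Initiation and symmetry corrections are deliberately left out."""
    seq = oligo.upper()
    if len(seq) < 2 or set(seq) - set("ACGT"):
        raise ValueError("expected a DNA string of length >= 2 over ACGT")
    return sum(NN_DG[seq[i:i + 2]] for i in range(len(seq) - 1))


def snp_dg_shift(ref: str, pos: int, alt: str) -> float:
    """Change in the stacking sum when the base at 0-based index `pos`
    of `ref` is replaced by the single base `alt`."""
    if len(alt) != 1 or not 0 <= pos < len(ref):
        raise ValueError("alt must be one base and pos must index ref")
    variant = ref[:pos] + alt + ref[pos + 1:]
    return duplex_dg(variant) - duplex_dg(ref)


if __name__ == "__main__":
    reference = "ACGT"  # hypothetical oligonucleotide around a SNP site
    print(duplex_dg(reference))
    print(snp_dg_shift(reference, 1, "A"))
```

A substitution changes at most two stacking terms (those involving the replaced base), which is why a SNP can be expected to leave a local signature in the free energy of hybridization.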

Identifiers

pubmed: 34259913
doi: 10.1007/s00438-021-01805-x
pii: 10.1007/s00438-021-01805-x

Chemical substances

Proteins 0

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

1161-1173

Copyright information

© 2021. The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature.

Authors

Shima Azizzadeh-Roodpish (S)

Computer Science, The University of Memphis, Memphis, TN, 38152, USA.

Max H Garzon (MH)

Computer Science, The University of Memphis, Memphis, TN, 38152, USA. mgarzon@memphis.edu.

Sambriddhi Mainali (S)

Computer Science, The University of Memphis, Memphis, TN, 38152, USA.
