Machine Learning Prediction of Non-Coding Variant Impact in Human Retinal cis-Regulatory Elements.
Journal
Translational vision science & technology
ISSN: 2164-2591
Abbreviated title: Transl Vis Sci Technol
Country: United States
NLM ID: 101595919
Publication information
Publication date:
2022-04-01
History:
entrez: 2022-04-18
pubmed: 2022-04-19
medline: 2022-04-21
Status:
ppublish
Abstract
Prior studies have demonstrated the significance of specific cis-regulatory variants in retinal disease; however, determining the functional impact of regulatory variants remains a major challenge. In this study, we utilized a machine learning approach, trained on epigenomic data from the adult human retina, to systematically quantify the predicted impact of cis-regulatory variants.
We used human retinal DNA accessibility data (ATAC-seq) to determine a set of 18.9k high-confidence, putative cis-regulatory elements (CREs). Eighty percent of these elements were used to train a machine learning model utilizing a gapped k-mer support vector machine-based approach. In silico saturation mutagenesis and variant scoring were applied to predict the functional impact of all potential single nucleotide variants within CREs. Impact scores were tested in a 20% hold-out dataset and compared to allele population frequency, phylogenetic conservation, transcription factor (TF) binding motifs, and existing massively parallel reporter assay data.
We generated a model that distinguishes between human retinal regulatory elements and negative test sequences with 95% accuracy. Among a hold-out test set of 3.7k human retinal CREs, all possible single nucleotide variants were scored. Variants with negative impact scores correlated with higher phylogenetic conservation of the reference allele, disruption of predicted TF binding motifs, and massively parallel reporter expression.
We demonstrated the utility of human retinal epigenomic data to train a machine learning model for the purpose of predicting the impact of non-coding regulatory sequence variants. Our model accurately scored sequences and predicted putative TF binding motifs. This approach has the potential to expedite the characterization of pathogenic non-coding sequence variants in the context of unexplained retinal disease.
This workflow and resulting dataset serve as a promising genomic tool to facilitate the clinical prioritization of functionally disruptive non-coding mutations in the retina.
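The in silico saturation mutagenesis step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a simplified scoring scheme in which each sequence is scored by summing per-k-mer weights (ungapped k-mers, rather than the gapped k-mers of the published model), and the names `kmer_score`, `saturation_mutagenesis`, and `toy_weights` are hypothetical. The impact of a variant is taken as the score of the mutated sequence minus the score of the reference sequence, so negative values indicate a predicted loss of regulatory activity.

```python
BASES = "ACGT"


def kmer_score(seq, weights, k):
    """Sum hypothetical per-k-mer weights over every k-mer window in seq."""
    return sum(weights.get(seq[i:i + k], 0.0) for i in range(len(seq) - k + 1))


def saturation_mutagenesis(seq, weights, k):
    """Score every possible single nucleotide variant in seq.

    Returns a list of (position, ref, alt, impact) tuples, where impact is
    score(mutated sequence) - score(reference sequence).
    """
    ref_score = kmer_score(seq, weights, k)
    results = []
    for pos, ref in enumerate(seq):
        for alt in BASES:
            if alt == ref:
                continue  # only true substitutions are scored
            mutated = seq[:pos] + alt + seq[pos + 1:]
            impact = kmer_score(mutated, weights, k) - ref_score
            results.append((pos, ref, alt, impact))
    return results


# Toy weights (assumed values, for illustration only).
toy_weights = {"TAAT": 2.0, "AATC": 1.0}
variants = saturation_mutagenesis("GCTAATCC", toy_weights, k=4)

# Rank variants from most to least disruptive.
for pos, ref, alt, impact in sorted(variants, key=lambda v: v[3])[:3]:
    print(f"pos {pos}: {ref}>{alt} impact {impact:+.1f}")
```

In a real workflow the weights would come from a trained model, and each element of length n yields 3n scored variants, which is how all possible single nucleotide variants in a set of elements can be enumerated.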
Identifiers
pubmed: 35435921
pii: 2778754
doi: 10.1167/tvst.11.4.16
pmc: PMC9034719
Chemical substances
Nucleotides
0
Transcription Factors
0
Publication types
Journal Article
Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
16