TAGOOS: genome-wide supervised learning of non-coding loci associated to complex phenotypes.
Computational Biology / methods
Gene Expression Regulation
Genetic Predisposition to Disease / genetics
Genome-Wide Association Study / methods
Humans
Internet
Linkage Disequilibrium
Phenotype
Polymorphism, Single Nucleotide
Quantitative Trait Loci / genetics
Regulatory Sequences, Nucleic Acid / genetics
Supervised Machine Learning
Journal
Nucleic acids research
ISSN: 1362-4962
Abbreviated title: Nucleic Acids Res
Country: England
NLM ID: 0411011
Publication information
Publication date:
22 08 2019
History:
accepted: 18 04 2019
revised: 07 04 2019
received: 29 01 2019
pubmed: 3 5 2019
medline: 4 12 2019
entrez: 3 5 2019
Status:
ppublish
Abstract
Genome-wide association studies (GWAS) associate single nucleotide polymorphisms (SNPs) with complex phenotypes. Most human SNPs fall in non-coding regions and are likely regulatory SNPs, but linkage disequilibrium (LD) blocks make it difficult to distinguish functional SNPs. Therefore, putative functional SNPs are usually annotated with molecular markers of gene regulatory regions and prioritized with dedicated prediction tools. We integrated associated SNPs, LD blocks and regulatory features into a supervised model called TAGOOS (TAG SNP bOOSting) and computed scores genome-wide. The TAGOOS scores enriched and prioritized unseen associated SNPs with odds ratios of 4.3 and 3.5 and areas under the curve (AUC) of 0.65 and 0.6 for intronic and intergenic regions, respectively. The TAGOOS score was correlated with the maximal significance of associated SNPs and expression quantitative trait loci (eQTLs) and with the number of biological samples annotated for key regulatory features. Analysis of loci and regions associated with cleft lip and human adult height phenotypes recovered known functional loci and predicted new functional loci enriched in transcription factors related to the phenotypes. In conclusion, we trained a supervised model based on associated SNPs to prioritize putative functional regions. The TAGOOS scores, annotations and UCSC genome tracks are available here: https://tagoos.readthedocs.io.
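The abstract reports two evaluation metrics for the scores: an odds ratio (enrichment of associated SNPs among high-scoring SNPs) and an AUC (how well the score ranks associated SNPs above the others). The sketch below illustrates how such metrics could be computed from a list of scores and binary labels. It is not the authors' pipeline: the function names `odds_ratio` and `auc`, the threshold of 0.5, and the toy data are all assumptions made for illustration.

```python
def odds_ratio(scores, labels, threshold):
    """Odds of being associated for SNPs scoring >= threshold,
    divided by the same odds for SNPs scoring below it."""
    pairs = list(zip(scores, labels))
    a = sum(1 for s, y in pairs if s >= threshold and y == 1)  # high score, associated
    b = sum(1 for s, y in pairs if s >= threshold and y == 0)  # high score, not associated
    c = sum(1 for s, y in pairs if s < threshold and y == 1)   # low score, associated
    d = sum(1 for s, y in pairs if s < threshold and y == 0)   # low score, not associated
    if b == 0 or c == 0:
        raise ValueError("odds ratio undefined: an off-diagonal cell is empty")
    return (a * d) / (b * c)


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison:
    the fraction of (associated, non-associated) pairs in which the
    associated SNP has the higher score (ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one SNP of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = associated SNP, 0 = background SNP.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

print(odds_ratio(scores, labels, threshold=0.5))  # 9.0
print(auc(scores, labels))                        # 0.8125
```

The pairwise AUC is quadratic in the number of SNPs, which is fine for a toy example; for genome-wide data a rank-based formulation would be the more practical choice.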
Identifiers
pubmed: 31045203
pii: 5482505
doi: 10.1093/nar/gkz320
pmc: PMC6698643
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
e79
Copyright information
© The Author(s) 2019. Published by Oxford University Press on behalf of Nucleic Acids Research.