Predicting target genes of non-coding regulatory variants with IRT.


Journal

Bioinformatics (Oxford, England)
ISSN: 1367-4811
Abbreviated title: Bioinformatics
Country: England
NLM ID: 9808944

Publication information

Publication date:
2020-08-15
History:
received: 2019-12-18
revised: 2020-03-15
accepted: 2020-04-17
pubmed: 2020-04-25
medline: 2021-02-20
entrez: 2020-04-25
Status: ppublish

Abstract

Interpreting genetic variants of unknown significance (VUS) is essential in clinical applications of genome sequencing for diagnosis and personalized care. Non-coding variants remain particularly difficult to interpret, even though they make up a large majority of the trait associations identified in genome-wide association studies (GWAS). Predicting the regulatory effects of non-coding variants on candidate genes is a key step in evaluating their clinical significance. Here, we develop a machine-learning algorithm, Inference of Connected expression quantitative trait loci (eQTLs) (IRT), to predict the regulatory targets of non-coding variants identified in eQTL studies. We assemble datasets using eQTL results from the Genotype-Tissue Expression (GTEx) project and train the model to separate positive from negative variant-gene pairs based on annotations characterizing the variant, the gene and the intermediate sequence between them. IRT achieves an area under the receiver operating characteristic curve (ROC-AUC) of 0.799 under random cross-validation and 0.700 under a more stringent position-based cross-validation. Further evaluation on rare variants and experimentally validated regulatory variants shows that IRT is significantly enriched for identifying the true target genes over negative controls. In gene-ranking experiments, IRT achieves a top-1 accuracy of 50% and a top-3 accuracy of 90%. Salient features, including GC-content, histone modifications and Hi-C interactions, are further analyzed and visualized to illustrate their influence on predictions. IRT can be applied to any VUS of interest and each nearby candidate gene, and outputs a score reflecting the likelihood that the variant has a regulatory effect on that gene's expression level. These scores can be used to prioritize variants and genes to assist in patient diagnosis and GWAS follow-up studies. The code and data used in this work are available at https://github.com/miaecle/eQTL_Trees. Supplementary data are available at Bioinformatics online.
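
The two evaluation ideas named in the abstract, position-based cross-validation and top-k gene ranking, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the 1 Mb block size, the round-robin fold assignment and the toy gene names and scores are all assumptions made for illustration, and the paper's actual splitting scheme may differ.

```python
def position_folds(pairs, n_folds=5, block_size=1_000_000):
    """Assign each (chromosome, position) pair to a fold by genomic block.

    Variants in the same block always share a fold, so nearby variants
    never end up on both sides of a train/test split. The block size and
    round-robin assignment are illustrative assumptions.
    """
    block_to_fold = {}
    folds = []
    for chrom, pos in pairs:
        block = (chrom, pos // block_size)
        if block not in block_to_fold:
            block_to_fold[block] = len(block_to_fold) % n_folds
        folds.append(block_to_fold[block])
    return folds


def top_k_accuracy(scores_by_variant, true_gene, k):
    """Fraction of variants whose true target gene is among the k
    highest-scoring candidate genes."""
    hits = 0
    for variant, gene_scores in scores_by_variant.items():
        ranked = sorted(gene_scores, key=gene_scores.get, reverse=True)
        if true_gene[variant] in ranked[:k]:
            hits += 1
    return hits / len(scores_by_variant)


# Toy data (hypothetical variants, genes and scores).
pairs = [("chr1", 100_000), ("chr1", 900_000),
         ("chr1", 1_500_000), ("chr2", 100_000)]
folds = position_folds(pairs, n_folds=3)  # [0, 0, 1, 2]

scores = {
    "v1": {"GENE_A": 0.9, "GENE_B": 0.4, "GENE_C": 0.1},
    "v2": {"GENE_A": 0.2, "GENE_B": 0.7, "GENE_C": 0.6},
}
truth = {"v1": "GENE_A", "v2": "GENE_C"}
top1 = top_k_accuracy(scores, truth, 1)  # 0.5
top2 = top_k_accuracy(scores, truth, 2)  # 1.0
```

Grouping by genomic block keeps correlated neighbouring variants out of the held-out fold, which is one plausible reason a position-based split would be reported as more stringent than a random one.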

Identifiers

pubmed: 32330225
pii: 5824790
doi: 10.1093/bioinformatics/btaa254
pmc: PMC7575052

Publication types

Journal Article; Research Support, N.I.H., Extramural; Research Support, Non-U.S. Gov't; Research Support, U.S. Gov't, Non-P.H.S.

Languages

eng

Citation subsets

IM

Pagination

4440-4448

Grants

Agency: NHGRI NIH HHS
ID: K99 HG009677
Country: United States
Agency: NIA NIH HHS
ID: P30 AG059307
Country: United States
Agency: NHGRI NIH HHS
ID: R00 HG009677
Country: United States
Agency: NIMHD NIH HHS
ID: R21 MD012867
Country: United States

Copyright information

© The Author(s) 2020. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

Authors

Zhenqin Wu (Z)

Department of Chemistry, Stanford University, CA 94305, USA.
Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, 94305 CA, USA.

Nilah M Ioannidis (NM)

Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, 94305 CA, USA.

James Zou (J)

Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, 94305 CA, USA.
Chan-Zuckerberg Biohub, San Francisco, 94158 CA, USA.
