Exploiting deep transfer learning for the prediction of functional non-coding variants using genomic sequence.


Journal

Bioinformatics (Oxford, England)
ISSN: 1367-4811
Abbreviated title: Bioinformatics
Country: England
NLM ID: 9808944

Publication information

Publication date:
13 June 2022
History:
received: 3 September 2021
revised: 4 March 2022
accepted: 6 April 2022
pubmed: 8 April 2022
medline: 15 November 2022
entrez: 7 April 2022
Status: ppublish

Abstract

Although genome-wide association studies (GWAS) have identified tens of thousands of variants associated with complex traits, most of which fall within non-coding regions, these variants are not necessarily causal. High-throughput functional assays have led to the discovery of experimentally validated functional non-coding variants (NCVs). However, such validated variants remain rare because of technical difficulty and financial cost, and this small sample size makes supervised machine learning models less reliable for genome-wide prediction of causal NCVs. We exploit a deep transfer learning model based on a convolutional neural network to improve the prediction of functional NCVs. To address the small-sample challenge, the model leverages large-scale generic functional NCVs to improve the learning of low-level features, and context-specific functional NCVs to learn high-level features for the context-specific prediction task. Evaluated on three massively parallel reporter assay (MPRA) datasets and 16 GWAS datasets, the deep transfer learning model outperforms deep learning models without pretraining or retraining. In addition, it outperforms 18 existing computational methods on both the MPRA and GWAS datasets. Code is available at https://github.com/lichen-lab/TLVar. Supplementary data are available at Bioinformatics online.
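The two-stage idea in the abstract (learn low-level features from a large generic set of functional NCVs, then learn high-level features from a small context-specific set) can be illustrated with a toy sketch. This is not the TLVar architecture: the single convolution layer, global max pooling, logistic head, the choice to freeze the convolution filters during retraining, and all names (`one_hot`, `TinyCNN`, `transfer`) are hypothetical assumptions made for illustration, written in plain NumPy.

```python
import numpy as np

BASES = "ACGT"


def one_hot(seq):
    """One-hot encode a DNA string as a (length, 4) array; non-ACGT bases (e.g. N) stay all-zero."""
    x = np.zeros((len(seq), 4))
    for i, base in enumerate(seq.upper()):
        if base in BASES:
            x[i, BASES.index(base)] = 1.0
    return x


class TinyCNN:
    """Hypothetical toy model: one convolution layer (low-level motif detectors),
    global max pooling, and a logistic head (high-level decision layer)."""

    def __init__(self, n_filters=8, width=5, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(0.0, 0.1, size=(n_filters, width, 4))  # convolution filters
        self.v = np.zeros(n_filters)  # head weights
        self.b = 0.0  # head bias

    def _forward(self, x):
        # Slide each filter over the sequence (assumes len(x) >= filter width).
        width = self.W.shape[1]
        windows = np.stack([x[i:i + width] for i in range(len(x) - width + 1)])
        acts = np.maximum(np.einsum("pwc,fwc->pf", windows, self.W), 0.0)  # ReLU(conv)
        idx = acts.argmax(axis=0)  # position of the max-pooled activation per filter
        h = acts.max(axis=0)  # global max pooling: one feature per filter
        p = 1.0 / (1.0 + np.exp(-(h @ self.v + self.b)))
        return p, h, windows, idx

    def predict_proba(self, x):
        return self._forward(x)[0]

    def fit(self, X, y, epochs=20, lr=0.1, train_conv=True):
        """Per-sequence SGD on binary cross-entropy; train_conv=False freezes the filters."""
        for _ in range(epochs):
            for x, t in zip(X, y):
                p, h, windows, idx = self._forward(x)
                g = p - t  # d(loss)/d(logit) for sigmoid + cross-entropy
                if train_conv:
                    # Backprop through max pooling and ReLU: only the winning window
                    # of each active filter receives gradient (uses v before its update).
                    for f in np.flatnonzero(h > 0):
                        self.W[f] -= lr * g * self.v[f] * windows[idx[f]]
                self.v -= lr * g * h
                self.b -= lr * g
        return self


def transfer(pretrained, X_ctx, y_ctx, **fit_kwargs):
    """Copy the pretrained (generic) filters, freeze them, and retrain a fresh head
    on the small context-specific dataset."""
    model = TinyCNN(n_filters=pretrained.W.shape[0], width=pretrained.W.shape[1])
    model.W = pretrained.W.copy()
    return model.fit(X_ctx, y_ctx, train_conv=False, **fit_kwargs)
```

In use, one would call `pre = TinyCNN().fit(X_generic, y_generic)` on the large generic set and then `transfer(pre, X_context, y_context)` on the small context-specific set. Freezing the filters is the simplest way to keep the low-level features learned from the generic data; a common alternative is to fine-tune all layers with a smaller learning rate. The abstract does not say which variant TLVar uses.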

Identifiers

pubmed: 35389435
pii: 6564688
doi: 10.1093/bioinformatics/btac214
pmc: PMC9890318

Publication types

Journal Article; Research Support, Non-U.S. Gov't; Research Support, N.I.H., Extramural

Languages

eng

Citation subsets

IM

Pagination

3164-3172

Grants

Agency: NIGMS NIH HHS
ID: R35 GM142701
Country: United States

Copyright information

© The Author(s) 2022. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

Authors

Li Chen

Department of Biostatistics and Health Data Science, Indiana University School of Medicine, Indianapolis, IN 46202, USA.
Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN 46202, USA.

Ye Wang

Department of Biostatistics and Health Data Science, Indiana University School of Medicine, Indianapolis, IN 46202, USA.
Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN 46202, USA.

Fengdi Zhao

Department of Biostatistics and Health Data Science, Indiana University School of Medicine, Indianapolis, IN 46202, USA.
Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN 46202, USA.

Similar articles

Exploring blood-brain barrier passage using atomic weighted vector and machine learning.

Yoan Martínez-López, Paulina Phoobane, Yanaima Jauriga et al.
Blood-Brain Barrier; Machine Learning; Humans; Support Vector Machine; Software

Understanding the role of machine learning in predicting progression of osteoarthritis.

Simone Castagno, Benjamin Gompels, Estelle Strangmark et al.
Humans; Disease Progression; Machine Learning; Osteoarthritis