Predicting mean ribosome load for 5'UTR of any length using deep learning.
Journal
PLoS computational biology
ISSN: 1553-7358
Abbreviated title: PLoS Comput Biol
Country: United States
NLM ID: 101238922
Publication information
Publication date: 05 2021
History:
received: 2020-07-14
accepted: 2021-04-19
revised: 2021-05-20
pubmed: 2021-05-11
medline: 2021-09-15
entrez: 2021-05-10
Status: epublish
Abstract
The 5' untranslated region (5'UTR) plays a key role in regulating mRNA translation and, consequently, protein abundance. Accurate modeling of 5'UTR regulatory sequences should therefore provide insights into translational control mechanisms and help interpret genetic variants. Recently, a model was trained on a massively parallel reporter assay to predict mean ribosome load (MRL), a proxy for translation rate, directly from 5'UTR sequence with a high degree of accuracy. However, this model is restricted to the sequence lengths investigated in the reporter assay and therefore cannot be applied to the majority of human sequences without a substantial loss of information. Here, we introduce frame pooling, a novel neural network operation that enabled the development of an MRL prediction model for 5'UTRs of any length. Our model shows state-of-the-art performance on fixed-length randomized sequences, while offering better generalization performance on longer sequences and on a variety of translation-related genome-wide datasets. Variant interpretation is demonstrated on a 5'UTR variant of the gene HBB associated with beta-thalassemia. Frame pooling could find applications in other bioinformatics predictive tasks. Moreover, our model, released as open source, could help pinpoint pathogenic genetic variants.
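The abstract does not spell out how frame pooling works. As a rough sketch, assume it amounts to grouping convolutional feature positions by reading frame, counted from the 3' end of the 5'UTR (the end nearest the start codon), and applying global max and average pooling within each frame. Under that assumption, a fixed-size output for an input of any length could be computed as below. The function name `frame_pool`, the NumPy implementation, and the zero fill for empty frames are illustrative choices, not the authors' released code.

```python
import numpy as np


def frame_pool(features: np.ndarray) -> np.ndarray:
    """Pool a (length, channels) feature map into a fixed-size vector.

    Assumption: row 0 is the 5' end and the last row is the position
    closest to the start codon. Positions are assigned to one of three
    frames by their distance from the 3' end, modulo 3.
    Returns a vector of size 6 * channels: for each frame, the
    per-channel maximum followed by the per-channel mean.
    """
    length, channels = features.shape
    # Distance of every position from the 3' end (0 for the last row).
    distance_from_3prime = (length - 1) - np.arange(length)
    frame = distance_from_3prime % 3

    pooled = []
    for f in range(3):
        selected = features[frame == f]
        if selected.shape[0] == 0:
            # Inputs shorter than 3 positions leave some frames empty;
            # filling with zeros is an arbitrary choice for this sketch.
            pooled.append(np.zeros(channels))
            pooled.append(np.zeros(channels))
        else:
            pooled.append(selected.max(axis=0))
            pooled.append(selected.mean(axis=0))
    return np.concatenate(pooled)


# Feature maps of different lengths give outputs of the same size.
short = frame_pool(np.ones((50, 4)))
long = frame_pool(np.ones((1234, 4)))
print(short.shape, long.shape)
```

Because the output size depends only on the number of channels, a dense layer placed after this step would accept 5'UTRs of any length, which is the property the abstract attributes to frame pooling.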
Identifiers
pubmed: 33970899
doi: 10.1371/journal.pcbi.1008982
pii: PCOMPBIOL-D-20-01257
pmc: PMC8136849
Chemical substances
5' Untranslated Regions
0
RNA, Messenger
0
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
e1008982
Conflict of interest statement
The authors have declared that no competing interests exist.