Predicting mean ribosome load for 5'UTR of any length using deep learning.


Journal

PLoS computational biology
ISSN: 1553-7358
Abbreviated title: PLoS Comput Biol
Country: United States
NLM ID: 101238922

Publication information

Publication date:
05 2021
History:
received: 14 07 2020
accepted: 19 04 2021
revised: 20 05 2021
pubmed: 11 5 2021
medline: 15 9 2021
entrez: 10 5 2021
Status: epublish

Abstract

The 5' untranslated region (5'UTR) plays a key role in regulating mRNA translation and, consequently, protein abundance. Accurate modeling of 5'UTR regulatory sequences should therefore provide insights into translational control mechanisms and help interpret genetic variants. Recently, a model was trained on a massively parallel reporter assay to predict mean ribosome load (MRL), a proxy for translation rate, directly from 5'UTR sequence with a high degree of accuracy. However, this model is restricted to the sequence lengths investigated in the reporter assay and therefore cannot be applied to the majority of human 5'UTR sequences without a substantial loss of information. Here, we introduce frame pooling, a novel neural network operation that enables the development of an MRL prediction model for 5'UTRs of any length. Our model shows state-of-the-art performance on fixed-length randomized sequences, while offering better generalization performance on longer sequences and on a variety of translation-related genome-wide datasets. Variant interpretation is demonstrated on a 5'UTR variant of the gene HBB associated with beta-thalassemia. Frame pooling could find applications in other bioinformatics predictive tasks. Moreover, our model, released open source, could help pinpoint pathogenic genetic variants.
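Frame pooling, named in the abstract as the operation that lets the model accept 5'UTRs of any length, can be illustrated with a short sketch of how a pooling step might turn a variable-length matrix of convolutional activations into a fixed-size vector while keeping reading-frame information. This is a minimal NumPy sketch under stated assumptions, not the authors' released code: it assumes positions are grouped into three frames by their distance to the start codon modulo 3 (with the last input row being the nucleotide immediately upstream of the start codon), and that each frame is summarized by global max and global average pooling. The function name `frame_pool` is hypothetical.

```python
import numpy as np


def frame_pool(features: np.ndarray) -> np.ndarray:
    """Hypothetical frame pooling sketch.

    features: array of shape (length, channels), e.g. convolutional
    activations over a 5'UTR. Assumption: the last row corresponds to the
    nucleotide directly upstream of the start codon.
    Returns a vector of shape (6 * channels,) whatever the input length.
    """
    length, channels = features.shape
    if length < 3:
        raise ValueError("need at least 3 positions so every frame is non-empty")

    # Distance of each position to the 3' end of the UTR (last row -> 0).
    dist = np.arange(length)[::-1]

    pooled = []
    for frame in range(3):
        # Rows sharing the same reading frame relative to the start codon.
        rows = features[dist % 3 == frame]
        pooled.append(rows.max(axis=0))   # global max pooling within the frame
        pooled.append(rows.mean(axis=0))  # global average pooling within the frame
    return np.concatenate(pooled)
```

Because the output size depends only on the number of channels, a 50-nt and a 500-nt input yield vectors of the same length, which a dense layer could then map to a single MRL prediction.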

Identifiers

pubmed: 33970899
doi: 10.1371/journal.pcbi.1008982
pii: PCOMPBIOL-D-20-01257
pmc: PMC8136849

Chemical substances

5' Untranslated Regions 0
RNA, Messenger 0

Publication types

Journal Article
Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Pagination

e1008982

Conflict of interest statement

The authors have declared that no competing interests exist.

Authors

Alexander Karollus (A)

Department of Informatics, Technical University of Munich, Garching, Germany.

Žiga Avsec (Ž)

Department of Informatics, Technical University of Munich, Garching, Germany.
Graduate School of Quantitative Biosciences (QBM), Ludwig-Maximilians-Universität München, Munich, Germany.

Julien Gagneur (J)

Department of Informatics, Technical University of Munich, Garching, Germany.
Institute of Human Genetics, Technical University of Munich, Munich, Germany.
Institute of Computational Biology, Helmholtz Zentrum München, Neuherberg, Germany.

MeSH classifications