Deep learning models for predicting RNA degradation via dual crowdsourcing.


Journal

ArXiv
ISSN: 2331-8422
Abbreviated title: ArXiv
Country: United States
NLM ID: 101759493

Publication information

Publication date:
14 Oct 2021
History:
revised: 22 04 2022
pubmed: 22 10 2021
medline: 22 10 2021
entrez: 21 10 2021
Status: epublish

Abstract

Messenger RNA-based medicines hold immense potential, as evidenced by their rapid deployment as COVID-19 vaccines. However, worldwide distribution of mRNA molecules has been limited by their thermostability, which is fundamentally limited by the intrinsic instability of RNA molecules to a chemical degradation reaction called in-line hydrolysis. Predicting the degradation of an RNA molecule is a key task in designing more stable RNA-based therapeutics. Here, we describe a crowdsourced machine learning competition ("Stanford OpenVaccine") on Kaggle, involving single-nucleotide resolution measurements on 6043 102-130-nucleotide diverse RNA constructs that were themselves solicited through crowdsourcing on the RNA design platform Eterna. The entire experiment was completed in less than 6 months, and 41% of nucleotide-level predictions from the winning model were within experimental error of the ground truth measurement. Furthermore, these models generalized to blindly predicting orthogonal degradation data on much longer mRNA molecules (504-1588 nucleotides) with improved accuracy compared to previously published models. Top teams integrated natural language processing architectures and data augmentation techniques with predictions from previous dynamic programming models for RNA secondary structure. These results indicate that such models are capable of representing in-line hydrolysis with excellent accuracy, supporting their use for designing stabilized messenger RNAs. The integration of two crowdsourcing platforms, one for data set creation and another for machine learning, may be fruitful for other urgent problems that demand scientific discovery on rapid timescales.

Identifiers

pubmed: 34671698
pii: 2110.07531
pmc: PMC8528079

Publication types

Preprint

Languages

eng

Grants

Agency: NIGMS NIH HHS
ID: R35 GM122579
Country: United States

Comments and corrections

Type: UpdateIn

References

Mol Ther. 2008 Nov;16(11):1833-40
pubmed: 18797453
Nat Protoc. 2006;1(3):1610-6
pubmed: 17406453
J Mol Biol. 2016 Feb 27;428(5 Pt A):748-757
pubmed: 26902426
Proc Natl Acad Sci U S A. 2019 Nov 26;116(48):24075-24083
pubmed: 31712433
J Pharm Sci. 2021 Mar;110(3):997-1001
pubmed: 33321139
Proc Natl Acad Sci U S A. 2014 Feb 11;111(6):2122-7
pubmed: 24469816
Nucleic Acids Res. 2021 Oct 11;49(18):10604-10617
pubmed: 34520542
Nature. 2006 May 4;441(7089):101-5
pubmed: 16625202
Nat Methods. 2022 Oct;19(10):1234-1242
pubmed: 36192461
N Engl J Med. 2020 Dec 31;383(27):2603-2615
pubmed: 33301246
Cell. 2020 Sep 3;182(5):1271-1283.e16
pubmed: 32795413
Curr Opin Biotechnol. 2022 Feb;73:329-336
pubmed: 34715546
N Engl J Med. 2020 Oct 15;383(16):1544-1555
pubmed: 32722908
Algorithms Mol Biol. 2011 Nov 24;6:26
pubmed: 22115189
bioRxiv. 2021 Feb 19;:
pubmed: 32869022
Genome Med. 2017 Jun 27;9(1):60
pubmed: 28655327
Nat Commun. 2022 Mar 22;13(1):1536
pubmed: 35318324
N Engl J Med. 2021 Feb 4;384(5):403-416
pubmed: 33378609
Methods Mol Biol. 2017;1499:1-11
pubmed: 27987140
Nucleic Acids Res. 2017 Mar 17;45(5):e35
pubmed: 27899588
Int J Pharm. 2021 May 15;601:120586
pubmed: 33839230
Nucleic Acids Res. 2018 Jun 20;46(11):5381-5394
pubmed: 29746666
Methods Mol Biol. 2014;1086:95-117
pubmed: 24136600
Annu Rev Biophys Biomol Struct. 2001;30:457-75
pubmed: 11441810

Authors

Hannah K Wayment-Steele (HK)

Department of Chemistry, Stanford University, Stanford, California 94305, USA.
Eterna Massive Open Laboratory.

Wipapat Kladwang (W)

Department of Biochemistry, Stanford University, California 94305, USA.
Eterna Massive Open Laboratory.

Andrew M Watkins (AM)

Department of Biochemistry, Stanford University, California 94305, USA.
Eterna Massive Open Laboratory.

Do Soon Kim (DS)

Department of Biochemistry, Stanford University, California 94305, USA.
Eterna Massive Open Laboratory.

Bojan Tunguz (B)

Department of Biochemistry, Stanford University, California 94305, USA.
NVIDIA Corporation, Santa Clara, California 95051.

Walter Reade (W)

Kaggle, San Francisco, California 94107.

Maggie Demkin (M)

Kaggle, San Francisco, California 94107.

Jonathan Romano (J)

Department of Biochemistry, Stanford University, California 94305, USA.
Eterna Massive Open Laboratory.
Department of Computer Science and Engineering, State University of New York at Buffalo, Buffalo, New York, 14260, USA.

Roger Wellington-Oguri (R)

Eterna Massive Open Laboratory.

John J Nicol (JJ)

Eterna Massive Open Laboratory.

Jiayang Gao (J)

High-flyer AI, Hangzhou, Zhejiang, China, 310000.

Kazuki Onodera (K)

NVIDIA Corporation, Minato-ku, Tokyo 107-0052, Japan.

Kazuki Fujikawa (K)

DeNA, Shibuya-ku, Tokyo 150-6140, Japan.

Hanfei Mao (H)

Yanfu Investments, Shanghai, China, 200000.

Gilles Vandewiele (G)

IDLab, Ghent University, Technologiepark-Zwijnaarde, Gent, Belgium, B-9052.

Michele Tinti (M)

College of Life Sciences, University of Dundee, Dundee DD1 4HN, United Kingdom.

Bram Steenwinckel (B)

IDLab, Ghent University, Technologiepark-Zwijnaarde, Gent, Belgium, B-9052.

Takuya Ito (T)

Universal Knowledge Inc., Tokyo 150-0013, Japan.

Taiga Noumi (T)

Keyence Corporation, 1-3-14, Higashi-Nakajima, Higashi-Yodogawa-ku, Osaka, 533-8555, Japan.

Shujun He (S)

Department of Chemical Engineering, Texas A&M University, College Station, TX 77843.

Keiichiro Ishi (K)

Rist Inc, Meguro-ku, Tokyo 153-0063, Japan.

Youhan Lee (Y)

Kakao Brain, Seongnam, Gyeonggi-do, Republic of Korea.

Fatih Öztürk (F)

H2O, Istanbul, 3400, Turkey.

Anthony Chiu (A)

Clover Health, Hong Kong, 999077, PRC.

Emin Öztürk (E)

Afiniti, Istanbul, 3400, Turkey.

Karim Amer (K)

Center for Informatics Science, Nile University, Sheikh Zayed, Giza, Egypt, 12588.

Mohamed Fares (M)

National Research Centre, Dokki, Cairo, Egypt, 12622.

Eterna Participants (E)

Eterna Massive Open Laboratory.

Rhiju Das (R)

Department of Biochemistry, Stanford University, California 94305, USA.
Eterna Massive Open Laboratory.
Department of Physics, Stanford University, California 94305, USA.

MeSH classifications