MSLP: mRNA subcellular localization predictor based on machine learning techniques.

Localization prediction; Machine learning; RNA; Sequence analysis; Subcellular localization; mRNA

Journal

BMC bioinformatics
ISSN: 1471-2105
Abbreviated title: BMC Bioinformatics
Country: England
NLM ID: 100965194

Publication information

Publication date:
22 Mar 2023
History:
received: 04 Nov 2022
accepted: 15 Mar 2023
entrez: 23 Mar 2023
pubmed: 24 Mar 2023
medline: 25 Mar 2023
Status: epublish

Abstract

Subcellular localization of messenger RNAs (mRNAs) plays a pivotal role in the regulation of gene expression, in cell migration, and in cellular adaptation. Experimental techniques for pinpointing the subcellular localization of mRNAs are laborious, time-consuming, and expensive. In silico approaches for this purpose are therefore attracting considerable attention in the RNA community. In this article, we propose MSLP, a machine learning-based method to predict the subcellular localization of mRNAs. MSLP uses a novel combination of four types of features as input to machine learning algorithms: k-mer composition, pseudo k-tuple nucleotide composition (PseKNC), physicochemical properties of nucleotides, and a 3D representation of sequences based on the Z-curve transformation. Using this combination of features, ensemble-based models achieved state-of-the-art results in mRNA subcellular localization prediction tasks on multiple benchmark datasets. We evaluated the performance of our method in ten subcellular locations: cytoplasm, nucleus, endoplasmic reticulum (ER), extracellular region (ExR), mitochondria, cytosol, pseudopodium, posterior, exosome, and ribosome. An ablation study showed that k-mer and PseKNC features were more dominant than the other features for predicting cytoplasm, nucleus, and ER localization. In contrast, physicochemical properties and Z-curve-based features contributed the most to ExR and mitochondria detection. SHAP-based analysis revealed the relative importance of features and provided better insight into the proposed approach. We have implemented a Docker container and an API so that end users can run their sequences on our model. The datasets, the API code, and the Docker container are shared with the community on GitHub at: https://github.com/smusleh/MSLP .
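
To make two of the four feature types named in the abstract more concrete, the following is a minimal Python sketch of k-mer frequencies and of the standard Z-curve transformation. It is an illustration only and not the authors' implementation: the function names `kmer_frequencies` and `z_curve`, the default k = 3, and the normalization are assumptions, and the PseKNC and physicochemical features are omitted.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq: str, k: int = 3) -> dict[str, float]:
    """Relative frequency of every possible k-mer over the alphabet ACGU.

    Hypothetical helper: windows containing other symbols (e.g. N) count
    toward the total but get no entry in the result.
    """
    seq = seq.upper().replace("T", "U")
    windows = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(windows)
    total = max(len(windows), 1)  # avoid division by zero on short input
    return {
        "".join(p): counts["".join(p)] / total
        for p in product("ACGU", repeat=k)
    }


def z_curve(seq: str) -> list[tuple[int, int, int]]:
    """Cumulative Z-curve coordinates (x, y, z) after each nucleotide.

    x: purine (A, G) vs. pyrimidine (C, U)
    y: amino (A, C) vs. keto (G, U)
    z: weak (A, U) vs. strong (G, C) hydrogen bonding
    """
    seq = seq.upper().replace("T", "U")
    a = c = g = u = 0
    points = []
    for base in seq:
        a += base == "A"
        c += base == "C"
        g += base == "G"
        u += base == "U"
        points.append(((a + g) - (c + u), (a + c) - (g + u), (a + u) - (g + c)))
    return points


if __name__ == "__main__":
    example = "AUGGCUAAC"  # made-up sequence, for illustration only
    print(kmer_frequencies(example, k=2)["AU"])
    print(z_curve(example)[-1])
```

In a pipeline of the kind the abstract describes, vectors such as these would be concatenated with the remaining feature types before being passed to a classifier; how MSLP summarizes the Z-curve into fixed-length features is not stated here.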


Identifiers

pubmed: 36949389
doi: 10.1186/s12859-023-05232-0
pii: 10.1186/s12859-023-05232-0
pmc: PMC10035125

Chemical substances

RNA, Messenger 0

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

109

Comments and corrections

Type: ErratumIn

Copyright information

© 2023. The Author(s).

Authors

Saleh Musleh (S)

College of Science and Engineering, Hamad Bin Khalifa University, Doha, Qatar.

Mohammad Tariqul Islam (MT)

Computer Science Department, Southern Connecticut State University, New Haven, CT, USA.

Rizwan Qureshi (R)

College of Science and Engineering, Hamad Bin Khalifa University, Doha, Qatar.

Nehad M Alajez (NM)

Translational Cancer and Immunity Center (TCIC), Qatar Biomedical Research Institute (QBRI), Hamad Bin Khalifa University, Doha, Qatar.
College of Health and Life Sciences, Hamad Bin Khalifa University, Doha, Qatar.

Tanvir Alam (T)

College of Science and Engineering, Hamad Bin Khalifa University, Doha, Qatar. talam@hbku.edu.qa.
