MSLP: mRNA subcellular localization predictor based on machine learning techniques.
Localization prediction
Machine learning
RNA
Sequence analysis
Subcellular localization
mRNA
Journal
BMC bioinformatics
ISSN: 1471-2105
Abbreviated title: BMC Bioinformatics
Country: England
NLM ID: 100965194
Publication information
Publication date:
22 Mar 2023
History:
received: 04 Nov 2022
accepted: 15 Mar 2023
entrez: 23 Mar 2023
pubmed: 24 Mar 2023
medline: 25 Mar 2023
Status:
epublish
Abstract
Subcellular localization of messenger RNAs (mRNAs) plays a pivotal role in the regulation of gene expression, in cell migration, and in cellular adaptation. Experimental techniques for pinpointing the subcellular localization of mRNAs are laborious, time-consuming, and expensive. In silico approaches for this purpose are therefore attracting considerable attention in the RNA community. In this article, we propose MSLP, a machine learning-based method to predict the subcellular localization of mRNAs. MSLP uses a novel combination of four types of features as input to machine learning algorithms: k-mer, pseudo k-tuple nucleotide composition (PseKNC), physicochemical properties of nucleotides, and a 3D representation of sequences based on the Z-curve transformation. Using this combination of features, ensemble-based models achieved state-of-the-art results in mRNA subcellular localization prediction tasks on multiple benchmark datasets. We evaluated the performance of our method on ten subcellular locations: cytoplasm, nucleus, endoplasmic reticulum (ER), extracellular region (ExR), mitochondria, cytosol, pseudopodium, posterior, exosome, and ribosome. An ablation study showed that k-mer and PseKNC features were more dominant than the other features for predicting cytoplasm, nucleus, and ER localizations, whereas physicochemical properties and Z-curve-based features contributed the most to ExR and mitochondria detection. SHAP-based analysis revealed the relative importance of the features, providing better insight into the proposed approach. We have implemented a Docker container and an API so that end users can run their sequences on our model. The datasets, the API code, and the Docker container are shared with the community on GitHub at: https://github.com/smusleh/MSLP .
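To make two of the four feature types named in the abstract concrete, the following is a minimal Python sketch of k-mer frequencies and a frequency-based Z-curve summary. It is an illustration under stated assumptions, not the MSLP implementation: the function names, the choice of k = 3, the mapping of U to T, and the use of length-normalized Z-curve endpoint coordinates are all hypothetical choices made for this example. PseKNC and the physicochemical features are not shown.

```python
from itertools import product

ALPHABET = "ACGT"


def normalize(seq):
    """Upper-case the sequence and map RNA 'U' to 'T' (assumption for this sketch)."""
    return seq.upper().replace("U", "T")


def kmer_frequencies(seq, k=3):
    """Relative frequency of every possible k-mer, in lexicographic order."""
    seq = normalize(seq)
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows with ambiguous bases such as N
            counts[window] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]


def z_curve_endpoint(seq):
    """Length-normalized end point of the Z-curve (three values).

    x contrasts purines with pyrimidines, y amino with keto bases,
    and z weak with strong hydrogen bonding.
    """
    seq = normalize(seq)
    n = {b: seq.count(b) for b in ALPHABET}
    length = sum(n.values())
    if length == 0:
        return [0.0, 0.0, 0.0]
    x = (n["A"] + n["G"]) - (n["C"] + n["T"])
    y = (n["A"] + n["C"]) - (n["G"] + n["T"])
    z = (n["A"] + n["T"]) - (n["G"] + n["C"])
    return [x / length, y / length, z / length]


def feature_vector(seq, k=3):
    """Concatenate the k-mer and Z-curve features into one flat list."""
    return kmer_frequencies(seq, k) + z_curve_endpoint(seq)
```

With k = 3 the resulting vector has 64 k-mer entries followed by 3 Z-curve entries; such a vector could then be passed to any classifier that accepts fixed-length numeric input.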
Identifiers
pubmed: 36949389
doi: 10.1186/s12859-023-05232-0
pii: 10.1186/s12859-023-05232-0
pmc: PMC10035125
Chemical substances
RNA, Messenger
0
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
109
Comments and corrections
Type: ErratumIn
Copyright information
© 2023. The Author(s).