A robust deep learning approach for identification of RNA 5-methyluridine sites.
Deep-learning
Physicochemical properties
Principal component analysis
RNA 5-methyluridine
RNA modifications
Transcript RNA
Journal
Scientific reports
ISSN: 2045-2322
Abbreviated title: Sci Rep
Country: England
NLM ID: 101563288
Publication information
Publication date:
28 Oct 2024
History:
received: 16 May 2024
accepted: 10 Oct 2024
medline: 28 Oct 2024
pubmed: 28 Oct 2024
entrez: 28 Oct 2024
Status:
epublish
Abstract
RNA 5-methyluridine (m5U) sites are important for understanding RNA modifications, which influence numerous biological processes such as gene expression and cellular function. Identifying m5U sites is therefore important for understanding the integrity, structure, and function of RNA molecules. This study introduces GRUpred-m5U, a novel deep learning framework based on a gated recurrent unit (GRU), developed on mature RNA and full transcript RNA datasets. We used three descriptor groups (nucleic acid composition, pseudo nucleic acid composition, and physicochemical properties), covering five feature extraction methods: ENAC, Kmer, DPCP, DPCP type 2, and PseDNC. We first aggregated the outputs of all five feature extraction methods into a single merged feature set. Three hybrid models were then developed using deep learning methods and evaluated through 10-fold cross-validation with seven evaluation metrics. In this evaluation, the GRUpred-m5U model outperformed the other applied models, obtaining 98.41% and 96.70% accuracy on the two datasets, respectively. To our knowledge, the proposed model outperforms the existing state-of-the-art methods. The proposed supervised model was also examined with unsupervised machine learning techniques such as principal component analysis (PCA), which indicated that the method provides valid performance for identifying m5U. Given its multi-layered construction, the GRUpred-m5U model has considerable potential for future biological applications. The model, composed of neurons that process complex input, excelled at pattern recognition and produced reliable results. Despite its larger size, the model obtained accurate results, which is essential for detecting m5U.
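The feature-merging step described in the abstract can be illustrated with a minimal sketch of two of the five named descriptors, Kmer and ENAC, concatenated into one vector. This is not the authors' implementation: the function names, the choice of k = 2, and the window size of 5 are assumptions made for illustration, and the definitions follow the common usage of these descriptors (normalized k-mer frequency; nucleotide composition within a sliding window).

```python
from itertools import product

ALPHABET = "ACGU"  # RNA nucleotides


def kmer_frequencies(seq, k=2):
    """Normalized frequency of every k-mer over ACGU, in lexicographic order.

    k=2 is an illustrative default, not a value taken from the paper.
    """
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = len(seq) - k + 1  # number of overlapping k-mers in seq
    if total <= 0:
        return [0.0] * len(kmers)
    for i in range(total):
        sub = seq[i:i + k]
        if sub in counts:  # skip k-mers containing non-ACGU symbols
            counts[sub] += 1
    return [counts[m] / total for m in kmers]


def enac(seq, window=5):
    """Nucleotide composition inside each sliding window (ENAC-style).

    window=5 is an illustrative default, not a value taken from the paper.
    """
    features = []
    for start in range(len(seq) - window + 1):
        chunk = seq[start:start + window]
        features.extend(chunk.count(base) / window for base in ALPHABET)
    return features


def merged_features(seq, k=2, window=5):
    """Concatenate both descriptors into a single merged feature vector."""
    return kmer_frequencies(seq, k) + enac(seq, window)
```

For a 9-nucleotide input such as "ACGUACGUA", `merged_features` yields 16 k-mer values followed by 20 ENAC values. In the full pipeline the remaining descriptors (DPCP, DPCP type 2, PseDNC) would be appended in the same way before the vector is passed to the classifier.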
Identifiers
pubmed: 39465261
doi: 10.1038/s41598-024-76148-9
pii: 10.1038/s41598-024-76148-9
Chemical substances
RNA
63231-63-0
Uridine
WHI7HQ7H85
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
25688
Copyright information
© 2024. The Author(s).