IDDLncLoc: Subcellular Localization of LncRNAs Based on a Framework for Imbalanced Data Distributions.

Ensemble model; Imbalanced learning; Sequence feature; Subcellular localization of lncRNA

Journal

Interdisciplinary sciences, computational life sciences
ISSN: 1867-1462
Abbreviated title: Interdiscip Sci
Country: Germany
NLM ID: 101515919

Publication information

Publication date:
Jun 2022
History:
received: 21 05 2021
accepted: 20 12 2021
revised: 16 12 2021
pubmed: 23 2 2022
medline: 25 5 2022
entrez: 22 2 2022
Status: ppublish

Abstract

Long non-coding RNAs (lncRNAs) play a crucial role in many cellular processes, acting as genetic markers and participating in RNA splicing, signaling, and protein regulation. Because identifying the localization of an lncRNA in the cell through experimental methods is complicated, hard to reproduce, and expensive, we propose a novel method named IDDLncLoc, which adopts an ensemble model to predict subcellular localization. In the proposed model, dinucleotide-based auto-cross covariance features, k-mer nucleotide composition features, and composition, transition, and distribution features are introduced to encode a raw RNA sequence into a vector. To screen out reliable features, feature selection based on the binomial distribution and recursive feature elimination are employed. Furthermore, mini-batch oversampling, random sampling, and stacking ensemble strategies are customized to overcome the data imbalance in the benchmark dataset. Finally, compared with the latest methods, IDDLncLoc achieves an accuracy of 94.96% on the benchmark dataset, which is 2.59% higher than the best existing method, and the results further demonstrate that IDDLncLoc performs well on the subcellular localization of lncRNA. In addition, a user-friendly web server is available at http://lncloc.club.
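As an illustration of one of the encodings named in the abstract, the following is a minimal sketch of a k-mer nucleotide composition vector. It is not the authors' implementation: the function name `kmer_composition`, the choice of `k`, the `ACGU` alphabet, the conversion of `T` to `U`, and the handling of ambiguous bases are all assumptions made for this example.

```python
from itertools import product


def kmer_composition(seq, k=3, alphabet="ACGU"):
    """Return normalized k-mer frequencies for one sequence.

    Hypothetical helper for illustration only. The output has
    len(alphabet) ** k entries, ordered as itertools.product
    enumerates the k-mers over the given alphabet.
    """
    # Assumption: DNA-style input is mapped to the RNA alphabet.
    seq = seq.upper().replace("T", "U")

    # Enumerate every possible k-mer and remember its position.
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}

    counts = [0] * len(kmers)
    total = 0
    # Slide a window of width k over the sequence, one base at a time.
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:  # assumption: skip windows with ambiguous bases (e.g. N)
            counts[j] += 1
            total += 1

    if total == 0:
        return [0.0] * len(kmers)
    # Normalize counts so the vector sums to 1.
    return [c / total for c in counts]


if __name__ == "__main__":
    vec = kmer_composition("ACGUACGU", k=2)
    print(len(vec), vec[1])
```

Normalizing by the number of valid windows makes vectors comparable across sequences of different lengths; whether the published method normalizes in this way is not stated in the abstract.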

Identifiers

pubmed: 35192174
doi: 10.1007/s12539-021-00497-6
pii: 10.1007/s12539-021-00497-6

Chemical substances

Nucleotides 0
Proteins 0
RNA, Long Noncoding 0

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

409-420

Grants

Agency: the National Natural Science Foundation of China
ID: No. 62072212
Agency: the National Natural Science Foundation of China
ID: 61902144
Agency: the Development Project of Jilin Province of China
ID: Nos. 20200401083GX
Agency: the Development Project of Jilin Province of China
ID: 2020C003
Agency: the Development Project of Jilin Province of China
ID: 20200403172SF
Agency: Chinese Postdoctoral Science Foundation
ID: No. 801212011421

Copyright information

© 2022. International Association of Scientists in the Interdisciplinary Areas.

Authors

Yan Wang (Y)

Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China.
School of Artificial Intelligence, Jilin University, Changchun, China.

Xiaopeng Zhu (X)

Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China.

Lili Yang (L)

Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China.
Department of Obstetrics, The First Hospital of Jilin University, Changchun, China.

Xuemei Hu (X)

Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China.

Kai He (K)

Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China.

Cuinan Yu (C)

Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China.

Shaoqing Jiao (S)

Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China.

Jiali Chen (J)

Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China.

Rui Guo (R)

Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China.

Sen Yang (S)

Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China. ystop2020@gmail.com.
