SaPt-CNN-LSTM-AR-EA: a hybrid ensemble learning framework for time series-based multivariate DNA sequence prediction.

Biological sequence; DNA sequence; Data mining; Ensemble learning; Time series

Journal

PeerJ
ISSN: 2167-8359
Abbreviated title: PeerJ
Country: United States
NLM ID: 101603425

Publication information

Publication date:
2023
History:
received: 31 03 2023
accepted: 06 09 2023
medline: 1 11 2023
pubmed: 9 10 2023
entrez: 9 10 2023
Status: epublish

Abstract

Biological sequence data mining is a hot spot in bioinformatics. A biological sequence can be regarded as a set of characters, and time series resemble biological sequences in both representation and mechanism. In this article, biological sequences are therefore represented as time series to obtain biological time sequences (BTS). A hybrid ensemble learning framework (SaPt-CNN-LSTM-AR-EA) for BTS is proposed. Single-sequence and multi-sequence models are constructed, respectively, with a self-adaption pre-training one-dimensional convolutional recurrent neural network and an autoregressive fractionally integrated moving average model fused with an evolutionary algorithm. In DNA sequence experiments with six viruses, SaPt-CNN-LSTM-AR-EA achieved good overall prediction performance, with prediction accuracy and correlation reaching 1.7073 and 0.9186, respectively. SaPt-CNN-LSTM-AR-EA was compared with five other benchmark models to verify its effectiveness and stability, and it increased the average accuracy by about 30%. The framework proposed in this article is significant in biology, biomedicine, and computer science, and can be widely applied in sequence splicing, computational biology, bioinformation, and other fields.

Identifiers

pubmed: 37810796
doi: 10.7717/peerj.16192
pii: 16192
pmc: PMC10559882

Publication types

Journal Article; Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Pagination

e16192

Copyright information

© 2023 Yan et al.

Conflict of interest statement

The authors declare that they have no competing interests.

Authors

Wu Yan (W)

School of Biotechnology, Jiangsu University of Science & Technology, Zhenjiang, China.
School of Mathematics and Computer Science, Gannan Normal University, Ganzhou, Jiangxi, China.
Sericultural Research Institute, Chinese Academy of Agricultural Sciences, Zhenjiang, Jiangsu, China.

Li Tan (L)

College of Physics and Electronic Information, Gannan Normal University, Ganzhou, China.

Li Meng-Shan (L)

College of Physics and Electronic Information, Gannan Normal University, Ganzhou, China.

Sheng Sheng (S)

School of Biotechnology, Jiangsu University of Science & Technology, Zhenjiang, China.
Sericultural Research Institute, Chinese Academy of Agricultural Sciences, Zhenjiang, Jiangsu, China.

Wang Jun (W)

School of Biotechnology, Jiangsu University of Science & Technology, Zhenjiang, China.
Sericultural Research Institute, Chinese Academy of Agricultural Sciences, Zhenjiang, Jiangsu, China.

Wu Fu-An (W)

School of Biotechnology, Jiangsu University of Science & Technology, Zhenjiang, China.
Sericultural Research Institute, Chinese Academy of Agricultural Sciences, Zhenjiang, Jiangsu, China.
