SaPt-CNN-LSTM-AR-EA: a hybrid ensemble learning framework for time series-based multivariate DNA sequence prediction.
Biological sequence
DNA sequence
Data mining
Ensemble learning
Time series
Journal
PeerJ
ISSN: 2167-8359
Abbreviated title: PeerJ
Country: United States
NLM ID: 101603425
Publication information
Publication date: 2023
History:
received: 2023-03-31
accepted: 2023-09-06
medline: 2023-11-01
pubmed: 2023-10-09
entrez: 2023-10-09
Status: epublish
Abstract
Biological sequence data mining is a hot spot in bioinformatics. A biological sequence can be regarded as a set of characters, and time series are similar to biological sequences in both representation and mechanism. In this article, biological sequences are therefore represented as time series to obtain biological time sequences (BTS), and a hybrid ensemble learning framework for BTS (SaPt-CNN-LSTM-AR-EA) is proposed. Single-sequence and multi-sequence models are constructed, respectively, with a self-adaption pre-training one-dimensional convolutional recurrent neural network and an autoregressive fractional integrated moving average model fused with an evolutionary algorithm. In DNA sequence experiments with six viruses, SaPt-CNN-LSTM-AR-EA achieved good overall prediction performance, with prediction accuracy and correlation reaching 1.7073 and 0.9186, respectively. SaPt-CNN-LSTM-AR-EA was compared with five other benchmark models to verify its effectiveness and stability, and it increased the average accuracy by about 30%. The framework proposed in this article is significant in biology, biomedicine, and computer science, and can be widely applied in sequence splicing, computational biology, bioinformation, and other fields.
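The idea of treating a DNA sequence as a time series can be illustrated with a minimal sketch. The abstract does not state which numeric encoding the authors use, so the mapping `NUCLEOTIDE_CODES` and the helper names `dna_to_series` and `sliding_windows` below are hypothetical and chosen only for illustration. The sketch maps each nucleotide to a number and then builds (window, next value) pairs of the kind a sequence predictor could be trained on.

```python
from typing import Dict, List, Tuple

# Hypothetical nucleotide-to-number mapping (an assumption for illustration;
# the abstract does not specify the encoding used in the paper).
NUCLEOTIDE_CODES: Dict[str, int] = {"A": 1, "C": 2, "G": 3, "T": 4}


def dna_to_series(sequence: str) -> List[int]:
    """Map each nucleotide to a number, giving one value per sequence position."""
    return [NUCLEOTIDE_CODES[base] for base in sequence.upper()]


def sliding_windows(series: List[int], width: int) -> List[Tuple[List[int], int]]:
    """Pair each run of `width` consecutive values with the value that follows it."""
    return [
        (series[i:i + width], series[i + width])
        for i in range(len(series) - width)
    ]


series = dna_to_series("ACGTTGCA")
pairs = sliding_windows(series, width=3)
```

Each pair holds three consecutive encoded positions and the position that follows them, which is the usual supervised framing for one-step-ahead time series prediction. Any real pipeline would replace the toy encoding with the representation described in the full article.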
Identifiers
pubmed: 37810796
doi: 10.7717/peerj.16192
pii: 16192
pmc: PMC10559882
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
e16192
Copyright information
© 2023 Yan et al.
Conflict of interest statement
The authors declare that they have no competing interests.