Mining algorithm of accumulation sequence of unbalanced data based on probability matrix decomposition.


Journal

PloS one
ISSN: 1932-6203
Titre abrégé: PLoS One
Pays: United States
ID NLM: 101285081

Informations de publication

Date de publication:
2023
Historique:
received: 12 01 2023
accepted: 20 06 2023
medline: 10 7 2023
pubmed: 7 7 2023
entrez: 7 7 2023
Statut: epublish

Résumé

Due to the inherent characteristics of accumulation sequence of unbalanced data, the mining results of this kind of data are often affected by a large number of categories, resulting in the decline of mining performance. To solve the above problems, the performance of data cumulative sequence mining is optimized. The algorithm for mining cumulative sequence of unbalanced data based on probability matrix decomposition is studied. The natural nearest neighbor of a few samples in the unbalanced data cumulative sequence is determined, and the few samples in the unbalanced data cumulative sequence are clustered according to the natural nearest neighbor relationship. In the same cluster, new samples are generated from the core points of dense regions and non core points of sparse regions, and then new samples are added to the original data accumulation sequence to balance the data accumulation sequence. The probability matrix decomposition method is used to generate two random number matrices with Gaussian distribution in the cumulative sequence of balanced data, and the linear combination of low dimensional eigenvectors is used to explain the preference of specific users for the data sequence; At the same time, from a global perspective, the AdaBoost idea is used to adaptively adjust the sample weight and optimize the probability matrix decomposition algorithm. Experimental results show that the algorithm can effectively generate new samples, improve the imbalance of data accumulation sequence, and obtain more accurate mining results. Optimizing global errors as well as more efficient single-sample errors. When the decomposition dimension is 5, the minimum RMSE is obtained. The proposed algorithm has good classification performance for the cumulative sequence of balanced data, and the average ranking of index F value, G mean and AUC is the best.

Identifiants

pubmed: 37418443
doi: 10.1371/journal.pone.0288140
pii: PONE-D-23-00970
pmc: PMC10328230
doi:

Types de publication

Journal Article

Langues

eng

Sous-ensembles de citation

IM

Pagination

e0288140

Informations de copyright

Copyright: © 2023 Mou, Zhang. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Déclaration de conflit d'intérêts

The authors have declared that no competing interests exist.

Références

Sensors (Basel). 2020 Apr 25;20(9):
pubmed: 32344855
Sensors (Basel). 2020 Jun 02;20(11):
pubmed: 32498271
Sensors (Basel). 2020 Aug 23;20(17):
pubmed: 32842459

Auteurs

Shaoxia Mou (S)

University of Perpetual Help System Dalta, Graduate School Eternal University, Las Piñas, Philippines.

Heming Zhang (H)

Binzhou Medical University Hospital, Binzhou, Shandong, China.

Articles similaires

Selecting optimal software code descriptors-The case of Java.

Yegor Bugayenko, Zamira Kholmatova, Artem Kruglov et al.
1.00
Software Algorithms Programming Languages
1.00
Humans Magnetic Resonance Imaging Brain Infant, Newborn Infant, Premature
Humans Algorithms Software Artificial Intelligence Computer Simulation

Unsupervised learning for real-time and continuous gait phase detection.

Dollaporn Anopas, Yodchanan Wongsawat, Jetsada Arnin
1.00
Humans Gait Neural Networks, Computer Unsupervised Machine Learning Walking

Classifications MeSH