A quantum-based oversampling method for classification of highly imbalanced and overlapped data.

Keywords: classification; class imbalance; class overlapping; oversampling; quantum potential energy

Journal

Experimental biology and medicine (Maywood, N.J.)
ISSN: 1535-3699
Abbreviated title: Exp Biol Med (Maywood)
Country: England
NLM ID: 100973463

Publication information

Publication date:
28 Jan 2024
History:
medline: 28 Jan 2024
pubmed: 28 Jan 2024
entrez: 28 Jan 2024
Status: aheadofprint

Abstract

Data imbalance is a challenging problem in classification tasks, and when combined with class overlapping, it further deteriorates classification performance. However, existing studies have rarely addressed both issues simultaneously. In this article, we propose a novel quantum-based oversampling method (QOSM) to effectively tackle data imbalance and class overlapping, thereby improving classification performance. QOSM uses quantum potential theory to calculate the potential energy of each sample and selects the sample with the lowest potential as the center of each cover generated by a constructive covering algorithm. This approach optimizes cover center selection and better captures the distribution of the original samples, particularly in the overlapping regions. In addition, oversampling is performed on the samples of the minority class covers to mitigate the imbalance ratio (IR). We evaluated QOSM using three traditional classifiers (support vector machines [SVM], k-nearest neighbor [KNN], and naive Bayes [NB] classifier) on 10 publicly available KEEL data sets characterized by high IRs and varying degrees of overlap. Experimental results demonstrate that QOSM significantly improves classification accuracy compared to approaches that do not address class imbalance and overlapping. Moreover, QOSM consistently outperforms the existing oversampling methods tested. With its compatibility with different classifiers, QOSM exhibits promising potential to improve the classification performance of highly imbalanced and overlapped data.
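
The pipeline described in the abstract (potential per sample, lowest-potential sample as cover center, oversampling inside minority covers) can be illustrated with a rough sketch. This is not the authors' implementation. It assumes the Horn-Gottlieb quantum clustering form of the potential with a Gaussian width `sigma`, computed over all samples regardless of class; a simplified Euclidean covering rule in which a cover extends up to the nearest sample of another class; and SMOTE-style linear interpolation between two samples of the same minority cover. The function names `quantum_potential`, `build_covers`, and `oversample_minority`, and all parameters, are hypothetical.

```python
import numpy as np

def quantum_potential(X, sigma=1.0):
    """Potential at each sample, up to an additive constant.

    Assumed form (Horn-Gottlieb quantum clustering):
    V(x) = sum_i d_i^2 * exp(-d_i^2 / (2 sigma^2)) / (2 sigma^2 * psi(x)),
    where d_i = ||x - x_i|| and psi(x) = sum_i exp(-d_i^2 / (2 sigma^2)).
    """
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)  # pairwise squared distances
    w = np.exp(-sq / (2.0 * sigma ** 2))                     # Gaussian kernel weights
    return (sq * w).sum(axis=1) / (2.0 * sigma ** 2 * w.sum(axis=1))

def build_covers(X, y, sigma=1.0):
    """Greedy covering (simplified assumption, not the published algorithm).

    Each cover is centered on the uncovered sample with the lowest potential
    and holds uncovered same-class samples that lie closer to the center than
    the nearest sample of any other class.
    """
    V = quantum_potential(X, sigma)
    uncovered = np.ones(len(X), dtype=bool)
    covers = []
    while uncovered.any():
        idx = np.flatnonzero(uncovered)
        c = idx[np.argmin(V[idx])]                 # lowest-potential uncovered sample
        d = np.linalg.norm(X - X[c], axis=1)
        other = y != y[c]
        d_other = d[other].min() if other.any() else np.inf
        members = uncovered & (y == y[c]) & (d < d_other)
        members[c] = True                          # the center always belongs to its cover
        covers.append((y[c], np.flatnonzero(members)))
        uncovered[members] = False
    return covers

def oversample_minority(X, y, minority, n_new, sigma=1.0, seed=0):
    """Add n_new synthetic minority samples by interpolating inside minority covers."""
    rng = np.random.default_rng(seed)
    covers = [m for label, m in build_covers(X, y, sigma) if label == minority]
    sizes = np.array([len(m) for m in covers], dtype=float)
    synth = []
    for _ in range(n_new):
        # Pick a cover with probability proportional to its size.
        m = covers[rng.choice(len(covers), p=sizes / sizes.sum())]
        # Draw two members with replacement so single-sample covers still work.
        a, b = X[rng.choice(m, size=2)]
        synth.append(a + rng.random() * (b - a))
    synth = np.array(synth).reshape(-1, X.shape[1])
    return np.vstack([X, synth]), np.concatenate([y, np.full(n_new, minority)])
```

Under these assumptions, calling `oversample_minority(X, y, minority=1, n_new=k)` with `k` equal to the gap between the majority and minority counts would bring the imbalance ratio to 1. Because each synthetic point is interpolated between two members of one cover, new samples stay inside regions that the covering step already assigned to the minority class, which is the intuition the abstract gives for handling overlap.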

Identifiers

pubmed: 38281087
doi: 10.1177/15353702231220665

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

15353702231220665

Conflict of interest statement

Declaration of conflicting interests: The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Authors

Bei Yang (B)

School of Computer and Artificial Intelligence, National Supercomputing Center in Zhengzhou, Zhengzhou University, Zhengzhou 450001, China.

Guilan Tian (G)

School of Computer and Artificial Intelligence, National Supercomputing Center in Zhengzhou, Zhengzhou University, Zhengzhou 450001, China.

Joseph Luttrell (J)

School of Computing Sciences and Computer Engineering, University of Southern Mississippi, Hattiesburg, MS 39406, USA.

Ping Gong (P)

Environmental Lab, U.S. Army Engineer Research and Development Center, Vicksburg, MS 39180, USA.

Chaoyang Zhang (C)

School of Computing Sciences and Computer Engineering, University of Southern Mississippi, Hattiesburg, MS 39406, USA.

MeSH classifications