R-HEFS: Rough set based heterogeneous ensemble feature selection method for medical data classification.
Classification
Ensemble feature selection
Medical data
Rough set
Stability
Journal
Artificial intelligence in medicine
ISSN: 1873-2860
Abbreviated title: Artif Intell Med
Country: Netherlands
NLM ID: 8915031
Publication information
Publication date:
04 2021
History:
received: 2020-06-04
revised: 2021-02-11
accepted: 2021-02-21
entrez: 2021-04-20
pubmed: 2021-04-21
medline: 2021-08-19
Status:
ppublish
Abstract
Feature selection is a reliable dimensionality reduction technique that selects a subset of relevant, non-redundant features from large datasets. Ensemble feature selection (EFS) is a recent approach that aims to accumulate diversity in the selected feature subsets. It improves the performance of learning algorithms and yields more stable and robust results. In this paper, a novel rough set theory (RST) based heterogeneous EFS method (R-HEFS) is proposed. While aggregating diverse feature subsets, it selects the less redundant and highly relevant features by applying feature-class rough dependency, feature-feature rough dependency, and feature-significance measures. R-HEFS uses five state-of-the-art RST based filter methods as base feature selectors. Experiments are carried out on 10 benchmark medical datasets collected from the UCI repository. Missing values are imputed with the k nearest neighbor (kNN) imputation method, and continuous features are discretized with RST based discretization techniques. The effectiveness of the proposed R-HEFS method is evaluated and analyzed using four benchmark classifiers, viz., Naïve Bayes (NB), random forest (RF), support vector machine (SVM), and AdaBoost. The proposed R-HEFS method proves effective at removing non-relevant and redundant features while aggregating the base feature selectors, and it helps increase classification accuracy. On 7 of the 10 medical datasets, R-HEFS achieved better average classification accuracy. Overall, the results strongly suggest that the proposed R-HEFS method can reduce the dimension of large medical datasets and may help physicians or medical experts diagnose (classify) different diseases with lower computational complexity.
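The rough dependency and feature-significance measures named in the abstract can be illustrated with the standard rough set definitions: the dependency degree is the fraction of objects in the positive region, and the significance of a feature is the drop in dependency when that feature is removed. The following is a minimal sketch under those textbook definitions, not the authors' R-HEFS implementation; the function names `dependency` and `significance` and the toy table are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def dependency(rows, features, decision):
    """Rough dependency degree: |POS_B(D)| / |U| for feature set B = `features`."""
    if not rows:
        return 0.0
    # Group objects into equivalence classes by their values on `features`.
    classes = defaultdict(list)
    for row in rows:
        classes[tuple(row[f] for f in features)].append(row[decision])
    # A class belongs to the positive region when all its objects share one decision.
    positive = sum(len(labels) for labels in classes.values()
                   if len(set(labels)) == 1)
    return positive / len(rows)


def significance(rows, features, feature, decision):
    """Drop in dependency degree when `feature` is removed from `features`."""
    rest = [f for f in features if f != feature]
    return dependency(rows, features, decision) - dependency(rows, rest, decision)


# Made-up, already discretized toy table (assumption; not data from the paper).
toy = [
    {"fever": "high", "cough": "yes", "flu": "yes"},
    {"fever": "high", "cough": "no", "flu": "yes"},
    {"fever": "low", "cough": "yes", "flu": "no"},
    {"fever": "low", "cough": "no", "flu": "no"},
]

gamma_fever = dependency(toy, ["fever"], "flu")
gamma_cough = dependency(toy, ["cough"], "flu")
sig_cough = significance(toy, ["fever", "cough"], "cough", "flu")
```

In this toy table `fever` alone determines the decision, so its dependency degree is 1.0, while `cough` has dependency 0.0 and zero significance alongside `fever`; a filter built on these measures would treat `cough` as a candidate for removal.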
Identifiers
pubmed: 33875164
pii: S0933-3657(21)00042-7
doi: 10.1016/j.artmed.2021.102049
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
102049
Copyright information
Copyright © 2021 Elsevier B.V. All rights reserved.