R-HEFS: Rough set based heterogeneous ensemble feature selection method for medical data classification.

Classification; Ensemble feature selection; Medical data; Rough set; Stability

Journal

Artificial intelligence in medicine
ISSN: 1873-2860
Abbreviated title: Artif Intell Med
Country: Netherlands
NLM ID: 8915031

Publication information

Publication date:
04 2021
History:
received: 04 06 2020
revised: 11 02 2021
accepted: 21 02 2021
entrez: 20 4 2021
pubmed: 21 4 2021
medline: 19 8 2021
Status: ppublish

Abstract

Feature selection is a reliable dimensionality reduction technique that selects a subset of relevant, non-redundant features from large datasets. Ensemble feature selection (EFS) is a recent approach that aims to capture diversity in the selected feature subsets; it improves the performance of learning algorithms and yields more stable and robust results. In this paper, a novel rough set theory (RST) based heterogeneous EFS method (R-HEFS) is proposed. While aggregating diverse feature subsets, it selects the less redundant and highly relevant features by applying feature-class rough dependency, feature-feature rough dependency, and feature-significance measures. R-HEFS uses five state-of-the-art RST-based filter methods as base feature selectors. Experiments are carried out on 10 benchmark medical datasets collected from the UCI repository. Missing values are imputed with the k-nearest-neighbor (kNN) imputation method, and continuous features are discretized with RST-based discretization techniques. The effectiveness of the proposed R-HEFS method is evaluated and analyzed using four benchmark classifiers, viz., Naïve Bayes (NB), random forest (RF), support vector machine (SVM), and AdaBoost. R-HEFS proves effective at removing irrelevant and redundant features while aggregating the outputs of the base feature selectors, which helps to increase classification accuracy. On 7 of the 10 medical datasets, R-HEFS achieved better average classification accuracy. Overall, the results strongly suggest that the proposed R-HEFS method can reduce the dimensionality of large medical datasets and may help physicians and medical experts to diagnose (classify) different diseases at lower computational cost.
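
The abstract names rough dependency and feature-significance measures without giving formulas. As an illustration only, the sketch below uses the standard rough set definitions: the dependency degree is the fraction of objects in the positive region (objects whose equivalence class under the chosen features is consistent with a single class label), and the significance of a feature is the drop in dependency when it is removed. The function names, the toy data, and the feature names are assumptions made for this example; this is not the authors' R-HEFS implementation or their aggregation procedure.

```python
from collections import defaultdict


def partition(rows, attrs):
    """Group row indices into equivalence classes by their values on attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())


def dependency(rows, labels, attrs):
    """Standard rough dependency degree: |positive region| / |universe|."""
    if not rows:
        return 0.0
    positive = 0
    for block in partition(rows, attrs):
        # A block belongs to the positive region if all its rows share one label.
        if len({labels[i] for i in block}) == 1:
            positive += len(block)
    return positive / len(rows)


def significance(rows, labels, attrs, a):
    """Drop in dependency when feature a is removed from attrs."""
    rest = [b for b in attrs if b != a]
    return dependency(rows, labels, attrs) - dependency(rows, labels, rest)


# Hypothetical, already-discretized toy data (not taken from the paper).
rows = [
    {"fever": "high", "cough": "yes"},
    {"fever": "high", "cough": "no"},
    {"fever": "low", "cough": "yes"},
    {"fever": "low", "cough": "no"},
]
labels = ["sick", "sick", "sick", "healthy"]

gamma_fever = dependency(rows, labels, ["fever"])          # feature-class dependency
gamma_both = dependency(rows, labels, ["fever", "cough"])
sig_fever = significance(rows, labels, ["fever", "cough"], "fever")
```

Under the same assumed definition, a feature-feature dependency can be obtained by passing another feature's column in place of `labels`, for example `dependency(rows, [r["cough"] for r in rows], ["fever"])`.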

Identifiers

pubmed: 33875164
pii: S0933-3657(21)00042-7
doi: 10.1016/j.artmed.2021.102049

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

102049

Copyright information

Copyright © 2021 Elsevier B.V. All rights reserved.

Authors

Rubul Kumar Bania (RK)

Department of Computer Application, North-Eastern Hill University, Tura Campus, Tura 794002, Meghalaya, India. Electronic address: rubul.bania@nehu.ac.in.

Anindya Halder (A)

Department of Computer Application, North-Eastern Hill University, Tura Campus, Tura 794002, Meghalaya, India. Electronic address: anindya.halder@gmail.com.
