Semiparametric Clustering: A Robust Alternative to Parametric Clustering.


Journal

IEEE transactions on neural networks and learning systems
ISSN: 2162-2388
Abbreviated title: IEEE Trans Neural Netw Learn Syst
Country: United States
NLM ID: 101616214

Publication information

Publication date:
Sep 2019
History:
pubmed: 4 1 2019
medline: 4 1 2019
entrez: 4 1 2019
Status: ppublish

Abstract

Clustering aims to group data naturally according to the underlying data distribution. This distribution is usually estimated with a parametric or a nonparametric model, e.g., a Gaussian mixture or kernel density estimation. Compared with nonparametric models, parametric models are statistically stable, i.e., a small perturbation of the data points leads to only a small change in the estimated density. However, parametric models are highly sensitive to outliers, because in the presence of outliers the data distribution departs substantially from the parametric assumptions. Given a parametric clustering algorithm, this paper shows how to turn it into a robust one. The idea is to replace the original parametric density with a semiparametric one. The high-density data, which form the core of each cluster, are modeled with the original parametric density. The low-density data are often far from the cluster cores and may have an arbitrary shape, so they are modeled with a nonparametric density. A combination of parametric and nonparametric clustering algorithms is then used to group the data modeled by the semiparametric density. From the standpoint of robust statistics, the proposed method has good robustness properties. We test the proposed algorithm on several synthetic data sets and 70 UCI data sets. The results indicate that the semiparametric method can significantly improve clustering performance.
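
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that (a) a Gaussian-kernel density score separates high-density "core" points from low-density points, (b) plain k-means stands in for the parametric clustering step, and (c) low-density points inherit the label of their nearest already-labeled point as a simple nonparametric step. The function names and the `bandwidth` and `core_fraction` parameters are hypothetical and chosen for illustration.

```python
import math


def kde_density(points, bandwidth):
    """Gaussian-kernel density score at each point (normalizing constant omitted)."""
    n = len(points)
    return [
        sum(math.exp(-math.dist(p, q) ** 2 / (2 * bandwidth ** 2)) for q in points) / n
        for p in points
    ]


def nearest_center(p, centers):
    """Index of the center closest to point p."""
    return min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))


def kmeans(points, k, iters=50):
    """Lloyd's k-means, used here only as a stand-in for a parametric clusterer."""
    # Deterministic farthest-point initialization, so results are reproducible.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    for _ in range(iters):
        labels = [nearest_center(p, centers) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return [nearest_center(p, centers) for p in points]


def semiparametric_cluster(points, k, bandwidth=1.0, core_fraction=0.8):
    """Cluster core points parametrically, then label the rest nonparametrically.

    Assumption (not from the paper): the densest `core_fraction` of the points
    form the cluster cores; the remainder are treated as low-density data.
    """
    density = kde_density(points, bandwidth)
    order = sorted(range(len(points)), key=lambda i: density[i], reverse=True)
    n_core = min(len(points), max(k, int(core_fraction * len(points))))

    # Parametric step: cluster only the high-density core.
    core_idx = order[:n_core]
    core_labels = kmeans([points[i] for i in core_idx], k)
    labels = [None] * len(points)
    for i, lab in zip(core_idx, core_labels):
        labels[i] = lab

    # Nonparametric step: from densest to sparsest, each remaining point
    # takes the label of its nearest already-labeled point.
    for i in order[n_core:]:
        labeled = [j for j in range(len(points)) if labels[j] is not None]
        nearest = min(labeled, key=lambda j: math.dist(points[i], points[j]))
        labels[i] = labels[nearest]
    return labels


if __name__ == "__main__":
    # Two compact groups plus two stray points (toy data, made up for this sketch).
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5),
            (10.0, 0.0), (10.0, 1.0), (11.0, 0.0), (11.0, 1.0), (10.5, 0.5),
            (0.5, 6.0), (10.5, -6.0)]
    print(semiparametric_cluster(data, k=2))
```

In this sketch the stray points never influence the cluster centers, because only the core points enter the k-means step; they are attached afterwards by proximity. Whether this matches the robustness mechanism analyzed in the paper is not something the abstract confirms.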

Identifiers

pubmed: 30602425
doi: 10.1109/TNNLS.2018.2884790

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

2583-2597
