Robust variable selection in the framework of classification with label noise and outliers: Applications to spectroscopic data in agri-food.

Keywords: Agri-food; Label noise; Mid infrared spectroscopy; Near infrared spectroscopy; Outlier detection; Robust classification; Variable selection

Journal

Analytica chimica acta
ISSN: 1873-4324
Abbreviated title: Anal Chim Acta
Country: Netherlands
NLM ID: 0370534

Publication information

Publication date:
08 Apr 2021
History:
received: 05 10 2020
revised: 23 12 2020
accepted: 20 01 2021
entrez: 14 3 2021
pubmed: 15 3 2021
medline: 15 3 2021
Status: ppublish

Abstract

Classification of high-dimensional spectroscopic data is a common task in analytical chemistry. Well-established procedures like support vector machines (SVMs) and partial least squares discriminant analysis (PLS-DA) are the most common methods for tackling this supervised learning problem. Nonetheless, these models can be difficult to interpret, and solutions based on feature selection are often adopted because they automatically identify the most informative wavelengths. Unfortunately, in delicate applications such as food authenticity, mislabeled and adulterated spectra can occur in the calibration and/or validation sets, with dramatic effects on model development, prediction accuracy, and robustness. Motivated by these issues, the present paper proposes a robust model-based method that simultaneously performs variable selection, outlier detection, and label noise detection. We demonstrate the effectiveness of our proposal on three agri-food spectroscopic studies, in which several forms of perturbation are considered. Our approach succeeds in reducing problem complexity, identifying anomalous spectra, and attaining competitive predictive accuracy with a very small number of selected wavelengths.

Identifiers

pubmed: 33714445
pii: S0003-2670(21)00071-4
doi: 10.1016/j.aca.2021.338245

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

338245

Copyright information

Copyright © 2021 Elsevier B.V. All rights reserved.

Conflict of interest statement

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Authors

Andrea Cappozzo (A)

Department of Statistics and Quantitative Methods, University of Milano-Bicocca, Milan, Italy. Electronic address: andrea.cappozzo@unimib.it.

Ludovic Duponchel (L)

Univ. Lille, CNRS, UMR 8516, LASIRE-Laboratoire avancé de spectroscopie pour les interactions, la réactivité et l'environnement, F-59000, Lille, France. Electronic address: ludovic.duponchel@univ-lille.fr.

Francesca Greselin (F)

Department of Statistics and Quantitative Methods, University of Milano-Bicocca, Milan, Italy. Electronic address: francesca.greselin@unimib.it.

Thomas Brendan Murphy (TB)

School of Mathematics & Statistics and Insight Research Centre, University College Dublin, Dublin, Ireland. Electronic address: brendan.murphy@ucd.ie.

MeSH classifications