Interpretability of selected variables and performance comparison of variable selection methods in a polyethylene and polypropylene NIR classification task.
Classification analysis
Dimension reduction
NIR spectroscopy
Variable selection
Journal
Spectrochimica acta. Part A, Molecular and biomolecular spectroscopy
ISSN: 1873-3557
Abbreviated title: Spectrochim Acta A Mol Biomol Spectrosc
Country: England
NLM ID: 9602533
Publication information
Publication date:
05 Sep 2021
History:
received: 04 Dec 2020
revised: 08 Apr 2021
accepted: 13 Apr 2021
pubmed: 07 May 2021
medline: 07 May 2021
entrez: 06 May 2021
Status:
ppublish
Abstract
Near infrared (NIR) spectra are collected as a large number of absorption values, which usually greatly exceeds the sample size. Variable selection methods are employed in NIR spectroscopy to avoid issues related to the "curse of dimensionality". In this paper, we examined the interpretability of selected variables, that is, how closely the selected wavelengths are related to the chemical structure of the materials studied, and whether this relation is important for classification performance. Additionally, we examined how classification performance depends on the number of selected variables. For this purpose, relative standard deviation (RSD), successive projection algorithm (SPA), stepwise decorrelation of variables (SELECT), genetic algorithm (GA), principal component analysis (PCA), and random (RANDOM) variable selection were applied in two-class classification modelling using linear discriminant analysis (LDA) or a support vector machine (SVM). Different pre-treatments and sample sizes were considered. Variable selection improved classification performance, and the variables selected by a majority of the methods considered were clearly related to chemical structure. However, both interpretability and the increase or decrease in performance depend greatly on the number of selected variables. Since the selected variables show high chemical interpretability, some variable selection methods could be employed to determine the characteristic absorption bands of a material. SELECT and SPA displayed the best properties among the methods considered. To avoid faulty results, optimization of the number of selected variables should become a crucial stage in the variable selection process.
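As a rough illustration of the simplest two selectors named in the abstract (RSD and RANDOM), the following Python sketch ranks wavelengths by relative standard deviation and keeps the k highest, alongside a random baseline. It is not the authors' implementation: the ranking rule (standard deviation divided by the absolute mean, highest first), the function names, and the synthetic "spectra" are all assumptions made for this example, and the paper's exact criterion may differ.

```python
import random
import statistics


def rsd_per_variable(spectra):
    """Relative standard deviation (std / |mean|) of each column (wavelength)."""
    n_vars = len(spectra[0])
    rsd = []
    for j in range(n_vars):
        column = [row[j] for row in spectra]
        mean = statistics.fmean(column)
        std = statistics.stdev(column)
        # Guard against division by zero for a column whose mean is exactly 0.
        rsd.append(std / abs(mean) if mean != 0 else 0.0)
    return rsd


def select_top_k(scores, k):
    """Indices of the k highest-scoring variables, returned in ascending order."""
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def select_random(n_vars, k, seed=0):
    """RANDOM baseline: k distinct variable indices drawn uniformly."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_vars), k))


# Synthetic two-class data (hypothetical): 20 samples, 50 "wavelengths".
# Only columns 10, 11 and 30 differ between the two classes.
rng = random.Random(42)
n_samples, n_vars = 20, 50
informative = [10, 11, 30]
spectra = []
for i in range(n_samples):
    label = i % 2
    row = [1.0 + rng.gauss(0, 0.01) for _ in range(n_vars)]
    for j in informative:
        row[j] += 0.5 * label  # class-dependent absorption band
    spectra.append(row)

selected = select_top_k(rsd_per_variable(spectra), k=3)
baseline = select_random(n_vars, k=3)
print("RSD-selected variables:", selected)
print("Randomly selected variables:", baseline)
```

In this toy setting the RSD ranking recovers the three class-dependent columns because they are the only ones with large spread. The choice of k is fixed by hand here, whereas the abstract stresses that the number of selected variables should itself be optimized.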
Identifiers
pubmed: 33957449
pii: S1386-1425(21)00426-1
doi: 10.1016/j.saa.2021.119850
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
119850
Copyright information
Copyright © 2021 Elsevier B.V. All rights reserved.
Conflict of interest statement
Declaration of Competing Interest: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.