Interpretability of selected variables and performance comparison of variable selection methods in a polyethylene and polypropylene NIR classification task.

Classification analysis; Dimension reduction; NIR spectroscopy; Variable selection

Journal

Spectrochimica acta. Part A, Molecular and biomolecular spectroscopy
ISSN: 1873-3557
Abbreviated title: Spectrochim Acta A Mol Biomol Spectrosc
Country: England
NLM ID: 9602533

Publication information

Publication date:
05 Sep 2021
History:
received: 04 12 2020
revised: 08 04 2021
accepted: 13 04 2021
pubmed: 7 5 2021
medline: 7 5 2021
entrez: 6 5 2021
Status: ppublish

Abstract

Near infrared (NIR) spectra are collected as a large number of absorption values, which usually greatly exceeds the sample size. Variable selection methods are employed in NIR spectroscopy to avoid issues related to the "curse of dimensionality". In this paper, we examined the interpretability of selected variables, that is, how closely the selected wavelengths are related to the chemical structure of the materials studied, and whether that relation is important for classification performance. Additionally, we examined how classification performance depends on the number of selected variables. For this purpose, relative standard deviation (RSD), successive projection algorithm (SPA), stepwise decorrelation of variables (SELECT), genetic algorithm (GA), principal component analysis (PCA), and random (RANDOM) variable selection were applied in two-class classification modelling using linear discriminant analysis (LDA) or a support vector machine (SVM). Different pre-treatments and sample sizes were considered. Variable selection improved classification performance, and the variables selected by a majority of the methods considered were clearly related to chemical structure. However, both interpretability and the gain or loss in performance depend greatly on the number of selected variables. Since the selected variables show high chemical interpretability, some variable selection methods could be employed to determine the characteristic absorption bands of a material. SELECT and SPA displayed the best properties among the methods considered. To avoid faulty results, optimization of the number of selected variables should become a crucial stage in the variable selection process.
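
To make the idea of variable selection concrete, the following is a minimal sketch of an RSD-style filter. It is not the procedure used in the paper, which the abstract does not specify. It assumes that each wavelength is scored by its relative standard deviation across samples and that the k highest-scoring wavelengths are kept. The function names `relative_std` and `rsd_select`, the ranking direction, and the toy absorbance values are all hypothetical and chosen only for illustration.

```python
import statistics


def relative_std(values):
    """Relative standard deviation (in percent) of one variable across samples."""
    mean = statistics.fmean(values)
    if mean == 0:
        return 0.0  # assumption: a zero-mean variable is treated as uninformative
    return 100.0 * statistics.stdev(values) / abs(mean)


def rsd_select(spectra, k):
    """Return the indices of the k variables with the largest RSD, in ascending order."""
    n_vars = len(spectra[0])
    # Transpose the sample-by-wavelength matrix into one list per wavelength.
    columns = [[row[j] for row in spectra] for j in range(n_vars)]
    scores = [relative_std(col) for col in columns]
    ranked = sorted(range(n_vars), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


# Toy matrix: 4 samples x 5 wavelengths (made-up numbers, not data from the paper).
toy_spectra = [
    [0.50, 0.10, 0.80, 0.30, 0.40],
    [0.50, 0.30, 0.81, 0.30, 0.10],
    [0.51, 0.10, 0.80, 0.31, 0.40],
    [0.50, 0.30, 0.79, 0.30, 0.10],
]

selected = rsd_select(toy_spectra, k=2)
# Keep only the selected wavelengths; this reduced matrix would feed a classifier.
reduced = [[row[j] for j in selected] for row in toy_spectra]
print(selected)
```

In this sketch the number of retained variables k is a free parameter. The abstract stresses that this number strongly affects both interpretability and classification performance, so in practice it would need to be tuned, for example by cross-validating the downstream classifier over a range of k values.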

Identifiers

pubmed: 33957449
pii: S1386-1425(21)00426-1
doi: 10.1016/j.saa.2021.119850

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

119850

Copyright information

Copyright © 2021 Elsevier B.V. All rights reserved.

Conflict of interest statement

Declaration of Competing Interest: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Authors

Vilma Sem (V)

Faculty of Agriculture and Life Sciences, University of Maribor, Pivola 10, 2311 Hoce, Slovenia. Electronic address: vilma.sem@um.si.

MeSH classifications