Majority scoring with backward elimination in PLS for high dimensional spectrum data.
Journal
Scientific reports
ISSN: 2045-2322
Abbreviated title: Sci Rep
Country: England
NLM ID: 101563288
Publication information
Publication date: 20 08 2021
History:
received: 23 04 2021
accepted: 03 08 2021
entrez: 21 8 2021
pubmed: 22 8 2021
medline: 22 8 2021
Status: epublish
Abstract
Variable selection is a crucial issue in high-dimensional data modeling, where the sample size is small compared to the number of variables. Recently, majority scoring of filter measures in PLS (MS-PLS) was introduced for variable selection in high-dimensional data. Because filter measures are not greedy for optimal performance, we propose majority scoring with backward elimination in PLS (MSBE-PLS). MSBE-PLS uses two filter measures, variable importance on projection (VIP) and selectivity ratio (SR). In each iteration of backward elimination in PLS, variables are considered influential if they are selected by both filter measures. The proposed method is applied to predicting the contents of corn and diesel. The corn contents include protein, oil, starch, and moisture, while the diesel contents include boiling point at 50% recovery, cetane number, density, freezing temperature of the fuel, total aromatics, and viscosity. The proposed method outperforms the reference methods in terms of RMSE. In addition to validating the spectrum models, data properties are examined to explain prediction behavior. Moreover, MSBE-PLS selects a moderate number of influential variables and hence yields a parsimonious model for predicting contents from spectrum data.
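The procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the thresholds, the number of components, or the validation scheme, so several choices below are assumptions made for illustration. These are the VIP cutoff of 1 (a common convention), the median SR cutoff, the fixed number of components, the holdout split used to score each subset, and all function and parameter names (`pls1`, `vip`, `selectivity_ratio`, `msbe_pls`, `sr_cut`, `min_vars`). PLS1 is fitted with a small NIPALS routine so that only NumPy is required.

```python
import numpy as np

def pls1(X, y, n_comp):
    """NIPALS PLS1 on centered X and y. Returns weights W, scores T, y-loadings q, coefficients b."""
    n, p = X.shape
    Xk, yk = X.copy(), y.copy()
    W, P, T, q = [], [], [], []
    for _ in range(min(n_comp, p, n - 1)):
        w = Xk.T @ yk
        nw = np.linalg.norm(w)
        if nw < 1e-12:                      # no covariance left to extract
            break
        w = w / nw
        t = Xk @ w
        tt = t @ t
        p_a = Xk.T @ t / tt
        q_a = (yk @ t) / tt
        Xk = Xk - np.outer(t, p_a)          # deflate X and y
        yk = yk - q_a * t
        W.append(w); P.append(p_a); T.append(t); q.append(q_a)
    W, P, T, q = np.array(W).T, np.array(P).T, np.array(T).T, np.array(q)
    b = W @ np.linalg.solve(P.T @ W, q)
    return W, T, q, b

def vip(W, T, q):
    """Variable importance on projection for each variable."""
    ss = q ** 2 * np.sum(T ** 2, axis=0)    # y-variance explained per component
    return np.sqrt(W.shape[0] * (W ** 2 @ ss) / ss.sum())

def selectivity_ratio(X, b):
    """Ratio of explained to residual variance on the target-projected component."""
    t_tp = X @ b / np.linalg.norm(b)
    p_tp = X.T @ t_tp / (t_tp @ t_tp)
    X_tp = np.outer(t_tp, p_tp)
    explained = np.sum(X_tp ** 2, axis=0)
    residual = np.sum((X - X_tp) ** 2, axis=0)
    return explained / np.maximum(residual, 1e-12)

def msbe_pls(X, y, X_val, y_val, n_comp=3, vip_cut=1.0, sr_cut=None, min_vars=2):
    """Backward elimination that keeps only variables flagged by BOTH filters.

    Assumption: each candidate subset is scored by RMSE on a holdout set,
    and the subset with the lowest holdout RMSE is returned.
    """
    mu_x, mu_y = X.mean(axis=0), y.mean()
    Xc, yc = X - mu_x, y - mu_y
    active = np.arange(X.shape[1])
    best_rmse, best_vars, history = np.inf, active.copy(), []
    while True:
        W, T, q, b = pls1(Xc[:, active], yc, n_comp)
        pred = (X_val[:, active] - mu_x[active]) @ b + mu_y
        rmse = float(np.sqrt(np.mean((y_val - pred) ** 2)))
        history.append((len(active), rmse))
        if rmse < best_rmse:
            best_rmse, best_vars = rmse, active.copy()
        v = vip(W, T, q)
        s = selectivity_ratio(Xc[:, active], b)
        cut = np.median(s) if sr_cut is None else sr_cut   # hypothetical SR cutoff
        votes = (v > vip_cut).astype(int) + (s > cut).astype(int)
        keep = votes == 2                   # influential only if both filters agree
        if keep.sum() < min_vars or keep.all():
            break
        active = active[keep]
    return best_rmse, best_vars, history

# Synthetic example: 80 variables, 40 training samples, 5 truly relevant variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 80))
y = X[:, :5] @ np.array([2.0, -1.5, 1.0, 0.8, -0.6]) + 0.1 * rng.normal(size=60)
best_rmse, selected, history = msbe_pls(X[:40], y[:40], X[40:], y[40:])
```

Here `history` records the number of active variables and the holdout RMSE at each elimination step, so the trade-off between model size and prediction error can be inspected. In practice the holdout split would likely be replaced by cross-validation, and the number of components would be tuned rather than fixed.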
Identifiers
pubmed: 34417501
doi: 10.1038/s41598-021-96389-2
pii: 10.1038/s41598-021-96389-2
pmc: PMC8379245
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
16974
Copyright information
© 2021. The Author(s).