Finding features - variable extraction strategies for dimensionality reduction and marker compounds identification in GC-IMS data.


Journal

Food research international (Ottawa, Ont.)
ISSN: 1873-7145
Abbreviated title: Food Res Int
Country: Canada
NLM ID: 9210143

Publication information

Publication date:
11 2022
History:
received: 15 06 2022
revised: 24 07 2022
accepted: 17 08 2022
entrez: 4 10 2022
pubmed: 5 10 2022
medline: 6 10 2022
Status: ppublish

Abstract

Gas chromatography hyphenated to ion mobility spectrometry (GC-IMS) is a powerful, two-dimensional separation and detection technique for volatile organic compounds (VOCs). Low detection limits, high selectivity and robust operation make it an ideal tool for non-target screening (NTS) approaches. Combined with multivariate data analysis, it has been successfully applied to several areas in food science, such as authenticity control and flavor profiling. The recorded raw data contain a high number of variables due to the high scan speeds of the instrument. Additionally, NTS approaches, by design, record more data than required. Therefore, reducing the number of variables is a key step in any machine learning pipeline to limit overfitting, overlong training times and model complexity. The aim of the study is to compare the two most widely used dimensionality reduction techniques, principal component analysis (PCA) and partial least squares (PLS), with regard to interpretability, as a tool for finding marker compounds, and performance, as a preprocessing step for supervised learning. Both offer per-variable visualizations, which allow easy interpretation of results and retain a connection to the input data, which can lead to the discovery of marker compounds. A GC-IMS dataset on the botanical origin of honey is used, and all formatting steps necessary to apply PCA and PLS to higher-dimensional data and obtain intuitive figures are explained. To evaluate their effectiveness as a preprocessing step in a supervised pipeline, four supervised algorithms were fitted with PCA or PLS variable reduction. PLS proved to be the more effective step in a supervised workflow in terms of accuracy, while PCA is highly effective for revealing preprocessing weaknesses such as misalignments.

Identifiers

pubmed: 36192933
pii: S0963-9969(22)00837-7
doi: 10.1016/j.foodres.2022.111779

Chemical substances

Volatile Organic Compounds 0

Publication types

Journal Article
Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Pagination

111779

Copyright information

Copyright © 2022 Elsevier Ltd. All rights reserved.

Conflict of interest statement

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Authors

Joscha Christmann (J)

Institute for Instrumental Analytics and Bioanalysis, Mannheim University of Applied Sciences, Paul-Wittsack-Straße 10, 68163 Mannheim, Germany; Hamburg School of Food Science, University of Hamburg, Grindelallee 117, 20146 Hamburg, Germany.

Sascha Rohn (S)

Hamburg School of Food Science, University of Hamburg, Grindelallee 117, 20146 Hamburg, Germany; Department of Food Chemistry and Analysis, Institute of Food, Technology and Food Chemistry, Technische Universität Berlin, TIB 4/3-1, Gustav-Meyer-Allee 25, 13355 Berlin, Germany.

Philipp Weller (P)

Institute for Instrumental Analytics and Bioanalysis, Mannheim University of Applied Sciences, Paul-Wittsack-Straße 10, 68163 Mannheim, Germany. Electronic address: p.weller@hs-mannheim.de.
