Finding features - variable extraction strategies for dimensionality reduction and marker compounds identification in GC-IMS data.

Gas Chromatography-Mass Spectrometry / methods; Honey / analysis; Ion Mobility Spectrometry / methods; Principal Component Analysis; Volatile Organic Compounds / analysis

Chemometrics; Food authenticity; Non-target screening; Python; VOC profiling

Journal

Food research international (Ottawa, Ont.)

ISSN: 1873-7145

Abbreviated title: Food Res Int

Country: Canada

NLM ID: 9210143

Publication information

Publication date:
11 2022

History:

received: 15 06 2022

revised: 24 07 2022

accepted: 17 08 2022

entrez: 4 10 2022

pubmed: 5 10 2022

medline: 6 10 2022

Status: ppublish

Abstract

Gas chromatography hyphenated to ion mobility spectrometry (GC-IMS) is a powerful two-dimensional separation and detection technique for volatile organic compounds (VOCs). Low detection limits, high selectivity, and robust operation make it an ideal tool for non-target screening (NTS) approaches. Combined with multivariate data analysis, it has been successfully applied in several areas of food science, such as authenticity control and flavor profiling. The recorded raw data contain a high number of variables because of the instrument's high scan speed. Additionally, NTS approaches, by design, record more data than required. Reducing the number of variables is therefore a key step in any machine learning pipeline to limit overfitting, overlong training times, and model complexity. This study compares the two most widely used dimensionality reduction techniques, principal component analysis (PCA) and partial least squares (PLS), with regard to interpretability as a tool for finding marker compounds and performance as a preprocessing step for supervised learning. Both offer per-variable visualizations, which allow easy interpretation of results and retain a connection to the input data that can lead to the discovery of marker compounds. A GC-IMS dataset on the botanical origin of honey is used, and all formatting steps necessary to apply PCA and PLS to higher-dimensional data and obtain intuitive figures are explained. To evaluate their effectiveness as a preprocessing step in a supervised pipeline, four supervised algorithms were fitted with PCA or PLS variable reduction. PLS proved to be the more effective step in a supervised workflow in terms of accuracy, while PCA is highly effective at revealing preprocessing weaknesses such as misalignments.
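The formatting steps mentioned in the abstract (applying PCA to two-dimensional spectra and mapping the result back to per-variable figures) can be illustrated with a minimal sketch. It is not the authors' code: the data are synthetic stand-ins for GC-IMS spectra, the helper names `unfold` and `pca_svd` are hypothetical, and PCA is computed with a plain NumPy SVD rather than any particular library. Each spectrum (retention time × drift time) is flattened into one row, PCA is fitted, and the loadings are folded back into the spectrum shape so that a strongly loaded position can be read as a candidate marker region.

```python
import numpy as np


def unfold(spectra):
    """Flatten a stack of 2D spectra (n_samples, n_rt, n_dt)
    into a 2D matrix (n_samples, n_rt * n_dt)."""
    return spectra.reshape(spectra.shape[0], -1)


def pca_svd(X, n_components):
    """PCA via SVD of the mean-centered data.
    Returns scores (n_samples, k) and loadings (k, n_features)."""
    X_centered = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    loadings = Vt[:n_components]
    return scores, loadings


# Synthetic data (assumption): 12 samples, 20 retention-time points,
# 30 drift-time points. This is NOT the honey dataset from the study.
rng = np.random.default_rng(0)
n_samples, n_rt, n_dt = 12, 20, 30
spectra = rng.normal(scale=0.01, size=(n_samples, n_rt, n_dt))

# The second group of six samples carries one extra "marker" peak.
spectra[6:, 5, 10] += 1.0

X = unfold(spectra)                        # shape (12, 600)
scores, loadings = pca_svd(X, n_components=2)

# Fold the loadings back into the spectrum shape for per-variable plots.
loading_maps = loadings.reshape(2, n_rt, n_dt)

# Position with the largest absolute loading on the first component.
peak = np.unravel_index(np.abs(loading_maps[0]).argmax(), (n_rt, n_dt))
print("strongest PC1 loading at (rt index, dt index):", peak)
```

The same refolding idea should carry over to PLS. With scikit-learn's `PLSRegression`, for example, the fitted `x_loadings_` array has shape `(n_features, n_components)`, so it would need to be transposed before being reshaped to the spectrum dimensions; that variant is not shown here.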

Identifiers

DOI: 10.1016/j.foodres.2022.111779 PMID: 36192933

pubmed: 36192933

pii: S0963-9969(22)00837-7

doi: 10.1016/j.foodres.2022.111779

Chemical substances

Volatile Organic Compounds 0

Publication types

Journal Article; Research Support, Non-U.S. Gov't

Languages

eng

Pagination

111779

Conflict of interest statement

Declaration of Competing Interest: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Authors

Joscha Christmann (J)

Sascha Rohn (S)

Philipp Weller (P)
