Robust Classification of High-Dimensional Spectroscopy Data Using Deep Learning and Data Synthesis.


Journal

Journal of chemical information and modeling
ISSN: 1549-960X
Abbreviated title: J Chem Inf Model
Country: United States
NLM ID: 101230060

Publication information

Publication date:
27 04 2020
History:
pubmed: 7 3 2020
medline: 22 6 2021
entrez: 7 3 2020
Status: ppublish

Abstract

This paper presents a new approach to classification of high-dimensional spectroscopy data and demonstrates that it outperforms other current state-of-the-art approaches. The specific task we consider is identifying whether samples contain chlorinated solvents or not, based on their Raman spectra. We also examine robustness in the classification of outlier samples that are not represented in the training set (negative outliers). A novel application of a locally connected neural network (NN) for the binary classification of spectroscopy data is proposed and demonstrated to yield improved accuracy over traditionally popular algorithms. Additionally, we present the ability to further increase the accuracy of the locally connected NN algorithm through the use of synthetic training spectra, and we investigate the use of autoencoder-based one-class classifiers and outlier detectors. Finally, a two-step classification process is presented as an alternative to the binary and one-class classification paradigms. This process combines the locally connected NN classifier, the use of synthetic training data, and an autoencoder-based outlier detector to produce a model that is shown both to produce high classification accuracy and to be robust in the presence of negative outliers.
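
The two-step idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the order of the two steps, the error measure, or any threshold, so this sketch assumes the outlier check runs first, uses mean squared reconstruction error, and takes hypothetical callables (`reconstruct`, `classify`) in place of the trained autoencoder and the locally connected NN. All names and values here are illustrative.

```python
from typing import Callable, List, Sequence

Spectrum = Sequence[float]


def reconstruction_error(spectrum: Spectrum,
                         reconstruct: Callable[[Spectrum], List[float]]) -> float:
    """Mean squared difference between a spectrum and its reconstruction."""
    rebuilt = reconstruct(spectrum)
    return sum((a - b) ** 2 for a, b in zip(spectrum, rebuilt)) / len(spectrum)


def two_step_classify(spectrum: Spectrum,
                      reconstruct: Callable[[Spectrum], List[float]],
                      classify: Callable[[Spectrum], float],
                      error_threshold: float) -> str:
    """Step 1 (assumed order): flag spectra the autoencoder reconstructs poorly.
    Step 2: pass the remaining spectra to the binary classifier."""
    if reconstruction_error(spectrum, reconstruct) > error_threshold:
        return "outlier"
    # classify() is assumed to return a score in [0, 1]; 0.5 is an arbitrary cut.
    return "chlorinated" if classify(spectrum) >= 0.5 else "not chlorinated"


# Toy stand-ins, purely hypothetical: a "reconstruction" that can only
# reproduce intensities in [0, 1], and a "classifier" that looks at one band.
def toy_reconstruct(spectrum: Spectrum) -> List[float]:
    return [min(max(x, 0.0), 1.0) for x in spectrum]


def toy_classify(spectrum: Spectrum) -> float:
    return 1.0 if spectrum[2] > 0.5 else 0.0


in_range_positive = [0.1, 0.2, 0.9, 0.1]
in_range_negative = [0.1, 0.2, 0.1, 0.1]
unlike_training = [5.0, 0.2, 0.9, 0.1]

for sample in (in_range_positive, in_range_negative, unlike_training):
    print(two_step_classify(sample, toy_reconstruct, toy_classify, 0.01))
```

With these toy stand-ins, the third sample is rejected before it reaches the classifier, which is the behavior the abstract describes as robustness to negative outliers. In practice the threshold would have to be chosen from validation data.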

Identifiers

pubmed: 32142271
doi: 10.1021/acs.jcim.9b01037

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

1936-1954

Authors

James Houston (J)

School of Computer Science, National University of Ireland, Galway H91 TK33, Ireland.

Frank G Glavin (FG)

School of Computer Science, National University of Ireland, Galway H91 TK33, Ireland.

Michael G Madden (MG)

School of Computer Science, National University of Ireland, Galway H91 TK33, Ireland.

MeSH classifications