A topological data analysis based classification method for multiple measurements.


Journal

BMC bioinformatics
ISSN: 1471-2105
Abbreviated title: BMC Bioinformatics
Country: England
NLM ID: 100965194

Publication information

Publication date:
29 Jul 2020
History:
received: 20 08 2019
accepted: 13 07 2020
entrez: 31 7 2020
pubmed: 31 7 2020
medline: 5 9 2020
Status: epublish

Abstract

Machine learning models for repeated measurements are limited. Using topological data analysis (TDA), we present a classifier for repeated measurements which samples from the data space and builds a network graph based on the data topology. A machine learning model with cross-validation is then applied for classification. When tested on three case studies, its accuracy exceeds that of an alternative support vector machine (SVM) voting model in most situations tested, with additional benefits such as reporting data subsets with high purity along with feature values. For 100 examples of 3 different tree species, the model reached 80% classification accuracy after 30 datapoints, which improved to 90% after sampling was increased to 400 datapoints. The alternative SVM classifier achieved a maximum accuracy of 68.7%. Using data from 100 examples from each class of 6 different random point processes, the classifier achieved 96.8% accuracy, vastly outperforming the SVM. Using two outcomes in neuron spiking data, the TDA classifier was similarly accurate to the SVM in one case (both converged to 97.8% accuracy), but was outperformed in the other (accuracies of 79.8% and 92.2%, respectively). This algorithm and software can be beneficial for repeated measurement data common in biological sciences, as both an accurate classifier and a feature selection tool.
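
The abstract does not spell out how the comparison "voting model" works. One common reading is that each repeated measurement of a subject is classified separately and the subject receives the majority label. The following is a minimal sketch of that voting step under this assumption. It is not the authors' implementation: a nearest-centroid rule stands in for the SVM so that the example needs only the standard library, and all function names and the toy data are hypothetical.

```python
from collections import Counter
from statistics import fmean


def fit_centroids(measurements, labels):
    """Return {label: centroid} computed from per-measurement feature vectors."""
    grouped = {}
    for x, y in zip(measurements, labels):
        grouped.setdefault(y, []).append(x)
    return {y: [fmean(col) for col in zip(*rows)] for y, rows in grouped.items()}


def predict_measurement(centroids, x):
    """Label of the closest centroid (squared Euclidean distance).

    Stand-in for the per-measurement SVM; this is an assumption of the sketch.
    """
    return min(
        centroids,
        key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])),
    )


def predict_subject(centroids, subject_measurements):
    """Majority vote over the per-measurement predictions of one subject."""
    votes = Counter(predict_measurement(centroids, x) for x in subject_measurements)
    return votes.most_common(1)[0][0]


# Toy training data: four measurements with two features each (hypothetical).
train_x = [[0.0, 0.1], [0.2, 0.0], [1.0, 0.9], [0.9, 1.1]]
train_y = ["a", "a", "b", "b"]
centroids = fit_centroids(train_x, train_y)

# One subject with three repeated measurements; two of them lie near class "a".
subject = [[0.1, 0.2], [0.8, 0.9], [0.0, 0.0]]
predicted = predict_subject(centroids, subject)
```

In this sketch a subject is labeled correctly as long as most of its individual measurements are, which is why accuracy in such a scheme can depend on how many measurements per subject are available.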

Identifiers

pubmed: 32727348
doi: 10.1186/s12859-020-03659-3
pii: 10.1186/s12859-020-03659-3
pmc: PMC7392670

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

336

Authors

Henri Riihimäki (H)

University of Aberdeen, Aberdeen, UK.

Wojciech Chachólski (W)

KTH-The Royal Institute of Technology, Stockholm, Sweden.

Jakob Theorell (J)

Karolinska Institutet, Stockholm, Sweden.
University of Oxford, Oxford, UK.

Jan Hillert (J)

Karolinska Institutet, Stockholm, Sweden.

Ryan Ramanujam (R)

KTH-The Royal Institute of Technology, Stockholm, Sweden. ryan.ramanujam@ki.se.
Karolinska Institutet, Stockholm, Sweden. ryan.ramanujam@ki.se.
