Identifying homogeneous subgroups of patients and important features: a topological machine learning approach.

Keywords: Clustering; Machine learning; Topological data analysis

Journal

BMC bioinformatics
ISSN: 1471-2105
Abbreviated title: BMC Bioinformatics
Country: England
NLM ID: 100965194

Publication information

Publication date:
20 Sep 2021
History:
received: 05 03 2021
accepted: 07 09 2021
entrez: 21 9 2021
pubmed: 22 9 2021
medline: 23 9 2021
Status: epublish

Abstract

This paper exploits recent developments in topological data analysis to present a pipeline for clustering based on Mapper, an algorithm that reduces complex data into a one-dimensional graph. We present a pipeline to identify and summarise clusters based on statistically significant topological features from a point cloud using Mapper. Key strengths of this pipeline include the integration of prior knowledge to inform the clustering process and the selection of optimal clusters; the use of the bootstrap to restrict the search to robust topological features; the use of machine learning to inspect clusters; and the ability to incorporate mixed data types. Our pipeline can be downloaded under the GNU GPLv3 license at https://github.com/kcl-bhi/mapper-pipeline .
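To make the idea of reducing a point cloud to a one-dimensional graph concrete, the following is a minimal sketch of the general Mapper construction. It is not the authors' pipeline and does not use their repository. The function names (`interval_cover`, `single_linkage`, `mapper_graph`), the parameter values, and the choice of the x-coordinate as filter are all assumptions made for illustration. The sketch covers the filter range with overlapping intervals, clusters the points falling in each interval, and links two clusters whenever they share a point.

```python
from itertools import combinations
from math import cos, dist, pi, sin


def interval_cover(lo, hi, n_intervals, overlap):
    """Return n_intervals equal-length intervals covering [lo, hi],
    where consecutive intervals share a fraction `overlap` of their length."""
    length = (hi - lo) / (n_intervals - (n_intervals - 1) * overlap)
    step = length * (1 - overlap)
    return [(lo + i * step, lo + i * step + length) for i in range(n_intervals)]


def single_linkage(points, indices, eps):
    """Split `indices` into connected components of the graph that joins
    two points when their Euclidean distance is at most eps."""
    unvisited = set(indices)
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        component, frontier = {seed}, [seed]
        while frontier:
            i = frontier.pop()
            near = {j for j in unvisited if dist(points[i], points[j]) <= eps}
            unvisited -= near
            component |= near
            frontier.extend(near)
        clusters.append(frozenset(component))
    return clusters


def mapper_graph(points, filter_values, n_intervals=4, overlap=0.3, eps=0.4):
    """Illustrative Mapper: nodes are clusters found inside each cover
    interval; an edge joins two nodes that have at least one point in common."""
    lo, hi = min(filter_values), max(filter_values)
    tol = 1e-9  # guards against rounding at the interval boundaries
    nodes = []
    for a, b in interval_cover(lo, hi, n_intervals, overlap):
        members = [i for i, f in enumerate(filter_values) if a - tol <= f <= b + tol]
        nodes.extend(single_linkage(points, members, eps))
    edges = [(u, v) for u, v in combinations(range(len(nodes)), 2)
             if nodes[u] & nodes[v]]
    return nodes, edges


# Toy data (an assumption, not from the paper): 40 points on the unit circle,
# with the x-coordinate used as the filter function.
points = [(cos(2 * pi * k / 40), sin(2 * pi * k / 40)) for k in range(40)]
filter_values = [x for x, _ in points]
nodes, edges = mapper_graph(points, filter_values)
print(len(nodes), "nodes,", len(edges), "edges")
```

On this toy circle the resulting graph is a single closed loop, which is the kind of topological feature a Mapper-based pipeline would then assess for robustness. Real analyses would normally use an established implementation and a data-appropriate filter, cover, and clustering method.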

Identifiers

pubmed: 34544357
doi: 10.1186/s12859-021-04360-9
pii: 10.1186/s12859-021-04360-9
pmc: PMC8451168

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

449

Grants

Agency: Brain and Behavior Research Foundation
ID: 26338

Copyright information

© 2021. The Author(s).

Authors

Ewan Carr (E)

Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK.

Mathieu Carrière (M)

Inria Sophia-Antipolis, DataShape Team, Biot, France.

Bertrand Michel (B)

Ecole Centrale de Nantes, LMJL - UMR CNRS 6629, Nantes, France.

Frédéric Chazal (F)

Inria Saclay, Ile-de-France, Alan Turing Building, Palaiseau, France.

Raquel Iniesta (R)

Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK. raquel.iniesta@kcl.ac.uk.
