Identifying homogeneous subgroups of patients and important features: a topological machine learning approach.
Clustering
Machine learning
Topological data analysis
Journal
BMC bioinformatics
ISSN: 1471-2105
Abbreviated title: BMC Bioinformatics
Country: England
NLM ID: 100965194
Publication information
Publication date:
20 Sep 2021
History:
received: 05 Mar 2021
accepted: 07 Sep 2021
entrez: 21 Sep 2021
pubmed: 22 Sep 2021
medline: 23 Sep 2021
Status:
epublish
Abstract
BACKGROUND
This paper exploits recent developments in topological data analysis to present a pipeline for clustering based on Mapper, an algorithm that reduces complex data into a one-dimensional graph.
RESULTS
We present a pipeline to identify and summarise clusters based on statistically significant topological features from a point cloud using Mapper.
CONCLUSIONS
Key strengths of this pipeline include the integration of prior knowledge to inform the clustering process and the selection of optimal clusters; the use of the bootstrap to restrict the search to robust topological features; the use of machine learning to inspect clusters; and the ability to incorporate mixed data types. Our pipeline can be downloaded under the GNU GPLv3 license at https://github.com/kcl-bhi/mapper-pipeline.
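As an illustration of how Mapper reduces a point cloud to a graph, the following is a minimal sketch and not the authors' pipeline (which is available at the repository above). It assumes a one-dimensional filter function, a cover of overlapping intervals, and single-linkage clustering with a fixed distance threshold inside each interval. The function names and the parameters `n_intervals`, `overlap`, and `eps` are hypothetical and chosen only for this example.

```python
import math
from itertools import combinations


def interval_cover(lo, hi, n_intervals, overlap):
    """Return n_intervals equal-length intervals covering [lo, hi].

    Consecutive intervals share a fraction `overlap` of their length.
    """
    length = (hi - lo) / (n_intervals - (n_intervals - 1) * overlap)
    step = length * (1 - overlap)
    return [(lo + i * step, lo + i * step + length) for i in range(n_intervals)]


def single_linkage(points, idx, eps):
    """Split the indices in idx into connected components, linking two
    points whenever their Euclidean distance is at most eps."""
    unvisited = set(idx)
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        component, frontier = {seed}, [seed]
        while frontier:
            p = frontier.pop()
            near = {q for q in unvisited
                    if math.dist(points[p], points[q]) <= eps}
            unvisited -= near
            component |= near
            frontier.extend(near)
        clusters.append(frozenset(component))
    return clusters


def mapper_graph(points, filter_fn, n_intervals=4, overlap=0.5, eps=1.0):
    """Build a Mapper graph (illustrative assumptions only).

    Nodes are clusters found within each interval of the filter range;
    an edge joins two nodes whenever they share at least one point.
    """
    values = [filter_fn(p) for p in points]
    tol = 1e-9  # guards against floating-point error at interval ends
    nodes = []
    for a, b in interval_cover(min(values), max(values), n_intervals, overlap):
        idx = [i for i, v in enumerate(values) if a - tol <= v <= b + tol]
        nodes.extend(single_linkage(points, idx, eps))
    edges = [(i, j) for i, j in combinations(range(len(nodes)), 2)
             if nodes[i] & nodes[j]]
    return nodes, edges
```

For example, calling `mapper_graph(points, filter_fn=lambda p: p[0])` uses the first coordinate as the filter. Subgroups of points that stay separate under the clustering step then appear as separate connected components of the resulting graph.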
Identifiers
pubmed: 34544357
doi: 10.1186/s12859-021-04360-9
pii: 10.1186/s12859-021-04360-9
pmc: PMC8451168
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
449
Grants
Agency: Brain and Behavior Research Foundation
ID: 26338
Copyright information
© 2021. The Author(s).