Geometric anomaly detection in data.

persistent cohomology; singularities; stratification inference

Journal

Proceedings of the National Academy of Sciences of the United States of America
ISSN: 1091-6490
Abbreviated title: Proc Natl Acad Sci U S A
Country: United States
NLM ID: 7505876

Publication information

Publication date:
18 Aug 2020
History:
pubmed: 5 Aug 2020
medline: 5 Aug 2020
entrez: 5 Aug 2020
Status: ppublish

Abstract

The quest for low-dimensional models which approximate high-dimensional data is pervasive across the physical, natural, and social sciences. The dominant paradigm underlying most standard modeling techniques assumes that the data are concentrated near a single unknown manifold of relatively small intrinsic dimension. Here, we present a systematic framework for detecting interfaces and related anomalies in data which may fail to satisfy the manifold hypothesis. By computing the local topology of small regions around each data point, we are able to partition a given dataset into disjoint classes, each of which can be individually approximated by a single manifold. Since these manifolds may have different intrinsic dimensions, local topology discovers singular regions in data even when none of the points have been sampled precisely from the singularities. We showcase this method by identifying the intersection of two surfaces in the 24-dimensional space of cyclo-octane conformations and by locating all of the self-intersections of a Henneberg minimal surface immersed in 3-dimensional space. Due to the local nature of the topological computations, the algorithmic burden of performing such data stratification is readily distributable across several processors.
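As a rough illustration of the "local topology around each data point" idea, the sketch below uses a toy dataset of two line segments crossing in the plane. It is not the authors' implementation: instead of persistent cohomology it only counts connected components of an annular neighborhood of each point, which is the 0-dimensional part of that computation. The function names and the parameters `inner`, `outer`, and `eps` are assumptions chosen for this example.

```python
import numpy as np


def count_components(pts, eps):
    """Count connected components of the graph joining points closer than eps."""
    n = len(pts)
    if n == 0:
        return 0
    adj = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2) < eps
    seen = np.zeros(n, dtype=bool)
    comps = 0
    for start in range(n):
        if seen[start]:
            continue
        comps += 1
        seen[start] = True
        stack = [start]
        # Depth-first search over unvisited neighbors.
        while stack:
            i = stack.pop()
            for j in np.flatnonzero(adj[i] & ~seen):
                seen[j] = True
                stack.append(j)
    return comps


def annulus_components(data, inner, outer, eps):
    """For each point, count components of its annular neighborhood."""
    counts = np.empty(len(data), dtype=int)
    for k, p in enumerate(data):
        d = np.linalg.norm(data - p, axis=1)
        annulus = data[(d >= inner) & (d <= outer)]
        counts[k] = count_components(annulus, eps)
    return counts


# Toy data (hypothetical): two unit-scale segments along the coordinate axes.
# An even sample count means the crossing point (0, 0) is never sampled.
t = np.linspace(-1.0, 1.0, 200)
zeros = np.zeros_like(t)
data = np.vstack([np.column_stack([t, zeros]), np.column_stack([zeros, t])])

counts = annulus_components(data, inner=0.1, outer=0.2, eps=0.03)
dist_to_crossing = np.linalg.norm(data, axis=1)

# Two components look like a regular point of a curve; more than two
# suggest a nearby crossing; fewer than two suggest an endpoint.
labels = np.where(counts == 2, "regular",
                  np.where(counts > 2, "singular", "boundary"))
```

On this toy set, samples within 0.05 of the crossing see four annulus components while samples in the middle of each arm see two, so the crossing is flagged even though it is not itself a data point. Each point's count depends only on its own neighborhood, so the loop in `annulus_components` could be split across processes.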

Identifiers

pubmed: 32747569
pii: 2001741117
doi: 10.1073/pnas.2001741117
pmc: PMC7443892

Publication types

Journal Article
Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Pagination

19664-19669

Grants

Agency: Medical Research Council
Country: United Kingdom

Copyright information

Copyright © 2020 the Author(s). Published by PNAS.

Conflict of interest statement

The authors declare no competing interest.

Authors

Bernadette J Stolz (BJ)

Mathematical Institute, University of Oxford, Oxford OX2 6GG, United Kingdom.

Jared Tanner (J)

Mathematical Institute, University of Oxford, Oxford OX2 6GG, United Kingdom.
The Alan Turing Institute, British Library, London NW1 2DB, United Kingdom.

Heather A Harrington (HA)

Mathematical Institute, University of Oxford, Oxford OX2 6GG, United Kingdom.
The Alan Turing Institute, British Library, London NW1 2DB, United Kingdom.

Vidit Nanda (V)

Mathematical Institute, University of Oxford, Oxford OX2 6GG, United Kingdom; nanda@maths.ox.ac.uk.
The Alan Turing Institute, British Library, London NW1 2DB, United Kingdom.

MeSH classifications