Using Global t-SNE to Preserve Intercluster Data Structure.


Journal

Neural computation
ISSN: 1530-888X
Abbreviated title: Neural Comput
Country: United States
NLM ID: 9426182

Publication information

Publication date:
14 07 2022
History:
received: 27 09 2021
accepted: 22 01 2022
pubmed: 8 7 2022
medline: 20 7 2022
entrez: 7 7 2022
Status: ppublish

Abstract

The t-distributed stochastic neighbor embedding (t-SNE) method is one of the leading techniques for data visualization and clustering. This method finds a lower-dimensional embedding of data points while minimizing distortions in distances between neighboring data points. By construction, t-SNE discards information about the large-scale structure of the data. We show that adding a global cost function to the t-SNE cost function makes it possible to cluster the data while preserving global intercluster data structure. We test the new global t-SNE (g-SNE) method on one synthetic data set and two real data sets, one on flower shapes and one on human brain cells. We find that significant and meaningful global structure exists in both the plant and human brain data sets. In all cases, g-SNE outperforms t-SNE and UMAP in preserving the global structure. Topological analysis of the clustering result makes it possible to find an appropriate trade-off of data distribution across scales. We find differences in how data are distributed across scales between the two subjects that were part of the human brain data set. Thus, by striving to produce both accurate clustering and accurate positioning between clusters, the g-SNE method can identify new aspects of data organization across scales.
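
The idea of adding a global cost term to the t-SNE cost can be illustrated with a minimal sketch. This is not the authors' implementation, and this record does not give the exact form of the global term. The sketch assumes the total cost is a local KL divergence plus a weighted global KL divergence, where the global affinities grow with distance (here `1 + d^2`) so that far-apart pairs dominate. The names `combined_cost`, `lam`, and `sigma` are hypothetical, and a single Gaussian bandwidth replaces the per-point perplexity calibration used in standard t-SNE.

```python
import numpy as np

def pairwise_sq_dists(X):
    """Squared Euclidean distance between every pair of rows in X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sum(diff ** 2, axis=-1)

def normalize_offdiag(W):
    """Zero the diagonal and scale so all remaining entries sum to 1."""
    W = W.copy()
    np.fill_diagonal(W, 0.0)
    return W / W.sum()

def kl(P, Q, eps=1e-12):
    """KL divergence sum(P * log(P / Q)) over entries where P > 0."""
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / (Q[mask] + eps))))

def combined_cost(X, Y, lam=1.0, sigma=1.0):
    """Local t-SNE-style cost plus lam times an assumed global cost.

    X: high-dimensional data, shape (n, d).
    Y: candidate embedding, shape (n, k).
    Returns (total, local, global).
    """
    dX = pairwise_sq_dists(X)
    dY = pairwise_sq_dists(Y)

    # Local term: Gaussian affinities in data space (single bandwidth,
    # a simplification) and Student-t affinities in embedding space.
    P = normalize_offdiag(np.exp(-dX / (2.0 * sigma ** 2)))
    Q = normalize_offdiag(1.0 / (1.0 + dY))

    # Global term (assumed form): affinities that increase with distance,
    # so mismatches between distant pairs are penalized most.
    P_hat = normalize_offdiag(1.0 + dX)
    Q_hat = normalize_offdiag(1.0 + dY)

    local = kl(P, Q)
    global_ = kl(P_hat, Q_hat)
    return local + lam * global_, local, global_

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(20, 5))   # toy high-dimensional data
    Y = rng.normal(size=(20, 2))   # toy 2-D embedding
    total, local, global_ = combined_cost(X, Y, lam=0.5)
    print(f"total={total:.4f} local={local:.4f} global={global_:.4f}")
```

Under these assumptions, setting `lam=0` leaves only the local t-SNE-style term, while larger values of `lam` put more weight on matching large-scale distances between clusters. In practice the embedding `Y` would be optimized by gradient descent on this cost rather than drawn at random as in the toy usage.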

Identifiers

pubmed: 35798323
pii: 111786
doi: 10.1162/neco_a_01504
pmc: PMC10010455
mid: NIHMS1848571

Publication types

Journal Article; Research Support, N.I.H., Extramural; Research Support, U.S. Gov't, Non-P.H.S.; Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Pagination

1637-1651

Grants

Agency: NIA NIH HHS
ID: P30 AG068635
Country: United States
Agency: NINDS NIH HHS
ID: U19 NS112959
Country: United States

Copyright information

© 2022 Massachusetts Institute of Technology.

Authors

Yuansheng Zhou (Y)

Computational Neurobiology Laboratory, Salk Institute for Biological Studies, La Jolla, CA 92037, U.S.A.
Division of Biological Sciences, University of California San Diego, La Jolla, CA 92037, U.S.A. yuz461@ucsd.edu.

Tatyana O Sharpee (TO)

Computational Neurobiology Laboratory, Salk Institute for Biological Studies, La Jolla, CA 92037, U.S.A.
Department of Physics, University of California San Diego, La Jolla, CA 92037, U.S.A. sharpee@salk.edu.
