Multi-view clustering for multi-omics data using unified embedding.
Journal
Scientific reports
ISSN: 2045-2322
Abbreviated title: Sci Rep
Country: England
NLM ID: 101563288
Publication information
Publication date:
12 Aug 2020
History:
received: 31 Dec 2019
accepted: 13 Jul 2020
entrez: 14 Aug 2020
pubmed: 14 Aug 2020
medline: 14 Aug 2020
Status:
epublish
Abstract
In real-world applications, data sets often consist of multiple views, which provide consensus and complementary information about one another. Embedding learning is an effective strategy for nearest-neighbour search and dimensionality reduction in large data sets. This paper attempts to learn a unified probability distribution of the points across the different views and to generate a unified embedding in a low-dimensional space that optimally preserves neighbourhood identity. The probability distributions generated for each point in each view are combined by the conflation method into a single unified distribution. The goal is for the distribution obtained by performing the same operation in the embedded space to approximate this unified distribution as closely as possible. The cost function is the sum of the Kullback-Leibler divergences over the samples, which leads to a simple gradient that adjusts the positions of the samples in the embedded space. The proposed methodology can generate embeddings from both complete and incomplete multi-view data sets. Finally, a multi-objective clustering technique (AMOSA) is applied to group the samples in the embedded space. The proposed methodology, Multi-view Neighbourhood Embedding (MvNE), shows an improvement of approximately 2-3% over state-of-the-art models when evaluated on 10 omics data sets.
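The pipeline described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation and rests on several assumptions: each view uses SNE-style Gaussian conditionals p_{j|i} with a single fixed bandwidth (the paper's kernel and bandwidth selection are not stated here), conflation is taken as the row-wise normalised product of the per-view distributions, and the summed KL divergence is minimised by plain gradient descent with the SNE gradient dC/dy_i = 2 * sum_j (p_{j|i} - q_{j|i} + p_{i|j} - q_{i|j}) (y_i - y_j). The names `conditional_probs`, `conflate`, `embed` and `kl_cost` and the parameters `sigma`, `lr` and `n_iter` are hypothetical. Incomplete views and the AMOSA clustering step are not covered.

```python
import numpy as np


def conditional_probs(X, sigma=1.0):
    """Row i holds an SNE-style distribution p_{j|i} over points j != i.

    Assumption: one fixed Gaussian bandwidth `sigma` for all points
    (SNE/t-SNE usually tune a per-point bandwidth to a target perplexity).
    """
    sq = np.sum(X**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    logits = -d2 / (2.0 * sigma**2)
    np.fill_diagonal(logits, -np.inf)  # a point is never its own neighbour
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    P = np.exp(logits)
    return P / P.sum(axis=1, keepdims=True)


def conflate(prob_list, eps=1e-12):
    """Conflation of discrete distributions: normalised product, row by row."""
    log_prod = sum(np.log(np.maximum(P, eps)) for P in prob_list)
    np.fill_diagonal(log_prod, -np.inf)  # keep p_{i|i} = 0
    log_prod -= log_prod.max(axis=1, keepdims=True)
    C = np.exp(log_prod)
    return C / C.sum(axis=1, keepdims=True)


def kl_cost(P, Q, eps=1e-12):
    """Sum over samples i of KL(P_i || Q_i)."""
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / np.maximum(Q[mask], eps))))


def embed(P, dim=2, n_iter=300, lr=0.1, seed=0):
    """Gradient descent on sum_i KL(P_i || Q_i) (hypothetical settings)."""
    rng = np.random.default_rng(seed)
    Y = 1e-2 * rng.standard_normal((P.shape[0], dim))
    for _ in range(n_iter):
        # sigma = sqrt(0.5) gives q_{j|i} proportional to exp(-||y_i - y_j||^2)
        Q = conditional_probs(Y, sigma=np.sqrt(0.5))
        M = (P - Q) + (P - Q).T  # p_{j|i} - q_{j|i} + p_{i|j} - q_{i|j}
        # dC/dy_i = 2 * sum_j M_ij (y_i - y_j), vectorised over all i
        grad = 2.0 * (M.sum(axis=1, keepdims=True) * Y - M @ Y)
        Y -= lr * grad
    return Y
```

As a usage example, two synthetic views of the same 40 samples can be conflated and embedded with `Y = embed(conflate([conditional_probs(view1, 2.0), conditional_probs(view2, 2.0)]))`, after which any clustering method can be run on `Y`. With Gaussian conditionals and an equal bandwidth in every view, the normalised product reduces to a Gaussian kernel on the summed squared distances, which is one reason conflation is a natural way to merge neighbourhood distributions.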
Identifiers
pubmed: 32788601
doi: 10.1038/s41598-020-70229-1
pii: 10.1038/s41598-020-70229-1
pmc: PMC7423957
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
13654