Multiple kernel learning for integrative consensus clustering of omic datasets.
Journal
Bioinformatics (Oxford, England)
ISSN: 1367-4811
Abbreviated title: Bioinformatics
Country: England
NLM ID: 9808944
Publication information
Publication date:
15 09 2020
History:
received: 01 03 2020
revised: 18 05 2020
accepted: 19 06 2020
pubmed: 28 6 2020
medline: 4 3 2021
entrez: 28 6 2020
Status:
ppublish
Abstract
Diverse applications, particularly in tumour subtyping, have demonstrated the importance of integrative clustering techniques for combining information from multiple data sources. Cluster Of Clusters Analysis (COCA) is one such approach and has been widely applied in tumour subtyping. However, the properties of COCA have never been systematically explored, and its robustness to the inclusion of noisy datasets is unclear. We rigorously benchmark COCA and present Kernel Learning Integrative Clustering (KLIC) as an alternative strategy. KLIC frames the challenge of combining clustering structures as a multiple kernel learning problem, in which each dataset provides a weighted contribution to the final clustering. This allows the contribution of noisy datasets to be down-weighted relative to more informative datasets. We compare the performance of KLIC and COCA in a variety of situations through simulation studies. We also present the output of KLIC and COCA in real data applications to cancer subtyping and transcriptional module discovery. The R packages klic and coca are available on the Comprehensive R Archive Network. Supplementary data are available at Bioinformatics online.
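The two ideas named in the abstract can be illustrated with a small sketch. It is not the implementation in the klic or coca R packages. All function names are hypothetical, the kernels are plain co-clustering indicator matrices built from one labelling per dataset (an assumption made for brevity), and the weights are fixed by hand rather than learned by a multiple kernel learning algorithm. The first function builds a COCA-style binary "matrix of clusters"; the last one forms a weighted combination of per-dataset kernels, which is the step that lets a noisy dataset be down-weighted.

```python
from typing import List, Sequence


def matrix_of_clusters(labelings: Sequence[Sequence[int]]) -> List[List[int]]:
    """COCA-style binary matrix (hypothetical helper).

    One row per sample, one column per (dataset, cluster) pair; an entry is 1
    when the sample belongs to that cluster in that dataset.
    """
    n = len(labelings[0])
    rows: List[List[int]] = [[] for _ in range(n)]
    for labels in labelings:
        if len(labels) != n:
            raise ValueError("every dataset must label the same samples")
        clusters = sorted(set(labels))
        for i, lab in enumerate(labels):
            rows[i].extend(1 if lab == c else 0 for c in clusters)
    return rows


def coclustering_kernel(labels: Sequence[int]) -> List[List[float]]:
    """Stand-in kernel (assumption): 1.0 if two samples share a cluster, else 0.0."""
    return [[1.0 if a == b else 0.0 for b in labels] for a in labels]


def combine_kernels(kernels: Sequence[List[List[float]]],
                    weights: Sequence[float]) -> List[List[float]]:
    """Weighted sum of kernels; weights are given here, not learned."""
    if len(kernels) != len(weights):
        raise ValueError("one weight per kernel is required")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    n = len(kernels[0])
    return [[sum(w * k[i][j] for w, k in zip(weights, kernels))
             for j in range(n)] for i in range(n)]


# Toy data: two datasets agree on the grouping, a third is noise.
labels_a = [0, 0, 1, 1]
labels_b = [0, 0, 1, 1]
labels_noise = [0, 1, 0, 1]

moc = matrix_of_clusters([labels_a, labels_noise])
kernels = [coclustering_kernel(l) for l in (labels_a, labels_b, labels_noise)]
combined = combine_kernels(kernels, weights=[0.45, 0.45, 0.10])
```

In this toy example the combined similarity between samples 0 and 1 stays high because the two informative datasets carry most of the weight, while the pair 0 and 2, grouped together only by the noisy dataset, receives a small value. A final clustering would then be obtained by running a kernel-based clustering method on the combined matrix, a step omitted here.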
Identifiers
pubmed: 32592464
pii: 5864023
doi: 10.1093/bioinformatics/btaa593
pmc: PMC7750932
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
4789-4796
Grants
Agency: Medical Research Council
ID: MC_UU_00002/10
Country: United Kingdom
Agency: Medical Research Council
ID: MC_UU_00002/13
Country: United Kingdom
Copyright information
© The Author(s) 2020. Published by Oxford University Press.