Data-driven characterization of molecular phenotypes across heterogeneous sample collections.
Journal
Nucleic acids research
ISSN: 1362-4962
Abbreviated title: Nucleic Acids Res
Country: England
NLM ID: 0411011
Publication information
Publication date:
26 07 2019
History:
accepted: 10 04 2019
revised: 02 04 2019
received: 23 02 2019
pubmed: 23 7 2019
medline: 7 1 2020
entrez: 23 7 2019
Status:
ppublish
Abstract
Existing large gene expression data repositories hold enormous potential to elucidate disease mechanisms, characterize changes in cellular pathways, and stratify patients based on molecular profiles. To achieve this goal, integrative resources and tools are needed that allow comparison of results across datasets and data types. We propose an intuitive approach for data-driven stratification of molecular profiles and benchmark our methodology using the dimensionality reduction algorithm t-distributed stochastic neighbor embedding (t-SNE) with multi-study and multi-platform data on hematological malignancies. Our approach enables assessing the contribution of biological versus technical variation to sample clustering, direct incorporation of additional datasets into the same low-dimensional representation, comparison of molecular disease subtypes identified from separate t-SNE representations, and characterization of the obtained clusters based on pathway databases and additional data. In this manner, we performed an integrative analysis across multi-omics acute myeloid leukemia studies. Our approach indicated new molecular subtypes with differential survival and drug responsiveness among samples lacking fusion genes, including a novel myelodysplastic syndrome-like cluster and a cluster characterized by CEBPA mutations and differential activity of the S-adenosylmethionine-dependent DNA methylation pathway. In summary, integration across multiple studies can help to identify novel molecular disease subtypes and generate insight into disease biology.
Identifiers
pubmed: 31329928
pii: 5477460
doi: 10.1093/nar/gkz281
pmc: PMC6648337
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
e76
Copyright information
© The Author(s) 2019. Published by Oxford University Press on behalf of Nucleic Acids Research.