Uncovering biomarker genes with enriched classification potential from Hallmark gene sets.
Journal
Scientific reports
ISSN: 2045-2322
Abbreviated title: Sci Rep
Country: England
NLM ID: 101563288
Publication information
Publication date:
05 07 2019
History:
received: 27 12 2018
accepted: 20 06 2019
entrez: 7 7 2019
pubmed: 7 7 2019
medline: 23 10 2020
Status:
epublish
Abstract
Given the complex relationship between gene expression and phenotypic outcomes, computationally efficient approaches are needed to sift through large high-dimensional datasets in order to identify biologically relevant biomarkers. In this report, we describe a method of identifying the most salient biomarker genes in a dataset, which we call "candidate genes", by evaluating the ability of gene combinations to classify samples from a dataset, which we call "classification potential". Our algorithm, Gene Oracle, uses a neural network to test user-defined gene sets for polygenic classification potential and then uses a combinatorial approach to further decompose selected gene sets into candidate and non-candidate biomarker genes. We tested this algorithm on curated gene sets from the Molecular Signatures Database (MSigDB) quantified in RNAseq gene expression matrices obtained from The Cancer Genome Atlas (TCGA) and Genotype-Tissue Expression (GTEx) data repositories. First, we identified which MSigDB Hallmark subsets have significant classification potential for both the TCGA and GTEx datasets. Then, we identified the most discriminatory candidate biomarker genes in each Hallmark gene set and provide evidence that the improved biomarker potential of these genes may be due to reduced functional complexity.
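The two-stage idea in the abstract (score a gene set by how well it classifies samples, then decompose the set combinatorially to find the genes that drive the score) can be illustrated with a minimal sketch. This is not the Gene Oracle implementation: a leave-one-out nearest-centroid classifier is swapped in for the neural network, and the function names (`classification_accuracy`, `rank_genes`), the gene names, the class labels, and the toy expression values are all hypothetical.

```python
import itertools
from collections import defaultdict


def classification_accuracy(samples, labels, genes):
    """Leave-one-out accuracy of a nearest-centroid classifier using only `genes`.

    Assumption: this simple classifier stands in for the neural network
    mentioned in the abstract.
    """
    correct = 0
    for i, (held_out, true_label) in enumerate(zip(samples, labels)):
        # Group every other sample by its class label.
        by_class = defaultdict(list)
        for j, (sample, label) in enumerate(zip(samples, labels)):
            if j != i:
                by_class[label].append(sample)

        def squared_distance(label):
            # Squared Euclidean distance from the held-out sample
            # to the centroid of the given class.
            members = by_class[label]
            return sum(
                (held_out[g] - sum(m[g] for m in members) / len(members)) ** 2
                for g in genes
            )

        predicted = min(by_class, key=squared_distance)
        correct += predicted == true_label
    return correct / len(samples)


def rank_genes(samples, labels, gene_set, subset_size=2):
    """Rank genes by the mean accuracy of all fixed-size subsets containing them."""
    per_gene = defaultdict(list)
    for subset in itertools.combinations(gene_set, subset_size):
        accuracy = classification_accuracy(samples, labels, subset)
        for gene in subset:
            per_gene[gene].append(accuracy)
    means = {gene: sum(a) / len(a) for gene, a in per_gene.items()}
    return sorted(means.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: GENE_A separates the two classes, while the
# N* genes take the same values in both classes and carry no signal.
noise = {"N1": [1, 2, 3, 4], "N2": [4, 1, 3, 2], "N3": [2, 4, 1, 3]}
samples, labels = [], []
for label, base in (("tumor", 10.0), ("normal", 0.0)):
    for k in range(4):
        sample = {"GENE_A": base + 0.1 * k}
        sample.update({g: float(values[k]) for g, values in noise.items()})
        samples.append(sample)
        labels.append(label)

gene_set = ["GENE_A", "N1", "N2", "N3"]
ranking = rank_genes(samples, labels, gene_set)
print(ranking)
```

In this sketch, genes at the top of `ranking` play the role of "candidate genes". Enumerating every subset grows combinatorially with the size of the gene set, so a real analysis would need to limit or sample the subsets it evaluates.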
Identifiers
pubmed: 31278367
doi: 10.1038/s41598-019-46059-1
pii: 10.1038/s41598-019-46059-1
pmc: PMC6611793
Chemical substances
Biomarkers, Tumor
0
Publication types
Journal Article
Research Support, U.S. Gov't, Non-P.H.S.
Languages
eng
Citation subsets
IM
Pagination
9747