A comparison of survival analysis methods for cancer gene expression RNA-Sequencing data.
Base Sequence
Biomarkers, Tumor / genetics
Data Interpretation, Statistical
Gene Expression Profiling / methods
Gene Expression Regulation, Neoplastic / genetics
Humans
Kaplan-Meier Estimate
Neoplasms / genetics
Prognosis
Proportional Hazards Models
ROC Curve
Sequence Analysis, RNA / methods
Survival Analysis
Cancer
Gene expression
Kaplan–Meier
Survival analysis
TCGA
Journal
Cancer genetics
ISSN: 2210-7762
Abbreviated title: Cancer Genet
Country: United States
NLM ID: 101539150
Publication information
Publication date:
06 2019
History:
received: 15 11 2018
revised: 19 03 2019
accepted: 09 04 2019
entrez: 13 7 2019
pubmed: 13 7 2019
medline: 7 3 2020
Status:
ppublish
Abstract
Identifying genetic biomarkers of patient survival remains a major goal of large-scale cancer profiling studies. Using gene expression data to predict the outcome of a patient's tumor makes biomarker discovery a compelling tool for improving patient care. As genomic technologies expand, multiple data types may serve as informative biomarkers, and bioinformatic strategies have evolved around these different applications. For categorical variables such as a gene's mutation status, biomarker identification to predict survival time is straightforward. However, for continuous variables like gene expression, the available methods generate highly variable results, and studies on best practices are lacking. We investigated the performance of eight methods that deal specifically with continuous data. K-means, Cox regression, concordance index (C-index), D-index, 25th-75th percentile split, median-split, distribution-based splitting, and KaplanScan were applied to four RNA-sequencing (RNA-seq) datasets from The Cancer Genome Atlas (TCGA). The reliability of the eight methods was assessed by splitting each dataset into two groups and comparing the overlap of the results. Gene sets that had been identified from the literature for a specific tumor type served as positive controls to assess the accuracy of each biomarker using receiver operating characteristic (ROC) curves. Artificial RNA-seq data were generated to test the robustness of these methods under fixed levels of gene expression noise. Our results show that methods based on dichotomizing tend to have consistently poor performance, while C-index, D-index, and k-means perform well in most settings. Overall, the Cox regression method had the strongest performance based on tests of accuracy, reliability, and robustness.
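To make two of the compared approaches concrete, the following is a minimal sketch of a concordance index and a median-split, not the authors' implementation. It assumes Harrell's pairwise definition of the C-index, skips pairs with tied follow-up times for simplicity, and uses a small invented dataset; the function names and the toy values are hypothetical and do not come from the paper or from TCGA.

```python
from itertools import combinations
from statistics import median


def concordance_index(times, events, scores):
    """Harrell's C-index: share of comparable pairs in which the patient
    with the higher score has the shorter survival time.

    times  -- follow-up time per patient
    events -- 1 if death was observed, 0 if censored
    scores -- risk score per patient (here: a gene's expression level)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplifying assumption: ignore tied follow-up times
        # Order the pair so that `first` has the shorter follow-up.
        first, second = (i, j) if times[i] < times[j] else (j, i)
        if not events[first]:
            continue  # shorter time is censored, so the true order is unknown
        comparable += 1
        if scores[first] > scores[second]:
            concordant += 1.0
        elif scores[first] == scores[second]:
            concordant += 0.5  # tied scores count as half-concordant
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def median_split(expression):
    """Dichotomize expression: 1 if above the median, otherwise 0."""
    cutoff = median(expression)
    return [1 if x > cutoff else 0 for x in expression]


# Toy data, invented for illustration only (not from TCGA).
times = [2, 4, 5, 7, 9, 12]
events = [1, 1, 0, 1, 1, 0]
expression = [9.1, 8.3, 7.9, 6.0, 6.5, 3.2]

c_continuous = concordance_index(times, events, expression)
c_dichotomized = concordance_index(times, events, median_split(expression))
print(f"continuous: {c_continuous:.3f}, median-split: {c_dichotomized:.3f}")
```

On this toy data the continuous expression values order the patients better than the median-split groups do, because dichotomizing turns many pairs into ties. This only illustrates the mechanism; it is not evidence for the study's conclusions, which rest on the TCGA and simulated datasets.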
Identifiers
pubmed: 31296308
pii: S2210-7762(18)30489-7
doi: 10.1016/j.cancergen.2019.04.004
Chemical substances
Biomarkers, Tumor
0
Publication types
Comparative Study
Journal Article
Research Support, Non-U.S. Gov't
Research Support, U.S. Gov't, Non-P.H.S.
Languages
eng
Citation subsets
IM
Pagination
1-12
Copyright information
Copyright © 2019 Elsevier Inc. All rights reserved.