A comparison of survival analysis methods for cancer gene expression RNA-Sequencing data.

Base Sequence; Biomarkers, Tumor / genetics; Data Interpretation, Statistical; Gene Expression Profiling / methods; Gene Expression Regulation, Neoplastic / genetics; Humans; Kaplan-Meier Estimate; Neoplasms / genetics; Prognosis; Proportional Hazards Models; ROC Curve; Sequence Analysis, RNA / methods; Survival Analysis

Cancer; Gene expression; Kaplan–Meier; Survival analysis; TCGA

Journal

Cancer genetics

ISSN: 2210-7762

Abbreviated title: Cancer Genet

Country: United States

NLM ID: 101539150

Publication information

Publication date: June 2019

History:

received: 2018-11-15

revised: 2019-03-19

accepted: 2019-04-09

entrez: 2019-07-13

pubmed: 2019-07-13

medline: 2020-03-07

Status: ppublish

Abstract

Identifying genetic biomarkers of patient survival remains a major goal of large-scale cancer profiling studies. Using gene expression data to predict the outcome of a patient's tumor makes biomarker discovery a compelling tool for improving patient care. As genomic technologies expand, multiple data types may serve as informative biomarkers, and bioinformatic strategies have evolved around these different applications. For categorical variables such as a gene's mutation status, identifying biomarkers that predict survival time is straightforward. However, for continuous variables such as gene expression, the available methods generate highly variable results, and studies of best practices are lacking. We investigated the performance of eight methods designed specifically for continuous data: k-means, Cox regression, concordance index (C-index), D-index, 25th–75th percentile split, median split, distribution-based splitting, and KaplanScan. These methods were applied to four RNA-sequencing (RNA-seq) datasets from The Cancer Genome Atlas (TCGA). The reliability of the eight methods was assessed by splitting each dataset into two groups and comparing the overlap of the results. Gene sets identified from the literature for a specific tumor type served as positive controls for assessing the accuracy of each biomarker using receiver operating characteristic (ROC) curves. Artificial RNA-seq data were generated to test the robustness of these methods under fixed levels of gene expression noise. Our results show that methods based on dichotomization tend to perform consistently poorly, while the C-index, D-index, and k-means perform well in most settings. Overall, Cox regression had the strongest performance across the tests of accuracy, reliability, and robustness.
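The abstract contrasts dichotomizing methods with methods that treat expression as a continuous variable. As a minimal illustration of dichotomization, the sketch below splits patients into high and low groups at the median expression of a single gene and compares the two groups with a standard two-group log-rank test. It assumes right-censored data supplied as parallel lists (`times`, `events` with 1 = death and 0 = censored, `expression`); the function names `median_split` and `logrank_test` and the toy data are hypothetical, and this is not the authors' implementation.

```python
import math
from statistics import median


def median_split(expression):
    """Hypothetical helper: label a patient 1 (high) if expression is above the median, else 0 (low)."""
    cut = median(expression)
    return [1 if value > cut else 0 for value in expression]


def logrank_test(times, events, groups):
    """Two-group log-rank test for right-censored survival data.

    events[i] is 1 for an observed death and 0 for censoring; groups[i] is 0 or 1.
    Returns (chi-square statistic, p-value) on 1 degree of freedom.
    """
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    obs_minus_exp = 0.0  # observed minus expected deaths in group 1
    variance = 0.0
    for t in event_times:
        # Risk set: patients whose follow-up time is at least t
        at_risk = [(g, e, ti) for ti, e, g in zip(times, events, groups) if ti >= t]
        n = len(at_risk)
        n1 = sum(1 for g, _, _ in at_risk if g == 1)
        d = sum(1 for _, e, ti in at_risk if ti == t and e == 1)
        d1 = sum(1 for g, e, ti in at_risk if ti == t and e == 1 and g == 1)
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:
            # Hypergeometric variance of the group-1 death count at this time
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = obs_minus_exp ** 2 / variance
    # Upper tail of a chi-square with 1 df: P(X > c) = erfc(sqrt(c / 2))
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Toy data (hypothetical): higher expression goes with shorter survival
times = [1, 2, 3, 4, 5, 6, 7, 8]
events = [1, 1, 1, 1, 1, 1, 1, 1]
expression = [8, 7, 6, 5, 4, 3, 2, 1]
chi2, p = logrank_test(times, events, median_split(expression))
print(f"chi2={chi2:.2f}, p={p:.4f}")
```

Note that the split discards the ordering of patients within each group, which is one intuition for why dichotomizing methods can lose information compared with continuous approaches.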
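The abstract lists the concordance index (C-index) among the methods that perform well in most settings. A minimal sketch of Harrell's C-index, assuming expression is used directly as the risk score (higher expression = higher predicted risk), is shown below. The function name and toy data are hypothetical, and for brevity pairs with tied follow-up times are skipped, which full implementations may handle differently.

```python
def harrell_c_index(times, events, risk):
    """Harrell's concordance index for a continuous risk score.

    A pair (i, j) is comparable when patient i has an observed event and
    patient j was followed for longer. It is concordant when the patient who
    died first has the higher risk score; tied scores count as one half.
    Pairs with tied follow-up times are skipped in this simplified version.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if events[i] != 1:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(len(times)):
            if times[j] > times[i]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical), with two censored patients
times = [1, 2, 3, 4, 5, 6, 7, 8]
events = [1, 1, 1, 0, 1, 1, 0, 1]
expression = [8, 7, 6, 5, 4, 3, 2, 1]
print(harrell_c_index(times, events, expression))
```

A value of 0.5 corresponds to a score that orders patients no better than chance, and 1.0 to perfect ordering; because no cut-off is chosen, the score is used on its continuous scale.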
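Cox regression is reported as the strongest method overall. As a rough sketch of what fitting it to a single gene involves, the code below fits the proportional hazards model h(t | x) = h0(t) exp(beta * x) by Newton-Raphson on the Cox partial likelihood, using the Breslow approximation for tied times, and returns beta with its standard error. The function name, stopping rule, and toy data are assumptions for illustration; a production analysis would more likely use an established survival library, and this is not presented as the authors' pipeline.

```python
import math


def cox_single_covariate(times, events, x, max_iter=50, tol=1e-9):
    """Fit a one-covariate Cox model by Newton-Raphson (Breslow ties).

    Returns (beta, standard error of beta); exp(beta) is the hazard ratio
    per unit increase in x.
    """
    n = len(times)
    beta = 0.0
    for _ in range(max_iter):
        score = 0.0        # first derivative of the log partial likelihood
        information = 0.0  # negative second derivative
        for i in range(n):
            if events[i] != 1:
                continue
            # Risk set at patient i's event time (tied events share it: Breslow)
            risk_set = [j for j in range(n) if times[j] >= times[i]]
            w = [math.exp(beta * x[j]) for j in risk_set]
            s0 = sum(w)
            s1 = sum(wj * x[j] for wj, j in zip(w, risk_set))
            s2 = sum(wj * x[j] ** 2 for wj, j in zip(w, risk_set))
            mean_x = s1 / s0
            score += x[i] - mean_x
            information += s2 / s0 - mean_x ** 2
        step = score / information
        beta += step
        if abs(step) < tol:
            break
    # Standard error from the observed information at the last iterate
    return beta, 1.0 / math.sqrt(information)


# Toy data (hypothetical): higher expression tends to precede earlier death
times = [1, 2, 3, 4, 5, 6, 7, 8]
events = [1, 1, 1, 1, 1, 1, 1, 1]
x = [2, 3, 1, 2.5, 0.5, 1.5, 0, 1]
beta, se = cox_single_covariate(times, events, x)
print(f"beta={beta:.3f}, se={se:.3f}, HR={math.exp(beta):.3f}, z={beta / se:.2f}")
```

Genes could then be ranked by the Wald statistic beta / se, which keeps expression continuous; whether the study ranked genes this way is not stated in the abstract. If expression perfectly separates early from late deaths, the maximum likelihood estimate does not exist and beta will drift upward, so real data should be checked for that case.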

Identifiers

DOI: 10.1016/j.cancergen.2019.04.004 PMID: 31296308

pii: S2210-7762(18)30489-7

Chemical substances

Biomarkers, Tumor 0

Publication types

Comparative Study; Journal Article; Research Support, Non-U.S. Gov't; Research Support, U.S. Gov't, Non-P.H.S.

Authors

Pichai Raman (P)

Samuel Zimmerman (S)

Komal S Rathi (KS)

Laurence de Torrenté (L)

Mahdi Sarmady (M)

Chao Wu (C)

Jeremy Leipzig (J)

Deanne M Taylor (DM)

Aydin Tozeren (A)

Jessica C Mar (JC)

Similar articles

[Redispensing of expensive oral anticancer medicines: a practical application].

Smoking Cessation and Incident Cardiovascular Disease.

Evaluation of Low-Value Services Across Major Medicare Advantage Insurers and Traditional Medicare.

Effectiveness of Virtual Yoga for Chronic Low Back Pain: A Randomized Clinical Trial.

MeSH terms