Before and After: Comparison of Legacy and Harmonized TCGA Genomic Data Commons' Data.


Journal

Cell Systems
ISSN: 2405-4720
Abbreviated title: Cell Syst
Country: United States
NLM ID: 101656080

Publication information

Publication date:
2019-07-24
History:
received: 2019-01-19
revised: 2019-03-18
accepted: 2019-06-13
entrez: 2019-07-26
pubmed: 2019-07-26
medline: 2020-07-31
Status: ppublish

Abstract

We present a systematic analysis of the effects of synchronizing a large-scale, deeply characterized, multi-omic dataset to the current human reference genome, using updated software, pipelines, and annotations. For each of 5 molecular data platforms in The Cancer Genome Atlas (TCGA), namely mRNA and miRNA expression, single nucleotide variants, DNA methylation, and copy number alterations, comprehensive sample-, gene-, and probe-level studies were performed to quantify the degree of similarity between the 'legacy' GRCh37 (hg19) TCGA data and its GRCh38 (hg38) version as 'harmonized' by the Genomic Data Commons. We offer gene lists to elucidate differences that remained after controlling for confounders, and strategies to mitigate their impact on biological interpretation. Our results demonstrate that the hg19 and hg38 TCGA datasets are very highly concordant, promote informed use of either legacy or harmonized omics data, and provide a rubric that encourages similar comparisons as new data emerge and reference data evolve.
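
As a rough illustration of what a gene-level similarity check between a legacy and a harmonized dataset might look like, the sketch below computes a per-gene Pearson correlation across the samples shared by both versions and flags genes that fall under a cutoff. This is not the authors' pipeline: the abstract does not specify the statistic used, and the data layout (gene mapped to sample values), the function names, the 0.8 cutoff, and the toy numbers are all assumptions made for this example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")  # undefined when one profile is constant
    return cov / sqrt(vx * vy)


def gene_concordance(legacy, harmonized, min_samples=3):
    """Per-gene correlation over samples present in both datasets.

    Both arguments map gene -> {sample_id: value}. This layout is a
    hypothetical choice for the sketch, not a GDC file format.
    """
    result = {}
    for gene in legacy.keys() & harmonized.keys():
        shared = sorted(legacy[gene].keys() & harmonized[gene].keys())
        if len(shared) < min_samples:
            continue  # too few paired samples to compare
        x = [legacy[gene][s] for s in shared]
        y = [harmonized[gene][s] for s in shared]
        result[gene] = pearson(x, y)
    return result


# Toy values invented for illustration only (not TCGA data).
legacy = {
    "GENE_A": {"S1": 1.0, "S2": 2.0, "S3": 3.0, "S4": 4.0},
    "GENE_B": {"S1": 5.0, "S2": 1.0, "S3": 4.0, "S4": 2.0},
}
harmonized = {
    "GENE_A": {"S1": 1.1, "S2": 2.1, "S3": 3.1, "S4": 4.1},
    "GENE_B": {"S1": 2.0, "S2": 4.0, "S3": 1.0, "S4": 5.0},
}

scores = gene_concordance(legacy, harmonized)
# Genes below the (assumed) cutoff would be candidates for closer review.
discordant = sorted(g for g, r in scores.items() if r < 0.8)
```

In this toy input, GENE_A is essentially perfectly correlated between the two versions, while GENE_B is anticorrelated and ends up in `discordant`. A real comparison would also need to control for confounders, as the abstract notes.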

Identifiers

pubmed: 31344359
pii: S2405-4712(19)30201-7
doi: 10.1016/j.cels.2019.06.006
pmc: PMC6707074
mid: NIHMS1535521

Chemical substances

MicroRNAs 0

Publication types

Comparative Study; Journal Article; Research Support, N.I.H., Extramural

Languages

eng

Citation subsets

IM

Pagination

24-34.e10

Grants

Agency: NCI NIH HHS
ID: U24 CA210978
Country: United States
Agency: NCI NIH HHS
ID: U24 CA210950
Country: United States
Agency: NCI NIH HHS
ID: U24 CA210974
Country: United States
Agency: NCI NIH HHS
ID: U24 CA210989
Country: United States
Agency: NCI NIH HHS
ID: U24 CA210952
Country: United States
Agency: NCI NIH HHS
ID: U24 CA210957
Country: United States
Agency: NCI NIH HHS
ID: R01 CA175486
Country: United States
Agency: NCI NIH HHS
ID: U24 CA210949
Country: United States
Agency: NCI NIH HHS
ID: U24 CA209851
Country: United States
Agency: NCI NIH HHS
ID: U24 CA210990
Country: United States
Agency: NCI NIH HHS
ID: U24 CA211000
Country: United States
Agency: NIEHS NIH HHS
ID: P30 ES010126
Country: United States
Agency: NCI NIH HHS
ID: P30 CA016672
Country: United States
Agency: NCI NIH HHS
ID: U24 CA210969
Country: United States
Agency: NCI NIH HHS
ID: U24 CA210988
Country: United States
Agency: NCI NIH HHS
ID: U24 CA143883
Country: United States
Agency: NCI NIH HHS
ID: U24 CA211006
Country: United States
Agency: NCI NIH HHS
ID: U24 CA210999
Country: United States

Copyright information

Copyright © 2019 The Authors. Published by Elsevier Inc. All rights reserved.

Authors

Galen F Gao (GF)

Eli and Edythe L. Broad Institute of Massachusetts Institute of Technology and Harvard University, Cambridge, MA 02142, USA; The University of Texas Southwestern Medical School, Dallas, TX 75390, USA.

Joel S Parker (JS)

Department of Genetics, Lineberger Comprehensive Cancer Center, the University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA.

Sheila M Reynolds (SM)

Institute for Systems Biology, Seattle, WA 98109, USA.

Tiago C Silva (TC)

Center for Bioinformatics and Functional Genomics, Department of Biomedical Sciences, Cedars-Sinai Medical Center, Los Angeles, CA 90048, USA; Department of Genetics, Ribeirão Preto Medical School, University of São Paulo, Ribeirão Preto, SP 14.040-905, Brazil.

Liang-Bo Wang (LB)

Department of Medicine, Washington University in St Louis, Saint Louis, MO 63108, USA; McDonnell Genome Institute, Washington University in St Louis, Saint Louis, MO 63108, USA; Siteman Cancer Center, Washington University in St Louis, Saint Louis, MO 63108, USA.

Wanding Zhou (W)

Center for Epigenetics, Van Andel Research Institute, Grand Rapids, MI 49503, USA.

Rehan Akbani (R)

Department of Bioinformatics and Computational Biology, the University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA.

Matthew Bailey (M)

Department of Medicine, Washington University in St Louis, Saint Louis, MO 63108, USA; McDonnell Genome Institute, Washington University in St Louis, Saint Louis, MO 63108, USA; Siteman Cancer Center, Washington University in St Louis, Saint Louis, MO 63108, USA.

Saianand Balu (S)

Lineberger Comprehensive Cancer Center, Bioinformatics Core, the University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA.

Benjamin P Berman (BP)

Center for Bioinformatics and Functional Genomics, Department of Biomedical Sciences, Cedars-Sinai Medical Center, Los Angeles, CA 90048, USA; Faculty of Medicine, Department of Developmental Biology and Cancer Research, the Hebrew University of Jerusalem, Jerusalem 91120, Israel.

Denise Brooks (D)

Canada's Michael Smith Genome Sciences Centre, Vancouver, BC V5Z 4S6, Canada.

Hu Chen (H)

Department of Bioinformatics and Computational Biology, the University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA; Graduate Program in Quantitative and Computational Biosciences, Baylor College of Medicine, Houston, TX 77030, USA.

Andrew D Cherniack (AD)

Eli and Edythe L. Broad Institute of Massachusetts Institute of Technology and Harvard University, Cambridge, MA 02142, USA; Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA 02215, USA.

John A Demchok (JA)

National Cancer Institute, Bethesda, MD 20892, USA.

Li Ding (L)

Department of Medicine, Washington University in St Louis, Saint Louis, MO 63108, USA; McDonnell Genome Institute, Washington University in St Louis, Saint Louis, MO 63108, USA; Siteman Cancer Center, Washington University in St Louis, Saint Louis, MO 63108, USA.

Ina Felau (I)

National Cancer Institute, Bethesda, MD 20892, USA.

Sharon Gaheen (S)

Frederick National Laboratory for Cancer Research, Leidos Biomedical Research, Inc., Frederick, MD 21702, USA.

Daniela S Gerhard (DS)

National Cancer Institute, Bethesda, MD 20892, USA.

David I Heiman (DI)

Eli and Edythe L. Broad Institute of Massachusetts Institute of Technology and Harvard University, Cambridge, MA 02142, USA.

Kyle M Hernandez (KM)

Department of Pediatrics, the University of Chicago, Chicago, IL 60637, USA; Center for Research Informatics, the University of Chicago, Chicago, IL 60637, USA.

Katherine A Hoadley (KA)

Department of Genetics, Lineberger Comprehensive Cancer Center, the University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA.

Reyka Jayasinghe (R)

Department of Medicine, Washington University in St Louis, Saint Louis, MO 63108, USA.

Anab Kemal (A)

National Cancer Institute, Bethesda, MD 20892, USA.

Theo A Knijnenburg (TA)

Institute for Systems Biology, Seattle, WA 98109, USA.

Peter W Laird (PW)

Center for Epigenetics, Van Andel Research Institute, Grand Rapids, MI 49503, USA.

Michael K A Mensah (MKA)

National Cancer Institute, Bethesda, MD 20892, USA.

Andrew J Mungall (AJ)

Canada's Michael Smith Genome Sciences Centre, Vancouver, BC V5Z 4S6, Canada.

A Gordon Robertson (AG)

Canada's Michael Smith Genome Sciences Centre, Vancouver, BC V5Z 4S6, Canada.

Hui Shen (H)

Center for Epigenetics, Van Andel Research Institute, Grand Rapids, MI 49503, USA.

Roy Tarnuzzer (R)

National Cancer Institute, Bethesda, MD 20892, USA.

Zhining Wang (Z)

National Cancer Institute, Bethesda, MD 20892, USA.

Matthew Wyczalkowski (M)

Department of Medicine, Washington University in St Louis, Saint Louis, MO 63108, USA; McDonnell Genome Institute, Washington University in St Louis, Saint Louis, MO 63108, USA; Siteman Cancer Center, Washington University in St Louis, Saint Louis, MO 63108, USA.

Liming Yang (L)

National Cancer Institute, Bethesda, MD 20892, USA.

Jean C Zenklusen (JC)

National Cancer Institute, Bethesda, MD 20892, USA.

Zhenyu Zhang (Z)

Center for Translational Data Science, the University of Chicago, Chicago, IL 60615, USA.

Han Liang (H)

Department of Bioinformatics and Computational Biology, the University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA; Graduate Program in Quantitative and Computational Biosciences, Baylor College of Medicine, Houston, TX 77030, USA; Department of Systems Biology, the University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA. Electronic address: hliang1@mdanderson.org.

Michael S Noble (MS)

Eli and Edythe L. Broad Institute of Massachusetts Institute of Technology and Harvard University, Cambridge, MA 02142, USA. Electronic address: mnoble@cogenimmune.com.
