CUP-AI-Dx: A tool for inferring cancer tissue of origin and molecular subtype using RNA gene-expression data and artificial intelligence.


Journal

EBioMedicine
ISSN: 2352-3964
Abbreviated title: EBioMedicine
Country: Netherlands
NLM ID: 101647039

Publication information

Publication date:
Nov 2020
History:
received: 2020-02-04
revised: 2020-09-10
accepted: 2020-09-11
pubmed: 2020-10-12
medline: 2021-08-05
entrez: 2020-10-11
Status: ppublish

Abstract sections

BACKGROUND
Cancer of unknown primary (CUP), representing approximately 3-5% of all malignancies, is defined as metastatic cancer where a primary site of origin cannot be found despite a standard diagnostic workup. Because knowledge of a patient's primary cancer remains fundamental to their treatment, CUP patients are significantly disadvantaged and most have a poor survival outcome. Developing robust and accessible diagnostic methods for resolving cancer tissue of origin, therefore, has significant value for CUP patients.
METHODS
We developed an RNA-based classifier called CUP-AI-Dx that uses a 1D Inception convolutional neural network (1D-Inception) model to infer a tumour's primary tissue of origin. CUP-AI-Dx was trained on the transcriptional profiles of 18,217 primary tumours representing 32 cancer types from The Cancer Genome Atlas (TCGA) project and the International Cancer Genome Consortium (ICGC). Gene expression data were ordered by gene chromosomal coordinates as input to the 1D-CNN model, and the model applies multiple convolutional kernels with different configurations simultaneously to improve generality. The model was optimised through extensive hyperparameter tuning, including different max-pooling layers and dropout settings. For 11 tumour types, we also developed a random forest model that classifies the tumour's molecular subtype according to prior TCGA studies. The optimised CUP-AI-Dx tissue-of-origin classifier was tested on 394 metastatic samples from 11 tumour types from TCGA and 92 formalin-fixed paraffin-embedded (FFPE) samples representing 18 cancer types from two clinical laboratories. The CUP-AI-Dx molecular subtype classifier was also tested on independent ovarian and breast cancer microarray datasets.
FINDINGS
CUP-AI-Dx identifies the primary site with an overall top-1 accuracy of 98.54% in cross-validation and 96.70% on a test dataset. When applied to two independent clinical-grade RNA-seq datasets generated by two institutes, one in the US and one in Australia, the model predicted the primary site with a top-1 accuracy of 86.96% and 72.46%, respectively.
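The input preparation and the parallel-kernel idea described in the Methods can be illustrated with a minimal sketch. This is not the authors' implementation: the gene names, coordinates, expression values, and fixed kernel weights below are hypothetical placeholders, and the real model learns its weights and also uses max-pooling and dropout layers that are omitted here. The sketch only shows (1) sorting expression values by chromosome and start position and (2) running several 1D kernels of different widths over the same ordered vector, one output channel per kernel.

```python
from typing import Dict, List, Sequence, Tuple

# Assumed chromosome ordering: autosomes 1-22, then X and Y.
CHROM_ORDER = [str(i) for i in range(1, 23)] + ["X", "Y"]


def order_by_coordinates(
    expression: Dict[str, float],
    coords: Dict[str, Tuple[str, int]],
) -> Tuple[List[str], List[float]]:
    """Sort genes by (chromosome rank, start position); return names and values."""
    rank = {chrom: i for i, chrom in enumerate(CHROM_ORDER)}
    genes = sorted(expression, key=lambda g: (rank[coords[g][0]], coords[g][1]))
    return genes, [expression[g] for g in genes]


def conv1d_same(signal: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Slide the kernel over a zero-padded signal; output length equals input length."""
    k = len(kernel)
    left = (k - 1) // 2
    padded = [0.0] * left + list(signal) + [0.0] * (k - 1 - left)
    return [
        sum(w * padded[i + j] for j, w in enumerate(kernel))
        for i in range(len(signal))
    ]


def inception_like_block(
    signal: Sequence[float], kernels: Sequence[Sequence[float]]
) -> List[List[float]]:
    """Apply every kernel to the same input in parallel, then ReLU each channel."""
    return [[max(0.0, v) for v in conv1d_same(signal, k)] for k in kernels]


if __name__ == "__main__":
    # Hypothetical toy sample: four genes with made-up coordinates and values.
    expression = {"geneC": 2.0, "geneA": 1.0, "geneD": 4.0, "geneB": 3.0}
    coords = {
        "geneA": ("1", 100),
        "geneB": ("1", 500),
        "geneC": ("2", 50),
        "geneD": ("X", 10),
    }
    genes, values = order_by_coordinates(expression, coords)
    # Fixed kernels of width 1, 2 and 3 stand in for learned weights.
    channels = inception_like_block(values, [[1.0], [0.5, 0.5], [1.0, 1.0, 1.0]])
    print(genes)
    for channel in channels:
        print(channel)
```

Ordering by genomic position means that neighbouring entries of the input vector correspond to neighbouring genes, so kernels of different widths summarise the same region at different scales.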
INTERPRETATION
CUP-AI-Dx predicts tumour primary site and molecular subtype with high accuracy and can therefore be used to assist the diagnostic work-up of cancers of unknown primary or uncertain origin using a common and accessible genomics platform.
FUNDING
NIH R35 GM133562, NCI P30 CA034196, Victorian Cancer Agency Australia.

Identifiers

pubmed: 33039710
pii: S2352-3964(20)30406-0
doi: 10.1016/j.ebiom.2020.103030
pmc: PMC7553237

Chemical substances

Biomarkers, Tumor 0
RNA 63231-63-0

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

103030

Grants

Agency: NCI NIH HHS
ID: P30 CA034196
Country: United States
Agency: NIGMS NIH HHS
ID: R35 GM133562
Country: United States

Copyright information

Copyright © 2020 The Authors. Published by Elsevier B.V. All rights reserved.

Authors

Yue Zhao (Y)

The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT, USA.

Ziwei Pan (Z)

The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT, USA; Department of Genetics and Genome Sciences, University of Connecticut Health Center, Farmington, CT, USA.

Sandeep Namburi (S)

The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT, USA.

Andrew Pattison (A)

Department of Clinical Pathology and Centre for Cancer Research, University of Melbourne, Parkville, Melbourne, Australia.

Atara Posner (A)

Department of Clinical Pathology and Centre for Cancer Research, University of Melbourne, Parkville, Melbourne, Australia.

Shiva Balachander (S)

Department of Clinical Pathology and Centre for Cancer Research, University of Melbourne, Parkville, Melbourne, Australia.

Carolyn A Paisie (CA)

The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT, USA.

Honey V Reddi (HV)

The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT, USA; The Jackson Laboratory Cancer Center, Bar Harbor, ME, USA.

Jens Rueter (J)

The Jackson Laboratory Cancer Center, Bar Harbor, ME, USA.

Anthony J Gill (AJ)

Cancer Diagnosis and Pathology Group, Kolling Institute of Medical Research, Royal North Shore Hospital, St Leonards, New South Wales 2065 Australia; NSW Health Pathology, Department of Anatomical Pathology, Royal North Shore Hospital, Sydney, New South Wales 2065 Australia; Department of Anatomical Pathology, Douglass Hanly Moir Pathology, Macquarie Park, New South Wales 2113 Australia; University of Sydney, Sydney, New South Wales 2006 Australia.

Stephen Fox (S)

Peter MacCallum Cancer Centre, Department of Pathology, University of Melbourne, Victoria, Australia.

Kanwal P S Raghav (KPS)

Department of Gastrointestinal Medical Oncology, Division of Cancer Medicine, The University of Texas MD Anderson Cancer Center, Houston, TX, USA.

William F Flynn (WF)

The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT, USA.

Richard W Tothill (RW)

Department of Clinical Pathology and Centre for Cancer Research, University of Melbourne, Parkville, Melbourne, Australia; Peter MacCallum Cancer Centre, Parkville, Melbourne, Australia. Electronic address: rtothill@unimelb.edu.au.

Sheng Li (S)

The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT, USA; The Jackson Laboratory Cancer Center, Bar Harbor, ME, USA; Department of Genetics and Genome Sciences, University of Connecticut Health Center, Farmington, CT, USA; Department of Computer Science and Engineering, University of Connecticut, Storrs, CT, USA. Electronic address: sheng.li@jax.org.

R Krishna Murthy Karuturi (RKM)

The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT, USA; The Jackson Laboratory Cancer Center, Bar Harbor, ME, USA; Department of Computer Science and Engineering, University of Connecticut, Storrs, CT, USA. Electronic address: krishna.karuturi@jax.org.

Joshy George (J)

The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT, USA; The Jackson Laboratory Cancer Center, Bar Harbor, ME, USA. Electronic address: joshy.george@jax.org.
