CUP-AI-Dx: A tool for inferring cancer tissue of origin and molecular subtype using RNA gene-expression data and artificial intelligence.
Algorithms
Artificial Intelligence
Biomarkers, Tumor / genetics
Computational Biology / methods
Databases, Genetic
Genomics / methods
Humans
Machine Learning
Neoplasm Metastasis / diagnosis
Neoplasms, Unknown Primary / diagnosis
Neural Networks, Computer
RNA
Reproducibility of Results
Software
Workflow
Cancer
Cancer-of-unknown-primary
Cell-of-origin
Classification
Convolutional neural network
Deep learning
Inception model
Machine learning
TCGA
Journal
EBioMedicine
ISSN: 2352-3964
Abbreviated title: EBioMedicine
Country: Netherlands
NLM ID: 101647039
Publication information
Publication date: Nov 2020
History:
received: 4 Feb 2020
revised: 10 Sep 2020
accepted: 11 Sep 2020
pubmed: 12 Oct 2020
medline: 5 Aug 2021
entrez: 11 Oct 2020
Status: ppublish
Abstract
BACKGROUND
Cancer of unknown primary (CUP), representing approximately 3-5% of all malignancies, is defined as metastatic cancer where a primary site of origin cannot be found despite a standard diagnostic workup. Because knowledge of a patient's primary cancer remains fundamental to their treatment, CUP patients are significantly disadvantaged and most have a poor survival outcome. Developing robust and accessible diagnostic methods for resolving cancer tissue of origin, therefore, has significant value for CUP patients.
METHODS
We developed an RNA-based classifier called CUP-AI-Dx that uses a 1D Inception convolutional neural network (1D-Inception) model to infer a tumour's primary tissue of origin. CUP-AI-Dx was trained using the transcriptional profiles of 18,217 primary tumours representing 32 cancer types from The Cancer Genome Atlas project (TCGA) and the International Cancer Genome Consortium (ICGC). Gene expression data were ordered by gene chromosomal coordinates as input to the 1D-Inception model, which applies multiple convolutional kernels with different configurations simultaneously to improve generality. The model was optimised through extensive hyperparameter tuning, including different max-pooling layers and dropout settings. For 11 tumour types, we also developed a random forest model that classifies the tumour's molecular subtype according to prior TCGA studies. The optimised CUP-AI-Dx tissue-of-origin classifier was tested on 394 metastatic samples from 11 tumour types from TCGA and 92 formalin-fixed paraffin-embedded (FFPE) samples representing 18 cancer types from two clinical laboratories. The CUP-AI-Dx molecular subtype classifier was also tested on independent ovarian and breast cancer microarray datasets.
FINDINGS
CUP-AI-Dx identifies the primary site with an overall top-1 accuracy of 98.54% in cross-validation and 96.70% on a test dataset. When applied to two independent clinical-grade RNA-seq datasets generated by two institutes, one in the US and one in Australia, our model predicted the primary site with a top-1 accuracy of 86.96% and 72.46%, respectively.
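The input preparation described in METHODS (expression values ordered by chromosomal coordinate, then scanned by several convolutional kernels of different widths in parallel) can be illustrated with a small pure-Python sketch. This is not the authors' implementation: the chromosome ranking, gene names, coordinates, kernel weights, and the function names `order_by_coordinate`, `conv1d`, and `multi_kernel_features` are all assumptions made for illustration, and each branch is reduced to a single number by global max-pooling to keep the example short.

```python
from typing import Dict, List, Sequence, Tuple

# Assumed chromosome ranking; the abstract does not say how chromosomes are ordered.
CHROM_ORDER = [str(i) for i in range(1, 23)] + ["X", "Y"]


def order_by_coordinate(
    expression: Dict[str, float],
    coords: Dict[str, Tuple[str, int]],
) -> List[float]:
    """Return expression values sorted by (chromosome rank, start position)."""
    rank = {chrom: i for i, chrom in enumerate(CHROM_ORDER)}
    genes = sorted(expression, key=lambda g: (rank[coords[g][0]], coords[g][1]))
    return [expression[g] for g in genes]


def conv1d(signal: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """'Valid' 1D cross-correlation, the operation CNN layers call convolution."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def multi_kernel_features(
    signal: Sequence[float],
    kernels: Sequence[Sequence[float]],
) -> List[float]:
    """Run every kernel over the same input and concatenate the max-pooled outputs."""
    return [max(conv1d(signal, kernel)) for kernel in kernels]


# Toy data: gene names, coordinates, and values are invented.
expression = {"G3": 2.0, "G1": 5.0, "G4": 1.0, "G2": 0.0}
coords = {"G1": ("1", 100), "G2": ("1", 900), "G3": ("2", 50), "G4": ("X", 10)}

ordered = order_by_coordinate(expression, coords)

# Three parallel branches with kernel widths 1, 2, and 3 (weights are arbitrary).
kernels = [[1.0], [0.5, 0.5], [1.0, 0.0, -1.0]]
features = multi_kernel_features(ordered, kernels)
```

In a trained network the kernel weights are learned and each branch yields a full feature map rather than one pooled value; the sketch only shows why a fixed gene order matters, since neighbouring positions in the input vector are what each kernel sees together.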
INTERPRETATION
CUP-AI-Dx predicts tumour primary site and molecular subtype with high accuracy and can therefore be used to assist the diagnostic work-up of cancers of unknown primary or uncertain origin using a common and accessible genomics platform.
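The accuracy figures in this abstract are top-1 accuracies: a sample counts as correct only when the single highest-scoring class matches the true primary site. A minimal sketch of that metric follows, assuming each prediction is a mapping from class name to score; the function name `top1_accuracy` and the example scores are hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def top1_accuracy(scores: List[Dict[str, float]], truth: List[str]) -> float:
    """Fraction of samples whose highest-scoring class equals the true label."""
    if len(scores) != len(truth):
        raise ValueError("scores and truth must have the same length")
    correct = sum(
        1 for s, t in zip(scores, truth) if max(s, key=s.get) == t
    )
    return correct / len(truth)


# Invented class scores for four hypothetical samples.
scores = [
    {"breast": 0.7, "ovary": 0.2, "lung": 0.1},
    {"breast": 0.1, "ovary": 0.8, "lung": 0.1},
    {"breast": 0.4, "ovary": 0.5, "lung": 0.1},  # misclassified as ovary
    {"breast": 0.2, "ovary": 0.1, "lung": 0.7},
]
truth = ["breast", "ovary", "breast", "lung"]

accuracy = top1_accuracy(scores, truth)
```

Here three of the four toy samples are ranked correctly, so `accuracy` is 0.75.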
FUNDING
NIH R35 GM133562, NCI P30 CA034196, Victorian Cancer Agency Australia.
Identifiers
pubmed: 33039710
pii: S2352-3964(20)30406-0
doi: 10.1016/j.ebiom.2020.103030
pmc: PMC7553237
Chemical substances
Biomarkers, Tumor (0)
RNA (63231-63-0)
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
103030
Grants
Agency: NCI NIH HHS
ID: P30 CA034196
Country: United States
Agency: NIGMS NIH HHS
ID: R35 GM133562
Country: United States
Copyright information
Copyright © 2020 The Authors. Published by Elsevier B.V. All rights reserved.