A machine learning framework to trace tumor tissue-of-origin of 13 types of cancer based on DNA somatic mutation.
Cancers of unknown primary
Cross-validation
Gene length
Random forest
Somatic mutation
Tissue-of-origin
Journal
Biochimica et biophysica acta. Molecular basis of disease
ISSN: 1879-260X
Abbreviated title: Biochim Biophys Acta Mol Basis Dis
Country: Netherlands
NLM ID: 101731730
Publication information
Publication date:
01 11 2020
History:
received: 28 05 2020
revised: 20 07 2020
accepted: 03 08 2020
pubmed: 11 08 2020
medline: 15 12 2020
entrez: 11 08 2020
Status:
ppublish
Abstract
Carcinoma of unknown primary (CUP), defined as metastatic cancer with an unknown site of origin, occurs in 3-5 per 100 cancer patients in the United States. The heterogeneity and metastasis of cancer make follow-up diagnosis and treatment of CUP very difficult. Multiple methods have been proposed to find the tissue-of-origin (TOO) of CUP. However, the accuracies of computed tomography (CT) and positron emission tomography (PET) in identifying the TOO were 20%-27% and 24%-40%, respectively, which is not sufficient for determining targeted therapies. In this study, we provide a machine learning framework to trace tumor tissue origin using gene length-normalized somatic mutation sequencing data. Somatic mutation data were downloaded from the Data Portal (Release 28) of the International Cancer Genome Consortium (ICGC), and 4909 samples from 13 cancers were used to identify the primary site of cancers. Optimal results were obtained with a 600-gene set using the random forest algorithm with 10-fold cross-validation; the average accuracy and F1-score across the 13 cancer types were 0.8822 and 0.8886, respectively. In conclusion, we provide an effective computational framework for inferring cancer tissue-of-origin by combining DNA sequencing and machine learning techniques, which is promising for assisting the clinical diagnosis of cancers.
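The abstract names three building blocks: gene length normalization of mutation data, 10-fold cross-validation, and accuracy/F1 evaluation. The following is a minimal sketch of those pieces in plain Python, not the authors' implementation. The exact normalization used in the paper is not given in the abstract, so mutations per kilobase is an assumption, as is macro-averaging of the F1-score. All function names and the gene lengths in the usage note are hypothetical, and the random forest classifier itself is omitted.

```python
from collections import Counter


def length_normalize(mutation_counts, gene_lengths_bp, per_bp=1000):
    """Convert raw per-gene mutation counts to mutations per `per_bp` bases.

    Assumption: simple division by gene length; the paper may use another scheme.
    """
    return {
        gene: count * per_bp / gene_lengths_bp[gene]
        for gene, count in mutation_counts.items()
    }


def stratified_folds(labels, k=10):
    """Assign sample indices to k folds, dealing each class out round-robin."""
    folds = [[] for _ in range(k)]
    seen = Counter()  # how many samples of each class have been placed so far
    for idx, label in enumerate(labels):
        folds[seen[label] % k].append(idx)
        seen[label] += 1
    return folds


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted tissue matches the true tissue."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (assumed averaging; not stated in the abstract)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

As a usage example with toy values, `length_normalize({"TP53": 2, "TTN": 5}, {"TP53": 20000, "TTN": 100000})` scales a long gene's count down relative to a short gene's, so long genes that accumulate more mutations purely because of their size do not dominate the feature set. In practice, one would probably rely on an existing library for the classifier and the fold splitting, for instance scikit-learn's `RandomForestClassifier` and `StratifiedKFold`.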
Identifiers
pubmed: 32771416
pii: S0925-4439(20)30264-7
doi: 10.1016/j.bbadis.2020.165916
Chemical substances
DNA
9007-49-2
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
165916
Copyright information
Copyright © 2020. Published by Elsevier B.V.
Conflict of interest statement
Declaration of competing interest: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.