A machine learning framework to trace tumor tissue-of-origin of 13 types of cancer based on DNA somatic mutation.


Journal

Biochimica et biophysica acta. Molecular basis of disease
ISSN: 1879-260X
Abbreviated title: Biochim Biophys Acta Mol Basis Dis
Country: Netherlands
NLM ID: 101731730

Publication information

Publication date:
01 11 2020
History:
received: 28 05 2020
revised: 20 07 2020
accepted: 03 08 2020
pubmed: 11 8 2020
medline: 15 12 2020
entrez: 11 8 2020
Status: ppublish

Abstract

Carcinoma of unknown primary (CUP), defined as metastatic cancer whose site of origin is unknown, occurs in 3-5 per 100 cancer patients in the United States. The heterogeneity and metastasis of cancer make follow-up diagnosis and treatment of CUP very difficult. Multiple methods have been proposed to find the tissue-of-origin (TOO) of CUP. However, the accuracies of computed tomography (CT) and positron emission tomography (PET) in identifying the TOO were 20%-27% and 24%-40% respectively, which is not enough to determine targeted therapies. In this study, we provide a machine learning framework to trace tumor tissue origin by using gene length-normalized somatic mutation sequencing data. Somatic mutation data were downloaded from the Data Portal (Release 28) of the International Cancer Genome Consortium (ICGC), and 4909 samples from 13 cancers were used to identify the primary site of cancers. Optimal results were obtained with a 600-gene set by using the random forest algorithm with 10-fold cross-validation; the average accuracy and F1-score were 0.8822 and 0.8886 respectively across the 13 types of cancer. In conclusion, we provide an effective computational framework to infer cancer tissue-of-origin by combining DNA sequencing and machine learning techniques, which is promising for assisting the clinical diagnosis of cancers.
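The two quantitative ideas in the abstract, gene length normalization and per-class averaging of scores, can be illustrated with a minimal sketch. The abstract does not state the exact normalization or averaging scheme, so mutations per kilobase and an unweighted (macro) average of per-class F1 are assumptions here. The gene names, gene lengths, and labels are made-up toy values, and the function names are hypothetical. The random forest classifier itself is not reproduced.

```python
def length_normalize(mutation_counts, gene_lengths_bp):
    """Convert raw per-gene mutation counts to mutations per kilobase.

    Assumption: the abstract only says "gene length-normalized"; dividing
    by gene length in kilobases is one plausible reading, not the paper's
    confirmed formula.
    """
    return {
        gene: count / (gene_lengths_bp[gene] / 1000.0)
        for gene, count in mutation_counts.items()
    }


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted tissue matches the true tissue."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (assumed averaging scheme)."""
    scores = []
    for label in sorted(set(y_true)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy inputs (hypothetical genes and lengths, not taken from ICGC data).
counts = {"GENE_A": 3, "GENE_B": 30}
lengths_bp = {"GENE_A": 1500, "GENE_B": 100000}
normalized = length_normalize(counts, lengths_bp)

# Toy labels standing in for true and predicted tissues of origin.
y_true = ["lung", "lung", "liver", "liver"]
y_pred = ["lung", "liver", "liver", "liver"]
acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
```

In this toy case the long gene carries ten times as many raw mutations as the short one yet ends up with the lower normalized value, which is the effect length normalization is meant to achieve.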

Identifiers

pubmed: 32771416
pii: S0925-4439(20)30264-7
doi: 10.1016/j.bbadis.2020.165916

Chemical substances

DNA 9007-49-2

Publication types

Journal Article; Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Pagination

165916

Copyright information

Copyright © 2020. Published by Elsevier B.V.

Conflict of interest statement

Declaration of competing interest: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Authors

Bingsheng He (B)

Academician Workstation, Changsha Medical University, Changsha 410219, China. Electronic address: hbscsmu@163.com.

Chan Dai (C)

Geneis Beijing Co., Ltd., Beijing 100102, China.

Jidong Lang (J)

Geneis Beijing Co., Ltd., Beijing 100102, China.

Pingping Bing (P)

Academician Workstation, Changsha Medical University, Changsha 410219, China.

Geng Tian (G)

Geneis Beijing Co., Ltd., Beijing 100102, China.

Bo Wang (B)

Geneis Beijing Co., Ltd., Beijing 100102, China. Electronic address: wangb@geneis.cn.

Jialiang Yang (J)

Academician Workstation, Changsha Medical University, Changsha 410219, China; Geneis Beijing Co., Ltd., Beijing 100102, China. Electronic address: yangjl@geneis.cn.
