A survey on single and multi omics data mining methods in cancer data classification.
Cancer classification
Data integration
Gene selection
High dimensional datasets
Single and multi omics data
Journal
Journal of biomedical informatics
ISSN: 1532-0480
Abbreviated title: J Biomed Inform
Country: United States
NLM ID: 100970413
Publication information
Publication date:
07 2020
History:
received: 2019-12-11
revised: 2020-05-01
accepted: 2020-05-31
pubmed: 2020-06-12
medline: 2021-07-29
entrez: 2020-06-12
Status:
ppublish
Abstract
Data analytics is routinely used to support biomedical research in all areas, with particular focus on the most relevant clinical conditions, such as cancer. Bioinformatics approaches, in particular, have been used to characterize the molecular aspects of diseases. In recent years, numerous studies of cancer have been performed using single and multi-omics data. For example, single-omics-based studies have employed a diverse set of data, such as gene expression, DNA methylation, or miRNA, to name only a few. Even so, a significant part of the literature reports studies on gene expression with microarray datasets. Single-omics data have high numbers of attributes and very low sample counts. This characteristic makes them paradigmatic of an under-sampled, small-n large-p machine learning problem. An important goal of single-omics data analysis is to find, in the batch of available data, the genes that are most relevant in terms of their potential use in clinics and research. This problem has been addressed by gene selection, one of the pre-processing steps in data mining. Analyses that use only one type of data (single-omics) often miss the complexity of the landscape of molecular phenomena underlying the disease. As a result, they provide limited and sometimes poorly reliable information about the disease mechanisms. Therefore, in recent years, researchers have been eager to build more complex models that obtain more reliable results using multi-omics data. However, the most important challenge in achieving this is data integration. In this paper, we provide a comprehensive overview of the challenges in single and multi-omics data analysis of cancer data, focusing on gene selection and data integration methods.
Identifiers
pubmed: 32525020
pii: S1532-0464(20)30093-9
doi: 10.1016/j.jbi.2020.103466
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Review
Languages
eng
Citation subsets
IM
Pagination
103466
Copyright information
Copyright © 2020 Elsevier Inc. All rights reserved.
Conflict of interest statement
Declaration of Competing Interest: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.