A Python Clustering Analysis Protocol of Genes Expression Data Sets.

DEGs; SNPs; clustering; data mining; microarrays; unsupervised learning

Journal

Genes
ISSN: 2073-4425
Abbreviated title: Genes (Basel)
Country: Switzerland
NLM ID: 101551097

Publication information

Publication date:
12 10 2022
History:
received: 14 09 2022
revised: 05 10 2022
accepted: 08 10 2022
entrez: 27 10 2022
pubmed: 28 10 2022
medline: 29 10 2022
Status: epublish

Abstract

Gene expression and SNP data hold great potential for a new understanding of disease prognosis, drug sensitivity, and toxicity evaluation. Cluster analysis is used to analyze data for which no subgroups are known in advance; the goal is to use the data themselves to recognize meaningful and informative subgroups. In addition, cluster analysis supports data reduction, exposes hidden patterns, and generates hypotheses about the relationship between genes and phenotypes. It can also be used to identify biomarkers and to build computational predictive models. The methods used to analyze microarray data can profoundly influence the interpretation of the results, so a basic understanding of these computational tools is necessary for optimal experimental design and meaningful data analysis. This manuscript provides a protocol for analyzing gene expression data sets with the K-means and DBSCAN algorithms. The general protocol enables the analysis of omics data to identify subsets of features with low redundancy and high robustness, speeding up the identification of new biomarkers through pathway enrichment analysis. To demonstrate the effectiveness of the clustering analysis protocol, we analyze a real data set from the GEO database. Finally, the manuscript provides best practices and tips for overcoming common issues in the analysis of omics data sets through unsupervised learning.
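
As a rough illustration of the two algorithms named in the abstract, the following minimal sketch runs K-means and DBSCAN on a small synthetic matrix using scikit-learn. It is not the authors' protocol: the synthetic data, the choice of three clusters, and the `eps` and `min_samples` values are all assumptions made for this example, and no GEO data are used.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for an expression matrix: 90 genes x 4 samples,
# drawn around three hypothetical expression profiles (not real data).
rng = np.random.default_rng(0)
profiles = np.array([
    [0.0, 0.0, 0.0, 0.0],
    [10.0, 10.0, 10.0, 10.0],
    [-10.0, 10.0, -10.0, 10.0],
])
X = np.vstack([p + rng.normal(scale=0.5, size=(30, 4)) for p in profiles])

# Standardize each column so that no single sample dominates the distances.
X_scaled = StandardScaler().fit_transform(X)

# K-means needs the number of clusters up front (3 is assumed here).
kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_scaled)

# DBSCAN infers the number of clusters from density; label -1 marks noise.
# eps and min_samples are illustrative values chosen for this toy data.
dbscan_labels = DBSCAN(eps=1.0, min_samples=5).fit_predict(X_scaled)

n_dbscan_clusters = len(set(dbscan_labels) - {-1})
print("K-means cluster sizes:", np.bincount(kmeans_labels))
print("DBSCAN clusters found:", n_dbscan_clusters)
```

On real expression data the two methods can disagree: K-means always assigns every row to one of the requested clusters, whereas DBSCAN may leave low-density rows unassigned as noise, so `eps` and `min_samples` usually need tuning per data set.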

Identifiers

pubmed: 36292724
pii: genes13101839
doi: 10.3390/genes13101839
pmc: PMC9601308

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Authors

Giuseppe Agapito (G)

Department of Law, Economics and Social Sciences, University Magna Græcia of Catanzaro, 88100 Catanzaro, Italy.
Data Analytics Research Center, University Magna Græcia of Catanzaro, 88100 Catanzaro, Italy.

Marianna Milano (M)

Data Analytics Research Center, University Magna Græcia of Catanzaro, 88100 Catanzaro, Italy.
Department of Medical and Clinical Surgery, University Magna Græcia of Catanzaro, 88100 Catanzaro, Italy.

Mario Cannataro (M)

Data Analytics Research Center, University Magna Græcia of Catanzaro, 88100 Catanzaro, Italy.
Department of Medical and Clinical Surgery, University Magna Græcia of Catanzaro, 88100 Catanzaro, Italy.
