Workflow to Mine Frequent DNA Co-methylation Clusters in DNA Methylome Data.

Cluster mining; DNA co-methylation; Epigenetics; Frequent network mining; Pan-cancer methylation; lmQCM

Journal

Methods in molecular biology (Clifton, N.J.)
ISSN: 1940-6029
Abbreviated title: Methods Mol Biol
Country: United States
NLM ID: 9214969

Publication information

Publication date:
2022
History:
entrez: 3 5 2022
pubmed: 4 5 2022
medline: 6 5 2022
Status: ppublish

Abstract

Advances in high-throughput nucleotide sequencing technology have revolutionized biomedical research. Vast amounts of genomic data accumulate daily, which in turn calls for the development of powerful bioinformatics tools and efficient workflows to analyze them. One approach to this "big data" problem is to mine highly correlated clusters or networks of biological molecules, which may reveal rich yet hidden information about the underlying functional, regulatory, or structural relationships among genes, proteins, genomic loci, or other types of biological molecules or events. A network mining algorithm, lmQCM, has recently been developed to mine tightly connected correlation clusters (networks) in large biological datasets with large sample sizes, and it guarantees a lower bound on cluster density. This algorithm has been applied to a variety of cancer transcriptomic datasets to mine gene co-expression networks (GCNs), but it can be applied to any correlation matrix. lmQCM is available through the R package lmQCM as well as the online tool suite TSUNAMI (https://biolearns.medicine.iu.edu). The purpose of this study is to establish an analytical workflow that applies lmQCM to frequent (consensus) cluster mining across multiple DNA methylation datasets from different cancers and extracts the underlying common gene co-methylation networks. Specifically, the workflow uses lmQCM to analyze DNA methylome data across different cancer types. It mines frequent DNA methylation clusters from the clustering results of the individual datasets, identifying common as well as distinctive DNA methylation patterns among cancer types. This workflow has successfully identified frequent GCNs in 33 types of cancer and has thus proven to be a powerful tool for analyzing large biological data. It helps to identify common features as well as distinctions among different diseases, disease subtypes, or biological processes.
The resulting frequent clusters may provide new insights into pathway and function networks. In disease studies, the results can lead to new directions for biomarker and drug target discovery. The advantages of this workflow include highly efficient processing of large biological data generated from high-throughput experiments, quick identification of highly correlated interaction networks, substantial reduction of data dimensionality to a manageable number of variables for downstream comparative analysis, and consequently increased statistical power for detecting differences between conditions.
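The frequent (consensus) mining step described in the abstract can be illustrated with a minimal Python sketch. It is not the lmQCM algorithm and not the authors' implementation: it assumes the per-cancer clusters have already been produced by some clustering step, and it uses a simple swapped-in rule (keep gene pairs that co-cluster in at least `min_support` datasets, then take connected components). The function name `frequent_clusters`, the `min_support` parameter, the gene labels, and the toy data are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_clusters(clusters_by_dataset, min_support):
    """Group genes that co-cluster in at least `min_support` datasets.

    Hypothetical helper for illustration only; not part of lmQCM or TSUNAMI.
    `clusters_by_dataset` maps a dataset name to a list of gene clusters.
    """
    # Count, for every gene pair, the number of datasets in which the
    # two genes fall into the same cluster (each dataset counts once).
    pair_counts = Counter()
    for clusters in clusters_by_dataset.values():
        pairs_in_dataset = set()
        for cluster in clusters:
            pairs_in_dataset.update(combinations(sorted(set(cluster)), 2))
        pair_counts.update(pairs_in_dataset)

    # Keep only the pairs that recur often enough and build a graph.
    adjacency = {}
    for (gene_a, gene_b), count in pair_counts.items():
        if count >= min_support:
            adjacency.setdefault(gene_a, set()).add(gene_b)
            adjacency.setdefault(gene_b, set()).add(gene_a)

    # Connected components of that graph serve as the consensus clusters.
    visited, result = set(), []
    for start in sorted(adjacency):
        if start in visited:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        visited |= component
        result.append(sorted(component))
    return result


# Toy input: made-up gene labels and cluster memberships for three datasets.
toy = {
    "cancer_A": [["G1", "G2", "G3"], ["G7", "G8"]],
    "cancer_B": [["G1", "G2", "G3", "G4"], ["G8", "G9"]],
    "cancer_C": [["G2", "G3"], ["G7", "G8", "G9"]],
}
consensus = frequent_clusters(toy, min_support=2)
print(consensus)
```

In this toy input, G4 is dropped because it co-clusters with the other genes in only one dataset, while G7 and G9 end up in the same consensus cluster through their shared partner G8. Connected components are a deliberately loose grouping rule; a density-aware method would be stricter about such indirect links.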

Identifiers

pubmed: 35505214
doi: 10.1007/978-1-0716-1994-0_12

Chemical substances

DNA 9007-49-2

Publication types

Journal Article; Research Support, N.I.H., Extramural

Languages

eng

Citation subsets

IM

Pagination

153-165

Copyright information

© 2022. Springer Science+Business Media, LLC, part of Springer Nature.

Authors

Jie Zhang (J)

Department of Medical & Molecular Genetics, School of Medicine, Indiana University, Indianapolis, IN, USA. jizhan@iu.edu.

Kun Huang (K)

Department of Biostatistics and Health Data Science, School of Medicine, Indiana University, Indianapolis, IN, USA.
