Workflow to Mine Frequent DNA Co-methylation Clusters in DNA Methylome Data.

DNA; DNA Methylation; Epigenome; Humans; Neoplasms / genetics; Workflow

Cluster mining; DNA co-methylation; Epigenetics; Frequent network mining; Pan-cancer methylation; lmQCM

Journal

Methods in molecular biology (Clifton, N.J.)

ISSN: 1940-6029

Abbreviated title: Methods Mol Biol

Country: United States

NLM ID: 9214969

Publication information

Publication date:
2022

History:

entrez: 3 5 2022

pubmed: 4 5 2022

medline: 6 5 2022

Status: ppublish

Abstract

Advances in high-throughput nucleotide sequencing technology have revolutionized biomedical research. Vast amounts of genomic data accumulate daily, which in turn calls for powerful bioinformatics tools and efficient workflows to analyze them. One approach to this "big data" problem is to mine highly correlated clusters or networks of biological molecules, which may provide rich yet hidden information about the underlying functional, regulatory, or structural relationships among genes, proteins, genomic loci, or other types of biological molecules or events. A recently developed network mining algorithm, lmQCM, can mine tightly connected correlation clusters (networks) in large biological datasets with large sample sizes, and it guarantees a lower bound on cluster density. The algorithm has been used on a variety of cancer transcriptomic data to mine gene co-expression networks (GCNs), but it can be applied to any correlation matrix. lmQCM is available through the R package lmQCM as well as the online tool suite TSUNAMI (https://biolearns.medicine.iu.edu). The purpose of this study is to establish an analytical workflow that applies lmQCM to frequent (consensus) cluster mining across multiple DNA methylation datasets from different cancers and extracts the underlying common co-methylation networks for genes. Specifically, the workflow uses lmQCM to analyze DNA methylome data across different cancer types. It mines frequent DNA methylation clusters from the individual cluster mining results, identifying common as well as distinctive DNA methylation patterns among cancer types. The workflow has successfully identified frequent GCNs in 33 types of cancer and has thus proven to be a powerful tool for analyzing large biological data. It helps to identify common features as well as distinctions among different diseases, disease subtypes, or biological processes.
The resulting frequent clusters may provide new insights into pathway and function networks. In disease studies, the results lead to new directions for biomarker and drug target discovery. The advantages of this workflow include highly efficient processing of large biological data generated from high-throughput experiments, quick identification of highly correlated interaction networks, and substantial reduction of data dimensionality to a manageable number of variables for downstream comparative analysis, which in turn increases statistical power for detecting differences between conditions.
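
The idea of mining frequent clusters from per-dataset clustering results can be illustrated with a small sketch. This is not the lmQCM algorithm and not the authors' published workflow; it is a simplified stand-in that assumes each dataset has already been clustered, counts how many datasets place each gene pair in the same cluster, and groups pairs that meet a support threshold. The function names, the `min_support` parameter, and the gene and dataset labels are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(clusterings, min_support):
    """Return gene pairs that share a cluster in at least `min_support` datasets.

    `clusterings` maps a dataset name to a list of clusters (sets of gene IDs).
    """
    support = Counter()
    for clusters in clusterings.values():
        pairs_in_dataset = set()
        for cluster in clusters:
            # Sorting makes each pair a canonical (a, b) tuple with a < b.
            pairs_in_dataset.update(combinations(sorted(cluster), 2))
        # Each dataset contributes at most one count per pair.
        support.update(pairs_in_dataset)
    return {pair: n for pair, n in support.items() if n >= min_support}


def consensus_clusters(pairs):
    """Group genes linked by frequent pairs (connected components)."""
    neighbors = {}
    for a, b in pairs:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    visited, components = set(), []
    for start in sorted(neighbors):
        if start in visited:
            continue
        stack, component = [start], set()
        while stack:
            gene = stack.pop()
            if gene in component:
                continue
            component.add(gene)
            stack.extend(neighbors[gene] - component)
        visited |= component
        components.append(component)
    return components


# Hypothetical per-cancer clustering results (placeholder gene IDs).
clusterings = {
    "cancer_A": [{"G1", "G2", "G3"}, {"G4", "G5"}],
    "cancer_B": [{"G1", "G2"}, {"G3", "G4", "G5"}],
    "cancer_C": [{"G1", "G2", "G3"}, {"G5", "G6"}],
}

frequent = frequent_pairs(clusterings, min_support=2)
consensus = consensus_clusters(frequent)
print(consensus)
```

In this toy input, G1, G2, and G3 co-cluster in at least two of the three datasets, as do G4 and G5, so they form the two consensus groups. Connected components are a permissive grouping rule chosen for brevity; a real workflow would likely apply a stricter density criterion.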

Identifiers

DOI: 10.1007/978-1-0716-1994-0_12 PMID: 35505214

Chemical substances

DNA 9007-49-2

Publication types

Journal Article; Research Support, N.I.H., Extramural

Languages

eng

Pagination

153-165

Authors

Jie Zhang

Kun Huang
