Differential Expression, Functional and Machine Learning Analysis of High-Throughput -Omics Data Using Open-Source Tools.

DNA methylation; Differential expression analysis; Functional groups; Gene expression; Gingiva; Machine learning; Microarray; Next-generation sequencing; Periodontal disease; Transcriptome; microRNA

Journal

Methods in molecular biology (Clifton, N.J.)
ISSN: 1940-6029
Abbreviated title: Methods Mol Biol
Country: United States
NLM ID: 9214969

Publication information

Publication date:
2023
History:
entrez: 2022-11-23
pubmed: 2022-11-24
medline: 2022-11-26
Status: ppublish

Abstract

Today, -omics analyses, including the systematic cataloging of messenger RNA and microRNA sequences or DNA methylation patterns in a cell population, organ, or tissue sample, allow for an unbiased, comprehensive genome-level analysis of complex diseases, offering a large advantage over earlier "candidate" gene or pathway analyses. A primary goal in the analysis of these high-throughput assays is the detection of those features, among several thousand, that differ between groups of samples. In the context of oral biology, our group has successfully utilized -omics technology to identify key molecules and pathways in different diagnostic entities of periodontal disease.

A major issue when inferring biological information from high-throughput -omics studies is that the sheer volume of high-dimensional data generated by contemporary technology cannot be appropriately analyzed using the common statistical methods employed in the biomedical sciences. Furthermore, machine learning methods facilitate the detection of additional patterns, beyond the mere identification of lists of features that differ between groups.

Herein, we outline a robust and well-accepted bioinformatics workflow for the initial analysis of -omics data using open-source tools. We describe a differential expression analysis pipeline that can be used for data from both arrays and sequencing experiments and that offers the possibility to account for random or fixed effects. Furthermore, we present an overview of the possibilities for a functional analysis of the obtained data, including subsequent machine learning approaches in the form of (i) supervised classification algorithms in class validation and (ii) unsupervised clustering in class discovery.
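One reason per-feature testing at a fixed threshold does not scale to several thousand features is multiple testing. The following is a minimal sketch of that issue, not the chapter's pipeline: the abstract does not name a correction method, so the Benjamini-Hochberg adjustment is assumed here for illustration, and the function names and the simulated p-values are hypothetical.

```python
import random


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def count_significant(p_values, alpha=0.05):
    """Count the features whose p-value falls below alpha."""
    return sum(p < alpha for p in p_values)


# Hypothetical data: 10,000 features with no true group difference,
# so every p-value is drawn uniformly from [0, 1).
rng = random.Random(0)
null_p = [rng.random() for _ in range(10_000)]

raw_hits = count_significant(null_p)
bh_hits = count_significant(benjamini_hochberg(null_p))
print("unadjusted calls:", raw_hits)
print("BH-adjusted calls:", bh_hits)
```

With no true differences present, every unadjusted call is a false positive, and roughly 5% of the features are expected to pass an unadjusted 0.05 threshold by chance alone. The adjusted p-values are never smaller than the raw ones, so the adjusted count cannot exceed the unadjusted count.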

Identifiers

pubmed: 36418696
doi: 10.1007/978-1-0716-2780-8_19

Publication types

Journal Article; Research Support, N.I.H., Extramural; Research Support, Non-U.S. Gov't; Review

Languages

eng

Citation subsets

IM

Pagination

317-351

Copyright information

© 2023. The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature.

Authors

Moritz Kebschull (M)

Periodontal Research Group, Institute of Clinical Sciences, College of Medical & Dental Sciences, The University of Birmingham, Birmingham, UK. moritz@kebschull.me.
Division of Periodontics, Section of Oral, Diagnostic and Rehabilitation Sciences, Columbia University College of Dental Medicine, New York, NY, USA. moritz@kebschull.me.
Birmingham Community Healthcare NHS Trust, Birmingham, UK. moritz@kebschull.me.

Annika Therese Kroeger (AT)

Birmingham Community Healthcare NHS Trust, Birmingham, UK.
Department of Oral Surgery, School of Dentistry, University of Birmingham, Birmingham, UK.

Panos N Papapanou (PN)

Division of Periodontics, Section of Oral, Diagnostic and Rehabilitation Sciences, Columbia University College of Dental Medicine, New York, NY, USA.
