A new approach to describe the taxonomic structure of microbiome and its application to assess the relationship between microbial niches.

Bacterial sub-communities; Latent Dirichlet allocation; Oral microbiome; Unsupervised machine learning

Journal

BMC bioinformatics
ISSN: 1471-2105
Abbreviated title: BMC Bioinformatics
Country: England
NLM ID: 100965194

Publication information

Publication date:
05 Feb 2024
History:
received: 01 Feb 2023
accepted: 20 Nov 2023
medline: 06 Feb 2024
pubmed: 06 Feb 2024
entrez: 05 Feb 2024
Status: epublish

Abstract

BACKGROUND
Microbiome data are often collected from multiple niches, but the methods used to analyse them often ignore associations between niches. One interesting case is that of the oral microbiome. Its composition is receiving increasing attention due to reports on its associations with general health. While the oral cavity includes different niches, multi-niche microbiome data analysis is conducted using a single niche at a time and, therefore, ignores other niches that could act as confounding variables. Understanding the interaction between niches would assist interpretation of the results and help improve our understanding of multi-niche microbiomes.

METHODS
In this study, we used a machine learning technique called latent Dirichlet allocation (LDA) on two microbiome datasets, each consisting of several niches. LDA was used on both individual niches and all niches simultaneously. On individual niches, LDA was used to decompose each niche into bacterial sub-communities, unveiling their taxonomic structure. These sub-communities were then used to assess the relationship between microbial niches using the global test. On all niches simultaneously, LDA allowed us to extract meaningful microbial patterns. Sets of co-occurring operational taxonomic units (OTUs) comprising those patterns were then used to predict the original location of each sample.

RESULTS
Our approach showed that the per-niche sub-communities displayed a strong association between supragingival plaque and saliva, as well as between the anterior and posterior tongue. In addition, the LDA-derived microbial signatures were able to predict the original sample niche, illustrating the meaningfulness of our sub-communities. For the multi-niche oral microbiome dataset, we obtained an overall accuracy of 76% and a per-niche sensitivity of up to 83%. Finally, for a second multi-niche microbiome dataset from the entire body, microbial niches from the oral cavity displayed stronger associations with each other than with those from other parts of the body, such as niches within the vagina and the skin.

CONCLUSIONS
Our LDA-based approach produces sets of co-occurring taxa that can describe niche composition. LDA-derived microbial signatures can also be instrumental in summarizing microbiome data, for both description and prediction.
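
The LDA workflow summarised in the abstract (decompose count data into sub-communities, then use the resulting signatures to predict each sample's niche) can be sketched as follows. This is a minimal illustration in Python with scikit-learn on simulated counts, not the authors' implementation: the toy data, the number of sub-communities (4), the train/test split and the logistic-regression classifier are all assumptions made for the example.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical toy data: 3 niches, each with its own OTU abundance profile.
n_niches, n_otus, samples_per_niche, reads_per_sample = 3, 30, 40, 500
niche_profiles = rng.dirichlet(np.full(n_otus, 0.5), size=n_niches)

# Sample-by-OTU count matrix: each sample draws reads from its niche profile.
counts = np.vstack([
    rng.multinomial(reads_per_sample, niche_profiles[k], size=samples_per_niche)
    for k in range(n_niches)
])
niche = np.repeat(np.arange(n_niches), samples_per_niche)

# Hold out samples before fitting so the test set stays unseen.
counts_train, counts_test, y_train, y_test = train_test_split(
    counts, niche, test_size=0.3, random_state=0, stratify=niche
)

# Fit LDA on all niches at once; "topics" play the role of sub-communities.
n_subcommunities = 4  # assumed value; in practice chosen by model selection
lda = LatentDirichletAllocation(n_components=n_subcommunities, random_state=0)
theta_train = lda.fit_transform(counts_train)  # sample x sub-community proportions
theta_test = lda.transform(counts_test)

# Normalise the topic-word weights to get an OTU distribution per sub-community.
phi = lda.components_ / lda.components_.sum(axis=1, keepdims=True)
top_otus = np.argsort(phi, axis=1)[:, ::-1][:, :5]  # 5 most characteristic OTUs

# Predict the niche of held-out samples from their sub-community proportions.
clf = LogisticRegression(max_iter=1000).fit(theta_train, y_train)
accuracy = clf.score(theta_test, y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

Each row of `theta_train` gives the mixture of sub-communities in one sample, and each row of `phi` gives the set of co-occurring OTUs that defines one sub-community. Per-niche analysis would follow the same pattern, fitting one LDA model to the samples of a single niche.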

Identifiers

pubmed: 38317062
doi: 10.1186/s12859-023-05575-8
pii: 10.1186/s12859-023-05575-8

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

58

Copyright information

© 2024. The Author(s).

Authors

Vincent Y Pappalardo (VY)

Department of Preventive Dentistry, Academic Centre for Dentistry Amsterdam, University of Amsterdam and Vrije Universiteit Amsterdam, Amsterdam, The Netherlands. v.y.pappalardo@acta.nl.
Biostatistics Centre, Department of Psychosocial Research and Epidemiology, Netherlands Cancer Institute, Amsterdam, The Netherlands. v.y.pappalardo@acta.nl.

Leyla Azarang (L)

Biostatistics Centre, Department of Psychosocial Research and Epidemiology, Netherlands Cancer Institute, Amsterdam, The Netherlands.

Egija Zaura (E)

Department of Preventive Dentistry, Academic Centre for Dentistry Amsterdam, University of Amsterdam and Vrije Universiteit Amsterdam, Amsterdam, The Netherlands.

Bernd W Brandt (BW)

Department of Preventive Dentistry, Academic Centre for Dentistry Amsterdam, University of Amsterdam and Vrije Universiteit Amsterdam, Amsterdam, The Netherlands.

Renée X de Menezes (RX)

Biostatistics Centre, Department of Psychosocial Research and Epidemiology, Netherlands Cancer Institute, Amsterdam, The Netherlands.

MeSH classifications