A new approach to describe the taxonomic structure of microbiome and its application to assess the relationship between microbial niches.
Bacterial sub-communities
Latent Dirichlet allocation
Oral microbiome
Unsupervised machine learning
Journal
BMC bioinformatics
ISSN: 1471-2105
Abbreviated title: BMC Bioinformatics
Country: England
NLM ID: 100965194
Publication information
Publication date: 05 Feb 2024
History:
received: 01 Feb 2023
accepted: 20 Nov 2023
medline: 06 Feb 2024
pubmed: 06 Feb 2024
entrez: 05 Feb 2024
Status: epublish
Abstract
BACKGROUND
Data are often collected from microbiomes in multiple niches, but the methods used to analyse them often ignore associations between niches. One interesting case is the oral microbiome, whose composition is receiving increasing attention because of reported associations with general health. Although the oral cavity comprises several niches, multi-niche microbiome data are analysed one niche at a time, which ignores other niches that could act as confounding variables. Understanding the interactions between niches would aid interpretation of results and improve our understanding of multi-niche microbiomes.
METHODS
In this study, we applied latent Dirichlet allocation (LDA), an unsupervised machine learning technique, to two microbiome datasets, each consisting of several niches. LDA was applied both to individual niches and to all niches simultaneously. On individual niches, LDA decomposed each niche into bacterial sub-communities, unveiling its taxonomic structure. These sub-communities were then used to assess the relationship between microbial niches using the global test. On all niches simultaneously, LDA allowed us to extract meaningful microbial patterns. The sets of co-occurring operational taxonomic units (OTUs) comprising those patterns were then used to predict the original location of each sample.
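To make the decomposition step concrete, the following is a minimal sketch of LDA fitted to a samples-by-OTUs count table with a collapsed Gibbs sampler, using only the Python standard library. It is an illustration under stated assumptions, not the implementation used in the study: the function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, the number of sub-communities, and the toy count table are all hypothetical.

```python
import random


def lda_gibbs(counts, n_topics, n_iter=100, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampler for LDA on a samples x OTUs count table.

    Returns (theta, phi):
      theta[d][k] = estimated share of sub-community k in sample d
      phi[k][w]   = estimated share of OTU w within sub-community k
    alpha and beta are symmetric Dirichlet hyperparameters (assumed values).
    """
    rng = random.Random(seed)
    n_docs, n_otus = len(counts), len(counts[0])

    # Expand the table to one token per read: (sample index, OTU index).
    tokens = [(d, w)
              for d in range(n_docs)
              for w in range(n_otus)
              for _ in range(counts[d][w])]

    # Random initial sub-community assignment for every read.
    z = [rng.randrange(n_topics) for _ in tokens]
    ndk = [[0] * n_topics for _ in range(n_docs)]   # sample x sub-community
    nkw = [[0] * n_otus for _ in range(n_topics)]   # sub-community x OTU
    nk = [0] * n_topics                             # reads per sub-community
    for (d, w), k in zip(tokens, z):
        ndk[d][k] += 1
        nkw[k][w] += 1
        nk[k] += 1

    for _ in range(n_iter):
        for i, (d, w) in enumerate(tokens):
            # Remove the current assignment of this read from the counts.
            k = z[i]
            ndk[d][k] -= 1
            nkw[k][w] -= 1
            nk[k] -= 1
            # Conditional probability (up to a constant) of each sub-community.
            weights = [(ndk[d][j] + alpha) * (nkw[j][w] + beta)
                       / (nk[j] + n_otus * beta)
                       for j in range(n_topics)]
            k = rng.choices(range(n_topics), weights=weights)[0]
            z[i] = k
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1

    theta = [[(ndk[d][k] + alpha) / (sum(ndk[d]) + n_topics * alpha)
              for k in range(n_topics)]
             for d in range(n_docs)]
    phi = [[(nkw[k][w] + beta) / (nk[k] + n_otus * beta)
            for w in range(n_otus)]
           for k in range(n_topics)]
    return theta, phi


# Hypothetical toy table: 4 samples (rows) x 4 OTUs (columns), not study data.
toy_counts = [
    [20, 15, 0, 0],
    [18, 22, 1, 0],
    [0, 1, 25, 19],
    [0, 0, 17, 23],
]
theta, phi = lda_gibbs(toy_counts, n_topics=2)
for d, row in enumerate(theta):
    print(f"sample {d}: " + ", ".join(f"{p:.2f}" for p in row))
```

Each row of `theta` is a sample's mixture over sub-communities, and each row of `phi` lists how strongly every OTU belongs to a sub-community; the OTUs with the largest weights in a row of `phi` correspond to a set of co-occurring taxa in the sense used above.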
RESULTS
The per-niche sub-communities showed a strong association between supragingival plaque and saliva, as well as between the anterior and posterior tongue. In addition, the LDA-derived microbial signatures were able to predict the original sample niche, illustrating the meaningfulness of our sub-communities. For the multi-niche oral microbiome dataset, we obtained an overall accuracy of 76% and a per-niche sensitivity of up to 83%. Finally, for a second multi-niche microbiome dataset covering the entire body, microbial niches from the oral cavity displayed stronger associations with each other than with those from other parts of the body, such as niches within the vagina and on the skin.
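For readers unfamiliar with the two reported metrics, the sketch below shows how overall accuracy and per-niche sensitivity (recall) can be computed from true and predicted niche labels. The function name `accuracy_and_sensitivity` and the example labels are hypothetical and serve only as an illustration; they are not data or code from the study.

```python
from collections import Counter


def accuracy_and_sensitivity(true_niches, predicted_niches):
    """Return (overall accuracy, {niche: sensitivity}) for paired labels."""
    if len(true_niches) != len(predicted_niches):
        raise ValueError("label lists must have the same length")
    if not true_niches:
        raise ValueError("label lists must not be empty")

    # Number of samples that truly belong to each niche.
    totals = Counter(true_niches)
    # Number of correctly predicted samples per true niche.
    hits = Counter(t for t, p in zip(true_niches, predicted_niches) if t == p)

    accuracy = sum(hits.values()) / len(true_niches)
    # Sensitivity of a niche: correctly predicted samples / all its samples.
    sensitivity = {niche: hits[niche] / n for niche, n in totals.items()}
    return accuracy, sensitivity


# Hypothetical labels for six samples (not data from the study).
true_niches = ["saliva", "saliva", "plaque", "plaque", "tongue", "tongue"]
predicted = ["saliva", "plaque", "plaque", "plaque", "tongue", "saliva"]

accuracy, sensitivity = accuracy_and_sensitivity(true_niches, predicted)
print(f"overall accuracy: {accuracy:.2f}")
for niche, value in sensitivity.items():
    print(f"sensitivity for {niche}: {value:.2f}")
```

Overall accuracy pools all samples, whereas sensitivity is computed separately for each niche, which is why a single dataset can have one accuracy figure and a range of per-niche sensitivities.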
CONCLUSIONS
Our LDA-based approach produces sets of co-occurring taxa that can describe niche composition. LDA-derived microbial signatures can also be instrumental in summarizing microbiome data, for both description and prediction.
Identifiers
pubmed: 38317062
doi: 10.1186/s12859-023-05575-8
pii: 10.1186/s12859-023-05575-8
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
58
Copyright information
© 2024. The Author(s).