PowerBacGWAS: a computational pipeline to perform power calculations for bacterial genome-wide association studies.


Journal

Communications biology
ISSN: 2399-3642
Abbreviated title: Commun Biol
Country: England
NLM ID: 101719179

Publication information

Publication date:
25 March 2022
History:
received: 22 December 2020
accepted: 25 February 2022
entrez: 26 March 2022
pubmed: 27 March 2022
medline: 19 April 2022
Status: epublish

Abstract

Genome-wide association studies (GWAS) are increasingly being applied to investigate the genetic basis of bacterial traits. However, approaches to perform power calculations for bacterial GWAS are limited. Here we implemented two alternative approaches to conduct power calculations using existing collections of bacterial genomes. First, a sub-sampling approach was undertaken to reduce the allele frequency and effect size of a known and detectable genotype-phenotype relationship by modifying phenotype labels. Second, a phenotype-simulation approach was conducted to simulate phenotypes from existing genetic variants. We implemented both approaches in a computational pipeline (PowerBacGWAS) that supports power calculations for burden testing, pan-genome and variant GWAS, and applied it to collections of Enterococcus faecium, Klebsiella pneumoniae and Mycobacterium tuberculosis. We used this pipeline to determine the sample sizes required to detect causal variants of different minor allele frequencies (MAF), effect sizes and phenotype heritabilities, and studied the effect of homoplasy and population diversity on the power to detect causal variants. Our pipeline and user documentation are made available and can be applied to other bacterial populations. PowerBacGWAS can be used to determine the sample sizes required to find statistically significant associations, or the associations detectable with a given sample size. We recommend performing power calculations using existing genomes of the bacterial species and population under study.
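The general idea behind simulation-based power calculations (simulate phenotypes for a causal variant of a given MAF and effect size, test for association, and count how often the test reaches significance) can be illustrated with a short sketch. This is a simplified, hypothetical example and not the PowerBacGWAS implementation: it generates genotypes from scratch for unrelated isolates rather than using existing genomes, so it ignores population structure and homoplasy, which the study reports affect power. It also uses a Pearson chi-square test instead of the association models used in practice. All function and parameter names (`simulated_power`, `baseline_prevalence`, the Bonferroni threshold for 100,000 tests) are assumptions chosen for illustration.

```python
import math
import random


def chi2_1df_pvalue(a: int, b: int, c: int, d: int) -> float:
    """P-value of a Pearson chi-square test (1 df, no continuity correction)
    on the 2x2 table [[a, b], [c, d]]:
        a = carriers with the phenotype,     b = carriers without it,
        c = non-carriers with the phenotype, d = non-carriers without it.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 1.0  # an empty row or column carries no evidence of association
    stat = n * (a * d - b * c) ** 2 / denom
    # Upper tail of a chi-square with 1 df: P(X > s) = erfc(sqrt(s / 2)).
    return math.erfc(math.sqrt(stat / 2.0))


def simulated_power(n_isolates: int, maf: float, odds_ratio: float,
                    baseline_prevalence: float, alpha: float,
                    n_reps: int = 200, seed: int = 1) -> float:
    """Fraction of simulated datasets in which a single hypothetical causal
    variant reaches significance at threshold `alpha`."""
    rng = random.Random(seed)  # fixed seed for reproducible estimates
    # Turn the baseline prevalence and the odds ratio (effect size) into the
    # phenotype probability for carriers of the causal variant.
    odds0 = baseline_prevalence / (1.0 - baseline_prevalence)
    odds1 = odds0 * odds_ratio
    p_carrier = odds1 / (1.0 + odds1)

    hits = 0
    for _ in range(n_reps):
        a = b = c = d = 0
        for _ in range(n_isolates):
            carrier = rng.random() < maf  # carries the variant with probability MAF
            prob = p_carrier if carrier else baseline_prevalence
            case = rng.random() < prob    # binary phenotype (e.g. resistant or not)
            if carrier and case:
                a += 1
            elif carrier:
                b += 1
            elif case:
                c += 1
            else:
                d += 1
        if chi2_1df_pvalue(a, b, c, d) < alpha:
            hits += 1
    return hits / n_reps


if __name__ == "__main__":
    # Hypothetical genome-wide threshold: Bonferroni for 100,000 tested variants.
    alpha = 0.05 / 100_000
    for n in (100, 250, 500):
        power = simulated_power(n, maf=0.1, odds_ratio=3.0,
                                baseline_prevalence=0.2, alpha=alpha)
        print(f"n={n}: estimated power {power:.2f}")
```

Repeating the estimate over a grid of sample sizes, MAFs and effect sizes gives the kind of sample-size curves described above; the pipeline's two approaches differ from this sketch in drawing genotypes (and, for sub-sampling, phenotypes) from real genome collections.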

Identifiers

pubmed: 35338232
doi: 10.1038/s42003-022-03194-2
pii: 10.1038/s42003-022-03194-2
pmc: PMC8956664

Publication types

Journal Article
Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Pagination

266

Grants

Agency: Medical Research Council
ID: MR/M01360X/1
Country: United Kingdom
Agency: Wellcome Trust
ID: WT098600
Country: United Kingdom
Agency: Biotechnology and Biological Sciences Research Council
ID: BB/R013063/1
Country: United Kingdom
Agency: Wellcome Trust
Country: United Kingdom
Agency: Medical Research Council
ID: MR/N010469/1
Country: United Kingdom
Agency: Medical Research Council
ID: MR/R025576/1
Country: United Kingdom
Agency: Medical Research Council
ID: MR/V032836/1
Country: United Kingdom
Agency: Wellcome Trust
ID: 201344/Z/16/Z
Country: United Kingdom
Agency: Medical Research Council
ID: MR/R020973/1
Country: United Kingdom

Copyright information

© 2022. The Author(s).


Authors

Francesc Coll (F)

Department of Infection Biology, Faculty of Infectious & Tropical Diseases, London School of Hygiene & Tropical Medicine, London, UK. Francesc.Coll@lshtm.ac.uk.

Theodore Gouliouris (T)

Department of Medicine, University of Cambridge, Cambridge, UK.
Cambridge University Hospitals NHS Foundation Trust, Cambridge, UK.

Sebastian Bruchmann (S)

Department of Veterinary Medicine, University of Cambridge, Cambridge, UK.

Jody Phelan (J)

Department of Infection Biology, Faculty of Infectious & Tropical Diseases, London School of Hygiene & Tropical Medicine, London, UK.

Kathy E Raven (KE)

Department of Medicine, University of Cambridge, Cambridge, UK.

Taane G Clark (TG)

Department of Infection Biology, Faculty of Infectious & Tropical Diseases, London School of Hygiene & Tropical Medicine, London, UK.
Faculty of Epidemiology and Population Health, Department of Infectious Disease Epidemiology, London School of Hygiene & Tropical Medicine, London, UK.

Julian Parkhill (J)

Department of Veterinary Medicine, University of Cambridge, Cambridge, UK.

Sharon J Peacock (SJ)

Department of Medicine, University of Cambridge, Cambridge, UK.
