PowerBacGWAS: a computational pipeline to perform power calculations for bacterial genome-wide association studies.
Journal
Communications biology
ISSN: 2399-3642
Abbreviated title: Commun Biol
Country: England
NLM ID: 101719179
Publication information
Publication date: 25 March 2022
History:
received: 22 December 2020
accepted: 25 February 2022
entrez: 26 March 2022
pubmed: 27 March 2022
medline: 19 April 2022
Status: epublish
Abstract
Genome-wide association studies (GWAS) are increasingly being applied to investigate the genetic basis of bacterial traits. However, approaches for performing power calculations for bacterial GWAS are limited. Here we implemented two alternative approaches to conduct power calculations using existing collections of bacterial genomes. First, a sub-sampling approach was undertaken to reduce the allele frequency and effect size of a known and detectable genotype-phenotype relationship by modifying phenotype labels. Second, a phenotype-simulation approach was conducted to simulate phenotypes from existing genetic variants. We implemented both approaches in a computational pipeline (PowerBacGWAS) that supports power calculations for burden testing, pan-genome GWAS and variant GWAS, and applied it to collections of Enterococcus faecium, Klebsiella pneumoniae and Mycobacterium tuberculosis. We used this pipeline to determine the sample sizes required to detect causal variants of different minor allele frequencies (MAF), effect sizes and phenotype heritability, and studied the effect of homoplasy and population diversity on the power to detect causal variants. Our pipeline and user documentation are made available and can be applied to other bacterial populations. PowerBacGWAS can be used to determine the sample sizes required to find statistically significant associations, or the associations detectable with a given sample size. We recommend performing power calculations using existing genomes of the bacterial species and population of study.
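The phenotype-simulation idea described in the abstract can be illustrated with a minimal Python sketch. This is not the PowerBacGWAS implementation: the function names, the logistic phenotype model, the baseline prevalence, the significance threshold and the use of a plain chi-square test are all assumptions made for illustration, and the sketch ignores population structure, which bacterial GWAS methods normally correct for. It takes presence/absence calls of one existing variant, repeatedly sub-samples isolates, simulates a binary phenotype with a chosen odds ratio, and reports the fraction of replicates in which the association is significant.

```python
import math
import random


def chi2_pvalue_2x2(a, b, c, d):
    """P-value of a 1-d.f. chi-square test on the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        return 1.0  # variant or phenotype is monomorphic: nothing to test
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of chi-square with 1 d.f.: P(X > x) = erfc(sqrt(x / 2))
    return math.erfc(math.sqrt(stat / 2))


def estimate_power(genotypes, n_samples, odds_ratio, baseline_prev=0.2,
                   alpha=1e-5, replicates=200, seed=0):
    """Estimate power to detect one causal variant by phenotype simulation.

    genotypes     : list of 0/1 calls for the variant in an existing collection
    n_samples     : number of isolates drawn (without replacement) per replicate
    odds_ratio    : assumed effect size of carrying the variant
    baseline_prev : assumed phenotype prevalence among non-carriers (hypothetical)
    alpha         : assumed significance threshold (hypothetical)
    """
    rng = random.Random(seed)
    base_odds = baseline_prev / (1 - baseline_prev)
    carrier_odds = odds_ratio * base_odds
    carrier_prev = carrier_odds / (1 + carrier_odds)

    hits = 0
    for _ in range(replicates):
        sample = rng.sample(genotypes, n_samples)
        a = b = c = d = 0  # carrier/case, carrier/control, other/case, other/control
        for g in sample:
            case = rng.random() < (carrier_prev if g else baseline_prev)
            if g and case:
                a += 1
            elif g:
                b += 1
            elif case:
                c += 1
            else:
                d += 1
        if chi2_pvalue_2x2(a, b, c, d) < alpha:
            hits += 1
    return hits / replicates


if __name__ == "__main__":
    # Toy collection: 2000 isolates, 10% of which carry the variant.
    toy_genotypes = [1] * 200 + [0] * 1800
    for n in (100, 250, 500, 1000):
        print(n, estimate_power(toy_genotypes, n, odds_ratio=3.0))
```

Running the loop over several sample sizes gives a rough power curve for a variant of a given frequency and effect size. A realistic analysis would replace the chi-square test with the association model actually planned for the study and would draw genotypes from the genomes of the species and population of interest, as the abstract recommends.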
Identifiers
pubmed: 35338232
doi: 10.1038/s42003-022-03194-2
pii: 10.1038/s42003-022-03194-2
pmc: PMC8956664
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
266
Grants
Agency: Medical Research Council
ID: MR/M01360X/1
Country: United Kingdom
Agency: Wellcome Trust
ID: WT098600
Country: United Kingdom
Agency: Biotechnology and Biological Sciences Research Council
ID: BB/R013063/1
Country: United Kingdom
Agency: Wellcome Trust
Country: United Kingdom
Agency: Medical Research Council
ID: MR/N010469/1
Country: United Kingdom
Agency: Medical Research Council
ID: MR/R025576/1
Country: United Kingdom
Agency: Medical Research Council
ID: MR/V032836/1
Country: United Kingdom
Agency: Wellcome Trust
ID: 201344/Z/16/Z
Country: United Kingdom
Agency: Medical Research Council
ID: MR/R020973/1
Country: United Kingdom
Copyright information
© 2022. The Author(s).