Maast: genotyping thousands of microbial strains efficiently.
Journal
Genome biology
ISSN: 1474-760X
Titre abrégé: Genome Biol
Pays: England
ID NLM: 100960660
Informations de publication
Date de publication:
10 08 2023
10 08 2023
Historique:
received:
07
07
2022
accepted:
31
07
2023
medline:
16
8
2023
pubmed:
11
8
2023
entrez:
10
8
2023
Statut:
epublish
Résumé
Existing single nucleotide polymorphism (SNP) genotyping algorithms do not scale for species with thousands of sequenced strains, nor do they account for conspecific redundancy. Here we present a bioinformatics tool, Maast, which empowers population genetic meta-analysis of microbes at an unrivaled scale. Maast implements a novel algorithm to heuristically identify a minimal set of diverse conspecific genomes, then constructs a reliable SNP panel for each species, and enables rapid and accurate genotyping using a hybrid of whole-genome alignment and k-mer exact matching. We demonstrate Maast's utility by genotyping thousands of Helicobacter pylori strains and tracking SARS-CoV-2 diversification.
Identifiants
pubmed: 37563669
doi: 10.1186/s13059-023-03030-8
pii: 10.1186/s13059-023-03030-8
pmc: PMC10416524
doi:
Types de publication
Meta-Analysis
Journal Article
Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't
Research Support, U.S. Gov't, Non-P.H.S.
Langues
eng
Sous-ensembles de citation
IM
Pagination
186Subventions
Organisme : NHLBI NIH HHS
ID : R01 HL160862
Pays : United States
Informations de copyright
© 2023. BioMed Central Ltd., part of Springer Nature.
Références
Jain C, Rodriguez-R LM, Phillippy AM, Konstantinidis KT, Aluru S. High throughput ANI analysis of 90K prokaryotic genomes reveals clear species boundaries. Nat Commun. 2018;9:5114.
doi: 10.1038/s41467-018-07641-9
pubmed: 30504855
pmcid: 6269478
Almeida A, Nayfach S, Boland M, Strozzi F, Beracochea M, Shi ZJ, et al. A unified catalog of 204,938 reference genomes from the human gut microbiome. Nat Biotechnol. 2021;39:105–14.
doi: 10.1038/s41587-020-0603-3
pubmed: 32690973
Pearce ME, Alikhan N-F, Dallman TJ, Zhou Z, Grant K, Maiden MCJ. Comparative analysis of core genome MLST and SNP typing within a European Salmonella serovar Enteritidis outbreak. Int J Food Microbiol. 2018;274:1–11.
doi: 10.1016/j.ijfoodmicro.2018.02.023
pubmed: 29574242
pmcid: 5899760
Leaché AD, Oaks JR. The Utility of Single Nucleotide Polymorphism (SNP) Data in Phylogenetics. Annu Rev Ecol Evol Syst. 2017;48:69–84.
doi: 10.1146/annurev-ecolsys-110316-022645
Freschi L, Vargas R, Husain A, Kamal SMM, Skrahina A, Tahseen S, et al. Population structure, biogeography and transmissibility of Mycobacterium tuberculosis. Nat Commun. 2021;12:6099.
doi: 10.1038/s41467-021-26248-1
pubmed: 34671035
pmcid: 8528816
Figueroa J, Castro D, Lagos F, Cartes C, Isla A, Yáñez AJ, et al. Analysis of single nucleotide polymorphisms (SNPs) associated with antibiotic resistance genes in Chilean Piscirickettsia salmonis strains. J Fish Dis. 2019;42:1645–55.
doi: 10.1111/jfd.13089
pubmed: 31591746
Cooper AL, Low AJ, Koziol AG, Thomas MC, Leclair D, Tamber S, et al. Systematic Evaluation of Whole Genome Sequence-Based Predictions of Salmonella Serotype and Antimicrobial Resistance. Front Microbiol. 2020;11:549.
Maiden Martin C. J., Bygraves Jane A., Feil Edward, Morelli Giovanna, Russell Joanne E., Urwin Rachel, et al. Multilocus sequence typing: a portable approach to the identification of clones within populations of pathogenic microorganisms. Proc Natl Acad Sci. 1998;95:3140–5.
Gardner SN, Hall BG. When whole-genome alignments just won’t work: kSNP v2 software for alignment-free SNP discovery and phylogenetics of hundreds of microbial genomes. PLoS ONE. 2013;8: e81760.
doi: 10.1371/journal.pone.0081760
pubmed: 24349125
pmcid: 3857212
Treangen TJ, Ondov BD, Koren S, Phillippy AM. The Harvest suite for rapid core-genome alignment and visualization of thousands of intraspecific microbial genomes. Genome Biol. 2014;15:524.
doi: 10.1186/s13059-014-0524-x
pubmed: 25410596
pmcid: 4262987
Gardner SN, Slezak T, Hall BG. kSNP3.0: SNP detection and phylogenetic analysis of genomes without genome alignment or reference genome. Bioinformatics. 2015;31:2877–8.
Ghazi AR, Münch PC, Chen D, Jensen J, Huttenhower C. Strain identification and quantitative analysis in microbial communities. J Mol Biol. 2022;434:167582.
Zhao C, Shi ZJ, Pollard KS. Pitfalls of genotyping microbial communities with rapidly growing genome collections. Cell Syst. 2023;14:160-176.e3.
doi: 10.1016/j.cels.2022.12.007
pubmed: 36657438
Tolar B, Joseph LA, Schroeder MN, Stroika S, Ribot EM, Hise KB, et al. An overview of PulseNet USA databases. Foodborne Pathog Dis. 2019;16:457–62.
doi: 10.1089/fpd.2019.2637
pubmed: 31066584
pmcid: 6653802
Shi ZJ, Dimitrov B, Zhao C, Nayfach S, Pollard KS. Fast and accurate metagenotyping of the human gut microbiome with GT-Pro. Nat Biotechnol. 2022;40:507–16.
doi: 10.1038/s41587-021-01102-3
pubmed: 34949778
Iqbal Z, Turner I, McVean G. High-throughput microbial population genomics using the Cortex variation assembler. Bioinformatics. 2013;29:275–6.
doi: 10.1093/bioinformatics/bts673
pubmed: 23172865
Jiang X, Xu Z, Zhang T, Li Y, Li W, Tan H. Whole-genome-based helicobacter pylori geographic surveillance: a visualized and expandable webtool. Front Microbiol. 2021;12:687259.
Moodley Yoshan, Brunelli Andrea, Ghirotto Silvia, Klyubin Andrey, Maady Ayas S., Tyne William, et al. Helicobacter pylori’s historical journey through Siberia and the Americas. Proc Natl Acad Sci. 2021;118:e2015523118.
Linz B, Windsor HM, McGraw JJ, Hansen LM, Gajewski JP, Tomsho LP, et al. A mutation burst during the acute phase of Helicobacter pylori infection in humans and rhesus macaques. Nat Commun. 2014;5:4165.
doi: 10.1038/ncomms5165
pubmed: 24924186
Bishara A, Moss EL, Kolmogorov M, Parada AE, Weng Z, Sidow A, et al. High-quality genome sequences of uncultured microbes by assembly of read clouds. Nat Biotechnol. 2018;36:1067–75.
doi: 10.1038/nbt.4266
Zheng Wenshan, Zhao Shijie, Yin Yehang, Zhang Huidan, Needham David M., Evans Ethan D., et al. High-throughput, single-microbe genomics with strain resolution, applied to a human gut microbiome. Science. 2022;376:eabm1483.
Nayfach S, Páez-Espino D, Call L, Low SJ, Sberro H, Ivanova NN, et al. Metagenomic compendium of 189,680 DNA viruses from the human gut microbiome. Nat Microbiol. 2021;6:960–70.
doi: 10.1038/s41564-021-00928-6
pubmed: 34168315
pmcid: 8241571
Turner I, Garimella KV, Iqbal Z, McVean G. Integrating long-range connectivity information into de Bruijn graphs. Bioinformatics. 2018;34:2556–65.
doi: 10.1093/bioinformatics/bty157
pubmed: 29554215
pmcid: 6061703
Ondov BD, Treangen TJ, Melsted P, Mallonee AB, Bergman NH, Koren S, et al. Mash: fast genome and metagenome distance estimation using MinHash. Genome Biol. 2016;17:132.
doi: 10.1186/s13059-016-0997-x
pubmed: 27323842
pmcid: 4915045
Marçais G, Delcher AL, Phillippy AM, Coston R, Salzberg SL, Zimin A. MUMmer4: A fast and versatile genome alignment system. PLoS Comput Biol. 2018;14: e1005944.
doi: 10.1371/journal.pcbi.1005944
pubmed: 29373581
pmcid: 5802927
Angiuoli SV, Salzberg SL. Mugsy: fast multiple alignment of closely related whole genomes. Bioinformatics. 2011;27:334–42.
doi: 10.1093/bioinformatics/btq665
pubmed: 21148543
Darling ACE, Mau B, Blattner FR, Perna NT. Mauve: multiple alignment of conserved genomic sequence with rearrangements. Genome Res. 2004;14:1394–403.
doi: 10.1101/gr.2289704
pubmed: 15231754
pmcid: 442156
Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009;25:1754–60.
doi: 10.1093/bioinformatics/btp324
pubmed: 19451168
pmcid: 2705234
Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. The sequence alignment/map format and SAMtools. Bioinformatics. 2009;25:2078–9.
doi: 10.1093/bioinformatics/btp352
pubmed: 19505943
pmcid: 2723002
Garrison E, Marth G. Haplotype-based variant detection from short-read sequencing. arXiv preprint arXiv:12073907. 2012.
Bush SJ, Foster D, Eyre DW, Clark EL, De Maio N, Shaw LP, et al. Genomic diversity affects the accuracy of bacterial single-nucleotide polymorphism–calling pipelines. GigaScience. 2020;9:giaa007.
Zou Y, Xue W, Luo G, Deng Z, Qin P, Guo R, et al. 1,520 reference genomes from cultivated human gut bacteria enable functional microbiome analyses. Nat Biotechnol. 2019;37:179–85.
doi: 10.1038/s41587-018-0008-8
pubmed: 30718868
pmcid: 6784896
Gourlé H, Karlsson-Lindsjö O, Hayer J, Bongcam-Rudloff E. Simulating Illumina metagenomic data with InSilicoSeq. Bioinformatics. 2018;35:521–2.
doi: 10.1093/bioinformatics/bty630
pmcid: 6361232
Wattam AR, Abraham D, Dalay O, Disz TL, Driscoll T, Gabbard JL, et al. PATRIC, the bacterial bioinformatics database and analysis resource. Nucleic Acids Res. 2014;42:D581–91.
doi: 10.1093/nar/gkt1099
pubmed: 24225323
Stamatakis A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics. 2014/01/21 ed. 2014;30:1312–3.
Letunic I, Bork P. Interactive Tree Of Life (iTOL) v4: recent updates and new developments. Nucleic Acids Res. 2019;47:W256–9.
doi: 10.1093/nar/gkz239
pubmed: 30931475
pmcid: 6602468
Shi ZJ, Nayfach S, Pollard KS. Maast: genotyping thousands of microbial strains efficiently. Zenodo; 2022. https://doi.org/10.5281/zenodo.8200643 .
Shi ZJ, Nayfach S, Pollard KS. Maast: genotyping thousands of microbial strains efficiently. GitHub; 2022. Available from: https://github.com/zjshi/Maast .