Maast: genotyping thousands of microbial strains efficiently.


Journal

Genome biology
ISSN: 1474-760X
Abbreviated title: Genome Biol
Country: England
NLM ID: 100960660

Publication information

Publication date:
10 Aug 2023
History:
received: 7 Jul 2022
accepted: 31 Jul 2023
medline: 16 Aug 2023
pubmed: 11 Aug 2023
entrez: 10 Aug 2023
Status: epublish

Abstract

Existing single nucleotide polymorphism (SNP) genotyping algorithms do not scale for species with thousands of sequenced strains, nor do they account for conspecific redundancy. Here we present a bioinformatics tool, Maast, which empowers population genetic meta-analysis of microbes at an unrivaled scale. Maast implements a novel algorithm to heuristically identify a minimal set of diverse conspecific genomes, then constructs a reliable SNP panel for each species, and enables rapid and accurate genotyping using a hybrid of whole-genome alignment and k-mer exact matching. We demonstrate Maast's utility by genotyping thousands of Helicobacter pylori strains and tracking SARS-CoV-2 diversification.
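The abstract outlines three steps: pick a small set of diverse conspecific genomes, build a SNP panel from them, then genotype further genomes by k-mer exact matching. The toy Python sketch below illustrates that general idea under stated assumptions; it is not Maast's algorithm or code. The function names, the greedy distance-threshold rule for discarding redundant genomes, the Hamming distance on pre-aligned, equal-length sequences (standing in for whole-genome alignment), and the panel of k-mers centred on each SNP are all hypothetical choices made for illustration.

```python
"""Toy sketch: redundancy reduction -> SNP panel -> k-mer genotyping.

Hypothetical illustration only, NOT Maast's implementation.
Assumes genomes are already aligned to equal length (no indels).
"""
from typing import Dict, List, Set, Tuple


def hamming_fraction(a: str, b: str) -> float:
    """Fraction of aligned positions at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def select_representatives(genomes: List[str], max_dist: float) -> List[int]:
    """Greedily keep a genome only if it is farther than max_dist from every
    genome kept so far. Each discarded genome is thus within max_dist of a
    kept one. The result is small but not guaranteed to be minimal."""
    kept: List[int] = []
    for i, g in enumerate(genomes):
        if all(hamming_fraction(g, genomes[j]) > max_dist for j in kept):
            kept.append(i)
    return kept


def call_snps(aligned: List[str]) -> List[Tuple[int, str, str]]:
    """Return (position, allele_1, allele_2) for every biallelic column."""
    snps = []
    for pos in range(len(aligned[0])):
        alleles = sorted({seq[pos] for seq in aligned})
        if len(alleles) == 2:
            snps.append((pos, alleles[0], alleles[1]))
    return snps


def build_kmer_panel(backbone: str, snps: List[Tuple[int, str, str]],
                     k: int) -> Dict[str, Tuple[int, str]]:
    """Map each k-mer centred on a SNP (one per allele) to (position, allele).
    Flanks are taken from the backbone; k-mers hitting two sites are dropped."""
    assert k % 2 == 1, "k must be odd so the SNP sits in the middle"
    flank = k // 2
    panel: Dict[str, Tuple[int, str]] = {}
    ambiguous: Set[str] = set()
    for pos, a1, a2 in snps:
        if pos < flank or pos + flank >= len(backbone):
            continue  # not enough flanking sequence near the ends
        left = backbone[pos - flank:pos]
        right = backbone[pos + 1:pos + flank + 1]
        for allele in (a1, a2):
            kmer = left + allele + right
            if kmer in panel and panel[kmer] != (pos, allele):
                ambiguous.add(kmer)
            panel[kmer] = (pos, allele)
    for kmer in ambiguous:
        del panel[kmer]
    return panel


def genotype(query: str, panel: Dict[str, Tuple[int, str]],
             k: int) -> Dict[int, str]:
    """Scan every k-mer of the query for exact panel hits; report a site only
    when all hits at that site agree on a single allele."""
    hits: Dict[int, Set[str]] = {}
    for i in range(len(query) - k + 1):
        match = panel.get(query[i:i + k])
        if match is not None:
            pos, allele = match
            hits.setdefault(pos, set()).add(allele)
    return {pos: next(iter(a)) for pos, a in hits.items() if len(a) == 1}


if __name__ == "__main__":
    genomes = [
        "GATTACACCTGAGGTCCATG",  # strain 0
        "GATTACACCTGAGGTCCATG",  # strain 1: identical to strain 0 (redundant)
        "GATTATACCTGAGGTCCATG",  # strain 2: C->T at position 5
        "GATTACACCTGAGGACCATG",  # strain 3: T->A at position 14
    ]
    reps = select_representatives(genomes, max_dist=0.02)
    rep_seqs = [genomes[i] for i in reps]
    snps = call_snps(rep_seqs)
    panel = build_kmer_panel(rep_seqs[0], snps, k=5)
    print(reps)                              # [0, 2, 3]
    print(snps)                              # [(5, 'C', 'T'), (14, 'A', 'T')]
    print(genotype(genomes[2], panel, k=5))  # {5: 'T', 14: 'T'}
```

In this example strain 1 is discarded as redundant, the two variable columns among the kept genomes become the SNP panel, and strain 2 is genotyped purely by exact 5-mer lookups. Dropping k-mers that map to more than one site trades sensitivity for specificity; a realistic tool would also have to cope with indels, sequencing errors, and nearby SNPs that alter the flanking sequence, none of which this sketch handles.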

Identifiers

pubmed: 37563669
doi: 10.1186/s13059-023-03030-8
pii: 10.1186/s13059-023-03030-8
pmc: PMC10416524

Publication types

Meta-Analysis; Journal Article; Research Support, N.I.H., Extramural; Research Support, Non-U.S. Gov't; Research Support, U.S. Gov't, Non-P.H.S.

Languages

eng

Citation subsets

IM

Pagination

186

Grants

Agency: NHLBI NIH HHS
ID: R01 HL160862
Country: United States

Copyright information

© 2023. BioMed Central Ltd., part of Springer Nature.


Authors

Zhou Jason Shi (ZJ)

Chan Zuckerberg Biohub, San Francisco, CA, USA.
Gladstone Institutes of Data Science and Biotechnology, San Francisco, CA, USA.

Stephen Nayfach (S)

Joint Genome Institute, Department of Energy, Walnut Creek, CA, USA.
Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA.

Katherine S Pollard (KS)

Chan Zuckerberg Biohub, San Francisco, CA, USA. kpollard@gladstone.ucsf.edu.
Gladstone Institutes of Data Science and Biotechnology, San Francisco, CA, USA.
Department of Epidemiology and Biostatistics, University of California San Francisco, San Francisco, CA, USA.
