Large scale automated phylogenomic analysis of bacterial isolates and the Evergreen Online platform.
Journal
Communications biology
ISSN: 2399-3642
Abbreviated title: Commun Biol
Country: England
NLM ID: 101719179
Publication information
Publication date:
20 03 2020
History:
received: 04 02 2019
accepted: 03 03 2020
entrez: 22 3 2020
pubmed: 22 3 2020
medline: 16 6 2021
Status:
epublish
Abstract
Public health authorities whole-genome sequence thousands of isolates each month for microbial diagnostics and surveillance of pathogenic bacteria. The computational methods have not kept up with the deluge of data and the need for real-time results. We have therefore created a bioinformatics pipeline for rapid subtyping and continuous phylogenomic analysis of bacterial samples, suited for large-scale surveillance. The data is divided into sets by mapping to reference genomes, then consensus sequences are generated. Nucleotide-based genetic distance is calculated between the sequences in each set, and isolates are clustered together at 10 single-nucleotide polymorphisms. Phylogenetic trees are inferred from the non-redundant sequences and the clustered isolates are added back. The method is accurate at grouping outbreak strains together, while discriminating them from non-outbreak strains. The pipeline is applied in Evergreen Online, which processes publicly available sequencing data from foodborne bacterial pathogens on a daily basis, updating phylogenetic trees as needed.
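The distance and clustering step described in the abstract can be illustrated with a minimal sketch. This is not the pipeline's actual code: the function names `snp_distance` and `cluster_isolates`, the greedy first-match strategy, the treatment of non-ACGT positions as uncalled, and the inclusive reading of the 10-SNP threshold are all assumptions made for illustration.

```python
from typing import Dict, List

CALLED_BASES = set("ACGT")


def snp_distance(a: str, b: str) -> int:
    """Count positions where both sequences have a called base and differ.

    Assumption: positions with any other symbol (e.g. N or a gap) are ignored.
    """
    if len(a) != len(b):
        raise ValueError("consensus sequences must have equal length")
    return sum(
        1
        for x, y in zip(a, b)
        if x != y and x in CALLED_BASES and y in CALLED_BASES
    )


def cluster_isolates(
    consensus: Dict[str, str], threshold: int = 10
) -> Dict[str, List[str]]:
    """Greedy clustering of isolates mapped to the same reference.

    Each isolate joins the first cluster representative within `threshold`
    SNPs (inclusive, an assumption); otherwise it starts a new cluster.
    The keys of the result are the non-redundant representatives.
    """
    clusters: Dict[str, List[str]] = {}
    for name, seq in consensus.items():
        for rep in clusters:
            if snp_distance(seq, consensus[rep]) <= threshold:
                clusters[rep].append(name)
                break
        else:
            # No representative is close enough: this isolate is non-redundant.
            clusters[name] = [name]
    return clusters
```

In this sketch only the keys of the returned dictionary would go into tree inference, and the remaining members of each cluster would be attached to their representative afterwards, mirroring the "added back" step in the abstract.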
Identifiers
pubmed: 32198478
doi: 10.1038/s42003-020-0869-5
pii: 10.1038/s42003-020-0869-5
pmc: PMC7083913
Chemical substances
DNA, Bacterial
0
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
137