vSNP: a SNP pipeline for the generation of transparent SNP matrices and phylogenetic trees from whole genome sequencing data sets.


Journal

BMC genomics
ISSN: 1471-2164
Abbreviated title: BMC Genomics
Country: England
NLM ID: 100965258

Publication information

Publication date:
01 Jun 2024
History:
received: 04 Jan 2024
accepted: 21 May 2024
medline: 01 Jun 2024
pubmed: 01 Jun 2024
entrez: 31 May 2024
Status: epublish

Abstract

Several single nucleotide polymorphism (SNP) pipelines exist, each offering its own advantages. vSNP, described here, has been developed over the past decade and is specifically tailored to the needs of diagnostic laboratories. Laboratories that aim to provide rapid whole genome sequencing results during outbreak investigations face unique challenges. vSNP addresses these challenges by enabling users to verify and validate sequence accuracy with ease, having utility across various pathogens, being fully auditable, and presenting results that are easy to interpret by individuals with diverse backgrounds. vSNP has proven effective for real-time phylogenetic analysis of disease outbreaks and eradication efforts, including bovine tuberculosis, brucellosis, virulent Newcastle disease, SARS-CoV-2, African swine fever, and highly pathogenic avian influenza. The pipeline produces easy-to-read, sorted SNP matrices and corresponding phylogenetic trees. Essential data for verifying SNPs are included in the output, and the process is divided into two steps for ease of use and faster processing. vSNP requires minimal computational resources and can be run in a wide range of environments. Several utilities have been developed to make analysis more accessible to subject matter experts who may not have computational expertise. The vSNP pipeline integrates seamlessly into a diagnostic workflow and meets the criteria for quality control accreditation programs, such as ISO/IEC 17025 from the International Organization for Standardization. Its versatility and robustness make it suitable for a diverse range of organisms, and its detailed, reproducible, and transparent results make it a valuable tool in applications including real-time phylogenetic analysis.
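
To illustrate what a sorted SNP matrix looks like, the following is a minimal Python sketch. It is not vSNP's implementation, and the abstract does not specify how vSNP sorts its matrices. The function name `build_snp_matrix`, the input layout (a reference as a position-to-base mapping and per-sample calls as position-to-allele mappings), the isolate names, and the sorting rule are all assumptions made for this example.

```python
from typing import Dict, List, Tuple

# Hypothetical inputs: reference base at each position of interest, and
# the allele each sample carries where it was called (absent = reference).
Reference = Dict[int, str]
SampleCalls = Dict[str, Dict[int, str]]


def build_snp_matrix(
    reference: Reference, calls: SampleCalls
) -> Tuple[List[int], List[Tuple[str, List[str]]]]:
    """Build a samples-by-positions allele matrix.

    Columns are the positions where at least one sample differs from the
    reference, in genome order. Rows are sorted by allele pattern so that
    samples sharing SNPs sit next to each other (an illustrative rule,
    not necessarily the one vSNP uses).
    """
    positions = sorted(
        {
            pos
            for sample_calls in calls.values()
            for pos, allele in sample_calls.items()
            if allele != reference[pos]
        }
    )
    rows = [
        (sample, [sample_calls.get(pos, reference[pos]) for pos in positions])
        for sample, sample_calls in calls.items()
    ]
    # Sort by allele pattern first, then by sample name for stable output.
    rows.sort(key=lambda row: (row[1], row[0]))
    return positions, rows


# Toy data, invented for illustration only.
reference = {101: "A", 250: "G", 977: "C"}
calls = {
    "isolate_C": {101: "T", 977: "T"},
    "isolate_A": {101: "T"},
    "isolate_B": {},
}

positions, rows = build_snp_matrix(reference, calls)

# Print a tab-separated table with a reference row on top.
print("sample\t" + "\t".join(str(p) for p in positions))
print("reference\t" + "\t".join(reference[p] for p in positions))
for sample, alleles in rows:
    print(sample + "\t" + "\t".join(alleles))
```

In this toy case position 250 is dropped because no sample varies there, and isolates sharing the SNP at position 101 end up in adjacent rows, which is the kind of grouping that makes a matrix easy to read alongside a tree.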


Identifiers

pubmed: 38822271
doi: 10.1186/s12864-024-10437-5
pii: 10.1186/s12864-024-10437-5

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

545

Copyright information

© 2024. This is a U.S. Government work and not under copyright protection in the US; foreign copyright protection may apply.


Authors

Jessica Hicks (J)

National Veterinary Services Laboratories (NVSL), USDA, 1920 Dayton Avenue, Ames, IA, 50010, USA.

Tod Stuber (T)

National Veterinary Services Laboratories (NVSL), USDA, 1920 Dayton Avenue, Ames, IA, 50010, USA. tod.p.stuber@usda.gov.

Kristina Lantz (K)

National Veterinary Services Laboratories (NVSL), USDA, 1920 Dayton Avenue, Ames, IA, 50010, USA.

Mia Torchetti (M)

National Veterinary Services Laboratories (NVSL), USDA, 1920 Dayton Avenue, Ames, IA, 50010, USA.

Suelee Robbe-Austerman (S)

National Veterinary Services Laboratories (NVSL), USDA, 1920 Dayton Avenue, Ames, IA, 50010, USA.
