Parsnp 2.0: scalable core-genome alignment for massive microbial datasets.


Journal

Bioinformatics (Oxford, England)
ISSN: 1367-4811
Abbreviated title: Bioinformatics
Country: England
NLM ID: 9808944

Publication information

Publication date:
09 May 2024
History:
received: 30 01 2024
revised: 12 04 2024
accepted: 07 05 2024
medline: 10 05 2024
pubmed: 10 05 2024
entrez: 09 05 2024
Status: aheadofprint

Abstract

Since 2016, the number of microbial species with available reference genomes in NCBI has more than tripled. Multiple genome alignment, the process of identifying nucleotides across multiple genomes which share a common ancestor, is used as the input to numerous downstream comparative analysis methods. Parsnp is one of the few multiple genome alignment methods able to scale to the current era of genomic data; however, there has been no major release since its initial release in 2014. To address this gap, we developed Parsnp v2, which significantly improves on its original release. Parsnp v2 provides users with more control over executions of the program, allowing Parsnp to be better tailored for different use-cases. We introduce a partitioning option to Parsnp, which allows the input to be broken up into multiple parallel alignment processes which are then combined into a final alignment. The partitioning option can reduce memory usage by over 4x and reduce runtime by over 2x, all while maintaining a precise core-genome alignment. The partitioning workflow is also less susceptible to complications caused by assembly artifacts and minor variation, as alignment anchors only need to be conserved within their partition and not across the entire input set. We highlight the performance on datasets involving thousands of bacterial and viral genomes. Parsnp v2 is available at https://github.com/marbl/parsnp.
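The partitioning idea described in the abstract (split the input genomes into groups, align each group separately, then combine the results) can be illustrated with a minimal sketch. This shows only the splitting step and is not Parsnp's actual implementation or command-line interface; the function name `partition_genomes`, the parameter `partition_size`, and the file names are hypothetical, and fixed-size consecutive grouping is an assumption made for illustration.

```python
from typing import List, Sequence


def partition_genomes(paths: Sequence[str], partition_size: int) -> List[List[str]]:
    """Split genome file paths into consecutive groups of at most partition_size.

    Hypothetical helper: illustrates only the "break the input up" step.
    Aligning each group and merging the alignments is out of scope here.
    """
    if partition_size < 1:
        raise ValueError("partition_size must be at least 1")
    return [
        list(paths[start:start + partition_size])
        for start in range(0, len(paths), partition_size)
    ]


# Made-up file names; each group could be handed to its own alignment process.
genomes = [f"genome_{i}.fna" for i in range(7)]
partitions = partition_genomes(genomes, 3)
for index, group in enumerate(partitions):
    print(index, group)
```

Because each group is processed independently, an alignment anchor only has to be conserved within its own group, which is the property the abstract credits for the reduced sensitivity to assembly artifacts.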

Identifiers

pubmed: 38724243
pii: 7667868
doi: 10.1093/bioinformatics/btae311

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Copyright information

© The Author(s) 2024. Published by Oxford University Press.

Authors

Bryce Kille (B)

Department of Computer Science, Rice University, Houston, Texas, United States.

Michael G Nute (MG)

Department of Computer Science, Rice University, Houston, Texas, United States.

Victor Huang (V)

Department of Computer Science, Rice University, Houston, Texas, United States.

Eddie Kim (E)

Department of Computer Science, Rice University, Houston, Texas, United States.

Adam M Phillippy (AM)

Genome Informatics Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland, United States.

Todd J Treangen (TJ)

Department of Computer Science, Rice University, Houston, Texas, United States.
Department of Bioengineering, Rice University, Houston, Texas, United States.

MeSH classifications