Scalable, accessible, and reproducible reference genome assembly and evaluation in Galaxy.
Genome assembly
accessible
large genomes
modularity
open source
public
reproducibility
scalable
Journal
bioRxiv : the preprint server for biology
Abbreviated title: bioRxiv
Country: United States
NLM ID: 101680187
Publication information
Publication date:
30 Jun 2023
History:
pubmed: 10 Jul 2023
medline: 10 Jul 2023
entrez: 10 Jul 2023
Status:
epublish
Abstract
Improvements in genome sequencing and assembly are enabling high-quality reference genomes for all species. However, the assembly process is still laborious, computationally and technically demanding, lacks standards for reproducibility, and is not readily scalable. Here we present the latest Vertebrate Genomes Project assembly pipeline and demonstrate that it delivers high-quality reference genomes at scale across a set of vertebrate species arising over the last ~500 million years. The pipeline is versatile and combines PacBio HiFi long reads and Hi-C-based haplotype phasing in a new graph-based paradigm. Standardized quality control is performed automatically to troubleshoot assembly issues and assess biological complexities. We make the pipeline freely accessible through Galaxy, accommodating researchers even without local computational resources and enhancing reproducibility by democratizing the training and assembly process. We demonstrate the flexibility and reliability of the pipeline by assembling reference genomes for 51 vertebrate species from major taxonomic groups (fish, amphibians, reptiles, birds, and mammals).
Identifiers
pubmed: 37425881
doi: 10.1101/2023.06.28.546576
pmc: PMC10327048
Publication types
Preprint
Languages
eng
Grants
Agency: NCI NIH HHS
ID: U01 CA253481
Country: United States
Agency: NCI NIH HHS
ID: U24 CA231877
Country: United States
Agency: NHGRI NIH HHS
ID: U24 HG010263
Country: United States
Agency: NHGRI NIH HHS
ID: U41 HG006620
Country: United States