Scalable, accessible, and reproducible reference genome assembly and evaluation in Galaxy.

Keywords: genome assembly; accessible; large genomes; modularity; open source; public; reproducibility; scalable

Journal

bioRxiv : the preprint server for biology
Abbreviated title: bioRxiv
Country: United States
NLM ID: 101680187

Publication information

Publication date:
30 Jun 2023
History:
pubmed: 10 Jul 2023
medline: 10 Jul 2023
entrez: 10 Jul 2023
Status: epublish

Abstract

Improvements in genome sequencing and assembly are enabling high-quality reference genomes for all species. However, the assembly process is still laborious, computationally and technically demanding, lacks standards for reproducibility, and is not readily scalable. Here we present the latest Vertebrate Genomes Project assembly pipeline and demonstrate that it delivers high-quality reference genomes at scale across a set of vertebrate species arising over the last ~500 million years. The pipeline is versatile and combines PacBio HiFi long reads and Hi-C-based haplotype phasing in a new graph-based paradigm. Standardized quality control is performed automatically to troubleshoot assembly issues and assess biological complexities. We make the pipeline freely accessible through Galaxy, accommodating even researchers without local computational resources and enhancing reproducibility by democratizing the training and assembly process. We demonstrate the flexibility and reliability of the pipeline by assembling reference genomes for 51 vertebrate species from major taxonomic groups (fish, amphibians, reptiles, birds, and mammals).

Identifiers

pubmed: 37425881
doi: 10.1101/2023.06.28.546576
pmc: PMC10327048

Publication types

Preprint

Languages

eng

Grants

Agency: NCI NIH HHS
ID: U01 CA253481
Country: United States
Agency: NCI NIH HHS
ID: U24 CA231877
Country: United States
Agency: NHGRI NIH HHS
ID: U24 HG010263
Country: United States
Agency: NHGRI NIH HHS
ID: U41 HG006620
Country: United States

Authors

Delphine Larivière (D)

Dept. of Biochemistry and Molecular Biology, Pennsylvania State University, USA.

Linelle Abueg (L)

Vertebrate Genome Laboratory, The Rockefeller University, USA.

Nadolina Brajuka (N)

Vertebrate Genome Laboratory, The Rockefeller University, USA.

Cristóbal Gallardo-Alba (C)

Bioinformatics Group, Department of Computer Science, Albert-Ludwigs-University Freiburg, Freiburg, Germany.

Bjorn Grüning (B)

Bioinformatics Group, Department of Computer Science, Albert-Ludwigs-University Freiburg, Freiburg, Germany.

Byung June Ko (BJ)

Department of Agricultural Biotechnology and Research Institute of Agriculture and Life Sciences, Seoul National University, Seoul, Republic of Korea.

Alex Ostrovsky (A)

Departments of Biology and Computer Science, Johns Hopkins University, USA.

Marc Palmada-Flores (M)

Department of Medicine and Life Sciences (MELIS), Institut de Biologia Evolutiva, Universitat Pompeu Fabra-CSIC, Barcelona 08003, Spain.

Brandon D Pickett (BD)

Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA.

Keon Rabbani (K)

Department of Quantitative and Computational Biology, University of Southern California.

Jennifer R Balacco (JR)

Vertebrate Genome Laboratory, The Rockefeller University, USA.

Mark Chaisson (M)

Department of Quantitative and Computational Biology, University of Southern California.

Haoyu Cheng (H)

Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA.
Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.

Joanna Collins (J)

Wellcome Sanger Institute, Cambridge CB10 1SA, United Kingdom.

Alexandra Denisova (A)

Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Russia.

Olivier Fedrigo (O)

Vertebrate Genome Laboratory, The Rockefeller University, USA.

Guido Roberto Gallo (GR)

Department of Biosciences, University of Milan, Milan, Italy.

Alice Maria Giani (AM)

BMRI, Weill Cornell Medical College, New York, 10021, USA.

Grenville MacDonald Gooder (GM)

Vertebrate Genome Laboratory, The Rockefeller University, USA.

Nivesh Jain (N)

Vertebrate Genome Laboratory, The Rockefeller University, USA.

Cassidy Johnson (C)

Vertebrate Genome Laboratory, The Rockefeller University, USA.

Heebal Kim (H)

Department of Agricultural Biotechnology and Research Institute of Agriculture and Life Sciences, Seoul National University, Seoul, Republic of Korea.
eGnome, Inc, Seoul, Republic of Korea.
Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea.

Chul Lee (C)

Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea.
Laboratory of Neurogenetics of Language, The Rockefeller University, New York City, NY, 10065, USA.

Tomas Marques-Bonet (T)

Department of Medicine and Life Sciences (MELIS), Institut de Biologia Evolutiva, Universitat Pompeu Fabra-CSIC, Barcelona 08003, Spain.
Catalan Institution of Research and Advanced Studies (ICREA), Barcelona 08010, Spain.
CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Barcelona 08028, Spain.
Institut Català de Paleontologia Miquel Crusafont, Universitat Autònoma de Barcelona, Cerdanyola del Vallès 08193, Spain.

Brian O'Toole (B)

Vertebrate Genome Laboratory, The Rockefeller University, USA.

Arang Rhie (A)

Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA.

Simona Secomandi (S)

Department of Biological Sciences, University of Cyprus, Nicosia, Cyprus.

Marcella Sozzoni (M)

University of Florence, Department of Biology, Via Madonna del Piano 6, Sesto Fiorentino (FI).

Tatiana Tilley (T)

Vertebrate Genome Laboratory, The Rockefeller University, USA.

Marcela Uliano-Silva (M)

Tree of Life, Wellcome Sanger Institute, Cambridge CB10 1SA, United Kingdom.

Marius van den Beek (M)

Dept. of Biochemistry and Molecular Biology, Pennsylvania State University, USA.

Robert M Waterhouse (RM)

Department of Ecology & Evolution and Swiss Institute of Bioinformatics, University of Lausanne, Lausanne, Switzerland.

Adam M Phillippy (AM)

Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA.

Erich D Jarvis (ED)

Vertebrate Genome Laboratory, The Rockefeller University, USA.

Michael C Schatz (MC)

Departments of Biology and Computer Science, Johns Hopkins University, USA.

Anton Nekrutenko (A)

Dept. of Biochemistry and Molecular Biology, Pennsylvania State University, USA.

Giulio Formenti (G)

Vertebrate Genome Laboratory, The Rockefeller University, USA.

MeSH classifications