Long-read assembly of the Brassica napus reference genome Darmor-bzh.

Brassica; Darmor-bzh; assembly; chromosome-scale; direct RNA; nanopore sequencing; oilseed rape; optical mapping

Journal

GigaScience
ISSN: 2047-217X
Abbreviated title: Gigascience
Country: United States
NLM ID: 101596872

Publication information

Publication date:
15 12 2020
History:
received: 22 07 2020
revised: 18 09 2020
accepted: 09 11 2020
entrez: 15 12 2020
pubmed: 16 12 2020
medline: 26 10 2021
Status: ppublish

Abstract

BACKGROUND
The combination of long reads and long-range information to produce genome assemblies is now accepted as a common standard. This strategy not only allows access to the gene catalogue of a given species but also reveals the architecture and organization of chromosomes, including complex regions such as telomeres and centromeres. The Brassica genus is not exempt, and many assemblies based on long reads are now available. The reference genome for Brassica napus, Darmor-bzh, which was published in 2014, was produced using short reads and its contiguity was extremely low compared with current assemblies of the Brassica genus.
FINDINGS
Herein, we report the new long-read assembly of the Darmor-bzh genome (Brassica napus), generated by combining long-read sequencing data with optical and genetic maps. Using the PromethION device and 6 flowcells, we generated ∼16 million long reads representing 93× coverage and, more importantly, 6× with reads longer than 100 kb. This ultralong-read dataset allows us to generate one of the most contiguous and complete assemblies of a Brassica genome to date (contig N50 > 10 Mb). In addition, we exploited all the advantages of the nanopore technology to detect modified bases and sequence transcriptomic data using direct RNA to annotate the genome and focus on resistance genes.
CONCLUSION
Using these cutting-edge technologies, and in particular by relying on all the advantages of the nanopore technology, we provide the most contiguous Brassica napus assembly, a resource that will be valuable to the Brassica community for crop improvement and will facilitate the rapid selection of agronomically important traits.
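
The abstract summarizes the dataset and assembly with two standard metrics: sequencing coverage (93× overall, 6× from reads longer than 100 kb) and contig N50 (> 10 Mb). As a minimal sketch of how such figures are commonly computed, the snippet below uses hypothetical helper names (`n50`, `coverage`) and toy lengths; the numbers are illustrative only and are not taken from the Darmor-bzh data, and the genome size passed to `coverage` is an assumed input.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a set of sequence lengths.

    N50 is the length L such that sequences of length >= L
    together contain at least half of the total bases.
    """
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    if total <= 0:
        raise ValueError("need at least one sequence of positive length")
    running = 0
    for length in ordered:
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable: cumulative sum always reaches total")


def coverage(read_lengths: Iterable[int], genome_size: int, min_length: int = 0) -> float:
    """Return sequencing depth: bases in reads >= min_length, divided by genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return sum(l for l in read_lengths if l >= min_length) / genome_size


# Toy example (hypothetical values, not from the paper).
contigs = [2, 3, 4, 5, 6]
reads = [100, 200, 300]
print(n50(contigs))                        # N50 of the toy contigs
print(coverage(reads, 100))                # depth from all reads
print(coverage(reads, 100, min_length=200))  # depth from long reads only
```

Filtering by `min_length` mirrors how a depth such as "6× with reads longer than 100 kb" is reported alongside the overall depth.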

Identifiers

pubmed: 33319912
pii: 6034787
doi: 10.1093/gigascience/giaa137
pmc: PMC7736779

Publication types

Journal Article; Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Copyright information

© The Author(s) 2020. Published by Oxford University Press GigaScience.

Authors

Mathieu Rousseau-Gueutin (M)

IGEPP, INRAE, Institut Agro, Université de Rennes, Domaine de la Motte, 35653 Le Rheu, France.

Caroline Belser (C)

Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 2 rue Gaston Crémieux, 91057 Evry, France.

Corinne Da Silva (C)

Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 2 rue Gaston Crémieux, 91057 Evry, France.

Gautier Richard (G)

IGEPP, INRAE, Institut Agro, Université de Rennes, Domaine de la Motte, 35653 Le Rheu, France.

Benjamin Istace (B)

Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 2 rue Gaston Crémieux, 91057 Evry, France.

Corinne Cruaud (C)

Genoscope, Institut François Jacob, Commissariat à l'Energie Atomique (CEA), Université Paris-Saclay, 2 rue Gaston Crémieux, 91057 Evry, France.

Cyril Falentin (C)

IGEPP, INRAE, Institut Agro, Université de Rennes, Domaine de la Motte, 35653 Le Rheu, France.

Franz Boideau (F)

IGEPP, INRAE, Institut Agro, Université de Rennes, Domaine de la Motte, 35653 Le Rheu, France.

Julien Boutte (J)

IGEPP, INRAE, Institut Agro, Université de Rennes, Domaine de la Motte, 35653 Le Rheu, France.

Regine Delourme (R)

IGEPP, INRAE, Institut Agro, Université de Rennes, Domaine de la Motte, 35653 Le Rheu, France.

Gwenaëlle Deniot (G)

IGEPP, INRAE, Institut Agro, Université de Rennes, Domaine de la Motte, 35653 Le Rheu, France.

Stefan Engelen (S)

Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 2 rue Gaston Crémieux, 91057 Evry, France.

Julie Ferreira de Carvalho (JF)

IGEPP, INRAE, Institut Agro, Université de Rennes, Domaine de la Motte, 35653 Le Rheu, France.

Arnaud Lemainque (A)

Genoscope, Institut François Jacob, Commissariat à l'Energie Atomique (CEA), Université Paris-Saclay, 2 rue Gaston Crémieux, 91057 Evry, France.

Loeiz Maillet (L)

IGEPP, INRAE, Institut Agro, Université de Rennes, Domaine de la Motte, 35653 Le Rheu, France.

Jérôme Morice (J)

IGEPP, INRAE, Institut Agro, Université de Rennes, Domaine de la Motte, 35653 Le Rheu, France.

Patrick Wincker (P)

Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 2 rue Gaston Crémieux, 91057 Evry, France.

France Denoeud (F)

Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 2 rue Gaston Crémieux, 91057 Evry, France.

Anne-Marie Chèvre (AM)

IGEPP, INRAE, Institut Agro, Université de Rennes, Domaine de la Motte, 35653 Le Rheu, France.

Jean-Marc Aury (JM)

Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 2 rue Gaston Crémieux, 91057 Evry, France.
