High-quality chromosome-scale assembly of the walnut (Juglans regia L.) reference genome.


Journal

GigaScience
ISSN: 2047-217X
Abbreviated title: Gigascience
Country: United States
NLM ID: 101596872

Publication information

Publication date:
1 May 2020
History:
received: 19 Oct 2019
revised: 13 Mar 2020
accepted: 20 Apr 2020
entrez: 21 May 2020
pubmed: 21 May 2020
medline: 5 Oct 2021
Status: ppublish

Abstract

The release of the first reference genome of walnut (Juglans regia L.) enabled many achievements in the characterization of walnut genetic and functional variation. However, that assembly is highly fragmented, preventing the integration of genetic, transcriptomic, and proteomic information needed to fully elucidate walnut biological processes. Here, we report the new chromosome-scale assembly of the walnut reference genome (Chandler v2.0), obtained by combining Oxford Nanopore long-read sequencing with chromosome conformation capture (Hi-C) technology. Relative to the previous reference genome, the new assembly features an 84.4-fold increase in N50 size, with all 16 chromosomal pseudomolecules assembled, together representing 95% of its total length. Using full-length transcripts from single-molecule real-time sequencing, we predicted 37,554 gene models, with a longer mean gene length than in the previous gene annotations. Most of the new protein-coding genes (90%) have both start and stop codons, a significant improvement over Chandler v1.0 (only 48%). We then tested the potential impact of the new chromosome-level genome on different areas of walnut research. By studying the proteome changes occurring during male flower development, we observed that the virtual proteome obtained from Chandler v2.0 contains fewer artifacts than that of the previous reference genome, enabling the identification of a new potential pollen allergen in walnut. In addition, the new chromosome-scale genome facilitates in-depth studies of intraspecies genetic diversity, revealing previously undetected autozygous regions in Chandler, likely resulting from inbreeding, and 195 genomic regions highly differentiated between Western and Eastern walnut cultivars. Overall, Chandler v2.0 will serve as a valuable resource to better understand and explore walnut biology.
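The abstract summarizes assembly and annotation quality with two statistics: the N50 (reported as an 84.4-fold increase) and the share of gene models carrying both start and stop codons (90% versus 48%). As an illustration only, the sketch below shows how such statistics are commonly computed. The function names, the toy lengths and sequences, and the completeness rule (in-frame CDS that begins with ATG and ends in TAA, TAG, or TGA) are assumptions for this example, not the authors' pipeline.

```python
from typing import Iterable

# Stop codons of the standard genetic code (assumed for this sketch).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def n50(lengths: Iterable[int]) -> int:
    """Return the N50: the length L such that sequences of length >= L
    together hold at least half of all assembled bases."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no sequence lengths given")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return ordered[-1]  # not reached for non-empty input


def has_start_and_stop(cds: str) -> bool:
    """Hypothetical completeness rule: the CDS is in frame, begins with ATG
    and ends with a stop codon. Internal stop codons are not checked."""
    seq = cds.upper()
    return (
        len(seq) >= 6
        and len(seq) % 3 == 0
        and seq.startswith("ATG")
        and seq[-3:] in STOP_CODONS
    )


if __name__ == "__main__":
    # Toy contig lengths, not data from either walnut assembly.
    old_lengths = [50, 40, 30, 20, 10]
    new_lengths = [900, 300, 100, 50]
    old_n50, new_n50 = n50(old_lengths), n50(new_lengths)
    print(old_n50, new_n50, new_n50 / old_n50)  # → 40 900 22.5

    # Toy coding sequences: only the first has both start and stop codons.
    cds_models = ["ATGAAATAA", "ATGAAA", "CCCAAATGA"]
    complete = sum(has_start_and_stop(s) for s in cds_models)
    print(f"{complete} of {len(cds_models)} models complete")  # → 1 of 3 models complete
```

A fold increase in N50, such as the one reported here, is simply the ratio of the new N50 to the old one; with real data the lengths would come from the assembled contigs or scaffolds rather than toy lists.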

Identifiers

pubmed: 32432329
pii: 5841058
doi: 10.1093/gigascience/giaa050
pmc: PMC7238675

Publication types

Journal Article
Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Grants

Agency: NHGRI NIH HHS
ID: R01 HG006677
Country: United States

Copyright information

© The Author(s) 2020. Published by Oxford University Press.

Authors

Annarita Marrano

Department of Plant Sciences, University of California, Davis, One Shields Avenue, Davis, CA 95616, USA.

Monica Britton

Bioinformatics Core Facility, Genome Center, University of California, Davis, One Shields Avenue, Davis, CA 95616, USA.

Paulo A Zaini

Department of Plant Sciences, University of California, Davis, One Shields Avenue, Davis, CA 95616, USA.

Aleksey V Zimin

Department of Biomedical Engineering, Johns Hopkins University, 720 Rutland Avenue, Baltimore, MD 21205, USA.
Center for Computational Biology, Whiting School of Engineering, Johns Hopkins University, 3100 Wyman Park Dr., Baltimore, MD 21211, USA.

Rachael E Workman

Department of Biomedical Engineering, Johns Hopkins University, 720 Rutland Avenue, Baltimore, MD 21205, USA.

Daniela Puiu

Center for Computational Biology, Whiting School of Engineering, Johns Hopkins University, 3100 Wyman Park Dr., Baltimore, MD 21211, USA.

Luca Bianco

Research and Innovation Center, Fondazione Edmund Mach, Via E. Mach 1, 38010 San Michele all'Adige (TN), Italy.

Erica Adele Di Pierro

Research and Innovation Center, Fondazione Edmund Mach, Via E. Mach 1, 38010 San Michele all'Adige (TN), Italy.

Brian J Allen

Department of Plant Sciences, University of California, Davis, One Shields Avenue, Davis, CA 95616, USA.

Sandeep Chakraborty

Department of Plant Sciences, University of California, Davis, One Shields Avenue, Davis, CA 95616, USA.

Michela Troggio

Research and Innovation Center, Fondazione Edmund Mach, Via E. Mach 1, 38010 San Michele all'Adige (TN), Italy.

Charles A Leslie

Department of Plant Sciences, University of California, Davis, One Shields Avenue, Davis, CA 95616, USA.

Winston Timp

Department of Biomedical Engineering, Johns Hopkins University, 720 Rutland Avenue, Baltimore, MD 21205, USA.
Center for Computational Biology, Whiting School of Engineering, Johns Hopkins University, 3100 Wyman Park Dr., Baltimore, MD 21211, USA.

Abhaya Dandekar

Department of Plant Sciences, University of California, Davis, One Shields Avenue, Davis, CA 95616, USA.

Steven L Salzberg

Department of Biomedical Engineering, Johns Hopkins University, 720 Rutland Avenue, Baltimore, MD 21205, USA.
Center for Computational Biology, Whiting School of Engineering, Johns Hopkins University, 3100 Wyman Park Dr., Baltimore, MD 21211, USA.
Departments of Computer Science and Biostatistics, Johns Hopkins University, 3400 North Charles Street, Baltimore, MD 21218, USA.

David B Neale

Department of Plant Sciences, University of California, Davis, One Shields Avenue, Davis, CA 95616, USA.
