A chromosome-scale reference genome of grasspea (Lathyrus sativus).


Journal

Scientific Data
ISSN: 2052-4463
Abbreviated title: Sci Data
Country: England
NLM ID: 101640192

Publication information

Publication date:
27 Sep 2024
History:
received: 19 Feb 2024
accepted: 05 Sep 2024
medline: 28 Sep 2024
pubmed: 28 Sep 2024
entrez: 27 Sep 2024
Status: epublish

Abstract

Grasspea (Lathyrus sativus L.) is an underutilised but promising legume crop with tolerance to a wide range of abiotic and biotic stress factors, and potential for climate-resilient agriculture. Despite a long history and wide geographical distribution of cultivation, only limited breeding resources are available. This paper reports a 5.96 Gbp genome assembly of grasspea genotype LS007, of which 5.03 Gbp is scaffolded into 7 pseudo-chromosomes. The assembly has a BUSCO completeness score of 99.1% and is annotated with 31719 gene models and repeat elements. This represents the most contiguous and accurate assembly of the grasspea genome to date.

Identifiers

pubmed: 39333203
doi: 10.1038/s41597-024-03868-y
pii: 10.1038/s41597-024-03868-y

Publication types

Dataset Journal Article

Languages

eng

Citation subsets

IM

Pagination

1035

Grants

Agency: RCUK | Biotechnology and Biological Sciences Research Council (BBSRC)
ID: BB/X01097X/1
Agency: RCUK | Biotechnology and Biological Sciences Research Council (BBSRC)
ID: BB/P012523/1
Agency: Grantová Agentura České Republiky (Grant Agency of the Czech Republic)
ID: 24-10036S
Agency: Ministry of Education and Science | Fundação para a Ciência e a Tecnologia (Portuguese Science and Technology Foundation)
ID: 2017.00198.CEECIND
Agency: Ministry of Education and Science | Fundação para a Ciência e a Tecnologia (Portuguese Science and Technology Foundation)
ID: 2017.00198

Copyright information

© 2024. The Author(s).

Authors

Marielle Vigouroux (M)

John Innes Centre, Norwich Research Park, Colney Lane, Norwich, NR4 7UH, UK.

Petr Novák (P)

Institute of Plant Molecular Biology, Biology Centre CAS, Branisovska 31, Ceske Budejovice, CZ, 37005, Czech Republic.

Ludmila Cristina Oliveira (LC)

Institute of Plant Molecular Biology, Biology Centre CAS, Branisovska 31, Ceske Budejovice, CZ, 37005, Czech Republic.

Carmen Santos (C)

Instituto de Tecnologia Química e Biológica António Xavier, Universidade Nova de Lisboa, Av. da República, Oeiras, 2780-157, Portugal.

Jitender Cheema (J)

John Innes Centre, Norwich Research Park, Colney Lane, Norwich, NR4 7UH, UK.
European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, CB10 1SD, Cambridge, United Kingdom.

Roland H M Wouters (RHM)

John Innes Centre, Norwich Research Park, Colney Lane, Norwich, NR4 7UH, UK.

Pirita Paajanen (P)

John Innes Centre, Norwich Research Park, Colney Lane, Norwich, NR4 7UH, UK.

Martin Vickers (M)

John Innes Centre, Norwich Research Park, Colney Lane, Norwich, NR4 7UH, UK.

Andrea Koblížková (A)

Institute of Plant Molecular Biology, Biology Centre CAS, Branisovska 31, Ceske Budejovice, CZ, 37005, Czech Republic.

Maria Carlota Vaz Patto (MC)

Instituto de Tecnologia Química e Biológica António Xavier, Universidade Nova de Lisboa, Av. da República, Oeiras, 2780-157, Portugal.

Jiří Macas (J)

Institute of Plant Molecular Biology, Biology Centre CAS, Branisovska 31, Ceske Budejovice, CZ, 37005, Czech Republic.

Burkhard Steuernagel (B)

John Innes Centre, Norwich Research Park, Colney Lane, Norwich, NR4 7UH, UK.

Cathie Martin (C)

John Innes Centre, Norwich Research Park, Colney Lane, Norwich, NR4 7UH, UK.

Peter M F Emmrich (PMF)

John Innes Centre, Norwich Research Park, Colney Lane, Norwich, NR4 7UH, UK. p.emmrich@uea.ac.uk.
Norwich Institute for Sustainable Development, School of International Development, University of East Anglia, Norwich, NR4 7TJ, UK. p.emmrich@uea.ac.uk.
