Chromosome level genome assembly of the Etruscan shrew Suncus etruscus.


Journal

Scientific data
ISSN: 2052-4463
Abbreviated title: Sci Data
Country: England
NLM ID: 101640192

Publication information

Publication date:
07 Feb 2024
History:
received: 28 07 2023
accepted: 26 01 2024
medline: 8 2 2024
pubmed: 8 2 2024
entrez: 7 2 2024
Status: epublish

Abstract

Suncus etruscus is one of the world's smallest mammals, with an average body mass of about 2 grams. The Etruscan shrew's small body is accompanied by a very high energy demand and numerous metabolic adaptations. Here we report a chromosome-level genome assembly using PacBio long read sequencing, 10X Genomics linked short reads, optical mapping, and Hi-C linked reads. The assembly is partially phased, with a 2.472 Gbp primary pseudohaplotype and a 1.515 Gbp alternate. We manually curated the primary assembly and identified 22 chromosomes, including X and Y sex chromosomes. The NCBI genome annotation pipeline identified 39,091 genes, 19,819 of them protein-coding. We also identified segmental duplications, inferred GO term annotations, and computed orthologs of human and mouse genes. This reference-quality genome will be an important resource for research on mammalian development, metabolism, and body size control.

Identifiers

pubmed: 38326333
doi: 10.1038/s41597-024-03011-x
pii: 10.1038/s41597-024-03011-x
pmc: PMC10850158

Publication types

Dataset; Journal Article

Languages

eng

Citation subsets

IM

Pagination

176

Grants

Agency: NIGMS NIH HHS
ID: R01 GM133840
Country: United States

Copyright information

© 2024. The Author(s).

References

Anjum, F., Turni, H., Mulder, P. G. H., van der Burg, J. & Brecht, M. Tactile guidance of prey capture in Etruscan shrews. Proc. Natl. Acad. Sci. 103, 16544–16549 (2006).
pubmed: 17060642 pmcid: 1621049 doi: 10.1073/pnas.0605573103
Munz, M., Brecht, M. & Wolfe, J. Active Touch During Shrew Prey Capture. Front. Behav. Neurosci. 4, (2010).
Roth-Alpermann, C., Anjum, F., Naumann, R. & Brecht, M. Cortical Organization in the Etruscan Shrew (Suncus etruscus). J. Neurophysiol. 104, 2389–2406 (2010).
pubmed: 20668271 doi: 10.1152/jn.00762.2009
Brecht, M. & Anjum, F. Tactile experience shapes prey-capture behavior in Etruscan shrews. Front. Behav. Neurosci. 6, (2012).
Hutterer, R. Order Soricomorpha. in Mammal Species of the World: A Taxonomic and Geographic Reference (eds. Wilson, D. E. & Reeder, D. M.) 220 (JHU Press, 2005).
Broad Institute. Crocidura indochinensis genome assembly CroInd_v1_BIUU, GenBank. NCBI https://identifiers.org/ncbi/insdc.gca:GCA_004027635.1 (2019).
National Institutes of Health. Cryptotis parvus genome assembly Cryptotis parva assembly 1.0, GenBank. NCBI https://identifiers.org/ncbi/insdc.gca:GCA_021461705.1 (2022).
Chung, D. J. et al. Metabolic design in a mammalian model of extreme metabolism, the North American least shrew (Cryptotis parva). J. Physiol. 600, 547–567 (2022).
pubmed: 34837710 doi: 10.1113/JP282153
Broad Institute. Sorex araneus genome assembly SorAra2.0, GenBank. NCBI https://identifiers.org/ncbi/insdc.gca:GCA_000181275.2 (2012).
Lindblad-Toh, K. et al. A high-resolution map of human evolutionary constraint using 29 mammals. Nature 478, 476–482 (2011).
pubmed: 21993624 pmcid: 3207357 doi: 10.1038/nature10530
Cossette, M.-L. et al. Epigenetics and island-mainland divergence in an insectivorous small mammal. Mol. Ecol. 32, 152–166 (2023).
pubmed: 36226847 doi: 10.1111/mec.16735
Trent University. Sorex fumeus genome assembly SorCin_1.0, GenBank. NCBI https://identifiers.org/ncbi/insdc.gca:GCA_026122425.1 (2022).
IRIDIAN GENOMES. Sorex palustris genome assembly ASM2856567v1, GenBank. NCBI https://identifiers.org/ncbi/insdc.gca:GCA_028565675.1 (2023).
Sun, S. & Brecht, M. Relative enlargement of the medial preoptic nucleus in the Etruscan shrew, the smallest torpid mammal. Sci. Rep. 12, 18602 (2022).
pubmed: 36329087 pmcid: 9633763 doi: 10.1038/s41598-022-22320-y
Rhie, A. et al. Towards complete and error-free genome assemblies of all vertebrate species. Nature 592, 737–746 (2021).
pubmed: 33911273 pmcid: 8081667 doi: 10.1038/s41586-021-03451-0
Meylan, A. Note sur les chromosomes de la musaraigne etrusque Suncus etruscus (Savi) (Mammalia-Insectivora). Bull. Société Vaudoise Sci. Nat. 70, 85–89 (1968).
Aswathanarayana, N. V., Krishnarao, S. & Satya-prakash, K. L. Karyology of the pigmy shrew, Suncus etruscus perrotteti (Savi) (Soricidae: Insectivora). Curr. Sci. 56, 911–913 (1987).
Aswathanarayana, N. V. Karyotype Evolution in the Shrews, Crocidura and Suncus (Soricidae, Insectivora). Cytologia (Tokyo) 68, 83–87 (2003).
doi: 10.1508/cytologia.68.83
Hawkins, T., Chitale, M., Luban, S. & Kihara, D. PFP: Automated prediction of gene ontology functional annotations with confidence scores using protein sequence data. Proteins 74, 566–582 (2009).
pubmed: 18655063 doi: 10.1002/prot.22172
Jain, A. & Kihara, D. Phylo-PFP: improved automated protein function prediction using phylogenetic distance of distantly related sequences. Bioinformatics 35, 753–759 (2019).
pubmed: 30165572 doi: 10.1093/bioinformatics/bty704
Chitale, M., Hawkins, T., Park, C. & Kihara, D. ESG: extended similarity group method for automated protein function prediction. Bioinformatics 25, 1739–1745 (2009).
pubmed: 19435743 pmcid: 2705228 doi: 10.1093/bioinformatics/btp309
Kirilenko, B. M. et al. Integrating gene annotation with orthology inference at scale. Science 380, eabn3107 (2023).
pubmed: 37104600 pmcid: 10193443 doi: 10.1126/science.abn3107
Bukhman, Y. V. et al. A high-quality blue whale genome, segmental duplications, and historical demography. https://doi.org/10.21203/rs.3.rs-1910240/v1 (2022).
Toh, H. et al. A haplotype-resolved genome assembly of the Nile rat facilitates exploration of the genetic basis of diabetes. BMC Biol. 20, 245 (2022).
pubmed: 36344967 pmcid: 9641963 doi: 10.1186/s12915-022-01427-8
Geyer, B. et al. Establishing and Maintaining an Etruscan Shrew Colony. J. Am. Assoc. Lab. Anim. Sci. 61, 52–60 (2022).
pubmed: 34772472 pmcid: 8786385 doi: 10.30802/AALAS-JAALAS-21-000068
Naumann, R. K., Anjum, F., Roth-Alpermann, C. & Brecht, M. Cytoarchitecture, areas, and neuron numbers of the Etruscan shrew cortex. J. Comp. Neurol. 520, 2512–2530 (2012).
pubmed: 22252518 doi: 10.1002/cne.23053
Secomandi, S. et al. A chromosome-level reference genome and pangenome for barn swallow population genomics. Cell Rep. 42, 111992 (2023).
pubmed: 36662619 pmcid: 10044405 doi: 10.1016/j.celrep.2023.111992
Ranallo-Benavidez, T. R., Jaron, K. S. & Schatz, M. C. GenomeScope 2.0 and Smudgeplot for reference-free profiling of polyploid genomes. Nat. Commun. 11, 1432 (2020).
pubmed: 32188846 pmcid: 7080791 doi: 10.1038/s41467-020-14998-3
Rhie, A., Walenz, B. P., Koren, S. & Phillippy, A. M. Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies. Genome Biol. 21, 245 (2020).
pubmed: 32928274 pmcid: 7488777 doi: 10.1186/s13059-020-02134-9
Klammer, A. A. et al. Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data. Nat. Methods 10, 563 (2013).
pubmed: 23644548 doi: 10.1038/nmeth.2474
Chin, C.-S. et al. Phased diploid genome assembly with single-molecule real-time sequencing. Nat. Methods 13, 1050 (2016).
pubmed: 27749838 pmcid: 5503144 doi: 10.1038/nmeth.4035
Guan, D. et al. Identifying and removing haplotypic duplication in primary genome assemblies. Bioinformatics https://doi.org/10.1093/bioinformatics/btaa025 (2020).
doi: 10.1093/bioinformatics/btaa025 pubmed: 33297937 pmcid: 7724830
Mapleson, D., Garcia Accinelli, G., Kettleborough, G., Wright, J. & Clavijo, B. J. KAT: a K-mer analysis toolkit to quality control NGS datasets and genome assemblies. Bioinformatics 33, 574–576 (2017).
doi: 10.1093/bioinformatics/btw663
Formenti, G. et al. SMRT long reads and Direct Label and Stain optical maps allow the generation of a high-quality genome assembly for the European barn swallow (Hirundo rustica rustica). GigaScience 8, giy142 (2019).
pubmed: 30496513 doi: 10.1093/gigascience/giy142
Ghurye, J. et al. Integrating Hi-C links with assembly graphs for chromosome-scale assembly. PLOS Comput. Biol. 15, e1007273 (2019).
pubmed: 31433799 pmcid: 6719893 doi: 10.1371/journal.pcbi.1007273
Formenti, G. et al. Complete vertebrate mitogenomes reveal widespread repeats and gene duplications. Genome Biol. 22, 120 (2021).
pubmed: 33910595 pmcid: 8082918 doi: 10.1186/s13059-021-02336-9
Garrison, E. & Marth, G. Haplotype-based variant detection from short-read sequencing. Preprint at https://doi.org/10.48550/arXiv.1207.3907 (2012).
Li, H. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics 27, 2987–2993 (2011).
pubmed: 21903627 pmcid: 3198575 doi: 10.1093/bioinformatics/btr509
Danecek, P. et al. Twelve years of SAMtools and BCFtools. GigaScience 10, giab008 (2021).
pubmed: 33590861 pmcid: 7931819 doi: 10.1093/gigascience/giab008
Bernt, M. et al. MITOS: Improved de novo metazoan mitochondrial genome annotation. Mol. Phylogenet. Evol. 69, 313–319 (2013).
pubmed: 22982435 doi: 10.1016/j.ympev.2012.08.023
Howe, K. et al. Significantly improving the quality of genome assemblies through curation. GigaScience 10, giaa153 (2021).
pubmed: 33420778 pmcid: 7794651 doi: 10.1093/gigascience/giaa153
Chow, W. et al. gEVAL — a web-based browser for evaluating genome assemblies. Bioinformatics 32, 2508–2510 (2016).
pubmed: 27153597 pmcid: 4978925 doi: 10.1093/bioinformatics/btw159
Kerpedjiev, P. et al. HiGlass: web-based visual exploration and analysis of genome interaction maps. Genome Biol. 19, 125 (2018).
pubmed: 30143029 pmcid: 6109259 doi: 10.1186/s13059-018-1486-1
Altschul, S. F. et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402 (1997).
pubmed: 9254694 pmcid: 146917 doi: 10.1093/nar/25.17.3389
Kent, W. J., Baertsch, R., Hinrichs, A., Miller, W. & Haussler, D. Evolution’s cauldron: Duplication, deletion, and rearrangement in the mouse and human genomes. Proc. Natl. Acad. Sci. 100, 11484–11489 (2003).
pubmed: 14500911 pmcid: 208784 doi: 10.1073/pnas.1932072100
Osipova, E., Hecker, N. & Hiller, M. RepeatFiller newly identifies megabases of aligning repetitive sequences and improves annotations of conserved non-exonic elements. GigaScience 8, giz132 (2019).
pubmed: 31742600 pmcid: 6862929 doi: 10.1093/gigascience/giz132
Suarez, H. G., Langer, B. E., Ladde, P. & Hiller, M. chainCleaner improves genome alignment specificity and sensitivity. Bioinformatics 33, 1596–1603 (2017).
pubmed: 28108446 doi: 10.1093/bioinformatics/btx024
Blumer, M. et al. Gene losses in the common vampire bat illuminate molecular adaptations to blood feeding. Sci. Adv. 8, eabm6494 (2022).
pubmed: 35333583 pmcid: 8956264 doi: 10.1126/sciadv.abm6494
Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).
pubmed: 29750242 pmcid: 6137996 doi: 10.1093/bioinformatics/bty191
Li, H. New strategies to improve minimap2 alignment accuracy. Bioinformatics 37, 4572–4574 (2021).
pubmed: 34623391 pmcid: 8652018 doi: 10.1093/bioinformatics/btab705
Needleman, S. B. & Wunsch, C. D. A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol. 48, 443–453 (1970).
pubmed: 5420325 doi: 10.1016/0022-2836(70)90057-4
Šošić, M. & Šikić, M. Edlib: a C/C ++ library for fast, exact sequence alignment using edit distance. Bioinformatics 33, 1394–1395 (2017).
pubmed: 28453688 pmcid: 5408825 doi: 10.1093/bioinformatics/btw753
Kohany, O., Gentles, A. J., Hankus, L. & Jurka, J. Annotation, submission and screening of repetitive elements in Repbase: RepbaseSubmitter and Censor. BMC Bioinformatics 7, 474 (2006).
pubmed: 17064419 pmcid: 1634758 doi: 10.1186/1471-2105-7-474
NCBI Sequence Read Archive. https://identifiers.org/ncbi/insdc.sra:SRP456787 (2023).
Vertebrate Genomes Project. Suncus etruscus genome assembly mSunEtr1.pri.cur. GenBank. https://identifiers.org/ncbi/insdc.gca:GCA_024139225 (2022).
Vertebrate Genomes Project & NCBI. mSunEtr1.alt.cur - Genome - Assembly - NCBI, GCA_024140225.1. NCBI Assembly Database https://identifiers.org/ncbi/insdc.gca:GCA_024140225.1 (2022).
Suncus etruscus isolate mSunEtr1 mitochondrion, complete sequence, whole genome shotgun sequence. GenBank. https://identifiers.org/ncbi/insdc:CM044019 (2022).
Hiller, M. et al. TOGA, Etruscan shrew genome paper supplementary materials. OSF, https://doi.org/10.17605/OSF.IO/X4EWT (2024).
Giri, S. J. et al. GO Term Predictions, Etruscan shrew genome paper supplementary materials. OSF https://doi.org/10.17605/OSF.IO/VS7Y8 (2022).
Rabbani, K. et al. Segmental duplications, Etruscan shrew genome paper supplementary materials. OSF https://doi.org/10.17605/OSF.IO/QZSJ6 (2022).
Formenti, G. et al. Gfastats: conversion, evaluation and manipulation of genome sequences using assembly graphs. Bioinformatics 38, 4214–4216 (2022).
pubmed: 35799367 pmcid: 9438950 doi: 10.1093/bioinformatics/btac460
Simão, F. A., Waterhouse, R. M., Ioannidis, P., Kriventseva, E. V. & Zdobnov, E. M. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31, 3210–3212 (2015).
pubmed: 26059717 doi: 10.1093/bioinformatics/btv351
Manni, M., Berkeley, M. R., Seppey, M., Simão, F. A. & Zdobnov, E. M. BUSCO Update: Novel and Streamlined Workflows along with Broader and Deeper Phylogenetic Coverage for Scoring of Eukaryotic, Prokaryotic, and Viral Genomes. Mol. Biol. Evol. 38, 4647–4654 (2021).
pubmed: 34320186 pmcid: 8476166 doi: 10.1093/molbev/msab199
Bukhman, Y. V. et al. taxon_assembly_stats.R, Eulipotyphla genomes quality stats. OSF https://doi.org/10.17605/OSF.IO/3PK9G (2023).
Max Planck Institute for Molecular Genetics. Talpa occidentalis genome assembly MPIMG_talOcc4v2, GenBank. NCBI https://identifiers.org/ncbi/insdc.gca:GCA_014898055.2 (2020).
Bukhman, Y. V. et al. NCBI_qc_stats.csv, Eulipotyphla genomes quality stats. OSF https://doi.org/10.17605/OSF.IO/3PK9G (2023).

Authors

Yury V Bukhman (YV)

Regenerative Biology, Morgridge Institute for Research, 330 N. Orchard St., Madison, WI, 53715, USA. ybukhman@morgridge.org.

Susanne Meyer (S)

Neuroscience Research Institute, University of California - Santa Barbara, 494 UCEN Rd, Isla Vista, CA, 93117, USA.

Li-Fang Chu (LF)

Department of Comparative Biology and Experimental Medicine, University of Calgary, 2500 University Drive NW, Calgary, Alberta, T2N 1N4, Canada.

Linelle Abueg (L)

Vertebrate Genome Lab, The Rockefeller University, 1230 York Avenue, New York, NY, 10065, USA.

Jessica Antosiewicz-Bourget (J)

Regenerative Biology, Morgridge Institute for Research, 330 N. Orchard St., Madison, WI, 53715, USA.

Jennifer Balacco (J)

Vertebrate Genome Lab, The Rockefeller University, 1230 York Avenue, New York, NY, 10065, USA.

Michael Brecht (M)

BCCN/Humboldt University Berlin, Philippstr. 13, House 6, 10115, Berlin, Germany.

Erica Dinatale (E)

Max Planck Institute for Biology Tübingen, Max-Planck-Ring 5, 72076, Tübingen, Germany.

Olivier Fedrigo (O)

Vertebrate Genome Lab, The Rockefeller University, 1230 York Avenue, New York, NY, 10065, USA.

Giulio Formenti (G)

Laboratory of Neurogenetics of Language, The Rockefeller University/HHMI, 1230 York Avenue, New York, NY, 10065, USA.

Arkarachai Fungtammasan (A)

DNAnexus Inc., 1975 W El Camino Real, Mountain View, CA, 94040, USA.

Swagarika Jaharlal Giri (SJ)

Department of Computer Science, Purdue University, 249 S. Martin Jischke Dr, West Lafayette, IN, 47907, USA.

Michael Hiller (M)

LOEWE Centre for Translational Biodiversity Genomics, Senckenberganlage 25, 60325, Frankfurt, Germany.
Senckenberg Research Institute, Senckenberganlage 25, 60325, Frankfurt, Germany.
Institute of Cell Biology and Neuroscience, Faculty of Biosciences, Goethe University Frankfurt, Max-von-Laue-Str. 9, 60438, Frankfurt, Germany.

Kerstin Howe (K)

Tree of Life, Wellcome Sanger Institute, Cambridge, CB10 1SA, UK.

Daisuke Kihara (D)

Department of Computer Science, Purdue University, 249 S. Martin Jischke Dr, West Lafayette, IN, 47907, USA.
Department of Biological Sciences, Purdue University, 249 S. Martin Jischke Dr., West Lafayette, IN, 47907, USA.

Daniel Mamott (D)

Regenerative Biology, Morgridge Institute for Research, 330 N. Orchard St., Madison, WI, 53715, USA.

Jacquelyn Mountcastle (J)

Vertebrate Genome Lab, The Rockefeller University, 1230 York Avenue, New York, NY, 10065, USA.

Sarah Pelan (S)

Tree of Life, Wellcome Sanger Institute, Cambridge, CB10 1SA, UK.

Keon Rabbani (K)

Department of Quantitative and Computational Biology, University of Southern California, 1050 Childs Way RRI 408, Los Angeles, CA, 90089, USA.

Ying Sims (Y)

Tree of Life, Wellcome Sanger Institute, Cambridge, CB10 1SA, UK.

Alan Tracey (A)

Tree of Life, Wellcome Sanger Institute, Cambridge, CB10 1SA, UK.

Jonathan M D Wood (JMD)

Tree of Life, Wellcome Sanger Institute, Cambridge, CB10 1SA, UK.

Erich D Jarvis (ED)

Vertebrate Genome Lab, The Rockefeller University, 1230 York Avenue, New York, NY, 10065, USA.
Laboratory of Neurogenetics of Language, The Rockefeller University/HHMI, 1230 York Avenue, New York, NY, 10065, USA.

James A Thomson (JA)

Regenerative Biology, Morgridge Institute for Research, 330 N. Orchard St., Madison, WI, 53715, USA.
Department of Molecular, Cellular and Developmental Biology, University of California Santa Barbara, Santa Barbara, CA, 93106, USA.
Department of Cell and Regenerative Biology, University of Wisconsin School of Medicine and Public Health, Madison, WI, 53726, USA.

Mark J P Chaisson (MJP)

Department of Quantitative and Computational Biology, University of Southern California, 1050 Childs Way RRI 408, Los Angeles, CA, 90089, USA.

Ron Stewart (R)

Regenerative Biology, Morgridge Institute for Research, 330 N. Orchard St., Madison, WI, 53715, USA.
