Construction of a new chromosome-scale, long-read reference genome assembly for the Syrian hamster, Mesocricetus auratus.
Keywords: COVID-19; Mesocricetus auratus; Syrian hamster; disease model; genome
Journal: GigaScience
ISSN: 2047-217X
Abbreviated title: Gigascience
Country: United States
NLM ID: 101596872
Publication information
Publication date: 28 May 2022
History:
received: 5 July 2021
revised: 3 November 2021
accepted: 29 March 2022
entrez: 31 May 2022
pubmed: 1 June 2022
medline: 3 June 2022
Status: ppublish
Abstract
BACKGROUND
The Syrian hamster (Mesocricetus auratus) has been suggested as a useful mammalian model for a variety of diseases and infections, including infection with respiratory viruses such as SARS-CoV-2. The MesAur1.0 genome assembly was generated in 2013 by whole-genome shotgun sequencing of short-read data. More advanced sequencing technologies and assembly methods now permit the generation of near-complete genome assemblies of higher quality and greater continuity.
FINDINGS
Here, we report an improved, chromosome-scale assembly of the M. auratus genome (BCM_Maur_2.0) produced with Oxford Nanopore Technologies long-read sequencing. The total length of the new assembly is 2.46 Gb, similar to the 2.50 Gb of the previous assembly of this genome, MesAur1.0. BCM_Maur_2.0 shows substantially improved continuity, with a scaffold N50 6.7 times greater than that of MesAur1.0. In addition, 21,616 protein-coding genes and 10,459 noncoding genes are annotated in BCM_Maur_2.0, compared with 20,495 protein-coding genes and 4,168 noncoding genes in MesAur1.0. The new assembly also leaves far fewer regions unresolved, as measured by nucleotide ambiguities: ∼17.11% of bases in MesAur1.0 were unresolved, compared with 3.00% in BCM_Maur_2.0.
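The scaffold N50 and the share of unresolved bases cited above are standard assembly summary statistics. The Python sketch below shows how they are commonly computed; it is an illustration only, not the pipeline used for BCM_Maur_2.0. It assumes that scaffolds are available as plain sequence strings and that any character other than A, C, G or T counts as an unresolved (ambiguous) base; the function names are hypothetical.

```python
# Sketch: two assembly statistics mentioned in the abstract.
# Assumptions (not taken from the paper): scaffolds are plain strings, and any
# character other than A, C, G or T (either case) is an unresolved base.
from typing import Iterable, List

RESOLVED_BASES = frozenset("ACGTacgt")


def n50(lengths: Iterable[int]) -> int:
    """Length L such that sequences of length >= L hold at least half the bases."""
    ordered: List[int] = sorted(lengths, reverse=True)
    total = sum(ordered)
    if total <= 0:
        raise ValueError("assembly has no bases")
    running = 0
    for length in ordered:
        running += length
        # Stop at the first sequence that brings the cumulative sum to >= 50%.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable: the cumulative sum always reaches the total")


def unresolved_fraction(sequences: Iterable[str]) -> float:
    """Fraction of all bases that are ambiguous (e.g. N gap fill)."""
    total = 0
    unresolved = 0
    for seq in sequences:
        total += len(seq)
        unresolved += sum(1 for base in seq if base not in RESOLVED_BASES)
    if total == 0:
        raise ValueError("assembly has no bases")
    return unresolved / total


if __name__ == "__main__":
    # Toy scaffolds, not hamster data.
    scaffolds = ["ACGTNNNNAC", "ACGTACGT", "NNAC"]
    print(n50(len(s) for s in scaffolds))  # 8
    print(round(unresolved_fraction(scaffolds), 4))  # 0.2727
```

A fold change such as the reported 6.7-times increase in scaffold N50 is simply the ratio of the two assemblies' N50 values, and the 17.11% and 3.00% figures correspond to the unresolved fraction expressed as a percentage.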
CONCLUSIONS
Access to a more complete reference genome with improved accuracy and continuity will facilitate more detailed and comprehensive research in the wide variety of future studies that use Syrian hamsters as models.
Identifiers
pubmed: 35640223
pii: 6594469
doi: 10.1093/gigascience/giac039
pmc: PMC9155146
Publication types
Journal Article
Research Support, N.I.H., Extramural
Language
eng
Citation subsets
IM
Grants
NIH HHS: P51 OD011106 (United States)
NIGMS NIH HHS: T32 GM135119 (United States)
NIAID NIH HHS: HHSN272201600007C (United States)
Copyright information
© The Author(s) 2022. Published by Oxford University Press GigaScience.