Expert curation of the human and mouse olfactory receptor gene repertoires identifies conserved coding regions split across two exons.
Annotation
Curation
Human
Mouse
Olfactory receptor gene
Journal
BMC genomics
ISSN: 1471-2164
Abbreviated title: BMC Genomics
Country: England
NLM ID: 100965258
Publication information
Publication date:
03 Mar 2020
History:
received: 2019-10-31
accepted: 2020-02-17
entrez: 2020-03-05
pubmed: 2020-03-05
medline: 2020-11-12
Status:
epublish
Abstract
BACKGROUND: Olfactory receptor (OR) genes are the largest multi-gene family in the mammalian genome, with 874 loci in human and 1483 in mouse (including pseudogenes). The expansion of the OR gene repertoire has occurred through numerous duplication events followed by diversification, resulting in a large number of highly similar paralogous genes. These characteristics have made the annotation of the complete OR gene repertoire a complex task. Most OR genes have been predicted in silico and are typically annotated as intronless coding sequences.
RESULTS: Here we have developed an expert curation pipeline to analyse and annotate every OR gene in the human and mouse reference genomes. By combining evidence from structural features, evolutionary conservation and experimental data, we have unified the annotation of these gene families and systematically determined the protein-coding potential of each locus. We have defined the non-coding regions of many OR genes, enabling us to generate full-length transcript models. We found that 13 human and 41 mouse OR loci have coding sequences that are split across two exons. These split OR genes are conserved across mammals, and are expressed at the same level as protein-coding OR genes with an intronless coding region. Our findings challenge the long-standing and widespread notion that the coding region of a vertebrate OR gene is contained within a single exon.
CONCLUSIONS: This work provides the most comprehensive curation effort of the human and mouse OR gene repertoires to date. The complete annotation has been integrated into the GENCODE reference gene set, for immediate availability to the research community.
Identifiers
pubmed: 32126975
doi: 10.1186/s12864-020-6583-3
pii: 10.1186/s12864-020-6583-3
pmc: PMC7055050
Chemical substances
Receptors, Odorant
0
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
196
Grants
Agency: NHGRI NIH HHS
ID: U41 HG007234
Country: United States
Agency: NHGRI NIH HHS
ID: 2U41HG007234
Country: United States