Expert curation of the human and mouse olfactory receptor gene repertoires identifies conserved coding regions split across two exons.


Journal

BMC genomics
ISSN: 1471-2164
Abbreviated title: BMC Genomics
Country: England
NLM ID: 100965258

Publication information

Publication date:
03 Mar 2020
History:
received: 31 Oct 2019
accepted: 17 Feb 2020
entrez: 05 Mar 2020
pubmed: 05 Mar 2020
medline: 12 Nov 2020
Status: epublish

Abstract

Olfactory receptor (OR) genes are the largest multi-gene family in the mammalian genome, with 874 loci in human and 1483 in mouse (including pseudogenes). The expansion of the OR gene repertoire has occurred through numerous duplication events followed by diversification, resulting in a large number of highly similar paralogous genes. These characteristics have made the annotation of the complete OR gene repertoire a complex task. Most OR genes have been predicted in silico and are typically annotated as intronless coding sequences. Here we have developed an expert curation pipeline to analyse and annotate every OR gene in the human and mouse reference genomes. By combining evidence from structural features, evolutionary conservation and experimental data, we have unified the annotation of these gene families, and have systematically determined the protein-coding potential of each locus. We have defined the non-coding regions of many OR genes, enabling us to generate full-length transcript models. We found that 13 human and 41 mouse OR loci have coding sequences that are split across two exons. These split OR genes are conserved across mammals, and are expressed at the same level as protein-coding OR genes with an intronless coding region. Our findings challenge the long-standing and widespread notion that the coding region of a vertebrate OR gene is contained within a single exon. This work provides the most comprehensive curation effort of the human and mouse OR gene repertoires to date. The complete annotation has been integrated into the GENCODE reference gene set, for immediate availability to the research community.

Identifiers

pubmed: 32126975
doi: 10.1186/s12864-020-6583-3
pii: 10.1186/s12864-020-6583-3
pmc: PMC7055050

Chemical substances

Receptors, Odorant 0

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

196

Grants

Agency: NHGRI NIH HHS
ID: U41 HG007234
Country: United States
Agency: NHGRI NIH HHS
ID: 2U41HG007234
Country: United States

Authors

If H A Barnes (IHA)

European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK. if@ebi.ac.uk.

Ximena Ibarra-Soria (X)

Cancer Research UK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Robinson Way, Cambridge, CB2 0RE, UK. ximena.ibarra@cruk.cam.ac.uk.
Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SA, UK. ximena.ibarra@cruk.cam.ac.uk.

Stephen Fitzgerald (S)

Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.

Jose M Gonzalez (JM)

European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK.

Claire Davidson (C)

European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK.

Matthew P Hardy (MP)

European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK.

Deepa Manthravadi (D)

Brandeis University, 415 South Street, Waltham, MA, 02453, USA.

Laura Van Gerven (L)

Department of ENT-HNS, UZ Leuven, Herestraat 49, 3000, Leuven, Belgium.

Mark Jorissen (M)

Department of ENT-HNS, UZ Leuven, Herestraat 49, 3000, Leuven, Belgium.

Zhen Zeng (Z)

Max Planck Research Unit for Neurogenetics, Max von-Laue-Strasse 4, 60438, Frankfurt, Germany.

Mona Khan (M)

Max Planck Research Unit for Neurogenetics, Max von-Laue-Strasse 4, 60438, Frankfurt, Germany.

Peter Mombaerts (P)

Max Planck Research Unit for Neurogenetics, Max von-Laue-Strasse 4, 60438, Frankfurt, Germany.

Jennifer Harrow (J)

ELIXIR, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK.

Darren W Logan (DW)

Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.
Monell Chemical Senses Center, Philadelphia, PA, 19104, USA.
Waltham Petcare Science Institute, Leicestershire, LE14 4RT, UK.

Adam Frankish (A)

European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK. frankish@ebi.ac.uk.
