Comparative genomics and community curation further improve gene annotations in the nematode Pristionchus pacificus.
Keywords
Caenorhabditis elegans
Evolution
Genome
Orphan genes
Parasitic nematodes
Journal
BMC genomics
ISSN: 1471-2164
Abbreviated title: BMC Genomics
Country: England
NLM ID: 100965258
Publication information
Publication date:
12 Oct 2020
History:
received: 03 Aug 2020
accepted: 23 Sep 2020
entrez: 13 Oct 2020
pubmed: 14 Oct 2020
medline: 30 Apr 2021
Status:
epublish
Abstract
Nematode model organisms such as Caenorhabditis elegans and Pristionchus pacificus are powerful systems for studying the evolution of gene function at a mechanistic level. However, the identification of P. pacificus orthologs of candidate genes known from C. elegans is complicated by the discrepancy in the quality of gene annotations, a common problem in nematode and invertebrate genomics. Here, we combine comparative genomic screens for suspicious gene models with community-based curation to further improve the quality of gene annotations in P. pacificus. We extend previous curations of one-to-one orthologs to larger gene families and also orphan genes. Cross-species comparisons of protein lengths, screens for atypical domain combinations and species-specific orphan genes resulted in 4311 candidate genes that were subject to community-based curation. Corrections for 2946 gene models were implemented in a new version of the P. pacificus gene annotations. The new set of gene annotations contains 28,896 genes and has a single copy ortholog completeness level of 97.6%. Our work demonstrates the effectiveness of comparative genomic screens to identify suspicious gene models and the scalability of community-based approaches to improve the quality of thousands of gene models. Similar community-based approaches can help to improve the quality of gene annotations in other invertebrate species, including parasitic nematodes.
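The abstract names three screens used to nominate suspicious gene models for curation: cross-species comparisons of protein lengths, atypical domain combinations, and species-specific orphan genes. The Python sketch below illustrates how such screens might be combined into one candidate list; it is not the authors' pipeline. The GeneModel record, the length-ratio thresholds (0.5 and 2.0), the domain labels such as "PF1", and the toy data are all hypothetical assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from statistics import median

# HYPOTHETICAL sketch: not the authors' pipeline. Thresholds, field names and
# domain labels below are illustrative assumptions only.
LOW_RATIO = 0.5   # flag if protein is < half the median ortholog length (assumed)
HIGH_RATIO = 2.0  # flag if protein is > twice the median ortholog length (assumed)


@dataclass
class GeneModel:
    gene_id: str
    length: int  # protein length in amino acids
    ortholog_lengths: list = field(default_factory=list)  # lengths of orthologs in other species
    domains: tuple = ()  # ordered domain labels along the protein
    has_homolog: bool = True  # False marks a putative species-specific orphan gene


def length_outlier(gene):
    """Flag proteins much shorter or longer than the median ortholog length."""
    if not gene.ortholog_lengths:
        return False  # nothing to compare against
    ratio = gene.length / median(gene.ortholog_lengths)
    return ratio < LOW_RATIO or ratio > HIGH_RATIO


def atypical_domains(gene, known_combinations):
    """Flag multi-domain proteins whose domain combination is unseen in reference species."""
    return len(gene.domains) > 1 and gene.domains not in known_combinations


def screen(genes, known_combinations):
    """Map each flagged gene ID to the list of screens that flagged it."""
    flagged = {}
    for gene in genes:
        reasons = []
        if length_outlier(gene):
            reasons.append("length")
        if atypical_domains(gene, known_combinations):
            reasons.append("domains")
        if not gene.has_homolog:
            reasons.append("orphan")
        if reasons:
            flagged[gene.gene_id] = reasons
    return flagged


# Toy data (hypothetical gene IDs and domain labels).
known_combinations = {("PF1", "PF2")}
example_genes = [
    GeneModel("geneA", 150, [400, 410, 395], ("PF1",)),        # possibly truncated model
    GeneModel("geneB", 402, [400, 410, 395], ("PF1", "PF2")),  # looks normal
    GeneModel("geneC", 300, [290, 310], ("PF3", "PF1")),       # possibly fused model
    GeneModel("geneD", 120, [], (), has_homolog=False),        # orphan candidate
]
flagged = screen(example_genes, known_combinations)
print(flagged)
```

Running this prints {'geneA': ['length'], 'geneC': ['domains'], 'geneD': ['orphan']}. In this sketch, a flag only nominates a gene model for manual, community-based inspection; it does not correct the model automatically.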
Identifiers
pubmed: 33045985
doi: 10.1186/s12864-020-07100-0
pii: 10.1186/s12864-020-07100-0
pmc: PMC7552371
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
708