Genome annotation: From human genetics to biodiversity genomics.
Journal: Cell genomics
ISSN: 2666-979X
Abbreviated title: Cell Genom
Country: United States
NLM ID: 9918284260106676
Publication information
Publication date: 09 Aug 2023
History:
medline: 21 Aug 2023
pubmed: 21 Aug 2023
entrez: 21 Aug 2023
Status: epublish
Abstract
Within the next decade, the genomes of 1.8 million eukaryotic species will be sequenced. Identifying the genes in these sequences is essential to understanding the biology of each species. This is challenging because of the transcriptional complexity of eukaryotic genomes, which encode hundreds of thousands of transcripts of multiple types. Among these, a small set of protein-coding mRNAs plays a disproportionately large role in defining phenotypes. Because these genes are conserved in sequence, orthology among them can be established, making it possible to define the universal catalog of eukaryotic protein-coding genes. This catalog should contribute substantially to uncovering the genomic events underlying the emergence of eukaryotic phenotypes. This piece briefly reviews the basics of protein-coding gene prediction, discusses the challenges in finalizing the annotation of the human genome, and proposes strategies for producing annotations across the eukaryotic Tree of Life. This lays the groundwork for obtaining the catalog of all genes: the Earth's code of life.
Identifiers
pubmed: 37601977
doi: 10.1016/j.xgen.2023.100375
pii: S2666-979X(23)00172-6
pmc: PMC10435374
Databanks
figshare: 10.6084/m9.figshare.19642383.v1
Publication types
Journal Article
Review
Languages
eng
Pagination: 100375
Copyright information
© 2023 The Author.
Conflict of interest statement
The author declares no competing interests.