The third CAFA challenge (CAFA3) reports slightly improved prediction of molecular function and biological process annotations, along with new functional annotations for more than 1000 genes obtained through experimental screens.
Biofilm
Community challenge
Critical assessment
Long-term memory
Protein function prediction
Journal
Genome biology
ISSN: 1474-760X
Abbreviated title: Genome Biol
Country: England
NLM ID: 100960660
Publication information
Publication date: 19 Nov 2019
History:
received: 16 May 2019
accepted: 24 Sep 2019
entrez: 21 Nov 2019
pubmed: 21 Nov 2019
medline: 6 Feb 2020
Status: epublish
Abstract
BACKGROUND
The Critical Assessment of Functional Annotation (CAFA) is an ongoing, global, community-driven effort to evaluate and improve the computational annotation of protein function.
RESULTS
Here, we report on the results of the third CAFA challenge, CAFA3, which featured an expanded analysis over the previous CAFA rounds, both in terms of the volume of data analyzed and the types of analysis performed. In a major new development, computational predictions and assessment goals drove some of the experimental assays, resulting in new functional annotations for more than 1000 genes. Specifically, we performed experimental whole-genome mutation screening in Candida albicans and Pseudomonas aeruginosa, which provided us with genome-wide experimental data for genes associated with biofilm formation and motility. We further performed targeted assays on selected genes in Drosophila melanogaster that we suspected of being involved in long-term memory.
CONCLUSION
We conclude that while predictions of the molecular function and biological process annotations have slightly improved over time, those of the cellular component have not. Term-centric prediction of experimental annotations remains equally challenging; although the performance of the top methods is significantly better than the expectations set by baseline methods in C. albicans and D. melanogaster, it leaves considerable room and need for improvement. Finally, we report that the CAFA community now involves a broad range of participants with expertise in bioinformatics, biological experimentation, biocuration, and bio-ontologies, working together to improve functional annotation, computational function prediction, and our ability to manage big data in the era of large experimental screens.
Identifiers
pubmed: 31744546
doi: 10.1186/s13059-019-1835-8
pii: 10.1186/s13059-019-1835-8
pmc: PMC6864930
Publication types
Journal Article
Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't
Research Support, U.S. Gov't, Non-P.H.S.
Languages
eng
Citation subsets
IM
Pagination
244
Grants
Agency: Biotechnology and Biological Sciences Research Council
ID: BB/M025047/1
Country: United Kingdom
Agency: NIGMS NIH HHS
ID: P20 GM113132
Country: United States
Agency: NIGMS NIH HHS
ID: R00 GM097033
Country: United States
Agency: Biotechnology and Biological Sciences Research Council
ID: BB/N019431/1
Country: United Kingdom
Agency: NINDS NIH HHS
ID: R21 NS103831
Country: United States
Agency: NCI NIH HHS
ID: U01 CA198942
Country: United States
Agency: NHGRI NIH HHS
ID: U41 HG007234
Country: United States
Agency: Biotechnology and Biological Sciences Research Council
ID: BB/F00964X/1
Country: United Kingdom
Agency: NCATS NIH HHS
ID: U24 TR002306
Country: United States
Agency: NIGMS NIH HHS
ID: R35 GM128637
Country: United States
Agency: Medical Research Council
ID: MC_UP_1201/14
Country: United Kingdom
Agency: Biotechnology and Biological Sciences Research Council
ID: BB/L020505/1
Country: United Kingdom
Agency: NIGMS NIH HHS
ID: R01 GM093123
Country: United States
Agency: NIGMS NIH HHS
ID: R01 GM071749
Country: United States
Agency: NIGMS NIH HHS
ID: R01 GM123055
Country: United States
Agency: NIGMS NIH HHS
ID: R15 GM120650
Country: United States
Agency: Biotechnology and Biological Sciences Research Council
ID: BB/N004876/1
Country: United Kingdom
Agency: Biotechnology and Biological Sciences Research Council
ID: BB/L002817/1
Country: United Kingdom
Agency: Biotechnology and Biological Sciences Research Council
ID: BB/K004131/1
Country: United Kingdom
Agency: Biotechnology and Biological Sciences Research Council
ID: BB/M015009/1
Country: United Kingdom
Agency: NCATS NIH HHS
ID: UL1 TR002319
Country: United States