Discriminating physiological from non-physiological interfaces in structures of protein complexes: A community-wide study.
crystal contacts
homodimers
potential energy
protein interactions
protein structure
Journal
Proteomics
ISSN: 1615-9861
Abbreviated title: Proteomics
Country: Germany
NLM ID: 101092707
Publication information
Publication date: 09 2023
History:
revised: 2023-05-11
received: 2023-02-04
accepted: 2023-05-11
medline: 2023-09-06
pubmed: 2023-06-27
entrez: 2023-06-27
Status: ppublish
Abstract
Reliably scoring and ranking candidate models of protein complexes and assigning their oligomeric state from the structure of the crystal lattice represent outstanding challenges. A community-wide effort was launched to tackle these challenges. The latest resources on protein complexes and interfaces were exploited to derive a benchmark dataset consisting of 1677 homodimer protein crystal structures, including a balanced mix of physiological and non-physiological complexes. The non-physiological complexes in the benchmark were selected to bury a similar or larger interface area than their physiological counterparts, making it more difficult for scoring functions to differentiate between them. Next, 252 functions for scoring protein-protein interfaces previously developed by 13 groups were collected and evaluated for their ability to discriminate between physiological and non-physiological complexes. Two combined predictors were then built: a simple consensus score generated from the best-performing score of each of the 13 groups, and a cross-validated Random Forest (RF) classifier. Both approaches showed excellent performance, with areas under the Receiver Operating Characteristic (ROC) curve of 0.93 and 0.94, respectively, outperforming the individual scores developed by the different groups. Additionally, AlphaFold2 engines recalled the physiological dimers with significantly higher accuracy than the non-physiological set, lending support to the reliability of the benchmark dataset annotations. Optimizing the combined power of interface scoring functions and evaluating it on challenging benchmark datasets appears to be a promising strategy.
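The abstract does not say how the consensus score was computed, so the following is only a minimal sketch of one plausible scheme, not the authors' method. It assumes that every scoring function is oriented so that a higher value means "more likely physiological", converts each group's scores to average ranks, takes the mean rank per complex as the consensus, and evaluates it with the area under the ROC curve. The function names (`average_ranks`, `consensus_score`, `roc_auc`) and the toy data are hypothetical.

```python
from typing import Dict, List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 0-based ranks (lowest value = 0); tied values share their mean rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def consensus_score(scores_by_group: Dict[str, Sequence[float]]) -> List[float]:
    """Mean rank per complex across groups (assumed scheme, higher = more physiological)."""
    ranked = [average_ranks(s) for s in scores_by_group.values()]
    n_complexes = len(ranked[0])
    return [sum(r[i] for r in ranked) / len(ranked) for i in range(n_complexes)]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy data (invented for illustration): 1 = physiological, 0 = non-physiological.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = {
    "group_a": [0.9, 0.8, 0.3, 0.4, 0.2, 0.1],
    "group_b": [5.0, 1.0, 4.0, 2.0, 3.0, 0.5],
}

if __name__ == "__main__":
    for name, s in toy_scores.items():
        print(name, round(roc_auc(toy_labels, s), 3))
    print("consensus", round(roc_auc(toy_labels, consensus_score(toy_scores)), 3))
```

Ranks are used instead of raw values because scoring functions from different groups are unlikely to share a scale. A score where lower values indicate a physiological interface would have to be negated before being passed in.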
Identifiers
pubmed: 37365936
doi: 10.1002/pmic.202200323
pmc: PMC10937251
mid: NIHMS1966567
Chemical substances
Proteins
0
Publication types
Journal Article
Research Support, N.I.H., Extramural
Research Support, U.S. Gov't, Non-P.H.S.
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
e2200323
Grants
Agency: NIGMS NIH HHS
ID: R01 GM133840
Country: United States
Agency: NIGMS NIH HHS
ID: R35 GM122517
Country: United States
Agency: NIGMS NIH HHS
ID: R35 GM136409
Country: United States
Copyright information
© 2023 Wiley-VCH GmbH.