Assessing predictions on fitness effects of missense variants in HMBS in CAGI6.


Journal

Human genetics
ISSN: 1432-1203
Abbreviated title: Hum Genet
Country: Germany
NLM ID: 7613873

Publication information

Publication date:
07 Aug 2024
History:
received: 18 Nov 2023
accepted: 17 May 2024
medline: 7 Aug 2024
pubmed: 7 Aug 2024
entrez: 7 Aug 2024
Status: ahead of print

Abstract

This paper presents an evaluation of predictions submitted for the "HMBS" challenge, a component of the sixth round of the Critical Assessment of Genome Interpretation (CAGI6), held in 2021. The challenge required participants to predict the effects of missense variants of the human HMBS gene on yeast growth. The HMBS enzyme, critical for the biosynthesis of heme in eukaryotic cells, is highly conserved among eukaryotes. Although participants applied a variety of algorithms and methods, predictor performance was relatively similar: for a majority of submissions, Kendall's tau correlation coefficients between predictions and experimental scores were around 0.3. Notably, the median correlation among the predictors themselves (≥ 0.34), especially among the top predictions from different groups, was greater than the correlation between their predictions and the experimental results. Most predictors were moderately successful in distinguishing deleterious from benign variants, with an area under the receiver operating characteristic (ROC) curve (AUC) of approximately 0.7. Compared with the two most recent rounds of CAGI, more predictors outperformed the baseline predictor, which is based solely on amino acid frequencies. Nevertheless, the overall accuracy of predictions still falls far short of the positive control, which is derived from experimental scores, indicating that considerable improvement is needed in the field. The most inaccurately predicted variants in this round were associated with the insertion loop, which is absent in many orthologs, suggesting that the predictors still rely heavily on information from multiple sequence alignments.
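The two evaluation metrics named in the abstract, Kendall's tau and ROC AUC, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the scores are invented toy numbers rather than challenge data, the function names are hypothetical, and the 0.5 cutoff used to label a variant as deleterious is arbitrary and not taken from the paper.

```python
from itertools import combinations


def kendall_tau_b(x, y):
    """Kendall's tau-b rank correlation, with the standard correction for ties."""
    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied in both: excluded from both denominator terms
        elif dx == 0:
            ties_x += 1  # tied in x only
        elif dy == 0:
            ties_y += 1  # tied in y only
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    untied = concordant + discordant
    denom = ((untied + ties_x) * (untied + ties_y)) ** 0.5
    return (concordant - discordant) / denom if denom else float("nan")


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as one half."""
    pos = [s for l, s in zip(labels, scores) if l]
    neg = [s for l, s in zip(labels, scores) if not l]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (invented): higher score means better growth / more functional.
experimental = [0.05, 0.20, 0.35, 0.60, 0.80, 1.00]
predicted = [0.10, 0.40, 0.30, 0.20, 0.90, 0.70]

# Rank agreement between predicted and measured fitness.
tau = kendall_tau_b(predicted, experimental)

# Binary view: call a variant deleterious if its measured score is below an
# assumed cutoff of 0.5, and use the negated prediction as the
# "deleteriousness" score so that lower predicted fitness ranks higher.
is_deleterious = [e < 0.5 for e in experimental]
auc = roc_auc(is_deleterious, [-p for p in predicted])

print(f"Kendall's tau-b: {tau:.3f}")
print(f"ROC AUC:         {auc:.3f}")
```

The same `kendall_tau_b` function applied to two predictors' outputs, rather than to a predictor and the experimental scores, gives the kind of predictor-to-predictor correlation the abstract compares against.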

Identifiers

pubmed: 39110250
doi: 10.1007/s00439-024-02680-3
pii: 10.1007/s00439-024-02680-3

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Grants

Agency: Welch Foundation
ID: I-2095-20220331
Agency: Welch Foundation
ID: I-1505
Agency: Cancer Prevention and Research Institute of Texas
ID: RP210041
Agency: NIH HHS
ID: R35-GM134922
Country: United States
Agency: NIH HHS
ID: R35GM124952
Country: United States
Agency: NIH HHS
ID: HG012022
Country: United States
Agency: NIH HHS
ID: U24 HG007346
Country: United States
Agency: NIH HHS
ID: GM127390
Country: United States
Agency: Ministero dell'Istruzione e del Merito
ID: MIUR-PRIN-201744NR8S
Agency: National Science Foundation
ID: 2224128

Copyright information

© 2024. The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature.

Authors

Jing Zhang (J)

Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA.
Department of Biochemistry, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA.
Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA.
Harold C. Simmons Comprehensive Cancer Center, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA.

Lisa Kinch (L)

Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA.
Department of Molecular Biology, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA.

Panagiotis Katsonis (P)

Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA.

Olivier Lichtarge (O)

Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA.

Milind Jagota (M)

Computer Science Division, University of California, Berkeley, CA, 94720, USA.

Yun S Song (YS)

Computer Science Division, University of California, Berkeley, CA, 94720, USA.
Department of Statistics, University of California, Berkeley, Berkeley, CA, 94720, USA.

Yuanfei Sun (Y)

Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, 77843, USA.

Yang Shen (Y)

Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, 77843, USA.

Nurdan Kuru (N)

Faculty of Engineering and Natural Sciences, Sabanci University, Tuzla, Turkey.

Onur Dereli (O)

Faculty of Engineering and Natural Sciences, Sabanci University, Tuzla, Turkey.

Ogun Adebali (O)

Faculty of Engineering and Natural Sciences, Sabanci University, Tuzla, Turkey.

Muttaqi Ahmad Alladin (MA)

Department of Computational and Data Sciences, Indian Institute of Science, Bengaluru, 560012, India.

Debnath Pal (D)

Department of Computational and Data Sciences, Indian Institute of Science, Bengaluru, 560012, India.

Emidio Capriotti (E)

Department of Pharmacy and Biotechnology, University of Bologna, Via Selmi 3, 40126, Bologna, Italy.

Maria Paola Turina (MP)

Department of Pharmacy and Biotechnology, University of Bologna, Via Selmi 3, 40126, Bologna, Italy.

Castrense Savojardo (C)

Department of Pharmacy and Biotechnology, University of Bologna, Via Selmi 3, 40126, Bologna, Italy.

Pier Luigi Martelli (PL)

Department of Pharmacy and Biotechnology, University of Bologna, Via Selmi 3, 40126, Bologna, Italy.

Giulia Babbi (G)

Department of Pharmacy and Biotechnology, University of Bologna, Via Selmi 3, 40126, Bologna, Italy.

Rita Casadio (R)

Department of Pharmacy and Biotechnology, University of Bologna, Via Selmi 3, 40126, Bologna, Italy.

Fabrizio Pucci (F)

Computational Biology and Bioinformatics, Université Libre de Bruxelles, 50 Roosevelt Ave, 1050, Brussels, Belgium.

Marianne Rooman (M)

Computational Biology and Bioinformatics, Université Libre de Bruxelles, 50 Roosevelt Ave, 1050, Brussels, Belgium.

Gabriel Cia (G)

Computational Biology and Bioinformatics, Université Libre de Bruxelles, 50 Roosevelt Ave, 1050, Brussels, Belgium.

Matsvei Tsishyn (M)

Computational Biology and Bioinformatics, Université Libre de Bruxelles, 50 Roosevelt Ave, 1050, Brussels, Belgium.

Alexey Strokach (A)

Department of Computer Science, University of Toronto, Toronto, ON, M5S 2E4, Canada.

Zhiqiang Hu (Z)

Department of Plant and Microbial Biology, University of California, Berkeley, CA, 94720, USA.
Center for Computational Biology, University of California, Berkeley, Berkeley, CA, 94720, USA.

Warren van Loggerenberg (W)

Department of Computational and Systems Biology, University of Pittsburgh School of Medicine, Pittsburgh, PA, 15213, USA.
Donnelly Centre, University of Toronto, Toronto, ON, M5S 3E1, Canada.
Department of Molecular Genetics, University of Toronto, Toronto, ON, M5S 1A8, Canada.
Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, ON, M5G 1X5, Canada.

Frederick P Roth (FP)

Department of Computational and Systems Biology, University of Pittsburgh School of Medicine, Pittsburgh, PA, 15213, USA.
Donnelly Centre, University of Toronto, Toronto, ON, M5S 3E1, Canada.
Department of Molecular Genetics, University of Toronto, Toronto, ON, M5S 1A8, Canada.
Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, ON, M5G 1X5, Canada.

Predrag Radivojac (P)

Khoury College of Computer Sciences, Northeastern University, Boston, MA, 02115, USA.

Steven E Brenner (SE)

Department of Plant and Microbial Biology, University of California, Berkeley, CA, 94720, USA.
Center for Computational Biology, University of California, Berkeley, Berkeley, CA, 94720, USA.
Biophysics Graduate Group, University of California, Berkeley, Berkeley, CA, 94720, USA.

Qian Cong (Q)

Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA. qian.cong@UTSouthwestern.edu.
Department of Biochemistry, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA. qian.cong@UTSouthwestern.edu.
Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA. qian.cong@UTSouthwestern.edu.
Harold C. Simmons Comprehensive Cancer Center, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA. qian.cong@UTSouthwestern.edu.

Nick V Grishin (NV)

Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA. grishin@chop.swmed.edu.
Department of Biochemistry, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA. grishin@chop.swmed.edu.

MeSH classifications