Rosace: a robust deep mutational scanning analysis framework employing position and mean-variance shrinkage.


Journal

Genome biology
ISSN: 1474-760X
Abbreviated title: Genome Biol
Country: England
NLM ID: 100960660

Publication information

Publication date:
24 May 2024
History:
received: 31 Oct 2023
accepted: 14 May 2024
medline: 25 May 2024
pubmed: 25 May 2024
entrez: 24 May 2024
Status: epublish

Abstract

Deep mutational scanning (DMS) measures the effects of thousands of genetic variants in a protein simultaneously. The small sample size renders classical statistical methods ineffective. For example, p-values cannot be correctly calibrated when treating variants independently. We propose Rosace, a Bayesian framework for analyzing growth-based DMS data. Rosace leverages amino acid position information to increase power and control the false discovery rate by sharing information across parameters via shrinkage. We also developed Rosette for simulating the distributional properties of DMS. We show that Rosace is robust to the violation of model assumptions and is more powerful than existing tools.

Identifiers

pubmed: 38789982
doi: 10.1186/s13059-024-03279-7
pii: 10.1186/s13059-024-03279-7

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

138

Grants

Agency: Howard Hughes Medical Institute
ID: Hanna H. Gray Fellowship
Country: United States

Copyright information

© 2024. The Author(s).

Authors

Jingyou Rao (J)

Department of Computer Science, UCLA, Los Angeles, CA, USA.

Ruiqi Xin (R)

Computational and Systems Biology Interdepartmental Program, UCLA, Los Angeles, CA, USA.

Christian Macdonald (C)

Department of Bioengineering and Therapeutic Sciences, UCSF, San Francisco, CA, USA.

Matthew K Howard (MK)

Department of Bioengineering and Therapeutic Sciences, UCSF, San Francisco, CA, USA.
Tetrad Graduate Program, UCSF, San Francisco, CA, USA.
Department of Pharmaceutical Chemistry, UCSF, San Francisco, CA, USA.

Gabriella O Estevam (GO)

Department of Bioengineering and Therapeutic Sciences, UCSF, San Francisco, CA, USA.
Tetrad Graduate Program, UCSF, San Francisco, CA, USA.

Sook Wah Yee (SW)

Department of Bioengineering and Therapeutic Sciences, UCSF, San Francisco, CA, USA.

Mingsen Wang (M)

Department of Mathematics, Baruch College, CUNY, New York, NY, USA.

James S Fraser (JS)

Department of Bioengineering and Therapeutic Sciences, UCSF, San Francisco, CA, USA.
Quantitative Biosciences Institute, UCSF, San Francisco, CA, USA.

Willow Coyote-Maestas (W)

Department of Bioengineering and Therapeutic Sciences, UCSF, San Francisco, CA, USA. willow.coyote-maestas@ucsf.edu.
Quantitative Biosciences Institute, UCSF, San Francisco, CA, USA. willow.coyote-maestas@ucsf.edu.

Harold Pimentel (H)

Department of Computer Science, UCLA, Los Angeles, CA, USA. hjp@ucla.edu.
Department of Computational Medicine, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA. hjp@ucla.edu.
Department of Human Genetics, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA. hjp@ucla.edu.
