Improved prediction of bacterial CRISPRi guide efficiency from depletion screens through mixed-effect machine learning and data integration.
Journal
Genome biology
ISSN: 1474-760X
Abbreviated title: Genome Biol
Country: England
NLM ID: 100960660
Publication information
Publication date:
11 Jan 2024
History:
received: 19 Jun 2022
accepted: 20 Dec 2023
medline: 11 Jan 2024
pubmed: 11 Jan 2024
entrez: 10 Jan 2024
Status:
epublish
Abstract
CRISPR interference (CRISPRi) is the leading technique to silence gene expression in bacteria; however, design rules remain poorly defined. We develop a best-in-class prediction algorithm for guide silencing efficiency by systematically investigating factors influencing guide depletion in genome-wide essentiality screens, with the surprising discovery that gene-specific features substantially impact prediction. We develop a mixed-effect random forest regression model that provides better estimates of guide efficiency. We further apply methods from explainable AI to extract interpretable design rules from the model. This study provides a blueprint for predictive models for CRISPR technologies where only indirect measurements of guide activity are available.
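The idea of separating gene-level effects from guide-level effects can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single random intercept per gene, simulated data, and an EM-style alternation between a scikit-learn random forest (fit on guide features) and shrunken per-gene offsets. The function name `fit_random_intercept_forest`, the toy features, and all parameter values are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def fit_random_intercept_forest(X, y, genes, n_iter=5, seed=0):
    """Alternate between a forest on guide features and per-gene intercepts.

    Assumed model: y = f(X) + b[gene] + noise, with b ~ N(0, var_b).
    """
    gene_ids, idx = np.unique(genes, return_inverse=True)
    counts = np.bincount(idx, minlength=len(gene_ids))
    b = np.zeros(len(gene_ids))
    var_b, var_e = 1.0, 1.0
    forest = None
    for _ in range(n_iter):
        # Fixed part: fit the forest after removing the current gene offsets.
        forest = RandomForestRegressor(
            n_estimators=100, oob_score=True, random_state=seed
        )
        forest.fit(X, y - b[idx])
        # Out-of-bag predictions avoid reusing each sample's own target.
        resid = y - forest.oob_prediction_
        # Random part: shrunken mean residual per gene.
        sums = np.bincount(idx, weights=resid, minlength=len(gene_ids))
        b = var_b * sums / (var_e + counts * var_b)
        # Update the variance components from the posterior of b.
        post_var = var_b * var_e / (var_e + counts * var_b)
        eps = resid - b[idx]
        var_e = (np.sum(eps ** 2) + np.sum(counts * post_var)) / len(y)
        var_b = np.mean(b ** 2 + post_var)
    return forest, gene_ids, b, var_b, var_e


# Toy data (entirely simulated): 30 genes x 20 guides, 4 guide features.
rng = np.random.default_rng(0)
n_genes, n_guides = 30, 20
genes = np.repeat(np.arange(n_genes), n_guides)
X = rng.normal(size=(n_genes * n_guides, 4))
true_gene_effect = rng.normal(scale=2.0, size=n_genes)
noise = rng.normal(scale=0.5, size=len(genes))
y = 2.0 * X[:, 0] + true_gene_effect[genes] + noise

forest, gene_ids, b, var_gene, var_noise = fit_random_intercept_forest(X, y, genes)
print("correlation between estimated and simulated gene effects:",
      np.corrcoef(b, true_gene_effect)[0, 1])
```

In this sketch, `forest.predict` on new guide features returns only the gene-independent part of the prediction, which is the quantity of interest when depletion is an indirect readout of guide activity.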
Identifiers
pubmed: 38200565
doi: 10.1186/s13059-023-03153-y
pii: 10.1186/s13059-023-03153-y
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
13
Grants
Agency: Bayerisches Staatsministerium für Bildung und Kultus, Wissenschaft und Kunst
ID: Research network bayresq.net
Copyright information
© 2024. The Author(s).