Codon usage and expression-based features significantly improve prediction of CRISPR efficiency.
Journal
NPJ systems biology and applications
ISSN: 2056-7189
Abbreviated title: NPJ Syst Biol Appl
Country: England
NLM ID: 101677786
Publication information
Publication date:
03 Sep 2024
History:
received: 23 Feb 2024
accepted: 27 Aug 2024
medline: 4 Sep 2024
pubmed: 4 Sep 2024
entrez: 3 Sep 2024
Status:
epublish
Abstract
CRISPR is a precise and effective genome editing technology, but despite several advances over the last decade, our ability to design gRNAs computationally remains limited. Most predictive models have relatively low predictive power and use only the sequence of the target site as input. Here we suggest a new category of features, which incorporate the genomic position of the target site and the presence of genes close to it. We calculate four features based on gene expression and codon usage bias indices. We show, on CRISPR datasets taken from three different cell types, that such features perform comparably with 425 state-of-the-art predictive features, ranking in the top 2-12% of features. We trained new predictive models, showing that adding expression features to them significantly improves their r
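The abstract does not say which codon usage bias indices were used, so the following is only an illustrative sketch of one widely used index, the Codon Adaptation Index (CAI, Sharp and Li, 1987), as it might be computed for a gene near a target site. The function names (`split_codons`, `relative_adaptiveness`, `cai`), the pseudocount of 0.5 for unseen codons, and the toy reference sequence are all assumptions made for this example, not the authors' implementation.

```python
import math
from collections import Counter

# Standard genetic code (NCBI translation table 1), codons ordered over T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO))


def split_codons(seq):
    """Split a coding sequence into whole codons, dropping any trailing partial codon."""
    seq = seq.upper().replace("U", "T")
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def relative_adaptiveness(reference_cds, pseudocount=0.5):
    """Weight each codon by its count relative to the most used synonymous codon.

    reference_cds: coding sequences of a reference gene set (hypothetical input,
    typically highly expressed genes). Codons never seen in the reference get
    `pseudocount` (an assumption of this sketch) so that no weight is zero.
    Met, Trp and stop codons are excluded.
    """
    counts = Counter(c for cds in reference_cds for c in split_codons(cds))
    weights = {}
    for aa in set(AMINO) - {"*", "M", "W"}:
        family = [c for c, a in CODON_TABLE.items() if a == aa]
        fam = {c: counts[c] if counts[c] > 0 else pseudocount for c in family}
        top = max(fam.values())
        weights.update({c: n / top for c, n in fam.items()})
    return weights


def cai(cds, weights):
    """Geometric mean of the codon weights over the scored codons of one gene."""
    logs = [math.log(weights[c]) for c in split_codons(cds) if c in weights]
    if not logs:
        raise ValueError("sequence contains no scorable codons")
    return math.exp(sum(logs) / len(logs))


# Toy usage: the reference set prefers GCT over GCC for alanine.
toy_weights = relative_adaptiveness(["GCTGCTGCTGCC"])
print(cai("GCTGCT", toy_weights), cai("GCTGCC", toy_weights))
```

A gene built only from the preferred codons scores 1, and any use of rarer synonymous codons lowers the score. In a feature pipeline of the kind the abstract describes, such a per-gene value would then be attached to target sites located in or near that gene; how the authors map genes to target sites is not stated in this record.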
Identifiers
pubmed: 39227603
doi: 10.1038/s41540-024-00431-8
pii: 10.1038/s41540-024-00431-8
Chemical substances
RNA, Guide, CRISPR-Cas Systems
0
Codon
0
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
100
Copyright information
© 2024. The Author(s).