Synergizing Off-Target Predictions for In Silico Insights of CENH3 Knockout in Cannabis through CRISPR/Cas.
CENH3
CFD score
MIT score
ensemble model
genome editing
hemp
machine learning algorithm
marijuana
sgRNA
Journal
Molecules (Basel, Switzerland)
ISSN: 1420-3049
Abbreviated title: Molecules
Country: Switzerland
NLM ID: 100964009
Publication information
Publication date:
03 Apr 2021
History:
received: 05 Feb 2021
revised: 25 Mar 2021
accepted: 31 Mar 2021
entrez: 30 Apr 2021
pubmed: 1 May 2021
medline: 13 May 2021
Status:
epublish
Abstract
The clustered regularly interspaced short palindromic repeats (CRISPR)/Cas-mediated genome editing system has recently been used for haploid production in plants. Haploid induction using the CRISPR/Cas system is an attractive approach in cannabis, an economically important industrial, recreational, and medicinal plant. However, the CRISPR system requires the design of precise (on-target) single-guide RNAs (sgRNAs), so it is essential to predict the off-target activity of the designed sgRNAs to avoid unexpected outcomes. This study aimed to assess the predictive ability of three machine learning (ML) algorithms (radial basis function (RBF), support vector machine (SVM), and random forest (RF)) alongside the ensemble-bagging (E-B) strategy, synergizing MIT and cutting frequency determination (CFD) scores to predict sgRNA off-target activity by in silico targeting of a histone H3-like centromeric protein, HTR12, in cannabis. Among the individual algorithms, RF exhibited the highest precision, recall, and F-measure, with values of 0.61, 0.64, and 0.62, respectively. We then used the RF algorithm as a meta-classifier for the E-B method, which increased precision and F-measure to 0.62 and 0.66, respectively. The E-B algorithm had the highest area under the precision-recall curve (AUC-PRC; 0.74) and area under the receiver operating characteristic (ROC) curve (AUC-ROC; 0.71), demonstrating the success of E-B as one of the common ensemble strategies. This study constitutes a foundational resource for using ML models to predict gRNA off-target activities in cannabis.
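The metrics quoted in the abstract can be related to one another with a short sketch. It assumes the F-measure is the standard harmonic mean of precision and recall (the abstract does not say whether the reported values are per-class or averaged), and the label vectors below are hypothetical toy data, not results from the study.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (assumed definition of F-measure)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall(y_true, y_pred):
    """Precision and recall for the positive class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Precision and recall reported for random forest in the abstract.
print(round(f_measure(0.61, 0.64), 2))  # 0.62, matching the reported F-measure

# Hypothetical toy labels: 1 = off-target site predicted active, 0 = inactive.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 1, 0, 0, 1, 0]
p, r = precision_recall(y_true, y_pred)
print(p, r, f_measure(p, r))
```

Under this assumed definition, the reported RF precision (0.61) and recall (0.64) reproduce the reported F-measure of 0.62 after rounding.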
Identifiers
pubmed: 33916717
pii: molecules26072053
doi: 10.3390/molecules26072053
pmc: PMC8038328
Chemical substances
Histones
0
Publication types
Journal Article
Languages
eng
Citation subsets
IM