BE-dataHIVE: a base editing database.
Journal
BMC bioinformatics
ISSN: 1471-2105
Abbreviated title: BMC Bioinformatics
Country: England
NLM ID: 100965194
Publication information
Publication date:
15 Oct 2024
History:
received: 04 Feb 2024
accepted: 13 Aug 2024
medline: 16 Oct 2024
pubmed: 16 Oct 2024
entrez: 15 Oct 2024
Status:
epublish
Abstract
Base editing is an enhanced gene editing approach that enables the precise transformation of single nucleotides and has the potential to cure rare diseases. The design process of base editors is labour-intensive, and outcomes are not easily predictable. For any clinical use, base editing has to be accurate and efficient, so bystander mutations have to be minimized. In recent years, computational models to predict base editing outcomes have been developed. However, the overall robustness and performance of those models are limited. One way to improve performance is to train models on a large, diverse, and feature-rich dataset, which does not exist for the base editing field. Hence, we develop BE-dataHIVE, a MySQL database that covers over 460,000 gRNA target combinations. The current version of BE-dataHIVE consists of data from five studies and is enriched with melting temperatures and energy terms. Furthermore, multiple data structures for machine learning were computed and are directly available. The database can be accessed via our website (https://be-datahive.com/) or an API and is therefore suitable for practitioners and machine learning researchers.
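To make "melting temperatures" and "data structures for machine learning" concrete, the following is a minimal sketch of two such per-sequence features. It is not the method used to build BE-dataHIVE: the one-hot layout, the Wallace rule of thumb (Tm = 2(A+T) + 4(G+C), a rough estimate meant for short oligonucleotides), the function names, and the example sequence are all assumptions chosen for illustration.

```python
from typing import List

BASES = "ACGT"  # column order of the one-hot matrix (an assumption of this sketch)


def one_hot(seq: str) -> List[List[int]]:
    """Encode a DNA sequence as a len(seq) x 4 matrix in A, C, G, T column order."""
    seq = seq.upper()
    unknown = set(seq) - set(BASES)
    if unknown:
        raise ValueError(f"unsupported bases: {sorted(unknown)}")
    return [[1 if base == b else 0 for b in BASES] for base in seq]


def wallace_tm(seq: str) -> int:
    """Rule-of-thumb melting temperature in deg C: 2*(A+T) + 4*(G+C)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if at + gc != len(seq):
        raise ValueError("sequence contains characters other than A, C, G, T")
    return 2 * at + 4 * gc


# Hypothetical 20-nt protospacer, made up for this example (not taken from the database).
protospacer = "GACCTTGCAAGTCCGATTAC"
features = {
    "one_hot": one_hot(protospacer),       # 20 rows, one per nucleotide
    "tm_wallace": wallace_tm(protospacer),  # single scalar feature
}
```

A sequence-level scalar such as the melting temperature and a position-wise matrix such as the one-hot encoding are typically fed to different model types (tabular models versus convolutional or attention-based networks), which is one reason a database might ship several precomputed representations.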
Identifiers
pubmed: 39407093
doi: 10.1186/s12859-024-05898-0
pii: 10.1186/s12859-024-05898-0
Chemical substances
RNA, Guide, CRISPR-Cas Systems
0
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
330
Copyright information
© 2024. The Author(s).
References
Komor AC, Kim YB, Packer MS, Zuris JA, Liu DR. Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage. Nature. 2016;533(7603):420–4.
doi: 10.1038/nature17946
pubmed: 27096365
pmcid: 4873371
Giner G, Ikram S, Herold MJ, Papenfuss AT. A systematic review of computational methods for designing efficient guides for CRISPR DNA base editor systems. Brief Bioinform. 2023;24(4):bbad205.
doi: 10.1093/bib/bbad205
Pallaseni A, Peets EM, Koeppel J, Weller J, Vanderstichele T, Ho UL, Crepaldi L, van Leeuwen J, Allen F, Parts L. Predicting base editing outcomes using position-specific sequence determinants. Nucleic Acids Res. 2022;50(6):3551–64.
doi: 10.1093/nar/gkac161
pubmed: 35286377
pmcid: 8989541
Mak JK, Störtz F, Minary P. Comprehensive computational analysis of epigenetic descriptors affecting CRISPR-Cas9 off-target activity. BMC Genom. 2022;23:805.
doi: 10.1186/s12864-022-09012-7
Störtz F, Mak J, Minary P. piCRISPR: Physically informed deep learning models for CRISPR/Cas9 off-target cleavage prediction. Artif Intell Life Sci. 2023;3:100075.
Arbab M, Shen MW, Mok B, Wilson C, Matuszek Z, Cassa CA, Liu DR. Determinants of base editing outcomes from target library analysis and machine learning. Cell. 2020;182(2):463-480.e30.
doi: 10.1016/j.cell.2020.05.037
pubmed: 32533916
pmcid: 7384975
Song M, Kim HK, Lee S, Kim Y, Seo S-Y, Park J, Choi JW, Jang H, Shin JH, Min S, Quan Z, Kim JH, Kang HC, Yoon S, Kim HH. Sequence-specific prediction of the efficiencies of adenine and cytosine base editors. Nat Biotechnol. 2020;38(9):1037–43.
doi: 10.1038/s41587-020-0573-5
pubmed: 32632303
Yuan T, Yan N, Fei T, Zheng J, Meng J, Li N, Liu J, Zhang H, Xie L, Ying W, Li D, Shi L, Sun Y, Li Y, Li Y, Sun Y, Zuo E. Optimization of C-to-G base editors with sequence context preference predictable by machine learning methods. Nat Commun. 2021;12(1):4902.
doi: 10.1038/s41467-021-25217-y
pubmed: 34385461
pmcid: 8361092
Marquart KF, Allam A, Janjuha S, Sintsova A, Villiger L, Frey N, Krauthammer M, Schwank G. Predicting base editing outcomes with an attention-based deep learning algorithm trained on high-throughput target library screens. Nat Commun. 2021;12(1):5114.
doi: 10.1038/s41467-021-25375-z
pubmed: 34433819
pmcid: 8387386
Dandage R, Després PC, Yachie N, Landry CR. Beditor: a computational workflow for designing libraries of guide RNAs for CRISPR-mediated base editing. Genetics. 2019;212(2):377–85.
doi: 10.1534/genetics.119.302089
pubmed: 30936113
pmcid: 6553823
Koblan LW, Arbab M, Shen MW, Hussmann JA, Anzalone AV, Doman JL, Newby GA, Yang D, Mok B, Replogle JM, Albert X, Sisley TA, Weissman JS, Adamson B, Liu DR. Efficient C·G-to-G·C base editors developed using CRISPRi screens, target-library analysis, and machine learning. Nat Biotechnol. 2021;39(11):1414–25.
doi: 10.1038/s41587-021-00938-z
pubmed: 34183861
pmcid: 8985520
Störtz F, Minary P. crisprSQL: a novel database platform for CRISPR/Cas off-target cleavage assays. Nucleic Acids Res. 2021;49(D1):D855–61.
doi: 10.1093/nar/gkaa885
pubmed: 33084893
Alkan F, Wenzel A, Anthon C, Havgaard JH, Gorodkin J. CRISPR-Cas9 off-targeting assessment with nucleic acid duplex energy parameters. Genome Biol. 2018;19(1):177.
doi: 10.1186/s13059-018-1534-x
pubmed: 30367669
pmcid: 6203265
Gruber AR, Lorenz R, Bernhart SH, Neuböck R, Hofacker IL. The Vienna RNA websuite. Nucleic Acids Res. 2008;36:W70–4.
doi: 10.1093/nar/gkn188
pubmed: 18424795
pmcid: 2447809
Cock PJA, Antao T, Chang JT, Chapman BA, Cox CJ, Dalke A, Friedberg I, Hamelryck T, Kauff F, Wilczynski B, de Hoon MJL. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics. 2009;25(11):1422–3.
doi: 10.1093/bioinformatics/btp163
pubmed: 19304878
pmcid: 2682512
Ito EA, Katahira I, da Rocha Vicente FF, Pereira LFP, Lopes FM. BASiNET—biological sequences NETwork: a case study on coding and non-coding RNAs identification. Nucleic Acids Res. 2018;46(16):e96–e96.
doi: 10.1093/nar/gky462
pubmed: 29873784
pmcid: 6144827
Anjum MM, Tahmid IA, Rahman MS. CNN model with Hilbert curve representation of DNA sequence for enhancer prediction. bioRxiv. 2019. https://doi.org/10.1101/552141.
doi: 10.1101/552141
Zhang M, Hu Y, Zhu M. EPIsHilbert: Prediction of enhancer-promoter interactions via Hilbert curve encoding and transfer learning. Genes. 2021;12(9):1385.
doi: 10.3390/genes12091385
Hilbert D. Über die stetige Abbildung einer Linie auf ein Flächenstück. Math Ann. 1891. https://doi.org/10.1007/BF01199431.
doi: 10.1007/BF01199431
Hamilton WL, Ying R, Leskovec J. Inductive representation learning on large graphs. In: Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS'17. Red Hook, NY, USA: Curran Associates Inc.; 2017. p. 1025–35.
LeCun Y, Bottou L, Bengio Y, Haffner P. Gradient-based learning applied to document recognition. Proc IEEE. 1998;86(11):2278–324.
doi: 10.1109/5.726791