BE-dataHIVE: a base editing database.


Journal

BMC bioinformatics
ISSN: 1471-2105
Abbreviated title: BMC Bioinformatics
Country: England
NLM ID: 100965194

Publication information

Publication date:
15 Oct 2024
History:
received: 04 02 2024
accepted: 13 08 2024
medline: 16 10 2024
pubmed: 16 10 2024
entrez: 15 10 2024
Status: epublish

Abstract

Base editing is an enhanced gene editing approach that enables the precise transformation of single nucleotides and has the potential to cure rare diseases. The design process of base editors is labour-intensive and outcomes are not easily predictable. For any clinical use, base editing has to be accurate and efficient, so bystander mutations have to be minimized. In recent years, computational models to predict base editing outcomes have been developed. However, the overall robustness and performance of those models are limited. One way to improve performance is to train models on a large, diverse, and feature-rich dataset, which does not yet exist for the base editing field. Hence, we develop BE-dataHIVE, a MySQL database that covers over 460,000 gRNA target combinations. The current version of BE-dataHIVE consists of data from five studies and is enriched with melting temperatures and energy terms. Furthermore, multiple data structures for machine learning were computed and are directly available. The database can be accessed via our website (https://be-datahive.com/) or an API and is therefore suitable for both practitioners and machine learning researchers.
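The abstract mentions enrichment with melting temperatures and precomputed data structures for machine learning, but it does not say how these are derived. As a rough illustration only, the sketch below shows two common choices: a one-hot encoding of a guide sequence and the Wallace-rule approximation of melting temperature. Both the function names and the example sequence are hypothetical, and neither method is claimed to be the one used in BE-dataHIVE.

```python
from typing import List

NUCLEOTIDES = "ACGT"  # column order of the one-hot encoding


def one_hot_encode(sequence: str) -> List[List[int]]:
    """Return one row per base, with a 1 in the column of that base."""
    encoded = []
    for base in sequence.upper():
        if base not in NUCLEOTIDES:
            raise ValueError(f"unexpected base: {base!r}")
        encoded.append([int(base == n) for n in NUCLEOTIDES])
    return encoded


def wallace_tm(sequence: str) -> int:
    """Approximate melting temperature in deg C: 2*(A+T) + 4*(G+C).

    The Wallace rule is a crude estimate intended for short oligos;
    it is an assumption here, not the database's actual method.
    """
    seq = sequence.upper()
    at_count = seq.count("A") + seq.count("T")
    gc_count = seq.count("G") + seq.count("C")
    return 2 * at_count + 4 * gc_count


# Hypothetical 20-nt protospacer, not taken from the database.
example_grna = "GACCGTTAGCATCGGATCCA"
features = one_hot_encode(example_grna)  # 20 rows x 4 columns
tm_estimate = wallace_tm(example_grna)
```

A matrix of this shape can be fed directly to convolutional or recurrent models, while a scalar such as the melting temperature is typically appended as an additional feature.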

Identifiers

pubmed: 39407093
doi: 10.1186/s12859-024-05898-0
pii: 10.1186/s12859-024-05898-0

Chemical substances

RNA, Guide, CRISPR-Cas Systems 0

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

330

Copyright information

© 2024. The Author(s).


Authors

Lucas Schneider

Department of Computer Science, University of Oxford, Parks Road, Oxford, OX1 3QD, UK. lucas.schneider@cs.ox.ac.uk.

Peter Minary

Department of Computer Science, University of Oxford, Parks Road, Oxford, OX1 3QD, UK. peter.minary@cs.ox.ac.uk.
