CREDO: a friendly Customizable, REproducible, DOcker file generator for bioinformatics applications.
Bioinformatics
Docker
Open science
Reproducibility
Software sharing
Journal
BMC bioinformatics
ISSN: 1471-2105
Abbreviated title: BMC Bioinformatics
Country: England
NLM ID: 100965194
Publication information
Publication date:
12 Mar 2024
History:
received: 14 Mar 2023
accepted: 09 Feb 2024
medline: 13 Mar 2024
pubmed: 13 Mar 2024
entrez: 13 Mar 2024
Status:
epublish
Abstract
The analysis of large and complex biological datasets in bioinformatics poses a significant challenge to achieving reproducible research outcomes due to inconsistencies and the lack of standardization in the analysis process. These issues can lead to discrepancies in results, undermining the credibility and impact of bioinformatics research and creating mistrust in the scientific process. To address these challenges, open science practices such as sharing data, code, and methods have been encouraged. CREDO, a Customizable, REproducible, DOcker file generator for bioinformatics applications, has been developed as a tool to mitigate reproducibility issues by building and distributing Docker containers with embedded bioinformatics tools. CREDO simplifies the process of generating Docker images, facilitating reproducibility and efficient research in bioinformatics. The crucial step in generating a Docker image is creating the Dockerfile, which requires incorporating heterogeneous packages and environments such as Bioconductor and Conda. CREDO stores all required package information and dependencies in a GitHub-compatible format to enhance Docker image reproducibility, allowing easy image creation from scratch. The user-friendly GUI and CREDO's ability to generate modular Docker images make it well suited to life scientists who need to create Docker images efficiently. Overall, CREDO is a valuable tool for addressing reproducibility issues in bioinformatics research and promoting open science practices.
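The idea of deriving a Dockerfile from a stored package specification can be illustrated with a minimal Python sketch. This is not CREDO's implementation: the function name `render_dockerfile`, the layout of `spec`, and the base image name are hypothetical, and the sketch assumes the base image already provides `conda` and an R installation with BiocManager.

```python
from typing import Dict, List


def render_dockerfile(base_image: str,
                      conda_packages: Dict[str, str],
                      bioc_packages: List[str]) -> str:
    """Render Dockerfile text from a package specification (illustrative only)."""
    lines = [f"FROM {base_image}"]
    if conda_packages:
        # Sort so that the same specification always yields the same file.
        pinned = " ".join(f"{name}={version}"
                          for name, version in sorted(conda_packages.items()))
        lines.append(f"RUN conda install -y -c conda-forge -c bioconda {pinned}")
    if bioc_packages:
        quoted = ", ".join(f"'{name}'" for name in sorted(bioc_packages))
        # Assumes BiocManager is already installed in the base image.
        lines.append(f'RUN R -e "BiocManager::install(c({quoted}))"')
    return "\n".join(lines) + "\n"


# Hypothetical specification; it could equally be kept as a JSON file under version control.
spec = {
    "base_image": "example/bioinfo-base:1.0",  # hypothetical image name
    "conda_packages": {"samtools": "1.17", "bwa": "0.7.17"},
    "bioc_packages": ["limma", "DESeq2"],
}
dockerfile_text = render_dockerfile(**spec)
print(dockerfile_text)
```

Keeping the specification as plain data, separate from the rendered Dockerfile, is one way to make an image rebuildable from scratch: the same input always produces the same file.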
Identifiers
pubmed: 38475691
doi: 10.1186/s12859-024-05695-9
pii: 10.1186/s12859-024-05695-9
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
110
Grants
Agency: National Centre for HPC, Big Data and Quantum Computing
ID: CN00000013
Copyright information
© 2024. The Author(s).