CREDO: a friendly Customizable, REproducible, DOcker file generator for bioinformatics applications.

Keywords: Bioinformatics; Docker; Open science; Reproducibility; Software sharing

Journal

BMC bioinformatics
ISSN: 1471-2105
Abbreviated title: BMC Bioinformatics
Country: England
NLM ID: 100965194

Publication information

Publication date:
12 Mar 2024
History:
received: 14 Mar 2023
accepted: 09 Feb 2024
medline: 13 Mar 2024
pubmed: 13 Mar 2024
entrez: 13 Mar 2024
Status: epublish

Abstract

The analysis of large and complex biological datasets poses a significant challenge to reproducible research, because analysis workflows are often inconsistent and poorly standardized. These issues can lead to discrepancies in results, undermining the credibility and impact of bioinformatics research and creating mistrust in the scientific process. To address these challenges, open science practices such as sharing data, code, and methods have been encouraged. CREDO, a Customizable, REproducible, DOcker file generator for bioinformatics applications, has been developed to mitigate reproducibility issues by building and distributing Docker containers with embedded bioinformatics tools. CREDO simplifies the generation of Docker images, facilitating reproducible and efficient research in bioinformatics. The crucial step in generating a Docker image is creating the Dockerfile, which requires incorporating heterogeneous packages and environments such as Bioconductor and Conda. CREDO stores all required package information and dependencies in a GitHub-compatible format to enhance Docker image reproducibility, so that an image can easily be rebuilt from scratch. Its user-friendly GUI and its ability to generate modular Docker images make it well suited to life scientists who need to create Docker images efficiently. Overall, CREDO is a valuable tool for addressing reproducibility issues in bioinformatics research and promoting open science practices.
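
To make the idea of generating a Dockerfile from stored package information concrete, the following is a minimal Python sketch. It is not CREDO's implementation or file format: the `ImageSpec` class, the `render_dockerfile` function, and the base image name in the usage note are all hypothetical, and the sketch assumes the base image already provides `conda`, R, and the BiocManager package.

```python
from dataclasses import dataclass, field


@dataclass
class ImageSpec:
    """Hypothetical package manifest (an assumption, not CREDO's format)."""
    base_image: str
    apt: list[str] = field(default_factory=list)
    conda: list[str] = field(default_factory=list)
    bioconductor: list[str] = field(default_factory=list)


def render_dockerfile(spec: ImageSpec) -> str:
    """Render a Dockerfile as text from a package manifest.

    Package lists are sorted so the same manifest always yields the same
    text, which keeps the generated file stable under version control.
    """
    lines = [f"FROM {spec.base_image}"]
    if spec.apt:
        pkgs = " ".join(sorted(spec.apt))
        lines.append(
            "RUN apt-get update && apt-get install -y --no-install-recommends "
            f"{pkgs} && rm -rf /var/lib/apt/lists/*"
        )
    if spec.conda:
        # Assumes conda is already installed in the base image.
        pkgs = " ".join(sorted(spec.conda))
        lines.append(f"RUN conda install -y -c conda-forge -c bioconda {pkgs}")
    if spec.bioconductor:
        # Assumes R and BiocManager are already installed in the base image.
        quoted = ", ".join(f'"{p}"' for p in sorted(spec.bioconductor))
        lines.append(
            f"RUN Rscript -e 'BiocManager::install(c({quoted}), ask = FALSE)'"
        )
    return "\n".join(lines) + "\n"
```

A call such as `render_dockerfile(ImageSpec(base_image="example/base:1.0", conda=["samtools=1.17"], bioconductor=["limma", "DESeq2"]))` returns the Dockerfile text, which could be committed to a repository next to the manifest so the image can be rebuilt later. Pinning versions in the manifest, as in `samtools=1.17`, is what makes the rebuild repeatable.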

Identifiers

pubmed: 38475691
doi: 10.1186/s12859-024-05695-9
pii: 10.1186/s12859-024-05695-9

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

110

Grants

Agency: National Centre for HPC, Big Data and Quantum Computing
ID: CN00000013

Copyright information

© 2024. The Author(s).

Authors

Simone Alessandri (S)

Polytechnic of Turin, Turin, Italy.

Maria L Ratto (ML)

Department of Molecular Biotechnology and Health Sciences, University of Torino, Turin, Italy.

Sergio Rabellino (S)

Department of Computer Science, University of Torino, Turin, Italy.

Gabriele Piacenti (G)

Department of Molecular Biotechnology and Health Sciences, University of Torino, Turin, Italy.

Sandro Gepiro Contaldo (SG)

Department of Computer Science, University of Torino, Turin, Italy.

Simone Pernice (S)

Department of Computer Science, University of Torino, Turin, Italy.

Marco Beccuti (M)

Department of Computer Science, University of Torino, Turin, Italy.

Raffaele A Calogero (RA)

Department of Molecular Biotechnology and Health Sciences, University of Torino, Turin, Italy. raffaele.calogero@unito.it.

Luca Alessandri (L)

Department of Molecular Biotechnology and Health Sciences, University of Torino, Turin, Italy.
Department of Pathology, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA.

MeSH classifications