ENCORE: a practical implementation to improve reproducibility and transparency of computational research.
Journal
Nature Communications
ISSN: 2041-1723
Abbreviated title: Nat Commun
Country: England
NLM ID: 101528555
Publication information
Publication date: 16 Sep 2024
History:
received: 19 Feb 2024
accepted: 6 Sep 2024
medline: 17 Sep 2024
pubmed: 17 Sep 2024
entrez: 16 Sep 2024
Status: epublish
Abstract
Reproducibility of computational research is often challenging despite established guidelines and best practices. Translating these guidelines into practical applications remains difficult. Here, we present ENCORE, an approach to enhance transparency and reproducibility by guiding researchers in how to structure and document a computational project. ENCORE builds on previous efforts in computational reproducibility and integrates all project components into a standardized file system structure. It utilizes pre-defined files as documentation templates, leverages GitHub for software versioning, and includes an HTML-based navigator. ENCORE is designed to be agnostic to the type of computational project, data, programming language, and ICT infrastructure, and does not rely on specific software tools. We also share our group's experience using ENCORE, highlighting that the most significant challenge to the routine adoption of approaches like ours is the lack of incentives to motivate researchers to dedicate sufficient time and effort to ensure reproducibility.
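The abstract describes ENCORE's core idea: a single standardized file system structure in which each directory carries a pre-defined documentation template, together with an HTML page for navigating the project. The Python sketch below illustrates that idea under stated assumptions. The directory names (`Data`, `Processing`, `Results`, `Manuscript`), the README wording, and the functions `scaffold` and `write_navigator` are hypothetical; they do not reproduce the actual ENCORE layout or its navigator, and GitHub versioning is not shown.

```python
"""Hypothetical sketch: scaffold a standardized project layout with
documentation templates and a minimal HTML navigator.

The directory names below are illustrative assumptions, NOT the actual
ENCORE file system structure."""
import html
import tempfile
from pathlib import Path

# Hypothetical layout: each top-level directory gets a README template.
LAYOUT = {
    "Data": "Describe the raw and processed data, their origin and format.",
    "Processing": "Describe the code, software versions and how to run it.",
    "Results": "Describe the outputs and which code produced them.",
    "Manuscript": "Link the manuscript figures and tables to their results.",
}


def scaffold(root: Path) -> list[Path]:
    """Create the directories and README templates; return the README paths."""
    readmes = []
    for name, prompt in LAYOUT.items():
        directory = root / name
        directory.mkdir(parents=True, exist_ok=True)
        readme = directory / "README.md"
        # Do not overwrite documentation a researcher has already written.
        if not readme.exists():
            readme.write_text(f"# {name}\n\n{prompt}\n", encoding="utf-8")
        readmes.append(readme)
    return readmes


def write_navigator(root: Path) -> Path:
    """Write a minimal index.html linking to every README.md under root."""
    links = [
        f'<li><a href="{html.escape(p.relative_to(root).as_posix())}">'
        f"{html.escape(p.parent.relative_to(root).as_posix())}</a></li>"
        for p in sorted(root.rglob("README.md"))
    ]
    page = (
        "<html><body><h1>Project navigator</h1>\n<ul>\n"
        + "\n".join(links)
        + "\n</ul>\n</body></html>\n"
    )
    index = root / "index.html"
    index.write_text(page, encoding="utf-8")
    return index


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        project = Path(tmp) / "my_project"
        scaffold(project)
        print(write_navigator(project).read_text(encoding="utf-8"))
```

Running the script builds the layout in a temporary directory and prints the generated index page. Existing README files are left untouched, so the scaffold can be re-run on a project that is already partly documented.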
Identifiers
pubmed: 39284801
doi: 10.1038/s41467-024-52446-8
pii: 10.1038/s41467-024-52446-8
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
8117
Grants
Agency: EC | EU Framework Programme for Research and Innovation H2020 | H2020 Priority Excellent Science | H2020 Marie Skłodowska-Curie Actions (H2020 Excellent Science - Marie Skłodowska-Curie Actions)
ID: 765158
Agency: EC | EU Framework Programme for Research and Innovation H2020 | H2020 Priority Excellent Science | H2020 Marie Skłodowska-Curie Actions (H2020 Excellent Science - Marie Skłodowska-Curie Actions)
ID: 847551
Agency: Innovative Medicines Initiative (IMI)
ID: 831434
Copyright information
© 2024. The Author(s).