Genomic reproducibility in the bioinformatics era.

Keywords: Reproducibility; bioinformatics tools; genomics; synthetic replicates; technical replicates

Journal

Genome biology
ISSN: 1474-760X
Abbreviated title: Genome Biol
Country: England
NLM ID: 100960660

Publication information

Publication date:
09 Aug 2024
History:
received: 14 09 2023
accepted: 23 07 2024
medline: 10 8 2024
pubmed: 10 8 2024
entrez: 9 8 2024
Status: epublish

Abstract

In biomedical research, validating a scientific discovery hinges on the reproducibility of its experimental results. However, in genomics, the definition and implementation of reproducibility remain imprecise. We argue that genomic reproducibility, defined as the ability of bioinformatics tools to maintain consistent results across technical replicates, is essential for advancing scientific knowledge and medical applications. Initially, we examine different interpretations of reproducibility in genomics to clarify terms. Subsequently, we discuss the impact of bioinformatics tools on genomic reproducibility and explore methods for evaluating these tools regarding their effectiveness in ensuring genomic reproducibility. Finally, we recommend best practices to improve genomic reproducibility.
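
As an illustration of the definition above, the following is a minimal sketch of one way to quantify consistency across technical replicates: the mean pairwise Jaccard index over the variant call sets that a tool produces for each replicate. The metric, the function names (`jaccard`, `mean_pairwise_concordance`), and the toy variant tuples are assumptions made for this example; they are not taken from the article, which may evaluate tools differently.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two call sets; two empty sets count as identical."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def mean_pairwise_concordance(call_sets):
    """Average Jaccard index over all pairs of replicate call sets."""
    pairs = list(combinations(call_sets, 2))
    if not pairs:
        raise ValueError("need at least two replicates")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical calls (chromosome, position, ref, alt) from three technical
# replicates of the same sample, processed by the same tool.
replicate_calls = [
    {("chr1", 1000, "A", "G"), ("chr1", 2000, "C", "T"), ("chr2", 500, "G", "A")},
    {("chr1", 1000, "A", "G"), ("chr1", 2000, "C", "T")},
    {("chr1", 1000, "A", "G"), ("chr2", 500, "G", "A")},
]

score = mean_pairwise_concordance(replicate_calls)
print(f"mean pairwise concordance: {score:.3f}")
```

A score of 1.0 would mean the tool reported identical calls for every replicate; lower values indicate calls that appear in some replicates but not others.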

Identifiers

pubmed: 39123217
doi: 10.1186/s13059-024-03343-2
pii: 10.1186/s13059-024-03343-2

Publication types

Journal Article Review

Languages

eng

Citation subsets

IM

Pagination

213

Grants

Agency: National Science Foundation
ID: 2041984
Agency: National Science Foundation
ID: 2316223
Agency: NIH HHS
ID: R01AI173172
Country: United States

Copyright information

© 2024. The Author(s).

Authors

Pelin Icer Baykal (PI)

Department of Biosystems Science and Engineering, ETH Zurich, 4058, Basel, Switzerland.
SIB Swiss Institute of Bioinformatics, 4058, Basel, Switzerland.

Paweł Piotr Łabaj (PP)

Małopolska Centre of Biotechnology, Jagiellonian University, 30-387, Gronostajowa 7A, Krakow, Poland.
Department of Biotechnology, Boku University Vienna, Muthgasse 18, 1190, Vienna, Austria.

Florian Markowetz (F)

Cancer Research UK Cambridge Research Institute, Cambridge, CB2 0RE, UK.
Department of Oncology, University of Cambridge, Cambridge, CB2 2XZ, UK.

Lynn M Schriml (LM)

Institute for Genome Sciences, University of Maryland School of Medicine, HSFIII, 670 W. Baltimore St, Baltimore, MD, 21201, USA.

Daniel J Stekhoven (DJ)

SIB Swiss Institute of Bioinformatics, 4058, Basel, Switzerland.
NEXUS Personalized Health Technologies, ETH Zurich, 8952, Zurich, Switzerland.

Serghei Mangul (S)

Titus Family Department of Clinical Pharmacy, USC Alfred E. Mann School of Pharmacy and Pharmaceutical Sciences, University of Southern California, 1540 Alcazar Street, Los Angeles, CA, 90033, USA. serghei.mangul@gmail.com.
Department of Quantitative and Computational Biology, University of Southern California Dornsife College of Letters, Arts, and Sciences, Los Angeles, CA, 90089, USA. serghei.mangul@gmail.com.

Niko Beerenwinkel (N)

Department of Biosystems Science and Engineering, ETH Zurich, 4058, Basel, Switzerland. niko.beerenwinkel@bsse.ethz.ch.
SIB Swiss Institute of Bioinformatics, 4058, Basel, Switzerland. niko.beerenwinkel@bsse.ethz.ch.
