Genomic reproducibility in the bioinformatics era.
Keywords
Reproducibility
bioinformatics tools
genomics
synthetic replicates
technical replicates
Journal
Genome Biology
ISSN: 1474-760X
Abbreviated title: Genome Biol
Country: England
NLM ID: 100960660
Publication information
Publication date: 09 Aug 2024
History:
received: 14 Sep 2023
accepted: 23 Jul 2024
medline: 10 Aug 2024
pubmed: 10 Aug 2024
entrez: 9 Aug 2024
Status: epublish
Abstract
In biomedical research, validating a scientific discovery hinges on the reproducibility of its experimental results. In genomics, however, the definition and implementation of reproducibility remain imprecise. We argue that genomic reproducibility, defined as the ability of bioinformatics tools to maintain consistent results across technical replicates, is essential for advancing scientific knowledge and medical applications. First, we examine different interpretations of reproducibility in genomics to clarify terms. We then discuss the impact of bioinformatics tools on genomic reproducibility and explore methods for evaluating how effectively these tools ensure it. Finally, we recommend best practices to improve genomic reproducibility.
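The abstract's definition, consistent results across technical replicates, can be made concrete with a small illustration. The sketch below is one possible way to score consistency, not the method used in the article: it assumes variant calls are represented as (chromosome, position, ref, alt) tuples and uses the Jaccard index as the concordance measure. The function name `jaccard_concordance` and the example calls are hypothetical.

```python
from typing import Iterable, Tuple

# Assumed representation of a variant call: (chromosome, position, ref allele, alt allele).
Variant = Tuple[str, int, str, str]


def jaccard_concordance(calls_a: Iterable[Variant], calls_b: Iterable[Variant]) -> float:
    """Return the Jaccard index of two call sets: |A ∩ B| / |A ∪ B|.

    A value of 1.0 means the tool produced identical calls on both
    technical replicates; lower values mean less consistent output.
    """
    a, b = set(calls_a), set(calls_b)
    union = a | b
    if not union:
        # Two empty call sets are treated as fully concordant (an assumption).
        return 1.0
    return len(a & b) / len(union)


# Made-up calls from the same tool run on two technical replicates.
replicate_1 = [("chr1", 1000, "A", "G"), ("chr1", 2000, "C", "T"), ("chr2", 500, "G", "A")]
replicate_2 = [("chr1", 1000, "A", "G"), ("chr2", 500, "G", "A"), ("chr3", 42, "T", "C")]

score = jaccard_concordance(replicate_1, replicate_2)
print(f"Concordance across replicates: {score:.2f}")
```

Here two of the four distinct calls are shared, so the score is 0.50. Other measures, such as per-replicate overlap fractions or genotype-level agreement, would fit the same definition equally well.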
Identifiers
pubmed: 39123217
doi: 10.1186/s13059-024-03343-2
pii: 10.1186/s13059-024-03343-2
Publication types
Journal Article
Review
Languages
eng
Citation subsets
IM
Pagination
213
Grants
Agency: National Science Foundation
ID: 2041984
Agency: National Science Foundation
ID: 2316223
Agency: NIH HHS
ID: R01AI173172
Country: United States
Copyright information
© 2024. The Author(s).