Extracting, filtering and simulating cellular barcodes using CellBarcode tools.
Journal
Nature computational science
ISSN: 2662-8457
Abbreviated title: Nat Comput Sci
Country: United States
NLM ID: 101775476
Publication information
Publication date: 19 Feb 2024
History:
received: 21 Jun 2023
accepted: 16 Jan 2024
medline: 20 Feb 2024
pubmed: 20 Feb 2024
entrez: 20 Feb 2024
Status: aheadofprint
Abstract
Identifying true DNA cellular barcodes among polymerase chain reaction and sequencing errors is challenging. Current tools are restricted in the diversity of barcode types supported or the analysis strategies implemented. As such, there is a need for more versatile and efficient tools for barcode extraction, as well as for tools to investigate which factors impact barcode detection and which filtering strategies are best to apply. Here we introduce the package CellBarcode and its barcode simulation kit, CellBarcodeSim, which allow efficient and versatile barcode extraction and filtering for a range of barcode types from bulk or single-cell sequencing data using a variety of filtering strategies. Using the barcode simulation kit and biological data, we explore the technical and biological factors influencing barcode identification and provide a decision tree on how to optimize barcode identification for different barcode settings. We believe that CellBarcode and CellBarcodeSim have the capability to enhance the reproducibility and interpretation of barcode results across studies.
Identifiers
pubmed: 38374363
doi: 10.1038/s43588-024-00595-7
pii: 10.1038/s43588-024-00595-7
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Grants
Agency: Schlumberger Foundation
ID: FSER20200211117
Agency: Fondation ARC pour la Recherche sur le Cancer (ARC Foundation for Cancer Research)
ID: ARCPGA2021120004232_4874
Agency: EC | EU Framework Programme for Research and Innovation H2020 | H2020 Priority Excellent Science | H2020 European Research Council (H2020 Excellent Science - European Research Council)
ID: ERC StG 758170-Microbar
Agency: Centre National de la Recherche Scientifique (National Center for Scientific Research)
ID: ATIPAvenir
Copyright information
© 2024. The Author(s).