Extracting, filtering and simulating cellular barcodes using CellBarcode tools.


Journal

Nature Computational Science
ISSN: 2662-8457
Abbreviated title: Nat Comput Sci
Country: United States
NLM ID: 101775476

Publication information

Publication date:
19 Feb 2024
History:
received: 21 Jun 2023
accepted: 16 Jan 2024
medline: 20 Feb 2024
pubmed: 20 Feb 2024
entrez: 20 Feb 2024
Status: aheadofprint

Abstract

Identifying true DNA cellular barcodes among polymerase chain reaction and sequencing errors is challenging. Current tools are restricted in the diversity of barcode types they support or in the analysis strategies they implement. As such, there is a need for more versatile and efficient tools for barcode extraction, as well as for tools to investigate which factors impact barcode detection and which filtering strategies are best to apply. Here we introduce the package CellBarcode and its barcode simulation kit, CellBarcodeSim, which allow efficient and versatile barcode extraction and filtering for a range of barcode types from bulk or single-cell sequencing data using a variety of filtering strategies. Using the barcode simulation kit and biological data, we explore the technical and biological factors influencing barcode identification and provide a decision tree on how to optimize barcode identification for different barcode settings. We believe that CellBarcode and CellBarcodeSim have the capability to enhance the reproducibility and interpretation of barcode results across studies.
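
The two steps named in the abstract, barcode extraction and filtering, can be illustrated with a minimal Python sketch. This is a generic illustration only, not the CellBarcode API (CellBarcode is distributed through Bioconductor). The flanking sequences, the 6-nt barcode length, the example reads and the `min_count` threshold are all hypothetical values chosen for the example, and a simple read-count cutoff stands in for the several filtering strategies the abstract mentions.

```python
import re
from collections import Counter

# Hypothetical constant flanks around a 6-nt barcode; not taken from the paper.
PATTERN = re.compile(r"ACGT([ACGT]{6})TTAA")


def extract_barcodes(reads):
    """Count the barcode sequences found between the two flanking sequences."""
    counts = Counter()
    for read in reads:
        match = PATTERN.search(read)
        if match:  # reads without the expected flanks are ignored
            counts[match.group(1)] += 1
    return counts


def filter_barcodes(counts, min_count=2):
    """Keep barcodes supported by at least min_count reads (assumed cutoff)."""
    return {barcode: n for barcode, n in counts.items() if n >= min_count}


# Toy reads: two real barcodes, one single-base error, one read without flanks.
reads = [
    "GGACGTAAACCCTTAAGG",
    "GGACGTAAACCCTTAAGG",
    "GGACGTAAACCCTTAAGG",
    "GGACGTAAACCGTTAAGG",  # differs from AAACCC at one position
    "GGACGTGGGTTTTTAAGG",
    "GGACGTGGGTTTTTAAGG",
    "GGTTTTTTTTTTTTTTGG",  # no flanking sequences, so no barcode is extracted
]

counts = extract_barcodes(reads)
kept = filter_barcodes(counts, min_count=2)
print(kept)
```

With these toy reads, the singleton `AAACCG` is discarded and `AAACCC` (3 reads) and `GGGTTT` (2 reads) are kept. A fixed count threshold is the simplest possible filter; which strategy suits a given barcode design is the question the paper's decision tree addresses.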

Identifiers

pubmed: 38374363
doi: 10.1038/s43588-024-00595-7
pii: 10.1038/s43588-024-00595-7

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Grants

Agency: Schlumberger Foundation
ID: FSER20200211117
Agency: Fondation ARC pour la Recherche sur le Cancer (ARC Foundation for Cancer Research)
ID: ARCPGA2021120004232_4874
Agency: EC | EU Framework Programme for Research and Innovation H2020 | H2020 Priority Excellent Science | H2020 European Research Council (H2020 Excellent Science - European Research Council)
ID: ERC StG 758170-Microbar
Agency: Centre National de la Recherche Scientifique (National Center for Scientific Research)
ID: ATIPAvenir

Copyright information

© 2024. The Author(s).

Authors

Wenjie Sun (W)

Institut Curie, Université PSL, Sorbonne Université, CNRS UMR168, Laboratoire Physico Chimie Curie, Paris, France. sunwjie@gmail.com.

Meghan Perkins (M)

Institut Curie, Laboratory of Genetics and Developmental Biology, PSL Research University, INSERM U934, CNRS UMR3215, Paris, France.

Mathilde Huyghe (M)

Institut Curie, Laboratory of Genetics and Developmental Biology, PSL Research University, INSERM U934, CNRS UMR3215, Paris, France.

Marisa M Faraldo (MM)

Institut Curie, Laboratory of Genetics and Developmental Biology, PSL Research University, INSERM U934, CNRS UMR3215, Paris, France.

Silvia Fre (S)

Institut Curie, Laboratory of Genetics and Developmental Biology, PSL Research University, INSERM U934, CNRS UMR3215, Paris, France.

Leïla Perié (L)

Institut Curie, Université PSL, Sorbonne Université, CNRS UMR168, Laboratoire Physico Chimie Curie, Paris, France. leila.perie@curie.fr.

Anne-Marie Lyne (AM)

Institut Curie, Université PSL, Sorbonne Université, CNRS UMR168, Laboratoire Physico Chimie Curie, Paris, France. anne-marie.lyne@curie.fr.
INSERM U900, Paris, France. anne-marie.lyne@curie.fr.
MINES ParisTech, CBIO-Centre for Computational Biology, PSL Research University, Paris, France. anne-marie.lyne@curie.fr.

MeSH classifications