De novo detection of somatic mutations in high-throughput single-cell profiling data sets.


Journal

Nature biotechnology
ISSN: 1546-1696
Abbreviated title: Nat Biotechnol
Country: United States
NLM ID: 9604648

Publication information

Publication date:
06 Jul 2023
History:
received: 23 11 2022
accepted: 07 06 2023
medline: 7 7 2023
pubmed: 7 7 2023
entrez: 6 7 2023
Status: aheadofprint

Abstract

Characterization of somatic mutations at single-cell resolution is essential to study cancer evolution, clonal mosaicism and cell plasticity. Here, we describe SComatic, an algorithm designed to detect somatic mutations directly in single-cell transcriptomic and ATAC-seq (assay for transposase-accessible chromatin using sequencing) data sets, without requiring matched bulk or single-cell DNA sequencing data. SComatic distinguishes somatic mutations from polymorphisms, RNA-editing events and artefacts using filters and statistical tests parameterized on non-neoplastic samples. Using >2.6 million single cells from 688 single-cell RNA-seq (scRNA-seq) and single-cell ATAC-seq (scATAC-seq) data sets spanning cancer and non-neoplastic samples, we show that SComatic detects mutations in single cells accurately, even in differentiated cells from polyclonal tissues that are not amenable to mutation detection using existing methods. Validated against matched genome sequencing and scRNA-seq data, SComatic achieves F1 scores between 0.6 and 0.7 across diverse data sets, in comparison to 0.2–0.4 for the second-best performing method. In summary, SComatic permits de novo mutational signature analysis, and the study of clonal heterogeneity and mutational burdens at single-cell resolution.
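
For readers unfamiliar with the F1 score quoted in the abstract, the following is a minimal sketch of how such a score is generally computed when a set of called mutations is compared with a validation set. It is not the authors' evaluation code: the function name `f1_score`, the `(chrom, pos, ref, alt)` tuple representation and the example coordinates are all hypothetical and chosen only for illustration.

```python
def f1_score(called, truth):
    """Return the F1 score of a call set against a truth set.

    Both arguments are iterables of hashable mutation keys, here
    hypothetical (chrom, pos, ref, alt) tuples.
    """
    called, truth = set(called), set(truth)
    tp = len(called & truth)   # calls confirmed by the truth set
    fp = len(called - truth)   # calls absent from the truth set
    fn = len(truth - called)   # true mutations that were missed
    if tp == 0:
        return 0.0             # avoids division by zero when nothing matches
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Illustrative, made-up data: four validated mutations, four calls, three shared.
truth = [("chr1", 100, "C", "T"), ("chr1", 250, "G", "A"),
         ("chr2", 75, "A", "G"), ("chr3", 910, "T", "C")]
called = [("chr1", 100, "C", "T"), ("chr1", 250, "G", "A"),
          ("chr2", 75, "A", "G"), ("chr5", 42, "C", "A")]
example_f1 = f1_score(called, truth)
```

Because F1 is a harmonic mean, a method cannot reach a high score through precision or recall alone; both must be reasonably high.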

Identifiers

pubmed: 37414936
doi: 10.1038/s41587-023-01863-z
pii: 10.1038/s41587-023-01863-z

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Grants

Agency: NHLBI NIH HHS
ID: R01 HL158269
Country: United States

Copyright information

© 2023. The Author(s).

Authors

Francesc Muyas (F)

European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridge, UK.

Carolin M Sauer (CM)

European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridge, UK.

Jose Espejo Valle-Inclán (JE)

European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridge, UK.

Ruoyan Li (R)

Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK.

Raheleh Rahbari (R)

Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK.

Thomas J Mitchell (TJ)

Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK.
Cambridge University Hospitals NHS Foundation Trust and NIHR Cambridge Biomedical Research Centre, Cambridge, UK.
Department of Surgery, University of Cambridge, Cambridge, UK.

Sahand Hormoz (S)

Department of Systems Biology, Harvard Medical School, Boston, MA, USA.
Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, USA.
Broad Institute of MIT and Harvard, Cambridge, MA, USA.

Isidro Cortés-Ciriano (I)

European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridge, UK. icortes@ebi.ac.uk.

MeSH classifications