Single-cell gene fusion detection by scFusion.


Journal

Nature communications
ISSN: 2041-1723
Abbreviated title: Nat Commun
Country: England
NLM ID: 101528555

Publication information

Publication date:
28 02 2022
History:
received: 23 03 2021
accepted: 03 02 2022
entrez: 1 3 2022
pubmed: 2 3 2022
medline: 13 4 2022
Status: epublish

Abstract

Gene fusions can play important roles in tumor initiation and progression. While fusion detection so far has been from bulk samples, full-length single-cell RNA sequencing (scRNA-seq) offers the possibility of detecting gene fusions at the single-cell level. However, scRNA-seq data have a high noise level and contain various technical artifacts that can lead to spurious fusion discoveries. Here, we present a computational tool, scFusion, for gene fusion detection based on scRNA-seq. We evaluate the performance of scFusion using simulated and five real scRNA-seq datasets and find that scFusion can efficiently and sensitively detect fusions with a low false discovery rate. In a T cell dataset, scFusion detects the invariant TCR gene recombinations in mucosal-associated invariant T cells that many methods developed for bulk data fail to detect; in a multiple myeloma dataset, scFusion detects the known recurrent fusion IgH-WHSC1, which is associated with overexpression of the WHSC1 oncogene. Our results demonstrate that scFusion can be used to investigate cellular heterogeneity of gene fusions and their transcriptional impact at the single-cell level.
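The abstract reports sensitivity and false discovery rate on simulated data without giving the computation. The following is a minimal sketch of how such metrics are conventionally computed from a set of fusion calls and a known simulated truth set. It is not scFusion's evaluation code: the function name `fusion_metrics`, the representation of a fusion as a (5' gene, 3' gene) tuple, and the placeholder gene names are all assumptions made for illustration.

```python
from typing import Iterable, Tuple

# Hypothetical representation: a fusion is an ordered (5' gene, 3' gene) pair.
Fusion = Tuple[str, str]


def fusion_metrics(predicted: Iterable[Fusion],
                   truth: Iterable[Fusion]) -> Tuple[float, float]:
    """Return (sensitivity, false discovery rate) for a set of fusion calls."""
    pred = set(predicted)
    true = set(truth)
    tp = len(pred & true)   # called and truly present
    fp = len(pred - true)   # called but not present (spurious)
    fn = len(true - pred)   # present but missed
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    fdr = fp / (tp + fp) if (tp + fp) else 0.0
    return sensitivity, fdr


# Placeholder data, not taken from the paper.
simulated_truth = [("GENE_A", "GENE_B"), ("GENE_C", "GENE_D"),
                   ("GENE_E", "GENE_F"), ("GENE_G", "GENE_H")]
calls = [("GENE_A", "GENE_B"), ("GENE_C", "GENE_D"),
         ("GENE_E", "GENE_F"), ("GENE_X", "GENE_Y")]

sensitivity, fdr = fusion_metrics(calls, simulated_truth)
print(f"sensitivity={sensitivity:.2f}, FDR={fdr:.2f}")
```

With these placeholder inputs, three of four true fusions are recovered and one of four calls is spurious. A real evaluation would also need a rule for matching breakpoints, which this sketch leaves out.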

Identifiers

pubmed: 35228538
doi: 10.1038/s41467-022-28661-6
pii: 10.1038/s41467-022-28661-6
pmc: PMC8885711

Publication types

Journal Article
Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Pagination

1084

Grants

Agency: National Natural Science Foundation of China (National Science Foundation of China)
ID: 11971039

Copyright information

© 2022. The Author(s).

Authors

Zijie Jin (Z)

School of Mathematical Sciences, Peking University, Beijing, 100871, China.

Wenjian Huang (W)

Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, 100871, China.

Ning Shen (N)

Liangzhu Laboratory, Zhejiang University Medical Center, Hangzhou, 311121, China.
Department of Biomedical Informatics, Harvard Medical School, Boston, MA, 02115, USA.

Juan Li (J)

Department of Biomedical Engineering, College of Engineering, Peking University, Beijing, 100871, China.

Xiaochen Wang (X)

School of Mathematical Sciences, Peking University, Beijing, 100871, China.

Jiqiao Dong (J)

GeneX Health Co. Ltd, Beijing, 100195, China.

Peter J Park (PJ)

Department of Biomedical Informatics, Harvard Medical School, Boston, MA, 02115, USA.

Ruibin Xi (R)

School of Mathematical Sciences, Peking University, Beijing, 100871, China. ruibinxi@math.pku.edu.cn.
Center for Statistical Science, Peking University, Beijing, 100871, China. ruibinxi@math.pku.edu.cn.
