A Biterm Topic Model for Sparse Mutation Data.

Keywords: biterm topic model; mutational signature; panel sequencing data

Journal

Cancers
ISSN: 2072-6694
Abbreviated title: Cancers (Basel)
Country: Switzerland
NLM ID: 101526829

Publication information

Publication date:
04 Mar 2023
History:
received: 05 Jan 2023
revised: 28 Feb 2023
accepted: 03 Mar 2023
entrez: 11 Mar 2023
pubmed: 12 Mar 2023
medline: 12 Mar 2023
Status: epublish

Abstract

Mutational signature analysis promises to reveal the processes that shape cancer genomes, with applications in diagnosis and therapy. However, most current methods are geared toward rich mutation data extracted from whole-genome or whole-exome sequencing. Methods that process the sparse mutation data typically found in practice are only in the earliest stages of development. In particular, we previously developed the Mix model, which clusters samples to handle data sparsity. However, the Mix model had two hyper-parameters (the number of signatures and the number of clusters) that were very costly to learn. Therefore, we devised a new method that was based on mutation co-occurrences, mimicked word co-occurrence analyses of Twitter texts, and was several orders of magnitude more efficient for handling sparse data. We showed that the model produced significantly improved hyper-parameter estimates that led to higher likelihoods of discovering overlooked data and had better correspondence with known signatures.
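The abstract describes the model only at a high level. The sketch below illustrates the general biterm topic model (BTM) idea, originally proposed for short texts by Yan et al. (2013), transplanted to mutation data under stated assumptions; it is not the authors' implementation. It assumes each sample is a short "document" whose "words" are mutation-category indices (for example, positions among the 96 single-base-substitution categories), forms every unordered pair of mutations within a sample (a biterm), and fits topics ("signatures") over the pooled biterms with collapsed Gibbs sampling. The function names `extract_biterms` and `btm_gibbs`, the hyper-parameter defaults, and the simplified sampling weight are hypothetical choices for illustration.

```python
import itertools
import random


def extract_biterms(sample):
    """Return every unordered pair of mutation-category indices in one sample."""
    return [tuple(sorted(pair)) for pair in itertools.combinations(sample, 2)]


def btm_gibbs(samples, n_categories, n_topics, alpha=1.0, beta=0.01,
              n_iter=200, seed=0):
    """Hypothetical collapsed Gibbs sampler for a basic BTM over mutation biterms.

    Uses the simplified BTM conditional of Yan et al. (2013); this is an
    illustrative assumption, not the paper's code.
    """
    rng = random.Random(seed)
    # Pool biterms from all samples: sparse samples still contribute pairs.
    biterms = [b for s in samples for b in extract_biterms(s)]
    n_k = [0] * n_topics                                   # biterms per topic
    n_wk = [[0] * n_categories for _ in range(n_topics)]   # category counts per topic
    z = []

    # Random initial topic assignment for each biterm.
    for w1, w2 in biterms:
        k = rng.randrange(n_topics)
        z.append(k)
        n_k[k] += 1
        n_wk[k][w1] += 1
        n_wk[k][w2] += 1

    for _ in range(n_iter):
        for i, (w1, w2) in enumerate(biterms):
            # Remove the current assignment from the counts.
            k = z[i]
            n_k[k] -= 1
            n_wk[k][w1] -= 1
            n_wk[k][w2] -= 1
            # Unnormalized conditional for each topic; each biterm adds two
            # category counts, so a topic's total category count is 2 * n_k.
            weights = []
            for t in range(n_topics):
                denom = 2 * n_k[t] + n_categories * beta
                weights.append((n_k[t] + alpha)
                               * (n_wk[t][w1] + beta)
                               * (n_wk[t][w2] + beta)
                               / (denom * denom))
            k = rng.choices(range(n_topics), weights=weights)[0]
            # Add the new assignment back.
            z[i] = k
            n_k[k] += 1
            n_wk[k][w1] += 1
            n_wk[k][w2] += 1

    # Point estimates: per-topic distributions over categories ("signatures")
    # and corpus-level topic proportions.
    phi = [[(n_wk[t][w] + beta) / (2 * n_k[t] + n_categories * beta)
            for w in range(n_categories)] for t in range(n_topics)]
    theta = [(n_k[t] + alpha) / (len(biterms) + n_topics * alpha)
             for t in range(n_topics)]
    return phi, theta


if __name__ == "__main__":
    toy = [[0, 1, 1], [2, 3], [0, 1], [2, 3, 3]]   # toy samples, not real data
    print(extract_biterms([0, 1, 1]))
    phi, theta = btm_gibbs(toy, n_categories=4, n_topics=2, n_iter=50)
    print(theta)
```

A sample with n mutations yields n(n-1)/2 biterms, so even samples carrying only two or three mutations contribute co-occurrence evidence; this is the usual rationale for BTM on short texts. Note that `theta` here is a single corpus-level topic distribution, as in the basic BTM, rather than a per-sample exposure estimate.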

Identifiers

pubmed: 36900390
pii: cancers15051601
doi: 10.3390/cancers15051601
pmc: PMC10000560

Publication types

Journal Article

Languages

eng

Grants

Agency: United States-Israel Binational Science Foundation
ID: 2019336


Authors

Itay Sason

School of Computer Science, Tel Aviv University, Tel Aviv 69978, Israel.

Yuexi Chen

Department of Computer Science and Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD 20740, USA.

Mark D M Leiserson

Department of Computer Science and Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD 20740, USA.

Roded Sharan

School of Computer Science, Tel Aviv University, Tel Aviv 69978, Israel.

MeSH classifications