A Biterm Topic Model for Sparse Mutation Data.
biterm topic model
mutational signature
panel sequencing data
Journal
Cancers
ISSN: 2072-6694
Abbreviated title: Cancers (Basel)
Country: Switzerland
NLM ID: 101526829
Publication information
Publication date:
04 Mar 2023
History:
received: 05 Jan 2023
revised: 28 Feb 2023
accepted: 03 Mar 2023
entrez: 11 Mar 2023
pubmed: 12 Mar 2023
medline: 12 Mar 2023
Status:
epublish
Abstract
Mutational signature analysis promises to reveal the processes that shape cancer genomes for applications in diagnosis and therapy. However, most current methods are geared toward rich mutation data extracted from whole-genome or whole-exome sequencing. Methods that process the sparse mutation data typically found in practice are only in the earliest stages of development. In particular, we previously developed the Mix model, which clusters samples to handle data sparsity. However, the Mix model had two hyper-parameters, the number of signatures and the number of clusters, that were very costly to learn. Therefore, we devised a new method for handling sparse data that was several orders of magnitude more efficient, was based on mutation co-occurrences, and imitated word co-occurrence analyses of Twitter texts. We showed that the model produced significantly improved hyper-parameter estimates that led to higher likelihoods of discovering overlooked data and had better correspondence with known signatures.
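As a rough illustration of what "mutation co-occurrences" can mean in a biterm-style model, the sketch below counts unordered pairs of mutation categories that appear together in the same sample, by analogy with word pairs in a short text. It is not the authors' implementation. The function names, the sample identifiers, the trinucleotide-style category labels, and the choice to count pairs of identical categories are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def extract_biterms(mutations):
    """Count unordered pairs of mutation categories within one sample.

    Assumption: every pair of mutations in a sample forms one biterm,
    including pairs of two mutations from the same category.
    """
    biterms = Counter()
    for first, second in combinations(mutations, 2):
        # Sort each pair so (a, b) and (b, a) map to the same key.
        biterms[tuple(sorted((first, second)))] += 1
    return biterms


def corpus_biterms(samples):
    """Pool biterm counts over all samples (hypothetical helper)."""
    total = Counter()
    for mutations in samples.values():
        total.update(extract_biterms(mutations))
    return total


# Toy data: hypothetical samples with very few mutations each,
# as might come from a targeted sequencing panel.
samples = {
    "sample_1": ["A[C>T]G", "T[C>A]A", "A[C>T]G"],
    "sample_2": ["A[C>T]G", "G[T>C]C"],
}

counts = corpus_biterms(samples)
for pair, n in sorted(counts.items()):
    print(pair, n)
```

Pooling pairs across samples is what lets a biterm-style model draw on corpus-wide co-occurrence statistics even when each individual sample carries only a handful of mutations.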
Identifiers
pubmed: 36900390
pii: cancers15051601
doi: 10.3390/cancers15051601
pmc: PMC10000560
Publication types
Journal Article
Languages
eng
Grants
Organization: United States-Israel Binational Science Foundation
ID: 2019336