RFcaller: a machine learning approach combined with read-level features to detect somatic mutations.
Journal
NAR genomics and bioinformatics
ISSN: 2631-9268
Abbreviated title: NAR Genom Bioinform
Country: England
NLM ID: 101756213
Publication information
Publication date:
Jun 2023
History:
received: 13 Oct 2022
revised: 11 May 2023
accepted: 23 May 2023
medline: 1 Jun 2023
pubmed: 1 Jun 2023
entrez: 1 Jun 2023
Status:
epublish
Abstract
Falling sequencing costs and the extensive genomic characterization of a wide variety of cancers are extending tumor sequencing to a large number of research groups and to clinical practice. Although specific pipelines have been developed to identify somatic mutations, their results usually differ considerably, and a common approach is to combine several callers to obtain a more reliable set of mutations. This procedure is computationally expensive and time-consuming, and it suffers from the same limitations in sensitivity and specificity as other approaches. Expert review of mutation calls is therefore required to verify calls that might be used for clinical diagnosis. This step could take advantage of machine learning techniques, which provide a useful way to incorporate expert-reviewed information into the identification of somatic mutations. Here we present RFcaller, a pipeline based on machine learning algorithms for the detection of somatic mutations in tumor-normal paired samples that does not require large computing resources. RFcaller shows high accuracy for the detection of substitutions and insertions/deletions from whole-genome or exome data. It detects mutations in driver genes that are missed by other approaches, and it has been validated by comparison with deep sequencing and Sanger sequencing.
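The multi-caller strategy that the abstract describes as a common approach can be illustrated with a minimal sketch. This is not RFcaller's method and is not taken from the paper: the function name `consensus_calls`, the caller names, the `min_support` threshold and the toy variants are all hypothetical, chosen only to show how a consensus set might be derived from several call sets.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

# A variant is identified here by (chromosome, position, reference allele, alternate allele).
# This representation is an assumption made for the example.
Variant = Tuple[str, int, str, str]


def consensus_calls(
    calls_by_caller: Dict[str, Set[Variant]], min_support: int = 2
) -> List[Variant]:
    """Return the variants reported by at least `min_support` callers, sorted."""
    support: Counter = Counter()
    for variants in calls_by_caller.values():
        # Each caller contributes at most one vote per variant because its calls are a set.
        support.update(variants)
    return sorted(v for v, n in support.items() if n >= min_support)


# Toy call sets from three hypothetical callers.
calls = {
    "caller_a": {("chr1", 100, "A", "T"), ("chr2", 200, "G", "C")},
    "caller_b": {("chr1", 100, "A", "T"), ("chr3", 300, "C", "CA")},
    "caller_c": {("chr1", 100, "A", "T"), ("chr2", 200, "G", "C")},
}

consensus = consensus_calls(calls)
print(consensus)
```

Raising `min_support` trades sensitivity for specificity: with the toy data, requiring all three callers keeps only the chr1 substitution. Every caller still has to be run on every sample, which is the computational cost the abstract points out.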
Identifiers
pubmed: 37260508
doi: 10.1093/nargab/lqad056
pii: lqad056
pmc: PMC10227442
Publication types
Journal Article
Languages
eng
Pagination
lqad056
Copyright information
© The Author(s) 2023. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics.