Improvement of detection performance of fusion genes from RNA-seq data by clustering short reads.
RNA-seq
SlideSort
cancer
fusion gene
Journal
Journal of bioinformatics and computational biology
ISSN: 1757-6334
Abbreviated title: J Bioinform Comput Biol
Country: Singapore
NLM ID: 101187344
Publication information
Publication date:
06 2019
History:
entrez: 11 July 2019
pubmed: 11 July 2019
medline: 16 July 2020
Status:
ppublish
Abstract
Fusion genes are involved in cancer, but their detection from RNA-Seq data is limited by the relatively short read length. We therefore propose a shifted short-read clustering (SSC) method, which focuses on overlapping reads from the same loci and extends them into a representative sequence. To verify its usefulness, we applied the SSC method to RNA-Seq data from four cell lines (BT-474, MCF-7, SKBR-3, and T-47D). As the slide width of the SSC method increased to one, two, five, or ten bases, the read length was extended from 201 bases to 217 (108%), 234 (116%), 282 (140%), or 317 (158%) bases, respectively. Fusion genes were then detected using STAR-Fusion, a fusion gene detection tool, with and without the SSC method. With a one-base shift, the proportion of reads mapped to multiple loci decreased from 9.7% to 4.6%, and the sensitivity of fusion gene detection improved from 47% to 54% on average (BT-474: 48% to 57%; MCF-7: 49% to 53%; SKBR-3: 50% to 57%; T-47D: 43% to 50%) compared with the original data. With larger shifts, the positive predictive value also improved. The SSC method could therefore be an effective approach for fusion gene detection.
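The abstract does not spell out the SSC algorithm, so the following is only a minimal illustrative sketch of the general idea it describes: reads that overlap a seed read at a small offset (the "slide width") are merged into one longer representative sequence. The function name `extend_with_shifted_reads`, the greedy strategy, the exact-match requirement, and the equal-read-length assumption are all hypothetical choices made for this illustration and are not taken from the paper.

```python
def extend_with_shifted_reads(seed, reads, max_shift=1):
    """Greedily extend `seed` to the right.

    Illustrative assumption (not the published SSC implementation):
    a read is merged when it has the same length as the seed and
    matches the current right end of the sequence exactly at an
    offset of 1..max_shift bases.
    """
    read_len = len(seed)
    # A shift equal to the read length would leave no overlap to compare.
    max_shift = min(max_shift, read_len - 1)

    remaining = list(reads)
    if seed in remaining:
        remaining.remove(seed)  # do not merge the seed with itself

    contig = seed
    while True:
        tail = contig[-read_len:]  # last read-length window of the contig
        match = None
        for shift in range(1, max_shift + 1):  # prefer the smallest shift
            for read in remaining:
                if len(read) == read_len and tail[shift:] == read[:read_len - shift]:
                    match = (shift, read)
                    break
            if match is not None:
                break
        if match is None:
            return contig  # no overlapping read left: extension stops
        shift, read = match
        contig += read[read_len - shift:]  # append only the new bases
        remaining.remove(read)  # each read is used at most once


# Toy data: 8-base reads tiled over a made-up 19-base sequence.
toy_sequence = "ACGTTGCAAGCTTAGGCTA"
reads_step1 = [toy_sequence[i:i + 8] for i in range(0, 12)]
reads_step2 = [toy_sequence[i:i + 8] for i in range(0, 12, 2)]

merged_step1 = extend_with_shifted_reads(reads_step1[0], reads_step1, max_shift=1)
merged_step2 = extend_with_shifted_reads(reads_step2[0], reads_step2, max_shift=2)
```

In this toy setting, a larger allowed shift lets reads that are spaced further apart be merged, which loosely mirrors the abstract's observation that the extended read length grows with the slide width. A real pipeline would also need to handle sequencing errors, paired-end reads, and efficient similarity search, none of which this sketch attempts.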
Identifiers
pubmed: 31288642
doi: 10.1142/S0219720019400080
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM