Fcirc: A comprehensive pipeline for the exploration of fusion linear and circular RNAs.


Journal

GigaScience
ISSN: 2047-217X
Abbreviated title: Gigascience
Country: United States
NLM ID: 101596872

Publication information

Publication date:
01 06 2020
History:
received: 06 11 2019
revised: 01 03 2020
accepted: 29 04 2020
entrez: 30 5 2020
pubmed: 30 5 2020
medline: 5 10 2021
Status: ppublish

Abstract

In cancer cells, fusion genes can produce linear and chimeric fusion-circular RNAs (f-circRNAs), which are functional in gene expression regulation and implicated in malignant transformation, cancer progression, and therapeutic resistance. For specific cancers, proteins encoded by fusion transcripts have been identified as innovative therapeutic targets (e.g., EML4-ALK). Even though RNA sequencing (RNA-Seq) technologies combined with existing bioinformatics approaches have enabled researchers to systematically identify fusion transcripts, specifically detecting f-circRNAs in cells remains challenging owing to their general sparsity and low abundance in cancer cells but also owing to imperfect computational methods. We developed the Python-based workflow "Fcirc" to identify fusion linear and f-circRNAs from RNA-Seq data with high specificity. We applied Fcirc to 3 different types of RNA-Seq data scenarios: (i) actual synthetic spike-in RNA-Seq data, (ii) simulated RNA-Seq data, and (iii) actual cancer cell-derived RNA-Seq data. Fcirc showed significant advantages over existing methods regarding both detection accuracy (i.e., precision, recall, F-measure) and computing performance (i.e., lower runtimes). Fcirc is a powerful and comprehensive Python-based pipeline to identify linear and circular RNA transcripts from known fusion events in RNA-Seq datasets with higher accuracy and shorter computing times compared with previously published algorithms. Fcirc empowers the research community to study the biology of fusion RNAs in cancer more effectively.
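
The accuracy measures named in the abstract (precision, recall, F-measure) can be illustrated with a minimal Python sketch. It is not taken from Fcirc itself: the function name `detection_metrics`, the set-based comparison, and the toy fusion labels are assumptions made for illustration only, and the numbers are not results from the paper.

```python
def detection_metrics(predicted, truth):
    """Return (precision, recall, F-measure) for a set of predicted calls.

    Hypothetical helper for illustration; not part of Fcirc.
    """
    predicted, truth = set(predicted), set(truth)
    true_positives = len(predicted & truth)

    # Precision: fraction of reported calls that are correct.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: fraction of true events that were reported.
    recall = true_positives / len(truth) if truth else 0.0
    # F-measure: harmonic mean of precision and recall.
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Toy data (made up): four known fusion events, four reported calls,
# three of which match a known event.
truth = {"EML4-ALK", "GENE1-GENE2", "GENE3-GENE4", "GENE5-GENE6"}
predicted = {"EML4-ALK", "GENE1-GENE2", "GENE3-GENE4", "GENE7-GENE8"}

precision, recall, f_measure = detection_metrics(predicted, truth)
print(f"precision={precision:.2f} recall={recall:.2f} F={f_measure:.2f}")
```

Treating calls as sets keeps the sketch simple; a real benchmark would also need a rule for when a reported breakpoint counts as matching a known one.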

Identifiers

pubmed: 32470133
pii: 5848590
doi: 10.1093/gigascience/giaa054
pmc: PMC7259471

Chemical substances

RNA, Circular 0
RNA 63231-63-0

Publication types

Journal Article
Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Copyright information

© The Author(s) 2020. Published by Oxford University Press.

Authors

Zhaoqing Cai (Z)

School of Life Sciences and Technology, Tongji University, 1239 Siping Road, Shanghai 200092, China.

Hongzhang Xue (H)

School of Life Sciences and Technology, Tongji University, 1239 Siping Road, Shanghai 200092, China.
School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai 200240, China.

Yue Xu (Y)

School of Life Sciences and Technology, Tongji University, 1239 Siping Road, Shanghai 200092, China.

Jens Köhler (J)

Department of Medical Oncology, Dana-Farber Cancer Institute, 450 Brookline Avenue, Boston, MA 02215, USA.

Xiaojie Cheng (X)

School of Life Sciences and Technology, Tongji University, 1239 Siping Road, Shanghai 200092, China.

Yao Dai (Y)

School of Life Sciences and Technology, Tongji University, 1239 Siping Road, Shanghai 200092, China.

Jie Zheng (J)

School of Life Sciences and Technology, Tongji University, 1239 Siping Road, Shanghai 200092, China.

Haiyun Wang (H)

School of Life Sciences and Technology, Tongji University, 1239 Siping Road, Shanghai 200092, China.
