StringFix: an annotation-guided transcriptome assembler improves the recovery of amino acid sequences from RNA-Seq reads.
Journal
Genes & genomics
ISSN: 2092-9293
Abbreviated title: Genes Genomics
Country: Korea (South)
NLM ID: 101481027
Publication information
Publication date:
Dec 2023
History:
received: 2023-08-16
accepted: 2023-10-01
medline: 2023-11-28
pubmed: 2023-10-15
entrez: 2023-10-14
Status:
ppublish
Abstract
Reconstruction of amino acid sequences from an assembled transcriptome is of interest in personalized medicine, for example, to predict drug-target (or protein-protein) interactions while accounting for an individual's genomic variations. Most existing transcriptome assemblers, however, do not seem well suited for this purpose. In this work, we present StringFix, an annotation-guided transcriptome assembly and protein sequence reconstruction software tool that takes genome-aligned reads and the annotations associated with the reference genome as input. The tool 'fixes' the pre-annotated transcript sequences by taking small variations into account, and finally produces possible amino acid sequences that are likely to exist in the test tissue. The results show that, using outputs from existing reference-based assemblers as the input GTF guide, StringFix could reconstruct amino acid sequences more precisely and with higher sensitivity than direct generation from the recovered transcripts, for all the assemblers we tested. By using StringFix with the existing reference-based assemblers, one can recover not only novel transcripts and isoforms but also the possible amino acid sequences stemming from them.
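The idea of 'fixing' an annotated transcript with small variations before deriving its protein can be illustrated with a minimal sketch. This is not StringFix's implementation: the function names `apply_snvs` and `translate`, the dictionary-based variant format, and the toy sequence are all assumptions made for illustration, and only single-nucleotide substitutions are handled (no insertions, deletions, or splice changes).

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), built from the
# conventional TCAG ordering of the three codon positions.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def apply_snvs(transcript, snvs):
    """Return a copy of `transcript` with substitutions applied.

    `snvs` maps 0-based transcript positions to the observed base
    (a hypothetical input format chosen for this sketch).
    """
    seq = list(transcript.upper())
    for pos, base in snvs.items():
        seq[pos] = base.upper()
    return "".join(seq)


def translate(cds):
    """Translate a coding sequence from its first base to the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")  # "X" marks an unknown codon
        if aa == "*":  # stop codon ends the protein
            break
        protein.append(aa)
    return "".join(protein)


# Toy annotated coding sequence: ATG GCC AAG TAA
reference_cds = "ATGGCCAAGTAA"
# Hypothetical variant supported by the aligned reads: C -> T at position 4
sample_cds = apply_snvs(reference_cds, {4: "T"})

print(translate(reference_cds))  # protein from the annotation alone
print(translate(sample_cds))     # protein after applying the variant
```

In this toy case the substitution changes the second codon from GCC to GTC, so the sample-specific protein differs from the annotated one by a single residue, which is the kind of difference that matters for downstream interaction prediction.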
Identifiers
pubmed: 37837515
doi: 10.1007/s13258-023-01458-7
pii: 10.1007/s13258-023-01458-7
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
1599-1609
Grants
Agency: Dankook University
ID: Research Fund in 2022
Copyright information
© 2023. The Author(s) under exclusive licence to The Genetics Society of Korea.