StringFix: an annotation-guided transcriptome assembler improves the recovery of amino acid sequences from RNA-Seq reads.


Journal

Genes & genomics
ISSN: 2092-9293
Abbreviated title: Genes Genomics
Country: Korea (South)
NLM ID: 101481027

Publication information

Publication date:
Dec 2023
History:
received: 16 08 2023
accepted: 01 10 2023
medline: 28 11 2023
pubmed: 15 10 2023
entrez: 14 10 2023
Status: ppublish

Abstract

Reconstructing amino acid sequences from an assembled transcriptome is of interest in personalized medicine, for example to predict drug-target (or protein-protein) interactions while accounting for an individual's genomic variations. Most existing transcriptome assemblers, however, seem poorly suited to this purpose. In this work, we present StringFix, an annotation-guided transcriptome assembly and protein sequence reconstruction tool that takes genome-aligned reads and the annotations associated with the reference genome as input. The tool 'fixes' the pre-annotated transcript sequences by taking small variations into account, and then produces candidate amino acid sequences that are likely to exist in the test tissue. The results show that, when the outputs of existing reference-based assemblers are used as the input GTF guide, StringFix reconstructs amino acid sequences with higher precision and sensitivity than direct generation from the transcripts recovered by any of the assemblers we tested. By using StringFix together with existing reference-based assemblers, one can recover not only novel transcripts and isoforms but also the possible amino acid sequences stemming from them.
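
The idea of 'fixing' an annotated transcript with small variations before translating it can be illustrated with a minimal Python sketch. This is not StringFix's algorithm or interface: the function names `apply_snvs` and `translate`, the 0-based `(position, ref, alt)` variant tuples, and the toy coding sequence are all assumptions made for illustration, and only single-nucleotide substitutions on an already-extracted coding sequence are handled.

```python
# Illustrative sketch only; names and data layout are hypothetical, not StringFix's API.

# Standard genetic code (NCBI translation table 1), codons ordered over T, C, A, G.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def apply_snvs(cds, snvs):
    """Return a copy of `cds` with single-nucleotide substitutions applied.

    `snvs` is a list of (position, ref, alt) tuples with 0-based positions
    relative to the coding sequence (an assumed convention).
    """
    seq = list(cds.upper())
    for pos, ref, alt in snvs:
        if seq[pos] != ref.upper():
            raise ValueError(f"reference mismatch at {pos}: {seq[pos]} != {ref}")
        seq[pos] = alt.upper()
    return "".join(seq)


def translate(cds):
    """Translate a coding sequence codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(cds) - len(cds) % 3, 3):
        aa = CODON_TABLE.get(cds[i:i + 3].upper(), "X")  # X marks an unknown codon
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    annotated_cds = "ATGGCCAAGTAA"           # toy annotated transcript CDS
    sample_snvs = [(4, "C", "A")]            # toy variant observed in the reads
    fixed_cds = apply_snvs(annotated_cds, sample_snvs)
    print(translate(annotated_cds))          # MAK
    print(translate(fixed_cds))              # MDK
```

In this toy case the substitution changes codon GCC (alanine) to GAC (aspartate), so the sample-specific protein differs from the one implied by the annotation alone.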

Identifiers

pubmed: 37837515
doi: 10.1007/s13258-023-01458-7
pii: 10.1007/s13258-023-01458-7

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

1599-1609

Grants

Agency: Dankook University
ID: Research Fund in 2022

Copyright information

© 2023. The Author(s) under exclusive licence to The Genetics Society of Korea.

Authors

Joongho Lee (J)

Dept. of Computer Science, College of SW Convergence, Dankook Univ, Yongin-si, 16890, Korea.

Minsoo Kim (M)

Dept. of Computer Science, College of SW Convergence, Dankook Univ, Yongin-si, 16890, Korea.

Kyudong Han (K)

Center for Bio-Medical Engineering Core Facility, Dankook Univ, Cheonan, 31116, Korea.
Dept. of Microbiology, College of Science & Technology, Dankook Univ, Cheonan, 31116, Korea.
HuNbiome Co., Ltd, R&D Center, Seoul, 08503, Korea.

Seokhyun Yoon (S)

Dept. of Electronics and Electrical Engineering, College of Engineering, Dankook Univ, Yongin-si, 16890, Korea. syoon@dku.edu.

MeSH classifications