StringFix: an annotation-guided transcriptome assembler improves the recovery of amino acid sequences from RNA-Seq reads.

Transcriptome; RNA-Seq; Amino Acid Sequence; Sequence Analysis, RNA / methods; Software

Journal

Genes & genomics

ISSN: 2092-9293

Abbreviated title: Genes Genomics

Country: Korea (South)

NLM ID: 101481027

Publication information

Publication date:
Dec 2023

History:

received: 16 Aug 2023

accepted: 01 Oct 2023

medline: 28 Nov 2023

pubmed: 15 Oct 2023

entrez: 14 Oct 2023

Status: ppublish

Abstract

Reconstructing amino acid sequences from an assembled transcriptome is of interest in personalized medicine, for example to predict drug-target (or protein-protein) interactions while accounting for an individual's genomic variations. Most existing transcriptome assemblers, however, seem poorly suited to this purpose. In this work, we present StringFix, an annotation-guided software tool for transcriptome assembly and protein sequence reconstruction that takes genome-aligned reads and the annotations associated with the reference genome as input. The tool 'fixes' the pre-annotated transcript sequences by taking small variations into account and then produces the amino acid sequences that are likely to exist in the test tissue. The results show that, when the outputs of existing reference-based assemblers are used as the input GTF guide, StringFix reconstructs amino acid sequences with higher precision and sensitivity than generating them directly from the recovered transcripts, for all the assemblers we tested. By using StringFix together with existing reference-based assemblers, one can recover not only novel transcripts and isoforms but also the possible amino acid sequences stemming from them.
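The idea of 'fixing' an annotated transcript with small variations before translating it can be illustrated with a toy example. The sketch below is not StringFix's implementation: the function names, the variant tuple format, the restriction to single-base substitutions, and the choice to translate from the first ATG are all assumptions made for illustration only.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def apply_substitutions(transcript, variants):
    """Return the transcript with single-base substitutions applied.

    `variants` is an iterable of (0-based position, ref base, alt base);
    this tuple format is a hypothetical convention for the example.
    """
    seq = list(transcript)
    for pos, ref, alt in variants:
        if seq[pos] != ref:
            raise ValueError(f"reference mismatch at position {pos}")
        seq[pos] = alt
    return "".join(seq)


def translate_first_orf(seq):
    """Translate from the first ATG up to the first in-frame stop codon."""
    start = seq.find("ATG")
    if start < 0:
        return ""
    protein = []
    for i in range(start, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Toy annotated transcript and one sample-specific substitution (A -> T).
annotated = "GGATGGCTGAAACCTAAGG"
fixed = apply_substitutions(annotated, [(9, "A", "T")])
reference_protein = translate_first_orf(annotated)
sample_protein = translate_first_orf(fixed)
```

Here the single substitution changes one codon, so the protein translated from the sample-specific transcript differs from the one translated from the annotation at a single residue. Insertions, deletions, and ORF selection among several candidates are deliberately left out of this sketch.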

Identifiers

DOI: 10.1007/s13258-023-01458-7

PMID: 37837515

PII: 10.1007/s13258-023-01458-7

Publication types

Journal Article

Languages

eng

Pagination

1599-1609

Grants

Agency: Dankook University

ID: Research Fund in 2022

Authors

Joongho Lee (J)

Minsoo Kim (M)

Kyudong Han (K)

Seokhyun Yoon (S)
