CAARS: comparative assembly and annotation of RNA-Seq data.
Journal
Bioinformatics (Oxford, England)
ISSN: 1367-4811
Abbreviated title: Bioinformatics
Country: England
NLM ID: 9808944
Publication information
Publication date: 2019-07-01
History:
received: 2017-06-28
revised: 2018-09-13
accepted: 2018-11-16
pubmed: 2018-11-20
medline: 2020-06-12
entrez: 2018-11-20
Status:
ppublish
Abstract
RNA sequencing (RNA-Seq) is a widely used approach to obtain transcript sequences in non-model organisms, notably for performing comparative analyses. However, current bioinformatic pipelines do not take full advantage of pre-existing reference data in related species for improving RNA-Seq assembly, annotation and gene family reconstruction. We built an automated pipeline named CAARS to combine novel data from RNA-Seq experiments with existing multi-species gene family alignments. RNA-Seq reads are assembled into transcripts by both de novo and assisted assemblies. Then, CAARS incorporates transcripts into gene families, builds gene alignments and trees and uses phylogenetic information to classify the genes as orthologs and paralogs of existing genes. We used CAARS to assemble and annotate RNA-Seq data in rodents and fishes using distantly related genomes as reference, a difficult case for this kind of analysis. We showed CAARS assemblies are more complete and accurate than those assembled by a standard pipeline consisting of de novo assembly coupled with annotation by sequence similarity on a guide species. In addition to annotated transcripts, CAARS provides gene family alignments and trees, annotated with orthology relationships, directly usable for downstream comparative analyses. CAARS is implemented in Python and OCaml and is freely available at https://github.com/carinerey/caars. Supplementary data are available at Bioinformatics online.
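The final step described in the abstract, classifying genes as orthologs or paralogs from phylogenetic information, can be illustrated with a minimal sketch. This is not CAARS code and does not reflect its actual implementation. It assumes a gene tree whose internal nodes have already been labelled as speciation or duplication events (for example by tree reconciliation), and it applies the standard rule that two genes are orthologs if their last common ancestor is a speciation node and paralogs if it is a duplication node. The `Node` class, the `classify_pairs` function and all gene names are hypothetical.

```python
from dataclasses import dataclass, field
from itertools import combinations
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    """Gene-tree node: leaves carry a gene name, internal nodes an event label."""
    event: Optional[str] = None   # "speciation" or "duplication" (internal nodes only)
    gene: Optional[str] = None    # gene or transcript name (leaves only)
    children: List["Node"] = field(default_factory=list)


def leaves(node: Node) -> List[str]:
    """Return the gene names found at the leaves below this node."""
    if not node.children:
        return [node.gene]
    return [g for child in node.children for g in leaves(child)]


def classify_pairs(node: Node) -> Dict[Tuple[str, str], str]:
    """Label every pair of leaves by the event at their last common ancestor."""
    relations: Dict[Tuple[str, str], str] = {}
    if not node.children:
        return relations
    label = "ortholog" if node.event == "speciation" else "paralog"
    # Genes in different child subtrees have this node as last common ancestor.
    for left, right in combinations([leaves(c) for c in node.children], 2):
        for a in left:
            for b in right:
                relations[tuple(sorted((a, b)))] = label
    # Pairs inside a single subtree are resolved deeper in the tree.
    for child in node.children:
        relations.update(classify_pairs(child))
    return relations


# Hypothetical family: one ancient duplication, then a speciation in each copy.
tree = Node(event="duplication", children=[
    Node(event="speciation", children=[Node(gene="mouse_geneA"), Node(gene="sample_tx1")]),
    Node(event="speciation", children=[Node(gene="mouse_geneB"), Node(gene="sample_tx2")]),
])
relations = classify_pairs(tree)
for pair, label in sorted(relations.items()):
    print(pair, label)
```

In this toy family, each assembled transcript is an ortholog of the reference gene in its own subtree and a paralog of every gene in the other copy.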
Identifiers
pubmed: 30452539
pii: 5191702
doi: 10.1093/bioinformatics/bty903
pmc: PMC6596894
Chemical substances
RNA
63231-63-0
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
2199-2207
Copyright information
© The Author(s) 2018. Published by Oxford University Press.