A bioinformatic platform to integrate target capture and whole genome sequences of various read depths for phylogenomics.

Keywords: secapr; de novo assembly; loci extraction; low-coverage whole genome sequencing; target sequence capture

Journal

Molecular ecology
ISSN: 1365-294X
Abbreviated title: Mol Ecol
Country: England
NLM ID: 9214478

Publication information

Publication date:
12 2021
History:
revised: 24 09 2021
received: 30 11 2020
accepted: 16 10 2021
pubmed: 22 10 2021
medline: 29 1 2022
entrez: 21 10 2021
Status: ppublish

Abstract

The increasing availability of short-read whole genome sequencing (WGS) provides unprecedented opportunities to study ecological and evolutionary processes. Although loci of interest can be extracted from WGS data and combined with target sequence data, this requires suitable bioinformatic workflows. Here, we test different assembly and locus extraction strategies and implement them in secapr, a pipeline that processes short-read data into multilocus alignments for phylogenetic and molecular ecology analyses. We integrate the processing of data from low-coverage WGS (<30×) and target sequence capture into a flexible framework, while optimizing de novo contig assembly and loci extraction. Specifically, we test different assembly strategies by contrasting their ability to recover loci from targeted butterfly protein-coding genes, using four data sets: WGS data at three average coverages (10×, 5× and 2×) and a data set for which these loci were enriched prior to sequencing via target sequence capture. Using the resulting de novo contigs, we account for potential errors within contigs and infer phylogenetic trees to evaluate the ability of each assembly strategy to recover species relationships. We demonstrate that using multiple k-mer sizes simultaneously for assembly results in the highest yield of loci extracted from de novo assembled contigs, and that data sets derived from sequencing read depths as low as 5× recover the expected species relationships in phylogenetic trees. By making the tested assembly approaches available in the secapr pipeline, we hope to inspire future studies to incorporate complementary data and make an informed choice on the optimal assembly strategy.
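
To make the coverage figures in the abstract concrete: average coverage is commonly estimated as the total number of sequenced bases divided by the genome size. The sketch below is not part of secapr and does not describe how the authors produced their data sets. The genome size, read length and read count are hypothetical values chosen for illustration, and the function names are invented here. It shows how one might estimate the coverage of a read set and the fraction of reads to keep in order to approximate 10×, 5× and 2×.

```python
def mean_coverage(n_reads: int, read_length: int, genome_size: int) -> float:
    """Expected average depth: total sequenced bases divided by genome size."""
    return n_reads * read_length / genome_size


def subsample_fraction(current_cov: float, target_cov: float) -> float:
    """Fraction of reads to keep to go from current_cov down to target_cov."""
    if target_cov > current_cov:
        raise ValueError("target coverage exceeds current coverage")
    return target_cov / current_cov


# Hypothetical values, not taken from the study.
GENOME_SIZE = 400_000_000  # haploid genome size in base pairs (assumed)
READ_LENGTH = 100          # read length in base pairs (assumed)
N_READS = 40_000_000       # number of reads in the full data set (assumed)

full_coverage = mean_coverage(N_READS, READ_LENGTH, GENOME_SIZE)

# Fraction of reads to retain for each target coverage named in the abstract.
fractions = {target: subsample_fraction(full_coverage, target) for target in (10, 5, 2)}

for target, fraction in fractions.items():
    print(f"{target}x: keep {fraction:.0%} of reads")
```

The resulting fractions could be passed to any read-subsampling tool. The estimate ignores uneven coverage across the genome, so realized depth at individual loci may differ from the average.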

Identifiers

pubmed: 34674330
doi: 10.1111/mec.16240
pmc: PMC9298010

Publication types

Journal Article; Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Pagination

6021-6035

Grants

Agency: Swedish Research Council
ID: 2017-04980
Agency: Swedish Research Council
ID: 2019-04739
Agency: Swedish Foundation for Strategic Research
Agency: Royal Botanic Gardens, Kew
Agency: Grant Agency of the Czech Republic
ID: GJ20-18566Y
Agency: Marie Skłodowska-Curie Fellowship of the European Commission
ID: MARIPOSAS-704035

Copyright information

© 2021 The Authors. Molecular Ecology published by John Wiley & Sons Ltd.

Authors

Pedro G Ribeiro (P)

Biology Centre of the Czech Academy of Sciences, Institute of Entomology, České Budějovice, Czech Republic.
Faculty of Science, University of South Bohemia, České Budějovice, Czech Republic.

María Fernanda Torres Jiménez (MF)

Department of Biological and Environmental Sciences, University of Gothenburg, Gothenburg, Sweden.
Gothenburg Global Biodiversity Centre, Gothenburg, Sweden.

Tobias Andermann (T)

Department of Biological and Environmental Sciences, University of Gothenburg, Gothenburg, Sweden.
Gothenburg Global Biodiversity Centre, Gothenburg, Sweden.
Department of Biology, University of Fribourg, Fribourg, Switzerland.
Swiss Institute of Bioinformatics, Fribourg, Switzerland.

Alexandre Antonelli (A)

Department of Biological and Environmental Sciences, University of Gothenburg, Gothenburg, Sweden.
Gothenburg Global Biodiversity Centre, Gothenburg, Sweden.
Royal Botanic Gardens, Kew, Richmond, UK.
Department of Plant Sciences, University of Oxford, Oxford, UK.

Christine D Bacon (CD)

Department of Biological and Environmental Sciences, University of Gothenburg, Gothenburg, Sweden.
Gothenburg Global Biodiversity Centre, Gothenburg, Sweden.

Pável Matos-Maraví (P)

Biology Centre of the Czech Academy of Sciences, Institute of Entomology, České Budějovice, Czech Republic.
Gothenburg Global Biodiversity Centre, Gothenburg, Sweden.
