A Universal Probe Set for Targeted Sequencing of 353 Nuclear Genes from Any Flowering Plant Designed Using k-Medoids Clustering.

Angiosperms; Hyb-Seq; k-means clustering; k-medoids clustering; machine learning; nuclear genes; phylogenomics; sequence capture; target enrichment

Journal

Systematic biology
ISSN: 1076-836X
Abbreviated title: Syst Biol
Country: England
NLM ID: 9302532

Publication information

Publication date:
2019-07-01
History:
received: 2018-07-02
revised: 2018-11-29
accepted: 2018-12-03
pubmed: 2018-12-12
medline: 2019-12-04
entrez: 2018-12-12
Status: ppublish

Abstract

Sequencing of target-enriched libraries is an efficient and cost-effective method for obtaining DNA sequence data from hundreds of nuclear loci for phylogeny reconstruction. Much of the cost of developing targeted sequencing approaches is associated with the generation of preliminary data needed for the identification of orthologous loci for probe design. In plants, identifying orthologous loci has proven difficult due to a large number of whole-genome duplication events, especially in the angiosperms (flowering plants). We used multiple sequence alignments from over 600 angiosperms for 353 putatively single-copy protein-coding genes identified by the One Thousand Plant Transcriptomes Initiative to design a set of targeted sequencing probes for phylogenetic studies of any angiosperm group. To maximize the phylogenetic potential of the probes, while minimizing the cost of production, we introduce a k-medoids clustering approach to identify the minimum number of sequences necessary to represent each coding sequence in the final probe set. Using this method, 5-15 representative sequences were selected per orthologous locus, representing the sequence diversity of angiosperms more efficiently than if probes were designed using available sequenced genomes alone. To test our approximately 80,000 probes, we hybridized libraries from 42 species spanning all higher-order groups of angiosperms, with a focus on taxa not present in the sequence alignments used to design the probes. Out of a possible 353 coding sequences, we recovered an average of 283 per species and at least 100 in all species. Differences among taxa in sequence recovery could not be explained by relatedness to the representative taxa selected for probe design, suggesting that there is no phylogenetic bias in the probe set. 
Our probe set, which targeted 260 kbp of coding sequence, achieved a median recovery of 137 kbp per taxon in coding regions, a maximum recovery of 250 kbp, and an additional median of 212 kbp per taxon in flanking non-coding regions across all species. These results suggest that the Angiosperms353 probe set described here is effective for any group of flowering plants and would be useful for phylogenetic studies from the species level to higher-order groups, including the entire angiosperm clade itself.
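The selection step described in the abstract (choosing the fewest representative sequences that still cover the diversity of each locus) can be illustrated with a minimal sketch. This is not the authors' implementation: the pairwise p-distance, the coverage threshold `max_dist`, and the function names `p_distance`, `k_medoids`, and `min_representatives` are all assumptions made for illustration, and the clustering is a simple alternating (assign, then update) k-medoids heuristic on a precomputed distance matrix.

```python
import random


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences,
    skipping columns where either sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 1.0
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(seqs):
    """Symmetric matrix of pairwise p-distances."""
    n = len(seqs)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist[i][j] = dist[j][i] = p_distance(seqs[i], seqs[j])
    return dist


def k_medoids(dist, k, seed=0, max_iter=100):
    """Alternating k-medoids: assign each point to its nearest medoid,
    then move each medoid to the member minimizing within-cluster distance."""
    rng = random.Random(seed)
    n = len(dist)
    medoids = rng.sample(range(n), k)
    for _ in range(max_iter):
        clusters = {m: [] for m in medoids}
        for i in range(n):
            # A medoid always stays in its own cluster, so no cluster is empty.
            nearest = i if i in clusters else min(medoids, key=lambda m: dist[i][m])
            clusters[nearest].append(i)
        new_medoids = [
            min(members, key=lambda c: sum(dist[c][j] for j in members))
            for members in clusters.values()
        ]
        if set(new_medoids) == set(medoids):
            break
        medoids = new_medoids
    return sorted(medoids)


def min_representatives(dist, max_dist, seed=0):
    """Smallest medoid set such that every sequence lies within max_dist
    (an assumed, illustrative threshold) of at least one medoid."""
    n = len(dist)
    for k in range(1, n + 1):
        medoids = k_medoids(dist, k, seed=seed)
        if all(min(dist[i][m] for m in medoids) <= max_dist for i in range(n)):
            return medoids
    return list(range(n))


# Toy alignment with two clearly separated groups (hypothetical data).
toy = [
    "AAAAAAAAAA", "AAAAAAAAAT", "AAAAAAAATT",
    "CCCCCCCCCC", "CCCCCCCCCG", "CCCCCCCCGG",
]
representatives = min_representatives(distance_matrix(toy), max_dist=0.15)
print(representatives)
```

Because medoids are actual members of the data set, each selected representative is a real sequence that could be used directly for probe design, which is the property that distinguishes k-medoids from k-means in this setting.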

Identifiers

pubmed: 30535394
pii: 5237557
doi: 10.1093/sysbio/syy086
pmc: PMC6568016

Chemical substances

DNA Probes 0

Publication types

Journal Article; Research Support, Non-U.S. Gov't; Research Support, U.S. Gov't, Non-P.H.S.

Languages

eng

Citation subsets

IM

Pagination

594-606

Copyright information

© The Author(s) 2018. Published by Oxford University Press on behalf of the Society of Systematic Biologists.

Authors

Matthew G Johnson (MG)

Department of Biological Sciences, Texas Tech University, Lubbock, TX 79409, USA.
Plant Science and Conservation, Chicago Botanic Garden, 1000 Lake Cook Road, Glencoe, IL 60022, USA.

Lisa Pokorny (L)

Department of Comparative Plant and Fungal Biology, Royal Botanic Gardens, Kew, Richmond, Surrey TW9 3AE, UK.

Steven Dodsworth (S)

Department of Comparative Plant and Fungal Biology, Royal Botanic Gardens, Kew, Richmond, Surrey TW9 3AE, UK.
School of Life Sciences, University of Bedfordshire, University Square, Luton LU1 3JU, UK.

Laura R Botigué (LR)

Department of Comparative Plant and Fungal Biology, Royal Botanic Gardens, Kew, Richmond, Surrey TW9 3AE, UK.
Centre for Research in Agricultural Genomics, Campus UAB, Edifici CRAG, Bellaterra Cerdanyola del Vallès, 08193 Barcelona, Spain.

Robyn S Cowan (RS)

Department of Comparative Plant and Fungal Biology, Royal Botanic Gardens, Kew, Richmond, Surrey TW9 3AE, UK.

Alison Devault (A)

Arbor Biosciences, 5840 Interface Dr, Suite 101, Ann Arbor, MI 48103, USA.

Wolf L Eiserhardt (WL)

Department of Comparative Plant and Fungal Biology, Royal Botanic Gardens, Kew, Richmond, Surrey TW9 3AE, UK.
Department of Bioscience, Aarhus University, 8000 Aarhus C, Denmark.

Niroshini Epitawalage (N)

Department of Comparative Plant and Fungal Biology, Royal Botanic Gardens, Kew, Richmond, Surrey TW9 3AE, UK.

Félix Forest (F)

Department of Comparative Plant and Fungal Biology, Royal Botanic Gardens, Kew, Richmond, Surrey TW9 3AE, UK.

Jan T Kim (JT)

Department of Comparative Plant and Fungal Biology, Royal Botanic Gardens, Kew, Richmond, Surrey TW9 3AE, UK.

James H Leebens-Mack (JH)

Department of Plant Biology, University of Georgia, 2502 Miller Plant Sciences, Athens, GA 30602, USA.

Ilia J Leitch (IJ)

Department of Comparative Plant and Fungal Biology, Royal Botanic Gardens, Kew, Richmond, Surrey TW9 3AE, UK.

Olivier Maurin (O)

Department of Comparative Plant and Fungal Biology, Royal Botanic Gardens, Kew, Richmond, Surrey TW9 3AE, UK.

Douglas E Soltis (DE)

Department of Biology, University of Florida, 220 Bartram Hall, Gainesville, FL 32611-8525, USA.
Florida Museum of Natural History, University of Florida, 3215 Hull Road, Gainesville, FL 32611-2710, USA.

Pamela S Soltis (PS)

Department of Biology, University of Florida, 220 Bartram Hall, Gainesville, FL 32611-8525, USA.
Florida Museum of Natural History, University of Florida, 3215 Hull Road, Gainesville, FL 32611-2710, USA.

Gane Ka-Shu Wong (GK)

BGI-Shenzhen, Beishan Industrial Zone, Yantian District, Shenzhen 518083, China.
Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2E9, Canada.
Department of Medicine, University of Alberta, Edmonton, AB T6G 2E1, Canada.

William J Baker (WJ)

Department of Comparative Plant and Fungal Biology, Royal Botanic Gardens, Kew, Richmond, Surrey TW9 3AE, UK.

Norman J Wickett (NJ)

Plant Science and Conservation, Chicago Botanic Garden, 1000 Lake Cook Road, Glencoe, IL 60022, USA.
Program in Plant Biology and Conservation, Northwestern University, 2205 Tech Drive, Evanston, IL 60208, USA.
