PIQMEE: Bayesian Phylodynamic Method for Analysis of Large Data Sets with Duplicate Sequences.
BEAST 2
Bayesian phylodynamics
duplicate sequences
fast algorithms
large data sets
subsampling
Journal
Molecular biology and evolution
ISSN: 1537-1719
Abbreviated title: Mol Biol Evol
Country: United States
NLM ID: 8501455
Publication information
Publication date:
1 October 2020
History:
pubmed: 4 June 2020
medline: 16 April 2021
entrez: 4 June 2020
Status:
ppublish
Abstract
Next-generation sequencing of pathogen quasispecies within a host yields data sets of tens to hundreds of unique sequences. However, the full data set often contains thousands of sequences, because many of those unique sequences have multiple identical copies. Data sets of this size represent a computational challenge for currently available Bayesian phylogenetic and phylodynamic methods. Through simulations, we explore how large data sets with duplicate sequences affect the speed and accuracy of phylogenetic and phylodynamic analysis within BEAST 2. We show that using unique sequences only leads to biases, and using a random subset of sequences yields imprecise parameter estimates. To overcome these shortcomings, we introduce PIQMEE, a BEAST 2 add-on that produces reliable parameter estimates from full data sets with increased computational efficiency as compared with the currently available methods within BEAST 2. The principle behind PIQMEE is to resolve the tree structure of the unique sequences only, while simultaneously estimating the branching times of the duplicate sequences. Distinguishing between unique and duplicate sequences allows our method to perform well even for very large data sets. Although the classic method converges poorly for data sets of 6,000 sequences when allowed to run for 7 days, our method converges in slightly more than 1 day. In fact, PIQMEE can handle data sets of around 21,000 sequences with 20 unique sequences in 14 days. Finally, we apply the method to a real, within-host HIV sequencing data set with several thousand sequences per patient.
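The key idea in the abstract is to treat unique sequences and their identical copies differently instead of discarding the copies. As a hedged illustration of the bookkeeping this requires (not PIQMEE's own code, which runs inside BEAST 2), the Python sketch below groups an alignment into unique sequences and records which sequence names are duplicates of each. The functions `collapse_duplicates` and `summarize` are hypothetical, and "identical" is assumed to mean exact string equality of the aligned sequences (gaps and ambiguity codes are compared literally).

```python
def collapse_duplicates(alignment):
    """Partition an alignment into unique sequences and their copies.

    alignment: dict mapping sequence name -> aligned sequence string.
    Returns a dict mapping each unique sequence to the list of names
    that carry it, in order of first appearance (dicts keep insertion
    order in Python 3.7+).
    Assumption: sequences are identical only if the strings match exactly.
    """
    groups = {}
    for name, seq in alignment.items():
        groups.setdefault(seq, []).append(name)
    return groups


def summarize(groups):
    """Return (number of unique sequences, total number of sequences)."""
    n_unique = len(groups)
    n_total = sum(len(names) for names in groups.values())
    return n_unique, n_total


# Usage on a toy alignment (hypothetical data): four sequences, two unique.
alignment = {"s1": "ACGT", "s2": "ACGT", "s3": "ACGA", "s4": "ACGT"}
groups = collapse_duplicates(alignment)
print(groups)             # {'ACGT': ['s1', 's2', 's4'], 'ACGA': ['s3']}
print(summarize(groups))  # (2, 4)
```

Keeping only the dictionary keys corresponds to the "unique sequences only" strategy, which drops the copy counts that the abstract reports lead to biased estimates; keeping the name lists preserves that information for downstream analysis.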
Identifiers
pubmed: 32492139
pii: 5850868
doi: 10.1093/molbev/msaa136
pmc: PMC7530608
Publication types
Evaluation Study
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
3061-3075
Copyright information
© The Author(s) 2020. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.