Horizontal gene transfer and recombination analysis of SARS-CoV-2 genes helps discover its close relatives and shed light on its origin.
Consensus tree
Evolution of SARS-CoV-2
Gene evolution
Horizontal gene transfer
Phylogenetic network
Recombination
Journal
BMC ecology and evolution
ISSN: 2730-7182
Abbreviated title: BMC Ecol Evol
Country: England
NLM ID: 101775613
Publication information
Publication date:
21 01 2021
History:
received: 08 09 2020
accepted: 08 12 2020
entrez: 30 1 2021
pubmed: 31 1 2021
medline: 9 2 2021
Status:
epublish
Abstract
The SARS-CoV-2 pandemic is one of the greatest global medical and social challenges that have emerged in recent history. Human coronavirus strains discovered during previous SARS outbreaks have been hypothesized to pass from bats to humans using intermediate hosts, e.g. civets for SARS-CoV and camels for MERS-CoV. The discovery of an intermediate host of SARS-CoV-2 and the identification of the specific mechanism of its emergence in humans are topics of primary evolutionary importance. In this study we investigate the evolutionary patterns of the 11 main genes of SARS-CoV-2. Previous studies suggested that the genome of SARS-CoV-2 is highly similar to the horseshoe bat coronavirus RaTG13 for most of the genes and to some Malayan pangolin coronavirus (CoV) strains for the receptor binding (RB) domain of the spike protein.
We provide a detailed list of statistically significant horizontal gene transfer and recombination events (both intergenic and intragenic) inferred for each of the 11 main genes of the SARS-CoV-2 genome. Our analysis reveals that two continuous regions of genes S and N of SARS-CoV-2 may result from intragenic recombination between RaTG13 and Guangdong (GD) Pangolin CoVs. Statistically significant gene transfer-recombination events between RaTG13 and GD Pangolin CoV have been identified in region [1215-1425] of gene S and region [534-727] of gene N. Moreover, some statistically significant recombination events between the ancestors of SARS-CoV-2, RaTG13, GD Pangolin CoV and the bat CoV ZC45 and ZXC21 strains have been identified in genes ORF1ab, S, ORF3a, ORF7a, ORF8 and N. Furthermore, topology-based clustering of gene trees inferred for 25 CoV organisms revealed a three-way evolution of coronavirus genes, with gene phylogenies of ORF1ab, S and N forming the first cluster, gene phylogenies of ORF3a, E, M, ORF6, ORF7a, ORF7b and ORF8 forming the second cluster, and the phylogeny of gene ORF10 forming the third cluster.
The results of our horizontal gene transfer and recombination analysis suggest that SARS-CoV-2 could be not only a chimeric virus resulting from recombination of the bat RaTG13 and Guangdong pangolin coronaviruses but also a close relative of the bat CoV ZC45 and ZXC21 strains. They also indicate that a GD pangolin may be an intermediate host of this dangerous virus.
Identifiers
pubmed: 33514319
doi: 10.1186/s12862-020-01732-2
pii: 10.1186/s12862-020-01732-2
pmc: PMC7817968
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
5
Grants
Agency: Natural Sciences and Engineering Research Council of Canada
ID: 249644
Agency: Canadian Institute for Advanced Research
ID: CF-0136