Complete vertebrate mitogenomes reveal widespread repeats and gene duplications.
Assembly
Duplications
Long reads
Mitochondrial DNA
Repeats
Sequencing
Vertebrate
Journal
Genome biology
ISSN: 1474-760X
Abbreviated title: Genome Biol
Country: England
NLM ID: 100960660
Publication information
Publication date:
29 04 2021
History:
received: 17 08 2020
accepted: 31 03 2021
entrez: 29 4 2021
pubmed: 30 4 2021
medline: 15 1 2022
Status:
epublish
Abstract
Abstract sections
BACKGROUND
Modern sequencing technologies should make the assembly of the relatively small mitochondrial genomes an easy undertaking. However, few tools exist that address mitochondrial assembly directly.
RESULTS
As part of the Vertebrate Genomes Project (VGP) we develop mitoVGP, a fully automated pipeline for similarity-based identification of mitochondrial reads and de novo assembly of mitochondrial genomes that incorporates both long (> 10 kbp, PacBio or Nanopore) and short (100-300 bp, Illumina) reads. Our pipeline leads to successful complete mitogenome assemblies of 100 vertebrate species of the VGP. We observe that tissue type and library size selection have considerable impact on mitogenome sequencing and assembly. Comparing our assemblies to purportedly complete reference mitogenomes based on short-read sequencing, we identify errors, missing sequences, and incomplete genes in those references, particularly in repetitive regions. Our assemblies also identify novel gene region duplications. The presence of repeats and duplications in over half of the species herein assembled indicates that their occurrence is a principle of mitochondrial structure rather than an exception, shedding new light on mitochondrial genome evolution and organization.
CONCLUSIONS
Our results indicate that even in the "simple" case of vertebrate mitogenomes the completeness of many currently available reference sequences can be further improved, and caution should be exercised before claiming the complete assembly of a mitogenome, particularly from short reads alone.
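The abstract describes similarity-based identification of mitochondrial reads but gives no implementation detail. The following is a minimal sketch of that general idea only, not the mitoVGP implementation: it assumes a toy k-mer sharing score in place of a real aligner, and the function names, the `k` value, and the `min_fraction` threshold are all hypothetical. The reference is treated as circular, and both strands are considered.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    # Any symbol other than A/C/G/T is mapped to N (assumption for this sketch).
    return "".join(comp.get(base, "N") for base in reversed(seq))


def kmers(seq: str, k: int) -> set:
    """Return the set of all k-mers in a linear sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def reference_kmers(reference: str, k: int) -> set:
    """K-mers of a circular reference, on both strands."""
    # Append the first k-1 bases so k-mers spanning the origin are included.
    circular = reference + reference[:k - 1]
    return kmers(circular, k) | kmers(reverse_complement(circular), k)


def select_mito_reads(reads: dict, reference: str, k: int = 8,
                      min_fraction: float = 0.5) -> list:
    """Return names of reads sharing enough k-mers with the reference.

    `k` and `min_fraction` are illustrative values, not published settings.
    """
    ref_set = reference_kmers(reference, k)
    selected = []
    for name, seq in reads.items():
        read_set = kmers(seq, k)
        if not read_set:
            continue  # read shorter than k: cannot be scored
        if len(read_set & ref_set) / len(read_set) >= min_fraction:
            selected.append(name)
    return selected
```

In practice a pipeline of this kind would more likely align reads to a reference mitogenome from a related species with a dedicated long-read aligner; the k-mer score here only illustrates the filtering logic, including why reads that span the origin of a circular genome need special handling.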
Identifiers
pubmed: 33910595
doi: 10.1186/s13059-021-02336-9
pii: 10.1186/s13059-021-02336-9
pmc: PMC8082918
Publication types
Journal Article
Research Support, N.I.H., Intramural
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
120
Grants
Agency: Wellcome Trust
ID: WT207492
Country: United Kingdom
Investigators
Alexander N G Kirschel (ANG)
Andrew Digby (A)
Andrew Veale (A)
Anne Bronikowski (A)
Bob Murphy (B)
Bruce Robertson (B)
Clare Baker (C)
Camila Mazzoni (C)
Christopher Balakrishnan (C)
Chul Lee (C)
Daniel Mead (D)
Emma Teeling (E)
Erez Lieberman Aiden (EL)
Erica Todd (E)
Evan Eichler (E)
Gavin J P Naylor (GJP)
Guojie Zhang (G)
Jeramiah Smith (J)
Jochen Wolf (J)
Justin Touchon (J)
Kira Delmore (K)
Kjetill Jakobsen (K)
Lisa Komoroske (L)
Mark Wilkinson (M)
Martin Genner (M)
Martin Pšenička (M)
Matthew Fuxjager (M)
Mike Stratton (M)
Miriam Liedvogel (M)
Neil Gemmell (N)
Piotr Minias (P)
Peter O Dunn (PO)
Peter Sudmant (P)
Phil Morin (P)
Qasim Ayub (Q)
Robert Kraus (R)
Sonja Vernes (S)
Steve Smith (S)
Tanya Lama (T)
Taylor Edwards (T)
Tim Smith (T)
Tom Gilbert (T)
Tomas Marques-Bonet (T)
Tony Einfeldt (T)
Byrappa Venkatesh (B)
Warren Johnson (W)
Wes Warren (W)
Yury Bukhman (Y)