Complete vertebrate mitogenomes reveal widespread repeats and gene duplications.


Journal

Genome Biology
ISSN: 1474-760X
Abbreviated title: Genome Biol
Country: England
NLM ID: 100960660

Publication information

Publication date:
29 04 2021
History:
received: 17 08 2020
accepted: 31 03 2021
entrez: 29 4 2021
pubmed: 30 4 2021
medline: 15 1 2022
Status: epublish

Abstract

Modern sequencing technologies should make the assembly of the relatively small mitochondrial genomes an easy undertaking. However, few tools exist that address mitochondrial assembly directly. As part of the Vertebrate Genomes Project (VGP) we develop mitoVGP, a fully automated pipeline for similarity-based identification of mitochondrial reads and de novo assembly of mitochondrial genomes that incorporates both long (> 10 kbp, PacBio or Nanopore) and short (100-300 bp, Illumina) reads. Our pipeline leads to successful complete mitogenome assemblies of 100 vertebrate species of the VGP. We observe that tissue type and library size selection have considerable impact on mitogenome sequencing and assembly. Comparing our assemblies to purportedly complete reference mitogenomes based on short-read sequencing, we identify errors, missing sequences, and incomplete genes in those references, particularly in repetitive regions. Our assemblies also identify novel gene region duplications. The presence of repeats and duplications in over half of the species herein assembled indicates that their occurrence is a principle of mitochondrial structure rather than an exception, shedding new light on mitochondrial genome evolution and organization. Our results indicate that even in the "simple" case of vertebrate mitogenomes the completeness of many currently available reference sequences can be further improved, and caution should be exercised before claiming the complete assembly of a mitogenome, particularly from short reads alone.

Identifiers

pubmed: 33910595
doi: 10.1186/s13059-021-02336-9
pii: 10.1186/s13059-021-02336-9
pmc: PMC8082918

Publication types

Journal Article
Research Support, N.I.H., Intramural
Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Pagination

120

Grants

Agency: Wellcome Trust
ID: WT207492
Country: United Kingdom

Investigators

Alexander N G Kirschel (ANG)
Andrew Digby (A)
Andrew Veale (A)
Anne Bronikowski (A)
Bob Murphy (B)
Bruce Robertson (B)
Clare Baker (C)
Camila Mazzoni (C)
Christopher Balakrishnan (C)
Chul Lee (C)
Daniel Mead (D)
Emma Teeling (E)
Erez Lieberman Aiden (EL)
Erica Todd (E)
Evan Eichler (E)
Gavin J P Naylor (GJP)
Guojie Zhang (G)
Jeramiah Smith (J)
Jochen Wolf (J)
Justin Touchon (J)
Kira Delmore (K)
Kjetill Jakobsen (K)
Lisa Komoroske (L)
Mark Wilkinson (M)
Martin Genner (M)
Martin Pšenička (M)
Matthew Fuxjager (M)
Mike Stratton (M)
Miriam Liedvogel (M)
Neil Gemmell (N)
Piotr Minias (P)
Peter O Dunn (PO)
Peter Sudmant (P)
Phil Morin (P)
Qasim Ayub (Q)
Robert Kraus (R)
Sonja Vernes (S)
Steve Smith (S)
Tanya Lama (T)
Taylor Edwards (T)
Tim Smith (T)
Tom Gilbert (T)
Tomas Marques-Bonet (T)
Tony Einfeldt (T)
Byrappa Venkatesh (B)
Warren Johnson (W)
Wes Warren (W)
Yury Bukhman (Y)

Authors

Giulio Formenti (G)

The Vertebrate Genome Lab, Rockefeller University, New York, NY, USA. gformenti@rockefeller.edu.
Laboratory of Neurogenetics of Language, Rockefeller University, New York, NY, USA. gformenti@rockefeller.edu.
The Howard Hughes Medical Institute, Chevy Chase, MD, USA. gformenti@rockefeller.edu.

Arang Rhie (A)

Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA.

Jennifer Balacco (J)

The Vertebrate Genome Lab, Rockefeller University, New York, NY, USA.

Bettina Haase (B)

The Vertebrate Genome Lab, Rockefeller University, New York, NY, USA.

Jacquelyn Mountcastle (J)

The Vertebrate Genome Lab, Rockefeller University, New York, NY, USA.

Olivier Fedrigo (O)

The Vertebrate Genome Lab, Rockefeller University, New York, NY, USA.

Samara Brown (S)

Laboratory of Neurogenetics of Language, Rockefeller University, New York, NY, USA.
The Howard Hughes Medical Institute, Chevy Chase, MD, USA.

Marco Rosario Capodiferro (MR)

Department of Biology and Biotechnology "L. Spallanzani", University of Pavia, Pavia, Italy.

Farooq O Al-Ajli (FO)

Monash University Malaysia Genomics Facility, School of Science, Bandar Sunway, Selangor Darul Ehsan, Malaysia.
Tropical Medicine and Biology Multidisciplinary Platform, Monash University Malaysia, Bandar Sunway, Selangor Darul Ehsan, Malaysia.
Qatar Falcon Genome Project, Doha, State of Qatar.

Roberto Ambrosini (R)

Department of Environmental Science and Policy, University of Milan, Milan, Italy.

Peter Houde (P)

Department of Biology, New Mexico State University, Las Cruces, NM, USA.

Sergey Koren (S)

Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA.

Karen Oliver (K)

Wellcome Sanger Institute, Cambridge, UK.

Michelle Smith (M)

Wellcome Sanger Institute, Cambridge, UK.

Jason Skelton (J)

Wellcome Sanger Institute, Cambridge, UK.

Emma Betteridge (E)

Wellcome Sanger Institute, Cambridge, UK.

Jale Dolucan (J)

Wellcome Sanger Institute, Cambridge, UK.

Craig Corton (C)

Wellcome Sanger Institute, Cambridge, UK.

Iliana Bista (I)

Wellcome Sanger Institute, Cambridge, UK.
Department of Genetics, University of Cambridge, Cambridge, UK.

James Torrance (J)

Wellcome Sanger Institute, Cambridge, UK.

Alan Tracey (A)

Wellcome Sanger Institute, Cambridge, UK.

Jonathan Wood (J)

Wellcome Sanger Institute, Cambridge, UK.

Marcela Uliano-Silva (M)

Wellcome Sanger Institute, Cambridge, UK.

Kerstin Howe (K)

Wellcome Sanger Institute, Cambridge, UK.

Shane McCarthy (S)

Wellcome Sanger Institute, Cambridge, UK.
Department of Genetics, University of Cambridge, Cambridge, UK.

Sylke Winkler (S)

Max Planck Institute of Molecular Cell Biology & Genetics, Dresden, Germany.

Woori Kwak (W)

Hoonygen, Seoul, Korea.

Jonas Korlach (J)

Pacific Biosciences, Menlo Park, CA, USA.

Arkarachai Fungtammasan (A)

DNAnexus Inc., Mountain View, CA, USA.

Daniel Fordham (D)

Oxford Nanopore Technologies Ltd, Oxford Science Park, Oxford, UK.

Vania Costa (V)

Oxford Nanopore Technologies Ltd, Oxford Science Park, Oxford, UK.

Simon Mayes (S)

Oxford Nanopore Technologies Ltd, Oxford Science Park, Oxford, UK.

Matteo Chiara (M)

Department of Biosciences, University of Milan, Milan, Italy.

David S Horner (DS)

Department of Biosciences, University of Milan, Milan, Italy.

Eugene Myers (E)

Max Planck Institute of Molecular Cell Biology & Genetics, Dresden, Germany.

Richard Durbin (R)

Wellcome Sanger Institute, Cambridge, UK.
Department of Genetics, University of Cambridge, Cambridge, UK.

Alessandro Achilli (A)

Department of Biology and Biotechnology "L. Spallanzani", University of Pavia, Pavia, Italy.

Edward L Braun (EL)

Department of Biology, University of Florida, Gainesville, FL, USA.

Adam M Phillippy (AM)

Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA.

Erich D Jarvis (ED)

The Vertebrate Genome Lab, Rockefeller University, New York, NY, USA. ejarvis@rockefeller.edu.
Laboratory of Neurogenetics of Language, Rockefeller University, New York, NY, USA. ejarvis@rockefeller.edu.
The Howard Hughes Medical Institute, Chevy Chase, MD, USA. ejarvis@rockefeller.edu.
