A globally diverse reference alignment and panel for imputation of mitochondrial DNA variants.


Journal

BMC bioinformatics
ISSN: 1471-2105
Abbreviated title: BMC Bioinformatics
Country: England
NLM ID: 100965194

Publication information

Publication date:
01 Sep 2021
History:
received: 22 01 2021
accepted: 16 08 2021
entrez: 2 9 2021
pubmed: 3 9 2021
medline: 4 9 2021
Status: epublish

Abstract

Variation in mitochondrial DNA (mtDNA) identified by genotyping microarrays or by sequencing only the hypervariable regions of the genome may be insufficient to reliably assign mitochondrial genomes to phylogenetic lineages or haplogroups. This lack of resolution can limit functional and clinical interpretation of a substantial body of existing mtDNA data. To address this limitation, we developed and evaluated a large, curated reference alignment of complete mtDNA sequences as part of a pipeline for imputing missing mtDNA single nucleotide variants (mtSNVs). We call our reference alignment and pipeline MitoImpute. We aligned the sequences of 36,960 complete human mitochondrial genomes downloaded from GenBank, filtered and controlled for quality. These sequences were reformatted for use in the imputation software IMPUTE2. We assessed the imputation accuracy of MitoImpute by measuring haplogroup and genotype concordance in data from the 1000 Genomes Project and the Alzheimer's Disease Neuroimaging Initiative (ADNI). The mean improvement of haplogroup assignment in the 1000 Genomes samples was 42.7% (Matthews correlation coefficient = 0.64). In the ADNI cohort, we imputed missing single nucleotide variants. These results show that our reference alignment and panel can be used to impute missing mtSNVs in existing microarray data, thereby broadening the scope of functional and clinical investigation of mtDNA. This improvement may be particularly useful in studies where participants have been recruited over time and mtDNA data have been obtained using different methods, enabling better integration of early data collected using less accurate methods with more recent sequence data.
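The two accuracy measures named in the abstract, genotype concordance and the Matthews correlation coefficient (MCC), can be illustrated with a minimal sketch. This is not the MitoImpute pipeline or the authors' evaluation code: the function names and the toy genotype vectors are hypothetical, and genotypes are assumed to be coded 0 (reference allele) or 1 (alternate allele) at each site.

```python
from math import sqrt


def genotype_concordance(truth, imputed):
    """Fraction of sites where the imputed call equals the true call."""
    if len(truth) != len(imputed):
        raise ValueError("truth and imputed must have the same length")
    return sum(t == p for t, p in zip(truth, imputed)) / len(truth)


def matthews_corrcoef(truth, imputed):
    """MCC for binary calls (0 = reference allele, 1 = alternate allele)."""
    if len(truth) != len(imputed):
        raise ValueError("truth and imputed must have the same length")
    pairs = list(zip(truth, imputed))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal total is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Hypothetical toy data: sequenced ("true") calls versus imputed calls.
truth = [1, 1, 0, 0, 1, 0, 0, 0]
imputed = [1, 0, 0, 0, 1, 0, 1, 0]

print(genotype_concordance(truth, imputed))  # 0.75
print(round(matthews_corrcoef(truth, imputed), 4))  # 0.4667
```

Unlike raw concordance, MCC accounts for all four cells of the confusion matrix, so it is less easily inflated when most sites carry the reference allele.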

Identifiers

pubmed: 34470617
doi: 10.1186/s12859-021-04337-8
pii: 10.1186/s12859-021-04337-8
pmc: PMC8409003

Chemical substances

DNA, Mitochondrial 0

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

417

Grants

Agency: NIA NIH HHS
ID: P01 AG012435
Country: United States
Agency: NIA NIH HHS
ID: P30 AG066530
Country: United States
Agency: NIH HHS
ID: S10 OD026880
Country: United States

Copyright information

© 2021. The Author(s).

Authors

Tim W McInerney (TW)

John Curtin School of Medical Research, Australian National University, Australian Capital Territory, Canberra, Australia.

Brian Fulton-Howard (B)

Genetics and Genomic Sciences, Ronald M. Loeb Center for Alzheimer's Disease, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Place, New York, NY, 10029, USA.

Christopher Patterson (C)

Keck School of Medicine, Mark and Mary Stevens Neuroimaging and Informatics Institute, University of Southern California, Los Angeles, CA, USA.
Department of Neurology, Alzheimer's Disease Research Center, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA.

Devashi Paliwal (D)

John Curtin School of Medical Research, Australian National University, Australian Capital Territory, Canberra, Australia.

Lars S Jermiin (LS)

CSIRO Land and Water, Commonwealth Scientific Industrial and Research Organization, Acton, ACT, 2601, Australia.
Research School of Biology, Australian National University, Canberra, ACT, 2601, Australia.
School of Biology and Environmental Science, University College Dublin, Belfield, Dublin 4, Ireland.
Earth Institute, University College Dublin, Belfield, Dublin 4, Ireland.

Hardip R Patel (HR)

John Curtin School of Medical Research, Australian National University, Australian Capital Territory, Canberra, Australia.

Judy Pa (J)

Keck School of Medicine, Mark and Mary Stevens Neuroimaging and Informatics Institute, University of Southern California, Los Angeles, CA, USA.
Department of Neurology, Alzheimer's Disease Research Center, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA.

Russell H Swerdlow (RH)

Department of Neurology, Alzheimer's Disease Center, University of Kansas, Fairway, KS, USA.

Alison Goate (A)

Genetics and Genomic Sciences, Ronald M. Loeb Center for Alzheimer's Disease, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Place, New York, NY, 10029, USA.

Simon Easteal (S)

John Curtin School of Medical Research, Australian National University, Australian Capital Territory, Canberra, Australia.

Shea J Andrews (SJ)

Genetics and Genomic Sciences, Ronald M. Loeb Center for Alzheimer's Disease, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Place, New York, NY, 10029, USA. shea.andrews@mssm.edu.
