A globally diverse reference alignment and panel for imputation of mitochondrial DNA variants.
Imputation
Mitochondrial DNA
Reference panel
Journal
BMC bioinformatics
ISSN: 1471-2105
Abbreviated title: BMC Bioinformatics
Country: England
NLM ID: 100965194
Publication information
Publication date:
01 Sep 2021
History:
received: 22 Jan 2021
accepted: 16 Aug 2021
entrez: 2 Sep 2021
pubmed: 3 Sep 2021
medline: 4 Sep 2021
Status:
epublish
Abstract
Variation in mitochondrial DNA (mtDNA) identified by genotyping microarrays or by sequencing only the hypervariable regions of the genome may be insufficient to reliably assign mitochondrial genomes to phylogenetic lineages or haplogroups. This lack of resolution can limit functional and clinical interpretation of a substantial body of existing mtDNA data. To address this limitation, we developed and evaluated a large, curated reference alignment of complete mtDNA sequences as part of a pipeline for imputing missing mtDNA single nucleotide variants (mtSNVs). We call our reference alignment and pipeline MitoImpute. We aligned the sequences of 36,960 complete human mitochondrial genomes that were downloaded from GenBank, filtered, and quality controlled. These sequences were reformatted for use with the imputation software IMPUTE2. We assessed the imputation accuracy of MitoImpute by measuring haplogroup and genotype concordance in data from the 1000 Genomes Project and the Alzheimer's Disease Neuroimaging Initiative (ADNI). The mean improvement of haplogroup assignment in the 1000 Genomes samples was 42.7% (Matthews correlation coefficient = 0.64). In the ADNI cohort, we imputed missing single nucleotide variants. These results show that our reference alignment and panel can be used to impute missing mtSNVs in existing data obtained using microarrays, thereby broadening the scope of functional and clinical investigation of mtDNA. This improvement may be particularly useful in studies where participants have been recruited over time and mtDNA data were obtained using different methods, enabling better integration of early data collected using less accurate methods with more recent sequence data.
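The two accuracy measures named in the abstract, genotype concordance and the Matthews correlation coefficient, can be illustrated with a minimal sketch. This record does not describe how the authors computed either measure, so everything below is an assumption made for illustration: mtSNV genotypes are treated as haploid and coded 0 (reference allele) or 1 (alternate allele), and the function names and toy data are hypothetical.

```python
from math import sqrt


def genotype_concordance(truth, imputed):
    """Fraction of sites where the imputed allele equals the sequenced allele."""
    if len(truth) != len(imputed):
        raise ValueError("genotype vectors must have equal length")
    if not truth:
        raise ValueError("no sites to compare")
    matches = sum(t == i for t, i in zip(truth, imputed))
    return matches / len(truth)


def matthews_corrcoef(truth, imputed):
    """Matthews correlation coefficient for binary (0/1) genotype calls."""
    pairs = list(zip(truth, imputed))
    tp = sum(t == 1 and i == 1 for t, i in pairs)  # alternate allele recovered
    tn = sum(t == 0 and i == 0 for t, i in pairs)  # reference allele recovered
    fp = sum(t == 0 and i == 1 for t, i in pairs)  # alternate allele called in error
    fn = sum(t == 1 and i == 0 for t, i in pairs)  # alternate allele missed
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention the coefficient is reported as 0 when a margin is empty.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Hypothetical toy data: sequenced ("truth") versus imputed alleles at 8 sites.
truth = [0, 1, 1, 0, 1, 0, 0, 1]
imputed = [0, 1, 0, 0, 1, 0, 1, 1]

print(genotype_concordance(truth, imputed))  # 0.75
print(matthews_corrcoef(truth, imputed))     # 0.5
```

Concordance counts every matching site equally. The Matthews coefficient also accounts for the balance between reference and alternate calls, which matters when most sites carry the reference allele.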
Identifiers
pubmed: 34470617
doi: 10.1186/s12859-021-04337-8
pii: 10.1186/s12859-021-04337-8
pmc: PMC8409003
Chemical substances
DNA, Mitochondrial
0
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
417
Grants
Agency: NIA NIH HHS
ID: P01 AG012435
Country: United States
Agency: NIA NIH HHS
ID: P30 AG066530
Country: United States
Agency: NIH HHS
ID: S10 OD026880
Country: United States
Copyright information
© 2021. The Author(s).