An autoencoder-based deep learning method for genotype imputation.

Keywords: GWAS; autoencoder; deep learning; genotype imputation; paired sample t-test

Journal

Frontiers in artificial intelligence
ISSN: 2624-8212
Abbreviated title: Front Artif Intell
Country: Switzerland
NLM ID: 101770551

Publication information

Publication date:
2022
History:
received: 26 08 2022
accepted: 29 09 2022
entrez: 21 11 2022
pubmed: 22 11 2022
medline: 22 11 2022
Status: epublish

Abstract

Genotype imputation has a wide range of applications in genome-wide association studies (GWAS), including increasing the statistical power of association tests, discovering trait-associated loci in meta-analyses, and prioritizing causal variants with fine-mapping. In recent years, deep learning (DL) based methods, such as the sparse convolutional denoising autoencoder (SCDA), have been developed for genotype imputation. However, optimizing the learning process in DL-based methods to achieve high imputation accuracy remains challenging. To address this challenge, we developed a convolutional autoencoder (AE) model for genotype imputation and implemented a customized training loop that modifies the training process to use a single-batch loss rather than the average loss over batches. This modified AE imputation model was evaluated using a yeast dataset, the human leukocyte antigen (HLA) data from the 1000 Genomes Project (1KGP), and our in-house genotype data from the Louisiana Osteoporosis Study (LOS). The modified AE imputation model achieved comparable or better performance than the existing SCDA model on all three datasets in terms of the concordance rate (CR), the Hellinger score, the scaled Euclidean norm (SEN) score, and the imputation quality score (IQS). On the HLA data, for example, the AE model achieved an average CR of 0.9468 and 0.9459, Hellinger score of 0.9765 and 0.9518, SEN score of 0.9977 and 0.9953, and IQS of 0.9515 and 0.9044 at missing ratios of 10% and 20%, respectively. On the LOS data, it achieved an average CR of 0.9005, Hellinger score of 0.9384, SEN score of 0.9940, and IQS of 0.8681 at a missing ratio of 20%. In summary, our proposed method for genotype imputation has great potential to increase the statistical power of GWAS and improve downstream post-GWAS analyses.
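To make two of the evaluation metrics named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes genotypes are coded as allele counts (0, 1, 2), that the concordance rate is the proportion of masked genotypes imputed correctly, and that IQS is the chance-corrected agreement (a Cohen's kappa-style statistic) between true and imputed genotypes. The function names `concordance_rate` and `imputation_quality_score` and the toy data are hypothetical.

```python
from collections import Counter


def concordance_rate(true_gt, imputed_gt):
    """Proportion of genotypes whose imputed value equals the true value."""
    if len(true_gt) != len(imputed_gt) or len(true_gt) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(t == p for t, p in zip(true_gt, imputed_gt))
    return matches / len(true_gt)


def imputation_quality_score(true_gt, imputed_gt):
    """Chance-corrected agreement: (P_o - P_c) / (1 - P_c).

    P_o is the observed agreement (the concordance rate) and P_c is the
    agreement expected by chance from the marginal genotype frequencies.
    """
    n = len(true_gt)
    p_o = concordance_rate(true_gt, imputed_gt)
    true_counts = Counter(true_gt)
    imputed_counts = Counter(imputed_gt)
    # Counter returns 0 for genotype classes absent from the imputed calls.
    p_c = sum(true_counts[g] * imputed_counts[g] for g in true_counts) / (n * n)
    if p_c == 1.0:
        # Only one genotype class is present; the score is undefined.
        return float("nan")
    return (p_o - p_c) / (1.0 - p_c)


# Toy example: eight masked genotypes, six of them imputed correctly.
true_genotypes = [0, 0, 1, 1, 2, 2, 0, 1]
imputed_genotypes = [0, 0, 1, 2, 2, 2, 0, 0]

cr = concordance_rate(true_genotypes, imputed_genotypes)
iqs = imputation_quality_score(true_genotypes, imputed_genotypes)
print(f"CR = {cr:.4f}, IQS = {iqs:.4f}")
```

Because IQS discounts agreement that would occur by chance, it is lower than the concordance rate whenever the imputation is imperfect, which is consistent with the IQS values in the abstract being lower than the corresponding CR values at the 20% missing ratio.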

Identifiers

pubmed: 36406474
doi: 10.3389/frai.2022.1028978
pmc: PMC9671213

Databases

figshare
['10.6084/m9.figshare.21441078']

Publication types

Journal Article

Languages

eng

Pagination

1028978

Grants

Agency: NIGMS NIH HHS
ID: P20 GM109036
Country: United States
Agency: NIA NIH HHS
ID: R01 AG061917
Country: United States
Agency: NIAMS NIH HHS
ID: R01 AR069055
Country: United States
Agency: NIA NIH HHS
ID: U19 AG055373
Country: United States

Copyright information

Copyright © 2022 Song, Greenbaum, Luttrell, Zhou, Wu, Luo, Qiu, Zhao, Su, Tian, Shen, Hong, Gong, Shi, Deng and Zhang.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Authors

Meng Song (M)

School of Computing Sciences and Computer Engineering, University of Southern Mississippi, Hattiesburg, MS, United States.

Jonathan Greenbaum (J)

Tulane Center of Biomedical Informatics and Genomics, School of Medicine, Tulane University, New Orleans, LA, United States.

Joseph Luttrell (J)

School of Computing Sciences and Computer Engineering, University of Southern Mississippi, Hattiesburg, MS, United States.

Weihua Zhou (W)

College of Computing, Michigan Technological University, Houghton, MI, United States.

Chong Wu (C)

Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX, United States.

Zhe Luo (Z)

Tulane Center of Biomedical Informatics and Genomics, School of Medicine, Tulane University, New Orleans, LA, United States.

Chuan Qiu (C)

Tulane Center of Biomedical Informatics and Genomics, School of Medicine, Tulane University, New Orleans, LA, United States.

Lan Juan Zhao (LJ)

Tulane Center of Biomedical Informatics and Genomics, School of Medicine, Tulane University, New Orleans, LA, United States.

Kuan-Jui Su (KJ)

Tulane Center of Biomedical Informatics and Genomics, School of Medicine, Tulane University, New Orleans, LA, United States.

Qing Tian (Q)

Tulane Center of Biomedical Informatics and Genomics, School of Medicine, Tulane University, New Orleans, LA, United States.

Hui Shen (H)

Tulane Center of Biomedical Informatics and Genomics, School of Medicine, Tulane University, New Orleans, LA, United States.

Huixiao Hong (H)

Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, United States.

Ping Gong (P)

Environmental Laboratory, U.S. Army Engineer Research and Development Center, Vicksburg, MS, United States.

Xinghua Shi (X)

Department of Computer & Information Sciences, Temple University, Philadelphia, PA, United States.

Hong-Wen Deng (HW)

Tulane Center of Biomedical Informatics and Genomics, School of Medicine, Tulane University, New Orleans, LA, United States.

Chaoyang Zhang (C)

School of Computing Sciences and Computer Engineering, University of Southern Mississippi, Hattiesburg, MS, United States.

MeSH classifications