Optimal population-specific HLA imputation with dimension reduction.
Admixed populations
Dimension reduction
HLA imputation
Immunogenomics
Journal
HLA
ISSN: 2059-2310
Abbreviated title: HLA
Country: England
NLM ID: 101675570
Publication information
Publication date:
11 Nov 2023
History:
revised: 29 Aug 2023
received: 19 Apr 2023
accepted: 14 Oct 2023
medline: 11 Nov 2023
pubmed: 11 Nov 2023
entrez: 11 Nov 2023
Status:
aheadofprint
Abstract
Human genomics has evolved quickly, powering genome-wide association studies (GWASs). However, SNP-based GWASs cannot capture the intense polymorphism of HLA genes, which are highly associated with disease susceptibility. Methods exist to statistically impute HLA genotypes from SNP genotype data, but the lack of diversity in reference panels hinders their performance. We evaluated the accuracy of the 1000 Genomes data as a reference panel for imputing HLA in admixed individuals of African and European ancestries, focusing on (a) the full dataset, (b) 10 replications from 6 populations, and (c) 19 conditions for the custom reference panels. The full dataset outperformed smaller models, with a good F1-score of 0.66 for HLA-B. However, custom models outperformed the multiethnic or population models of similar size (F1-scores up to 0.53, against up to 0.42). We demonstrated the importance of using genetically specific models for imputing HLA in populations that are currently underrepresented in public datasets, opening the door to HLA imputation for every genetic population.
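The F1-scores quoted above summarize how well imputed HLA alleles match the true typing. The abstract does not say how the authors aggregate the score, so the sketch below assumes a macro-average over alleles (each allele weighted equally, so errors on rare alleles count as much as errors on common ones). The function names and the toy allele calls are hypothetical and serve only to illustrate the metric.

```python
from collections import Counter


def per_allele_f1(true_calls, imputed_calls):
    """Return {allele: F1} from paired lists of true and imputed allele calls."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, guess in zip(true_calls, imputed_calls):
        if truth == guess:
            tp[truth] += 1   # allele correctly imputed
        else:
            fn[truth] += 1   # the true allele was missed
            fp[guess] += 1   # the imputed allele was called wrongly
    scores = {}
    for allele in set(true_calls) | set(imputed_calls):
        # F1 = 2*TP / (2*TP + FP + FN)
        denom = 2 * tp[allele] + fp[allele] + fn[allele]
        scores[allele] = 2 * tp[allele] / denom if denom else 0.0
    return scores


def macro_f1(true_calls, imputed_calls):
    """Unweighted mean of the per-allele F1 scores (assumed aggregation)."""
    scores = per_allele_f1(true_calls, imputed_calls)
    return sum(scores.values()) / len(scores)


# Hypothetical toy data: six HLA-B allele calls, two of them imputed wrongly.
true_calls = ["B*07:02", "B*07:02", "B*08:01", "B*15:03", "B*15:03", "B*53:01"]
imputed_calls = ["B*07:02", "B*08:01", "B*08:01", "B*15:03", "B*07:02", "B*53:01"]

if __name__ == "__main__":
    print(round(macro_f1(true_calls, imputed_calls), 3))
```

With this aggregation, a panel that imputes common alleles well but misses rare, population-specific ones scores lower than its raw accuracy would suggest, which is one reason an F1-style metric suits comparisons across reference panels.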
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Grants
Agency: Conseil Régional des Pays de la Loire
Agency: Conselho Nacional de Desenvolvimento Científico e Tecnológico
Agency: European Regional Development Fund
Agency: Fundação de Amparo à Pesquisa do Estado de São Paulo
Agency: H2020 Marie Skłodowska-Curie Actions
Agency: Institut National de la Santé et de la Recherche Médicale
Agency: Université de Nantes
Copyright information
© 2023 The Authors. HLA: Immune Response Genetics published by John Wiley & Sons Ltd.