A deep learning approach to prediction of blood group antigens from genomic data.
AI
Illumina GSA
blood antigen
blood types
convolutional neural network
deep learning
denoising autoencoder
genetic prediction
Journal
Transfusion
ISSN: 1537-2995
Abbreviated title: Transfusion
Country: United States
NLM ID: 0417360
Publication information
Publication date: 13 Sep 2024
History:
received: 17 Nov 2023
revised: 17 Jul 2024
accepted: 27 Aug 2024
medline: 15 Sep 2024
pubmed: 15 Sep 2024
entrez: 13 Sep 2024
Status: aheadofprint
Abstract
Abstract sections
BACKGROUND
Deep learning methods are revolutionizing natural science. In this study, we apply such techniques to develop blood type prediction models based on inexpensive, easily scalable screening array genotyping platforms.
METHODS
Combining existing blood types from blood banks with imputed screening array genotypes for ~111,000 Danish and 1168 Finnish blood donors, we used deep learning techniques to train and validate blood type prediction models for 36 antigens in 15 blood group systems. To account for missing genotypes, a denoising autoencoder was used as an initial step, followed by a convolutional neural network blood type classifier.
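The input-corruption idea behind a denoising autoencoder can be sketched as follows. This is a minimal illustration, not the authors' implementation: the 0/1/2 genotype encoding, the `MISSING` placeholder, the masking rate, and the function names are all assumptions made for the example. The sketch only builds (corrupted input, clean target) pairs; a network trained to map the first onto the second learns to fill in missing genotypes before a classifier sees them.

```python
import random

# Hypothetical placeholder for a missing genotype call (assumption).
MISSING = -1


def corrupt(genotypes, mask_rate, rng):
    """Return a copy in which each entry is masked with probability mask_rate."""
    return [MISSING if rng.random() < mask_rate else g for g in genotypes]


def make_training_pairs(samples, mask_rate=0.2, seed=0):
    """Build (corrupted input, clean target) pairs for denoising training."""
    rng = random.Random(seed)
    return [(corrupt(s, mask_rate, rng), list(s)) for s in samples]


# Toy genotype vectors, assumed to be alternate-allele counts (0, 1, or 2).
samples = [[0, 1, 2, 0, 1], [2, 2, 0, 1, 0]]
pairs = make_training_pairs(samples)

for noisy, clean in pairs:
    print("input :", noisy)
    print("target:", clean)
```

Masking entries at random during training mimics the genotypes that are absent in real array data, so the reconstruction objective matches the problem the model faces at prediction time.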
RESULTS
Two-thirds of the trained blood type prediction models demonstrated an F1-accuracy above 99%. Models for antigens with low or high frequencies like, for example, C
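The abstract does not define "F1-accuracy"; assuming it refers to the standard F1 score (the harmonic mean of precision and recall), the sketch below shows why such a metric is informative for antigens with very low frequencies. The counts are hypothetical and chosen only for illustration.

```python
def f1_score(tp, fp, fn):
    """Standard F1: harmonic mean of precision and recall."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def accuracy(tp, tn, fp, fn):
    """Plain accuracy: fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical cohort: 1000 donors, 10 of whom carry a rare antigen.
# A model that predicts "negative" for everyone misses all 10 carriers.
always_negative = dict(tp=0, tn=990, fp=0, fn=10)
print(accuracy(**always_negative))      # high despite finding no carriers
print(f1_score(tp=0, fp=0, fn=10))      # zero, exposing the failure

# A model that finds 8 of the 10 carriers with 2 false alarms.
print(f1_score(tp=8, fp=2, fn=2))
```

Because F1 ignores true negatives, a model cannot score well on a rare antigen simply by predicting the majority class.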
DISCUSSION
High accuracy across a variety of blood groups demonstrates the viability of deep learning-based blood type prediction from array chip genotypes, even for blood groups with nontrivial genetic underpinnings. These techniques can help identify blood donors with rare blood types by greatly narrowing the pool of candidate donors before clinical-grade confirmation.
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Grants
Agency: Bloddonornes forskningsfond
ID: 64931
Agency: A.P Møller Fonden
ID: 45000
Agency: Novo Nordisk Foundation
ID: NNF17OC0027594
Agency: Novo Nordisk Foundation
ID: NNF14CC0001
Copyright information
© 2024 The Author(s). Transfusion published by Wiley Periodicals LLC on behalf of AABB.