Comprehensive evaluation of the implementation of episignatures for diagnosis of neurodevelopmental disorders (NDDs).
Journal
Human genetics
ISSN: 1432-1203
Abbreviated title: Hum Genet
Country: Germany
NLM ID: 7613873
Publication information
Publication date:
Dec 2023
History:
received: 28 07 2023
accepted: 10 10 2023
medline: 27 11 2023
pubmed: 27 10 2023
entrez: 27 10 2023
Status:
ppublish
Abstract
Episignatures are popular tools for the diagnosis of rare neurodevelopmental disorders. They are commonly based on a set of differentially methylated CpGs used in combination with a support vector machine model. DNA methylation (DNAm) data often include missing values due to changes in data generation technology and batch effects. While many normalization methods exist for DNAm data, their impact on episignature performance has never been assessed. In addition, technologies to quantify DNAm evolve quickly, and this may lead to poor transfer of existing episignatures generated on deprecated array versions to new ones. Indeed, probe removal between array versions, between technologies or during preprocessing leads to missing values. Thus, the effect of missing data on episignature performance must also be carefully evaluated and addressed through imputation or an innovative approach to episignature design. In this paper, we used data from patients with Kabuki and Sotos syndromes to evaluate the influence of normalization methods, classification models and missing data on the prediction performance of two existing episignatures. We compare how six popular normalization methods for methylarray data affect episignature classification performance in Kabuki and Sotos syndromes and provide best practice suggestions for building new episignatures. In this setting, we show that the Illumina, Noob and Funnorm normalization methods achieved higher classification performance on the test sets than the Quantile, Raw and Swan normalization methods. We further show that penalized logistic regression and support vector machines perform best in the classification of Kabuki and Sotos syndrome patients. Then, we describe a new paradigm to build episignatures based on the detection of differentially methylated regions (DMRs) and evaluate their performance compared to classical differentially methylated cytosine (DMC)-based episignatures in the presence of missing data.
We show that the performance of classical DMC-based episignatures suffers more from the presence of missing data than that of the DMR-based approach. We present a comprehensive evaluation of how the normalization of DNA methylation data affects episignature performance, using three popular classification models. We further evaluate how missing data affect those models' predictions. Finally, we propose a novel methodology to develop episignatures based on the identification of differentially methylated regions and show that this method slightly outperforms classical episignatures in the presence of missing data.
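One intuition for why region-level features might tolerate missing probes better than single-CpG features can be sketched as follows. This is a minimal illustration and not the authors' method: it assumes that a DMR feature is simply the mean beta value of the probes observed in the region, and the probe IDs, region names and `region_features` helper are all hypothetical.

```python
from statistics import mean
from typing import Dict, List, Optional

# Hypothetical region definitions (region name -> probe IDs), for illustration only.
REGIONS: Dict[str, List[str]] = {
    "region_A": ["cg_a1", "cg_a2", "cg_a3"],
    "region_B": ["cg_b1", "cg_b2"],
}


def region_features(
    betas: Dict[str, Optional[float]],
    regions: Dict[str, List[str]] = REGIONS,
) -> Dict[str, Optional[float]]:
    """Average the observed beta values in each region, skipping missing probes.

    A probe counts as missing if it is absent from `betas` or mapped to None.
    A region with no observed probe yields None (assumed behaviour).
    """
    features: Dict[str, Optional[float]] = {}
    for name, probes in regions.items():
        observed = [betas[p] for p in probes if betas.get(p) is not None]
        features[name] = mean(observed) if observed else None
    return features


# Made-up beta values for one sample measured with all probes present.
complete = {"cg_a1": 0.80, "cg_a2": 0.90, "cg_a3": 0.70, "cg_b1": 0.20, "cg_b2": 0.30}

# The same sample with one probe set to None and another absent entirely,
# as might happen when probes are dropped between array versions.
with_missing = dict(complete, cg_a2=None)
del with_missing["cg_b2"]

full = region_features(complete)
partial = region_features(with_missing)
```

In this sketch both regions still yield a feature after two probes are lost, whereas a CpG-level classifier would need those two values imputed before it could score the sample. The region means do shift when probes drop out, so this shows only that the features remain computable, not that predictions are unchanged.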
Identifiers
pubmed: 37889307
doi: 10.1007/s00439-023-02609-2
pii: 10.1007/s00439-023-02609-2
pmc: PMC10676303
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
1721-1735
Grants
Agency: Fonds De La Recherche Scientifique - FNRS
ID: 1.E.013.20F
Agency: Innoviris Foundation
ID: PFS-11e IgenCare
Copyright information
© 2023. The Author(s).