EnsembleCNV: an ensemble machine learning algorithm to identify and genotype copy number variation using SNP array data.
Journal
Nucleic acids research
ISSN: 1362-4962
Abbreviated title: Nucleic Acids Res
Country: England
NLM ID: 0411011
Publication information
Publication date:
23 April 2019
History:
received: 4 October 2018
revised: 17 December 2018
accepted: 25 January 2019
pubmed: 6 February 2019
entrez: 6 February 2019
medline: 29 October 2019
Status:
ppublish
Abstract
The associations between diseases/traits and copy number variants (CNVs) have not been systematically investigated in genome-wide association studies (GWASs), primarily due to a lack of robust and accurate tools for CNV genotyping. Herein, we propose a novel ensemble learning framework, ensembleCNV, to detect and genotype CNVs using single nucleotide polymorphism (SNP) array data. EnsembleCNV (a) identifies and eliminates batch effects at the raw data level; (b) uses a heuristic algorithm to assemble individual CNV calls from multiple existing callers with complementary strengths into CNV regions (CNVRs); (c) re-genotypes each CNVR with a local likelihood model adjusted by global information across multiple CNVRs; (d) refines CNVR boundaries using the local correlation structure in copy number intensities; and (e) provides direct CNV genotyping accompanied by a confidence score, directly accessible for downstream quality control and association analysis. Benchmarked on two large datasets, ensembleCNV outperformed competing methods and achieved a high call rate (93.3%) and reproducibility (98.6%), while concurrently achieving high sensitivity by capturing 85% of common CNVs documented in the 1000 Genomes Project. Given this CNV call rate and accuracy, which are comparable to those of SNP genotyping, we suggest ensembleCNV holds significant promise for performing genome-wide CNV association studies and investigating how CNVs predispose to human diseases.
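Step (b) of the abstract, assembling calls from several callers into CNV regions, can be illustrated with a minimal sketch. This is not the heuristic used by ensembleCNV, which the abstract does not specify; it swaps in a simple greedy reciprocal-overlap merge. The names `CnvCall`, `reciprocal_overlap`, `merge_into_regions`, the caller labels, and the 0.5 threshold are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class CnvCall:
    """One CNV call from one caller for one sample (hypothetical schema)."""
    sample: str
    caller: str
    chrom: str
    start: int
    end: int


def reciprocal_overlap(a_start: int, a_end: int, b_start: int, b_end: int) -> float:
    """Return the smaller of the two overlap fractions, or 0.0 if disjoint."""
    overlap = min(a_end, b_end) - max(a_start, b_start)
    if overlap <= 0:
        return 0.0
    return min(overlap / (a_end - a_start), overlap / (b_end - b_start))


def merge_into_regions(calls: List[CnvCall], min_overlap: float = 0.5) -> List[Dict]:
    """Greedily merge position-sorted calls into regions.

    A call joins the most recent region when it lies on the same chromosome
    and its reciprocal overlap with that region reaches min_overlap
    (an assumed threshold, not taken from the paper).
    """
    regions: List[Dict] = []
    for call in sorted(calls, key=lambda c: (c.chrom, c.start, c.end)):
        last = regions[-1] if regions else None
        if (
            last is not None
            and last["chrom"] == call.chrom
            and reciprocal_overlap(last["start"], last["end"], call.start, call.end) >= min_overlap
        ):
            # Extend the region to cover the new call and record the call.
            last["start"] = min(last["start"], call.start)
            last["end"] = max(last["end"], call.end)
            last["calls"].append(call)
        else:
            regions.append(
                {"chrom": call.chrom, "start": call.start, "end": call.end, "calls": [call]}
            )
    return regions


# Illustrative input: two callers agree on one region in sample A.
example_calls = [
    CnvCall("A", "caller1", "chr1", 100, 200),
    CnvCall("A", "caller2", "chr1", 110, 210),
    CnvCall("B", "caller1", "chr1", 1000, 1100),
    CnvCall("B", "caller2", "chr2", 100, 200),
]
regions = merge_into_regions(example_calls)
```

Each resulting region keeps the calls that support it, so a later step could re-genotype every sample at that region, as steps (c) and (d) of the abstract describe at a high level.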
Identifiers
pubmed: 30722045
pii: 5306576
doi: 10.1093/nar/gkz068
pmc: PMC6468244
Publication types
Journal Article
Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
e39
Copyright information
© The Author(s) 2019. Published by Oxford University Press on behalf of Nucleic Acids Research.