Bayesian copy number detection and association in large-scale studies.
Bayes Theorem
Case-Control Studies
DNA Copy Number Variations / genetics
Genetic Predisposition to Disease
Genome, Human / genetics
Genome-Wide Association Study
Humans
Membrane Proteins / genetics
Pancreatic Neoplasms / genetics
Proto-Oncogene Proteins c-myc / genetics
Tumor Suppressor Proteins / genetics
Batch effects
CNPBayes
Copy number variants
Genome-wide association
Pancreatic cancer
SNP array
Journal
BMC cancer
ISSN: 1471-2407
Abbreviated title: BMC Cancer
Country: England
NLM ID: 100967800
Publication information
Publication date:
07 Sep 2020
History:
received: 21 Feb 2020
accepted: 17 Aug 2020
entrez: 7 Sep 2020
pubmed: 8 Sep 2020
medline: 20 Apr 2021
Status:
epublish
Abstract
Germline copy number variants (CNVs) increase risk for many diseases, yet detecting CNVs and quantifying their contribution to disease risk in large-scale studies is challenging due to biological and technical sources of heterogeneity that vary across the genome within and between samples. We developed an approach called CNPBayes to identify latent batch effects in genome-wide association studies involving copy number, to provide probabilistic estimates of integer copy number across the estimated batches, and to fully integrate the copy number uncertainty in the association model for disease. Applying a hidden Markov model (HMM) to identify CNVs in a large multi-site Pancreatic Cancer Case Control study (PanC4) of 7598 participants, we found CNV inference was highly sensitive to technical noise that varied appreciably among participants. Applying CNPBayes to this dataset, we found that the major sources of technical variation were linked to sample processing by the centralized laboratory and not the individual study sites. Modeling the latent batch effects at each CNV region hierarchically, we developed probabilistic estimates of copy number that were directly incorporated in a Bayesian regression model for pancreatic cancer risk. Candidate associations aided by this approach include deletions of 8q24 near regulatory elements of the tumor oncogene MYC and of Tumor Suppressor Candidate 3 (TUSC3). Laboratory effects may not account for the major sources of technical variation in genome-wide association studies. This study provides a robust Bayesian inferential framework for identifying latent batch effects, estimating copy number, and evaluating the role of copy number in heritable diseases.
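The idea of probabilistic integer copy number estimates that depend on batch can be illustrated with a small Python sketch. It is not the CNPBayes implementation (the abstract does not describe the model in detail). It assumes a simple Gaussian mixture in which each copy number state (0, 1, 2) has a batch-specific mean log R ratio, and applies Bayes' theorem to obtain posterior probabilities for a single sample. All names and numeric values (`BATCH_MEANS`, `WEIGHTS`, `SD`, the batch labels) are hypothetical and chosen only for illustration.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def copy_number_posterior(log_r, batch_means, sd, weights):
    """Posterior P(copy number = k | log_r) for one sample in one batch.

    batch_means[k] is the assumed mean log R ratio of copy number state k
    in this batch; weights[k] is the assumed prior frequency of state k.
    """
    joint = [w * normal_pdf(log_r, m, sd) for w, m in zip(weights, batch_means)]
    total = sum(joint)
    return [j / total for j in joint]


def expected_copy_number(posterior, states=(0, 1, 2)):
    """Posterior mean copy number, a summary that retains call uncertainty."""
    return sum(k * p for k, p in zip(states, posterior))


# Hypothetical parameters: the same states have shifted means in batch "B".
BATCH_MEANS = {"A": (-2.0, -0.5, 0.0), "B": (-1.8, -0.3, 0.2)}
WEIGHTS = (0.01, 0.09, 0.90)  # assumed prior frequencies of 0, 1, 2 copies
SD = 0.1                      # assumed common standard deviation

if __name__ == "__main__":
    for batch, means in BATCH_MEANS.items():
        post = copy_number_posterior(-0.25, means, SD, WEIGHTS)
        print(batch, [round(p, 3) for p in post],
              round(expected_copy_number(post), 3))
```

Under these assumed parameters, the same observed log R ratio favors two copies in batch "A" and one copy in batch "B", which shows why ignoring batch structure can distort copy number calls. Carrying the posterior probabilities forward, rather than a hard call, is one simple way to keep copy number uncertainty visible in a downstream risk model.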
Identifiers
pubmed: 32894098
doi: 10.1186/s12885-020-07304-3
pii: 10.1186/s12885-020-07304-3
pmc: PMC7487704
Chemical substances
MYC protein, human (registry number: 0)
Membrane Proteins (registry number: 0)
Proto-Oncogene Proteins c-myc (registry number: 0)
TUSC3 protein, human (registry number: 0)
Tumor Suppressor Proteins (registry number: 0)
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
856
Grants
Agency: NCI NIH HHS
ID: P30 CA008748
Country: United States
Agency: NCI NIH HHS
ID: P50 CA062924
Country: United States
Agency: NCI NIH HHS
ID: R01 CA154823
Country: United States
Agency: NCATS NIH HHS
ID: UL1 TR001863
Country: United States