Statistics to prioritize rare variants in family-based sequencing studies with disease subtypes.
affected‐only family design
disease subtypes
family‐based sequencing study
heritability
rare variant prioritization
Journal
Genetic epidemiology
ISSN: 1098-2272
Abbreviated title: Genet Epidemiol
Country: United States
NLM ID: 8411723
Publication information
Publication date:
28 Jun 2024
History:
revised: 26 Mar 2024
received: 17 Oct 2023
accepted: 13 Jun 2024
medline: 28 Jun 2024
pubmed: 28 Jun 2024
entrez: 28 Jun 2024
Status:
aheadofprint
Abstract
Family-based sequencing studies are increasingly used to find rare genetic variants of high risk for disease traits with familial clustering. In some studies, families with multiple disease subtypes are collected and the exomes of affected relatives are sequenced for shared rare variants (RVs). Since different families can harbor different causal variants and each family harbors many RVs, tests to detect causal variants can have low power in this study design. Our goal is rather to prioritize shared variants for further investigation by, for example, pathway analyses or functional studies. The transmission-disequilibrium test prioritizes variants based on departures from Mendelian transmission in parent-child trios. Extending this idea to families, we propose methods to prioritize RVs shared in affected relatives with two disease subtypes, with one subtype more heritable than the other. Global approaches condition on a variant being observed in the study and assume a known probability of carrying a causal variant. In contrast, local approaches condition on a variant being observed in specific families to eliminate the carrier probability. Our simulation results indicate that global approaches are robust to misspecification of the carrier probability and prioritize more effectively than local approaches even when the carrier probability is misspecified.
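As background for the trio-based idea mentioned above, the following is a minimal sketch of the classical transmission-disequilibrium test in its standard McNemar form. It is not the family-level statistic proposed in this article, which the abstract does not specify. The function name `tdt_statistic` and its arguments are hypothetical and chosen for illustration only: `transmitted` and `untransmitted` are assumed to count how often heterozygous parents did or did not pass the variant allele to an affected child.

```python
import math


def tdt_statistic(transmitted: int, untransmitted: int) -> tuple[float, float]:
    """Classical TDT in McNemar form (illustrative sketch only).

    transmitted:   heterozygous parents who passed the variant allele
                   to an affected child
    untransmitted: heterozygous parents who passed the other allele

    Returns the chi-square statistic (1 degree of freedom) and its
    asymptotic p-value under Mendelian transmission.
    """
    n = transmitted + untransmitted
    if n == 0:
        raise ValueError("no informative (heterozygous) parents")
    chi2 = (transmitted - untransmitted) ** 2 / n
    # Upper tail of a chi-square with 1 df: P(Z^2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical counts, not data from the study.
    chi2, p = tdt_statistic(transmitted=30, untransmitted=10)
    print(f"chi2 = {chi2:.2f}, p = {p:.4g}")
```

Larger statistics indicate a stronger departure from the 50:50 transmission expected under Mendelian inheritance, so variants could be ranked by this value; the asymptotic p-value is unreliable when the number of informative parents is small, as is typical for rare variants.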
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Grants
Agency: Natural Sciences and Engineering Research Council of Canada
Agency: CIHR
Country: Canada
Agency: Canadian Statistical Sciences Institute
Copyright information
© 2024 The Author(s). Genetic Epidemiology published by Wiley Periodicals LLC.