Statistics to prioritize rare variants in family-based sequencing studies with disease subtypes.

Keywords: affected‐only family design; disease subtypes; family‐based sequencing study; heritability; rare variant prioritization

Journal

Genetic epidemiology
ISSN: 1098-2272
Abbreviated title: Genet Epidemiol
Country: United States
NLM ID: 8411723

Publication information

Publication date:
28 Jun 2024
History:
received: 17 Oct 2023
revised: 26 Mar 2024
accepted: 13 Jun 2024
medline: 28 Jun 2024
pubmed: 28 Jun 2024
entrez: 28 Jun 2024
Status: ahead of print

Abstract

Family-based sequencing studies are increasingly used to find rare genetic variants of high risk for disease traits with familial clustering. In some studies, families with multiple disease subtypes are collected and the exomes of affected relatives are sequenced for shared rare variants (RVs). Since different families can harbor different causal variants and each family harbors many RVs, tests to detect causal variants can have low power in this study design. Our goal is rather to prioritize shared variants for further investigation by, for example, pathway analyses or functional studies. The transmission-disequilibrium test prioritizes variants based on departures from Mendelian transmission in parent-child trios. Extending this idea to families, we propose methods to prioritize RVs shared in affected relatives with two disease subtypes, with one subtype more heritable than the other. Global approaches condition on a variant being observed in the study and assume a known probability of carrying a causal variant. In contrast, local approaches condition on a variant being observed in specific families to eliminate the carrier probability. Our simulation results indicate that global approaches are robust to misspecification of the carrier probability and prioritize more effectively than local approaches even when the carrier probability is misspecified.
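For readers unfamiliar with the transmission-disequilibrium test mentioned in the abstract, the following is a minimal sketch of the classical McNemar-type TDT statistic for parent-child trios, used here only to rank variants. It is not the global or local family-based statistics proposed in the article, whose details are not given in this record. The function names and the input layout (per-variant counts of transmitted and untransmitted alleles from heterozygous parents) are assumptions made for illustration.

```python
from __future__ import annotations

import math


def tdt_statistic(transmitted: int, untransmitted: int) -> float:
    """Classical TDT statistic (b - c)^2 / (b + c).

    b = times heterozygous parents transmitted the variant allele,
    c = times they transmitted the other allele instead.
    Under Mendelian transmission this is approximately chi-square, 1 df.
    """
    total = transmitted + untransmitted
    if total == 0:
        raise ValueError("no informative (heterozygous-parent) transmissions")
    return (transmitted - untransmitted) ** 2 / total


def tdt_p_value(stat: float) -> float:
    """Upper-tail p-value of a chi-square(1) statistic via the normal tail."""
    return math.erfc(math.sqrt(stat / 2.0))


def rank_variants(counts: dict[str, tuple[int, int]]) -> list[tuple[str, float]]:
    """Order variants from largest to smallest departure from Mendelian transmission.

    `counts` maps a hypothetical variant label to (transmitted, untransmitted).
    """
    scored = [(name, tdt_statistic(b, c)) for name, (b, c) in counts.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

In this sketch the statistic is used for prioritization rather than formal testing, in the spirit of the abstract: variants at the top of the ranking would be carried forward to follow-up such as pathway analyses or functional studies.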

Identifiers

pubmed: 38940260
doi: 10.1002/gepi.22579

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Grants

Agency: Natural Sciences and Engineering Research Council of Canada
Agency: CIHR
Country: Canada
Agency: Canadian Statistical Sciences Institute

Copyright information

© 2024 The Author(s). Genetic Epidemiology published by Wiley Periodicals LLC.

Authors

Christina Nieuwoudt (C)

Department of Statistics and Actuarial Science, Simon Fraser University, Burnaby, British Columbia, Canada.

Fabiha Binte Farooq (FB)

Department of Statistics and Actuarial Science, Simon Fraser University, Burnaby, British Columbia, Canada.

Angela Brooks-Wilson (A)

Department of Biomedical Physiology and Kinesiology, Simon Fraser University, Burnaby, British Columbia, Canada.
Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, British Columbia, Canada.

Alexandre Bureau (A)

Département de Médecine Sociale et Préventive, Université Laval, Québec City, Québec, Canada.
Centre de recherche CERVO, Québec City, Québec, Canada.

Jinko Graham (J)

Department of Statistics and Actuarial Science, Simon Fraser University, Burnaby, British Columbia, Canada.

MeSH classifications