Grouping of genomic markers in populations with family structure.

Keywords: Clustering; Covariance matrix; Group lasso; SNP-BLUP; Single nucleotide polymorphism; TagSNP

Journal

BMC bioinformatics
ISSN: 1471-2105
Abbreviated title: BMC Bioinformatics
Country: England
NLM ID: 100965194

Publication information

Publication date:
19 Feb 2021
History:
received: 05 Aug 2020
accepted: 08 Feb 2021
entrez: 20 Feb 2021
pubmed: 21 Feb 2021
medline: 10 Mar 2021
Status: epublish

Abstract

BACKGROUND
Linkage and linkage disequilibrium (LD) between genome regions cause dependencies among genomic markers. Due to family stratification in populations with non-random mating in livestock or crops, standard measures of population LD such as r² may be biased. Grouping markers according to their interdependence therefore needs to account for the actual population structure in order to allow proper inference in genome-based evaluations.
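How linkage within a family creates marker dependence can be made concrete with a simplified worked example. This sketch is not the hscovar covariance (which also handles full-sib families and contributions from the dams); it only assumes a sire heterozygous at two loci, no interference, and Haldane's map function to convert map distance into a recombination rate θ. If X_j and X_k count the copies of allele 1 that an offspring inherits from the sire, then P(X_j = 1) = 1/2 and both alleles come from the same sire haplotype with probability 1 − θ, so Cov(X_j, X_k) = (1 − 2θ)/4 in coupling phase and −(1 − 2θ)/4 in repulsion phase. The function names are hypothetical.

```python
import math
import random


def haldane_theta(distance_morgan: float) -> float:
    """Recombination rate between two loci from map distance (Morgan), Haldane's map function."""
    return 0.5 * (1.0 - math.exp(-2.0 * distance_morgan))


def paternal_cov(theta: float, coupling: bool = True) -> float:
    """Analytical covariance of paternally inherited allele counts at two loci.

    Assumes a sire heterozygous at both loci. In coupling phase one sire haplotype
    carries allele 1 at both loci; in repulsion phase the sign of the covariance flips.
    """
    sign = 1.0 if coupling else -1.0
    return sign * (1.0 - 2.0 * theta) / 4.0


def simulate_cov(theta: float, coupling: bool = True, n: int = 200_000, seed: int = 1) -> float:
    """Monte Carlo check of paternal_cov by simulating gametes of one sire."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        first_hap_j = rng.random() < 0.5          # sire's first haplotype transmitted at locus j?
        recombined = rng.random() < theta         # recombination (odd number of crossovers) between loci
        first_hap_k = first_hap_j != recombined   # haplotype origin switches on recombination
        x_j = int(first_hap_j)                    # first haplotype carries allele 1 at locus j
        # coupling: first haplotype also carries allele 1 at locus k; repulsion: allele 0
        x_k = int(first_hap_k) if coupling else 1 - int(first_hap_k)
        xs.append(x_j)
        ys.append(x_k)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys)) / n


if __name__ == "__main__":
    for d in (0.01, 0.1, 0.5):
        theta = haldane_theta(d)
        print(f"d={d} M  theta={theta:.4f}  "
              f"analytical={paternal_cov(theta):.4f}  simulated={simulate_cov(theta):.4f}")
```

Within one family the covariance depends only on θ and the sire's phase, so it can differ markedly from a population-level r² computed after pooling families whose sires carry different phases.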
RESULTS
Given a matrix reflecting the strength of association between markers, groups are built successively using a greedy algorithm, with the largest groups built first. Optionally, a representative marker is selected for each group. We provide an implementation of the grouping approach as a new function in the R package hscovar. This package enables the calculation of the theoretical covariance between biallelic markers in half- or full-sib families and the derivation of representative markers. In case studies, we showed that, compared with a population-LD approach, respecting the family stratification yielded fewer groups comprising dependent markers and representative SNPs that were spread more uniformly over the investigated chromosome region. In a simulation study, the sensitivity and specificity of a genome-based association study improved when the selection of representative markers took family structure into account.
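The greedy grouping step can be sketched as follows. This is a minimal illustration, not the hscovar implementation. It assumes that two markers count as dependent when the absolute entry of the association matrix reaches a user-chosen threshold (the hypothetical parameter `threshold`), that each group consists of the not-yet-grouped marker with the most dependent partners together with those partners, and that the representative is the group member with the largest summed association within its group. Because neighborhoods can only shrink once markers are removed, group sizes come out in non-increasing order, matching "largest groups first".

```python
import numpy as np


def greedy_groups(assoc, threshold):
    """Greedy grouping of markers from a symmetric association matrix.

    assoc     : (p, p) array, e.g. absolute correlations between markers
    threshold : markers i, k are treated as dependent if |assoc[i, k]| >= threshold
    Returns a list of (member_indices, representative_index), largest groups first.
    """
    a = np.abs(np.asarray(assoc, dtype=float))
    remaining = np.ones(a.shape[0], dtype=bool)
    groups = []
    while remaining.any():
        idx = np.flatnonzero(remaining)            # markers not yet assigned
        adj = a[np.ix_(idx, idx)] >= threshold     # dependence among remaining markers
        np.fill_diagonal(adj, True)                # each marker belongs to its own neighborhood
        seed = int(np.argmax(adj.sum(axis=1)))     # marker with the most dependent partners
        members = idx[adj[seed]]                   # seed plus its dependent partners
        block = a[np.ix_(members, members)]
        rep = int(members[np.argmax(block.sum(axis=1))])  # most central member
        groups.append((members.tolist(), rep))
        remaining[members] = False                 # remove the group and repeat
    return groups


if __name__ == "__main__":
    assoc = np.array([
        [1.00, 0.90, 0.80, 0.00, 0.00],
        [0.90, 1.00, 0.85, 0.00, 0.00],
        [0.80, 0.85, 1.00, 0.10, 0.00],
        [0.00, 0.00, 0.10, 1.00, 0.70],
        [0.00, 0.00, 0.00, 0.70, 1.00],
    ])
    print(greedy_groups(assoc, threshold=0.5))
```

For the toy matrix this yields two groups, {0, 1, 2} with representative 1 and {3, 4} with representative 3. In the setting described above, the input would be a dependence matrix that respects family structure rather than population LD.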
CONCLUSIONS
Chromosome segments that frequently recombine in the underlying population can be identified from the matrix of pairwise dependence between markers. Representative markers can be exploited, for instance, for dimension reduction prior to a genome-based association study, or the grouping structure itself can be employed in a grouped penalization approach.
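For the grouped penalization option, a generic group lasso is one possibility. The sketch below fits least squares with the penalty λ Σ_g √|g| ‖β_g‖₂ by proximal gradient descent (ISTA), whose proximal step is block soft-thresholding: each group's coefficient vector is shrunk towards zero as a whole and set exactly to zero when its norm falls below the threshold. The √|g| weights, the fixed step size, the iteration count and the function names are assumptions for illustration, and this is not the authors' implementation; the groups would come from a grouping step such as the one described in the results. The simpler alternative, dimension reduction, amounts to keeping only the representative columns of the genotype matrix before the association analysis.

```python
import numpy as np


def group_prox(beta, groups, step, lam):
    """Block soft-thresholding: proximal operator of step * lam * sum_g sqrt(|g|) * ||beta_g||_2."""
    out = np.array(beta, dtype=float)          # copy so the input is not modified
    for g in groups:
        g = np.asarray(g)
        norm = np.linalg.norm(out[g])
        thresh = step * lam * np.sqrt(len(g))
        # shrink the whole group towards zero; drop it entirely if its norm is small
        out[g] = 0.0 if norm <= thresh else (1.0 - thresh / norm) * out[g]
    return out


def fit_group_lasso(X, y, groups, lam, n_iter=500):
    """Minimize (1/(2n)) * ||y - X beta||^2 + lam * sum_g sqrt(|g|) * ||beta_g||_2 by ISTA."""
    n, p = X.shape
    step = n / np.linalg.norm(X, 2) ** 2       # 1 / Lipschitz constant of the gradient
    beta = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y) / n        # gradient of the least-squares term
        beta = group_prox(beta - step * grad, groups, step, lam)
    return beta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 4))              # toy data: 4 markers on a standardized scale
    y = X[:, :2] @ np.array([2.0, -1.0]) + rng.normal(scale=0.1, size=200)
    groups = [[0, 1], [2, 3]]                  # e.g. groups returned by a grouping step
    print(fit_group_lasso(X, y, groups, lam=0.1))
```

With these toy data only the first group carries signal, and the penalty typically sets the coefficients of the second group exactly to zero, so whole groups of dependent markers enter or leave the model together.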

Identifiers

pubmed: 33607943
doi: 10.1186/s12859-021-04010-0
pii: 10.1186/s12859-021-04010-0
pmc: PMC7893918

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

79

Grants

Agency: Deutsche Forschungsgemeinschaft
ID: WI 4450/2-1

Authors

Dörte Wittenburg

Institute of Genetics and Biometry, Leibniz Institute for Farm Animal Biology, 18196, Dummerstorf, Germany. wittenburg@fbn-dummerstorf.de.

Michael Doschoris

Institute of Genetics and Biometry, Leibniz Institute for Farm Animal Biology, 18196, Dummerstorf, Germany.

Jan Klosa

Institute of Genetics and Biometry, Leibniz Institute for Farm Animal Biology, 18196, Dummerstorf, Germany.
