Grouping of genomic markers in populations with family structure.

MeSH terms: Genetic Linkage; Genome; Genomics; Humans; Linkage Disequilibrium; Polymorphism, Single Nucleotide

Keywords: Clustering; Covariance matrix; Group lasso; SNP-BLUP; Single nucleotide polymorphism; TagSNP

Journal

BMC bioinformatics

ISSN: 1471-2105

Abbreviated title: BMC Bioinformatics

Country: England

NLM ID: 100965194

Publication information

Publication date:
19 Feb 2021

History:

received: 05 Aug 2020

accepted: 08 Feb 2021

entrez: 20 Feb 2021

pubmed: 21 Feb 2021

medline: 10 Mar 2021

Status: epublish

Abstract

BACKGROUND

Linkage and linkage disequilibrium (LD) between genome regions cause dependencies among genomic markers. Due to family stratification in populations with non-random mating in livestock or crops, standard measures of population LD such as r² may be biased. Grouping markers according to their interdependence must therefore account for the actual population structure to allow proper inference in genome-based evaluations.

RESULTS

Given a matrix reflecting the strength of association between markers, groups are built successively using a greedy algorithm, with the largest groups built first. Optionally, a representative marker is selected for each group. We provide an implementation of the grouping approach as a new function in the R package hscovar. This package enables the calculation of the theoretical covariance between biallelic markers for half- or full-sib families and the derivation of representative markers. In case studies, we showed that when family stratification was respected, the number of groups comprising dependent markers was smaller and representative SNPs were spread more uniformly over the investigated chromosome region than with a population-LD approach. In a simulation study, the sensitivity and specificity of a genome-based association study improved when the selection of representative markers took family structure into account.
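The grouping step can be illustrated with a minimal Python sketch. It is not the hscovar implementation (hscovar is an R package, and its exact selection criterion is not given here). The sketch assumes that the dependence matrix is the squared correlation r² derived from a marker covariance matrix, that two markers count as dependent when their r² reaches a user-chosen threshold, and that the marker seeding each group serves as its representative. The function names `dependence_from_covariance` and `greedy_grouping` and the toy covariance values are hypothetical.

```python
import numpy as np


def dependence_from_covariance(C):
    """Return the squared correlation r^2 between markers from a covariance matrix C.

    Assumption: C is a valid marker covariance matrix (e.g. a family-based
    one); markers with zero variance get zero dependence instead of NaN.
    """
    C = np.asarray(C, dtype=float)
    sd = np.sqrt(np.clip(np.diag(C), 0.0, None))
    denom = np.outer(sd, sd)
    R = np.divide(C, denom, out=np.zeros_like(C), where=denom > 0)
    return R ** 2


def greedy_grouping(S, threshold):
    """Group markers greedily, largest group first (hypothetical sketch).

    Markers i and j count as dependent when S[i, j] >= threshold. In each
    round, the unassigned marker with the most dependent unassigned markers
    seeds a new group containing itself and those markers; the seed is
    returned as the group's representative.
    """
    S = np.asarray(S, dtype=float)
    unassigned = np.ones(S.shape[0], dtype=bool)
    groups, representatives = [], []
    while unassigned.any():
        # Dependence restricted to pairs of still-unassigned markers
        adj = (S >= threshold) & unassigned[:, None] & unassigned[None, :]
        np.fill_diagonal(adj, unassigned)  # every free marker covers itself
        seed = int(np.argmax(adj.sum(axis=1)))  # ties go to the lowest index
        members = np.flatnonzero(adj[seed])
        groups.append(members.tolist())
        representatives.append(seed)
        unassigned[members] = False
    return groups, representatives


# Toy covariance matrix for three markers (made-up numbers, not from the study)
C = np.array([[1.0, 0.9, 0.1],
              [0.9, 1.0, 0.2],
              [0.1, 0.2, 1.0]])
groups, reps = greedy_grouping(dependence_from_covariance(C), threshold=0.5)
print(groups, reps)  # [[0, 1], [2]] [0, 2]
```

In this variant each member only has to reach the threshold with the group's seed, so two non-seed members of the same group may be less strongly associated with each other; a stricter rule would require the threshold for every pair within a group.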

CONCLUSIONS

Chromosome segments that frequently recombine in the underlying population can be identified from the matrix of pairwise dependence between markers. Representative markers can be exploited, for instance, for dimension reduction prior to a genome-based association study, or the grouping structure itself can be employed in a grouped penalization approach.
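Both uses of the dependence structure can be sketched in Python under explicit assumptions. The hypothetical `segment_boundaries` flags positions where the dependence between neighboring markers (assumed to be ordered by map position) drops below a threshold, a deliberately crude proxy for frequently recombining segments. The hypothetical `group_lasso` is a textbook proximal-gradient (ISTA) solver for the group lasso with the common penalty weights sqrt(|g|), taking a marker grouping as its `groups` argument. It illustrates grouped penalization in general and is not the authors' estimator.

```python
import numpy as np


def segment_boundaries(S, threshold):
    """Indices i where dependence between adjacent markers i and i+1 falls
    below `threshold` (hypothetical proxy for frequent recombination)."""
    S = np.asarray(S, dtype=float)
    adjacent = np.diag(S, k=1)  # S[i, i+1] for consecutive markers
    return np.flatnonzero(adjacent < threshold).tolist()


def block_soft_threshold(v, t):
    """Proximal operator of t * ||v||_2: shrink a whole block towards zero."""
    norm = np.linalg.norm(v)
    if norm <= t:
        return np.zeros_like(v)
    return (1.0 - t / norm) * v


def group_lasso(X, y, groups, lam, n_iter=500):
    """Minimize 0.5 * ||y - X b||^2 + lam * sum_g sqrt(|g|) * ||b_g||_2 by ISTA.

    `groups` is a list of column-index lists; columns not listed in any
    group are left unpenalized.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    b = np.zeros(X.shape[1])
    step = 1.0 / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        z = b - step * (X.T @ (X @ b - y))  # gradient step on the squared loss
        b = z.copy()
        for g in groups:
            idx = np.asarray(g, dtype=int)
            b[idx] = block_soft_threshold(z[idx], step * lam * np.sqrt(idx.size))
    return b


S = np.array([[1.0, 0.9, 0.1],
              [0.9, 1.0, 0.2],
              [0.1, 0.2, 1.0]])
print(segment_boundaries(S, 0.5))  # [1]

X = np.eye(4)
y = np.array([1.0, 2.0, 0.1, 0.1])
b = group_lasso(X, y, groups=[[0, 1], [2, 3]], lam=1.0)
print(np.round(b, 3))  # the weak second group [2, 3] is set exactly to zero
```

Penalizing the Euclidean norm of each block, rather than each coefficient separately, makes a whole group of dependent markers enter or leave the model together.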

Identifiers

DOI: 10.1186/s12859-021-04010-0 PMID: 33607943 PMC: PMC7893918

Publication types

Journal Article

Languages

eng

Grants

Agency: Deutsche Forschungsgemeinschaft

ID: WI 4450/2-1

Authors

Dörte Wittenburg

Michael Doschoris

Jan Klosa

Similar articles

[Redispensing of expensive oral anticancer medicines: a practical application].

Smoking Cessation and Incident Cardiovascular Disease.

Evaluation of Low-Value Services Across Major Medicare Advantage Insurers and Traditional Medicare.

Effectiveness of Virtual Yoga for Chronic Low Back Pain: A Randomized Clinical Trial.

MeSH classifications