A Log-Ratio Biplot Approach for Exploring Genetic Relatedness Based on Identity by State.

allele sharing; composition; identity by descent; identity by state; log-ratio transformation

Journal

Frontiers in genetics
ISSN: 1664-8021
Abbreviated title: Front Genet
Country: Switzerland
NLM ID: 101560621

Publication information

Publication date:
2019
History:
received: 2018-12-05
accepted: 2019-03-29
entrez: 2019-05-10
pubmed: 2019-05-10
medline: 2019-05-10
Status: epublish

Abstract

The detection of cryptic relatedness in large population-based cohorts is of great importance in genome research. The usual approach for detecting closely related individuals is to plot allele sharing statistics, based on identity by state or identity by descent, in a two-dimensional scatterplot. This approach ignores that allele sharing data across individuals are in reality higher-dimensional, and it also disregards the compositional nature of the underlying counts of shared genotypes. In this paper we develop biplot methodology based on log-ratio principal component analysis that overcomes these restrictions. This leads to entirely new graphics that are especially useful for exploring relatedness in genetic databases from homogeneous populations. The proposed method can be applied iteratively, acting as a looking glass for more remote relationships that are harder to classify. Datasets from the 1000 Genomes Project and the Genomes For Life-GCAT Project are used to illustrate the proposed method. The discriminatory power of the log-ratio biplot approach is compared with that of the classical plots in a simulation study. In a non-inbred homogeneous population, the classification rate of the log-ratio principal component approach exceeds that of the classical graphics across the whole allele frequency spectrum, using only identity by state. In these circumstances, simulations show that with 35,000 independent bi-allelic variants, log-ratio principal component analysis, combined with discriminant analysis, can correctly classify relationships up to and including the fourth degree.
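
The general idea of a log-ratio principal component analysis on identity-by-state data can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes genotypes coded as 0/1/2 allele counts, a three-part composition per pair of individuals (the number of variants sharing 0, 1 or 2 alleles identical by state), a centred log-ratio transform, and a pseudocount of 0.5 to avoid taking the log of zero. The function names `ibs_counts`, `clr` and `logratio_pca` and the simulated genotypes are hypothetical.

```python
import numpy as np

def ibs_counts(g1, g2):
    """Count variants at which two individuals share 0, 1 or 2 alleles IBS.

    Genotypes are assumed to be integer arrays coded 0/1/2 (allele counts).
    """
    shared = 2 - np.abs(g1 - g2)           # alleles shared identical by state
    return np.bincount(shared, minlength=3)

def clr(counts, pseudo=0.5):
    """Centred log-ratio transform of each row (one composition per pair)."""
    logx = np.log(counts + pseudo)          # pseudocount is an assumption
    return logx - logx.mean(axis=1, keepdims=True)

def logratio_pca(counts):
    """PCA of clr-transformed compositions via the singular value decomposition."""
    z = clr(np.asarray(counts, dtype=float))
    z = z - z.mean(axis=0)                  # column-centre before the SVD
    u, s, vt = np.linalg.svd(z, full_matrices=False)
    scores = u * s                          # coordinates of the pairs
    loadings = vt.T                         # directions of the three parts
    explained = s**2 / np.sum(s**2)
    return scores, loadings, explained

# Simulated, unrelated individuals (illustrative data only).
rng = np.random.default_rng(0)
genotypes = rng.binomial(2, 0.3, size=(6, 500))
pairs = [(i, j) for i in range(6) for j in range(i + 1, 6)]
counts = np.array([ibs_counts(genotypes[i], genotypes[j]) for i, j in pairs])
scores, loadings, explained = logratio_pca(counts)
```

Plotting the first two columns of `scores` together with the rows of `loadings` as rays gives a biplot of the pairs. Because the centred log-ratio rows sum to zero, a three-part composition has at most two non-trivial principal components.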

Identifiers

pubmed: 31068965
doi: 10.3389/fgene.2019.00341
pmc: PMC6491861

Publication types

Journal Article

Languages

eng

Pagination

341

Grants

Agency: NIGMS NIH HHS
ID: R01 GM075091
Country: United States

Authors

Jan Graffelman (J)

Department of Statistics and Operations Research, Technical University of Catalonia, Barcelona, Spain.
Department of Biostatistics, University of Washington, Seattle, WA, United States.

Iván Galván Femenía (I)

Department of Computer Science, Applied Mathematics and Statistics, University of Girona, Girona, Spain.
Genomes For Life - GCAT Lab, Institute for Health Science Research Germans Trias i Pujol (IGTP), Badalona, Spain.

Rafael de Cid (R)

Genomes For Life - GCAT Lab, Institute for Health Science Research Germans Trias i Pujol (IGTP), Badalona, Spain.

Carles Barceló Vidal (C)

Department of Computer Science, Applied Mathematics and Statistics, University of Girona, Girona, Spain.

MeSH classifications