IPCAPS: an R package for iterative pruning to capture population structure.

Keywords: Fine-scale structure; Iterative pruning; Outlier detection; Population clustering; Population genetics

Journal

Source code for biology and medicine
ISSN: 1751-0473
Abbreviated title: Source Code Biol Med
Country: England
NLM ID: 101276533

Publication information

Publication date:
2019
History:
received: 2017-07-25
accepted: 2019-02-21
entrez: 2019-04-03
pubmed: 2019-04-03
medline: 2019-04-03
Status: epublish

Abstract

Resolving population genetic structure is challenging, especially when dealing with closely related or geographically confined populations. Although Principal Component Analysis (PCA)-based methods and genomic variation with single nucleotide polymorphisms (SNPs) are widely used to describe shared genetic ancestry, improvements can be made, especially when fine-scale population structure is the target. This work presents an R package called IPCAPS, which uses SNP information to resolve possibly fine-scale population structure. The IPCAPS routines are built on the iterative pruning Principal Component Analysis (ipPCA) framework, which systematically assigns individuals to genetically similar subgroups. In each iteration, the tool is able to detect and eliminate outliers, thereby avoiding severe misclassification errors. IPCAPS supports different measurement scales for the variables used to identify substructure. Hence, panels of gene expression and methylation data can be accommodated as well. The tool can also be applied in patient sub-phenotyping contexts. IPCAPS is developed in R and is freely available from http://bio3.giga.ulg.ac.be/ipcaps.
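
To make the idea of iterative pruning concrete, the following Python sketch recursively splits individuals along their first principal component until a group looks homogeneous. It is an illustration under stated assumptions, not the IPCAPS implementation or its API: the function names (`first_pc_scores`, `two_means_1d`, `iterative_pruning`), the stopping rule based on the fraction of variance explained (`min_explained`), and the minimum group size (`min_size`) are all hypothetical stand-ins, and the outlier detection step mentioned in the abstract is not included.

```python
import random


def first_pc_scores(rows, iters=100):
    """Project rows onto their first principal component (power iteration).

    Returns (scores, fraction of total variance explained by that component).
    """
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    centered = [[r[j] - means[j] for j in range(m)] for r in rows]
    total = sum(x * x for r in centered for x in r)
    if total == 0:  # all individuals identical: nothing to split
        return [0.0] * n, 0.0
    rng = random.Random(0)  # fixed seed keeps the sketch reproducible
    v = [rng.uniform(-1.0, 1.0) for _ in range(m)]
    for _ in range(iters):
        scores = [sum(a * b for a, b in zip(r, v)) for r in centered]
        w = [sum(scores[i] * centered[i][j] for i in range(n)) for j in range(m)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0:
            break
        v = [x / norm for x in w]
    scores = [sum(a * b for a, b in zip(r, v)) for r in centered]
    return scores, sum(s * s for s in scores) / total


def two_means_1d(scores, iters=50):
    """Split 1-D scores into two groups with 2-means started at min and max."""
    lo, hi = min(scores), max(scores)
    left, right = list(range(len(scores))), []
    for _ in range(iters):
        left = [i for i, s in enumerate(scores) if abs(s - lo) <= abs(s - hi)]
        right = [i for i, s in enumerate(scores) if abs(s - lo) > abs(s - hi)]
        if not right:
            break
        new_lo = sum(scores[i] for i in left) / len(left)
        new_hi = sum(scores[i] for i in right) / len(right)
        if (new_lo, new_hi) == (lo, hi):
            break
        lo, hi = new_lo, new_hi
    return left, right


def iterative_pruning(rows, ids=None, min_size=5, min_explained=0.5):
    """Recursively split individuals; stop when a group looks homogeneous.

    The stopping rule (min_explained, min_size) is a hypothetical stand-in,
    not the criterion used by ipPCA or IPCAPS.
    """
    if ids is None:
        ids = list(range(len(rows)))
    if len(rows) < 2 * min_size:
        return [ids]
    scores, explained = first_pc_scores(rows)
    if explained < min_explained:
        return [ids]
    left, right = two_means_1d(scores)
    if len(left) < min_size or len(right) < min_size:
        return [ids]
    groups = []
    for part in (left, right):
        groups += iterative_pruning([rows[i] for i in part],
                                    [ids[i] for i in part],
                                    min_size, min_explained)
    return groups


# Toy genotype matrix (0/1/2 allele counts): three groups of four individuals.
toy = [[0, 0, 0, 0]] * 4 + [[1, 1, 0, 0]] * 4 + [[2, 2, 2, 2]] * 4
print(sorted(sorted(g) for g in iterative_pruning(toy, min_size=2)))
```

On this toy matrix the first split separates the most distinct group, and the second pass separates the two closer groups, which is the behaviour that makes iterative approaches attractive for fine-scale structure. Real SNP data would need a more careful stopping criterion and the per-iteration outlier handling that the abstract describes.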

Identifiers

pubmed: 30936940
doi: 10.1186/s13029-019-0072-6
pii: 72
pmc: PMC6427891

Publication types

Journal Article

Languages

eng

Pagination

2

Conflict of interest statement

The authors declare that they have no competing interests.

Authors

Kridsadakorn Chaichoompu (K)

GIGA-R Medical Genomics - BIO3, University of Liege, Avenue de l'Hôpital 11, 4000 Liege, Belgium.

Fentaw Abegaz (F)

GIGA-R Medical Genomics - BIO3, University of Liege, Avenue de l'Hôpital 11, 4000 Liege, Belgium.

Sissades Tongsima (S)

Genome Technology Research Unit, National Center for Genetic Engineering and Biotechnology, 113 Thailand Science Park, Phahonyothin Road, Khlong Neung, Khlong Luang, Pathum Thani 12120, Thailand.

Philip James Shaw (PJ)

Medical Molecular Biology Research Unit, National Center for Genetic Engineering and Biotechnology, 113 Thailand Science Park, Phahonyothin Road, Khlong Neung, Khlong Luang, Pathum Thani 12120, Thailand.

Anavaj Sakuntabhai (A)

Functional Genetics of Infectious Diseases Unit, Institut Pasteur, 25-28, rue du Docteur Roux, 75015 Paris, France.
Centre National de la Recherche Scientifique, URA3012, Paris, France.

Luísa Pereira (L)

Instituto de Investigação e Inovação em Saúde, Universidade do Porto, Rua Alfredo Allen, 208, 4200-135 Porto, Portugal.
Instituto de Patologia e Imunologia Molecular da Universidade do Porto, Rua Júlio Amaral de Carvalho, 45, 4200-135 Porto, Portugal.

Kristel Van Steen (K)

GIGA-R Medical Genomics - BIO3, University of Liege, Avenue de l'Hôpital 11, 4000 Liege, Belgium.
WELBIO (Walloon Excellence in Lifesciences and Biotechnology), Avenue Pasteur 6, 1300 Wavre, Belgium.

MeSH classifications