Inferring ancestry with the hierarchical soft clustering approach tangleGen.


Journal

Genome research
ISSN: 1549-5469
Abbreviated title: Genome Res
Country: United States
NLM ID: 9518021

Publication information

Publication date:
21 Oct 2024
History:
received: 27 03 2024
accepted: 16 10 2024
medline: 22 10 2024
pubmed: 22 10 2024
entrez: 21 10 2024
Status: ahead of print

Abstract

Understanding the genetic ancestry of populations is central to numerous scientific and societal fields. It contributes to a better understanding of human evolutionary history, advances personalized medicine, aids in forensic identification, and allows individuals to connect to their genealogical roots. Existing methods, such as ADMIXTURE, have significantly improved our ability to infer ancestries. However, these methods typically work with a fixed number of independent ancestral populations. As a result, they provide insight into genetic admixture but do not offer a hierarchical interpretation. In particular, intricate ancestral population structures remain difficult to unravel. Alternative methods with a consistent inheritance structure, such as hierarchical clustering, may make the inferred ancestries easier to interpret. Here, we present tangleGen, a soft clustering tool that transfers the hierarchical machine learning framework Tangles, which leverages graph-theoretical concepts, to the field of population genetics. The hierarchical perspective of tangleGen on the composition and structure of populations improves the interpretability of the inferred ancestral relationships. Moreover, tangleGen adds a new layer of explainability, as it makes it possible to identify the SNPs that are responsible for the clustering structure. We demonstrate the capabilities and benefits of tangleGen for the inference of ancestral relationships, using both simulated data and data from the 1000 Genomes Project.

Identifiers

pubmed: 39433440
pii: gr.279399.124
doi: 10.1101/gr.279399.124

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Copyright information

Published by Cold Spring Harbor Laboratory Press.

Authors

Klara Elisabeth Burger (KE)

University of Tübingen.

Solveig Klepper (S)

University of Tübingen, Tübingen AI Center.

Ulrike von Luxburg (U)

University of Tübingen, Tübingen AI Center.

Franz Baumdicker (F)

Institute for Bioinformatics and Medical Informatics, University of Tübingen. franz.baumdicker@uni-tuebingen.de

MeSH classifications