Persistent homology reveals strong phylogenetic signal in 3D protein structures.

Keywords: persistent homology; phylogenetics; protein 3D structure; topological data analysis

Journal

PNAS Nexus
ISSN: 2752-6542
Abbreviated title: PNAS Nexus
Country: England
NLM ID: 9918367777906676

Publication information

Publication date:
Apr 2024
History:
received: 02 11 2023
accepted: 01 04 2024
medline: 1 5 2024
pubmed: 1 5 2024
entrez: 1 5 2024
Status: epublish

Abstract

Changes that occur in proteins over time provide a phylogenetic signal that can be used to decipher their evolutionary history and the relationships between organisms. Sequence comparison is the most common way to access this phylogenetic signal, while approaches based on 3D structure comparison are still in their infancy. In this study, we propose an effective approach based on Persistent Homology Theory (PH) to extract the phylogenetic information contained in protein structures. PH provides efficient and robust algorithms for extracting and comparing geometric features from noisy datasets at different spatial resolutions. PH has a growing number of applications in the life sciences, including the study of proteins (e.g. classification, folding). However, it has never been used to study the phylogenetic signal that protein structures may contain. Here, using 518 protein families, representing 22,940 protein sequences and structures, from 10 major taxonomic groups, we show that distances calculated with PH from protein structures correlate strongly with phylogenetic distances calculated from protein sequences, at both small and large evolutionary scales. We test several methods for calculating PH distances and propose some refinements to improve their relevance for addressing evolutionary questions. This work opens up new perspectives in evolutionary biology by proposing an efficient way to access the phylogenetic signal contained in protein structures, and points to future developments of topological analysis in the life sciences.
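
As a rough illustration of the kind of comparison the abstract describes, the sketch below computes a 0-dimensional persistence barcode for a small 3D point cloud, compares barcodes with a toy distance, and correlates the resulting structural distances with a second distance matrix. It is not the authors' pipeline: the functions `h0_barcode`, `barcode_distance`, and `pearson`, the coordinates, and the sequence distances are all hypothetical, and the barcode comparison is a simplification chosen for brevity (established choices such as bottleneck or Wasserstein distances are not implemented here).

```python
import math
from itertools import combinations


def h0_barcode(points):
    """Death scales of connected components in a Vietoris-Rips filtration.

    Every point is born at scale 0. A component dies at the length of the
    edge that merges it into another one, so the finite bars are exactly
    the minimum-spanning-tree edge lengths (Kruskal with union-find).
    The single infinite bar is omitted.
    """
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    deaths = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            deaths.append(length)
    return deaths


def barcode_distance(a, b):
    """Toy distance (an assumption of this sketch): Euclidean distance between
    death-scale vectors sorted in decreasing order, zero-padded to equal length."""
    a, b = sorted(a, reverse=True), sorted(b, reverse=True)
    size = max(len(a), len(b))
    a += [0.0] * (size - len(a))
    b += [0.0] * (size - len(b))
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sx = math.sqrt(sum((u - mx) ** 2 for u in x))
    sy = math.sqrt(sum((v - my) ** 2 for v in y))
    return cov / (sx * sy)


# Hypothetical toy "structures" (stand-ins for atom coordinates).
structures = {
    "A": [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)],
    "B": [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3.5, 0, 0)],
    "C": [(0, 0, 0), (2, 0, 0), (4, 0, 0), (7, 0, 0)],
}
# Hypothetical sequence-based distances for the same three pairs.
sequence_distance = {("A", "B"): 0.1, ("A", "C"): 0.9, ("B", "C"): 0.8}

barcodes = {name: h0_barcode(pts) for name, pts in structures.items()}
pairs = sorted(sequence_distance)
structural = [barcode_distance(barcodes[p], barcodes[q]) for p, q in pairs]
sequence = [sequence_distance[pair] for pair in pairs]
correlation = pearson(structural, sequence)
```

The sketch only tracks connected components; loops and cavities (higher-dimensional features) would need a dedicated persistent homology library, and a real analysis would also need a significance test for the correlation between the two distance matrices.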

Identifiers

pubmed: 38689707
doi: 10.1093/pnasnexus/pgae158
pii: pgae158
pmc: PMC11058471

Publication types

Journal Article

Languages

eng

Pagination

pgae158

Copyright information

© The Author(s) 2024. Published by Oxford University Press on behalf of National Academy of Sciences.

Authors

Léa Bou Dagher (L)

Université Claude Bernard Lyon 1, CNRS, VetAgro Sup, Laboratoire de Biométrie et Biologie Évolutive, UMR5558, F-69622 Villeurbanne, France.
Université Claude Bernard Lyon 1, CNRS, Institut Camille Jordan, UMR5208, F-69622 Villeurbanne, France.
Université Libanaise, Laboratoire de Mathématiques, École Doctorale en Science et Technologie, PO BOX 5 Hadath, Liban.

Dominique Madern (D)

University Grenoble Alpes, CEA, CNRS, IBS, 38000 Grenoble, France.

Philippe Malbos (P)

Université Claude Bernard Lyon 1, CNRS, Institut Camille Jordan, UMR5208, F-69622 Villeurbanne, France.

Céline Brochier-Armanet (C)

Université Claude Bernard Lyon 1, CNRS, VetAgro Sup, Laboratoire de Biométrie et Biologie Évolutive, UMR5558, F-69622 Villeurbanne, France.

MeSH classifications