Aligning biological sequences by exploiting residue conservation and coevolution.


Journal

Physical review. E
ISSN: 2470-0053
Abbreviated title: Phys Rev E
Country: United States
NLM ID: 101676019

Publication information

Publication date:
Dec 2020
History:
received: 02 06 2020
accepted: 12 11 2020
entrez: 20 1 2021
pubmed: 21 1 2021
medline: 25 9 2021
Status: ppublish

Abstract

Sequences of nucleotides (for DNA and RNA) or amino acids (for proteins) are central objects in biology. Among the most important computational problems is that of sequence alignment, i.e., arranging sequences from different organisms in such a way as to identify similar regions, to detect evolutionary relationships between sequences, and to predict biomolecular structure and function. This is typically addressed through profile models, which capture position specificities like conservation in sequences but assume an independent evolution of different positions. Over recent years, it has been well established that coevolution of different amino-acid positions is essential for maintaining three-dimensional structure and function. Modeling approaches based on inverse statistical physics can capture the coevolution signal in sequence ensembles, and they are now widely used in predicting protein structure, protein-protein interactions, and mutational landscapes. Here, we present DCAlign, an efficient alignment algorithm based on an approximate message-passing strategy, which overcomes the limitations of profile models, includes coevolution among positions in a general way, and is therefore universally applicable to protein- and RNA-sequence alignment without the need for complementary structural information. The potential of DCAlign is carefully explored using well-controlled simulated data, as well as real protein and RNA sequences.
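
The contrast the abstract draws between profile models and coevolution-aware models can be illustrated with a toy sketch. This is not DCAlign and is not taken from the paper: the function names, the alphabet, the "higher score is better" convention, and all parameter values are assumptions made up for illustration. A profile score sums independent per-position terms, while a pairwise (Potts-like) score adds coupling terms between positions, so it can reward a compensating pair of residues that a profile cannot distinguish.

```python
from itertools import combinations

ALPHABET = "-ACGU"  # gap plus RNA nucleotides (illustrative choice)


def profile_score(seq, fields):
    """Site-independent score: sum of per-position terms h_i(a_i)."""
    return sum(fields[i][a] for i, a in enumerate(seq))


def potts_score(seq, fields, couplings):
    """Pairwise score: profile terms plus couplings J_ij(a_i, a_j)."""
    score = profile_score(seq, fields)
    for i, j in combinations(range(len(seq)), 2):
        # Pairs or symbol combinations without a coupling contribute zero.
        score += couplings.get((i, j), {}).get((seq[i], seq[j]), 0.0)
    return score


# Toy parameters (invented): flat fields, and positions 0 and 2 coupled
# so that complementary bases are rewarded, as in an RNA base pair.
fields = [{a: 0.0 for a in ALPHABET} for _ in range(3)]
couplings = {
    (0, 2): {("G", "C"): 1.0, ("C", "G"): 1.0, ("A", "U"): 1.0, ("U", "A"): 1.0}
}

paired = "GAC"    # positions 0 and 2 are complementary
unpaired = "GAG"  # positions 0 and 2 are not complementary

print(profile_score(paired, fields), profile_score(unpaired, fields))
print(potts_score(paired, fields, couplings), potts_score(unpaired, fields, couplings))
```

With flat fields, the profile score cannot separate the two sequences, whereas the pairwise score favors the one whose coupled positions are complementary.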

Identifiers

pubmed: 33465950
doi: 10.1103/PhysRevE.102.062409

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

062409

Authors

Anna Paola Muntoni (AP)

Department of Applied Science and Technology (DISAT), Politecnico di Torino, Corso Duca degli Abruzzi 24, I-10129 Torino, Italy.
Laboratoire de Physique de l'Ecole Normale Supérieure, ENS, Université PSL, CNRS, Sorbonne Université, Université de Paris, F-75005 Paris, France.
Sorbonne Université, CNRS, Institut de Biologie Paris Seine, Biologie Computationnelle et Quantitative LCQB, F-75005 Paris, France.

Andrea Pagnani (A)

Department of Applied Science and Technology (DISAT), Politecnico di Torino, Corso Duca degli Abruzzi 24, I-10129 Torino, Italy.
Italian Institute for Genomic Medicine, IRCCS Candiolo, SP-142, I-10060 Candiolo (TO), Italy.
INFN, Sezione di Torino, Via Giuria 1, I-10125 Torino, Italy.

Martin Weigt (M)

Sorbonne Université, CNRS, Institut de Biologie Paris Seine, Biologie Computationnelle et Quantitative LCQB, F-75005 Paris, France.

Francesco Zamponi (F)

Laboratoire de Physique de l'Ecole Normale Supérieure, ENS, Université PSL, CNRS, Sorbonne Université, Université de Paris, F-75005 Paris, France.
