Petabase-Scale Homology Search for Structure Prediction.


Journal

Cold Spring Harbor perspectives in biology
ISSN: 1943-0264
Abbreviated title: Cold Spring Harb Perspect Biol
Country: United States
NLM ID: 101513680

Publication information

Publication date:
05 Feb 2024
History:
medline: 06 Feb 2024
pubmed: 06 Feb 2024
entrez: 05 Feb 2024
Status: aheadofprint

Abstract

The recent CASP15 competition highlighted the critical role of multiple sequence alignments (MSAs) in protein structure prediction, as demonstrated by the success of the top AlphaFold2-based prediction methods. To push the boundaries of MSA utilization, we conducted a petabase-scale search of the Sequence Read Archive (SRA), resulting in gigabytes of aligned homologs for CASP15 targets. These were merged with default MSAs produced by ColabFold-search and provided to ColabFold-predict. By using SRA data, we achieved highly accurate predictions (GDT_TS > 70) for 66% of the non-easy targets, whereas using ColabFold-search default MSAs scored highly in only 52%. Next, we tested the effect of deep homology search and ColabFold's advanced features, such as more recycles, on prediction accuracy. While SRA homologs were most significant for improving ColabFold's CASP15 ranking from 11th to 3rd place, other strategies contributed too. We analyze these in the context of existing strategies to improve prediction.

Identifiers

pubmed: 38316555
pii: cshperspect.a041465
doi: 10.1101/cshperspect.a041465

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Copyright information

Copyright © 2024 Cold Spring Harbor Laboratory Press; all rights reserved.

Authors

Sewon Lee (S)

School of Biological Sciences, Seoul National University, Gwanak-gu, Seoul 08826, South Korea.

Gyuri Kim (G)

School of Biological Sciences, Seoul National University, Gwanak-gu, Seoul 08826, South Korea.

Eli Levy Karin (EL)

ELKMO, Copenhagen 2720, Denmark.

Milot Mirdita (M)

School of Biological Sciences, Seoul National University, Gwanak-gu, Seoul 08826, South Korea.

Sukhwan Park (S)

Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, South Korea.

Rayan Chikhi (R)

Institut Pasteur, Université Paris Cité, G5 Sequence Bioinformatics, 75015 Paris, France.

Artem Babaian (A)

Department of Molecular Genetics, University of Toronto, Toronto, Ontario M5S 1A8, Canada.
Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario M5S 3E1, Canada.

Andriy Kryshtafovych (A)

Genome Center, University of California, Davis, California 95616, USA.

Martin Steinegger (M)

School of Biological Sciences, Seoul National University, Gwanak-gu, Seoul 08826, South Korea. Email: martin.steinegger@snu.ac.kr
Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, South Korea.
Artificial Intelligence Institute, Seoul National University, Seoul 08826, South Korea.
Institute of Molecular Biology and Genetics, Seoul National University, Seoul 08826, South Korea.

MeSH classifications