FLIGHTED: Inferring Fitness Landscapes from Noisy High-Throughput Experimental Data.


Journal

bioRxiv : the preprint server for biology
Abbreviated title: bioRxiv
Country: United States
NLM ID: 101680187

Publication information

Publication date:
27 Mar 2024
History:
medline: 8 4 2024
pubmed: 8 4 2024
entrez: 8 4 2024
Status: epublish

Abstract

Machine learning (ML) for protein design requires large protein fitness datasets generated by high-throughput experiments for training, fine-tuning, and benchmarking models. However, most models do not account for the experimental noise inherent in these datasets, which harms model performance and changes model rankings in benchmarking studies. Here, we develop FLIGHTED, a Bayesian method for generating fitness landscapes with calibrated errors from noisy high-throughput experimental data. We apply FLIGHTED to single-step selection assays such as phage display and to DHARMA, a novel high-throughput assay that ties fitness to base editing activity. Our results show that FLIGHTED robustly generates fitness landscapes with accurate errors. We demonstrate that FLIGHTED improves model performance and enables the generation of protein fitness datasets of up to 10^6 variants with DHARMA. FLIGHTED can be used on any high-throughput assay and makes it easy for ML scientists to account for experimental noise when modeling protein fitness.

Identifiers

pubmed: 38586054
doi: 10.1101/2024.03.26.586797
pmc: PMC10996587

Publication types

Preprint

Languages

eng

Authors

MeSH classifications