Representations of lipid nanoparticles using large language models for transfection efficiency prediction.
Journal
Bioinformatics (Oxford, England)
ISSN: 1367-4811
Abbreviated title: Bioinformatics
Country: England
NLM ID: 9808944
Publication information
Publication date:
29 May 2024
History:
received: 23 February 2024
revised: 8 April 2024
accepted: 28 May 2024
medline: 29 May 2024
pubmed: 29 May 2024
entrez: 29 May 2024
Status:
aheadofprint
Abstract
Lipid nanoparticles (LNPs) are the most widely used vehicles for mRNA vaccine delivery. The structure of the lipids composing the LNPs can have a major impact on the effectiveness of the mRNA payload. Several properties should be optimized to improve delivery and expression, including biodegradability, synthetic accessibility, and transfection efficiency (TE). To optimize LNPs, we developed and tested models that enable virtual screening for LNPs with high TE. Our best method uses lipid SMILES strings as inputs to a large language model (LLM). The LLM-generated embeddings are then used by a downstream gradient-boosting classifier. As we show, our method predicts lipid properties more accurately, which could lead to higher efficiency and reduced experimental time and costs. Code and data links are available at: https://github.com/Sanofi-Public/LipoBART.
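The two-stage pipeline described in the abstract (SMILES string to embedding, then embedding to a gradient-boosting classifier) can be illustrated with a minimal sketch. This is not the authors' implementation, which is available at the repository linked above. Everything here is an assumption made for illustration: `embed_smiles` is a hashed character n-gram featurizer standing in for the LLM embedding, the boosting loop is a simplified hand-written version using depth-1 trees rather than a production library, and the SMILES strings and labels are arbitrary placeholders, not measured transfection data.

```python
from __future__ import annotations

import hashlib
import math


def embed_smiles(smiles: str, dim: int = 32, n: int = 3) -> list[float]:
    """Placeholder embedding: hashed character n-gram counts, L2-normalised.
    Assumption: stands in for an LLM embedding; it does not reproduce one."""
    vec = [0.0] * dim
    for i in range(max(1, len(smiles) - n + 1)):
        gram = smiles[i:i + n]
        bucket = int(hashlib.md5(gram.encode()).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, residuals):
    """Least-squares depth-1 regression tree fitted to the residuals."""
    mean = sum(residuals) / len(residuals)
    best_sse = sum((r - mean) ** 2 for r in residuals)
    best = (0, math.inf, mean, mean)  # fallback: no split, constant prediction
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if sse < best_sse:
                best_sse, best = sse, (j, thr, lm, rm)
    return best


def fit_boosting(X, y, n_rounds: int = 20, lr: float = 0.5):
    """Gradient boosting on log-loss: each stump fits the negative gradient."""
    p = min(max(sum(y) / len(y), 1e-6), 1 - 1e-6)
    f0 = math.log(p / (1 - p))  # initial score: log-odds of the positive class
    scores = [f0] * len(X)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        j, thr, lv, rv = fit_stump(X, residuals)
        stumps.append((j, thr, lv, rv))
        scores = [s + lr * (lv if row[j] <= thr else rv) for s, row in zip(scores, X)]
    return f0, lr, stumps


def predict_proba(model, x) -> float:
    f0, lr, stumps = model
    return sigmoid(f0 + lr * sum(lv if x[j] <= thr else rv for j, thr, lv, rv in stumps))


def log_loss(model, X, y) -> float:
    probs = [predict_proba(model, x) for x in X]
    return -sum(yi * math.log(p) + (1 - yi) * math.log(1 - p)
                for yi, p in zip(y, probs)) / len(y)


# Toy inputs: placeholder SMILES strings with ARBITRARY labels (1 = "high TE").
smiles_list = [
    "CCCCCCCCCCCCCCCC(=O)OCCN(C)C",
    "CCCCCCCCC=CCCCCCCCC(=O)OCCN(CC)CC",
    "CCCCCCCCCCCCOC(=O)CCN(C)CCO",
    "CCCCCCCCCCCCCCCCCC(=O)O",
    "CCCCCCCCCCCCO",
    "CCCCCCCCCCCCCCCC",
]
y = [1, 1, 1, 0, 0, 0]

X = [embed_smiles(s) for s in smiles_list]
model = fit_boosting(X, y)
probs = [predict_proba(model, x) for x in X]
base_loss = log_loss((model[0], model[1], []), X, y)  # prior-only baseline
final_loss = log_loss(model, X, y)
print(f"training log-loss: {base_loss:.3f} -> {final_loss:.3f}")
```

In a realistic setting, `embed_smiles` would be replaced by embeddings taken from a pretrained chemical language model, the classifier by an established gradient-boosting library, and performance would be judged on held-out lipids rather than on the training set.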
Identifiers
pubmed: 38810107
pii: 7684951
doi: 10.1093/bioinformatics/btae342
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Copyright information
© The Author(s) 2024. Published by Oxford University Press.