Recent advances in the self-referencing embedded strings (SELFIES) library.


Journal

Digital discovery
ISSN: 2635-098X
Abbreviated title: Digit Discov
Country: England
NLM ID: 9918351186906676

Publication information

Publication date:
08 Aug 2023
History:
received: 17 Mar 2023
accepted: 23 Jun 2023
medline: 28 Nov 2023
pubmed: 28 Nov 2023
entrez: 28 Nov 2023
Status: epublish

Abstract

String-based molecular representations play a crucial role in cheminformatics applications and, with the growing success of deep learning in chemistry, have been readily adopted into machine learning pipelines. However, traditional string-based representations such as SMILES are often prone to syntactic and semantic errors when produced by generative models. To address these problems, a novel representation called SELF-referencing embedded strings (SELFIES) was proposed. SELFIES is inherently 100% robust, in that every SELFIES string corresponds to a valid molecule, and it was introduced alongside an open-source implementation called selfies. Since then, we have generalized SELFIES to support a wider range of molecules and semantic constraints, and streamlined its underlying grammar. We have implemented this updated representation in subsequent versions of selfies, which also bring major advances in design, efficiency, and supported features. In this manuscript, we present the current status of selfies (version 2.1.1). Our library, selfies, is available on GitHub (https://github.com/aspuru-guzik-group/selfies).
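The central claim above is that SELFIES is 100% robust: any sequence of symbols decodes to a molecule that respects valence rules. The following is a minimal, hypothetical sketch of that idea, assuming a toy alphabet restricted to linear chains of C, N, O and F with standard maximum valences, and no branches, rings, charges or explicit hydrogens. The names `toy_decode`, `parse_symbol`, `is_valence_valid` and `MAX_VALENCE` are illustrative inventions, not part of the selfies API (the real library exposes functions such as `selfies.encoder` and `selfies.decoder`), and the bond-capping rule shown here is a simplification of the actual SELFIES grammar.

```python
import random

# Toy alphabet (assumption): element -> maximum number of bonds it may form.
MAX_VALENCE = {"C": 4, "N": 3, "O": 2, "F": 1}
BOND_PREFIX = {"=": 2, "#": 3}          # symbol prefix -> requested bond order
BOND_CHAR = {1: "", 2: "=", 3: "#"}     # bond order -> SMILES bond character


def parse_symbol(symbol: str) -> tuple[int, str]:
    """Split a toy symbol such as '[=O]' into (requested bond order, element)."""
    body = symbol.strip("[]")
    if body[0] in BOND_PREFIX:
        return BOND_PREFIX[body[0]], body[1:]
    return 1, body


def toy_decode(symbols: list[str]) -> str:
    """Decode toy symbols into a linear SMILES-like chain that never exceeds valences."""
    out: list[str] = []
    remaining = 0  # bonds still available on the previously placed atom
    for i, symbol in enumerate(symbols):
        order, element = parse_symbol(symbol)
        valence = MAX_VALENCE[element]
        if i == 0:
            out.append(element)  # the first atom's bond prefix is ignored
            remaining = valence
            continue
        if remaining == 0:
            break  # previous atom is saturated, so the derivation stops here
        bond = min(order, remaining, valence)  # cap the requested bond order
        out.append(BOND_CHAR[bond] + element)
        remaining = valence - bond
    return "".join(out)


def is_valence_valid(chain: str) -> bool:
    """Check that no atom in a decoded toy chain exceeds its maximum valence."""
    atoms: list[str] = []
    bonds: list[int] = []  # bonds[k] joins atoms[k] and atoms[k + 1]
    order = 1
    for ch in chain:
        if ch in BOND_PREFIX:
            order = BOND_PREFIX[ch]
            continue
        if atoms:
            bonds.append(order)
        atoms.append(ch)
        order = 1
    for k, element in enumerate(atoms):
        used = (bonds[k - 1] if k > 0 else 0) + (bonds[k] if k < len(bonds) else 0)
        if used > MAX_VALENCE[element]:
            return False
    return True


if __name__ == "__main__":
    # Random symbol sequences always decode to valence-valid chains in this toy model.
    alphabet = ["[C]", "[=C]", "[#C]", "[N]", "[=N]", "[#N]", "[O]", "[=O]", "[F]"]
    rng = random.Random(0)
    for _ in range(1000):
        seq = [rng.choice(alphabet) for _ in range(rng.randint(1, 12))]
        assert is_valence_valid(toy_decode(seq))
    print(toy_decode(["[C]", "[=O]"]))        # C=O
    print(toy_decode(["[C]", "[F]", "[C]"]))  # CF
```

In this sketch, capping each requested bond order by the bonds still available on the previous atom is what makes every sequence decodable: a saturated atom simply ends the chain instead of producing an invalid structure.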

Identifiers

pubmed: 38013816
doi: 10.1039/d3dd00044c
pii: d3dd00044c
pmc: PMC10408573

Publication types

Journal Article; Review

Languages

English

Pagination

897-908

Copyright information

This journal is © The Royal Society of Chemistry.

Conflict of interest statement

There are no conflicts to declare.

Authors

Alston Lo (A)

Department of Computer Science, University of Toronto, Canada. Email addresses: alston.lo@mail.utoronto.ca; r.pollice@rug.nl; alan@aspuru.com.

Robert Pollice (R)

Department of Computer Science, University of Toronto, Canada.
Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Canada.
Stratingh Institute for Chemistry, University of Groningen, The Netherlands.

AkshatKumar Nigam (A)

Department of Computer Science, Stanford University, California, USA.

Andrew D White (AD)

Department of Chemical Engineering, University of Rochester, USA.

Mario Krenn (M)

Max Planck Institute for the Science of Light (MPL), Erlangen, Germany.

Alán Aspuru-Guzik (A)

Department of Computer Science, University of Toronto, Canada.
Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Canada.
Vector Institute for Artificial Intelligence, Toronto, Canada.
Canadian Institute for Advanced Research (CIFAR) Lebovic Fellow, Toronto, Canada.

MeSH classifications