A community-powered search of machine learning strategy space to find NMR property prediction models.
Journal
PloS one
ISSN: 1932-6203
Abbreviated title: PLoS One
Country: United States
NLM ID: 101285081
Publication information
Publication date: 2021
History:
received: 2021-01-27
accepted: 2021-06-08
entrez: 2021-07-20
pubmed: 2021-07-21
medline: 2021-11-09
Status:
epublish
Abstract
The rise of machine learning (ML) has created an explosion in the potential strategies for using data to make scientific predictions. For physical scientists wishing to apply ML strategies to a particular domain, it can be difficult to assess in advance what strategy to adopt within a vast space of possibilities. Here we outline the results of an online community-powered effort to swarm search the space of ML strategies and develop algorithms for predicting atomic-pairwise nuclear magnetic resonance (NMR) properties in molecules. Using an open-source dataset, we worked with Kaggle to design and host a 3-month competition which received 47,800 ML model predictions from 2,700 teams in 84 countries. Within 3 weeks, the Kaggle community produced models with comparable accuracy to our best previously published 'in-house' efforts. A meta-ensemble model constructed as a linear combination of the top predictions has a prediction accuracy which exceeds that of any individual model, 7-19x better than our previous state-of-the-art. The results highlight the potential of transformer architectures for predicting quantum mechanical (QM) molecular properties.
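The abstract describes the meta-ensemble only as a linear combination of the top predictions; it does not say how the combination weights were chosen. As a minimal sketch, assuming the weights are fitted by ordinary least squares (an assumption, not the authors' stated method), the idea can be illustrated on synthetic data. The function names `fit_ensemble_weights` and `ensemble_predict`, the noise levels, and the data itself are all hypothetical.

```python
import numpy as np


def fit_ensemble_weights(predictions, targets):
    """Least-squares weights for a linear combination of model predictions.

    predictions: array of shape (n_samples, n_models), one column per model.
    targets: array of shape (n_samples,), the reference values.
    Assumption: ordinary least squares; the record does not state the
    fitting procedure actually used.
    """
    weights, *_ = np.linalg.lstsq(predictions, targets, rcond=None)
    return weights


def ensemble_predict(predictions, weights):
    """Combine per-model predictions into a single ensemble prediction."""
    return predictions @ weights


def mean_squared_error(pred, target):
    return float(np.mean((pred - target) ** 2))


# Synthetic stand-in data: three hypothetical models, each equal to the
# target plus independent Gaussian noise of a different size.
rng = np.random.default_rng(0)
targets = rng.normal(size=500)
noise_scales = [0.3, 0.4, 0.5]
predictions = np.column_stack(
    [targets + rng.normal(scale=s, size=targets.size) for s in noise_scales]
)

weights = fit_ensemble_weights(predictions, targets)
ensemble = ensemble_predict(predictions, weights)

ensemble_mse = mean_squared_error(ensemble, targets)
individual_mses = [
    mean_squared_error(predictions[:, j], targets)
    for j in range(predictions.shape[1])
]
```

On the data used for fitting, the least-squares combination can never have a higher mean squared error than any single model, because each single model corresponds to one particular choice of weights. In practice the weights would be fitted on held-out predictions so that the ensemble score is not optimistically biased.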
Identifiers
pubmed: 34283864
doi: 10.1371/journal.pone.0253612
pii: PONE-D-21-02243
pmc: PMC8291653
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
e0253612
Conflict of interest statement
Authors SB, LD, PH, AH, SK, ZK, MK, YL, JPM, TTN, MP, GR, WR, LS, NT, and DW are affiliated with commercial companies. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. This does not alter our adherence to PLOS ONE policies on sharing data and materials. There are no patents, products in development or marketed products associated with this research to declare.