Characterizing Uncertainty in Machine Learning for Chemistry.


Journal

Journal of chemical information and modeling
ISSN: 1549-960X
Abbreviated title: J Chem Inf Model
Country: United States
NLM ID: 101230060

Publication information

Publication date:
10 07 2023
History:
medline: 11 7 2023
pubmed: 20 6 2023
entrez: 20 6 2023
Status: ppublish

Abstract

Characterizing uncertainty in machine learning models has recently gained interest in the context of machine learning reliability, robustness, safety, and active learning. Here, we separate the total uncertainty into contributions from noise in the data (aleatoric) and shortcomings of the model (epistemic), further dividing epistemic uncertainty into model bias and variance contributions. We systematically address the influence of noise, model bias, and model variance in the context of chemical property predictions, where the diverse nature of target properties and the vast chemical space give rise to many distinct sources of prediction error. We demonstrate that different sources of error can each be significant in different contexts and must be individually addressed during model development. Through controlled experiments on data sets of molecular properties, we show important trends in model performance associated with the level of noise in the data set, size of the data set, model architecture, molecule representation, ensemble size, and data set splitting. In particular, we show that 1) noise in the test set can limit a model's observed performance when the actual performance is much better, 2) using size-extensive model aggregation structures is crucial for extensive property prediction, and 3) ensembling is a reliable tool for uncertainty quantification and improvement specifically for the contribution of model variance. We develop general guidelines on how to improve an underperforming model when falling into different uncertainty contexts.
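Findings 1) and 3) can be illustrated with a small toy experiment. The sketch below is not the authors' code or data: the target function, the noise level `noise_sd`, the member spread `member_sd`, and the helper names are all assumptions made for illustration, and the ensemble members are simulated as "truth plus independent error" rather than trained models. No model bias is simulated, so only the noise and variance contributions appear.

```python
import math
import random
import statistics


def rmse(pred, target):
    """Root-mean-square error between two equally long sequences."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred))


def ensemble_mean_and_variance(member_preds):
    """member_preds[m][i] is the prediction of member m for sample i.

    Returns the per-sample ensemble mean and the per-sample variance across
    members, the latter serving as an estimate of the model-variance part
    of the epistemic uncertainty.
    """
    per_sample = list(zip(*member_preds))
    means = [statistics.fmean(s) for s in per_sample]
    variances = [statistics.pvariance(s) for s in per_sample]
    return means, variances


rng = random.Random(0)
n_samples, n_members = 2000, 10
noise_sd = 0.5   # assumed aleatoric noise in the measured labels
member_sd = 0.2  # assumed spread between ensemble members

# Hypothetical noise-free property values and their noisy measurements.
true_y = [math.sin(0.01 * i) for i in range(n_samples)]
noisy_y = [y + rng.gauss(0.0, noise_sd) for y in true_y]

# Simulated ensemble: each member equals the truth plus its own random error.
member_preds = [
    [y + rng.gauss(0.0, member_sd) for y in true_y] for _ in range(n_members)
]
ens_mean, ens_var = ensemble_mean_and_variance(member_preds)

rmse_vs_noisy = rmse(ens_mean, noisy_y)        # what a noisy test set reports
rmse_vs_true = rmse(ens_mean, true_y)          # actual error of the ensemble
rmse_single_vs_true = rmse(member_preds[0], true_y)  # one member alone
mean_ens_var = statistics.fmean(ens_var)

print(f"ensemble RMSE vs noisy test labels: {rmse_vs_noisy:.3f}")
print(f"ensemble RMSE vs noise-free labels: {rmse_vs_true:.3f}")
print(f"single-member RMSE vs noise-free:   {rmse_single_vs_true:.3f}")
print(f"mean variance across members:       {mean_ens_var:.3f}")
```

In this setup the error measured against noisy test labels stays close to `noise_sd` even though the ensemble is much closer to the noise-free values, and averaging members reduces the variance-driven error relative to a single member. Ensembling here does nothing about label noise or model bias, which matches the abstract's statement that it targets the variance contribution specifically.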

Identifiers

pubmed: 37338239
doi: 10.1021/acs.jcim.3c00373
pmc: PMC10336963

Publication types

Journal Article; Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Pagination

4012-4029

Authors

Esther Heid (E)

Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States.
Institute of Materials Chemistry, TU Wien, 1060 Vienna, Austria.

Charles J McGill (CJ)

Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States.
Department of Chemical and Life Science Engineering, Virginia Commonwealth University, Richmond, Virginia 23284, United States.

Florence H Vermeire (FH)

Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States.
Department of Chemical Engineering, KU Leuven, Celestijnenlaan 200F, B-3001 Leuven, Belgium.

William H Green (WH)

Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States.
