Characterizing Uncertainty in Machine Learning for Chemistry.
Journal
Journal of chemical information and modeling
ISSN: 1549-960X
Abbreviated title: J Chem Inf Model
Country: United States
NLM ID: 101230060
Publication information
Publication date:
10 07 2023
History:
medline: 11 7 2023
pubmed: 20 6 2023
entrez: 20 6 2023
Status:
ppublish
Abstract
Characterizing uncertainty in machine learning models has recently gained interest in the context of machine learning reliability, robustness, safety, and active learning. Here, we separate the total uncertainty into contributions from noise in the data (aleatoric) and shortcomings of the model (epistemic), further dividing epistemic uncertainty into model bias and variance contributions. We systematically address the influence of noise, model bias, and model variance in the context of chemical property predictions, where the diverse nature of target properties and the vast chemical space give rise to many distinct sources of prediction error. We demonstrate that different sources of error can each be significant in different contexts and must be individually addressed during model development. Through controlled experiments on data sets of molecular properties, we show important trends in model performance associated with the level of noise in the data set, size of the data set, model architecture, molecule representation, ensemble size, and data set splitting. In particular, we show that (1) noise in the test set can limit a model's observed performance even when its actual performance is much better, (2) using size-extensive model aggregation structures is crucial for extensive property prediction, and (3) ensembling is a reliable tool for uncertainty quantification and improvement, specifically for the contribution of model variance. We develop general guidelines on how to improve an underperforming model when falling into different uncertainty contexts.
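Two of the abstract's claims can be illustrated with a toy example: noise in the test set inflates the observed error, and the spread across an ensemble gives an estimate of the model-variance contribution. The sketch below is not the authors' method or data. It assumes a hypothetical one-dimensional target (`true_property`), an assumed noise level (`noise_sd`), and bootstrap-resampled polynomial fits standing in for the models in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_property(x):
    # Hypothetical noise-free target, standing in for a molecular property.
    return np.sin(3.0 * x)

noise_sd = 0.3  # assumed aleatoric noise level in the measurements

# Noisy training data, plus a test set in both clean and noisy form.
x_train = rng.uniform(-1.0, 1.0, 40)
y_train = true_property(x_train) + rng.normal(0.0, noise_sd, x_train.size)
x_test = rng.uniform(-1.0, 1.0, 2000)
y_test_clean = true_property(x_test)
y_test_noisy = y_test_clean + rng.normal(0.0, noise_sd, x_test.size)

def fit_ensemble(x, y, n_models, degree, rng):
    """Fit one polynomial per bootstrap resample of the training set."""
    models = []
    for _ in range(n_models):
        idx = rng.integers(0, x.size, x.size)  # sample with replacement
        models.append(np.polyfit(x[idx], y[idx], degree))
    return models

def rmse(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))

models = fit_ensemble(x_train, y_train, n_models=20, degree=5, rng=rng)
preds = np.stack([np.polyval(c, x_test) for c in models])  # (n_models, n_test)

mean_pred = preds.mean(axis=0)     # ensemble prediction
variance_est = preds.var(axis=0)   # per-point spread: model-variance estimate

rmse_noisy = rmse(mean_pred, y_test_noisy)   # what a noisy benchmark reports
rmse_clean = rmse(mean_pred, y_test_clean)   # error against the true values
rmse_members = float(np.mean([rmse(p, y_test_clean) for p in preds]))

print(f"observed RMSE on noisy test labels: {rmse_noisy:.3f}")
print(f"RMSE against noise-free labels:     {rmse_clean:.3f}")
print(f"mean RMSE of individual members:    {rmse_members:.3f}")
print(f"mean ensemble variance:             {variance_est.mean():.4f}")
```

In this setup the error measured against noisy labels stays near `noise_sd` however good the fit is, while averaging the ensemble members never does worse than the average member. Note that the ensemble spread does not capture model bias or the noise itself.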
Identifiers
pubmed: 37338239
doi: 10.1021/acs.jcim.3c00373
pmc: PMC10336963
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
4012-4029