Molecular Representations in Machine-Learning-Based Prediction of PK Parameters for Insulin Analogs.
Journal
ACS omega
ISSN: 2470-1343
Abbreviated title: ACS Omega
Country: United States
NLM ID: 101691658
Publication information
Publication date:
04 Jul 2023
History:
received: 23 Feb 2023
accepted: 06 Jun 2023
medline: 10 Jul 2023
pubmed: 10 Jul 2023
entrez: 10 Jul 2023
Status:
epublish
Abstract
Therapeutic peptides and proteins derived from either endogenous hormones, such as insulin, or de novo design via display technologies occupy a distinct pharmaceutical space between small molecules and large proteins such as antibodies. Optimizing the pharmacokinetic (PK) profile of drug candidates is of high importance when it comes to prioritizing lead candidates, and machine-learning models can provide a relevant tool to accelerate the drug design process. Predicting PK parameters of proteins remains difficult due to the complex factors that influence PK properties; furthermore, the data sets are small compared to the variety of compounds in the protein space. This study describes a novel combination of molecular descriptors for proteins such as insulin analogs, many of which contain chemical modifications, e.g., attached small molecules for protraction of the half-life. The underlying data set consisted of 640 structurally diverse insulin analogs, of which around half had attached small molecules. Other analogs were conjugated to peptides, amino acid extensions, or fragment crystallizable regions. The PK parameters clearance (CL), half-life (T1/2), and mean residence time (MRT) could be predicted by using classical machine-learning models such as Random Forest (RF) and Artificial Neural Networks (ANN), with root-mean-square errors for CL of 0.60 and 0.68 (log units) and average fold errors of 2.5 and 2.9 for RF and ANN, respectively. Both random and temporal data splits were employed to evaluate ideal and prospective model performance; the best models, regardless of the split, placed at least 70% of predictions within a twofold error.
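The three error measures quoted above can be illustrated with a minimal Python sketch. It assumes that observed and predicted values are already log10-transformed and that the average fold error is defined as 10 raised to the mean absolute log error; the abstract does not state the exact formula used in the study, so this definition, the function names, and the example numbers are illustrative assumptions only.

```python
import math

def rmse_log(obs_log, pred_log):
    """Root-mean-square error between log10 observed and log10 predicted values."""
    squared = [(p - o) ** 2 for o, p in zip(obs_log, pred_log)]
    return math.sqrt(sum(squared) / len(squared))

def average_fold_error(obs_log, pred_log):
    """Assumed definition: 10 ** mean(|log10(pred) - log10(obs)|)."""
    abs_err = [abs(p - o) for o, p in zip(obs_log, pred_log)]
    return 10 ** (sum(abs_err) / len(abs_err))

def fraction_within_fold(obs_log, pred_log, fold=2.0):
    """Share of predictions whose fold error does not exceed `fold`."""
    limit = math.log10(fold)
    hits = sum(abs(p - o) <= limit for o, p in zip(obs_log, pred_log))
    return hits / len(obs_log)

# Made-up log10 clearance values, for illustration only.
observed = [0.0, 1.0, 2.0, -1.0]
predicted = [0.1, 0.8, 2.5, -1.0]

print(round(rmse_log(observed, predicted), 3))
print(round(average_fold_error(observed, predicted), 3))
print(fraction_within_fold(observed, predicted))
```

Under this definition, an average fold error of 2.5 would correspond to a mean absolute error of about 0.40 log units.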
The tested molecular representations include (1) global physicochemical descriptors combined with descriptors encoding the amino acid composition of the insulin analogs, (2) physicochemical descriptors of the attached small molecule, (3) protein language model (evolutionary scale modeling) embedding of the amino acid sequence of the molecules, and (4) a natural language processing inspired embedding (mol2vec) of the attached small molecule. Encoding the attached small molecule via (2) or (4) significantly improved the predictions, while the benefit of using the protein language model-based encoding (3) depended on the machine-learning model used. Using Shapley additive explanations (SHAP) values, the most important molecular descriptors were identified as those related to the molecular size of both the protein and the protraction part. Overall, the results show that combining representations of proteins and small molecules was key for PK predictions of insulin analogs.
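As a rough illustration of how a protein representation and a small-molecule representation might be combined, and of how a temporal split differs from a random one, the following Python sketch concatenates two feature blocks and splits records by date. The abstract gives no implementation details, so the zero-filled block for analogs without an attached small molecule, the `date` field, the cut-off rule, and all names and values are hypothetical.

```python
from datetime import date

def combine_blocks(protein_desc, small_mol_desc, small_mol_dim):
    """Concatenate protein and small-molecule feature vectors.

    Assumption: analogs without an attached small molecule (None)
    receive a block of zeros so that every vector has the same length.
    """
    if small_mol_desc is None:
        small_mol_desc = [0.0] * small_mol_dim
    if len(small_mol_desc) != small_mol_dim:
        raise ValueError("small-molecule block has unexpected length")
    return list(protein_desc) + list(small_mol_desc)

def temporal_split(records, train_fraction=0.75):
    """Train on the earliest records and test on the most recent ones."""
    ordered = sorted(records, key=lambda r: r["date"])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]

# Hypothetical records: two protein descriptors, up to three small-molecule descriptors.
records = [
    {"id": "A", "date": date(2015, 3, 1), "protein": [5.8, 51.0], "small_mol": [0.3, 1.2, 4.0]},
    {"id": "B", "date": date(2019, 6, 1), "protein": [5.9, 53.0], "small_mol": None},
    {"id": "C", "date": date(2012, 1, 1), "protein": [5.7, 51.0], "small_mol": [0.5, 0.9, 3.1]},
    {"id": "D", "date": date(2017, 9, 1), "protein": [6.1, 52.0], "small_mol": None},
]

for r in records:
    r["features"] = combine_blocks(r["protein"], r["small_mol"], small_mol_dim=3)

train, test = temporal_split(records)
print([r["id"] for r in train], [r["id"] for r in test])
```

A temporal split of this kind mimics prospective use, since the model is evaluated only on compounds recorded after those it was trained on.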
Identifiers
pubmed: 37426277
doi: 10.1021/acsomega.3c01218
pmc: PMC10324072
Publication types
Journal Article
Languages
eng
Pagination
23566-23578
Copyright information
© 2023 The Authors. Published by American Chemical Society.
Conflict of interest statement
The authors declare the following competing financial interest(s): KAE, KMB, KI, MT, NRK, SF & HHFR are all employees and minor stockholders at Novo Nordisk A/S. Only a very small number of therapeutic proteins with small-molecule attachments has publicly available in vivo PK data. The manuscript is therefore based on Novo Nordisk A/S proprietary data.