AI-accelerated protein-ligand docking for SARS-CoV-2 is 100-fold faster with no significant change in detection.
Journal
Scientific reports
ISSN: 2045-2322
Abbreviated title: Sci Rep
Country: England
NLM ID: 101563288
Publication information
Publication date:
2023-02-06
History:
received: 2022-11-29
accepted: 2023-01-24
entrez: 2023-02-06
pubmed: 2023-02-07
medline: 2023-02-09
Status:
epublish
Abstract
Protein-ligand docking is a computational method for identifying drug leads. Widely used in drug discovery, it can narrow a vast library of compounds down to a tractable size for downstream simulation or experimental testing. While there has been progress in accelerating the scoring of compounds with artificial intelligence, few works have carried these successes back to the virtual screening community in terms of utility and forward-looking development. We demonstrate the power of high-speed ML models by scoring 1 billion molecules in under a day (50 k predictions per GPU second). We showcase a docking workflow that uses surrogate AI-based models as a pre-filter to a standard docking workflow. Our workflow screens a library of compounds ten times faster than the standard technique, with an error rate of less than 0.01% in detecting the underlying best-scoring 0.1% of compounds. Our analysis of the speedup shows that another order of magnitude of speedup must come from model accuracy rather than computing speed. To drive that further order of magnitude of acceleration, we share a benchmark dataset consisting of 200 million 3D complex structures and 2D structure scores across a consistent set of 13 million "in-stock" molecules over 15 receptors, or binding sites, across the SARS-CoV-2 proteome. We believe this is strong evidence that the community should begin focusing on improving the accuracy of surrogate models, so that massive compound libraries can be screened 100× or even 1000× faster than with current techniques while missing fewer top hits. The technique outlined aims to be a fast drop-in replacement for docking when screening billion-scale molecular libraries.
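The pre-filter idea and the speedup argument in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the cost model (every molecule pays the surrogate cost, only the kept fraction pays the docking cost), and the "lower score is better" convention are all assumptions made for illustration. Under that assumed cost model the speedup cannot exceed 1 / keep_fraction, however fast the surrogate is, which is one way to read the claim that further gains must come from model accuracy rather than computing speed.

```python
def gpu_hours(n_molecules, preds_per_gpu_second):
    """GPU-hours needed to score a library at a given surrogate throughput."""
    return n_molecules / preds_per_gpu_second / 3600.0


def workflow_speedup(t_dock, t_surrogate, keep_fraction):
    """Speedup of 'surrogate pre-filter, then dock' over docking everything.

    Assumed per-molecule cost: t_surrogate for every molecule plus t_dock
    for the kept fraction. The result is bounded above by 1 / keep_fraction.
    """
    return t_dock / (t_surrogate + keep_fraction * t_dock)


def prefilter_then_dock(library, surrogate_score, dock_score, keep_fraction):
    """Dock only the molecules the surrogate ranks in its best fraction.

    Lower scores are treated as better (an assumed convention).
    """
    n_keep = max(1, round(len(library) * keep_fraction))
    shortlist = sorted(library, key=surrogate_score)[:n_keep]
    return {mol: dock_score(mol) for mol in shortlist}


def recall_of_top_hits(library, dock_score, docked, top_fraction):
    """Fraction of the true best-scoring molecules that reached docking."""
    n_top = max(1, round(len(library) * top_fraction))
    true_top = set(sorted(library, key=dock_score)[:n_top])
    return len(true_top & set(docked)) / n_top


if __name__ == "__main__":
    # Throughput quoted in the abstract: 1 billion molecules at 50 k/GPU-second.
    print(f"GPU-hours for 1e9 molecules: {gpu_hours(1e9, 50_000):.2f}")
    # Hypothetical timings: docking 1 s per molecule, surrogate 1/50000 s.
    for keep in (0.1, 0.01, 0.001):
        print(keep, round(workflow_speedup(1.0, 1 / 50_000, keep), 1))
```

With these hypothetical timings, shrinking the kept fraction is what raises the speedup, and the kept fraction can only shrink safely if the surrogate still ranks the true top hits inside it; `recall_of_top_hits` measures exactly that.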
Identifiers
pubmed: 36747041
doi: 10.1038/s41598-023-28785-9
pii: 10.1038/s41598-023-28785-9
pmc: PMC9901402
Chemical substances
Ligands: 0
Proteins: 0
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Research Support, U.S. Gov't, Non-P.H.S.
Languages
eng
Citation subsets
IM
Pagination
2105
Copyright information
© 2023. The Author(s).