Query matters : how selection strategies influence active learning in drug discovery
Williams, Huw J. and Pickett, Stephen D. and Baxter, Andrew and Palmer, David S. (2026) Query matters : how selection strategies influence active learning in drug discovery. Journal of Chemical Information and Modeling. ISSN 1549-9596 (https://doi.org/10.1021/acs.jcim.5c02504)
Full text: Final Published Version (PDF, 10MB), Williams-etal-JCIM-2026-how-selection-strategies-influence-active-learning-in-drug-discovery.pdf
Abstract
We present SimDMTA, an in silico framework designed to simulate the Design–Make–Test–Analyze (DMTA) cycle used in preclinical drug discovery. Using docking scores as a proxy for biological assays, the simulations allow the factors that control the efficiency of the DMTA cycle to be explored in a way that would be infeasible with traditional experiments because of time and cost constraints. In each iteration of this workflow, a machine learning model predicts docking scores, compounds are selected using one of several query strategies, the selected molecules are docked, and the model is retrained on the results. Starting from a broad chemical space, the model actively samples molecules derived from the 3,5-dimethyl-4-phenylisoxazole scaffold, an active warhead for the BD1 binding site of bromodomain-containing protein 4 (BRD4), to refine its predictions. Our results show that uncertainty-based sampling significantly outperforms greedy and hybrid approaches, both in hit discovery and in the ability of the docking-score model to generalize beyond its training set. Notably, by the final iteration, 37 of the 50 top-ranked compounds were within the top 1% of all evaluated compounds. Strategies that include some random selection correct systematic biases more rapidly but are less effective at predicting top-performing molecules. These findings underscore the value of incorporating molecular diversity and uncertainty into design strategies. Although such strategies may deprioritize the molecules with the most favorable predicted scores in early rounds, they markedly accelerate model refinement, ultimately leading to more effective hit identification in active-learning-driven discovery.
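The predict, select, dock and retrain loop described in the abstract can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the SimDMTA code: `dock` is a hypothetical cheap stand-in for a docking run (lower is better, as with typical docking scores), compounds are 2-D feature tuples rather than molecules, and the surrogate is a bootstrap ensemble of k-nearest-neighbour regressors whose spread serves as the uncertainty estimate. The `hybrid` strategy (half greedy picks, half most-uncertain picks) is an assumed definition that may differ from the paper's. All names (`dock`, `BootstrapKNN`, `select`, `run_dmta`, `top_hits`) are hypothetical.

```python
import random
import statistics


def dock(x):
    # Hypothetical docking stand-in: lower (more negative) is better; optimum near (0.7, 0.3).
    return (x[0] - 0.7) ** 2 + (x[1] - 0.3) ** 2 - 1.0


def knn_predict(train, x, k=3):
    # Mean score of the k labelled compounds closest to x (squared Euclidean distance).
    nearest = sorted(train, key=lambda t: sum((a - b) ** 2 for a, b in zip(t[0], x)))[:k]
    return statistics.fmean(s for _, s in nearest)


class BootstrapKNN:
    """Toy surrogate: ensemble spread is used as the uncertainty estimate (an assumption)."""

    def __init__(self, n_members=5, k=3, seed=0):
        self.n_members, self.k, self.rng = n_members, k, random.Random(seed)
        self.members = []

    def fit(self, data):
        # Each member is trained on a bootstrap resample of the labelled (compound, score) pairs.
        self.members = [[self.rng.choice(data) for _ in data] for _ in range(self.n_members)]
        return self

    def predict(self, x):
        preds = [knn_predict(m, x, self.k) for m in self.members]
        return statistics.fmean(preds), statistics.pstdev(preds)  # (mean, uncertainty)


def select(model, pool, n, strategy, rng):
    preds = {x: model.predict(x) for x in pool}
    greedy = sorted(pool, key=lambda x: preds[x][0])      # best predicted score first
    uncertain = sorted(pool, key=lambda x: -preds[x][1])  # largest ensemble spread first
    if strategy == "greedy":
        return greedy[:n]
    if strategy == "uncertainty":
        return uncertain[:n]
    if strategy == "random":
        return rng.sample(pool, n)
    if strategy == "hybrid":  # assumed definition: half greedy, rest filled by uncertainty
        picks = greedy[: n // 2]
        for x in uncertain:
            if len(picks) == n:
                break
            if x not in picks:
                picks.append(x)
        return picks
    raise ValueError(f"unknown strategy: {strategy}")


def run_dmta(pool, strategy, n_init=10, batch=10, rounds=5, seed=0):
    rng = random.Random(seed)
    pool = list(pool)
    rng.shuffle(pool)
    labelled = [(x, dock(x)) for x in pool[:n_init]]  # random initial "tested" set
    unlabelled = pool[n_init:]
    for _ in range(rounds):
        model = BootstrapKNN(seed=rng.randrange(10**6)).fit(labelled)  # Analyze: retrain
        picks = select(model, unlabelled, batch, strategy, rng)        # Design: query strategy
        labelled += [(x, dock(x)) for x in picks]                      # Make/Test: dock picks
        picked = set(picks)
        unlabelled = [x for x in unlabelled if x not in picked]
    return labelled, BootstrapKNN(seed=seed).fit(labelled)


def top_hits(model, pool, n_top=50, frac=0.01):
    # How many of the model's n_top ranked compounds fall in the true top `frac` of the pool.
    true = sorted(dock(x) for x in pool)
    cutoff = true[max(1, int(len(true) * frac)) - 1]
    ranked = sorted(pool, key=lambda x: model.predict(x)[0])[:n_top]
    return sum(dock(x) <= cutoff for x in ranked)


pool = [(i / 19, j / 19) for i in range(20) for j in range(20)]  # 400 toy "compounds"

if __name__ == "__main__":
    for strategy in ("greedy", "uncertainty", "hybrid", "random"):
        _, model = run_dmta(pool, strategy)
        print(strategy, top_hits(model, pool, n_top=10, frac=0.05))
```

`top_hits` mirrors the kind of measure quoted in the abstract (how many of the top-ranked compounds lie in the top percentile of the evaluated set); the toy pool is small, so the sketch uses 10 compounds and the top 5% instead of 50 and 1%. Its numbers say nothing about the paper's results.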
ORCID iDs
Williams, Huw J. (ORCID: https://orcid.org/0009-0007-6196-3627)
Pickett, Stephen D.
Baxter, Andrew
Palmer, David S. (ORCID: https://orcid.org/0000-0003-4356-9144)
Item type: Article
ID code: 95343
Dates: Published 26 February 2026; Published Online 26 February 2026; Accepted 15 January 2026
Subjects: Education > Theory and practice of education; Science > Chemistry
Department: Faculty of Science > Pure and Applied Chemistry
Depositing user: Pure Administrator
Date deposited: 19 Jan 2026 12:30
Last modified: 08 Mar 2026 01:57
URI: https://strathprints.strath.ac.uk/id/eprint/95343