Correctly assessing the formulatability of lead
compounds and selecting the appropriate formulation
strategy is critical for risk reduction and improved
efficiency. Leveraging approved drug formulation
data, this study designed and developed
the first data-driven and knowledge-guided
artificial intelligence (AI) system for intelligent
formulation strategy decision-making. First, the
formulation data for approved small-molecule drugs
were compiled, covering both oral and injectable drugs.
The formulation strategy decision route was then
established based on insights gleaned from approved
drugs. A binary classification model was developed
for each decision step. Because the marketed drug
data contain no confirmed negative samples, we
improved and validated a positive-unlabeled learning
algorithm to score and label the unlabeled data.
Next, for each of the 12 classification tasks, the
top-performing algorithm was selected from 8 commonly
used supervised learning algorithms; the selected
models achieved average accuracy, recall, precision,
and AUC of 86%, 82%, 86%, and 91%, respectively.
Lastly, the AI
formulation strategy decision platform, FormulationDT,
was constructed by integrating the 12 trained models
with expert knowledge; it can be applied at multiple
drug development stages, from lead screening to
commercial formulation development. The first
data-driven and knowledge-guided AI formulation
strategy decision platform, FormulationDT,
demonstrates the value of partially supervised
learning in pharmaceutical decision-making. It holds
significant potential as an in silico tool for
formulatability assessment and formulation strategy
decision-making, improving efficiency in drug
discovery and development.
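
The positive-unlabeled scoring step described above can be illustrated with a minimal sketch of generic PU bagging. This is not the improved algorithm used in FormulationDT, whose details are not given here. The function names, the nearest-centroid base scorer, and the parameters `n_rounds` and `seed` are all hypothetical choices made for illustration. In each round, a random subset of the unlabeled samples is treated as provisional negatives, and the remaining (out-of-bag) unlabeled samples are scored; the scores are then averaged over rounds.

```python
import random


def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pu_bagging_scores(positives, unlabeled, n_rounds=200, seed=0):
    """Average out-of-bag score per unlabeled sample (higher = more positive-like).

    Assumption: a nearest-centroid scorer stands in for the base classifier;
    any supervised model could be substituted.
    """
    rng = random.Random(seed)
    n = len(unlabeled)
    # Bag size matches the number of positives, leaving at least one sample out.
    k = max(1, min(len(positives), n - 1))
    pos_c = centroid(positives)
    totals = [0.0] * n
    counts = [0] * n
    for _ in range(n_rounds):
        # Treat a random bag of unlabeled samples as provisional negatives.
        bag = set(rng.sample(range(n), k))
        neg_c = centroid([unlabeled[i] for i in bag])
        for i in range(n):
            if i in bag:
                continue  # score only out-of-bag samples
            dp = sq_dist(unlabeled[i], pos_c)
            dn = sq_dist(unlabeled[i], neg_c)
            score = 0.5 if dp + dn == 0 else dn / (dp + dn)
            totals[i] += score
            counts[i] += 1
    # Samples never left out of a bag receive no score.
    return [t / c if c else None for t, c in zip(totals, counts)]
```

Unlabeled samples with low averaged scores could then be labeled as likely negatives for the downstream binary classifiers; the threshold for doing so is a further design choice that this sketch does not fix.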
Figure 1: a. Data flow; b. Formulation strategy distribution of marketed small molecule drugs

Table 1: Machine learning task definition and description.

Table 2: The dataset and outcome of positive-unlabeled bagging for 12 tasks

Table 3: The performance of the optimal models for 12 tasks (mean ± standard deviation, 5 repeats)

We tested FormulationAI on the following systems and browsers:
OS | Chrome | Firefox | Microsoft Edge | Safari |
---|---|---|---|---|
Linux Ubuntu 20.04 LTS | not tested | 80.01 (64 bit) | n/a | n/a |
Windows 10 | 106.0.5249.119 (64 bit) | 107.0.1 (64 bit) | 105.13.1343.50 (64 bit) | not tested |
Mac OSX | 107.0.5304.121 | not tested | not tested | 16.1 |
Android 11 | not tested | 107.2.0 | 107.0.1418.62 | n/a |