A hybrid modeling approach for assessing mechanistic models of small molecule partitioning in vivo using a machine learning integrated modeling platform
Prediction of first-in-human dosing regimens is a critical step in drug development and requires accurate quantitation of drug distribution. Traditional in vivo studies used to characterize a clinical candidate’s volume of distribution are error-prone, time- and cost-intensive, and lack reproducibility in clinical settings. The paper demonstrates how a computational platform integrating machine learning optimization with mechanistic modeling can be used to simulate compound plasma concentration profiles and predict tissue-plasma partition coefficients with high accuracy by varying the lipophilicity descriptor logP. The approach, applied to chemically diverse small molecules, resulted in comparable geometric mean fold-errors of 1.50 and 1.63 in pharmacokinetic outputs for direct tissue:plasma partition and hybrid logP optimization, with the latter enabling prediction of tissue permeation that can be used to guide toxicity and efficacy dosing in human subjects. The optimization simulations required to achieve these results were parallelized on the AWS cloud and generated outputs in under 5 h. The accuracy, speed, and scalability of the framework indicate that it can be used to assess the relevance of other mechanistic relationships implicated in pharmacokinetic-pharmacodynamic phenomena with a lower risk of overfitting datasets, and to generate a large database of physiologically relevant drug disposition for further integration with machine learning models.
Pharmacokinetic (PK) predictions of compound disposition are critical for safety and efficacy assessments both before and after drugs enter clinical trials. These predictions are predicated on a thorough understanding of compound absorption, distribution, metabolism, and excretion (ADME). Compounds must be characterized thoroughly for ADME properties prior to regulatory approval, and during preclinical drug discovery/development these properties act as key attributes that can determine the compound’s prioritization for further testing1. The standard in vivo assay for ADME characterization is the measurement of plasma concentrations over time after an administered dose of a molecule in subjects2. This gives insight into critical parameters driving compound pharmacokinetics; specifically, the standard metrics for PK outputs are area under the curve (AUC), maximum concentration (Cmax), time of maximum concentration (tmax), area under the moment curve (AUMC), steady-state volume of distribution (Vdss), mean residence time (MRT), and half-life (t1/2). These output metrics are quantified using non-compartmental modeling and are considered comprehensive for capturing compound PK behavior in vivo3.
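As an illustration of how the non-compartmental metrics listed above can be derived from a sampled plasma concentration-time profile, the following is a minimal Python sketch. It assumes the linear trapezoidal rule, truncates AUC, AUMC and MRT at the last sample (no extrapolation to infinity), and estimates the terminal slope from only the last two samples; the function name `nca_metrics` is hypothetical and this is not the authors' implementation.

```python
import math


def nca_metrics(times, concs):
    """Non-compartmental metrics from a sampled plasma profile.

    Assumptions (illustrative only): linear trapezoidal rule, metrics
    truncated at the last sample, terminal slope from the last two points.
    """
    times, concs = list(times), list(concs)
    if len(times) != len(concs) or len(times) < 3:
        raise ValueError("need at least three matched time/concentration points")

    auc = 0.0   # area under the concentration-time curve, 0 to t_last
    aumc = 0.0  # area under the first-moment curve, 0 to t_last
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        auc += 0.5 * dt * (concs[i] + concs[i - 1])
        aumc += 0.5 * dt * (times[i] * concs[i] + times[i - 1] * concs[i - 1])

    cmax = max(concs)
    tmax = times[concs.index(cmax)]

    # Terminal elimination rate constant from a log-linear two-point slope.
    lambda_z = math.log(concs[-2] / concs[-1]) / (times[-1] - times[-2])

    return {
        "AUC_0_t": auc,
        "AUMC_0_t": aumc,
        "Cmax": cmax,
        "tmax": tmax,
        "MRT_0_t": aumc / auc,
        "t_half": math.log(2) / lambda_z,
    }


# Hypothetical three-point profile (time in h, concentration in arbitrary units).
profile = nca_metrics([0.0, 1.0, 2.0], [0.0, 10.0, 5.0])
```

In practice the terminal slope is usually fitted by regression over several terminal points, and AUC and AUMC are extrapolated to infinity before MRT and Vdss are computed.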
Current paradigm in drug development testing. Existing methodologies for PK characterization typically rely on in vivo studies in mice, rats and dogs4. Animal models are the current gold standard for conducting these PK predictions; however, they suffer from an overall low relevance to in vivo clinical studies, as evidenced by the current 92 percent failure rate of compounds that enter the clinic, with approximately 16 percent of those failures attributed to ADME-related issues in clinical trials5. These studies are expensive and time-consuming and are known to suffer from translatability issues6. Even so, the standard paradigm for preclinical ADME data translation includes: (1) testing a compound in preclinical subjects and collecting outputs, (2) building a 1- or 2-compartment model of compound PK, or using physiologically-based pharmacokinetic (PBPK) models, (3) fitting model parameters to the compound PK data, (4) scaling optimized parameters to human-relevant values using established allometry functions or according to surface area/body weight, and (5) simulating outcomes from an established dosing regimen7.
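Steps (2), (4) and (5) of this paradigm can be sketched in a few lines of Python. The sketch assumes a one-compartment model with first-order absorption and simple body-weight allometry with the conventional exponents of 0.75 for clearance and 1.0 for volume; the function names, the animal parameter values and the dose are hypothetical and serve only as an illustration.

```python
import math


def scale_allometric(value_animal, bw_animal_kg, bw_human_kg, exponent):
    """Scale a PK parameter by body weight: value * (BW_human / BW_animal) ** exponent."""
    return value_animal * (bw_human_kg / bw_animal_kg) ** exponent


def one_compartment_oral(t, dose, f_abs, ka, cl, v):
    """Plasma concentration at time t for a one-compartment model with
    first-order absorption (rate ka) and first-order elimination (ke = cl / v)."""
    ke = cl / v
    if math.isclose(ka, ke):
        raise ValueError("ka and ke must differ for this closed-form solution")
    return f_abs * dose * ka / (v * (ka - ke)) * (math.exp(-ke * t) - math.exp(-ka * t))


# Hypothetical rat parameters (0.25 kg animal): clearance in L/h, volume in L.
cl_rat, v_rat = 1.0, 0.5

# Step (4): scale to a 70 kg human with assumed exponents 0.75 (CL) and 1.0 (V).
cl_human = scale_allometric(cl_rat, 0.25, 70.0, 0.75)
v_human = scale_allometric(v_rat, 0.25, 70.0, 1.0)

# Step (5): simulate a hypothetical 100 mg oral dose over 24 h.
profile = [
    one_compartment_oral(t, dose=100.0, f_abs=0.8, ka=1.2, cl=cl_human, v=v_human)
    for t in range(25)
]
```

A model of this kind returns only the systemic plasma profile and carries no information about individual tissues.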
These developed models are typically tested only once human studies commence. The learnings from the resulting outcome are frequently discarded or not utilized to help inform further model development. The resulting challenge is that these models, although fit to experimental datasets, are rarely improved upon and developed with the clinical outcome in mind3, 6, 8, 9.
An additional challenge with the standard paradigm of studying only systemic PK outputs is the loss of insight into tissue-specific kinetics that can drive compound safety/tox dynamics10. Apart from some indications and mechanisms of action realized in the blood vessel bed, the site of action for the majority of compounds is a non-circulatory tissue or organ system, necessitating more specific prediction of drug permeation to a site-of-action implicated in target binding11. Therefore, determination of the tissue:plasma partition coefficients (Kp) for organs and tissues is critical because these properties of drug compounds define their exposure to the specific receptors. Due to differences in Kp values for different organs, drug target exposure may not be directly correlated with general plasma concentration metrics including Vdss. While Vdss shows the distribution of drug compounds between plasma and tissues, it does not account for differences in tissue composition and morphology. However, experimental determination of Kp values is expensive and time-consuming12. As such, several in silico methods have been developed to predict Kp values from more easily obtained in vitro data. Using a combination of tissue composition information and the compound’s physicochemical characteristics, such as lipophilicity (logP) and the unbound fraction in plasma (fup), these methods account for the distribution of the drug between water and drug-binding components, including proteins, lipids, and phospholipids. This need is amplified as predictions of drug disposition and effects are becoming more personalized to subjects with conditions directly influencing clinical PK of pharmaceuticals13. In vivo patient stratification can present practical and ethical challenges, and thus benefits from accurate translation of insights generated via preclinical studies14.
When resolving the challenge of poor preclinical translatability, a crucial aspect is understanding which mechanistic behaviors are modulating the compound PK. If datasets consist only of the calculated endpoints (AUC, Cmax, tmax, Vdss), it is challenging both to personalize predictions for a specific patient and to leverage advances in statistical learning, such as machine learning and deep learning (ML/DL) models, for the structure-based prediction of PK parameters15, 16.
Computational predictions of drug distribution. For drugs with a narrow therapeutic window, it is particularly important to determine precisely which route of administration (ROA) is optimal, as well as the maximum tolerated dose (MTD) and minimum effective concentration (MEC) for that route. Optimizing an ROA and compound formulation is predicated on, for a given dose, minimizing compound distribution to organs where a toxic burden is suspected and maximizing penetration to a site-of-action. These characteristics fall into the domain of distribution. Overall, tracking the disposition of different active components of therapies (parent and metabolites) not only systemically but also to general active and inactive sites is critical for continued model improvement in PK.
There has been a multitude of mathematical relationships developed over the last few decades that explore mechanistic model-based predictions of compound distribution into tissues with Kp values17. There are different model modalities, such as compartment-based models that characterize organs as well-perfused and well-mixed, finite element analysis (FEA), and three-dimensional recapitulations of tissue structures18, 19. In PK contexts, mechanistic models of distribution into tissue components (neutral lipids, phospholipids, intracellular water, etc.) are most frequently used to output predictions of tissue:plasma partition coefficients. The more well-characterized approaches are described in works by Poulin and Theil, Berezhkovskiy, Rodgers and Rowland (Rodgers), and Schmitt et al.20–23. Each of these equations rests on key assumptions about how compound physicochemical properties and specific physiological parameters interact to determine this organ-specific equilibrium constant. The Poulin and Theil model proposes a tissue:plasma partition coefficient prediction that accounts for dissolution into water and nonspecific binding to neutral lipids and phospholipids. Berezhkovskiy’s modified method assumes that only drugs in the water fraction bind to tissues. The Rodgers and Rowland approach builds on the above-mentioned approaches but also accounts for the impact of drug ionization on partitioning24. In cases where mechanistic equations are not used to calculate compartment-specific organ partition coefficients, global optimizations of Vdss are performed. As mentioned before, Vdss is a useful indicator of relative compound distribution but lacks the insight needed to develop next-generation dose optimizations.
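To make the structure of such tissue-composition equations concrete, the sketch below implements a commonly cited form of the Poulin and Theil relationship for non-adipose tissue, in which the octanol:water partition coefficient weights the neutral lipid fraction plus 30% of the phospholipid fraction, and the water fraction is combined with the remaining 70% of the phospholipid fraction. The composition values are illustrative placeholders rather than literature values, the function and key names are hypothetical, and the exact equations used in the cited works should be taken from those works.

```python
def kp_poulin_theil(log_p, fup, fut, tissue, plasma):
    """Tissue:plasma partition coefficient (non-adipose form, simplified sketch).

    log_p  : octanol:water partition coefficient (log10)
    fup    : fraction unbound in plasma
    fut    : fraction unbound in tissue
    tissue : dict of fractional volumes (water, neutral_lipid, phospholipid)
    plasma : dict with the same keys for plasma
    """
    p_ow = 10.0 ** log_p

    def affinity(comp):
        # Lipophilic term: neutral lipids plus 30% of phospholipids.
        lipid_term = p_ow * (comp["neutral_lipid"] + 0.3 * comp["phospholipid"])
        # Aqueous term: water plus the remaining 70% of phospholipids.
        water_term = comp["water"] + 0.7 * comp["phospholipid"]
        return lipid_term + water_term

    return affinity(tissue) / affinity(plasma) * fup / fut


# Illustrative placeholder compositions (fractions of wet weight), NOT literature values.
PLASMA = {"water": 0.95, "neutral_lipid": 0.003, "phospholipid": 0.002}
TISSUE = {"water": 0.75, "neutral_lipid": 0.020, "phospholipid": 0.010}

# Kp for a hypothetical compound with logP = 2 and equal unbound fractions.
kp_example = kp_poulin_theil(2.0, fup=0.5, fut=0.5, tissue=TISSUE, plasma=PLASMA)
```

Because logP enters every tissue through the same functional form, varying it moves all organ Kp values together, whereas optimizing each Kp directly removes that coupling.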
Identifiability in computational models. Information from any mathematical model is obtained primarily through parameter inference and predictions of the trajectories of the internal states of the system from experimentally determined measurements. However, a major challenge in extracting this information (especially in the case of biological systems) is that some measurements are not feasible due to experimental limitations. The ability to infer the parameters of the biological system from the observed outputs/measurements is called identifiability25, 26. One such use case relates to predicting the organ-specific drug distribution in PBPK models12. Understanding the site/organ-specific drug distribution is important for evaluating drug efficacy and safety and for gaining insights into other biological mechanisms that depend on the drug concentration in that tissue/organ. Since most available in vivo PK data relate to plasma concentrations, obtaining the organ-specific drug distribution remains a significant problem due to the identifiability issues discussed above. The problem is amplified in cases where the site or target of action for the drug is in organs or compartments other than plasma. One approach to overcoming this problem is to employ mechanistic frameworks, either by incorporating mechanisms that govern the physical/chemical interactions between the drug and the tissue or through empirical equations that predict drug distributions based on observations from other cases. Building physiological mechanisms is entirely dependent on the problem under study, i.e. subject to certain drugs and organs of interest and the particular physics underpinning the mechanism of distribution. On the other hand, predictions from empirical equations can be used for any drug or organ of interest based on the behavior of other drugs.
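The identifiability problem can be illustrated with the widely used relationship between the steady-state volume of distribution and tissue partitioning, Vdss = Vp + Σ Kp,i · Vi. The minimal sketch below uses hypothetical compartment volumes and Kp values and ignores any erythrocyte term. It shows two different sets of organ Kp values that yield exactly the same Vdss, so a systemic summary metric alone cannot distinguish between them.

```python
def vdss(plasma_volume, tissue_volumes, kp):
    """Steady-state volume of distribution from organ volumes and Kp values:
    Vdss = Vp + sum(Kp_i * V_i). Volumes in litres."""
    return plasma_volume + sum(tissue_volumes[name] * kp[name] for name in tissue_volumes)


# Hypothetical two-tissue system (volumes in L), for illustration only.
PLASMA_VOLUME = 3.0
VOLUMES = {"muscle": 30.0, "adipose": 15.0}

# Two different distribution patterns ...
kp_a = {"muscle": 1.0, "adipose": 4.0}  # drug accumulates in adipose
kp_b = {"muscle": 2.0, "adipose": 2.0}  # drug spread evenly across tissues

# ... that produce an identical systemic summary metric.
vdss_a = vdss(PLASMA_VOLUME, VOLUMES, kp_a)
vdss_b = vdss(PLASMA_VOLUME, VOLUMES, kp_b)
```

Constraining the Kp values through a shared mechanistic equation, or adding tissue-level measurements, is what breaks this degeneracy.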
Key examples of empirical equations that can predict organ-specific drug distribution are the Rodgers-Rowland (Rodgers) and Poulin equations21, 22. The predictions are not solely based on empirical relationships, as certain physical parameters such as the lipophilicity (logP), degree of ionization/nature of the compound (pKa) and protein binding in plasma and tissues (fup) are employed in evaluating the degree of drug distribution. Even though these empirical equations provide a decent initial estimate of drug distribution values, we cannot rely on them exclusively for accurate predictions. This is because we do not have a complete understanding of how parameters like pKa and logP present themselves in a physiological context for driving the drug distribution between plasma and the site of action. The datasets that were used to validate these relationships tend to be decoupled from systemic studies, which are more numerous in availability. Different tissues have different levels of lipids/phospholipids that are highly dependent on the subject or population under consideration, and we do not have good estimates or bounds for these values to predict the drug distribution. Also, certain parameters such as the lipophilicity (logP) of a compound are usually evaluated in octanol:water systems, which might not be an efficient way to translate to in vivo systems, as biological lipids have different partitioning coefficients. Some of the relevant mechanisms and approximations driving distribution are captured in Fig. 1; specifically, these implicate binding of compounds to tissue proteins and Fickian (passive) diffusion of free compound into tissue depots as a result of concentration gradients. Boundary conditions are assumed to be driven by equilibrium partition coefficients across a barrier. All of these factors play a role in distribution and, effectively, in both volume of distribution and organ-specific drug disposition.
Direct optimization of the Kp values for various organ compartments may therefore be considered more beneficial than mechanistic models and logP optimization, since it is not subject to the limitations imposed by the assumed lipophilicity, degree of ionization, and protein binding. Accordingly, the goal of the present study was to estimate the accuracy of PK simulation using AI-based prediction of the Kp values in comparison to generally accepted mechanistic models.
In this work, we used BIOiSIM, an AI/ML-based PK-PD modeling platform, and its functionalities to resolve identifiability challenges present in PK distribution simulations through hybrid integration of existing mechanistic models and physicochemical parameter optimization. The selected approach is evaluated in a proof-of-principle application of the BIOiSIM platform to the simulation of drug disposition in vivo for small molecule compounds, comparing simulation outputs using a fixed optimization for distribution based on the Rodgers equation to a global optimization of an effective tissue:plasma partition coefficient27, 28. The accuracy of venous plasma concentration simulation is evaluated across both methods, and insights as well as limitations of the study are discussed.
Results
Sensitivity and convergence testing. The core BIOiSIM model was used in conjunction with existing in vivo datasets to optimize distribution parameters and missing PK parameters—specifically, blood:plasma ratio and first-order absorption rate constant, ka (h−1). The high amplitude oscillations in convergence plots (Fig. 2) correspond to the large steps taken during coarse optimization. Post-selection of the optimal coarse parameter combinations, each of the simulations converged as evidenced by the flat tail of each optimization curve. Overall, there is high confidence in the optimized parameter values as a result of the minimal variation of objective function value at the end of optimization. The final optimized datapoints are expressed in Table S1 for the different configurations that were tested.
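The two-stage behavior described here, with large coarse steps followed by a flat converged tail, can be reproduced with a simple coarse-to-fine search. The sketch below is a generic one-dimensional illustration with a toy objective, not the optimizer used in BIOiSIM; the function name, bounds and objective are hypothetical.

```python
def coarse_to_fine(objective, lower, upper, n_coarse=9, n_refine=30):
    """Minimize a 1-D objective: coarse grid scan, then step-halving refinement.

    Returns the best parameter value and the objective history, whose flat
    tail indicates convergence.
    """
    # Stage 1: coarse scan over an evenly spaced grid (large steps).
    step = (upper - lower) / (n_coarse - 1)
    grid = [lower + i * step for i in range(n_coarse)]
    best = min(grid, key=objective)
    history = [objective(best)]

    # Stage 2: local refinement around the best coarse point.
    for _ in range(n_refine):
        step *= 0.5
        candidates = [max(lower, best - step), best, min(upper, best + step)]
        best = min(candidates, key=objective)
        history.append(objective(best))

    return best, history


# Toy objective: squared error around a hypothetical optimal logP of 1.3.
def toy_objective(log_p):
    return (log_p - 1.3) ** 2


best_log_p, objective_history = coarse_to_fine(toy_objective, lower=-2.0, upper=6.0)
```

Because the current best point is always among the candidates, the objective history never increases, which produces the flat tail seen in convergence plots of this kind.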
Simulation accuracy. The comparison of BIOiSIM simulation accuracy is captured in Figs. 3, 4, 5 and Table 1. The key PK outputs discussed previously were assessed for accuracy using AFE, AAFE and r2 as well as an overall Geometric Mean Fold Error prediction of accuracy for each compound (Fig. 5). The Pearson correlation coefficient values are consistently high across all metrics and optimization conditions (0.6051–0.9974), indicative of good agreement between observed PK outputs and the simulated ones. For first-order PK outputs (AUC0–t, Cmax, Tmax) AAFE < 1.6, indicating that the plasma concentration simulations are of comparable magnitude to those from experimental studies. Additionally, the second-order outputs (AUMC, AUC0–inf, MRT, Vdss) have comparably low AAFE values for Kp and logP-optimized outputs (1.74, 1.36, 2.11, 2.13 for logP; 1.85, 1.37, 2.02, 1.71 for Kp optimization). This is indicative of the model having sufficient mechanistic complexity for capturing the PK disposition of the different molecules.
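For reference, the fold-error metrics can be computed as sketched below. The sketch assumes the commonly used definitions AFE = 10^mean(log10(predicted/observed)) and AAFE = 10^mean(|log10(predicted/observed)|); the exact definitions used by the authors are not given in this section, and the example values are hypothetical.

```python
import math


def afe(predicted, observed):
    """Average fold error: values below 1 indicate underprediction, above 1 overprediction."""
    logs = [math.log10(p / o) for p, o in zip(predicted, observed)]
    return 10.0 ** (sum(logs) / len(logs))


def aafe(predicted, observed):
    """Absolute average fold error: magnitude of the error regardless of direction (always >= 1)."""
    logs = [abs(math.log10(p / o)) for p, o in zip(predicted, observed)]
    return 10.0 ** (sum(logs) / len(logs))


# Hypothetical example: one twofold overprediction and one twofold underprediction.
predicted = [2.0, 0.5]
observed = [1.0, 1.0]

bias = afe(predicted, observed)        # the two errors cancel out in AFE
magnitude = aafe(predicted, observed)  # but both count fully in AAFE
```

Reporting both metrics separates systematic bias (AFE) from overall error magnitude (AAFE), which is why an AFE near 1 can coexist with an AAFE well above 1.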
Between the optimization conditions, Kp optimization performed best overall (GMFE = 1.53), followed by logP optimization (GMFE = 1.69) and the Rodgers equation (GMFE = 1.87). This is as expected, given the greater flexibility offered by directly optimizing the distribution-driving parameter Kp and removing the relative constraints between organs imposed by the Rodgers equation. Overall, Fig. 6 shows that median values were similar for all of the PK outputs; however, the range of fold-errors was greater for the Rodgers equation (non-optimized), especially for Cmax, where log(AFE) ranged from −1.0 to 0.4. The interquartile ranges are centered around the median and log(Predicted/Observed) = 0, and there is a slight bias towards underprediction of all of the parameters, as seen by the greater magnitude of the minimum (negative) log(AFE) compared to the maximum log(AFE). This is further confirmed by AFE values consistently less than 1; interestingly, direct Kp optimization showed a greater bias towards underprediction specifically of the Vdss and MRT parameters (0.69, 0.61) compared to the other prediction methodologies.