Modeling Drugs-PLGA Nanoparticles Interactions Using Gaussian Processes: Pharmaceutics Informatics Approach

The objective of this study was to correlate the binding of drugs on a very popular nanoparticulate polymeric matrix; PLGA nanoparticles with their main constitutional, electronic and physico-chemical descriptors. Gaussian Processes (GPs) was the artificial intelligence machine learning method that was utilized to fulfil this task. The method could successfully model the results where optimum values of the investigated descriptors of the loaded drugs were deduced. A percentage bias of 12.68% ± 2.1 was obtained in predicting the binding energies of a group of test drugs. As a conclusion, GPs could successfully model the drugs-PLGA interactions associated with a good predicting power. The GPs-predicted binding energies (ΔG) can easily be projected to the drugs loading as was previously proven. Adopting the “Pharmaceutics Informatics” approach can save the pharmaceutical industry and the drug delivery scientists a lot of exerted resources, efforts and time.


Introduction
Poly-lactide poly-glycolide polymeric nanoparticles (PLGA) are nanoscale polymeric drug delivery carriers that were widely used as drug delivery systems specifically for lipophilic drugs. Accurate prediction of drug-loading in these nanoparticulate systems prior to wet-lab experimentations saves valuable time and resources [1,2].
In our previous studies, we have concluded that the integration of some computational methods that span chemoinformatics, molecular dynamics simulations, data mining, molecular docking and mathematical modeling together with machine learning techniques enables the successful prediction of important pharmaceutical objectives such as drug loading, as well as providing important information about the interactions thst encompass different drugs with their carriers and subsequently the successful choice of the optimum drugs-carriers pairs [3][4][5][6].
Machine learning methods can be categorized into two main classes; the supervised and the unsupervised tools [7][8][9]. Supervised methods predict an outcome based on the inputs, and depending on the type of the outcome they can carry-out classification or regression tasks. The supervised machine learning methods include: Artificial neural networks, support vector machines, partial least squares regression and the Gaussian processes.
In this current work, the previously introduced hypothesis of adopting computational approaches to predict drug loading in solid lipid nanoparticles was extended by combining Gaussian process modelling with chemoinformatics, molecular dynamics and molecular docking experiments using different scoring functions.
The all-atom molecular dynamics simulations (MDS) were used aiming for preparing the tripalmitin system, that was further used for the docking experiments. In MDS, each run consists of repetitive steps. The forces exerted on each atom in the simulated system are computed after the numerical solving of Newton's laws of motion. The aformentioned forces are computed utilizing a number of equations and some physical constants which together form the 'force field' used in the MDS. In each step, the positions and velocities of the atoms are updated. MDS are widely used in biophysical and chemical studies, however, its usage in drug formulation investigations is still less common [10,11].
Molecular docking is a different computational method where the affinity of a molecule towards a carrier (receptor) is evaluated by calculating the change in the free energy of binding (DG). Resembling the molecular dynamics simulations, this method highly contributed in speeding up the drug discovery process [12].
In order to obtain meaningful information about drug molecules, calculation of main chemoinformatics descriptors was carried-out [13,14]. These descriptors are then used as inputs for modeling a specific outcome of interest using different machine learning techniques.
The Gaussian processes (GP) are a Gaussian distribution over functions, where the functions are a mean function and a covariance function. GP are non-parametric, which means that there is no specific form of the function that models the relationship between the input(s) and the outcome [15]. GP belong to the supervised learning set of methods, with the additional benefit of providing not only the predictions of outcomes, but also the probability (variance) associated with these predictions. Moreover, GP are less prone to over-fitting, which leads to poor predictions [16].
GP have been found to be of value in the drug delivery contexts and the relevant industries over the last decade. Furthermore, they were recorded to result into more robust models compared to other techniques such as the linear regression and the quantitative structure permeability relationships applied in studying percutaneous absorption of drugs through skin [17][18][19].
For GP to predict an outcome for a given unknown input x*, the predicted mean l* is defined as l* = R T (C ? r 2-I) -1 t, where R is NxL cross-covariance matrix expressed as the kernel k(x,x*) with dimensions N (number of training points) and L (number of test points), C is the training points covariance matrix expressed as k(x,x), I is the identity matrix of dimensions N, r 2 is the variance of the random noise or random error and t is the training points outcome. The variance R* of the predicted mean is calculated as R* = C*-R T (C ? r 2 I) -1 R, with C* being the covariance matrix of the test points expressed as k(x*,x*).
To this end, Gaussian Processes were introduced in this study as a new artificial intelligence technique to model the drug-PLGA matrix binding in an attempt to further predict the loaded masses. Combining chemoinformatics tools together with machine learning methods poses a platform for a new approach in the drug delivery field that could be called ''Pharmaceutics Informatics''.

Results and Discussion
This study attempts to utilize the previously generated important data that can be collected from the literature and exploit them in a new generated correlation as a novel proof-of-concept of the success of combining Bio/ chemoinformatics and machine learning methods (as a type of artificial intelligence) in what we can name as ''Pharmaceutics Informatics'' in modeling and predicting the interactions between different drug-carrier pairs.
Particularly, this concept was previously utilized adopting Gaussian Processes (GPs) in few studies such as Moss et al. and Hathout et al. [19,20]. Figure 1 shows the Gaussian Processes generated 3D surface and contour plots. Figure 1 displays a successful modeling of the investigated drug-carrier pairs. It can be concluded from the results that the increase in an important physico-chemical descriptor such as xLogP, above a value of 4, reflecting the high lipophilicity of the drugs led to the lowest binding energy values (DG) indicating higher interactions and binding with the PLGA simulated nanoparticulate matrices. This is considered a logical inference since PLGA is a hydrophobic carrier that possesses high affinity to likewise molecules and drugs [21]. It can also be inferred from the same figure that decreasing the fragment complexity of the studied molecules leads to better binding of the PLGA. The fragment complexity of the molecule is a constitutional main descriptor that represents the number of bonds, number of non-hydrogen atoms and number of heteroatoms of a molecule [3]. Decreasing the number of bonds and the aforemnentioned atoms could increase the flexibility of the molecules leading to better inclusion in the PLGA matrix and therefore better binding results [22,23].
The right panel of Fig. 1 displays the presence of an optimum value (almost 800 Daltons) for the better binding of the drug molecules with PLGA nanoparticulate matrix. Small molecular weight drugs may have high diffusion rates from the PLGA matrices leading to less binding and further low entrapment efficiencies. On the other hand, drugs possessing high molecular weights above 800 Daltons may lack the ability to be docked inside the pores and voids of the PLGA polymeric matrix [24,25]. Furthermore, it is shown in the figure that the total polar surface area has no obvious impact on the values of the binding energies. This can be ascribed to the hydrophobic nature of the carrier that led to the insignificant effect (P \ 0.05) of the values of this electronic descriptor in the studied range (100-500 Angstrom squared). Gaussian Processes usually posses the ability of extracting the main features (descriptors) that affect a specific outcome [18].
In order to internally validate the Gaussian Processes generated model, the actual obtained binding energies (as previously reported in Hathout et al. [3]) were plotted againsed the predicted binding energy values (Fig. 2). The figure displays the high correlation between the predicted and the actual DG values. Most of the obtained points were highly proximal to the 45°degrees line. This indicates the high similarity between the actual and the predicted values [26][27][28][29][30]30].
In an attempt for the external validation of the Gaussian Processes, a group of four test drugs was fed in the GPs predictor profiler of JMP/SASÒ software package; Aspirin, Prednisolone, Resveratrol and Testosterone (Fig. 3). The percentage bias was calculated for these four additional molecules of the test group (Table 1). The scored % Bias was 12.68% ± 2. This value was in a close agreement of that obtained using Artificial Neural Networks algorithms (11.83) that was calculated for the only two molecules that were involved in this study [3]. This acceptable predicting ability of Gaussian processes was previously reported in several studies [17,18].

Computing the Main Descriptors of the Studied Drugs
The main constitutional, electronic and topological descriptors were computed for the investigated drugs aiming to interpret the differences in docking scores that were observed and subsequently correlate them to their recorded entrapment efficiencies. The chosen descriptors were the molecular weight, xLogP, topological polar surface area and the fragment complexity. These main descriptors were computed using BioclipseÒ version 2.6 (Bioclipse project, Uppsala University, Sweden) after Fig. 3 Calculating the binding energy of four validating drugs on PLGA nanoparticulate matrix a aspirin, b prednisolone, c resveratrol and d testosterone using the prediction profiler tool of SAS/JMPÒ software incorporating the mol files of the drugs that were previously obtained using ChemDrawÒ Ultra version 10.

Modeling Utilizing Gaussian Processes (GPs)
The modeling of the obtained binding energies (output) resulting from the docking experiments was perfomed using GPs modeling in JMPÒ 7.0 (SAS, Cary, NC). The x-factors (inputs) were the descriptors that were computed in ''Computing the Main Descriptors of the Studied Drugs'' section. The sole output was the binding energies (DG).

Validating the Predicting Ability of the Developed GPs
The predicting power of the utilized Gaussian Processes was evaluated after computing the predicted binding energies and contrasting these values with the actual obtained values. Furthermore, the molecular descriptors of two different tested drugs together with their docked binding energies on the PLGA matrices namely; Prednisolone and Testosterone were obtained from Metwally and Hathout [3,4]. Two additional test drugs were added, namely; Aspirin and Resveratrol where their important constitutional, electronic and physico-chemical descriptors were calculated using BioclipseÒ v.2.6 (Bioclipse project, Uppsala University, Sweden). The docking experiment was also performed using AutoDock VinaÒ and the virtually simulated PLGA matrix that was built using the same parameters that were used previously by Metwally and Hathout [3]. Afterwards, the main descriptors of each test drug were fed into the prediction profiler of the constructed GPs in ''Modeling Utilizing Gaussian Processes (GPs)'' section. and consequently their corresponding binding energies were generated and contrasted to the actual docking results. Consequently, the percentage bias (% Bias) was computed between the actual and predicted values. The results were compared to the value that was previously reported utilizing ANN [3]. The percentage bias was calculated according to the following equation [30]:

Conclusions
Gaussian Processes was successfully introduced in this study as an artificial intelligence technique that could correlate the important constitutional, electronic and physico-chemical descriptors such as the molecular weight, fragment complexity and total polar surface area and xLogP of several investigated drugs with their binding energies on a virtually simulated PLGA nanoparticulate matrix. It was concluded that molecules possessing a molecular weight around 800 Daltons, decreased fragment complexity and xLogP above 4 would lead to the lowest (more negative) binding energy values and hence stronger binding and interaction with the PLGA matrix. Moreover, the obtained predicting power of the utilized GPs was highly acceptable (12.38% ± 2.1). Projecting the binding energies results into masses of loaded drugs per 100 mg of the carrier matrix was previously attained using a simple mathematical relationship. Accordingly the loading masses can be further predicted. To this end, the introduced technique provides a new proof-of-concept of the success of what we can name as ''Pharmaceutics Informatics'' tools in providing clues for drug delivery scientists and formulators on the good selection of the drug-carrier pairs. Adopting this new technique would save the research and development sectors in the pharmaceutical industry and researches in universities and institutes huge amounts of resources, efforts and time that are usually spent in the wet-lab real experiments.
Author Contributions Conceptualization, RMH; methodology, RMH, OMA, DSA and AAM; validation, RMH and AAM; investigation, RMH and AAM; resources, RMH and AAM; data curation, RMH and AAM; writing-original draft preparation, RMH and AAM; writing-review and editing, RMH and AAM; visualization, RMH and AAM;. All authors have read and agreed to the published version of the manuscript.
Funding This research received no external funding.

Declarations
Conflict of interest The authors declare no conflict of interest.