3.6. MODR Construction
From DOE-derived models, knowledge is acquired that gives access to a region in which the behavior of the analytical responses and the method performance can be understood as a function of variations in the analytical parameters [8]. This region, known as the knowledge space, is confined to the experimental domain in the context of empirical models. In Figure 4 and Figure 5, the knowledge space is represented by the entire region of the graph. Within the knowledge space, areas where analytical parameters and their ranges fail to meet specifications, termed the “non-acceptable performance region,” and areas where the mean responses meet specifications, referred to as the “acceptable mean performance region,” can be identified [9].
Multiple response optimization tools are used to find the acceptable mean performance region when several responses are studied simultaneously. There are two primary methods for multiple response optimization. The first is graphical optimization, often referred to as overlay graphs, in which the contour plots of each response model are superimposed to estimate their region of intersection, thereby providing a region of optimal conditions or an acceptable performance region. The second method is based on desirability functions (often called ‘numeric optimization’ or Derringer–Suich desirability) [10]. Initially, an individual desirability function di(ŷi) must be created for each ŷi using the fitted models and the chosen optimization criteria. Desirability values always range from 0 to 1, with di(ŷi) = 0 representing an undesired response and di(ŷi) = 1 the optimal response. Desirability functions can be customized to suit the chosen optimization criteria, which typically fall into the following categories:
Maximize: The objective is to elevate the response variable to its maximum potential. Desirability increases as the response approaches its highest value.
Minimize: This criterion minimizes the response variable, striving for the lowest achievable value. Desirability increases as the response approaches its minimum value.
Target: The focus is on achieving a specific target value for the response variable. Desirability decreases as the response deviates from this target, emphasizing alignment with the desired outcome.
In range: specifies a range (lower limit – upper limit) within which results are acceptable.
Cpk: the process capability index, which calculates the number of standard errors of the predicted response that fall within the specification limits.
By tailoring desirability functions according to these criteria, the optimization process can effectively pursue the desired objectives for the analytical method. Once all n responses are transformed into desirability functions, they are combined into a single function called the overall desirability (D), which is employed to discover the best combinations of responses. In mathematical terms, the overall desirability is given by the geometric mean of the n individual desirabilities, shown in equation (27) [10].
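The equation itself is not reproduced in the extracted text. For reference, the standard Derringer–Suich form of the overall desirability, matching the geometric-mean description above, is:

```latex
D = \left( \prod_{i=1}^{n} d_i(\hat{y}_i) \right)^{1/n}
  = \sqrt[n]{d_1 \cdot d_2 \cdots d_n} \tag{27}
```

Because D is a geometric mean, a single response with di = 0 drives the overall desirability to zero, which is what forces all criteria to be satisfied simultaneously.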
In Design Expert, both graphical and numerical (desirability) optimization are available, whereas Fusion QbD uses only desirability functions to find the best overall solution. In Design Expert, the graphical optimization criteria are defined by the values the analyst specifies for the lower and upper limits, or sometimes just one of these limits.
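The maximize, minimize, and target criteria listed above can be sketched as simple transform functions. This is an illustrative implementation of the classic Derringer–Suich transforms, not the exact forms used by Fusion QbD or Design Expert; the limits and weight exponents (s, t) are placeholders.

```python
import numpy as np

def d_maximize(y, low, high, s=1.0):
    # Larger-is-better: 0 at or below `low`, 1 at or above `high`.
    return float(np.clip(((y - low) / (high - low)) ** s, 0.0, 1.0))

def d_minimize(y, low, high, t=1.0):
    # Smaller-is-better: 1 at or below `low`, 0 at or above `high`.
    return float(np.clip(((high - y) / (high - low)) ** t, 0.0, 1.0))

def d_target(y, low, target, high, s=1.0, t=1.0):
    # Peak desirability of 1 at `target`, falling to 0 at `low` and `high`.
    if y < low or y > high:
        return 0.0
    if y <= target:
        return ((y - low) / (target - low)) ** s
    return ((high - y) / (high - target)) ** t

def overall_desirability(ds):
    # Geometric mean of the individual desirabilities (equation 27):
    # a single d_i = 0 drives the overall desirability D to 0.
    ds = np.asarray(ds, dtype=float)
    return float(ds.prod() ** (1.0 / ds.size))
```

In practice these transforms are evaluated on a grid of factor settings, and the settings maximizing D are reported as the optimum.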
All eight models (3, 20–26) were used to find the acceptable mean performance region. In Fusion QbD, the responses k BMC, k DMC, and k CUR were maximized with a lower bound of 2 in order to provide adequate retention. To ensure that the peaks were symmetrical, the Tailing BMC, Tailing DMC, and Tailing CUR responses were minimized with an upper bound of 1.2. To ensure that adjacent peaks have adequate separation and to avoid coelution, the responses Rs (BMC, DMC) and Rs (DMC, CUR) were maximized with a lower bound of 2. The same criteria were adopted for graphical optimization in Design Expert. The results of the multiple response optimization are shown in Figures 5a and 6a.
In Figure 5a, from Fusion QbD, the white regions indicate areas of acceptable mean performance, while the colored regions denote areas of non-acceptable mean performance. Similarly, in Design Expert’s graphical optimization shown in Figure 6a, the bright-yellow regions indicate acceptable mean performance, and the grey regions indicate unacceptable performance. However, like the mean response surfaces, these regions represent only the average performance and do not guarantee that the method will meet the requirements of its intended use. Furthermore, these regions cannot yet be referred to as the MODR because, according to USP General Chapter <1220>, the MODR is defined as a “multivariate space of analytical procedure parameters that ensure the ATP is fulfilled and therefore provide assurance of the quality of the measured value” [3]. Moreover, the MODR should ensure quality, which implies that it should represent a robust region as described in ICH Q14 [37]. Therefore, the design of the MODR should incorporate the method uncertainties to determine whether the analytical procedure will assure quality. If a method is robust “on average” but exhibits excessive variation, it may fail to meet its robustness criteria [26]. Incorporating robustness includes estimating the variability of the analytical measurements related to the CMAs. In addition, each source of error is identified so that analytical performance can be improved by reducing the variability associated with the analytical parameters.
Fusion QbD uses Monte Carlo simulations to include the measurement uncertainty of the model parameters and estimate the probability of meeting the CMA specifications. To start the simulations, the maximum expected variation (±3σ value) of each CMP must be entered. Then, the robustness indices are selected, and the lower and upper specification limits (LSL and USL, respectively) (26) are defined for each CMA. These specification limits represent the MODR boundaries for each response and are also known as edges of failure. In Fusion QbD, process capability indices such as Cp, Cpk, Cpm, and Cpkm are implemented as robustness capability metrics to quantify system robustness [26]. These are standard Statistical Process Control (SPC) metrics widely used to quantify and evaluate method variation in critical quality and performance characteristics. The Cp index measures the potential capability of a process and is used when the response has a specified maximum allowable variation and symmetrical upper and lower specification limits (USL and LSL). It can be calculated as
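The Cp formula itself is missing from the extracted text; the standard SPC definition, consistent with the 6σ description that follows, is:

```latex
C_p = \frac{USL - LSL}{6\sigma}
```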
where 6σ represents the inherent variation of the response around its average value, i.e., the spread constrained within ±3σ of the mean. The Cpk index measures the actual capability of a process, accounting for how centered the process is between the specification limits. If the goal for the response is maximization, only the LSL is established (29); if the objective is minimization, only the USL is established (30).
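Equations (29) and (30) are not reproduced in the extracted text; the standard one-sided forms consistent with the description above, where μ is the process mean, are:

```latex
C_{pk} = \min\left( \frac{USL - \mu}{3\sigma},\ \frac{\mu - LSL}{3\sigma} \right)
C_{pk,\mathrm{lower}} = \frac{\mu - LSL}{3\sigma} \tag{29}
C_{pk,\mathrm{upper}} = \frac{USL - \mu}{3\sigma} \tag{30}
```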
The Cpm index is used when the response has a target value and the specification limits are symmetrical (31). When the specification limits are asymmetrical, the Cpkm index is used (32).
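Equations (31) and (32) are missing from the extracted text; the standard textbook definitions, where T is the target value, are:

```latex
C_{pm} = \frac{USL - LSL}{6\sqrt{\sigma^{2} + (\mu - T)^{2}}} \tag{31}
C_{pkm} = \frac{C_{pk}}{\sqrt{1 + \left( \dfrac{\mu - T}{\sigma} \right)^{2}}} \tag{32}
```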
A Cp = 1.00 indicates that the process variation exactly matches the specification limits, meaning the process spread (±3σ) is equal to the distance between the upper and lower specification limits (USL and LSL). This implies that approximately 99.7% of the process output falls within the specified limits, but it also means there is no room for error or drift in the process mean. When Cp = 1.33, the specification limits are four standard deviations (±4σ) away from the process mean, compared with three standard deviations when Cp = 1.00. This allows greater assurance that the process outputs will remain within the specified limits, even if there are minor variations or shifts in the process; approximately 99.99% of the process output will meet the specified limits. A Cp = 2.00 reflects an exceptionally capable process, in which the process spread is only half of the specification range. This indicates that almost all output will be within the specification limits, demonstrating a highly robust and reliable process with substantial tolerance for variations in the process parameters [38]. Higher Cp values therefore indicate better process capability and a reduced risk of producing out-of-specification results. The Monte Carlo simulation method then begins by modeling the system, including the input factors (the CMPs) and their variability, characterized by probability distributions. The simulation generates numerous sets of input values through random sampling, each representing a possible scenario. The model calculates the corresponding analytical response for each set, repeating this process thousands to millions of times to produce a distribution of outcomes. This distribution reflects the combined measurement uncertainty of the analytical response, accounting for the variability of all input factors. By analyzing the outcomes, Monte Carlo simulations estimate the likelihood that the analytical method meets the specified performance criteria under different conditions.
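The simulation loop just described can be sketched in a few lines of code. This is an illustrative reconstruction, not the Fusion QbD implementation: the fitted model, its coefficients, the operating point, and the CMP standard deviations are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Hypothetical fitted response model for one CMA (e.g., resolution) as a
# function of coded CMPs: flow rate (x1), % organic (x2), temperature (x3).
def predicted_response(x1, x2, x3):
    return 3.0 - 0.6 * x1 - 0.8 * x2 + 0.2 * x3 + 0.3 * x1 * x2

# Candidate operating point (coded units) and assumed CMP standard
# deviations, derived from the entered +/-3-sigma variations (sigma = value / 3).
center = np.array([0.0, 0.0, 0.0])
sigmas = np.array([0.05, 0.10, 0.05])

# Random sampling: each row is one simulated scenario of CMP settings.
n_sim = 100_000
x = center + rng.normal(0.0, sigmas, size=(n_sim, 3))
y = predicted_response(x[:, 0], x[:, 1], x[:, 2])

# Probability of meeting the CMA specification (here an LSL only: Rs >= 2).
lsl = 2.0
p_pass = float(np.mean(y >= lsl))

# One-sided capability index, Cpk(lower) = (mean - LSL) / (3 * sd).
cpk_lower = float((y.mean() - lsl) / (3.0 * y.std(ddof=1)))
```

Repeating this calculation over a grid of candidate operating points and keeping only those with Cpk ≥ 1.33 traces out the robust region in the same way the white regions in Figure 5 are obtained.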
The desired levels of robustness for each study factor used for the measurement uncertainty (MU) estimation were entered as (A) flow rate ±0.024 mL/min, (B) percentage of organic solvent in the mobile phase ±1.5%, and (C) column temperature ±1.5 °C. Cpk was the robustness index chosen for all responses. For the k BMC, k DMC, and k CUR responses, the goal was maximization, and Cpk,lower was calculated by setting the LSL at 2.00 so that the compounds have a minimum retention factor of 2. To obtain symmetrical peaks for all compounds, the Tailing responses were minimized, and Cpk,upper was calculated by setting the USL at 1.2. For adequate separation of the critical pairs, the Rs (BMC, DMC) and Rs (DMC, CUR) responses were maximized, and Cpk,lower was calculated by setting the LSL at 2.00. In this study, the risk control strategy involved selecting conditions that achieve Cpk ≥ 1.33 for all responses.
Figure 5b,d,f show that, after the robustness assessment using Monte Carlo simulations and the insertion of the uncertainties into the MODR, the white region is smaller than that obtained without the robustness assessment. It is also possible to see how authentic replicates impact the MODR by comparing Figure 5a,b (results of the DOE without authentic replicates) with Figure 5e,f (results of the DOE with authentic replicates for all runs). The importance of selecting the models used to build the MODR can also be demonstrated. In Figure 5c,d, for example, the plots were constructed based on the DOE with authentic replicates using F = 4, which resulted in the inclusion of a non-significant term in the Tailing BMC model, as explained in the model selection section. When this term is removed by increasing the F value to 5, the MODR changes again (Figure 5e,f).
On the other hand, Design Expert uses the Propagation of Error (POE) method to identify factor settings that minimize the variation transmitted to a response from factors that are not fully controllable, thereby making the process more robust to input variations. This approach applies partial derivatives to locate flat areas on the response surface, which are less sensitive to variations. POE methods require a second-order hierarchical response surface model and estimated standard deviations for the numeric factors. These standard deviations can be entered for all factors on the Column Info Sheet or individually by selecting specific columns or cells and adjusting them in the Design Properties tool. The same uncertainty values used in Fusion QbD were applied in Design Expert. Using this information, the software constructs a response surface map that shows how factor variations affect each response. By employing multiple response optimization, with objectives that include minimizing the POE, optimal factor settings can be determined to achieve the desired response goals with minimal variation [11]. Robustness assessment in Design Expert utilizes interval estimation to quantify variability and uncertainty in the CMAs and CMPs. Interval estimates, including confidence, prediction, and tolerance intervals, can be added to the graphical optimization plots (Figure 6) by selecting the “Show Interval” box and specifying the type and confidence level [39]. Unlike confidence intervals (CI) [40], which quantify the uncertainty of an estimated population parameter (such as the mean or standard deviation), prediction and tolerance intervals address the uncertainty of future observations. A CI provides a range that is likely to contain the unknown population value, indicating the precision of a sample estimate (33). Because it addresses only the uncertainty of the estimate, and not of future observations, the use of a CI is not recommended for evaluating robustness.
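The CI expression referenced as equation (33) is not reproduced in the extracted text. Based on the definitions that follow (SE, the expanded point vector x0, and s), it is presumably the standard regression confidence interval:

```latex
CI = \hat{y}(x_0) \pm t_{\alpha/2,\,df}\; SE \tag{33}
SE = s \sqrt{x_0 \left( X^{T} X \right)^{-1} x_0^{T}} \tag{34}
```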
where SE is the standard error of the design at x0, and x0 is the expanded point vector. In a predictive model, the expanded point vector is a list of factor values and their interactions and is a way to represent the settings of the factors at a particular location for purposes of prediction. It can be seen as a single-row matrix, with one element for every term in the model, resembling a row of the expanded model matrix (X). The elements of the point vector must be in the same order as the terms represented by the columns of the model matrix. Finally, s is the estimated standard deviation.
In contrast, a prediction interval (PI) [40] accounts for both the uncertainty in estimating the population mean and the random variation of individual values, thus offering a range within which the average of a future sample is expected to fall (35).
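The PI expression referenced as equation (35) is missing from the extracted text; it is presumably the standard form, differing from the CI only by the additional unit term under the square root:

```latex
PI = \hat{y}(x_0) \pm t_{\alpha/2,\,df}\; SE_{pred} \tag{35}
SE_{pred} = s \sqrt{1 + x_0 \left( X^{T} X \right)^{-1} x_0^{T}}
```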
where SEpred is the standard error of prediction for one future response measurement at x0.
Consequently, PIs are always wider than CIs (Figure 6b,c). Tolerance intervals (TI) [40] are even broader, as they include a specified proportion of all individual outcomes, accounting for observed and unobserved variability within the population (Figure 6d). In mathematical terms, the TI is represented as
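The TI expression itself is missing from the extracted text. A reconstruction consistent with the terms defined immediately below (p, P, φ, and the chi-square critical value) is the approximate two-sided tolerance interval commonly used in response-surface software; the exact form applied by Design Expert may differ:

```latex
TI = \hat{y}(x_0) \pm \left[ \sqrt{p\, F_{\alpha,\,p,\,n-p}}\; SE
   + \varphi^{-1}\!\left( \frac{1+P}{2} \right) s \sqrt{\frac{n-p}{\chi^{2}_{\alpha,\,n-p}}} \right]
```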
where p is the number of terms in the model, P is the proportion of the population contained in the tolerance interval, φ is the inverse normal function used to convert the proportion to a normal score, and χ² is the chi-square critical value [40].
Estimating prediction and tolerance intervals is crucial for evaluating total analytical error (TAE) and assessing robustness in the Analytical Quality by Design (AQbD) framework. For robustness studies and procedure qualification in the AQbD framework, determining the range where future response values are likely to be found is critical for proper risk assessment. The wider prediction and tolerance intervals address this question more accurately. In contrast, confidence intervals are less suitable for quantifying uncertainty in future observations and may not adequately support the design of the MODR [9]. By incorporating interval estimates, graphical optimization (Figure 5) becomes a more powerful tool for visualizing and managing the uncertainty associated with process optimization, thereby enhancing decision-making and process robustness. As shown in Figure 6, the pale-yellow regions represent the areas where the point estimate fulfills the criteria but a portion of its interval estimate does not. This region becomes wider when moving from the CI (Figure 6b) to the PI (Figure 6c) and the TI (Figure 6d).
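The widening from CI to PI can be demonstrated numerically. The sketch below fits a one-factor least-squares model to synthetic data and compares the interval half-widths at a point using the SE and SEpred expressions discussed above; the data, model, and critical value are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(seed=7)

# Synthetic single-factor data: response vs. one coded factor.
x = np.linspace(0.0, 1.0, 12)
y = 2.0 + 1.5 * x + rng.normal(0.0, 0.1, size=x.size)

# Fit y = b0 + b1*x by least squares.
X = np.column_stack([np.ones_like(x), x])   # expanded model matrix
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
df = x.size - 2
s = float(np.sqrt(resid @ resid / df))      # estimated standard deviation

# Expanded point vector at x0 = 0.5: one element per model term.
x0 = np.array([1.0, 0.5])
leverage = float(x0 @ np.linalg.inv(X.T @ X) @ x0)

t_crit = 2.228  # hard-coded two-sided 95% t critical value for df = 10
ci_half = t_crit * s * np.sqrt(leverage)        # CI half-width (SE term)
pi_half = t_crit * s * np.sqrt(1.0 + leverage)  # PI half-width (SEpred term)
```

The extra "1 +" under the square root of SEpred is the variance contribution of a single future observation, which is why the PI, and a fortiori the TI, is always wider than the CI at the same point.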
Despite the differences in how the two software packages calculate and insert the uncertainties, the regions delimited as the MODR in Design Expert (Figure 6d) and Fusion QbD (Figure 5f) were the same. Once the MODR has been established, verification runs can be carried out to evaluate performance and provide additional evidence that the MODR is valid. Since the MODR is an irregular and multidimensional space, validating or qualifying the entire MODR can be challenging. Therefore, selecting a portion of the MODR for the verification runs is often practical. In Fusion QbD, verification runs can be performed by inserting a proven acceptable range (PAR), a rectangle added to the overlay graph within the MODR [26], as shown in Figure 7.
The conditions chosen for method validation were a flow rate of 0.825 mL min−1, a temperature of 35 °C, and 55% acetonitrile in the mobile phase, represented by point T in Figure 7. To check the method’s performance, the predicted values of the evaluated responses were compared with the experimental ones (Table 9).
Fusion QbD provides the predicted value and the corresponding lower and upper limits of the 2-sigma confidence interval. According to Table 9, all the experimental results were close to the predicted values and within the 2-sigma confidence interval, ensuring that under these conditions the method meets the quality performance requirements.
It is important to note that if verification runs are conducted after prolonged column use, the experimental values may deviate from the predicted values owing to column degradation and a loss of separation efficiency. This is illustrated in Table 10, where verification runs were performed after prolonged column use. While the retention factor values remained close to the predicted ones and within the confidence interval, the tailing factor values were higher than predicted, and the resolution values were lower, indicating a decrease in the column’s efficiency.
Accordingly, the experimental results must be compared with the acceptance criteria of the CMAs. In this case, the acceptance criterion for the resolution between the critical pairs is met, since the resolution values in Table 10 are greater than 2 at all points. However, the experimental results for tailing are at or slightly above the acceptance criterion of tailing less than 1.2. Although the tailing values are slightly above this criterion, the resolution values above 2 guarantee the separation of the compounds.