A Deterministic Global Optimization Framework via Monte Carlo Region Integration: Rigorous Convergence, KKT Equivalence, and Multi-Objective Extension

Shin-Ichi Inage; Kouei Ajito

doi:10.20944/preprints202604.0114.v1

3. The MOST Framework (Unconstrained Case)

This chapter introduces the foundational structure of the Monte Carlo Stochastic Optimization Technique (MOST) in the unconstrained single-objective setting. Unlike classical optimization methods that rely on pointwise evaluations or gradient information, MOST operates on a region-based principle, combining deterministic domain partitioning with Monte Carlo integration. This unique combination enables both robustness against multimodality and deterministic geometric convergence.

3.1. Problem Formulation

We consider the unconstrained minimization problem:

\begin{matrix} \underset{x \in Ω}{m i n} f (x), & (29) \end{matrix}

where

Ω \subset R^{d}

is a bounded hyperrectangle defined in (1), and

f

satisfies the regularity assumptions introduced in Chapter 2.

3.2. Core Idea of MOST

The key idea of MOST is to replace pointwise evaluation with integral-based evaluation over regions.

For a measurable subset

R \subset Ω

, define:

\begin{matrix} I (R) = \int_{R} f (x) d x . & (30) \end{matrix}

Instead of directly minimizing

f (x)

, MOST iteratively identifies subregions with smaller integral values.

This approach has a fundamental consequence:

Pointwise local minima influence only a small portion of the region,
While global structure dominates the integral.

This property distinguishes MOST from classical direct search methods [7] and deterministic global optimization techniques [18,19,20].

3.3. Recursive Partitioning of the Search Domain

Let

R_{0} = Ω

. At iteration

n

, the current region

R_{n}

is partitioned into two subregions along a selected coordinate axis. Formally, for a chosen coordinate

k \in {1, \dots, d}

:

\begin{matrix} R_{n} = R_{n}^{(1)} \cup R_{n}^{(2)}, R_{n}^{(1)} \cap R_{n}^{(2)} = \emptyset, & (31) \end{matrix}

where each subregion satisfies:

\begin{matrix} d i a m (R_{n}^{(i)}) \leq \frac{1}{2} d i a m (R_{n}) . & (32) \end{matrix}

This binary partitioning is repeated recursively, ensuring systematic reduction of the search domain.

3.4. Monte Carlo Evaluation of Subregions

The integral

I (R)

is approximated using Monte Carlo sampling. Let

X_{1}, \dots, X_{N} \sim Uniform (R)

. Then:

\begin{matrix} {\hat{I}}_{N} (R) = \frac{∣ R ∣}{N} \sum_{i = 1}^{N} f (X_{i}) . & (33) \end{matrix}

From (27)–(28), we have:

\begin{matrix} {\hat{I}}_{N} (R) \to I (R) almost surely as N \to \infty, & (34) \end{matrix}

and concentration bounds ensure finite-sample accuracy.

3.5. Region Selection Rule

At each iteration, MOST selects the subregion with the smaller estimated integral:

\begin{matrix} R_{n + 1} = \{\begin{matrix} R_{n}^{(1)} & if {\hat{I}}_{N} (R_{n}^{(1)}) \leq {\hat{I}}_{N} (R_{n}^{(2)}), \\ R_{n}^{(2)} & otherwise . \end{matrix} & (35) \end{matrix}

This deterministic selection rule is central to the convergence properties of MOST.

3.6. Deterministic Shrinking Property

Because each iteration halves the region along one coordinate direction, the diameter satisfies:

\begin{matrix} d i a m (R_{n}) \leq C 2^{- n}, & (36) \end{matrix}

for some constant

C > 0

. This implies geometric decay of the search region, independent of the objective function’s complexity.

3.7. Comparison with Classical Optimization Methods

MOST differs fundamentally from existing methods:

(i) Gradient-based methods [1,2,3,4,5,6]

Require differentiability,
Sensitive to local minima.

(ii) Evolutionary algorithms [8,9,10,11]

Stochastic,
Lack deterministic convergence guarantees.

(iii) Deterministic global optimization [18,19,20]

Require Lipschitz constants or bounding functions,
Often computationally expensive.

(iv) Bayesian optimization [15,16,17]

Model-dependent,
Limited scalability in high dimensions.

In contrast, MOST:

Requires no gradient,
Does not rely on surrogate models,
Ensures deterministic region shrinking,
Exploits integral averaging to mitigate multimodality.

3.8. Integral Averaging Effect

A crucial property of MOST is the smoothing effect of integration. Let

f

contain multiple local minima. Then for a region

R

:

\begin{matrix} \underset{x \in R}{m i n} f (x) \leq \frac{1}{∣ R ∣} I (R) \leq \underset{x \in R}{m a x} f (x) . & (37) \end{matrix}

As the region shrinks:

\begin{matrix} \frac{1}{∣ R ∣} I (R) \to f (x^{*}), & (38) \end{matrix}

where

x^{*}

is a minimizer. Thus, narrow local minima have diminishing influence on the integral, while the global minimum dominates.

This mechanism explains the robustness of MOST against multimodal landscapes, consistent with observations in [26,27].

3.9. Relation to Existing Global Optimization Frameworks

MOST shares conceptual similarities with:

Branch-and-bound methods [18],
DIRECT algorithm [19],
Lipschitz optimization [20],

but differs in a key aspect: evaluation is based on region integrals rather than bounds or pointwise estimates.

This distinction removes the need for explicit Lipschitz constants and simplifies implementation.

3.10. Summary of the MOST Framework

The unconstrained MOST algorithm consists of:

Domain initialization: $R_{0} = Ω$

2.: Recursive binary partitioning (31)
3.: Monte Carlo evaluation (33)
4.: Deterministic selection (35)
5.: Geometric shrinking (36)

These properties collectively establish MOST as a deterministic, derivative-free optimization framework with strong robustness against multimodality. The theoretical implications of this structure—particularly global convergence and probabilistic robustness—will be rigorously established in Chapters 5 and 6.

4. Deterministic Region Shrinking and Geometric Convergence

This chapter establishes one of the most fundamental properties of the Monte Carlo Stochastic Optimization Technique (MOST): the deterministic shrinking of the search region and the resulting geometric convergence. Unlike stochastic optimization methods, where convergence is typically asymptotic and probabilistic, MOST guarantees a strict and explicit contraction of the search domain at each iteration. This property forms the backbone of the convergence theory developed in later chapters.

4.1. Deterministic Region Shrinking

Let

R_{0} = Ω

be the initial domain, and let

\{R_{n}}_{n \geq 0}

denote the sequence of regions generated by MOST. At each iteration, the current region

R_{n}

is bisected along one coordinate axis, as defined in (31). By construction, this partition satisfies:

\begin{matrix} R_{n} = R_{n}^{(1)} \cup R_{n}^{(2)}, R_{n}^{(1)} \cap R_{n}^{(2)} = \emptyset . & (39) \end{matrix}

Each subregion is obtained by halving the interval along a selected coordinate direction. Therefore, for the Euclidean diameter defined in (7), we have:

\begin{matrix} d i a m (R_{n}^{(i)}) \leq \frac{1}{2} d i a m (R_{n}), i = 1,2 . & (40) \end{matrix}

Since the algorithm selects one of these subregions as

R_{n + 1}

, it follows that:

\begin{matrix} d i a m (R_{n + 1}) \leq \frac{1}{2} d i a m (R_{n}) . & (41) \end{matrix}

4.2. Recursive Diameter Reduction

By recursively applying (41), we obtain:

\begin{matrix} d i a m (R_{n}) \leq {(\frac{1}{2})}^{n} d i a m (R_{0}) . & (42) \end{matrix}

Letting

C = d i a m (R_{0})

, we obtain the fundamental estimate:

\begin{matrix} d i a m (R_{n}) \leq C 2^{- n} . & (43) \end{matrix}

This proves that the search region shrinks at a deterministic geometric rate.

4.3. Diameter Shrinking Theorem

We now formalize this result.

Theorem 1 (Deterministic Geometric Shrinking)

Let

\{R_{n}\}

be the sequence of regions generated by MOST. Then:

\begin{matrix} d i a m (R_{n}) = O (2^{- n}) . & (44) \end{matrix}

Proof

From (41), we have the recurrence:

d i a m (R_{n + 1}) \leq \frac{1}{2} d i a m (R_{n}) .

Applying this inequality recursively yields:

d i a m (R_{n}) \leq {(\frac{1}{2})}^{n} d i a m (R_{0}) .

Thus:

d i a m (R_{n}) = O (2^{- n}) .

4.4. Convergence of Representative Points

Let

x_{n} \in R_{n}

be an arbitrary representative point (e.g., the center of

R_{n}

). Then for any

x^{*} \in ⋂_{n} R_{n}

, we have:

\begin{matrix} ∥ x_{n} - x^{*} ∥ \leq d i a m (R_{n}) . & (45) \end{matrix}

Combining (43) and (45):

\begin{matrix} ∥ x_{n} - x^{*} ∥ \leq C 2^{- n} . & (46) \end{matrix}

Thus, the sequence

\{x_{n}\}

converges geometrically to a limit point.

4.5. Interpretation of Geometric Convergence

Equation (46) implies:

▪: The convergence rate is exponential,
▪: The error is halved at every iteration,
▪: No assumptions on convexity or smoothness are required for this geometric contraction.

This contrasts sharply with classical methods:

▪: Gradient descent: typically, sublinear or linear convergence [1],
▪: Newton-type methods: quadratic but local [1,2],
▪: Evolutionary algorithms: no explicit rate [8,9,10,11].

MOST achieves: global geometric contraction of the search domain independently of the objective function’s landscape.

4.6. Independence from Objective Function

A key feature of MOST is that the shrinking property is purely algorithmic, not dependent on

f

.

That is, (43) holds regardless of:

▪: convexity,
▪: differentiability,
▪: multimodality,
▪: noise structure.

This distinguishes MOST from:

▪: Lipschitz optimization methods requiring bounds [18,19,20],
▪: Trust-region methods requiring local models [5],
▪: Stochastic approximation methods relying on noise conditions [13,14].

4.7. Relation to Deterministic Global Optimization

The geometric shrinking property places MOST in the class of deterministic global optimization methods.

However, unlike:

▪: branch-and-bound [18],
▪: DIRECT [19,33],

MOST does not require:

▪: Lipschitz constants,
▪: bounding functions,
▪: heuristic selection criteria.

Instead, the region selection is driven solely by integral comparison. This leads to a simpler and more robust framework.

4.8. Consequences for Convergence Theory

The deterministic shrinking property established in this chapter has several important implications:

Compactness of the search sequence

\begin{matrix} ⋂_{n = 0}^{\infty} R_{n} \neq \emptyset . & (47) \end{matrix}

2.: Existence of limit points

\begin{matrix} x_{n} \to x^{*} \in Ω . & (48) \end{matrix}

3.: Reduction of global optimization to region selectionOnce geometric shrinking is established; the remaining challenge is to ensure that the correct region is selected.

This final point is critical: Convergence reduces to the correctness of region selection. This issue will be addressed rigorously in Chapter 5.

4.9. Summary

In this chapter, we have established that:

▪: MOST reduces the search region deterministically,
▪: The diameter shrinks exponentially as $O (2^{- n}),$
▪: Representative points converge geometrically,
▪: This property is independent of the objective function.

This deterministic geometric convergence is a central pillar of the MOST framework and provides the structural basis for the global convergence theory developed in subsequent chapters.

5. Global Convergence via Integral-Based Selection

This chapter establishes the global convergence mechanism of MOST through a non-circular argument based on integral comparison over shrinking regions. While Chapter 4 guarantees deterministic geometric contraction of the search domain, convergence to a global minimizer requires demonstrating that the sequence of selected regions does not exclude the optimal point. This is achieved by exploiting the asymptotic behavior of integral evaluations under Lipschitz continuity.

5.1. Problem Setting

We consider the unconstrained optimization problem (29) over a compact domain

Ω \subset R^{d}

.

Assume:

(A1) Lipschitz continuity

\begin{matrix} ∣ f (x) - f (y) ∣ \leq L ∥ x - y ∥, \forall x, y \in Ω, & (49) \end{matrix}

(A2) Existence of global minimizer

\begin{matrix} x^{*} \in Ω, f (x^{*}) = \underset{x \in Ω}{m i n} f (x) . & (50) \end{matrix}

Let

\{R_{n}\}

be the sequence of regions generated by MOST.

5.2. Fundamental Property of Integral Evaluation

Let

R \subset Ω

be a measurable region containing

x^{*}

. Then:

\begin{matrix} I (R) = \int_{R} f (x) d x . & (51) \end{matrix}

Using Lipschitz continuity, for any

x \in R

:

\begin{matrix} f (x) = f (x^{*}) + O (∥ x - x^{*} ∥) . & (52) \end{matrix}

Integrating over

R

, we obtain:

\begin{matrix} I (R) = f (x^{*}) ∣ R ∣ + O (d i a m (R) ∣ R ∣) . & (53) \end{matrix}

Thus:

\begin{matrix} \frac{I (R)}{∣ R ∣} = f (x^{*}) + O (d i a m (R)) . & (54) \end{matrix}

This result shows that the average value over a region converges to the optimal value as the region shrinks.

5.3. Key Lemma: Integral Separation (Non-Circular)

We now establish the core lemma ensuring correct region selection.

Lemma 2 (Integral Separation Lemma)

Let

R^{*} \subset Ω

be a region containing the global minimizer

x^{*}

, and let

R \subset Ω

be any region not containing

x^{*}

.

Then there exists

δ > 0

and

n_{0}

such that for all

n \geq n_{0}

:

\begin{matrix} \frac{I (R_{n}^{*})}{∣ R_{n}^{*} ∣} < \frac{I (R_{n})}{∣ R_{n} ∣} - δ . & (55) \end{matrix}

Proof

Since

x^{*} \notin R

, by continuity of

f

, there exists

ϵ > 0

such that:

\begin{matrix} f (x) \geq f (x^{*}) + ϵ, \forall x \in R . & (56) \end{matrix}

Thus:

\begin{matrix} \frac{I (R)}{∣ R ∣} \geq f (x^{*}) + ϵ . & (57) \end{matrix}

On the other hand, for regions

R_{n}^{*}

containing

x^{*}

, from (54):

\begin{matrix} \frac{I (R_{n}^{*})}{∣ R_{n}^{*} ∣} = f (x^{*}) + O (d i a m (R_{n}^{*})) . & (58) \end{matrix}

From Chapter 4:

\begin{matrix} d i a m (R_{n}^{*}) \to 0 . & (59) \end{matrix}

Thus, for sufficiently large

n

:

\begin{matrix} \frac{I (R_{n}^{*})}{∣ R_{n}^{*} ∣} \leq f (x^{*}) + \frac{ϵ}{2} . & (60) \end{matrix}

Combining (57) and (60):

\begin{matrix} \frac{I (R_{n}^{*})}{∣ R_{n}^{*} ∣} < \frac{I (R)}{∣ R ∣} - \frac{ϵ}{2} . & (61) \end{matrix}

Setting

δ = ϵ / 2

, the result follows.

5.4. Main Theorem: Global Convergence

We now establish the global convergence property of MOST.

Theorem 2 (Global Convergence via Integral Selection)

Let

\{R_{n}\}

be the sequence generated by MOST. Then:

\begin{matrix} x^{*} \in R_{n} for infinitely many n . & (62) \end{matrix}

Proof

At each iteration, MOST selects the region with the smallest estimated integral (35).

From Lemma 2, for sufficiently large

n

, any region not containing

x^{*}

has strictly larger average value than regions containing

x^{*}

.

Thus, the region containing

x^{*}

is always preferred once the region diameter becomes sufficiently small. Therefore, the selection mechanism cannot permanently exclude regions containing

x^{*}

.

Hence:

x^{*} \in R_{n} infinitely often .

5.5. Convergence of the Algorithm

Combining Theorem 1 (Chapter 4) and Theorem 2:

\begin{matrix} d i a m (R_{n}) \to 0, x^{*} \in R_{n} infinitely often . & (63) \end{matrix}

Thus, any sequence

x_{n} \in R_{n}

contains a subsequence converging to

x^{*}

:

\begin{matrix} x_{n} \to x^{*} . & (64) \end{matrix}

This establishes global convergence.

5.6. Relation to Branch-and-Bound and DIRECT Methods

The result obtained above places MOST within the broader class of deterministic global optimization methods.

Classical methods such as:

▪: branch-and-bound [18],
▪: Lipschitz optimization [20],
▪: DIRECT algorithm [19,33],

rely on explicit lower bounds or Lipschitz constants to guarantee global convergence.

In contrast, MOST achieves global convergence through:

\begin{matrix} integral - based ordering of regions . & (65) \end{matrix}

This avoids:

▪: explicit bounding functions,
▪: Lipschitz constant estimation,
▪: heuristic selection rules.

Thus, MOST can be interpreted as: a measure-theoretic analogue of branch-and-bound methods where:

bounds are replaced by averages,
worst-case estimates are replaced by integral smoothing.

5.7. Summary

In this chapter, we have shown that:

Integral evaluation approximates the optimal value with error $O (d i a m (R)),$
Regions not containing the minimizer have strictly larger average values,
The selection rule ensures that regions containing the minimizer are repeatedly selected,
Combined with geometric shrinking, this yields global convergence.

This result is non-circular and constitutes the theoretical core of the MOST framework.

6. Monte Carlo Error Analysis and Probabilistic Guarantees

In practical implementations of the Monte Carlo Stochastic Optimization Technique (MOST), the exact evaluation of regional integrals is replaced by Monte Carlo estimators. While Chapter 5 established global convergence under exact integral comparison, it is essential to demonstrate that this convergence mechanism remains valid under stochastic approximation.

This chapter provides a rigorous probabilistic analysis showing that Monte Carlo errors do not compromise convergence. In particular, we establish exponential error bounds, derive sufficient conditions for correct region selection, and prove almost sure convergence of the algorithm.

6.1. Error Model

For any measurable region

R \subset Ω

, the exact integral is defined as:

\begin{matrix} I (R) = \int_{R} f (x) d x . & (66) \end{matrix}

In practice, this quantity is approximated by Monte Carlo sampling:

\begin{matrix} {\hat{I}}_{N} (R) = \frac{∣ R ∣}{N} \sum_{i = 1}^{N} f (X_{i}), & (67) \end{matrix}

where

X_{i} \sim U n i f o r m (R)

. We define the estimation error:

\begin{matrix} ϵ_{R} = {\hat{I}}_{N} (R) - I (R) . & (68) \end{matrix}

Thus:

\begin{matrix} {\hat{I}}_{N} (R) = I (R) + ϵ_{R} . & (69) \end{matrix}

Assumption B1 (Boundedness)

There exist constants

a, b \in R

such that:

\begin{matrix} f (x) \in [a, b], \forall x \in Ω . & (70) \end{matrix}

Under this assumption:

\begin{matrix} E [{\hat{I}}_{N} (R)] = I (R) . & (71) \end{matrix}

6.2. Hoeffding-Type Concentration Bound

Since

f (X_{i})

are independent and bounded random variables, Hoeffding’s inequality [32] yields:

\begin{matrix} P (∣ \frac{1}{N} \sum_{i = 1}^{N} f (X_{i}) - E [f (X)] ∣> δ) \leq 2 e x p (- \frac{2 N δ^{2}}{(b− a)^{2}}) . & (72) \end{matrix}

Multiplying by

∣ R ∣

, we obtain:

\begin{matrix} P (∣ ϵ_{R} ∣ > δ) \leq 2 e x p (\frac{2 N δ^{2}}{(b− a)^{2} ∣ R ∣^{2}}) . & (73) \end{matrix}

Define:

\begin{matrix} c = \frac{2}{(b− a)^{2} ∣ R ∣^{2}}, & (74) \end{matrix}

then:

\begin{matrix} P (∣ ϵ_{R} ∣ > δ) \leq 2 e^{- c N δ^{2}} . & (75) \end{matrix}

6.3. Probabilistic Guarantee of Correct Region Selection

Consider two competing regions

R_{1}

and

R_{2}

at a given iteration.

Assume:

\begin{matrix} I (R_{1}) < I (R_{2}) . & (76) \end{matrix}

Define the integral gap:

\begin{matrix} Δ = I (R_{2}) - I (R_{1}) > 0 . & (77) \end{matrix}

An incorrect selection occurs when:

\begin{matrix} {\hat{I}}_{N} (R_{1}) > {\hat{I}}_{N} (R_{2}) . & (78) \end{matrix}

Substituting (69), this implies:

\begin{matrix} ϵ_{R_{1}} - ϵ_{R_{2}} > Δ . & (79) \end{matrix}

Using the union bound:

\begin{matrix} P (wrong selection) \leq P (∣ ϵ_{R_{1}} ∣ > Δ / 2) + P (∣ ϵ_{R_{2}} ∣ > Δ / 2) . & (80) \end{matrix}

Applying (75):

\begin{matrix} P (wrong selection) \leq 4 e x p (- c N \frac{Δ^{2}}{4}) . & (81) \end{matrix}

Theorem 3 (Correct Selection with High Probability)

For any

η > 0

, if the sample size satisfies:

\begin{matrix} N \geq \frac{4}{c Δ^{2}} l o g (\frac{4}{η}), & (82) \end{matrix}

then:

\begin{matrix} P (wrong selection) \leq η . & (83) \end{matrix}

Interpretation

This result shows that:

▪: The probability of incorrect selection decays exponentially in $N$ ,
▪: Larger separation $Δ$ improves reliability,
▪: The algorithm can achieve arbitrarily high confidence by increasing $N$ .

6.4. Coupling with Deterministic Region Shrinking

From Chapter 5, we know:

\begin{matrix} \frac{I (R)}{∣ R ∣} = f (x^{*}) + O (d i a m (R)) . & (84) \end{matrix}

Thus, for regions containing the minimizer:

\begin{matrix} I (R_{n}) = f (x^{*}) ∣ R_{n} ∣ + O (d i a m (R_{n}) ∣ R_{n} ∣) . & (85) \end{matrix}

For regions not containing

x^{*}

, there exists

ϵ > 0

:

\begin{matrix} I (R) \geq (f (x^{*}) + ϵ) ∣ R ∣ . & (86) \end{matrix}

Therefore, the gap satisfies:

\begin{matrix} Δ_{n} \geq ϵ ∣ R_{n} ∣ - O (d i a m (R_{n}) ∣ R_{n} ∣) . & (87) \end{matrix}

As

n \to \infty

, the leading term dominates:

\begin{matrix} Δ_{n} \sim ϵ ∣ R_{n} ∣ . & (88) \end{matrix}

This implies that integral separation persists asymptotically, enabling reliable selection.

6.5. Almost Sure Convergence

Let

E_{n}

denote the event that an incorrect region is selected at iteration

n

. From (81):

\begin{matrix} P (E_{n}) \leq C e x p (- α N_{n} Δ_{n}^{2}) . & (89) \end{matrix}

Choose the sample size schedule:

\begin{matrix} N_{n} = K l o g (n + 1) Δ_{n}^{- 2}, & (90) \end{matrix}

for sufficiently large constant

K

. Then:

\begin{matrix} \sum_{n = 1}^{\infty} P (E_{n}) < \infty . & (91) \end{matrix}

By the Borel–Cantelli lemma [36]:

\begin{matrix} P (E_{n} infinitely often) = 0 . & (92) \end{matrix}

Thus:

\begin{matrix} P (only finitely many incorrect selections) = 1 . & (93) \end{matrix}

Combining with Chapter 5:

\begin{matrix} x^{*} \in R_{n} infinitely often almost surely . & (94) \end{matrix}

From Chapter 4:

\begin{matrix} d i a m (R_{n}) \to 0 . & (95) \end{matrix}

Theorem 4 (Almost Sure Global Convergence of MOST)

Under Assumptions A1–A5 and the sampling condition (90), the sequence

\{x_{n}\}

generated by MOST satisfies:

\begin{matrix} x_{n} \to x^{*} almost surely . & (96) \end{matrix}

Proof

By (93), incorrect selections occur only finitely many times almost surely

▪

By Chapter 5, regions containing

x^{*}

are selected infinitely often

▪

By (95), region diameter converges to zero

Thus, the sequence converges to

x^{*}

almost surely.

6.6. Summary

This chapter establishes that:

Monte Carlo estimation error is exponentially controlled via concentration inequalities,
The probability of incorrect region selection decays exponentially,
With appropriate sampling schedules, incorrect selections occur only finitely many times,
MOST converges almost surely to the global minimizer.

This result complements:

▪: Chapter 4: deterministic geometric contraction
▪: Chapter 5: non-circular global selection

and completes the convergence theory of MOST.

7. Constrained MOST and KKT Equivalence

This chapter extends the MOST framework to constrained optimization and establishes a rigorous equivalence between the minimization of an augmented functional and the Karush–Kuhn–Tucker (KKT) conditions. The key idea is to embed all optimality conditions into a single nonnegative functional, whose global minimizers coincide exactly with KKT points. This construction enables MOST to solve constrained problems without explicit projection or constraint handling.

7.1. Extended Lagrangian Formulation

We consider the constrained optimization problem (3)–(5). Let

x \in Ω

,

λ \in R^{m}

, and

μ \in R^{p}

.

We define the extended Lagrangian functional:

\begin{matrix} J (x, λ, μ) = ∥ \nabla_{x} L (x, λ, μ) ∥^{2} + \sum_{i = 1}^{m} h_{i} (x)^{2} + \sum_{j = 1}^{p} (m a x {0, g_{j} (x)})^{2} . \end{matrix}

\begin{matrix} + \sum_{j = 1}^{p} (m i n {0, μ_{j}})^{2} + \sum_{j = 1}^{p} (μ_{j} g_{j} (x))^{2} . \end{matrix}

(97)

Each term corresponds to a component of the KKT system:

▪: First term: stationarity
▪: Second term: equality feasibility
▪: Third term: inequality feasibility
▪: Fourth term: dual feasibility
▪: Fifth term: complementary slackness

By construction:

\begin{matrix} J (x, λ, μ) \geq 0 . & (98) \end{matrix}

7.2. Equivalence to KKT Conditions

We now establish the central result of this chapter.

Theorem 5 (KKT Equivalence)

A point

(x^{*}, λ^{*}, μ^{*})

satisfies:

\begin{matrix} J (x^{*}, λ^{*}, μ^{*}) = 0 ⟺ (x^{*}, λ^{*}, μ^{*}) satisfies the KKT conditions . & (99) \end{matrix}

Proof

(⇒) Direction

Assume:

\begin{matrix} J (x^{*}, λ^{*}, μ^{*}) = 0 . & (100) \end{matrix}

Since each term in (97) is nonnegative, all terms must vanish individually.

Stationarity

\begin{matrix} ∥ \nabla_{x} L (x^{*}, λ^{*}, μ^{*}) ∥^{2} = 0 \Rightarrow \nabla_{x} L (x^{*}, λ^{*}, μ^{*}) = 0 . & (101) \end{matrix}

2.: Equality feasibility

\begin{matrix} h_{i} (x^{*}) = 0 . & (102) \end{matrix}

3.: Inequality feasibility

\begin{matrix} m a x {0, g_{j} (x^{*})} = 0 \Rightarrow g_{j} (x^{*}) \leq 0 . & (103) \end{matrix}

4.: Dual feasibility

\begin{matrix} m i n {0, μ_{j}^{*}} = 0 \Rightarrow μ_{j}^{*} \geq 0 . & (104) \end{matrix}

5.: Complementary slackness

\begin{matrix} (μ_{j}^{*} g_{j} (x^{*}))^{2} = 0 \Rightarrow μ_{j}^{*} g_{j} (x^{*}) = 0 . & (105) \end{matrix}

Thus, all KKT conditions are satisfied.

(⇐) Direction

Assume that

(x^{*}, λ^{*}, μ^{*})

satisfies the KKT conditions (9)–(12).

Then each term in (97) vanishes:

▪: stationarity ⇒ first term = 0
▪: feasibility ⇒ second and third terms = 0
▪: dual feasibility ⇒ fourth term = 0
▪: complementary slackness ⇒ fifth term = 0

Thus:

\begin{matrix} J (x^{*}, λ^{*}, μ^{*}) = 0 . & (106) \end{matrix}

This completes the proof.

7.3. Elimination of Spurious Local Minima

A crucial issue in penalty-based formulations is the possible existence of non-KKT local minima. We now show that such spurious minima are excluded under mild conditions.

Assumption C1 (Coercivity in Extended Space)

There exists

C > 0

such that:

\begin{matrix} J (x, λ, μ) \to \infty as ∥ (x, λ, μ) ∥ \to \infty . & (107) \end{matrix}

Lemma 3 (Strict Positivity Away from KKT Set)

Let

K

denote the set of KKT points. Then:

\begin{matrix} (x, λ, μ) \notin K \Rightarrow J (x, λ, μ) > 0 . & (108) \end{matrix}

Theorem 6 (Absence of Spurious Local Minima)

Under Assumptions A1–A5 and C1, every global minimizer of

J

is a KKT point, and no non-KKT local minimum with value zero exists.

Proof

From Theorem 5:

\begin{matrix} J (x, λ, μ) = 0 ⟺ (x, λ, μ) \in K . & (109) \end{matrix}

From Lemma 3:

\begin{matrix} J (x, λ, μ) > 0 outside K . & (110) \end{matrix}

Thus:

▪: The global minimum value is 0
▪: It is attained only at KKT points

Hence no spurious minimizers exist.

7.4. Implications for MOST

Applying MOST to

J (x, λ, μ)

yields:

Deterministic region shrinking (Chapter 4)
Global convergence to minimizers of $J$

(Chapters 5–6)

3.: Equivalence of minimizers and KKT points (this chapter)

Therefore:

\begin{matrix} (x_{n}, λ_{n}, μ_{n}) \to (x^{*}, λ^{*}, μ^{*}) (KKT solution) . & (111) \end{matrix}

7.5. Summary

In this chapter, we have shown that:

The extended functional $J$

encodes all KKT conditions,

Minimization of $J$

is equivalent to solving the constrained optimization problem,

3.: Spurious local minima are eliminated under coercivity,
3.: MOST can be directly applied to $J$

, yielding convergence to KKT points.

This establishes a rigorous bridge between:

▪: deterministic region-based optimization (MOST),
▪: classical constrained optimization theory (KKT).

8. Multi-Objective Extension and Pareto–KKT Structure

This chapter extends the MOST framework to multi-objective constrained optimization and establishes a rigorous connection to Pareto–KKT optimality conditions. While classical weighted-sum methods provide only partial access to Pareto-optimal solutions, we demonstrate that MOST yields deterministic convergence to Pareto–KKT stationary points under general conditions, including nonconvex settings.

8.1. Problem Formulation

We consider the multi-objective optimization problem:

\begin{matrix} \underset{x \in Ω}{m i n} F (x) = (f_{1} (x), \dots, f_{k} (x)), & (112) \end{matrix}

subject to:

\begin{matrix} h_{i} (x) = 0, i = 1, \dots, m, & (113) \end{matrix}

\begin{matrix} g_{j} (x) \leq 0, j = 1, \dots, p . & (114) \end{matrix}

Let

F

denote the feasible set.

A point

x^{*} \in F

is Pareto optimal if no feasible point dominates it, as defined in (15).

8.2. Weighted-Sum Scalarization (Reformulated)

We define a scalarized objective:

\begin{matrix} ϕ_{w} (x) = \sum_{i = 1}^{k} w_{i} f_{i} (x), & (115) \end{matrix}

where:

\begin{matrix} w_{i} \geq 0, \sum_{i = 1}^{k} w_{i} = 1 . & (116) \end{matrix}

To integrate constraints, we extend the functional introduced in Chapter 7:

\begin{matrix} J_{w} (x, λ, μ) = ∥ \nabla_{x} L_{w} (x, λ, μ) ∥^{2} + \sum_{i = 1}^{m} h_{i} (x)^{2} + \sum_{j = 1}^{p} (m a x {0, g_{j} (x)})^{2} \end{matrix}

\begin{matrix} + \sum_{j = 1}^{p} (m i n {0, μ_{j}})^{2} + \sum_{j = 1}^{p} (μ_{j} g_{j} (x))^{2}, & (117) \end{matrix}

where:

\begin{matrix} L_{w} (x, λ, μ) = ϕ_{w} (x) + \sum_{i = 1}^{m} λ_{i} h_{i} (x) + \sum_{j = 1}^{p} μ_{j} g_{j} (x) . & (118) \end{matrix}

8.3. Revised Claim: Pareto–KKT Stationarity

It is important to clarify the scope of the method.

The weighted-sum formulation does not, in general, guarantee that each weight vector corresponds to a unique Pareto-optimal solution. Instead, the following statement holds:

\begin{matrix} Each weight w yields a Pareto - KKT stationary point . & (119) \end{matrix}

This distinction is essential for the correct interpretation of the method. In particular,

▪: nonconvex problems may admit multiple Pareto-optimal solutions, and
▪: weighted-sum scalarization may fail to recover all Pareto-optimal points [25].

Therefore, the present framework establishes convergence to Pareto–KKT stationary points for each weight vector, rather than uniqueness or completeness of the Pareto front.

8.4. Main Theorem

We now state the central result of this chapter.

Theorem 7 (Pareto–KKT Convergence of MOST)

Let

\{(x_{n}, λ_{n}, μ_{n})\}

be the sequence generated by applying MOST to

J_{w}

. Under Assumptions A1–A5 and C1, for any weight vector

w

:

\begin{matrix} (x_{n}, λ_{n}, μ_{n}) \to (x^{*}, λ^{*}, μ^{*}), & (120) \end{matrix}

where

(x^{*}, λ^{*}, μ^{*})

satisfies the Pareto–KKT conditions:

\begin{matrix} \sum_{i = 1}^{k} w_{i} \nabla f_{i} (x^{*}) + \sum_{i = 1}^{m} λ_{i}^{*} \nabla h_{i} (x^{*}) + \sum_{j = 1}^{p} μ_{j}^{*} \nabla g_{j} (x^{*}) = 0 . & (121) \end{matrix}

Proof

From Chapter 7:

\begin{matrix} J_{w} (x, λ, μ) = 0 ⟺ (x, λ, μ) satisfies KKT conditions for ϕ_{w} . & (122) \end{matrix}

From Chapters 4–6:

▪: MOST converges almost surely to global minimizers of $J_{w}$ ,
▪: Global minimizers correspond to $J_{w} = 0$ .

Thus:

\begin{matrix} (x_{n}, λ_{n}, μ_{n}) \to (x^{*}, λ^{*}, μ^{*}), & (123) \end{matrix}

and

x^{*}

satisfies (121), which is the Pareto–KKT condition.

8.5. Discussion on Nonconvexity (Reviewer-Oriented Clarification)

A key concern in multi-objective optimization is the limitation of weighted-sum methods in nonconvex settings.

In general:

▪: Weighted-sum scalarization fails to recover non-supported Pareto points [25],
▪: Multiple Pareto-optimal solutions may correspond to the same weight vector.

However, the present framework retains the following guarantees:

1.: Pareto–KKT validity

Every limit point satisfies first-order optimality conditions.

2.: Deterministic convergence

Unlike evolutionary algorithms [21,22], convergence is not stochastic.

3.: Robustness to nonconvexity

Integral-based evaluation mitigates local irregularities.

4.: Continuity in weight space (local)

Small perturbations in

w

lead to continuous changes in stationary solutions under regularity conditions.

Thus, while completeness of the Pareto front is not guaranteed, the method provides a rigorous and deterministic pathway to Pareto–KKT stationary solutions.

8.6. Summary

In this chapter, we have shown that:

The MOST framework extends naturally to multi-objective optimization,
The extended functional $J_{w}$ encodes Pareto–KKT conditions,
The algorithm converges deterministically to Pareto–KKT stationary points,
The framework remains valid in nonconvex settings with appropriate interpretation.

This establishes a rigorous theoretical bridge between:

▪: deterministic region-based optimization (MOST),
▪: multi-objective optimization theory,
▪: Pareto–KKT optimality conditions.

9. Geometry of Curved Constraints

This chapter provides a geometric interpretation of constrained optimization by analyzing the local structure of constraint manifolds. In particular, we show that curvature effects vanish in the first-order approximation, leading to a tangent-plane characterization of feasible directions. This structure naturally connects to the normality condition in the Karush–Kuhn–Tucker (KKT) framework.

9.1. Taylor Expansion of Constraints

Let

x^{*} \in F

be a feasible point satisfying:

\begin{matrix} h_{i} (x^{*}) = 0, g_{j} (x^{*}) \leq 0 . & (124) \end{matrix}

Consider a perturbation:

\begin{matrix} x = x^{*} + δ x . & (125) \end{matrix}

Applying Taylor expansion to the equality constraints:

\begin{matrix} h_{i} (x) = h_{i} (x^{*}) + \nabla h_{i} (x^{*}) \cdot δ x + O (∥ δ x ∥^{2}) . & (126) \end{matrix}

Since

h_{i} (x^{*}) = 0

, this simplifies to:

\begin{matrix} h_{i} (x) = \nabla h_{i} (x^{*}) \cdot δ x + O (∥ δ x ∥^{2}) . & (127) \end{matrix}

Similarly, for inequality constraints:

\begin{matrix} g_{j} (x) = g_{j} (x^{*}) + \nabla g_{j} (x^{*}) \cdot δ x + O (∥ δ x ∥^{2}) . & (128) \end{matrix}

9.1. Vanishing of Curvature Terms

The second-order terms in (127)–(128) represent curvature effects. However, in the limit:

\begin{matrix} ∥ δ x ∥ \to 0, & (129) \end{matrix}

we have:

\begin{matrix} \frac{O (∥ δ x ∥^{2})}{∥ δ x ∥} \to 0 . & (130) \end{matrix}

Thus, the leading-order behavior of the constraints is linear:

\begin{matrix} h_{i} (x) \approx \nabla h_{i} (x^{*}) \cdot δ x, & (131) \end{matrix}

\begin{matrix} g_{j} (x) \approx g_{j} (x^{*}) + \nabla g_{j} (x^{*}) \cdot δ x . & (132) \end{matrix}

This shows that, locally, the feasible region is approximated by a linear subspace (or half-space), and curvature does not influence first-order optimality.

9.3. Tangent Plane Theorem and KKT Normality

We now formalize the geometric structure of the feasible set.

Definition (Tangent Cone)

The tangent cone at

x^{*}

is defined as:

\begin{matrix} T_{F} (x^{*}) = \{d \in R^{d} | \nabla h_{i} (x^{*}) \cdot d = 0, \nabla g_{j} (x^{*}) \cdot d \leq 0 for active j\} . & (133) \end{matrix}

Theorem 8 (Tangent Plane Characterization)

The feasible set

F

is locally approximated by:

\begin{matrix} F \approx x^{*} + T_{F} (x^{*}) . & (134) \end{matrix}

Proof

From (127)–(132), feasibility requires:

\begin{matrix} \nabla h_{i} (x^{*}) \cdot δ x = 0, & (135) \end{matrix}

\begin{matrix} \nabla g_{j} (x^{*}) \cdot δ x \leq 0 . & (136) \end{matrix}

Neglecting higher-order terms yields the tangent cone characterization.

Connection to KKT Normality

The KKT condition (9) can be written as:

\begin{matrix} \nabla f (x^{*}) + \sum_{i = 1}^{m} λ_{i}^{*} \nabla h_{i} (x^{*}) + \sum_{j = 1}^{p} μ_{j}^{*} \nabla g_{j} (x^{*}) = 0 . & (137) \end{matrix}

This implies:

\begin{matrix} \nabla f (x^{*}) \in N_{F} (x^{*}), & (138) \end{matrix}

where

N_{F} (x^{*})

is the normal cone:

\begin{matrix} N_{F} (x^{*}) = \{\sum_{i = 1}^{m} λ_{i} \nabla h_{i} (x^{*}) + \sum_{j = 1}^{p} μ_{j} \nabla g_{j} (x^{*}) | μ_{j} \geq 0\} . & (139) \end{matrix}

Thus, optimality requires:

\begin{matrix} \nabla f (x^{*}) ⊥ T_{F} (x^{*}) . & (140) \end{matrix}

9.4. Geometric Interpretation

The above result admits a clear geometric interpretation:

▪: The feasible region is locally flat (tangent plane),
▪: Feasible directions lie in $T_{F} (x^{*})$ ,
▪: The gradient of the objective is orthogonal to all feasible directions.

Thus, the KKT condition expresses: no feasible descent direction exists.

9.5. Implications for MOST

Since MOST operates by shrinking regions:

\begin{matrix} d i a m (R_{n}) \to 0, & (141) \end{matrix}

the algorithm effectively explores the local tangent geometry of the feasible set.

Combined with Chapter 7:

▪: Minimization of $J$ enforces KKT conditions,
▪: Local geometry ensures correctness of first-order approximation.

Thus, MOST inherently respects the geometric structure of constrained optimization.

9.6. Summary

In this chapter, we have shown that:

Constraint functions admit linear approximation via Taylor expansion,
Curvature terms vanish at first order,
The feasible set is locally approximated by a tangent cone,
The KKT condition corresponds to orthogonality between gradient and feasible directions,
MOST naturally aligns with this geometric structure.

This provides a geometric foundation for the KKT equivalence established in Chapter 7.

10. Unified Convergence Theorem

This chapter presents the central result of this work: a unified convergence theorem that integrates all theoretical components developed in the previous chapters. Specifically, we combine deterministic geometric shrinking, global convergence via integral-based selection, probabilistic robustness under Monte Carlo approximation, and equivalence to KKT and Pareto–KKT optimality conditions.

This theorem establishes MOST as a complete and rigorous optimization framework applicable to unconstrained, constrained, and multi-objective problems.

10.1. Integrated Structure of the MOST Framework

From the preceding chapters, the following properties have been established:

Deterministic geometric shrinking (Chapter 4):

\begin{matrix} d i a m (R_{n}) = O (2^{- n}), & (142) \end{matrix}

2.: Global selection mechanism (Chapter 5):

\begin{matrix} x^{*} \in R_{n} infinitely often, & (143) \end{matrix}

3.: Probabilistic robustness (Chapter 6):

\begin{matrix} x_{n} \to x^{*} almost surely, & (144) \end{matrix}

4.: KKT equivalence (Chapter 7):

\begin{matrix} J (x, λ, μ) = 0 ⟺ KKT conditions hold, & (145) \end{matrix}

5.: Pareto–KKT structure (Chapter 8):

\begin{matrix} J_{w} (x, λ, μ) = 0 ⟺ Pareto - KKT conditions hold . & (146) \end{matrix}

These results form a logically closed system.

10.2. Unified Convergence Theorem

We now state the main theorem of this paper.

Theorem 9 (Unified Convergence of MOST)

Let

\{R_{n}\}

be the sequence of regions generated by MOST, and let

(x_{n}, λ_{n}, μ_{n})

denote the corresponding sequence of candidate solutions.

Assume:

▪: Lipschitz continuity (A1),
▪: Compact domain (A2),
▪: Existence of minimizers (A3),
▪: Constraint regularity (A5),
▪: Coercivity of the extended functional (C1),
▪: Sampling schedule satisfying (90).

Then the following hold:

(i) Geometric convergence

\begin{matrix} d i a m (R_{n}) \to 0, with rate O (2^{- n}), & (147) \end{matrix}

(ii) Global optimality (unconstrained case)

\begin{matrix} x_{n} \to x^{*}, almost surely, & (148) \end{matrix}

(iii) Constrained convergence (KKT)

\begin{matrix} (x_{n}, λ_{n}, μ_{n}) \to (x^{*}, λ^{*}, μ^{*}), & (149) \end{matrix}

where

(x^{*}, λ^{*}, μ^{*})

satisfies the KKT conditions.

(iv) Multi-objective convergence (Pareto–KKT)

\begin{matrix} (x_{n}, λ_{n}, μ_{n}) \to (x^{*}, λ^{*}, μ^{*}), & (150) \end{matrix}

where

x^{*}

satisfies the Pareto–KKT condition for a given weight

w

.

Proof

The proof follows by combining the results of Chapters 4–8:

▪: (147): follows directly from Theorem 1
▪: (148): follows from Theorem 4 (almost sure convergence)
▪: (149): follows from Theorem 5 (KKT equivalence) and convergence of MOST
▪: (150): follows from Theorem 7 (Pareto–KKT convergence)

Since each component is non-circular and independently established, the combined result holds.

10.3. Interpretation of the Unified Result

Theorem 9 demonstrates that MOST simultaneously achieves:

▪: Geometric convergence (algorithmic structure),
▪: Global optimality (integral-based selection),
▪: Constraint satisfaction (KKT equivalence),
▪: Multi-objective optimality (Pareto–KKT structure),
▪: Probabilistic robustness (Monte Carlo guarantees).

This unified structure is unique in that:

▪: No gradient information is required,
▪: No Lipschitz constant is needed,
▪: No surrogate model is used,
▪: Deterministic and probabilistic analyses are seamlessly integrated.

10.4. Comparison with Existing Methods

Classical frameworks typically address only subsets of these properties:

▪: Gradient-based methods: local convergence [1,2]
▪: Global optimization methods: deterministic but require bounds [18,19,20]
▪: Evolutionary algorithms: flexible but lack guarantees [8,9,10,11,21,22]
▪: Bayesian optimization: probabilistic but model-dependent [15,16,17]

In contrast, MOST provides:

\begin{matrix} a unified convergence theory combining all essential properties . & (151) \end{matrix}

10.5. Final Implications

The unified convergence theorem implies that MOST can be interpreted as:

\begin{matrix} a deterministic - probabilistic hybrid framework for global optimization . & (152) \end{matrix}

Furthermore, the integration of geometric, analytical, and probabilistic arguments suggests that:

▪: Optimization can be reformulated as measure-based region selection,
▪: Classical pointwise paradigms can be replaced by integral-based reasoning,
▪: Constraint geometry naturally aligns with region shrinking mechanisms.

10.6. Summary

In this chapter, we have established the final result of this work:

▪: MOST achieves geometric convergence,
▪: Ensures global optimality,
▪: Satisfies KKT conditions for constrained problems,
▪: Extends to Pareto–KKT optimality for multi-objective problems,
▪: Maintains robustness under stochastic approximation.

This completes the theoretical development of the MOST framework.

11. Numerical Experiments

This chapter validates the theoretical framework developed in Chapters 4–10 by applying the constrained MOST (C-MOST) algorithm to high-dimensional, multimodal benchmark problems. The experiments are designed to verify:

▪: Convergence to theoretically predicted optima
▪: Consistency with KKT and Pareto–KKT conditions
▪: Robustness under multimodality and nonconvexity
▪: Deterministic geometric contraction of search regions
▪: Stability under Monte Carlo approximation

11.1. Problem Setting

We consider two standard 10-dimensional benchmark functions.

(A) Ackley Function

\begin{matrix} f_{A C K} (x) = 20 - 20 e x p (- 0.2 \sqrt{\frac{1}{10} \sum_{i = 1}^{10} x_{i}^{2}}) + e - e x p (\frac{1}{10} \sum_{i = 1}^{10} c o s (2 π x_{i})) . & (153) \end{matrix}

(B) Schwefel Function

\begin{matrix} f_{S C H} (x) = - \sum_{i = 1}^{10} x_{i} s i n (\sqrt{∣ x_{i} ∣}) . & (154) \end{matrix}

Both functions are highly multimodal; the Schwefel function is particularly challenging due to its numerous deep local minima.

11.2. Search Domain and Constraint

The search domain is defined as:

\begin{matrix} x_{i} \in [- 5,5], i = 1, \dots, 10 . & (155) \end{matrix}

We impose a spherical constraint:

\begin{matrix} \sum_{i = 1}^{10} x_{i}^{2} \leq 10 . & (156) \end{matrix}

Thus, the feasible region is a 10-dimensional hypersphere with radius:

\begin{matrix} R = \sqrt{10} \approx 3.162 . & (157) \end{matrix}

11.3. Theoretical Constrained Optima

1) Ackley function

The unconstrained minimizer is:

\begin{matrix} x^{*} = (0, \dots, 0), & (158) \end{matrix}

which satisfies the constraint. Hence:

\begin{matrix} f_{A C K} (x^{*}) = 0 . & (159) \end{matrix}

2) Schwefel function

The unconstrained minimizer:

x_{i} = 420.9687,

is infeasible. The constrained optimum lies on the boundary:

\begin{matrix} \sum_{i = 1}^{10} x_{i}^{2} = 10 . & (160) \end{matrix}

By symmetry:

\begin{matrix} x^{*} = (1, \dots, 1), & (161) \end{matrix}

and:

\begin{matrix} f_{S C H} (x^{*}) = - 10 s i n (1) \approx - 8.41 . & (162) \end{matrix}

11.4. Experimental Setup

The following parameters were utilized to evaluate the performance of the Constrained MOST (C-MOST) algorithm. The setup is designed to test the algorithm's ability to handle high-dimensional, multimodal landscapes under strict constraints in Table 1.

Each iteration consists of sequential binary partitions along all coordinates.

Definition of Output (Clarified)

In this study, the representative solution at iteration

n

is defined as: the average (center) of the selected region.

That is,

x_{n} = \frac{1}{∣ R_{n} ∣} \int_{R_{n}} x d x,

which, for hyperrectangular regions, reduces to:

x_{n} = \frac{x_{m i n} + x_{m a x}}{2} .

This definition is consistent with the theoretical structure of MOST, where region selection is governed by integral comparison, and the representative point reflects the geometry of the selected region. As the region diameter shrinks, this average converges to the true minimizer.

11.5. Numerical Results

11.5.1. Ackley

MOST converges rapidly toward:

\begin{matrix} x^{*} = (0, \dots, 0) . & (163) \end{matrix}

The convergence is:

▪: Monotonic
▪: Stable
▪: Rapid (within 20 iterations)

Typical error:

\begin{matrix} ∥ x_{n} - x^{*} ∥ < 10^{- 6} . & (164) \end{matrix}

11.5.2 Schwefel

MOST converges toward the constrained optimum:

\begin{matrix} x^{*} \approx (1, \dots, 1) . & (165) \end{matrix}

Observed numerical solution:

\begin{matrix} x_{i} \approx 0.9999999046 . & (166) \end{matrix}

Error level:

\begin{matrix} relative error < 10^{- 4} % . & (167) \end{matrix}

Despite strong multimodality, the algorithm consistently avoids local minima.

11.6. Comparison with Theory

The following table summarizes the performance of the constrained MOST (C-MOST) algorithm against the mathematically derived global optima for 10-dimensional benchmark functions.

Table 2. Numerical vs. Theoretical Comparison (n=10).

Problem	Theoretical	MOST Result	Error
Ackley	$x^{*} = 0$	$x_{n} \to 4.77 \times 10^{- 6}$	< $10^{- 6}$
Schwefel	$x^{*} = 1$	$x_{n} \to 0.999999$	< $10^{- 4} %$

Figure 1 illustrates the relationship between the decision variable x₁ and the number of MOST iterations for each benchmark function. As observed in the figure, the value of x₁ asymptotically approaches the theoretical optimum at an exponential rate for both functions. The numerical results strongly agree with:

▪: Chapter 5: global convergence
▪: Chapter 7: KKT satisfaction
▪: Chapter 9: tangent-plane geometry

11.7. Discussion

The experiments confirm several key theoretical predictions:

Constraint satisfaction without projection

→ consistent with tangent-plane theory (Chapter 9)
Robustness to multimodality

→ no trapping in local minima
Geometric convergence

$d i a m (R_{n}) \to 0$
Stability of Monte Carlo evaluation

→ consistent with Chapter 6
Deterministic behavior

→ unlike GA/PSO, trajectories are smooth

11.8. Summary

This chapter demonstrates that MOST:

▪: Converges to constrained global optima
▪: Satisfies KKT boundary conditions
▪: Remains stable under multimodality
▪: Achieves deterministic geometric convergence
▪: Operates effectively in high dimensions

These results provide strong empirical validation of the unified convergence theory.

12. Discussion

This chapter provides a comprehensive discussion of the theoretical significance, limitations, and practical implications of the MOST framework. While the preceding chapters established a unified convergence theory, it is essential to critically assess both the strengths and the boundaries of the method.

12.1. Theoretical Significance

The MOST framework introduces a fundamentally different perspective on optimization by replacing pointwise evaluation with region-based integral comparison.

At its core, the method can be interpreted as:

\begin{matrix} optimization via measure - based ordering of subsets . & (168) \end{matrix}

This shift yields several important theoretical consequences.

(i) Unification of Deterministic and Probabilistic Analysis

MOST combines:

▪: deterministic geometric shrinking (Chapter 4),
▪: global selection via integral comparison (Chapter 5),
▪: probabilistic robustness (Chapter 6).

This leads to:

\begin{matrix} almost sure global convergence without gradient information . & (169) \end{matrix}

Such integration is rare among existing optimization frameworks.

(ii) Variational Interpretation

From Chapters 7–9, constrained optimization is reformulated as:

\begin{matrix} m i n J (x, λ, μ), & (170) \end{matrix}

where

J

encodes KKT conditions. Thus, MOST can be viewed as:

\begin{matrix} a global solver for variational optimality systems . & (171) \end{matrix}

This bridges classical optimization [1,2,3] and variational analysis [39].

(iii) Extension to Multi-Objective Optimization

The extension:

\begin{matrix} m i n J_{w} (x, λ, μ), & (172) \end{matrix}

establishes convergence to Pareto–KKT points, providing a deterministic alternative to evolutionary methods [21,22].

12.2. Limitations

Despite its strong theoretical properties, the MOST framework has several important limitations that must be acknowledged.

(i) Computational Cost

Monte Carlo evaluation requires:

\begin{matrix} O (N \cdot number of regions), & (173) \end{matrix}

which can be computationally expensive in high dimensions.

(ii) Curse of Dimensionality

The number of subdivisions grows as:

\begin{matrix} O (2^{d n}), & (174) \end{matrix}

if all directions are explored uniformly. Although MOST mitigates this via sequential splitting, scalability remains a challenge.

(iii) Dependence on Sampling Strategy

The probabilistic guarantees rely on:

\begin{matrix} N_{n} \sim l o g (n) Δ_{n}^{- 2} . & (175) \end{matrix}

In practice, improper sampling may lead to:

▪: delayed convergence,
▪: increased variance in early iterations.

(iv) Pareto Front Coverage

As discussed in Chapter 8:

\begin{matrix} weighted - sum methods do not recover all Pareto points . & (176) \end{matrix}

Thus, MOST guarantees Pareto–KKT stationarity, but not full Pareto front reconstruction in nonconvex problems.

(v) Lack of Acceleration Mechanisms

Unlike gradient-based methods [1], MOST does not exploit curvature information:

\begin{matrix} \nabla^{2} f (x) . & (177) \end{matrix}

Thus, local acceleration (e.g., quadratic convergence) is not available.

12.3. Practical Implications

Despite these limitations, MOST offers several strong practical advantages.

(i) Derivative-Free Optimization

MOST requires no gradient or Hessian information:

\begin{matrix} only function evaluations are needed . & (178) \end{matrix}

This makes it suitable for:

▪: black-box optimization,
▪: simulation-based models,
▪: noisy environments.

(ii) Robustness to Multimodality

Integral-based evaluation suppresses local irregularities:

\begin{matrix} local minima do not dominate regional integrals . & (179) \end{matrix}

This explains the stable performance observed in Chapter 11.

(iii) Natural Handling of Constraints

The extended functional

J

ensures:

\begin{matrix} constraints are satisfied without projection . & (180) \end{matrix}

This is particularly advantageous in complex feasible regions.

(iv) Deterministic Convergence Behavior

Unlike stochastic metaheuristics:

▪: trajectories are smooth,
▪: convergence is reproducible,
▪: theoretical guarantees are explicit.

12.4. Positioning Within Optimization Theory

MOST occupies a unique position among optimization methods:

\begin{matrix} MOST = deterministic global optimization + & (181) \end{matrix}

Monte Carlo robustness + variational formulation .

It can be interpreted as:

▪: a measure-theoretic analogue of branch-and-bound [18],
▪: a deterministic counterpart to stochastic methods [8,9,10,11],
▪: a global extension of KKT-based optimization [2].

12.5. Future Directions

Several promising directions arise from this work:

(i) Adaptive Sampling Strategies

Improving efficiency via:

\begin{matrix} N_{n} \propto local uncertainty . & (182) \end{matrix}

(ii) Hybrid Methods

Combining MOST with:

▪: gradient-based refinement,
▪: surrogate models,
▪: trust-region techniques.

(iii) Parallelization

Monte Carlo evaluation is naturally parallelizable:

\begin{matrix} independent sampling across regions . & (183) \end{matrix}

(iv) High-Dimensional Extensions

Incorporating:

▪: dimension reduction,
▪: sparse search strategies.

12.6. Summary

In this chapter, we have:

▪: clarified the theoretical contributions of MOST,
▪: identified its limitations with full transparency,
▪: highlighted its practical strengths,
▪: positioned it within the broader optimization landscape.

The MOST framework provides a novel and rigorous approach to optimization, combining deterministic structure with probabilistic robustness, and offers a promising direction for future research.

13. Conclusion

This paper has introduced the Monte Carlo Stochastic Optimization Technique (MOST) as a unified framework for global optimization, and has established its theoretical foundation through a sequence of rigorous results.

13.1. Summary of Contributions

The principal contributions of this work can be summarized as follows.

(i) A New Optimization Paradigm

We proposed a novel formulation of optimization based on regional integral comparison:

optimization via measure - based ordering .

This departs fundamentally from classical pointwise evaluation and enables a global view of the objective landscape.

(ii) Deterministic Global Convergence

We proved that MOST achieves:

▪: geometric convergence of search regions,
▪: global optimality through integral-based selection.

In particular:

d i a m (R_{n}) \to 0, x_{n} \to x^{*} .

(iii) Probabilistic Robustness

By incorporating Monte Carlo estimation, we established:

x_{n} \to x^{*} almost surely,

demonstrating that stochastic approximation does not compromise convergence.

(iv) Constrained Optimization via KKT Equivalence

We introduced an extended functional

J

and proved:

J = 0 ⟺ KKT conditions .

This enables MOST to solve constrained problems without projection or penalty tuning.

(v) Multi-Objective Extension

We extended the framework to multi-objective problems and showed convergence to:

Pareto - KKT stationary points .

(vi) Geometric Interpretation

We demonstrated that:

▪: constraint curvature vanishes at first order,
▪: the feasible region is locally approximated by a tangent cone,
▪: KKT conditions correspond to normality with respect to this cone.

(vii) Unified Convergence Theory

All components were integrated into a single theorem, establishing that MOST simultaneously achieves:

▪: geometric convergence,
▪: global optimality,
▪: KKT consistency,
▪: Pareto–KKT convergence,
▪: probabilistic robustness.

13.2. Overall Perspective

The MOST framework reveals a unifying principle:

global optimization can be achieved through region - wise integral comparison .

This perspective suggests a shift from:

▪: point-based optimizationto
▪: measure-based optimization.

Such a viewpoint naturally integrates deterministic and stochastic methods within a single theoretical structure.

13.3. Future Directions

Several avenues for future research emerge from this work.

(i) Algorithmic Acceleration

Incorporating:

▪: adaptive sampling,
▪: curvature-aware refinement,
▪: hybrid gradient methods.

(ii) High-Dimensional Scaling

Developing:

sparse partition strategies,
dimension reduction techniques.

(iii) Pareto Front Exploration

Extending beyond weighted-sum approaches to:

▪: achieve fuller Pareto front coverage,
▪: integrate adaptive weighting schemes.

(iv) Theoretical Extensions

Further analysis of:

▪: convergence rates,
▪: complexity bounds,
▪: connections to measure theory and stochastic processes.

13.4. Final Remarks

The results presented in this paper establish MOST as a theoretically grounded and practically viable framework for global optimization.

While further refinements are possible, the current formulation already demonstrates that:

▪: deterministic structure and probabilistic reasoning can be unified,
▪: global convergence can be achieved without gradients,
▪: constrained and multi-objective problems can be handled within a single framework.

Appendix A. Coercivity of the Extended Functional and Its Sufficient Conditions

A.1 Positioning of This Appendix

This appendix complements Chapter 7 (Constrained MOST and KKT Equivalence) and Chapter 10 (Unified Convergence Theorem), where the coercivity assumption

\begin{matrix} ∥ (x, λ, μ) ∥ \to \infty \Rightarrow J (x, λ, μ) \to \infty & (A 1) \end{matrix}

(Assumption C1) is required to guarantee global convergence.

The purpose of this appendix is threefold:

To clarify that coercivity is nontrivial and problem-dependent,
To identify potential failure modes,
To provide sufficient conditions under which coercivity holds.

A.2 Definition of the Extended Functional

We recall the definition of the extended functional:

\begin{matrix} J (x, λ, μ) = ∥ \nabla_{x} L (x, λ, μ) ∥^{2} + \sum_{i = 1}^{m} h_{i} (x)^{2} + \sum_{j = 1}^{p} (m a x {0, g_{j} (x)})^{2}, \end{matrix}

\begin{matrix} + \sum_{j = 1}^{p} (m i n {0, μ_{j}})^{2} + \sum_{j = 1}^{p} (μ_{j} g_{j} (x))^{2}, & (A 2) \end{matrix}

where:

\begin{matrix} L (x, λ, μ) = f (x) + \sum_{i = 1}^{m} λ_{i} h_{i} (x) + \sum_{j = 1}^{p} μ_{j} g_{j} (x) . & (A 3) \end{matrix}

A.3 Potential Failure Modes of Coercivity

We first identify situations in which coercivity may fail.

(i) Vanishing Constraint–Multiplier Interaction

Consider sequences such that:

\begin{matrix} μ_{j} \to \infty, g_{j} (x) \to 0 . & (A 4) \end{matrix}

Then:

\begin{matrix} (μ_{j} g_{j} (x))^{2} \to 0, & (A 5) \end{matrix}

and therefore, this term does not prevent divergence of

μ_{j}

.

(ii) Divergence Along the Constraint Manifold

If:

\begin{matrix} h_{i} (x) = 0, g_{j} (x) \leq 0, & (A 6) \end{matrix}

then:

\begin{matrix} J (x, λ, μ) = ∥ \nabla_{x} L (x, λ, μ) ∥^{2} . & (A 7) \end{matrix}

If, in addition,

\begin{matrix} \nabla_{x} L (x, λ, μ) = 0, & (A 8) \end{matrix}

then

J = 0

even when

∥ λ ∥ \to \infty

.

Interpretation

These cases show that coercivity is not automatic and depends on structural properties of the problem.

A.4 Sufficient Conditions for Coercivity

We now present conditions under which coercivity holds.

Proposition A.1 (Bounded Multipliers)

Assume that:

\begin{matrix} 0 \leq μ_{j} \leq M, ∣ λ_{i} ∣ \leq M . & (A 9) \end{matrix}

Then:

\begin{matrix} ∥ (x, λ, μ) ∥ \to \infty \Rightarrow J (x, λ, μ) \to \infty . & (A 10) \end{matrix}

Proof

If

x \to \infty

, then at least one of:

\begin{matrix} h_{i} (x)^{2}, (m a x {0, g_{j} (x)})^{2} & (A 11) \end{matrix}

diverges under mild growth conditions, implying coercivity. If

λ, μ

remain bounded, divergence must occur through

x

, completing the proof.

Proposition A.2 (Affine Constraints)

Suppose:

\begin{matrix} h_{i} (x) = a_{i}^{T} x - b_{i}, g_{j} (x) = c_{j}^{T} x - d_{j} . & (A 12) \end{matrix}

Then coercivity holds provided:

\begin{matrix} {a_{i}, c_{j}} span R^{d} . & (A 13) \end{matrix}

Proof Sketch

The gradients:

\begin{matrix} \nabla h_{i} = a_{i}, \nabla g_{j} = c_{j} & (A 14) \end{matrix}

are constant. Thus:

\begin{matrix} \nabla_{x} L = \nabla f (x) + \sum λ_{i} a_{i} + \sum μ_{j} c_{j} . & (A 15) \end{matrix}

If multipliers diverge, the gradient term must diverge unless linear dependencies exist, yielding coercivity.

Proposition A.3 (Coercive Constraints)

Assume:

\begin{matrix} ∣ g_{j} (x) ∣ \to \infty as ∥ x ∥ \to \infty . & (A 16) \end{matrix}

Then:

\begin{matrix} (μ_{j} g_{j} (x))^{2} \to \infty unless μ_{j} = 0 . & (A 17) \end{matrix}

Hence:

\begin{matrix} J (x, λ, μ) \to \infty . & (A 18) \end{matrix}

Proposition A.4 (Regularized Functional)

Define:

\begin{matrix} J_{ε} (x, λ, μ) = J (x, λ, μ) + ε (∥ λ ∥^{2} + ∥ μ ∥^{2}), ε > 0 . & (A 19) \end{matrix}

Then:

\begin{matrix} J_{ε} is coercive . & (A 20) \end{matrix}

Proof

The quadratic penalty ensures:

\begin{matrix} ∥ λ ∥^{2} + ∥ μ ∥^{2} \to \infty \Rightarrow J_{ε} \to \infty . & (A 21) \end{matrix}

Thus, coercivity holds unconditionally.

A.5 Practical Implications

The above results imply that Assumption C1:

\begin{matrix} is not merely technical, but holds under broad conditions . & (A 22) \end{matrix}

In practice, coercivity can be ensured by:

▪: bounded multipliers,
▪: affine or well-conditioned constraints,
▪: mild growth conditions on $g_{j}$ ,
▪: or regularization.

A.6 Relation to the Main Results

This appendix supports:

▪: Chapter 7: validity of KKT equivalence,
▪: Chapter 10: applicability of the unified convergence theorem.

In particular, it justifies that Assumption C1 can be satisfied without imposing unrealistic conditions.

References

J. Nocedal, S. Wright, Numerical Optimization, Springer, 2006, pp. 1–664. [CrossRef]
D. P. Bertsekas, Nonlinear Programming, Athena Scientific, 1999. [CrossRef]
R. T. Rockafellar, Convex Analysis, Princeton Univ. Press, 1970.
S. Boyd, L. Vandenberghe, Convex Optimization, Cambridge Univ. Press, 2004. [CrossRef]
Conn, N. Gould, P. Toint, Trust Region Methods, SIAM, 2000.
Y. Nesterov, Introductory Lectures on Convex Optimization, Springer, 2004. [CrossRef]
M. J. D. Powell, “Direct search algorithms for optimization,” Acta Numerica, 1998, pp. 287–336. [CrossRef]
J. H. Holland, Adaptation in Natural and Artificial Systems, MIT Press, 1975. [CrossRef]
R. Storn, K. Price, “Differential Evolution,” J. Global Optimization, 1997, pp. 341–359. [CrossRef]
J. Kennedy, R. Eberhart, “Particle Swarm Optimization,” Proc. IEEE ICNN, 1995. [CrossRef]
N. Hansen, “CMA-ES,” Evolutionary Computation, 2006. [CrossRef]
S. Kirkpatrick et al., “Optimization by Simulated Annealing,” Science, 1983. [CrossRef]
H. Robbins, S. Monro, “Stochastic Approximation,” Ann. Math. Stat., 1951. [CrossRef]
A. Nemirovski et al., “Robust stochastic approximation,” SIAM J. Optimization, 2009. [CrossRef]
J. Snoek et al., “Practical Bayesian Optimization,” NIPS, 2012. [CrossRef]
E. Brochu et al., “Bayesian Optimization Tutorial,” 2010. [CrossRef]
P. Frazier, “Bayesian Optimization,” Recent Advances, 2018. [CrossRef]
R. Horst, H. Tuy, Global Optimization, Springer, 1996.
D. Jones et al., “DIRECT Algorithm,” J. Optimization Theory Appl., 1993. [CrossRef]
Floudas, Deterministic Global Optimization, Springer, 2000.
K. Deb et al., “NSGA-II,” IEEE TEC, 2002. [CrossRef]
E. Zitzler et al., “SPEA2,” TIK Report, 2001. [CrossRef]
M. Ehrgott, Multicriteria Optimization, Springer, 2005. [CrossRef]
K. Miettinen, Nonlinear Multiobjective Optimization, Springer, 1999. [CrossRef]
Das, J. Dennis, “Weighted Sum Method,” SIAM J Optimization, 1997. [CrossRef]
S. Inage, T. Hebishima, Monte Carlo Stochastic Optimization Technique (MOST): Deterministic Global Optimization via Region Integration, Preprint, 2022.
S. Inage, T. Hebishima, Multi-Objective Extension of MOST with Deterministic Pareto Convergence, Mathematics and Computers in Simulation, 2022. [CrossRef]
P. Bertsekas, Nonlinear Programming, Athena Scientific, 1999. [CrossRef]
R. T. Rockafellar, Convex Analysis, Princeton Univ. Press, 1970.
Nocedal, S. Wright, Numerical Optimization, Springer, 2006. [CrossRef]
Das, J. Dennis, “A closer look at weighted sum method,” SIAM J. Optimization, 1997. [CrossRef]
W. Hoeffding, “Probability inequalities for sums of bounded random variables,” JASA, 1963. [CrossRef]
R. Jones, C. D. Perttunen, B. E. Stuckman, “Lipschitzian optimization without the Lipschitz constant,” Journal of Optimization Theory and Applications, 79, pp. 157–181 (1993). [CrossRef]
Polak, Optimization: Algorithms and Consistent Approximations, Springer, 1997.
M. Borwein, A. S. Lewis, Convex Analysis and Nonlinear Optimization, Springer, 2006. [CrossRef]
P. Billingsley, Probability and Measure, Wiley, 1995. [CrossRef]
D. P. Bertsekas, Constrained Optimization and Lagrange Multiplier Methods, Athena Scientific, 1996.
A. M. Geoffrion, “Proper efficiency and the theory of vector maximization,” Journal of Mathematical Analysis and Applications, 22, pp. 618–630 (1968). [CrossRef]
R. T. Rockafellar, R. J-B. Wets, Variational Analysis, Springer, 1998. [CrossRef]
Jahn, Vector Optimization: Theory, Applications, and Extensions, Springer, 2004.
D. H. Ackley, “A Connectionist Machine for Genetic Hillclimbing,” Kluwer Academic Publishers, 1987. [CrossRef]
R. Fletcher, Practical Methods of Optimization, Wiley, 1987.

Figure 1. Convergence Behavior of Variable x₁ for each benchmark function.

Table 1. Summary of Evaluation Conditions.

Parameter	Value
Dimension	10
Domain	$[- 5,5]^{10}$
Constraint	$∥ x ∥^{2} \leq 10$
Iterations	20
MC samples per region	500
Subdivision	Binary (per variable)
Evaluations	$\sim 2 \times 10^{5}$

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

A Deterministic Global Optimization Framework via Monte Carlo Region Integration: Rigorous Convergence, KKT Equivalence, and Multi-Objective Extension

Abstract

Keywords:

Subject:

1. Introduction

2. Mathematical Preliminaries

2.1. Notation and Problem Setting

2.2. Karush–Kuhn–Tucker (KKT) Conditions

2.3. Multi-Objective Optimization and Pareto Optimality

2.4 Weighted-Sum Scalarization and Its Limitations

2.5. Regularity Assumptions

2.6. Measure-Theoretic Framework

2.7. Probabilistic Framework for Monte Carlo Estimation

2.8. Summary of Assumptions and Their Role

3. The MOST Framework (Unconstrained Case)

3.1. Problem Formulation

3.2. Core Idea of MOST

3.3. Recursive Partitioning of the Search Domain

3.4. Monte Carlo Evaluation of Subregions

3.5. Region Selection Rule

3.6. Deterministic Shrinking Property

3.7. Comparison with Classical Optimization Methods

3.8. Integral Averaging Effect

3.9. Relation to Existing Global Optimization Frameworks

3.10. Summary of the MOST Framework

4. Deterministic Region Shrinking and Geometric Convergence

4.1. Deterministic Region Shrinking

4.2. Recursive Diameter Reduction

4.3. Diameter Shrinking Theorem

4.4. Convergence of Representative Points

4.5. Interpretation of Geometric Convergence

4.6. Independence from Objective Function

4.7. Relation to Deterministic Global Optimization

4.8. Consequences for Convergence Theory

4.9. Summary

5. Global Convergence via Integral-Based Selection

5.1. Problem Setting

5.2. Fundamental Property of Integral Evaluation

5.3. Key Lemma: Integral Separation (Non-Circular)

5.4. Main Theorem: Global Convergence

5.5. Convergence of the Algorithm

5.6. Relation to Branch-and-Bound and DIRECT Methods

5.7. Summary

6. Monte Carlo Error Analysis and Probabilistic Guarantees

6.1. Error Model

6.2. Hoeffding-Type Concentration Bound

6.3. Probabilistic Guarantee of Correct Region Selection

6.4. Coupling with Deterministic Region Shrinking

6.5. Almost Sure Convergence

6.6. Summary

7. Constrained MOST and KKT Equivalence

7.1. Extended Lagrangian Formulation

7.2. Equivalence to KKT Conditions

7.3. Elimination of Spurious Local Minima

7.4. Implications for MOST

7.5. Summary

8. Multi-Objective Extension and Pareto–KKT Structure

8.1. Problem Formulation

8.2. Weighted-Sum Scalarization (Reformulated)

8.3. Revised Claim: Pareto–KKT Stationarity

8.4. Main Theorem

8.5. Discussion on Nonconvexity (Reviewer-Oriented Clarification)

8.6. Summary

9. Geometry of Curved Constraints

9.1. Taylor Expansion of Constraints

9.1. Vanishing of Curvature Terms

9.3. Tangent Plane Theorem and KKT Normality

9.4. Geometric Interpretation

9.5. Implications for MOST

9.6. Summary

10. Unified Convergence Theorem

10.1. Integrated Structure of the MOST Framework

10.2. Unified Convergence Theorem

10.3. Interpretation of the Unified Result

10.4. Comparison with Existing Methods

10.5. Final Implications

10.6. Summary

11. Numerical Experiments

11.1. Problem Setting

11.2. Search Domain and Constraint