Preprint Article (this version is not peer-reviewed)

A Measure-Theoretic Formulation of Monte Carlo Stochastic Optimization: Unifying Continuous and Discrete Domains

Submitted: 01 April 2026. Posted: 02 April 2026.
Abstract
This study presents a unified measure-theoretic formulation of the Monte Carlo Stochastic Optimization Technique (MOST), establishing a rigorous framework that encompasses both continuous and discrete optimization. Unlike conventional optimization methods that operate on pointwise evaluations, MOST is based on regional evaluation through normalized integrals, enabling robust and global exploration of the search space. We first reformulate MOST within a finite measure space, where the evaluation of a region is defined as the measure-weighted average of the objective function. This formulation naturally connects regional optimization with expectation under an induced probability measure and provides a theoretical foundation for Monte Carlo approximation. Building upon this framework, we construct a discrete version of MOST by introducing the counting measure and extend it further using weighted measures to rigorously handle odd-cardinality partitions via midpoint sharing. A central contribution of this work is the demonstration that continuous and discrete MOST are structurally identical algorithms arising from a single measure-based principle, differing only in the choice of underlying measure. This result eliminates the traditional separation between continuous and discrete optimization within the MOST framework. Theoretical analysis reveals that MOST is particularly effective when near-optimal regions possess non-negligible measure, while its performance may degrade in the presence of isolated global minima. These properties are validated through numerical experiments using benchmark functions, including the Ackley and Sphere functions, under uniform discretization. The results confirm that discrete MOST achieves accurate approximations of global optima, with errors controlled by discretization resolution and strong robustness in multimodal landscapes. 
Overall, this work establishes MOST as a measure-based optimization paradigm, offering a unified, theoretically grounded, and practically robust approach to global optimization across continuous and discrete domains.

1. Introduction

Optimization constitutes a fundamental component across a wide range of scientific and engineering disciplines, including mechanical design, control systems, machine learning, and operations research. Classical deterministic optimization methods—such as gradient descent, quasi-Newton methods, trust-region frameworks, and interior-point techniques—have demonstrated strong theoretical guarantees in convex and smooth settings [1,2,3,4]. However, their applicability becomes limited when addressing nonconvex, multimodal, discontinuous, or nonsmooth optimization problems, which are frequently encountered in real-world applications [5,6,7].
To overcome these limitations, numerous stochastic and population-based algorithms have been developed, including Genetic Algorithms (GA) [8], Differential Evolution (DE) [9], Particle Swarm Optimization (PSO) [10], and Covariance Matrix Adaptation Evolution Strategy (CMA-ES) [11]. These methods exhibit robustness against multimodality and do not require gradient information. Nevertheless, they rely heavily on stochastic exploration, and their convergence properties are typically probabilistic or asymptotic, lacking deterministic guarantees in general [12,13,14].
In parallel, surrogate-based optimization methods such as Bayesian optimization have gained attention for expensive black-box problems [15,16]. Although these approaches provide probabilistic convergence guarantees under specific assumptions, their performance depends strongly on model accuracy and deteriorates in high-dimensional or irregular search spaces [17]. Deterministic global optimization methods, including branch-and-bound and Lipschitz-based algorithms [18,19,20], offer rigorous guarantees but often suffer from exponential computational complexity and scalability issues.
Multi-objective optimization further increases complexity. Evolutionary algorithms such as NSGA-II [21] and SPEA2 [22] are widely used to approximate Pareto fronts, yet they provide only discrete approximations and lack deterministic convergence guarantees. Moreover, classical scalarization approaches fail to capture non-supported Pareto solutions in nonconvex problems [23,24,25].
Against this backdrop, Inage and Hebishima introduced a novel optimization paradigm known as the Monte Carlo Stochastic Optimization Technique (MOST) [26,27]. Unlike conventional pointwise optimization methods, MOST is based on a region-wise evaluation principle: the search domain is recursively partitioned, and each subregion is evaluated via Monte Carlo integration. The subregion with the smallest integral value is selected, and this process is repeated iteratively. This region-based strategy introduces an intrinsic smoothing effect over the objective landscape, enabling robust global exploration and deterministic geometric contraction of the search domain. Numerical studies have demonstrated that MOST achieves higher accuracy and faster convergence than genetic algorithms and gradient-based methods on benchmark problems such as the Ackley and Schwefel functions [26].
Subsequent extensions have generalized MOST to multi-objective optimization problems, where convergence toward Pareto-consistent solutions has been demonstrated [27]. More recently, a rigorous deterministic–probabilistic framework has been established, including convergence proofs, connections to Karush–Kuhn–Tucker (KKT) conditions, and extensions to constrained optimization [28]. These developments position MOST as a new class of derivative-free global optimization methods combining deterministic structure and stochastic sampling.
Despite these advances, an important theoretical gap remains. Existing formulations of MOST have been developed primarily for continuous domains, implicitly relying on Lebesgue integration. However, many practical optimization problems—particularly in engineering design, combinatorial optimization, and digital control—are inherently discrete. Extending MOST to discrete domains is not a trivial matter, as the concept of integration must be carefully reinterpreted, and the algorithmic structure must be adapted accordingly.
In particular, the core mechanism of MOST—region selection based on integral comparison—raises a fundamental question: Can the integral-based framework be rigorously extended to discrete optimization problems in a mathematically consistent manner? This question naturally leads to a measure-theoretic perspective, in which continuous and discrete optimization can be treated within a unified framework.
The key idea of this study is to reinterpret MOST as a measure-based optimization method, where region evaluation is defined through a normalized integral with respect to an underlying measure. In continuous domains, this corresponds to the Lebesgue measure, whereas in discrete domains, it corresponds to the counting measure. Furthermore, in the case of odd-number partitioning in discrete spaces, we introduce a weighted counting measure to consistently treat shared boundary points.
Based on this perspective, we develop a unified theoretical framework that encompasses both continuous and discrete MOST. In the discrete setting, we establish a rigorous algorithmic construction, including a novel bisection strategy for even and odd cardinalities, and a consistent Monte Carlo sampling scheme based on discrete probability measures. This formulation enables a direct extension of the integral-based selection principle to discrete spaces without loss of theoretical consistency.
The contributions of this study are summarized as follows:
  • A measure-theoretic reformulation of MOST that provides a unified mathematical framework for region-based optimization.
  • A rigorous construction of discrete MOST using counting measures and weighted measures.
  • A theoretical formulation of bisection strategies for both even and odd cardinalities in discrete domains.
  • A unified theory showing that continuous and discrete MOST are special cases of a single measure-based optimization framework.
  • A clarification of the theoretical limitations of MOST, particularly in the presence of highly localized optima.
  • Numerical validation through benchmark functions (e.g., Ackley function), comparing theoretical solutions and discrete MOST solutions under various discretization resolutions.
Through these developments, this study establishes MOST as a measure-based optimization paradigm that bridges continuous and discrete domains within a single rigorous framework. This unified perspective not only deepens the theoretical understanding of MOST but also expands its applicability to a broader class of optimization problems.
Chapter 2. Fundamental Concept of MOST in Continuous Domains

2.1. Problem Setting

We consider the unconstrained minimization problem defined over a bounded continuous domain:
\min_{x \in \Omega} f(x),
where Ω ⊂ ℝⁿ is a compact hyper-rectangular domain, and f : Ω → ℝ is a measurable objective function. Classical optimization methods evaluate the objective function at specific points and iteratively update candidate solutions based on gradient information or stochastic sampling [1,2,3,4,7]. In contrast, the Monte Carlo Stochastic Optimization Technique (MOST) adopts a fundamentally different viewpoint: it evaluates regions rather than individual points.

2.2. Core Idea: Region-Based Optimization via Integral Comparison

The central idea of MOST is to replace pointwise evaluation with integral-based evaluation over subdomains.
For any measurable subset A ⊆ Ω with nonzero measure, define the regional evaluation functional:
J(A) = \frac{1}{|A|} \int_A f(x) \, dx,
where |A| denotes the Lebesgue measure of A. This quantity represents the average value of the objective function over the region A.
The key principle of MOST is: Regions containing lower objective values tend to exhibit smaller average values. Thus, instead of directly searching for a point minimizing f ( x ) , MOST iteratively identifies subregions with smaller values of J ( A ) .

2.3. Binary Partitioning of the Search Domain

Let the initial search domain be denoted by Ω₀ = Ω. At iteration k, the current region Ω_k is partitioned into two subregions along a selected coordinate direction. Without loss of generality, suppose that the partition is performed along the j-th coordinate:
\Omega_k = \Omega_L^k \cup \Omega_R^k, \qquad \Omega_L^k \cap \Omega_R^k = \emptyset.
Each subregion satisfies
|\Omega_L^k| = |\Omega_R^k| = \tfrac{1}{2} |\Omega_k|.
This binary partitioning ensures a systematic reduction of the search domain.

2.4. Monte Carlo Approximation of Regional Integrals

In practical implementations, the integral in (2) is evaluated using Monte Carlo sampling. Let \{x_i\}_{i=1}^{M} be independent and identically distributed samples drawn uniformly from the region A:
x_i \sim \mathrm{Uniform}(A).
Then the Monte Carlo estimator of J(A) is given by:
\hat{J}_M(A) = \frac{1}{M} \sum_{i=1}^{M} f(x_i).
By the law of large numbers [13,14], we have:
\hat{J}_M(A) \to J(A) \quad \text{as } M \to \infty.
Moreover, concentration inequalities provide probabilistic error bounds:
\mathbb{P}\left( |\hat{J}_M(A) - J(A)| > \varepsilon \right) \le C \exp(-c M \varepsilon^2),
for some constants C, c > 0 depending on the boundedness of f [14].
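As a minimal illustration of this approximation, the regional average can be estimated by uniform sampling over a hyper-rectangle. The sketch below (Python with NumPy; the sphere integrand and sample size are illustrative choices, not part of the original method) compares the estimate with the exact average 2/3 on [0, 1]²:

```python
import numpy as np

def mc_regional_average(f, lower, upper, M, rng):
    """Monte Carlo estimate of J(A) = (1/|A|) * integral_A f(x) dx
    for a hyper-rectangle A = [lower, upper] (Sec. 2.4)."""
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    X = rng.uniform(lower, upper, size=(M, lower.size))  # x_i ~ Uniform(A)
    return float(np.mean([f(x) for x in X]))

# Example: sphere function on A = [0,1]^2; the exact regional average
# is 1/3 + 1/3 = 2/3 (illustrative test case).
rng = np.random.default_rng(0)
sphere = lambda x: float(np.sum(x**2))
est = mc_regional_average(sphere, [0.0, 0.0], [1.0, 1.0], 20_000, rng)
```

With M = 20,000 the standard error is about 0.003, so the estimate lands well within a few percent of the exact value, consistent with the concentration bound above.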

2.5. Region Selection Rule

At each iteration, MOST compares the estimated average values of the two subregions, \hat{J}_M(\Omega_L^k) and \hat{J}_M(\Omega_R^k). The next search region is selected as:
\Omega_{k+1} =
\begin{cases}
\Omega_L^k, & \text{if } \hat{J}_M(\Omega_L^k) \le \hat{J}_M(\Omega_R^k), \\
\Omega_R^k, & \text{otherwise}.
\end{cases}
This deterministic selection rule forms the core of MOST.

2.6. Deterministic Shrinking of the Search Region

Each iteration halves the domain along one coordinate. Assuming the splitting direction cycles through all n coordinates, every edge of the box is halved once per cycle of n iterations, so the diameter of the search region satisfies:
\mathrm{diam}(\Omega_{k+n}) \le \tfrac{1}{2} \, \mathrm{diam}(\Omega_k).
Recursively,
\mathrm{diam}(\Omega_k) \le 2^{-\lfloor k/n \rfloor} \, \mathrm{diam}(\Omega_0).
Thus, the search region shrinks geometrically, independent of the objective function. This property distinguishes MOST from both gradient-based and stochastic optimization methods, which do not guarantee deterministic contraction [1,7,8,9,10,11].
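The partition, estimate, and select steps of Sections 2.3 through 2.6 can be sketched as a single loop. The choice of splitting direction is not fixed by the text; splitting along the longest edge is an assumption made here for illustration, as are the test function, sample size, and iteration count:

```python
import numpy as np

def continuous_most(f, lower, upper, iters=60, M=256, seed=0):
    """Sketch of continuous MOST: bisect the box along its longest edge,
    keep the half with the smaller Monte Carlo average, repeat."""
    rng = np.random.default_rng(seed)
    lo = np.asarray(lower, dtype=float)
    hi = np.asarray(upper, dtype=float)

    def j_hat(a, b):
        # Monte Carlo estimate of the regional average over the box [a, b]
        X = rng.uniform(a, b, size=(M, a.size))
        return np.mean([f(x) for x in X])

    for _ in range(iters):
        j = int(np.argmax(hi - lo))          # split the longest edge
        mid = 0.5 * (lo[j] + hi[j])
        hi_L = hi.copy(); hi_L[j] = mid      # left half:  [lo, hi_L]
        lo_R = lo.copy(); lo_R[j] = mid      # right half: [lo_R, hi]
        if j_hat(lo, hi_L) <= j_hat(lo_R, hi):
            hi = hi_L
        else:
            lo = lo_R
    return 0.5 * (lo + hi)                   # centre of the final box

# Sphere function on [-5, 5]^2; the minimizer is the origin.
x_star = continuous_most(lambda x: float(np.sum(x**2)),
                         [-5.0, -5.0], [5.0, 5.0])
```

The box contracts deterministically at every step regardless of the noise in the two estimates; sampling noise affects only which half is retained.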

2.7. Integral Averaging Effect

A crucial feature of MOST is the smoothing effect of integration. Let x* ∈ Ω be a global minimizer of f. For a sufficiently small region A containing x*, with f Lipschitz continuous near x*, we have:
J(A) = f(x^*) + O(\mathrm{diam}(A)).
Thus, as the region shrinks,
J(A) \to f(x^*).
This implies:
  • Narrow local minima contribute negligibly to the integral.
  • Regions containing the global minimum dominate asymptotically.
This mechanism explains the robustness of MOST against multimodal objective functions, as observed in benchmark studies [26,27].
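A small exact computation illustrates the averaging effect. The well depth, width, and location below are hypothetical values chosen for illustration: a well of depth 100 but half-width 10⁻³ lowers the regional average over [0, 3] by less than 0.07, so narrow features barely register in J(A). This is the smoothing effect, and, as discussed in Chapter 6, also the source of MOST's difficulty with isolated global minima.

```python
# Effect of a narrow well on the regional average (exact arithmetic).
# f(x) = x^2 on [-3, 3], modified by a well of depth 100 and
# half-width w = 1e-3 centred at x = 2 (illustrative construction).
w = 1e-3
base = 3.0**2 / 3.0                  # (1/3) * integral_0^3 x^2 dx = 3
well = -100.0 * (2.0 * w) / 3.0      # depth * width / |A|, about -0.067
J_right = base + well                # average over [0, 3] including the well
J_left = 3.0**2 / 3.0                # average over [-3, 0]: also 3 by symmetry
gap = J_left - J_right               # the deep well shifts J by only ~0.067
```

The region [0, 3] is still (correctly) preferred here, but only by a margin proportional to the well's width, which a finite-sample estimator may fail to resolve.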

2.8. Comparison with Classical Optimization Methods

MOST differs fundamentally from existing optimization frameworks:
  • Gradient-based methods [1,2,3,4] depend on differentiability and are sensitive to local minima.
  • Evolutionary algorithms [8,9,10,11,12] provide stochastic exploration but lack deterministic convergence.
  • Deterministic global optimization methods [18,19,20] require Lipschitz constants or bounding functions.
  • Bayesian optimization [15,16,17] depends on surrogate models and suffers in high dimensions.
In contrast, MOST:
  • does not require gradients;
  • does not rely on surrogate models;
  • provides deterministic region shrinking;
  • uses integral averaging to mitigate local irregularities.

2.9. Summary

In this chapter, we have revisited the fundamental concept of MOST in continuous domains.
The key elements of the method are:
  • Region-based evaluation using average integrals (2)
  • Recursive binary partitioning of the search domain (4)–(5)
  • Monte Carlo approximation of regional integrals (7)–(9)
  • Deterministic region selection based on integral comparison (11)
  • Geometric shrinking of the search domain (12)–(13)
These properties establish MOST as a deterministic, derivative-free optimization framework that leverages measure-based averaging to achieve robustness against multimodality.
This continuous formulation serves as the foundation for the subsequent development of a measure-theoretic framework and its extension to discrete optimization domains in the following chapters.
Chapter 3. Measure-Theoretic Reformulation of MOST
This chapter establishes a measure-theoretic foundation for the Monte Carlo Stochastic Optimization Technique (MOST). By reformulating MOST within the framework of finite measure spaces, we provide a unified mathematical structure that encompasses both continuous and discrete optimization problems. This perspective reveals that MOST is not a pointwise optimization method, but rather a measure-based region selection mechanism.

3.1. Measure Space Formulation

To generalize the MOST framework, we introduce a finite measure space
(X, \mathcal{F}, \mu),
where X denotes the search domain, F is a σ-algebra of measurable subsets of X, and μ is a finite measure on (X, F). Let f : X → ℝ be a measurable objective function.
This formulation enables a unified treatment of optimization problems:
  • Continuous domains: μ is the Lebesgue measure.
  • Discrete domains: μ is the counting measure.
Thus, both continuous and discrete optimization problems are embedded into a single mathematical framework.

3.2. Measure-Based Evaluation Functional

Within this setting, we define the MOST evaluation functional as the normalized integral over a measurable region A ∈ F with μ(A) > 0:
J_f(A) = \frac{1}{\mu(A)} \int_A f(x) \, d\mu(x).
This quantity represents the measure-weighted average of the objective function over the region A . A fundamental reinterpretation of MOST follows immediately: MOST does not minimize the objective function pointwise; instead, it selects regions that minimize the measure-based average of the objective function.
This shift from pointwise evaluation to regional averaging constitutes the conceptual core of the method.

3.3. Correspondence to Probability Measures

The normalized measure naturally induces a probability measure supported on A. Specifically, for any A ∈ F with μ(A) > 0, define
P_A(B) = \frac{\mu(B \cap A)}{\mu(A)}, \qquad B \in \mathcal{F}.
Then P_A is a probability measure on (X, F), and the functional J_f(A) admits the probabilistic representation
J_f(A) = \int_X f(x) \, dP_A(x) = \mathbb{E}_{P_A}[f(X)].
Thus, the regional evaluation in MOST is equivalent to computing an expectation under a probability measure induced by the underlying measure μ.
This interpretation provides a direct connection between MOST and stochastic approximation theory [13,14].
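The induced measure P_A can be checked on a toy discrete example. The six-point space and the subsets below are arbitrary illustrations, and exact rational arithmetic via the standard `fractions` module avoids rounding:

```python
from fractions import Fraction

# Induced probability measure P_A(B) = mu(B ∩ A) / mu(A), with mu the
# counting measure on a toy six-point space (sets chosen for illustration).
X = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}

def P_A(B):
    return Fraction(len(B & A), len(A))

total = P_A(X)            # P_A(X) = 1: a genuine probability measure
outside = P_A({1, 3, 5})  # mass outside A is zero
```

The two checks confirm that P_A is normalized and supported on A, as the definition requires.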

3.4. Monte Carlo Approximation

In practical implementations, the expectation (19) is approximated via Monte Carlo sampling. Let X₁, X₂, …, X_M be independent and identically distributed random variables drawn according to P_A:
X_i \sim P_A.
The Monte Carlo estimator of J_f(A) is given by
\hat{J}_M(A) = \frac{1}{M} \sum_{i=1}^{M} f(X_i).
By the strong law of large numbers [13], we have
\hat{J}_M(A) \to J_f(A) \quad \text{almost surely as } M \to \infty.
Moreover, concentration inequalities provide finite-sample guarantees:
\mathbb{P}\left( |\hat{J}_M(A) - J_f(A)| \ge \varepsilon \right) \le C \exp(-c M \varepsilon^2),
for some constants C, c > 0 depending on the boundedness of f [14].

3.5. Fundamental Theoretical Properties

Theorem 3.1 (Consistency of Monte Carlo Estimation)
Let f ∈ L¹(A, μ). Then the estimator \hat{J}_M(A) satisfies
\hat{J}_M(A) \to J_f(A) \quad \text{almost surely as } M \to \infty.
Proof.
This follows directly from the strong law of large numbers applied to the i.i.d. (independent and identically distributed) sequence \{f(X_i)\}_{i=1}^{M} [13].
Theorem 3.2 (Measure Dependence of MOST Evaluation)
The functional J_f(A) depends explicitly on the underlying measure μ. In particular, if μ₁ ≠ μ₂, then in general
J_f^{\mu_1}(A) \ne J_f^{\mu_2}(A).
Proof.
From definition (17), J_f(A) depends on both the integral and the normalization factor μ(A). Since both quantities are determined by the measure, different measures yield different evaluations in general.

3.6. Interpretation and Structural Insights

The measure-theoretic formulation provides several fundamental insights into the nature of MOST:
  1. Measure-based optimization: MOST optimizes regions with respect to a measure-weighted objective, rather than optimizing points directly.
  2. Expectation-based evaluation: the regional evaluation is equivalent to an expectation under an induced probability measure.
  3. Intrinsic smoothing effect: the integral-based formulation suppresses narrow local fluctuations of the objective function, emphasizing global structure.
  4. Unified framework: the same formulation applies to continuous spaces (Lebesgue measure), discrete spaces (counting measure), and hybrid spaces (product measures).
This unified viewpoint reveals that the distinction between continuous and discrete optimization is not structural, but purely a consequence of the choice of measure.

3.7. Summary

In this chapter, we have established a rigorous measure-theoretic formulation of MOST. The principal results are:
  • The definition of the MOST evaluation functional as a normalized integral (17).
  • The equivalence between regional evaluation and expectation under a probability measure (19).
  • The Monte Carlo approximation framework and its almost sure convergence (21)–(24).
  • The explicit dependence of MOST on the underlying measure (25).
These results demonstrate that MOST is fundamentally a measure-based optimization framework, providing a unified theoretical foundation for both continuous and discrete optimization.
This formulation serves as the basis for the construction of discrete MOST in the next chapter.
Chapter 4. Discrete MOST: Construction and Algorithm
This chapter develops a rigorous formulation of the Monte Carlo Stochastic Optimization Technique (MOST) in discrete domains. Building upon the measure-theoretic framework established in Chapter 3, we construct a discrete counterpart that preserves the essential structure of MOST while ensuring algorithmic consistency and reproducibility.
The central objective is to demonstrate that discrete MOST is not an ad hoc extension, but rather a natural consequence of replacing the Lebesgue measure with the counting measure.

4.1. Discrete Search Space and Counting Measure

Let the search space be a finite ordered set:
D = \{x_1, x_2, \ldots, x_N\}, \qquad x_1 < x_2 < \cdots < x_N.
We equip D with the counting measure ν, defined by
\nu(A) = |A|, \qquad A \subseteq D.
Under this measure, the integral of a function f : D → ℝ reduces to a finite sum:
\int_A f(x) \, d\nu(x) = \sum_{x \in A} f(x).
Accordingly, the MOST evaluation functional becomes
J_f(A) = \frac{1}{|A|} \sum_{x \in A} f(x),
which corresponds to the arithmetic mean over the subset A. Thus, discrete MOST is obtained directly from the general formulation (17) by choosing μ = ν.
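In code, the discrete evaluation functional is a one-liner; the quadratic integrand and the four-point subset below are illustrative choices:

```python
# With the counting measure, the MOST evaluation functional is the
# arithmetic mean of f over the subset A (Sec. 4.1).
def J_discrete(f, A):
    return sum(f(x) for x in A) / len(A)

value = J_discrete(lambda x: x * x, [1, 2, 3, 4])  # (1+4+9+16)/4 = 7.5
```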

4.2. Bisection Strategy: Even and Odd Cardinalities

A key component of MOST is recursive domain partitioning. In discrete spaces, this requires careful treatment to preserve symmetry and consistency.

4.2.1. Even Cardinality

If N = 2m, the set D is partitioned into two disjoint subsets:
D_L = \{x_1, \ldots, x_m\}, \qquad D_R = \{x_{m+1}, \ldots, x_{2m}\}.
These subsets satisfy:
|D_L| = |D_R| = m.
This corresponds to a direct discrete analogue of continuous bisection.

4.2.2. Odd Cardinality

If N = 2m + 1, a symmetric partition is achieved by sharing the midpoint:
D_L = \{x_1, \ldots, x_m, x_{m+1}\}, \qquad D_R = \{x_{m+1}, x_{m+2}, \ldots, x_{2m+1}\}.
To ensure consistency with the measure-theoretic formulation, the midpoint x_{m+1} is assigned half weight in each subset. Define weight functions w_L, w_R : D → [0, 1] as:
w_L(x_i) =
\begin{cases}
1, & i \le m, \\
\tfrac{1}{2}, & i = m + 1, \\
0, & i \ge m + 2,
\end{cases}
\qquad
w_R(x_i) =
\begin{cases}
0, & i \le m, \\
\tfrac{1}{2}, & i = m + 1, \\
1, & i \ge m + 2.
\end{cases}
The corresponding evaluation functionals are:
J_L = \frac{\sum_{x \in D} w_L(x) f(x)}{\sum_{x \in D} w_L(x)}, \qquad
J_R = \frac{\sum_{x \in D} w_R(x) f(x)}{\sum_{x \in D} w_R(x)}.
This construction ensures:
  • symmetry of partitioning;
  • equal effective measure;
  • consistency with continuous domain splitting.
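The weight construction and the resulting evaluations can be sketched directly. The five-point set, the quadratic sample values, and the helper names below are illustrative, not part of the formal construction:

```python
# Midpoint-sharing weights for an odd-cardinality set: the shared midpoint
# x_{m+1} carries weight 1/2 on each side, so both halves have the same
# effective measure m + 1/2.
def odd_partition_weights(N):
    assert N % 2 == 1
    m = (N - 1) // 2
    wL = [1.0] * m + [0.5] + [0.0] * m
    wR = [0.0] * m + [0.5] + [1.0] * m
    return wL, wR

def weighted_J(f_vals, w):
    # Normalized weighted average, i.e. J_L or J_R above.
    return sum(wi * fi for wi, fi in zip(w, f_vals)) / sum(w)

wL, wR = odd_partition_weights(5)      # m = 2, effective measure 2.5
f_vals = [9.0, 4.0, 1.0, 0.0, 1.0]     # e.g. f(x) = (x - 4)^2 on x = 1..5
JL = weighted_J(f_vals, wL)            # (9 + 4 + 0.5*1) / 2.5 = 5.4
JR = weighted_J(f_vals, wR)            # (0.5*1 + 0 + 1) / 2.5 = 0.6
```

Both halves carry mass 2.5, so the comparison of JL and JR involves no directional bias from the shared point.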

4.3. Monte Carlo Sampling in Discrete MOST

A crucial distinction between continuous and discrete MOST lies in the sampling mechanism.

4.3.1. Continuous vs. Discrete Sampling

In continuous domains, samples are drawn from a uniform distribution with respect to the Lebesgue measure. In contrast, in discrete domains, sampling is performed over a finite set.

4.3.2. Uniform Discrete Sampling

For a subset A ⊆ D, define the discrete uniform distribution:
P(X = x_i) = \frac{1}{|A|}, \qquad x_i \in A.
This corresponds exactly to the probability measure induced by the counting measure.

4.3.3. Weighted Sampling for Odd Partition

In the case of odd partitioning, the midpoint receives half weight. The corresponding sampling distribution becomes:
P(X = x_i) = \frac{w(x_i)}{\sum_{x \in A} w(x)},
where w is either w_L or w_R.

4.3.4. Monte Carlo Estimation

The evaluation functional is approximated by:
\hat{J}_M(A) = \frac{1}{M} \sum_{i=1}^{M} f(X_i), \qquad X_i \sim P_A.
This estimator remains consistent with the measure-theoretic formulation in Chapter 3.
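A minimal sketch of the weighted sampling rule, using the standard-library `random.choices` (the point set, weights, sample size, and seed are illustrative assumptions):

```python
import random

# Weighted discrete sampling: X ~ P_A with P(X = x_i) proportional to w(x_i),
# followed by the Monte Carlo estimate of the weighted regional average.
def j_hat_weighted(f, points, weights, M, rng):
    xs = rng.choices(points, weights=weights, k=M)
    return sum(f(x) for x in xs) / M

rng = random.Random(0)
points = [1, 2, 3, 4, 5]
w_left = [1.0, 1.0, 0.5, 0.0, 0.0]   # left half of an odd 5-point partition
est = j_hat_weighted(lambda x: x * x, points, w_left, 50_000, rng)
# Exact weighted average: (1*1 + 1*4 + 0.5*9) / 2.5 = 3.8
```

Points with zero weight are never drawn, so sampling stays supported on the (weighted) subregion, exactly as the induced measure prescribes.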

4.4. Discrete MOST Algorithm

We now define the complete discrete MOST procedure.
Algorithm (Discrete MOST)
Input: finite set D₀ = D, sample size M.
For k = 0, 1, 2, …:
  1. Partition D_k into D_L^k and D_R^k.
  2. Estimate \hat{J}_M(D_L^k) and \hat{J}_M(D_R^k).
  3. Select:
D_{k+1} =
\begin{cases}
D_L^k, & \text{if } \hat{J}_M(D_L^k) \le \hat{J}_M(D_R^k), \\
D_R^k, & \text{otherwise}.
\end{cases}
Stop when |D_k| = 1.
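The complete loop can be sketched as follows. The quadratic test function, the 64-point grid, the sample size, and the seed are illustrative choices; the even/odd bisection and the midpoint weighting follow the construction of Section 4.2:

```python
import random

def discrete_most(f, D, M=400, seed=0):
    """Sketch of the discrete MOST loop of Sec. 4.4: bisect the ordered set
    (sharing the midpoint with weight 1/2 when the cardinality is odd),
    keep the half with the smaller weighted Monte Carlo average, and stop
    when a single point remains."""
    rng = random.Random(seed)
    D = sorted(D)

    def j_hat(points, weights):
        xs = rng.choices(points, weights=weights, k=M)
        return sum(f(x) for x in xs) / M

    while len(D) > 1:
        N = len(D)
        if N % 2 == 0:                       # even: plain bisection
            m = N // 2
            L, wL = D[:m], [1.0] * m
            R, wR = D[m:], [1.0] * m
        else:                                # odd: midpoint shared, weight 1/2
            m = (N - 1) // 2
            L, wL = D[:m + 1], [1.0] * m + [0.5]
            R, wR = D[m:], [0.5] + [1.0] * m
        D = L if j_hat(L, wL) <= j_hat(R, wR) else R
    return D[0]

# Minimize f(x) = (x - 17)^2 over the grid {0, 1, ..., 63} (illustrative).
best = discrete_most(lambda x: (x - 17) ** 2, range(64))
```

Each pass keeps at most ⌈N/2⌉ points, so the loop terminates after O(log N) bisections, in line with Theorem 4.1.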

4.5. Fundamental Properties

Theorem 4.1 (Finite Termination)
The discrete MOST algorithm terminates in a finite number of steps: there exists K < ∞ such that |D_K| = 1.
Proof.
At each iteration, the cardinality satisfies:
|D_{k+1}| \le \left\lceil \tfrac{|D_k|}{2} \right\rceil < |D_k| \quad \text{for } |D_k| \ge 2.
Thus, |D_k| is a strictly decreasing sequence of positive integers, implying finite termination.
Theorem 4.2 (Nestedness)
The sequence of sets satisfies:
D_0 \supseteq D_1 \supseteq \cdots \supseteq D_k \supseteq \cdots.
Proof.
Immediate from the selection rule (39).
Theorem 4.3 (Correctness under Identifiability Condition)
Let x* be the global minimizer, and for each k let D_k^{*} denote the half containing x* and D_k^{\text{other}} the other half. If for all k,
J_f(D_k^{*}) < J_f(D_k^{\text{other}}),
then discrete MOST converges to x*.
Proof.
By induction on k using the selection rule (39): under the stated inequality, the half containing x* is selected at every iteration, so x* ∈ D_k for all k. By Theorem 4.1 the algorithm terminates with |D_K| = 1, whence D_K = {x*}.

4.6. Summary

In this chapter, we have constructed a rigorous formulation of discrete MOST. The key contributions are:
  • Definition of discrete MOST via counting measure (29)
  • Symmetric bisection strategies for even and odd cases (30)–(34)
  • Discrete Monte Carlo sampling consistent with measure theory (35)–(37)
  • A complete recursive optimization algorithm (38)–(40)
  • Fundamental convergence properties (41)–(44)
This formulation establishes discrete MOST as a natural extension of the measure-theoretic MOST framework, rather than an independent algorithm.
Chapter 5. Unified Theory of Continuous and Discrete MOST
This chapter presents the central theoretical contribution of the present study: a unified formulation of continuous and discrete MOST within a single measure-theoretic framework. The key observation is that the distinction between continuous and discrete optimization does not arise from the algorithmic structure itself, but solely from the underlying measure used to evaluate regions. In this sense, continuous and discrete MOST are not different methods, but different realizations of the same abstract optimization principle.

5.1. Main Unification Theorem: Lebesgue versus Counting Measure

The measure-theoretic formulation established in Chapter 3 immediately suggests that continuous and discrete MOST can be treated within a common mathematical framework. We now state this principle formally.
Theorem 5.1 (Unified Measure-Theoretic Formulation of MOST)
Let (X, \mathcal{F}, \mu) be a finite measure space, and let f : X → ℝ be a measurable objective function. For each measurable region A ∈ F with μ(A) > 0, define the MOST evaluation functional by
J_f(A) = \frac{1}{\mu(A)} \int_A f \, d\mu.
Then the regional selection mechanism of MOST is fully determined by the pair (f, μ), and both continuous and discrete MOST arise as special cases of the same abstract construction:
  1. Continuous MOST is obtained when X ⊆ ℝⁿ and μ is the Lebesgue measure λ, in which case
J_f(A) = \frac{1}{\lambda(A)} \int_A f(x) \, dx.
  2. Discrete MOST is obtained when X = D is a finite set and μ is the counting measure ν, in which case
J_f(A) = \frac{1}{\nu(A)} \int_A f \, d\nu = \frac{1}{|A|} \sum_{x \in A} f(x).
Hence, continuous and discrete MOST are instances of a single measure-based optimization framework.
Proof.
Equation (45) is well defined on any finite measure space. When μ = λ , the integral is the usual Lebesgue integral on a measurable subset of R n , yielding (46). When μ = ν is the counting measure on a finite set, the integral reduces to a finite sum, yielding (47). Since the regional evaluation and the resulting selection rule depend only on J f ( A ) , the algorithmic structure is identical in both cases. Therefore, both continuous and discrete MOST are special realizations of the same abstract optimization scheme.
The significance of Theorem 5.1 is conceptual as well as technical. In conventional optimization theory, continuous and discrete problems are often treated as fundamentally different classes, requiring distinct methods and convergence analyses [1,2,3,4,5,6,7]. By contrast, the present formulation shows that, within the MOST framework, both classes can be derived from the same variational principle once the appropriate measure is specified.
This observation also clarifies the role of Monte Carlo sampling. In continuous domains, samples are drawn according to the normalized Lebesgue measure; in discrete domains, samples are drawn according to the normalized counting measure. Thus, the difference between the two settings is not algorithmic, but measure-theoretic.
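The theorem can be made concrete with a single evaluation routine parameterized only by the measure. Representing a measure on finitely many points by nonnegative weights is an illustrative device here (a midpoint-rule stand-in for the Lebesgue case), not part of the formal statement:

```python
# One evaluation routine, several measures: J_f depends only on the
# pair (f, mu), so only the weight vector changes between cases.
def J(f, points, mu):
    """Normalized integral (1/mu(A)) * integral_A f dmu over weighted points."""
    return sum(m * f(x) for m, x in zip(mu, points)) / sum(mu)

xs = [0.5, 1.5, 2.5, 3.5]              # cell centres of [0, 4], h = 1
f = lambda x: x

# "Lebesgue" case: each cell carries mass h = 1 (midpoint-rule surrogate).
J_lebesgue = J(f, xs, [1.0] * 4)       # mean of f on [0, 4] = 2
# Counting measure on the discrete set {0.5, 1.5, 2.5, 3.5}.
J_counting = J(f, xs, [1] * 4)         # arithmetic mean = 2
# Weighted measure (midpoint sharing) changes only the weights.
J_weighted = J(f, xs, [1, 1, 0.5, 0])  # (0.5 + 1.5 + 1.25) / 2.5 = 1.3
```

The routine never branches on whether the underlying space is continuous or discrete; the distinction lives entirely in the weight vector, which is the computational content of Theorem 5.1.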

5.2. Odd Partitioning as a Weighted Measure

The preceding theorem establishes the equivalence between continuous and discrete MOST at the level of ordinary measures. However, one subtle issue remains: the treatment of odd-cardinality partitions in discrete domains.
In continuous domains, a bisection at a midpoint generates two subdomains whose boundary overlap has Lebesgue measure zero. Therefore, whether the midpoint is assigned to the left or right subdomain is immaterial from the viewpoint of integration. In discrete domains, by contrast, a midpoint corresponds to an actual element of the finite set and therefore possesses positive counting measure. Consequently, a naive partition of an odd-cardinality set would introduce an artificial asymmetry.
To resolve this difficulty, we interpret odd partitioning through a weighted measure. Let
D = \{x_1, \ldots, x_{2m+1}\}
be a finite ordered set with odd cardinality. Define the left and right weighting functions w_L, w_R : D → [0, 1] by
w_L(x_i) =
\begin{cases}
1, & 1 \le i \le m, \\
\tfrac{1}{2}, & i = m + 1, \\
0, & i \ge m + 2,
\end{cases}
\qquad
w_R(x_i) =
\begin{cases}
0, & 1 \le i \le m, \\
\tfrac{1}{2}, & i = m + 1, \\
1, & i \ge m + 2.
\end{cases}
These weights induce weighted counting measures
\nu_L(A) = \sum_{x \in A} w_L(x), \qquad \nu_R(A) = \sum_{x \in A} w_R(x).
The corresponding regional evaluations are then given by
J_L = \frac{\int_D f \, d\nu_L}{\nu_L(D)} = \frac{\sum_{x \in D} w_L(x) f(x)}{\sum_{x \in D} w_L(x)},
\qquad
J_R = \frac{\int_D f \, d\nu_R}{\nu_R(D)} = \frac{\sum_{x \in D} w_R(x) f(x)}{\sum_{x \in D} w_R(x)}.
Since
\nu_L(D) = \nu_R(D) = m + \tfrac{1}{2},
both sides possess exactly the same effective measure. The midpoint is therefore shared symmetrically, and no directional bias is introduced into the regional comparison.
This construction yields the following result.
Theorem 5.2 (Justification of Midpoint Sharing via Weighted Measure)
Let D = { x 1 , , x 2 m + 1 } be an odd-cardinality ordered set. Then the symmetric sharing of the midpoint x m + 1 between the left and right subregions is rigorously represented by the weighted counting measures ν L and ν R defined in (50). In particular, the regional evaluation rule remains measure-theoretically consistent and preserves left–right symmetry.
Proof.
By construction, the midpoint x m + 1 contributes one half to each weighted measure, while all other points contribute fully to exactly one side. Thus, ν L ( D ) = ν R ( D ) , and the resulting evaluations (51)–(52) are normalized averages with equal effective mass. Therefore, midpoint sharing is equivalent to replacing the ordinary counting measure by two weighted counting measures, one for each side. This preserves the measure-theoretic interpretation of MOST and eliminates asymmetry due to odd cardinality.
Theorem 5.2 is important for two reasons. First, it provides a rigorous mathematical interpretation of the midpoint-sharing rule introduced in Chapter 4. Second, it shows that the apparent irregularity of odd partitioning does not require a separate algorithmic principle; rather, it is naturally absorbed into the same theory by allowing weighted measures. In this sense, odd partitioning is not an exception, but a direct extension of the general measure-theoretic framework.

5.3. Interpretation: Different Realizations of the Same Algorithm

The results above lead to a fundamental reinterpretation of MOST.
Theorem 5.3 (Structural Equivalence of Continuous and Discrete MOST)
Continuous MOST and discrete MOST are structurally equivalent algorithms. Their difference lies solely in the underlying measure used to define regional averaging and probabilistic sampling.
Proof.
From Theorem 5.1, both continuous and discrete MOST are generated by the same evaluation functional (45). From Chapter 3, the probability measure associated with a region A is
P_A(B) = \frac{\mu(B \cap A)}{\mu(A)}.
Thus, both the regional comparison and the Monte Carlo sampling mechanism are determined entirely by μ . Choosing μ as the Lebesgue measure yields the continuous implementation, whereas choosing μ as the counting measure yields the discrete implementation. Therefore, the two algorithms are structurally identical and differ only through the choice of measure.
This theorem allows us to state the principal interpretation of the present work: Continuous and discrete MOST are not separate optimization methods. They are distinct realizations of a single measure-based algorithm.
This insight is, in our view, the most important theoretical contribution of the present study. It implies that the boundary between continuous and discrete optimization is not intrinsic to the MOST framework. Rather, the framework is inherently indifferent to that distinction and accommodates both within the same abstract machinery.
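The measure dependence asserted by Theorem 5.3 can be illustrated with a minimal numerical sketch. The function and region below are our own illustrative choices, not taken from the paper's experiments: the same normalized-average functional is evaluated once under (an approximation of) the Lebesgue measure and once under the counting measure.

```python
import numpy as np

def J_lebesgue(f, a, b, n=100_000):
    """Regional average under the Lebesgue measure on A = [a, b],
    approximated by a uniform Riemann sum."""
    x = np.linspace(a, b, n)
    return np.mean(f(x))  # approximates (1/(b-a)) * integral of f over [a, b]

def J_counting(f, points):
    """Regional average under the counting measure on a finite set A."""
    points = np.asarray(points, dtype=float)
    return np.sum(f(points)) / len(points)

f = lambda x: x ** 2
# Continuous region [0, 1]: the exact average is 1/3.
print(J_lebesgue(f, 0.0, 1.0))
# Discrete region {0, 0.5, 1}: the average is (0 + 0.25 + 1)/3 = 5/12.
print(J_counting(f, [0.0, 0.5, 1.0]))
```

Both calls evaluate the same functional $J_f(A)$; only the underlying measure differs, which is precisely the content of the theorem.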
The same observation also suggests a natural path toward hybrid optimization. If the search space contains both continuous and discrete variables, one may endow it with a product measure composed of a Lebesgue measure on the continuous coordinates and a counting measure on the discrete coordinates. The regional evaluation principle then remains unchanged. Although such mixed-variable extensions are beyond the scope of the present paper, they emerge naturally from the unified theory developed here.

5.4. Summary

In this chapter, we have established the unified theory of continuous and discrete MOST.
The principal results are as follows:
  • A single measure-based evaluation functional, defined by normalized integration, generates both continuous and discrete MOST (Theorem 5.1).
  • Odd-cardinality partitioning in discrete domains is rigorously justified through weighted counting measures, thereby preserving symmetry and measure-theoretic consistency (Theorem 5.2).
  • Continuous and discrete MOST are structurally identical algorithms whose apparent differences arise solely from the underlying measure (Theorem 5.3).
These results collectively demonstrate that MOST is not merely adaptable to both continuous and discrete problems, but is inherently formulated at a higher level of abstraction in which both emerge as natural special cases.
This unified perspective provides the conceptual foundation for the subsequent discussion of theoretical limitations and numerical behavior.
Chapter 6. Theoretical Properties and Limitations of MOST
This chapter examines the theoretical properties and inherent limitations of the Monte Carlo Stochastic Optimization Technique (MOST). While the preceding chapters established MOST as a measure-based optimization framework with deterministic domain reduction and probabilistic evaluation, its effectiveness depends on structural properties of the objective function.
We therefore identify the conditions under which MOST is expected to perform reliably, as well as scenarios in which its performance may degrade. This analysis is essential for a rigorous understanding of the scope and applicability of the method.

6.1. Conditions for Effectiveness

The performance of MOST is governed by the relationship between the objective function and the underlying measure. In particular, the success of region-based selection depends on whether regions containing the global minimizer exhibit smaller average values than competing regions.

6.1.1. Unimodality

A fundamental condition under which MOST performs effectively is unimodality. Let x * X denote a global minimizer of f . We say that f is unimodal if, for any region A containing x * , the function does not exhibit competing local minima of comparable depth. Under unimodality, sufficiently small regions containing x * satisfy
$J_f(A) = \dfrac{1}{\mu(A)} \int_A f \, d\mu \approx f(x^*),$
while regions not containing x * have strictly larger average values. Consequently, the region selection mechanism of MOST consistently favors subsets containing the global minimizer.
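A small illustrative check (with an objective of our own choosing, not from the paper's code) confirms this behavior: for a unimodal function with minimizer $x^* = 2$, the half-region containing $x^*$ has the smaller regional average, so the selection rule retains it.

```python
import numpy as np

f = lambda x: (x - 2.0) ** 2          # unimodal, minimizer x* = 2
grid = np.linspace(-5.0, 5.0, 10001)

left = grid[grid < 0.0]               # region not containing x*
right = grid[grid >= 0.0]             # region containing x* = 2

J_left = f(left).mean()               # regional average over the left half
J_right = f(right).mean()             # regional average over the right half
print(J_left, J_right)                # J_right is markedly smaller
```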

6.1.2. Low-Value Regions with Positive Measure

A more general and practically relevant condition is that the set of near-optimal points has nonzero measure.
Define the sublevel set:
$S_\varepsilon = \{\, x \in X : f(x) \le f(x^*) + \varepsilon \,\}.$ If
$\mu(S_\varepsilon) > 0,$
then the global minimum is not isolated but embedded within a region of non-negligible measure. In this case, for regions A sufficiently aligned with S ε , we obtain
$J_f(A) \le f(x^*) + C\varepsilon,$
for some constant C > 0 , whereas regions disjoint from S ε yield strictly larger values. This ensures that the integral-based selection mechanism favors regions containing the global optimum.
This condition reflects the intrinsic strength of MOST: it is particularly effective when optimal solutions occupy a finite volume in the search space.

6.2. Fundamental Limitations

Despite its advantages, MOST exhibits inherent limitations that arise from its measure-based nature.

6.2.1. Isolated Minima (Needle Problem)

Consider the case where the global minimizer x * is isolated and the surrounding region has significantly larger function values. In such cases, the contribution of x * to the regional average is negligible. Let A be a region containing x * , and suppose that f ( x ) is large for almost all x A { x * } . Then
$J_f(A) \approx \dfrac{1}{\mu(A)} \int_{A \setminus \{x^*\}} f \, d\mu,$
which is essentially independent of the value at x * .
Thus, the presence of a single optimal point does not significantly influence the regional average. As a result, MOST may fail to identify regions containing such isolated minima.
This phenomenon is commonly referred to as the needle problem, and it represents a fundamental limitation of all measure-based optimization methods.
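The needle problem can be made concrete with a toy construction of our own (not drawn from the paper): a single deep minimum at $x = 0$ inside an otherwise flat, high-valued region barely moves the regional average.

```python
import numpy as np

def needle(x):
    """Flat background of value 10 with a single 'needle' of value -100
    at x = 0 (an assumed illustrative objective)."""
    return np.where(np.abs(x) < 1e-6, -100.0, 10.0)

grid = np.linspace(-1.0, 1.0, 20001)  # the grid contains x = 0 exactly
J = needle(grid).mean()
# One point out of 20001 contributes -100; all others contribute 10,
# so the regional average stays essentially at the background value.
print(J)  # ≈ 9.9945
```

Although the region contains the global minimum, its average is nearly indistinguishable from that of a region without it, which is exactly the failure mode described above.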

6.2.2. Multimodality

In highly multimodal functions, multiple regions may exhibit similar average values. Let A 1 , A 2 X be two disjoint regions such that
$J_f(A_1) \approx J_f(A_2).$
In this case, the selection decision becomes sensitive to sampling noise:
$\hat{J}_M(A_1) - \hat{J}_M(A_2) = O(M^{-1/2}).$
Thus, finite-sample fluctuations may cause the algorithm to select suboptimal regions, particularly in early stages when regions are large. Although repeated subdivision mitigates this issue, convergence may be slower in strongly multimodal landscapes.
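The $O(M^{-1/2})$ fluctuation scale can be observed directly. In the sketch below (our own illustrative setup), the Monte Carlo estimate of a regional average whose true value is $1/2$ is repeated for increasing sample sizes; the mean absolute error shrinks roughly like $M^{-1/2}$.

```python
import numpy as np

rng = np.random.default_rng(0)

f = lambda x: np.sin(5 * x) ** 2   # true average over [0, 2*pi] is 1/2

def J_hat(M):
    """Monte Carlo estimate of the regional average over [0, 2*pi]."""
    x = rng.uniform(0.0, 2 * np.pi, size=M)
    return f(x).mean()

for M in (100, 10_000, 1_000_000):
    errs = [abs(J_hat(M) - 0.5) for _ in range(50)]
    print(M, np.mean(errs))  # mean error shrinks roughly like M**-0.5
```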

6.3. Comparison with Other Optimization Methods

The theoretical properties of MOST can be further clarified by comparison with established optimization methods.

6.3.1. Gradient-Based Methods

Gradient-based methods rely on local derivative information [1,2,3,4]. Their convergence is typically fast in smooth and convex settings, but they are highly sensitive to initialization and prone to convergence toward local minima.
In contrast, MOST:
  • does not require differentiability
  • is inherently global due to region-based evaluation
  • is less sensitive to local irregularities

6.3.2. Evolutionary Algorithms (GA, PSO)

Evolutionary methods such as GA [8] and PSO [10] explore the search space through stochastic population dynamics. While they are robust to multimodality, their convergence is probabilistic and often requires extensive parameter tuning [12].
MOST differs in that:
  • it provides deterministic domain reduction
  • it does not require population-based search
  • it leverages averaging to stabilize evaluation

6.3.3. Structural Distinction

The fundamental difference between MOST and conventional methods can be summarized as follows:
Conventional methods: $\min_{x \in X} f(x),$
MOST: $\min_{A \subseteq X} J_f(A).$
Thus, MOST replaces pointwise optimization with region-based optimization. This distinction explains both its strengths (robustness to noise and local irregularities) and its limitations (reduced sensitivity to isolated extrema).

6.4. Summary

In this chapter, we have analyzed the theoretical properties and limitations of MOST.
The principal conclusions are:
  • MOST performs effectively when the objective function exhibits unimodality or when near-optimal regions have positive measure.
  • The method is inherently limited in problems with isolated global minima, where the contribution of the optimum to regional averages is negligible.
  • In multimodal landscapes, convergence may be influenced by sampling variability.
  • MOST fundamentally differs from conventional methods by optimizing regions rather than individual points.
This leads to the central insight of this study: MOST is not a pointwise optimization method, but a measure-based optimization framework. This perspective provides a coherent explanation of both the strengths and limitations of the method and clarifies its position within the broader landscape of optimization theory.
Chapter 7. Numerical Experiments: Discrete MOST Under Uniform Discretization in Ten Variables
This chapter presents numerical experiments designed to validate the theoretical framework of discrete MOST developed in the preceding chapters. In particular, we examine the performance of discrete MOST under uniform discretization of the search domain and compare the obtained solutions with the corresponding theoretical optima. Two benchmark functions are considered: the Ackley function and the Sphere function. These functions represent, respectively, a highly multimodal landscape and a convex unimodal landscape, thereby providing complementary test cases.

7.1. Experimental Setup

7.1.1. Search Domain

We consider a ten-dimensional domain:
$\Omega = [-5, 5]^{10}.$
No additional constraints are imposed, ensuring a uniform search space.

7.1.2. Discretization

The continuous domain is discretized uniformly with step size:
Δ = 0.1 .
Thus, the discrete set is:
$D = \{\, x_k = -5 + 0.1k \mid k = 0, 1, \ldots, 100 \,\}.$
The resulting grid contains:
$|D| = 101^{10}$ points.
This discretization transforms the optimization problem into a high-dimensional finite-domain problem suitable for discrete MOST.
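The per-axis grid can be constructed as follows (a short sketch; the rounding step is our own precaution against floating-point drift). Note that the origin lies exactly on the grid, a fact used in the analysis below.

```python
import numpy as np

delta = 0.1
# x_k = -5 + 0.1 k for k = 0, ..., 100; rounding cleans float artifacts
D = np.round(-5.0 + delta * np.arange(101), 10)

print(len(D))       # 101 points per coordinate
print(D[0], D[-1])  # -5.0 5.0
print(0.0 in D)     # True: the origin is exactly representable
# The full 10-D grid has 101**10 points and is never enumerated explicitly.
```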

7.1.3. Monte Carlo Sampling

For each subregion A D , samples are drawn uniformly:
$P(X = x_i) = \dfrac{1}{|A|}.$
The sample size is fixed as:
M = 1000 .

7.2. Test Functions

1) Ackley Function
The Ackley function in 10 variables is defined as:
$f(x) = -20 \exp\!\left(-0.2 \sqrt{\tfrac{1}{10} \sum_{i=1}^{10} x_i^2}\,\right) - \exp\!\left(\tfrac{1}{10} \sum_{i=1}^{10} \cos(2\pi x_i)\right) + 20 + e.$
Global minimum:
$x^* = (0, \ldots, 0), \quad f(x^*) = 0.$
2) Sphere Function
The Sphere function in 10 variables is defined as:
$f(x) = \sum_{i=1}^{10} x_i^2.$
Global minimum:
$x^* = (0, \ldots, 0), \quad f(x^*) = 0.$
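Both objectives are standard benchmarks and can be implemented directly from the definitions above; the code below is a straightforward transcription, not the authors' own implementation.

```python
import numpy as np

def ackley(x):
    """Ackley function in the form given above (n inferred from x)."""
    x = np.asarray(x, dtype=float)
    term1 = -20.0 * np.exp(-0.2 * np.sqrt(np.mean(x ** 2)))
    term2 = -np.exp(np.mean(np.cos(2.0 * np.pi * x)))
    return term1 + term2 + 20.0 + np.e

def sphere(x):
    """Sphere function: sum of squared coordinates."""
    x = np.asarray(x, dtype=float)
    return float(np.sum(x ** 2))

x0 = np.zeros(10)
# At the origin the Ackley value is zero up to floating-point rounding
# (a residual on the order of 1e-16 is typical), while sphere is exactly 0.
print(ackley(x0))
print(sphere(x0))
```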

7.3. Application of Discrete MOST

The discrete MOST algorithm described in Chapter 4 is applied to the discretized domain.
At each iteration:
  • The current discrete set is bisected.
  • Monte Carlo estimation (37) is performed.
  • The region with the smaller average value is selected.
  • The process continues until a single point remains.

7.4. Results: Ackley Function

7.4.1. Discrete Optimum

Due to discretization:
$x_d^* = (0.0, \ldots, 0.0).$
Thus, the theoretical discrete optimum coincides with the continuous optimum. In the present discretization scheme with resolution Δ = 0.1 , the point x = 0 is explicitly included in the discrete grid defined by (66). Therefore, the global minimizer of the continuous Ackley function, which is located at the origin, is exactly represented within the discrete search space.
This is a nontrivial but important property: in general, discretization introduces approximation errors, and the discrete optimum may differ from the continuous one. However, in this particular setup, the discretization grid is aligned with the true optimum, ensuring that no discretization bias is introduced at the level of the global solution.

7.4.2. MOST Result

The discrete MOST algorithm yields:
$x_{\mathrm{MOST}} = (0.0, \ldots, 0.0),$
$f_{\mathrm{MOST}} \approx 4.44 \times 10^{-16}.$
The discrete MOST algorithm converges to the exact global minimizer within the discretized domain. The resulting function value is numerically indistinguishable from zero, with a residual on the order of machine precision.
This slight deviation from the exact value f ( x * ) = 0 is not due to algorithmic error, but rather arises from floating-point arithmetic in the evaluation of exponential and trigonometric functions. In particular, the Ackley function involves nested exponential and cosine terms, which introduce small numerical rounding effects.
Importantly, the algorithm successfully navigates the highly multimodal structure of the Ackley function, avoiding numerous local minima and converging to the global optimum.

7.4.3. Error Analysis

Define the error:
$E = \| x_{\mathrm{MOST}} - x^* \|.$
Then:
E = 0 .
This is consistent with the discretization resolution. Since the discrete optimum coincides exactly with the continuous optimum and the algorithm successfully identifies this point, the resulting error is identically zero.
This result represents an ideal case in which both sources of error—discretization error and optimization error—are eliminated:
  • Discretization error is zero because the optimal point is included in the grid.
  • Algorithmic error is zero because MOST correctly identifies the optimal region.
In general, one expects the error to scale as O ( Δ ) , as discussed in Chapter 6. The present result therefore confirms that the theoretical error bound is sharp and that exact recovery is possible when the discretization grid contains the global minimizer.

7.4.4. Interpretation

  • The solution coincides with the true optimum.
  • The error is bounded by $O(\Delta)$.
  • The method successfully avoids local minima despite strong multimodality.
The results for the Ackley function demonstrate the robustness of discrete MOST in a challenging optimization landscape characterized by a large number of local minima.
First, the exact recovery of the global optimum confirms that the region-based selection mechanism is capable of isolating the correct region even in high-dimensional and highly oscillatory settings.
Second, the absence of error indicates that the discretization scheme does not degrade solution quality in this case. This highlights the importance of grid alignment with the global optimum.
Third, and most importantly, the algorithm avoids entrapment in local minima. This behavior can be attributed to the integral-based evaluation mechanism: local minima that occupy small regions contribute negligibly to the average value J f ( A ) , and are therefore systematically rejected during the recursive partitioning process.

7.5. Results: Sphere Function

7.5.1. Discrete Optimum

$x_d^* = (0, \ldots, 0).$
As in the Ackley case, the global minimizer of the Sphere function is located at the origin, which is explicitly included in the discretized grid. Therefore, the discrete and continuous optima coincide exactly.

7.5.2. MOST Result

$x_{\mathrm{MOST}} = (0.0, \ldots, 0.0),$
$f_{\mathrm{MOST}} = 0.$
The discrete MOST algorithm converges exactly to the global minimizer. Unlike the Ackley function, the Sphere function is a smooth, convex, and unimodal function. As a result, the integral-based evaluation aligns perfectly with the pointwise objective structure, and convergence occurs without any ambiguity or numerical instability.

7.5.3. Estimation of Error

E = 0 .
The zero error reflects both exact representation of the optimum in the discretized domain and perfect identification by the algorithm. This confirms that MOST achieves exact convergence under ideal conditions.

7.5.4. Interpretation

  • Exact recovery of the optimum.
  • Confirms correctness under unimodality.
The Sphere function provides a baseline case for evaluating the correctness of the algorithm. Since the function is strictly convex and possesses a unique global minimum, any consistent global optimization method should converge to the correct solution.
The results confirm that MOST satisfies this requirement. Moreover, the smooth structure of the function ensures that the regional average J f ( A ) decreases monotonically as the region approaches the optimum, leading to stable and deterministic convergence.
This behavior is consistent with the theoretical conditions outlined in Chapter 6, where unimodality guarantees the effectiveness of region-based optimization.

7.6. Comparative Summary

Table 1 summarizes the performance of the discrete MOST algorithm on the two benchmark functions considered in this study. For both the Ackley function and the Sphere function in ten dimensions, the algorithm successfully identifies the global optimum within the discretized search domain.
Several key observations can be made:
First, the MOST solution coincides exactly with the theoretical optimum in both cases. This confirms that the algorithm is capable of resolving the global minimum even in high-dimensional settings, provided that the optimal point is included in the discretization grid.
Second, the error is identically zero for both functions. This result represents the ideal scenario in which both discretization error and optimization error vanish. In general, as discussed in Chapter 6, the error is expected to scale as O ( Δ ) ; however, when the discretization grid contains the true optimum and the algorithm successfully isolates the corresponding region, exact recovery is achieved.
Third, the results highlight the robustness of MOST across different landscape structures. The Sphere function represents a convex unimodal problem, whereas the Ackley function is highly multimodal with numerous local minima. Despite this fundamental difference, the algorithm exhibits consistent performance, indicating that the measure-based regional evaluation effectively suppresses the influence of local irregularities.
Finally, the consistency of results across both test functions supports the theoretical framework developed in Chapters 3–6. In particular, it confirms that discrete MOST operates as a faithful realization of the unified measure-theoretic formulation, even in high-dimensional spaces.

7.7. Discussion

The results confirm the theoretical predictions:
  • Discretization error: $E = O(\Delta)$, indicating that accuracy is controlled by grid resolution.
  • Robustness to multimodality: the Ackley function demonstrates that MOST is not trapped by local minima, owing to integral averaging.
  • Consistency with theory: the results align with Chapter 6, in that the method works well when low-value regions have positive measure and accuracy improves with finer discretization.

7.8. Summary

In this chapter, we have demonstrated that:
  • Discrete MOST successfully approximates global optima under uniform discretization.
  • The error is bounded by the discretization step size.
  • The method is robust to multimodal landscapes.
  • The theoretical framework developed in Chapters 3–6 is validated numerically.
These results confirm that discrete MOST provides a practical and theoretically consistent extension of the continuous framework.
Chapter 8. Discussion
This chapter provides a conceptual interpretation of the Monte Carlo Stochastic Optimization Technique (MOST) and discusses its robustness and future extensions. Building upon the measure-theoretic framework and numerical validation presented in the preceding chapters, we position MOST within a broader theoretical and practical context.

8.1. Physical and Geometric Interpretation

The measure-theoretic formulation of MOST reveals a fundamental reinterpretation of optimization itself. Conventional optimization methods operate by searching for points that minimize the objective function. In contrast, MOST evaluates regions, assigning to each region a value determined by a measure-weighted average.
This distinction can be expressed formally as:
Classical optimization: $\min_{x \in X} f(x),$
MOST: $\min_{A \subseteq X} \dfrac{1}{\mu(A)} \int_A f \, d\mu.$
Thus, MOST replaces pointwise evaluation with measure-based aggregation. From a geometric viewpoint, this corresponds to evaluating the objective function not at isolated points, but over finite-volume regions. The optimization process therefore becomes a sequence of geometric refinements, in which the search domain is progressively contracted toward regions of lower average energy.
From a physical perspective, the integral in (85) may be interpreted as a form of coarse-grained energy. Rather than being sensitive to infinitesimal fluctuations, MOST captures the macroscopic structure of the objective landscape. This interpretation explains its robustness in complex, multimodal environments: narrow local minima contribute negligibly to the integral unless they occupy a region of non-negligible measure.
Consequently, MOST naturally emphasizes stable structures in the search space—regions where low values persist over a measurable volume—rather than isolated extrema.

8.2. Robustness and Noise Tolerance

A direct consequence of the measure-based formulation is robustness to noise and irregularity. Consider an objective function perturbed by stochastic noise:
$f_\eta(x) = f(x) + \eta(x),$
where η ( x ) is a zero-mean random perturbation. Under regional averaging, we obtain:
$J_{f_\eta}(A) = J_f(A) + \dfrac{1}{\mu(A)} \int_A \eta(x) \, d\mu(x).$
Since the noise term averages out, we have:
$\mathbb{E}\left[ J_{f_\eta}(A) \right] = J_f(A).$
Moreover, the variance decreases with increasing sample size:
$\mathrm{Var}\left( \hat{J}_M(A) \right) = O(M^{-1}).$
This averaging effect implies that MOST is inherently resistant to:
  • stochastic noise
  • measurement errors
  • high-frequency oscillations
Unlike gradient-based methods, which are highly sensitive to local perturbations, MOST stabilizes the evaluation through integration. This property is particularly advantageous in practical engineering problems where objective functions are often noisy or derived from simulation outputs.
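The noise-cancellation property can be verified numerically. In the sketch below (our own illustrative setup, with an assumed noise level $\sigma = 0.5$), zero-mean Gaussian noise added to the objective leaves the Monte Carlo regional average nearly unchanged, with the discrepancy at the $O(M^{-1/2})$ scale.

```python
import numpy as np

rng = np.random.default_rng(2)

f = lambda x: x ** 2
a, b, M = 0.0, 1.0, 200_000

x = rng.uniform(a, b, size=M)
eta = rng.normal(0.0, 0.5, size=M)   # zero-mean noise, sigma = 0.5

J_clean = np.mean(f(x))              # regional average of f, true value 1/3
J_noisy = np.mean(f(x) + eta)        # the noise averages out
print(J_clean, J_noisy, abs(J_clean - J_noisy))
```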

8.3. Future Extensions

The unified measure-theoretic framework developed in this study naturally suggests several extensions.

8.3.1. Mixed Continuous–Discrete Optimization

Since MOST is defined on a general measure space, hybrid optimization problems can be treated by introducing product measures:
$\mu = \lambda \times \nu,$
where λ is a Lebesgue measure on continuous variables and ν is a counting measure on discrete variables.
The evaluation functional remains unchanged:
$J_f(A) = \dfrac{1}{\mu(A)} \int_A f \, d\mu.$
This provides a principled framework for mixed-integer optimization problems without requiring separate algorithmic strategies.
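A sketch of the product-measure idea (with a mixed region and objective of our own choosing): the continuous coordinate is sampled under the normalized Lebesgue measure and the discrete coordinate under the counting measure, and the regional average is formed exactly as before.

```python
import numpy as np

rng = np.random.default_rng(3)

def J_hat_mixed(f, M=100_000):
    """Monte Carlo regional average over A = [0, 1] x {0, 1, 2}
    under the product of Lebesgue and counting measures."""
    x_cont = rng.uniform(0.0, 1.0, size=M)   # Lebesgue part
    x_disc = rng.choice([0, 1, 2], size=M)   # counting part
    return np.mean(f(x_cont, x_disc))

f = lambda xc, xd: xc ** 2 + xd              # simple mixed objective
# Exact value: E[xc^2] + E[xd] = 1/3 + 1 = 4/3
print(J_hat_mixed(f))
```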

8.3.2. High-Dimensional Optimization

High-dimensional optimization remains a central challenge in modern applications. In such settings, the curse of dimensionality affects all global optimization methods.
Within the MOST framework, this issue manifests in the exponential growth of partition complexity. However, the measure-based formulation suggests potential strategies:
  • adaptive partitioning
  • dimension-wise decomposition
  • importance sampling
In particular, the integral formulation may be combined with variance reduction techniques to improve efficiency in high-dimensional spaces.

8.3.3. Adaptive and Anisotropic Partitioning

The current formulation employs uniform bisection. However, the framework allows for more general partitioning strategies:
$\Omega^k \to \{ A_1^k, A_2^k, \ldots \}.$
By adapting the partitioning scheme to the local structure of f , one may improve convergence speed and accuracy.

8.4. Central Insight

The theoretical and numerical results of this study lead to the following fundamental conclusion: MOST is a measure-based optimization framework, not a pointwise optimization method. This statement encapsulates the essential distinction between MOST and conventional optimization techniques. By shifting the focus from points to regions, MOST provides a new perspective on global optimization that is inherently robust, scalable, and adaptable.

8.5. Concluding Remarks of Discussion

The measure-theoretic reinterpretation presented in this work elevates MOST from a heuristic algorithm to a unified theoretical framework. This perspective clarifies both its strengths—robustness, globality, and simplicity—and its limitations, particularly in the presence of isolated minima.
More importantly, it establishes a conceptual bridge between continuous and discrete optimization, suggesting that both can be understood as manifestations of a deeper, measure-based principle.
This insight opens new directions for research and application, particularly in hybrid optimization, high-dimensional systems, and stochastic environments.
Chapter 9. Conclusion
In this study, we have established a unified theoretical framework for the Monte Carlo Stochastic Optimization Technique (MOST) based on measure-theoretic principles. By reformulating MOST in terms of normalized integrals over measurable regions, we have demonstrated that both continuous and discrete optimization problems can be treated within a single, coherent mathematical structure.
A central achievement of this work is the rigorous construction of discrete MOST. By introducing the counting measure and extending it to weighted measures, we have shown that discrete domain partitioning—including the nontrivial case of odd cardinality with midpoint sharing—can be handled consistently within the same theoretical framework. This result confirms that discrete MOST is not a heuristic extension, but a natural consequence of the underlying measure-based formulation.
Furthermore, we have established that continuous and discrete MOST are structurally equivalent algorithms whose differences arise solely from the choice of measure. This unification provides a new perspective on optimization, in which the distinction between continuous and discrete domains is no longer fundamental, but instead reflects different realizations of a common principle.
The numerical experiments conducted using benchmark functions such as the Ackley and Sphere functions have validated the theoretical predictions. In particular, the results demonstrate that discrete MOST achieves accurate approximations of global optima, with errors governed by discretization resolution, and exhibits robustness in multimodal landscapes.
Beyond its theoretical contributions, the present framework suggests broad applicability. The measure-theoretic formulation naturally extends to mixed-variable optimization, high-dimensional problems, and stochastic environments, offering a flexible and scalable approach to complex optimization tasks.
In conclusion, this work establishes MOST as a measure-based optimization paradigm that unifies continuous and discrete optimization within a single rigorous framework. This perspective not only clarifies the fundamental nature of MOST but also opens new avenues for research and application in global optimization theory and practice.

Appendix A. Discrete MOST Algorithm for Numerical Experiments

A.1 Purpose and Relation to the Main Text
This appendix provides a detailed description of the discrete MOST algorithm used in Chapter 7 for numerical experiments. The formulation is directly based on the measure-theoretic framework introduced in Chapter 3 and the discrete construction developed in Chapter 4.
In particular:
  • The evaluation functional corresponds to (29).
  • The Monte Carlo estimator corresponds to (37).
  • The selection rule corresponds to (39).
The purpose of this appendix is to ensure full reproducibility of the numerical results.
A.2 Discrete Domain Construction
Let the search domain be defined as:
$\Omega = [-5, 5]^n.$
Each coordinate is discretized with step size:
Δ = 0.1 .
Thus, the one-dimensional discrete set is:
$D = \{\, x_k = -5 + k\Delta \mid k = 0, 1, \ldots, 100 \,\}.$
The full search space is given by the Cartesian product:
$D^n = D \times D \times \cdots \times D.$
A.3 Partitioning Strategy
At each iteration, the current candidate set for each variable is partitioned. Let D j k denote the candidate set of the j -th variable at iteration k .
A.3.1 Even Cardinality
If | D j k | = 2 m , then:
$D_{j,L}^k = \{ x_1, \ldots, x_m \}, \quad D_{j,R}^k = \{ x_{m+1}, \ldots, x_{2m} \}.$
A.3.2 Odd Cardinality
If | D j k | = 2 m + 1 , define:
$D_{j,L}^k = \{ x_1, \ldots, x_m, x_{m+1} \}, \quad D_{j,R}^k = \{ x_{m+1}, \ldots, x_{2m+1} \}.$
The midpoint x m + 1 is shared between both subsets.
A.4 Monte Carlo Sampling
For a given subset A D n , sampling is performed according to the discrete uniform distribution:
$P(X = x_i) = \dfrac{1}{|A|}.$
For odd partitioning, weighted sampling is used:
$P(X = x_i) = \dfrac{w(x_i)}{\sum_{x \in A} w(x)}.$
A.5 Monte Carlo Estimation
The regional evaluation is approximated by:
$\hat{J}_M(A) = \dfrac{1}{M} \sum_{i=1}^{M} f(X_i), \quad X_i \sim P_A.$
A.6 Algorithm Description
We now present the complete discrete MOST algorithm used in Chapter 7.
Algorithm A.1 (Discrete MOST)
Input:
  • Initial domain $D^0 = D^n$
  • Sample size $M$
Step 1 (Initialization):
Set $k = 0$.
For each variable j = 1 , , n , set:
$D_j^0 = D.$
Step 2 (Partitioning):
For each variable j , partition D j k into:
$D_{j,L}^k, \quad D_{j,R}^k.$
Step 3 (Monte Carlo Evaluation):
Estimate:
$\hat{J}_M(D_{j,L}^k), \quad \hat{J}_M(D_{j,R}^k).$
Step 4 (Selection):
Update:
$D_j^{k+1} = D_{j,L}^k$ if $\hat{J}_M(D_{j,L}^k) \le \hat{J}_M(D_{j,R}^k)$, and $D_j^{k+1} = D_{j,R}^k$ otherwise.
Step 5 (Iteration):
$k \leftarrow k + 1.$
Repeat Steps 2–5 until:
$|D_j^k| = 1 \quad \forall j.$
Step 6 (Output):
Return:
$x^* = (x_1^*, \ldots, x_n^*),$
where x j * is the unique element of D j k .
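Algorithm A.1 may be transcribed into Python as follows. This is a sketch under stated assumptions: the sampler, seed, two-variable test objective, and the handling of weights after an even split are our own choices; midpoint sharing with half weights follows the weighted sampling of A.4.

```python
import numpy as np

rng = np.random.default_rng(4)

def split(g):
    """Bisect a sorted 1-D candidate set. Odd-cardinality sets share
    the midpoint, with sampling weight 1/2 on each side (A.4)."""
    n, m = len(g), len(g) // 2
    if n % 2 == 0:
        return g[:m], g[m:], np.ones(m), np.ones(m)
    left, right = g[:m + 1], g[m:]
    wl = np.ones(m + 1); wl[-1] = 0.5   # shared midpoint, half weight
    wr = np.ones(m + 1); wr[0] = 0.5
    return left, right, wl, wr

def most(f, D, n_vars, M=1000):
    """Dimension-wise recursive reduction (Steps 2-5 of Algorithm A.1)."""
    sets = [np.array(D, dtype=float) for _ in range(n_vars)]
    weights = [np.ones(len(D)) for _ in range(n_vars)]
    while any(len(s) > 1 for s in sets):
        for j in range(n_vars):
            if len(sets[j]) == 1:
                continue
            L, R, wL, wR = split(sets[j])
            def J_hat(cand, w):
                # sample each coordinate from its current (weighted) set
                cols = []
                for i in range(n_vars):
                    g, p = (cand, w) if i == j else (sets[i], weights[i])
                    cols.append(rng.choice(g, size=M, p=p / p.sum()))
                X = np.stack(cols, axis=1)
                return float(np.mean([f(x) for x in X]))
            if J_hat(L, wL) <= J_hat(R, wR):
                sets[j], weights[j] = L, wL
            else:
                sets[j], weights[j] = R, wR
    return np.array([s[0] for s in sets])

D = np.round(-5.0 + 0.1 * np.arange(101), 10)
sphere = lambda x: float(np.sum(np.asarray(x) ** 2))
x_star = most(sphere, D, n_vars=2, M=1000)
print(x_star)  # expected at or near the origin
```

A two-variable Sphere objective is used here only to keep the run fast; the routine accepts any `n_vars` and objective defined on the grid.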
A.7 Computational Remarks
  • The algorithm performs dimension-wise recursive reduction.
  • The total number of iterations is bounded by $O(\log_2 |D|)$.
  • The computational cost is dominated by Monte Carlo sampling: $O(nMK)$, where $K$ is the number of iterations.
A.8 Summary
This appendix provides a complete and reproducible description of the discrete MOST algorithm used in the numerical experiments. The formulation is fully consistent with the measure-theoretic framework developed in the main text and directly supports the results presented in Chapter 7.

References

  1. Nocedal, J.; Wright, S. Numerical Optimization; Springer, 2006.
  2. Boyd, S.; Vandenberghe, L. Convex Optimization; Cambridge Univ. Press, 2004.
  3. Bertsekas, D. Nonlinear Programming; Athena Scientific, 1999.
  4. Conn, A.; Gould, N.; Toint, P. Trust Region Methods; SIAM, 2000.
  5. Horst, R.; Pardalos, P. Handbook of Global Optimization; Springer, 1995.
  6. Floudas, C. Deterministic Global Optimization; Springer, 2000.
  7. Spall, J. Introduction to Stochastic Search and Optimization; Wiley, 2003.
  8. Holland, J. Adaptation in Natural and Artificial Systems; MIT Press, 1975.
  9. Storn, R.; Price, K. Differential Evolution. J. Global Optimization 1997, 11, 341–359.
  10. Kennedy, J.; Eberhart, R. Particle Swarm Optimization. Proc. IEEE ICNN, 1995.
  11. Hansen, N. CMA-ES Evolution Strategy. In Evolutionary Computation; 2003.
  12. Bäck, T. Evolutionary Algorithms in Theory and Practice; Oxford Univ. Press, 1996.
  13. Robbins, H.; Monro, S. Stochastic Approximation. Ann. Math. Stat. 1951.
  14. Kushner, H.; Yin, G. Stochastic Approximation; Springer, 2003.
  15. Jones, D.; Schonlau, M.; Welch, W. Efficient Global Optimization. J. Global Optimization 1998.
  16. Shahriari, B. Bayesian Optimization Review. Proc. IEEE, 2016.
  17. Snoek, J. Practical Bayesian Optimization. NIPS 2012.
  18. Lawler, E.; Wood, D. Branch-and-Bound Methods. Operations Research 1966.
  19. Jones, D. DIRECT Algorithm. J. Optimization Theory Appl. 1993.
  20. Piyavskii, S. Lipschitz Optimization. USSR Comput. Math. 1972.
  21. Deb, K. NSGA-II. IEEE Trans. Evolutionary Computation 2002.
  22. Zitzler, E.; Thiele, L. SPEA2. TIK Report 2001.
  23. Karush, W. Minima of Functions; 1939.
  24. Kuhn, H.; Tucker, A. Nonlinear Programming. Proc. Berkeley Symp., 1951.
  25. Miettinen, K. Nonlinear Multiobjective Optimization; Springer, 1999.
  26. Inage, S.; Hebishima, H. Monte Carlo Stochastic Optimization Technique (MOST). Mathematics and Computers in Simulation 2022, 199, 257–271.
  27. Inage, S.; Ohgi, S.; Takahashi, Y. Multi-objective MOST. Mathematics and Computers in Simulation 2024, 215, 146–157.
  28. Inage, S.; Ajito, K. A Deterministic Global Optimization Framework via Monte Carlo Region Integration. Preprint 2024.
  29. Hoeffding, W. Probability Inequalities for Sums of Bounded Random Variables. Journal of the American Statistical Association 1963, 58(301), 13–30.
Table 1. Comparison of discrete MOST solutions and theoretical optima for 10-dimensional benchmark functions under uniform discretization ( Δ = 0.1 ).
Function True Optimum MOST Solution Error
Ackley (10D) (0,…,0) (0,…,0) 0
Sphere (10D) (0,…,0) (0,…,0) 0