Preprint
Article

This version is not peer-reviewed.

A Certificate Layer for Multi-Objective Training Across Refinement Scales

Submitted:

21 December 2025

Posted:

22 December 2025


Abstract
Modern training pipelines are governed by multiple coupled nonnegative metrics (performance, constraints, robustness, calibration, and compute budgets) and are rerun across refinement ladders (width/depth scaling, discretization, basis growth, and data-fidelity upgrades). This paper develops a certificate layer for such settings. First, a Metzler comparison system for the ledger vector admits a one-clock reduction: if a Hurwitz witness exists, then a declared positive scalarization contracts exponentially, with explicit two-ledger small-gain formulas. Second, a Master Certificate upgrades four auditable programme lines verified in a single declared ruler---summable tail disturbances, a uniform contraction margin, projective Cauchy consistency, and a uniform dictionary for reported readouts---to existence and uniqueness of a refinement-limit learner on the certification window, together with rate inheritance and readout transport. The framework yields a proof-carrying stability grammar for learning under updates and refinement, intended to compose with external generalization and robustness modules.

1. Introduction

Modern training pipelines are governed by multiple coupled nonnegative quantities: predictive performance, constraint and safety violations, robustness proxies, calibration errors, and compute/latency budgets. In multi-task and multi-constraint settings these ledgers can genuinely compete, so a single weighted loss is often only a proxy for what a pipeline is required to guarantee [1,2]. A second ubiquitous feature is refinement: the same pipeline is rerun across a ladder of resolutions (width/depth scaling [3], discretization refinement, feature-basis growth, data fidelity upgrades). In scientific computing, refinement limits are meaningful only once all levels are compared in a fixed energy norm and stable transfer maps are specified [4,5,6].

1.1. Problem and Certificate Goal

Problem (multi-ledger learning under refinement). Training is repeated across refinement levels, producing (i) learner trajectories/iterates and (ii) a vector of nonnegative ledgers tracking performance, constraints, robustness, calibration, and resource budgets. We seek auditable conditions under which:
(i)
a refinement-limit learner exists and is unique in a fixed, declared ruler, and
(ii)
the multi-ledger evolution admits a single certified contraction clock (a uniform exponential rate class) that does not degrade as the refinement index grows.
The formal ladder objects (state spaces, coarse maps/projections, and the declared ruler) are fixed once in Section 2. The two theorem engines that turn auditable inequalities into guarantees are proved in Sections 3 and 4, with instantiations in Section 5.

1.2. Context and Gap

Two adjacent traditions motivate the problem but do not, by themselves, provide a checker-facing interface.

Multiobjective learning (within a fixed representation).

Pareto structure and scalarization are classical in multiobjective optimization [1], and modern multi-task learning can be framed as multiobjective training with gradient-based trade-off schemes [2]. This line clarifies how to negotiate competing objectives at a given scale; it does not provide a refinement-consistent, verifier-friendly implication that survives changes of resolution.

Refinement limits (stability in a fixed norm).

In numerical analysis and multilevel methods, convergence across discretizations becomes meaningful only after fixing a single norm/ruler and stable transfer operators, as in stability–consistency paradigms and energy-norm FEM/multigrid constructions [4,5,6,7]. Semigroup approximation theory plays a related role for evolution families [8,9,10]. What is missing for modern training loops is a compact certificate interface that simultaneously (i) handles several interacting nonnegative quantities and (ii) remains coherent along a refinement ladder, returning a single auditable decay rate.

1.3. State of the Art, Approach, and Contributions

The paper sits at the intersection of three adjacent traditions.

State of the art (adjacent lines).

(i) Multiobjective learning and scalarization. The standard formal language for competing objectives is multiobjective optimization and Pareto trade-offs [1]; in modern ML, multi-task procedures can be interpreted as multiobjective updates with explicit trade-off mechanisms [2]. This line clarifies how to balance objectives at a fixed representation level, but it does not by itself provide a verifier-facing implication that survives changing resolution.
(ii) Positive systems and comparison principles. Metzler matrices, order-preserving comparison, and Perron–Frobenius structure form the classical backbone of positive linear systems and cooperative dynamics [11,12,13]. Copositive (linear) Lyapunov functions and robust/synthesis viewpoints for positive systems make the contraction certificates particularly witness-friendly [14]. What is missing for ML pipelines is an explicit “certificate grammar” that turns such comparison bounds into a shipped artifact and a small checker.
(iii) Refinement ladders and stable limits. In numerical analysis, meaningful convergence across discretizations requires a fixed measurement convention (energy norm) and stable transfer operators; this underlies stability–consistency paradigms and multilevel methods [4,5,6,7]. Abstract semigroup approximation theory provides a complementary operator-theoretic view of consistent approximations of evolution families [8,9,10]. These traditions explain why “one ruler” is non-negotiable, but they do not natively address coupled multi-ledger training objectives nor provide a checker interface for ML systems.

Approach (two theorem engines).

We separate the solution into two theorem engines: an in-level reduction that collapses many nonnegative ledgers to one certified clock, and a cross-level reduction that turns summable refinement discrepancies into a limit object in one ruler.

Engine A (positivity ⇒ one clock).

At each level $K$ we target a cooperative comparison inequality of Metzler type for the ledger vector, possibly with a nonnegative remainder. In this setting, positive-systems theory supplies checker-friendly copositive Lyapunov witnesses and Perron–Frobenius structure [11,12,13,14]. The certificate layer is organized so that a prover can ship a short witness $(w, \lambda)$ and a verifier can check the one-clock claim by a finite set of inequalities (Section 3).

Engine B (one ruler + summable discrepancies ⇒ refinement limit).

Across levels, we realize $X^{(K)}$ inside a common ambient ruler space and compare levels only after applying the declared coarse projection. A summable tower defect (projective Cauchy condition) forces a Cauchy tower in the ambient ruler and therefore a unique refinement-limit learner on the declared window (Section 4). This is the refinement-ladder analogue of the basic "summable increments ⇒ convergence" mechanism in Banach spaces [15].

Contributions (with pointers).

C1:
One-clock reduction for nonnegative ledgers. We develop a checker form of the Metzler/Hurwitz implication: a copositive witness yields exponential contraction of a declared scalar ledger, with an explicit two-ledger small-gain boundary and monotone design levers (Section 3).
C2:
Master Certificate across refinement ladders. We prove that four auditable programme lines in one ruler (tail-robust envelope, uniform margin, projective Cauchy tower, uniform dictionary) imply existence/uniqueness of a refinement-limit learner on $[0,T]$ together with rate inheritance and readout transport (Section 4).
C3:
Instantiations. We give (i) a fully checkable toy ladder with explicit constants and (ii) a width-ladder protocol sketch indicating how to populate the certificate from logged traces (Section 5).
C4:
Scope wall and composability. The proved guarantees are training-time stability and refinement-limit existence in a declared ruler. Statistical generalization, out-of-distribution shift, and stochastic-optimizer noise are not asserted unless they are explicitly ledgerized or imported as separate modules (e.g. stability/PAC-Bayes and robustness/verification frameworks) [16,17,18,19,20,21,22].

Certificate interface.

The paper adopts a proof-carrying viewpoint: the producer supplies a certificate artifact (declared ruler, transfer maps, ledger definitions, budgets, and—when applicable—a Metzler witness), and a consumer validates it with a small checker, analogous in spirit to proof-carrying code [23].

2. Problem: Multi-Metric Learning on Refinement Ladders

This section fixes the formal objects referenced by the certificate obligations. We specify (i) a refinement-indexed ladder of state spaces realized inside a single declared ambient ruler, (ii) coherent coarse-graining/projection maps along the ladder, (iii) training dynamics (discrete updates or continuous flows) on each level, and (iv) ledger/readout maps that turn states into multiple nonnegative metrics. This setup is chosen so that the programme lines (O1)–(O4) in Section 4 can be stated unambiguously and checked mechanically.

2.1. Learner State, Dynamics, and a Single Ambient Ruler

2.1.1. Refinement-Indexed State Spaces and One Ruler

Fix a refinement index set $\mathcal{K} = \{K_0, K_0+1, \dots\}$. For each $K \in \mathcal{K}$, let $X^{(K)}$ be a metric space of learner states at resolution $K$ (parameters, buffers, optimizer state, constraint multipliers, etc.). We assume the ladder embeds into an ambient normed space $(X^{(\infty)}, \|\cdot\|)$ with a single ruler:
$$X^{(K)} \subseteq X^{(\infty)} \quad \text{for all } K,$$
and we fix projections $P_K : X^{(\infty)} \to X^{(K)}$ such that
$$P_K P_L = P_K \ (K \le L), \qquad \|P_K z\| \le \|z\|, \qquad P_K z \to z \ \text{in } X^{(\infty)} \text{ as } K \to \infty.$$
The identities in (1) encode a coherent ladder geometry: coarse-graining is consistent along the ladder and contractive in the declared ruler.

2.1.2. Training Dynamics: Discrete Updates and Continuous Flows

At each K, training is described either by a discrete-time update map
$$x^{(K)}_{n+1} = \Phi^{(K)}\big(x^{(K)}_n, \xi_n\big), \qquad n = 0, 1, 2, \dots,$$
or by a continuous-time (possibly differential-inclusion) flow
$$\dot x^{(K)}(t) = F^{(K)}\big(x^{(K)}(t), t\big), \qquad t \in [0, T].$$
Here ξ n denotes exogenous randomness (mini-batch sampling, dropout masks, noise injections) when present. For (2) we assume Φ ( K ) ( · , ξ ) is measurable (often locally Lipschitz). For (3) we assume F ( K ) is measurable in time and locally Lipschitz in state, ensuring existence/uniqueness of trajectories on [ 0 , T ] [24,25]. Stochasticity can be absorbed into the certificate layer as a nonnegative disturbance budget; see Grönwall-type propagation [26] and stochastic approximation frameworks [27,28,29].

2.1.3. State Augmentation

The state x ( K ) may include optimizer momentum, running moments (Adam/EMA), constraint multipliers, safety filters, or internal solver iterates. Analytically, this is harmless: auxiliary variables can be stacked into a product space and measured with a product norm, and many stability properties are naturally expressed on the augmented state [25].

2.2. Ledgers and Reported Readouts

2.2.1. Ledger Maps and Trajectories

Fix an integer $m \ge 1$. For each refinement level $K$, a ledger family is a vector of nonnegative maps
$$R^{(K)} : X^{(K)} \to \mathbb{R}^m_{\ge 0}, \qquad R^{(K)}(x) = \big(R^{(K)}_1(x), \dots, R^{(K)}_m(x)\big).$$
Each $R^{(K)}_i$ is a measurable functional representing a tracked metric (risk proxy, constraint residual, robustness proxy, calibration error, compute/latency deficit, etc.). Given a trajectory $\tau \mapsto x^{(K)}(\tau)$ (discrete $\tau = n$ or continuous $\tau = t$), we write
$$r^{(K)}(\tau) := R^{(K)}\big(x^{(K)}(\tau)\big) \in \mathbb{R}^m_{\ge 0}.$$

2.2.2. Total Ledgers (Certificate-Compatible Scalarizations)

A total ledger is any positive weighting of the ledger vector:
$$R^{(K)}_{\mathrm{tot}}(\tau) := w^\top r^{(K)}(\tau), \qquad w > 0 \ \text{(entrywise)}.$$
There are two complementary interpretations:
(i)
Design aggregation: w is chosen by a practitioner to reflect priorities (a scalarization choice) [1,2].
(ii)
Certified aggregation: $w$ is produced by the one-clock reduction once a Metzler comparison inequality is established (a copositive Lyapunov weight), so that $R^{(K)}_{\mathrm{tot}}$ decays with a certified exponent when the comparison matrix is Hurwitz [11,12,14].

2.2.3. Reported Readouts and Dictionary Conditioning

Refinement changes representation, so not all measured quantities are automatically comparable across K. We distinguish:
  • Geometric ledgers defined by restriction of the ambient ruler (automatically consistent under refinement), as in stable discretization theory [4,6].
  • Reported/readout metrics depending on a K-dependent apparatus (validation sets, simulator fidelity, feature maps). These require an explicit dictionary condition later (programme line (O4)) to prevent ill-conditioning from faking improvement.

2.2.4. A Minimal Regularity Interface

To transport convergence in the ruler to convergence of reported readouts, it is often enough to assume local Lipschitz control on the certification window: for each $K$ and each $i$,
$$\big|R^{(K)}_i(x) - R^{(K)}_i(y)\big| \le L_i\, \|x - y\|,$$
with constants $L_i$ independent of $K$ on the relevant bounded set. Uniformity prevents refinement from silently changing the scale of what is being measured [8,10].

2.3. Cross-Level Maps and the “Same Task” Requirement

2.3.1. Projective Structure

Assume there exist coarse-graining/projection maps
$$\Pi_{K \leftarrow K+1} : X^{(K+1)} \to X^{(K)},$$
representing restriction, pruning, averaging, or projection of refined states to coarser representations. We require consistency (a projective system):
$$\Pi_{K \leftarrow K+2} = \Pi_{K \leftarrow K+1} \circ \Pi_{K+1 \leftarrow K+2} \quad \text{for all } K \ge K_0.$$
This encodes that all levels represent the same task object, just at different resolution; see, e.g., FEM and multilevel treatments [4,5,6] and extensions beyond nested families [30].

2.4. What Will Be Certified Later

The remainder of the paper will not assume that ledgers decay automatically. Instead, it will require auditable inequalities of the form
$$\dot r^{(K)}(\tau) \le M^{(K)} r^{(K)}(\tau) + d^{(K)}(\tau),$$
with $M^{(K)}$ Metzler and $d^{(K)} \ge 0$ a tail disturbance, and then impose uniformity/summability conditions across $K$ (programme lines (O1)–(O4) in Section 4). Under those obligations, a one-clock total ledger and a refinement-limit learner follow from positive-systems theory and projective Cauchy arguments [11,12,14].

3. Certificate Layer: One-Clock Reduction for Nonnegative Ledgers

3.1. Why Positivity Is the Right Abstraction

3.1.1. Nonnegative Ledgers and Injections

A ledger is any measurable quantity $R_i(t) \ge 0$ that a practitioner wants to track (risk, constraint violation, robustness proxy, calibration error, solver residual, etc.). We collect them into a vector
$$r(t) := \big(R_1(t), \dots, R_m(t)\big) \in \mathbb{R}^m_{\ge 0}.$$
Cross-effects appear as injections: if improving (or stabilizing) ledger j temporarily worsens ledger i, then R ˙ i may contain a nonnegative term proportional to R j .

3.1.2. Why Positive Comparison Is Canonical

Nonnegativity induces the partial order $u \le v$ iff $u_i \le v_i$ for all $i$ on $\mathbb{R}^m$. Many training-time inequalities have the funded-plus-injection form
$$\dot R_i(t) \le -(\text{funded decay margin})_i \cdot R_i(t) + \sum_{j \ne i} (\text{injection } i \leftarrow j) \cdot R_j(t),$$
with all injection coefficients nonnegative. This is exactly the comparison form of a positive linear system (Metzler dynamics). It admits: (i) comparison principles, (ii) Perron–Frobenius structure for the dominant mode, and (iii) copositive Lyapunov functions $V(r) = w^\top r$ with $w > 0$; see, e.g., [11,12,13,14].

3.1.3. How Metzler Bounds Arise in Practice (Derivation Template)

The certificate layer does not assume access to model internals; it assumes that one can bound ledger evolution by auditable inequalities. A common derivation pattern is:
(i)
write a differential (or finite-difference) inequality for each nonnegative ledger R i ;
(ii)
isolate a self-decay term $-2\lambda_i R_i$ (from dissipation, regularization, descent, contractive updates);
(iii)
bound cross-terms by nonnegative injections using operator norms or Lipschitz bounds and Young’s inequality, turning mixed products into sums of squares;
(iv)
collect coefficients into a Metzler matrix $M$ with $M_{ii} = -2\lambda_i$ and $M_{ij} = \eta_{i \leftarrow j} \ge 0$ for $i \ne j$.
Such reductions are standard in Lyapunov and comparison analyses; see, e.g., [25] for Lyapunov inequalities, [12,14] for positive-systems comparison, and [11] for the nonnegative-matrix structure behind Metzler semigroups. In discrete time, the same pattern yields a one-step domination
$$r^{(K)}_{n+1} \le (I + hM)\, r^{(K)}_n + \zeta^{(K)}_n, \qquad \zeta^{(K)}_n \ge 0,$$
and can be propagated by discrete Grönwall-type arguments for positive recursions [31,32].
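To make the discrete one-step domination concrete, the following minimal numeric sketch (not from the paper; the matrix, step size, weight, and disturbance schedule are illustrative choices) iterates $r_{n+1} = (I + hM)\, r_n + \zeta_n$ with a Hurwitz Metzler $M$ and a summable disturbance, and observes that a positively weighted total ledger contracts:

```python
# Illustrative discrete comparison recursion (constants are made up):
# r_{n+1} = (I + h M) r_n + zeta_n with M Hurwitz Metzler, zeta_n summable.

def step(r, M, h, zeta):
    """One discrete comparison step: r <- (I + h M) r + zeta (componentwise)."""
    m = len(r)
    return [
        r[i] + h * sum(M[i][j] * r[j] for j in range(m)) + zeta[i]
        for i in range(m)
    ]

# Two-ledger Metzler matrix: funded diagonals -2*lambda_i, injections eta.
lam1, lam2, eta12, eta21 = 1.0, 0.5, 0.3, 0.4   # eta12*eta21 < 4*lam1*lam2
M = [[-2 * lam1, eta12], [eta21, -2 * lam2]]

h = 0.01
r = [1.0, 2.0]
w = [1.0, 1.0]  # any positive weight suffices for this illustration

total0 = sum(wi * ri for wi, ri in zip(w, r))
for n in range(2000):
    zeta = [0.5 ** n * 1e-3] * 2          # summable tail disturbance
    r = step(r, M, h, zeta)

total = sum(wi * ri for wi, ri in zip(w, r))
print(total0, total)  # the weighted ledger has contracted substantially
```

Because `I + h*M` has nonnegative off-diagonals and positive diagonals at this step size, nonnegativity of the iterates is preserved, matching the positive-recursion setting of [31,32].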

3.2. Metzler Comparison Systems

3.2.1. Definition (Metzler Dominance)

Definition 1 
(Metzler comparison inequality). Let $r : [0, \infty) \to \mathbb{R}^m_{\ge 0}$ be absolutely continuous. A Metzler comparison system for $r$ is an inequality
$$\dot r(t) \le M r(t) \quad \text{for a.e. } t \ge 0,$$
where $M \in \mathbb{R}^{m \times m}$ is Metzler, i.e. $M_{ij} \ge 0$ for all $i \ne j$.

3.2.2. Semantics: Funded Diagonals and Injections

Write
$$M_{ii} = -2\lambda_i, \quad \lambda_i \ge 0, \qquad \text{and} \qquad M_{ij} = \eta_{i \leftarrow j} \ge 0 \ (i \ne j).$$
Then (7) reads componentwise as
$$\dot R_i(t) \le -2\lambda_i R_i(t) + \sum_{j \ne i} \eta_{i \leftarrow j} R_j(t).$$
We interpret $\lambda_i$ as a funded decay margin for ledger $i$ and $\eta_{i \leftarrow j}$ as the injection strength from ledger $j$ to ledger $i$.

3.2.3. Positivity of the Semigroup and Comparison

A basic fact is that Metzler matrices generate positive semigroups:
$$M \ \text{Metzler} \;\Longrightarrow\; e^{tM} \ge 0 \ \text{entrywise for all } t \ge 0,$$
see, e.g., [11,12].
Lemma 1 
(Linear comparison principle for Metzler dominance). Assume (7) and $r(0) \in \mathbb{R}^m_{\ge 0}$. Let $y$ solve the linear ODE $\dot y = M y$ with $y(0) = r(0)$. Then for all $t \ge 0$,
$$0 \le r(t) \le y(t) = e^{tM} r(0) \quad (\text{componentwise}).$$
Proof. 
Let $y(t) = e^{tM} r(0)$ solve $\dot y = M y$ with $y(0) = r(0)$, and set $z(t) := y(t) - r(t)$. Then $z(0) = 0$ and for a.e. $t \ge 0$,
$$\dot z(t) = \dot y(t) - \dot r(t) \ge M y(t) - M r(t) = M z(t).$$
Define the (componentwise) nonnegative forcing term
$$q(t) := \dot z(t) - M z(t) \ge 0 \quad \text{for a.e. } t \ge 0.$$
Then $z$ satisfies the Duhamel identity
$$z(t) = \int_0^t e^{(t-s)M} q(s)\, ds.$$
Because $M$ is Metzler, $e^{(t-s)M} \ge 0$ entrywise for all $t \ge s$, and since $q(s) \ge 0$ componentwise, the integrand is componentwise nonnegative. Hence $z(t) \ge 0$ componentwise for all $t \ge 0$, i.e. $r(t) \le y(t) = e^{tM} r(0)$ componentwise. □
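The comparison principle can be checked numerically. The sketch below (illustrative constants, forward Euler rather than an exact flow) evolves a trajectory obeying $\dot r = Mr - q$ with $q \ge 0$ (hence $\dot r \le Mr$) alongside the comparison solution $\dot y = My$, $y(0) = r(0)$, and confirms componentwise domination along the run:

```python
# Numeric sanity check of Lemma 1 (illustrative): with a nonnegative slack q,
# the dominated trajectory r stays componentwise below the comparison y.

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

M = [[-2.0, 0.3], [0.4, -1.0]]   # Metzler: nonnegative off-diagonals
r = [1.0, 2.0]
y = [1.0, 2.0]                    # y(0) = r(0)
h = 1e-3

dominated = True
for n in range(5000):
    q = [0.1 * ri for ri in r]    # nonnegative slack, so r' = M r - q <= M r
    Mr, My = matvec(M, r), matvec(M, y)
    r = [ri + h * (mi - qi) for ri, mi, qi in zip(r, Mr, q)]
    y = [yi + h * mi for yi, mi in zip(y, My)]
    dominated = dominated and all(ri <= yi + 1e-12 for ri, yi in zip(r, y))

print(dominated)  # componentwise r(t) <= y(t) held along the run
```

The discrete analogue of the Duhamel argument is visible here: $z_{n+1} = (I + hM) z_n + h q_n$ with $(I + hM) \ge 0$ entrywise and $z_0 = 0$ forces $z_n \ge 0$.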

3.2.4. From Componentwise Bounds to Certified Scalar Ledgers

A comparison inequality of the form (7) controls the vector ledger $r(t)$. To obtain a single certified "clock" usable by a checker, we scalarize $r(t)$ by a positive weight. Copositive scalarizations are natural in positive systems: if $w > 0$, then $R_{\mathrm{tot}}(t) := w^\top r(t)$ is itself a nonnegative ledger and preserves the cone order. This idea underlies linear copositive Lyapunov functions in positive systems [12,14] and is closely related to Perron–Frobenius theory for nonnegative matrices [11].

3.3. One-Clock Reduction

3.3.1. Spectral Abscissa and Effective Rate

Define the spectral abscissa
$$\mu(M) := \max\{\operatorname{Re} z : z \in \operatorname{spec}(M)\}.$$
For a Metzler matrix, $\mu(M)$ is real and governs the slowest exponential mode; see [11,12]. When $\mu(M) < 0$ we define the effective one-clock rate
$$\lambda_{\mathrm{eff}} := -\mu(M) > 0.$$

3.3.2. Checker form: A Copositive Lyapunov Witness

Theorem 1 
(One-clock reduction from a copositive Lyapunov witness). Assume $M$ is Metzler and (7) holds with $r(0) \in \mathbb{R}^m_{\ge 0}$. If there exist $w > 0$ and $\lambda > 0$ such that
$$w^\top M \le -\lambda\, w^\top \quad (\text{componentwise}),$$
then the scalar total ledger
$$R_{\mathrm{tot}}(t) := w^\top r(t)$$
obeys
$$R_{\mathrm{tot}}(t) \le e^{-\lambda t} R_{\mathrm{tot}}(0) \qquad \forall t \ge 0.$$
In particular, $M$ is Hurwitz and $\mu(M) \le -\lambda$.
Proof. 
Since $r(t) \ge 0$ componentwise and $w > 0$, the scalar ledger $R_{\mathrm{tot}}(t) = w^\top r(t)$ is nonnegative. Multiply (7) by $w^\top$ and use (9): for a.e. $t$,
$$\dot R_{\mathrm{tot}}(t) = w^\top \dot r(t) \le w^\top M r(t) \le -\lambda\, w^\top r(t) = -\lambda\, R_{\mathrm{tot}}(t).$$
Grönwall's inequality yields (11); see, e.g., [25,26]. The implication $\mu(M) \le -\lambda$ (hence Hurwitz) from the existence of a strict copositive linear Lyapunov inequality is standard for Metzler matrices; see [12,14]. □

3.3.3. From Hurwitzness to a Witness, and the Sharp Rate

The checker form (Theorem 1) is the verification primitive: a certificate can literally ship $(w, \lambda)$ and the checker verifies (9). The next result explains why such witnesses exist for Hurwitz Metzler matrices and how the sharp rate $\lambda_{\mathrm{eff}} = -\mu(M)$ appears.
Proposition 1 
(Hurwitz Metzler ⇒ existence of a strictly positive witness). Let $M$ be Metzler and assume $\mu(M) < 0$. Then there exists $w > 0$ such that
$$w^\top M \le \mu(M)\, w^\top \quad (\text{componentwise}).$$
Consequently, with $\lambda_{\mathrm{eff}} := -\mu(M) > 0$, the same $w$ satisfies $w^\top M \le -\lambda_{\mathrm{eff}}\, w^\top$ and is a witness in the sense of Theorem 1.
Proof. 
Choose $\beta > 0$ large enough that $A := M + \beta I$ is entrywise nonnegative (possible since $M$ has nonnegative off-diagonals; take $\beta \ge -\min_i M_{ii}$). Then $A \ge 0$ and
$$\mu(M) = \mu(A - \beta I) = \rho(A) - \beta,$$
where $\rho(A)$ is the spectral radius of $A$ (for nonnegative $A$, $\rho(A)$ is an eigenvalue by Perron–Frobenius; see [11]). Since $\mu(M) < 0$, we have $\rho(A) < \beta$.
For nonnegative $A$, the Collatz–Wielandt/Perron–Frobenius theory guarantees existence of a strictly positive left sub-eigenvector: there exists $w > 0$ with
$$w^\top A \le \rho(A)\, w^\top \quad (\text{componentwise}),$$
see [11] or [12]. Subtracting $\beta w^\top$ gives
$$w^\top M = w^\top (A - \beta I) \le (\rho(A) - \beta)\, w^\top = \mu(M)\, w^\top,$$
which is (12). Since $-\mu(M) = \lambda_{\mathrm{eff}}$, we obtain $w^\top M \le -\lambda_{\mathrm{eff}}\, w^\top$. □
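The shift-and-Perron construction in this proof can be sketched numerically. The following illustration (an assumed small example, not the paper's artifact) shifts $M$ to a nonnegative $A = M + \beta I$, approximates the left Perron vector of $A$ by power iteration on $A^\top$, and verifies the resulting copositive inequality $w^\top M \le \mu(M)\, w^\top$ up to tolerance:

```python
# Sketch of the witness construction (illustrative 2x2 Metzler matrix).

M = [[-2.0, 0.3], [0.4, -1.0]]
m = len(M)
beta = max(-M[i][i] for i in range(m)) + 1.0   # makes A = M + beta*I >= 0
A = [[M[i][j] + (beta if i == j else 0.0) for j in range(m)] for i in range(m)]

w = [1.0] * m
for _ in range(500):                            # power iteration on A^T
    w = [sum(A[i][j] * w[i] for i in range(m)) for j in range(m)]
    s = sum(w)
    w = [wi / s for wi in w]                    # normalize to avoid overflow

# At convergence w^T A = rho(A) w^T; estimate rho from the first column.
rho = sum(w[i] * A[i][0] for i in range(m)) / w[0]
mu = rho - beta                                 # mu(M) = rho(A) - beta

# Verify the copositive inequality (w^T M)_j <= (mu + tol) * w_j componentwise,
# with a small tolerance for the finite iteration.
tol = 1e-6
wM = [sum(w[i] * M[i][j] for i in range(m)) for j in range(m)]
ok = all(wM[j] <= (mu + tol) * w[j] for j in range(m))
print(mu, ok)
```

For irreducible $A$ the power iteration recovers the Perron vector; in reducible or degenerate cases a prover would fall back to the sub-eigenvector form (13) directly.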
Corollary 1 
(Sharp one-clock envelope from a Perron–Frobenius witness). Assume the hypotheses of Proposition 1 and let $w > 0$ satisfy (12). Then any $r$ satisfying (7) obeys
$$w^\top r(t) \le e^{\mu(M) t}\, w^\top r(0) = e^{-\lambda_{\mathrm{eff}} t}\, w^\top r(0), \qquad \lambda_{\mathrm{eff}} = -\mu(M) > 0.$$
Proof. 
Repeat the proof of Theorem 1 with (12) in place of (9):
$$\frac{d}{dt}\big(w^\top r(t)\big) = w^\top \dot r(t) \le w^\top M r(t) \le \mu(M)\, w^\top r(t),$$
and apply Grönwall [26]. □

3.3.4. Two-Ledger Closed Form (Explicit Small-Gain Boundary)

The special case m = 2 admits explicit eigenvalues and hence an exact small-gain criterion and closed-form effective rate. This provides a particularly transparent diagnostic boundary and is the main “design lever” used in the toy instantiation.
Proposition 2 
(Two-ledger small-gain criterion and explicit rate). Assume $r(t) = (R_1(t), R_2(t)) \in \mathbb{R}^2_{\ge 0}$ satisfies $\dot r(t) \le M r(t)$ with
$$M = \begin{pmatrix} -2\lambda_1 & \eta_{1 \leftarrow 2} \\ \eta_{2 \leftarrow 1} & -2\lambda_2 \end{pmatrix}, \qquad \lambda_1, \lambda_2 > 0, \quad \eta_{1 \leftarrow 2}, \eta_{2 \leftarrow 1} \ge 0.$$
Then $M$ is Hurwitz if and only if
$$\eta_{1 \leftarrow 2}\, \eta_{2 \leftarrow 1} < 4 \lambda_1 \lambda_2,$$
and in that case $\lambda_{\mathrm{eff}} := -\mu(M) > 0$ with the explicit formula
$$\lambda_{\mathrm{eff}} = (\lambda_1 + \lambda_2) - \sqrt{(\lambda_1 - \lambda_2)^2 + \eta_{1 \leftarrow 2}\, \eta_{2 \leftarrow 1}}.$$
Proof. 
For a real $2 \times 2$ matrix, Hurwitz stability is equivalent to $\operatorname{tr}(M) < 0$ and $\det(M) > 0$. Here $\operatorname{tr}(M) = -2(\lambda_1 + \lambda_2) < 0$ and $\det(M) = 4\lambda_1\lambda_2 - \eta_{1 \leftarrow 2}\eta_{2 \leftarrow 1}$, giving the criterion. The eigenvalues are $-(\lambda_1 + \lambda_2) \pm \sqrt{(\lambda_1 - \lambda_2)^2 + \eta_{1 \leftarrow 2}\eta_{2 \leftarrow 1}}$ and yield $\lambda_{\mathrm{eff}} = -\mu(M)$ as stated. □
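As a quick cross-check of the closed form (with illustrative numbers, not values from the paper), the formula for $\lambda_{\mathrm{eff}}$ can be compared against $-\mu(M)$ computed directly from the $2 \times 2$ characteristic polynomial:

```python
# Cross-check of the two-ledger closed form against direct eigenvalues.
import math

lam1, lam2, eta12, eta21 = 1.0, 0.5, 0.3, 0.4
assert eta12 * eta21 < 4 * lam1 * lam2          # small-gain criterion holds

# Closed form from the proposition.
lam_eff = (lam1 + lam2) - math.sqrt((lam1 - lam2) ** 2 + eta12 * eta21)

# Direct eigenvalues of M = [[-2*lam1, eta12], [eta21, -2*lam2]].
tr = -2 * (lam1 + lam2)
det = 4 * lam1 * lam2 - eta12 * eta21
disc = math.sqrt(tr * tr - 4 * det)
mu = (tr + disc) / 2.0                          # dominant (larger) eigenvalue
print(lam_eff, -mu)                             # the two agree
```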

3.3.5. Monotone Engineering Levers: Funded Diagonals vs. Injections

A distinctive advantage of Metzler comparison certificates is that they translate stability into monotone design parameters. Writing the comparison matrix as
$$M_{ii} = -2\lambda_i, \quad \lambda_i \ge 0, \qquad \text{and} \qquad M_{ij} = \eta_{i \leftarrow j} \ge 0 \ (i \ne j),$$
makes the two control knobs explicit: funded diagonals (increase $\lambda_i$, making $M_{ii}$ more negative) and injections (decrease $\eta_{i \leftarrow j}$, weakening cross-couplings).
Related "gain composition" ideas are classical in interconnected-system robustness [25,33,34]. In the Metzler setting, the ordering is literal: if $\tilde M \le M$ entrywise, then $\mu(\tilde M) \le \mu(M)$ and hence $\tilde\lambda_{\mathrm{eff}} \ge \lambda_{\mathrm{eff}}$, with generic strict improvement away from reducible/degenerate cases [11,12].
Corollary 2 
(Funding vs. injection interventions (witness-preserving monotonicity)). Assume (7). Suppose a witness $(w, \lambda)$ satisfies $w^\top M \le -\lambda w^\top$ with $w > 0$ and $\lambda > 0$. Let $\tilde M$ be another Metzler matrix with $\tilde M \le M$ entrywise (i.e. more negative diagonals and/or smaller off-diagonals). Then the same witness proves the same rate:
$$w^\top \tilde M \le w^\top M \le -\lambda\, w^\top,$$
and hence, under $\dot r \le \tilde M r$, one has $w^\top r(t) \le e^{-\lambda t}\, w^\top r(0)$ for all $t \ge 0$.
Proof. 
Entrywise monotonicity and $w > 0$ give $w^\top \tilde M \le w^\top M$. Apply Theorem 1 to $\tilde M$. □

3.3.6. A Checker View: What Must Be Provided and What Is Proved

For certificate checking, the key point is that Theorem 1 requires only an explicit witness: a positive weight vector $w > 0$ and a scalar $\lambda > 0$ such that $w^\top M \le -\lambda w^\top$. A checker validates this inequality componentwise and then accepts the implication $w^\top r(t) \le e^{-\lambda t}\, w^\top r(0)$, without spectral computations. This is the positive-systems analogue of "proof-carrying" evidence: a short witness certifies a global decay claim [12,14,23].
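A minimal checker in this spirit can be sketched as follows (an assumed interface, not the paper's artifact format): it validates the Metzler structure, the positivity of the witness, and the componentwise inequality, using only arithmetic comparisons.

```python
# Minimal one-clock witness checker (sketch; schema is assumed).

def check_one_clock_witness(M, w, lam, tol=0.0):
    """Return True iff (w, lam) is a valid copositive Lyapunov witness for M."""
    m = len(M)
    # Witness shape and sign checks.
    if lam <= 0 or len(w) != m or any(wi <= 0 for wi in w):
        return False
    # Metzler check: nonnegative off-diagonals.
    if any(M[i][j] < 0 for i in range(m) for j in range(m) if i != j):
        return False
    # Componentwise inequality: (w^T M)_j <= -lam * w_j for every column j.
    for j in range(m):
        col = sum(w[i] * M[i][j] for i in range(m))
        if col > -lam * w[j] + tol:
            return False
    return True

M = [[-2.0, 0.3], [0.4, -1.0]]
accepted = check_one_clock_witness(M, w=[1.0, 1.0], lam=0.5)
rejected = check_one_clock_witness(M, w=[1.0, 1.0], lam=2.0)  # rate too fast
print(accepted, rejected)
```

No eigenvalue computation appears anywhere, which is precisely the point of shipping $(w, \lambda)$: the verifier's work is a finite list of scalar comparisons.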

4. Master Certificate for Learning Under Refinement

We work on a user-declared certification window $[0,T]$ and in a single declared ruler $\|\cdot\|_W$. Training is run on each refinement level $K$ and produces a trajectory $x^{(K)}(\cdot)$ together with a declared nonnegative scalar ledger $R^{(K)}(\cdot)$. The goal is to ensure that refinement does not change the meaning of "small" and that stability survives as $K \to \infty$.
Two quantitative inputs drive the refinement-limit guarantee.
  • One-ruler Cauchy geometry. Cross-level comparisons are performed in the same instrument norm via the ambient realization (e.g. by comparing $z^{(K)} = i_K x^{(K)}$ to $P_K z^{(K+1)}$). If the adjacent discrepancies form a summable tail in $K$, then the tower is Cauchy in the ruler and determines a unique refinement-limit trajectory on $[0,T]$.
  • Tail-robust contraction on $[0,T]$. Each level admits an inhomogeneous decay inequality for $R^{(K)}$ with a margin bounded below uniformly in $K$ and a pollution budget whose integrated size is summable along the ladder. This yields a common exponential envelope on $[0,T]$, up to an explicit tail floor that vanishes with refinement.
Programme lines (O1)–(O4) record these requirements in auditable form (summable ladder budgets, a uniform margin, and a uniform dictionary linking $R^{(K)}$ to reported readouts). Under these lines, the Master Certificate theorem (Section 4.3) provides: a unique refinement-limit learner on $[0,T]$, a uniform exponent class for the ledger up to vanishing tails, and transport of the same rate to declared readouts.

4.1. Ladder Geometry and Refinement Limits (Analytic Core)

4.1.1. Ambient One-Ruler Structure and Projective Maps

We formalize the measurement contract by requiring that every refinement level is measured by restriction of a single ambient instrument. This is the analogue of fixing an energy norm across discretizations in stable refinement theory [4,5,6].
Definition 2 
(One ruler (ambient instrument restriction)). Fix a refinement ladder $\{X^{(K)}\}_{K \ge K_0}$. We say the ladder admits one ruler if there exist:
  • an ambient real Hilbert space $(H, \langle \cdot, \cdot \rangle_H)$,
  • linear realization maps $i_K : X^{(K)} \to H$ for each $K$,
  • bounded projections $P_K : H \to i_K(X^{(K)})$ for each $K$,
  • a bounded, self-adjoint, strictly positive operator $W : H \to H$ (the instrument),
  • coarse-graining maps $\Pi_{K \leftarrow K+1} : X^{(K+1)} \to X^{(K)}$ for each $K$,
such that for all $K \ge K_0$:
(i)
Compatibility (coarse-graining is realized by projection). For all $x \in X^{(K+1)}$,
$$i_K \Pi_{K \leftarrow K+1} x = P_K i_{K+1} x.$$
(ii)
Single instrument (restriction of one ambient ruler). For $x \in X^{(K)}$ define
$$\|x\|_{W(K)}^2 := \langle i_K x, W i_K x \rangle_H.$$
Equivalently, $\|z\|_W^2 := \langle z, W z \rangle_H$ is the ambient ruler and $\|\cdot\|_{W(K)}$ is its restriction to $i_K(X^{(K)})$.
We call $(H, W)$ the declared ruler and $(i_K, P_K)$ the ambient realization of the ladder.
Assumption A1 
(Instrument contractivity of coarse projections). For each $K \ge K_0$, the projection $P_K$ is contractive in the instrument norm on the next-level realization:
$$\|P_K z\|_W \le \|z\|_W \quad \text{for all } z \in i_{K+1}(X^{(K+1)}).$$
Lemma 2 
(Non-expansiveness of coarse-graining in one ruler). Assume Definition 2 and Assumption A1. Then for all $K \ge K_0$ and $x \in X^{(K+1)}$,
$$\|\Pi_{K \leftarrow K+1}(x)\|_{W(K)} \le \|x\|_{W(K+1)}.$$
Proof. 
Fix $K$ and $x \in X^{(K+1)}$ and set $z := i_{K+1} x$. By (14), $i_K(\Pi_{K \leftarrow K+1} x) = P_K z$. Hence
$$\|\Pi_{K \leftarrow K+1} x\|_{W(K)}^2 = \|P_K z\|_W^2 \le \|z\|_W^2 = \|x\|_{W(K+1)}^2,$$
where we used (16) and (15). Taking square roots gives (17). □

4.1.2. Cross-Level Discrepancy and Telescoping

Given $x^{(K)} \in X^{(K)}$ and $x^{(L)} \in X^{(L)}$ with $L \ge K$, we compare them in the same ruler by projecting the fine object down:
$$d_{K,L}\big(x^{(K)}, x^{(L)}\big) := \big\|i_K x^{(K)} - P_K(i_L x^{(L)})\big\|_W.$$
Lemma 3 
(Projective telescoping in one ruler). Assume Definition 2 and Assumption A1. Fix $T > 0$ and let $x^{(K)} : [0,T] \to X^{(K)}$. Then for any integers $L > K \ge K_0$ and any $t \in [0,T]$,
$$\big\|i_K x^{(K)}(t) - P_K(i_L x^{(L)}(t))\big\|_W \le \sum_{j=K}^{L-1} \big\|i_j x^{(j)}(t) - P_j(i_{j+1} x^{(j+1)}(t))\big\|_W.$$
Proof. 
Fix $t \in [0,T]$ and write $z^{(j)} := i_j x^{(j)}(t) \in H$. Using $P_K = P_K P_j$ for $j \ge K$ and contractivity of $P_K$ in $\|\cdot\|_W$ (Assumption A1), we obtain
$$\big\|z^{(K)} - P_K z^{(L)}\big\|_W \le \sum_{j=K}^{L-1} \big\|P_K\big(z^{(j)} - P_j z^{(j+1)}\big)\big\|_W \le \sum_{j=K}^{L-1} \big\|z^{(j)} - P_j z^{(j+1)}\big\|_W,$$
which is (19). □

4.1.3. Existence and Uniqueness of a Refinement-Limit Object

Theorem 2 
(Existence and uniqueness of a refinement-limit object). Assume the ladder admits one ruler (Definition 2) and satisfies the projective consistency $\Pi_{K \leftarrow K+2} = \Pi_{K \leftarrow K+1} \circ \Pi_{K+1 \leftarrow K+2}$. Let $\{x^{(K)}\}_{K \ge K_0}$ be a compatible tower, i.e. $x^{(K)} = \Pi_{K \leftarrow K+1}(x^{(K+1)})$ for all $K$. Assume there exists a nonnegative sequence $(a_K)_{K \ge K_0}$ with $\sum_{K \ge K_0} a_K < \infty$ such that for all $K \ge K_0$ and all $t \in [0,T)$,
$$\big\|(I - P_K)\, i_{K+1} x^{(K+1)}(t)\big\|_W + \big\|i_K x^{(K)}(t) - P_K i_{K+1} x^{(K+1)}(t)\big\|_W \le a_K.$$
Then for every $t \in [0,T)$ the sequence $\{i_K x^{(K)}(t)\}_{K \ge K_0}$ is Cauchy in $(H, \|\cdot\|_W)$ and converges to a unique limit $x^{(\infty)}(t) \in H$:
$$x^{(\infty)}(t) := \lim_{K \to \infty} i_K x^{(K)}(t) \quad \text{in } (H, \|\cdot\|_W).$$
Moreover, for every fixed $K \ge K_0$,
$$P_K x^{(\infty)}(t) = \lim_{L \to \infty} P_K i_L x^{(L)}(t) = i_K x^{(K)}(t) \qquad \forall t \in [0,T).$$
Proof. 
Fix $t \in [0,T)$ and let $L > K \ge K_0$. Write $z^{(j)} := i_j x^{(j)}(t) \in H$. Then
$$\|z^{(L)} - z^{(K)}\|_W \le \|z^{(L)} - P_K z^{(L)}\|_W + \|P_K z^{(L)} - z^{(K)}\|_W.$$
By telescoping and (20),
$$\|z^{(L)} - P_K z^{(L)}\|_W + \|P_K z^{(L)} - z^{(K)}\|_W \le \sum_{j=K}^{L-1} a_j.$$
Since $\sum_{j \ge K_0} a_j < \infty$, the right-hand side tends to $0$ as $K, L \to \infty$, so $\{i_K x^{(K)}(t)\}_K$ is Cauchy and converges in $(H, \|\cdot\|_W)$. Projective consistency (21) follows by applying $P_K$ to the tower identities and passing to the limit. □
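The Cauchy-tower mechanism can be illustrated in a toy instance (entirely assumed: $H$ is a truncated sequence space, $P_K$ keeps the first $K$ coordinates, and the level-$K$ state is the $K$-term truncation of a fixed square-summable target, so the adjacent discrepancies are summable by construction):

```python
# Toy Cauchy tower: nested truncations of a square-summable sequence.
import math

N = 200                                            # ambient truncation (toy)
target = [1.0 / (k + 1) ** 2 for k in range(N)]    # square-summable target

def z_level(K):
    """Realized level-K state: first K coordinates of the target, zero-padded."""
    return [target[k] if k < K else 0.0 for k in range(N)]

def norm(v):
    return math.sqrt(sum(x * x for x in v))

# Adjacent discrepancy a_K = ||z^(K) - P_K z^(K+1)|| + ||(I - P_K) z^(K+1)||.
a = []
for K in range(1, N - 1):
    zK, zK1 = z_level(K), z_level(K + 1)
    proj_mismatch = norm([zK[k] - zK1[k] for k in range(K)])   # = 0 (nested)
    unresolved = norm([zK1[k] for k in range(K, N)])           # tail at level K
    a.append(proj_mismatch + unresolved)

summable = sum(a)                                  # finite: tower is Cauchy
gap = norm([z_level(150)[k] - target[k] for k in range(N)])
print(summable, gap)                               # gap shrinks with the level
```

Here the role of (20) is played by the tail norms $a_K = 1/(K+1)^2$, and the limit object is the full target sequence.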
Remark 1 
(Where this theorem is used). The Master Certificate theorem (Section 4.3) uses Theorem 2 as the geometric engine that turns summable cross-level discrepancies into a refinement-limit object in the declared ruler. Concretely, in the Master Certificate proof one takes $a_K := \beta_K + \delta_K$ from (O3).

4.2. Programme Lines (O1)–(O4) on [ 0 , T ]

The programme lines (O1)–(O4) are the paper’s checkable obligations on a finite time window. They specify what a certificate must provide and what a verifier must validate: (i) a uniform-in-time contraction envelope for a declared total ledger up to a summable pollution budget, (ii) a K-uniform margin (one exponent class), (iii) a summable Cauchy tower in the declared ruler (no moving goalposts), and (iv) a uniform dictionary linking the declared ledger to reported readouts.

4.2.0.1. Certified time horizon.

Fix T > 0 . All programme lines (O1)–(O4) are required on the window [ 0 , T ] .

4.2.1. Objects on the Window: Trajectories, Ruler, and a Declared Total Ledger

For each refinement level $K \ge K_0$, let $x^{(K)} : [0,T] \to X^{(K)}$ be a training trajectory. Assume the ladder admits one ruler in the sense of Definition 2, with embeddings $i_K : X^{(K)} \to H$, projections $P_K : H \to i_K(X^{(K)})$, and an instrument $W$ inducing $\|z\|_W^2 = \langle z, W z \rangle$. Write the realized trajectory in the ambient ruler as
$$z^{(K)}(t) := i_K x^{(K)}(t) \in H.$$
Fix a nonnegative total geometric ledger $R^{(K)} : [0,T] \to \mathbb{R}_{\ge 0}$ (one scalar per level). This is the scalar object whose decay is certified; any additional structure is used only through inequalities.

(O1) Tail-robust contraction envelope (summable pollution budget)

There exist nonnegative functions $\tau_K \in L^1([0,T])$, numbers $\alpha_K \ge 0$ with $\sum_{K=K_0}^{\infty} \alpha_K < \infty$, and levelwise margins $\lambda_K > 0$ such that for a.e. $t \in [0,T]$,
$$\dot R^{(K)}(t) \le -2\lambda_K R^{(K)}(t) + \tau_K(t), \qquad \int_0^T \tau_K(s)\,ds \le \alpha_K, \qquad K \ge K_0.$$

(O2) Uniform margin (one exponent class across refinement)

There exists $\lambda > 0$ (independent of $K$) such that
$$\lambda_K \ge \lambda > 0 \quad \text{for all } K \ge K_0.$$

(O3) Projective Cauchy tower in one ruler (summable cross-level inconsistency)

There exist nonnegative sequences $(\beta_K)_{K \ge K_0}$ and $(\delta_K)_{K \ge K_0}$ such that
$$\sup_{t \in [0,T]} \Big( \underbrace{\|(I - P_K)\, z^{(K+1)}(t)\|_W}_{\text{unresolved tail at level } K} + \underbrace{\|z^{(K)}(t) - P_K z^{(K+1)}(t)\|_W}_{\text{projected mismatch}} \Big) \le \beta_K + \delta_K, \qquad \sum_{K=K_0}^{\infty} (\beta_K + \delta_K) < \infty.$$

(O4) Uniform dictionary / observability (reported readouts remain well-conditioned)

Let $\mathcal{M}$ be a finite family of nonnegative reported readouts/metrics $M(\cdot) \ge 0$. Assume there exist constants $c_M, C_M > 0$, independent of $K$, such that for all $t \in [0,T]$ and all $K \ge K_0$,
$$c_M\, R^{(K)}(t) \le M\big(x^{(K)}(t)\big) \le C_M\, R^{(K)}(t), \qquad M \in \mathcal{M}.$$

4.2.2. Checker Summary (What the Certificate Must Provide)

A certificate on $[0,T]$ consists of: the declared ruler $(H, W)$ and projections $(i_K, P_K)$, the scalar ledger $R^{(K)}$, the budgets $(\alpha_K)$ from (O1) and $(\beta_K, \delta_K)$ from (O3), the uniform margin $\lambda$ from (O2), and the dictionary constants $(c_M, C_M)$ from (O4). Soundness (proved next) upgrades these inequalities to a unique refinement-limit learner and rate inheritance on $[0,T]$.
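A minimal checker sketch for this artifact follows. The container and field names (`Certificate`, `alpha`, `beta_delta`, `lam`, `dictionary`) are illustrative assumptions, not part of the paper's formal grammar, and only a finite prefix of each budget sequence can be audited in practice:

```python
from dataclasses import dataclass
import math

@dataclass
class Certificate:
    """Hypothetical container for the shipped constants; field names
    are illustrative, not from the paper."""
    alpha: list        # (O1) integrated pollution budgets alpha_K
    beta_delta: list   # (O3) tower budgets beta_K + delta_K
    lam: float         # (O2) uniform margin lambda
    dictionary: dict   # (O4) {metric_name: (c_M, C_M)}

def check_certificate(cert, tol=1e-12):
    """Finite-prefix audit: nonnegativity of the budgets, a strictly
    positive margin, and well-ordered dictionary constants. Summability
    over all K cannot be decided from finitely many numbers, so this
    only validates the supplied prefix."""
    ok_o1 = all(a >= 0 for a in cert.alpha) and math.isfinite(sum(cert.alpha))
    ok_o2 = cert.lam > tol
    ok_o3 = all(b >= 0 for b in cert.beta_delta) and math.isfinite(sum(cert.beta_delta))
    ok_o4 = all(0 < c <= C for (c, C) in cert.dictionary.values())
    return ok_o1 and ok_o2 and ok_o3 and ok_o4
```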

4.3. Master Certificate Theorem on [ 0 , T ]

We now state the soundness implication in its final form. Importantly, the theorem below does not restate the programme lines; it only invokes them by reference.
Theorem 3 
(Master Certificate for Learning Under Refinement on $[0,T]$). Assume the ladder admits one ruler (Definition 2) and let $x^{(K)}: [0,T] \to X^{(K)}$ be trajectories with ambient realizations $z^{(K)}(t) := i_K x^{(K)}(t) \in H$. Let $R^{(K)}: [0,T] \to \mathbb{R}_{\ge 0}$ be a declared total geometric ledger. Assume the programme lines (O1)–(O4) from subsec:mc-olines-learning hold on $[0,T]$. Then:
(i)
Tail-robust levelwise envelope. With $\lambda$ as in (O2) and $\alpha_K$ as in (O1), for all $K \ge K_0$ and all $t \in [0,T]$,
$$R^{(K)}(t) \le e^{-2\lambda t} R^{(K)}(0) + \alpha_K.$$
(ii)
Existence and uniqueness of a refinement-limit trajectory. Define the geometric tower budget
$$a_K := \beta_K + \delta_K,$$
where $(\beta_K, \delta_K)$ are the sequences from (O3). Then $\sum_{K \ge K_0} a_K < \infty$, and there exists a unique trajectory $x^{(\infty)}: [0,T] \to H$ such that $x^{(\infty)}(t) = \lim_{K\to\infty} z^{(K)}(t)$ in $(H, \|\cdot\|_W)$, and
$$\sup_{t \in [0,T]} \|z^{(K)}(t) - P_K x^{(\infty)}(t)\|_W \le \sum_{j=K}^{\infty} a_j \qquad \forall K \ge K_0.$$
(iii)
Readout transport (rate inheritance at each level). For every $M \in \mathcal{M}$ (from (O4)), for all $K \ge K_0$ and all $t \in [0,T]$,
$$M\big(x^{(K)}(t)\big) \le C_M \big( e^{-2\lambda t} R^{(K)}(0) + \alpha_K \big).$$
Proof. 
Part (i) is the inhomogeneous Grönwall bound from (O1)–(O2). Part (ii) follows by applying Theorem 2 with $a_K := \beta_K + \delta_K$. Part (iii) is (O4) combined with (26). □

5. Instantiations

This section explains how the abstract certificate obligations can be realized in concrete training pipelines. The goal is not to advocate a particular architecture, but to show (i) how to choose a ruler, a ladder, and a ledger vector; (ii) how to derive a Metzler comparison system with explicit margins and injections; and (iii) how to audit the refinement programme lines (O1)–(O4) from logs, bounds, or subproofs. We present two representative instantiations: a fully checkable toy ladder where all constants can be computed in closed form, and a practical “width/refinement” protocol that describes what evidence a producer must ship so that a consumer can validate the Master Certificate without re-running training.
At a high level, every instantiation follows the same recipe. First fix a single ambient ruler (a norm/energy that is independent of refinement) and specify the projections that implement "no moving goalposts." Next define the nonnegative ledgers and establish an in-level comparison inequality $\dot r^{(K)} \le M^{(K)} r^{(K)} + d^{(K)}$ with $M^{(K)}$ Metzler, extracting diagonal funding margins and off-diagonal injections via standard bounds (e.g. Lipschitz/operator-norm and Young inequalities). Finally, verify the cross-level conditions: tail terms are summable ((O1)), the exponent class is uniform ((O2)), projected states are Cauchy across levels ((O3)), and reported readouts remain uniformly comparable to the ruler ((O4)). The output is a proof-carrying artifact consisting of the matrices, constants, and tail budgets that a checker can validate, thereby upgrading verification to a refinement-limit guarantee with a certified one-clock rate.

5.1. Setup: An $\ell^2$ Ladder with a Diagonal Constraint Operator

Let $\ell^2$ denote the real Hilbert space of square-summable sequences with inner product $\langle x, y \rangle = \sum_{i \ge 1} x_i y_i$ and norm $\|x\|_2^2 = \sum_{i \ge 1} x_i^2$. Define the ambient state space
$$X^{(\infty)} := \ell^2 \times \ell^2, \qquad z = (x, \lambda) \in X^{(\infty)}.$$
Fix $p > 0$ and define the bounded diagonal operator $A: \ell^2 \to \ell^2$ by
$$(Ax)_i := a_i x_i, \qquad a_i := i^{-p}.$$
Then $\|A\|_{\mathrm{op}} = \sup_i |a_i| = 1$.
For each $K \ge 1$ define the refinement spaces
$$X^{(K)} := \mathbb{R}^K \times \mathbb{R}^K \subset \ell^2 \times \ell^2,$$
embedded into $X^{(\infty)}$ by zero padding. Let $P_K$ be the orthogonal projection (truncation to the first $K$ coordinates), and set $A_K := P_K A P_K = \mathrm{diag}(a_1, \dots, a_K)$.

5.2. Dynamics: Damped Primal–Dual Flow

Fix damping parameters $m > 0$ and $\alpha > 0$. For each $K$, consider the linear flow on $X^{(K)}$:
$$\dot x^{(K)}(t) = -m\, x^{(K)}(t) - A_K \lambda^{(K)}(t), \qquad \dot \lambda^{(K)}(t) = A_K x^{(K)}(t) - \alpha\, \lambda^{(K)}(t), \qquad t \ge 0.$$
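This flow is straightforward to simulate; the forward-Euler sketch below is an illustrative integration (the step size, horizon, and random initial data are assumed choices, not part of the certificate), and it lets one observe the contraction of the total ledger $R_1^{(K)} + R_2^{(K)}$ directly.

```python
import numpy as np

def primal_dual_flow(K=20, m=2.0, alpha=2.5, p=1.0, T=5.0, dt=1e-3, seed=0):
    """Forward-Euler integration of the level-K damped primal-dual flow
    x' = -m x - A_K lam,  lam' = A_K x - alpha lam,  A_K = diag(i^-p).
    Returns the total ledger R_tot at t=0 and t=T."""
    rng = np.random.default_rng(seed)
    a = np.arange(1, K + 1, dtype=float) ** (-p)   # diagonal of A_K
    x = rng.normal(size=K)
    lam = rng.normal(size=K)
    R0 = x @ x + lam @ lam
    for _ in range(int(T / dt)):
        # simultaneous update: RHS uses the old (x, lam)
        x, lam = x + dt * (-m * x - a * lam), lam + dt * (a * x - alpha * lam)
    return R0, x @ x + lam @ lam
```

On this run the observed decay is much faster than the certified envelope $e^{-\lambda_{\mathrm{cert}} t}$, consistent with the conservatism noted in Section 5.7.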

5.3. Ledgers and Readouts

Define the two nonnegative ledgers and the ledger vector
$$R_1^{(K)}(t) := \|x^{(K)}(t)\|_2^2, \qquad R_2^{(K)}(t) := \|\lambda^{(K)}(t)\|_2^2, \qquad r^{(K)}(t) := \big(R_1^{(K)}(t), R_2^{(K)}(t)\big)^\top.$$
As readout metrics, take $\mathcal{M} = \{M_1, M_2\}$ with $M_1(z) = \|x\|_2^2$ and $M_2(z) = \|\lambda\|_2^2$, so that the readouts coincide with the ledgers.

5.4. Metzler Comparison and a Checker-Friendly Hurwitz Witness

Lemma 4 
(Metzler dominance (uniform in $K$)). Assume $p > 0$, so that $\|A\|_{\mathrm{op}} = 1$. For every $K$ and all $t \ge 0$, the ledger vector satisfies
$$\dot r^{(K)}(t) \le M_{\mathrm{Toy}}\, r^{(K)}(t),$$
where the same $2 \times 2$ Metzler matrix works for all $K$:
$$M_{\mathrm{Toy}} := \begin{pmatrix} -(2m - 1) & 1 \\ 1 & -(2\alpha - 1) \end{pmatrix}.$$
Proof. 
Differentiate the squared norms using (30):
$$\dot R_1^{(K)}(t) = 2 \big\langle x^{(K)}(t), \dot x^{(K)}(t) \big\rangle = -2m \|x^{(K)}(t)\|_2^2 - 2 \big\langle x^{(K)}(t), A_K \lambda^{(K)}(t) \big\rangle.$$
Using $\|A_K\|_{\mathrm{op}} \le \|A\|_{\mathrm{op}} = 1$ and Young's inequality $2ab \le a^2 + b^2$ gives
$$\big| 2 \big\langle x^{(K)}(t), A_K \lambda^{(K)}(t) \big\rangle \big| \le 2 \|x^{(K)}(t)\|_2 \, \|A_K\|_{\mathrm{op}} \|\lambda^{(K)}(t)\|_2 \le 2 \|x^{(K)}(t)\|_2 \|\lambda^{(K)}(t)\|_2 \le \|x^{(K)}(t)\|_2^2 + \|\lambda^{(K)}(t)\|_2^2.$$
Hence $\dot R_1^{(K)}(t) \le -(2m - 1) R_1^{(K)}(t) + R_2^{(K)}(t)$.
Similarly,
$$\dot R_2^{(K)}(t) = 2 \big\langle \lambda^{(K)}(t), \dot \lambda^{(K)}(t) \big\rangle = -2\alpha \|\lambda^{(K)}(t)\|_2^2 + 2 \big\langle \lambda^{(K)}(t), A_K x^{(K)}(t) \big\rangle \le -(2\alpha - 1) R_2^{(K)}(t) + R_1^{(K)}(t),$$
where we again used $\|A_K\|_{\mathrm{op}} \le 1$ and $2ab \le a^2 + b^2$ on the cross term. Collecting the two inequalities yields $\dot r^{(K)}(t) \le M_{\mathrm{Toy}}\, r^{(K)}(t)$. □
Proposition 3 
(Hurwitz witness and certified one-clock rate). Assume $m > 1$ and $\alpha > 1$. Let $w = (1, 1)^\top \gg 0$ and set
$$\lambda_{\mathrm{cert}} := 2 \min\{m - 1, \alpha - 1\} > 0.$$
Then $w^\top M_{\mathrm{Toy}} \le -\lambda_{\mathrm{cert}}\, w^\top$ componentwise, and hence
$$R_{\mathrm{tot}}^{(K)}(t) := w^\top r^{(K)}(t) = R_1^{(K)}(t) + R_2^{(K)}(t) \le e^{-\lambda_{\mathrm{cert}} t} R_{\mathrm{tot}}^{(K)}(0) \qquad \forall t \ge 0, \ \forall K.$$
Proof. 
Compute $w^\top M_{\mathrm{Toy}} = \big({-2(m-1)},\, {-2(\alpha-1)}\big) \le -2 \min\{m - 1, \alpha - 1\}\,(1, 1)$. Apply Theorem 1 to Lemma 4. □
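The witness inequality is checkable by direct arithmetic; a minimal sketch:

```python
import numpy as np

def witness_check(m=2.0, alpha=2.5):
    """Verify w^T M_Toy <= -lambda_cert w^T componentwise for w = (1, 1),
    returning the boolean verdict and the certified rate."""
    M = np.array([[-(2 * m - 1), 1.0],
                  [1.0, -(2 * alpha - 1)]])
    w = np.ones(2)
    lam_cert = 2.0 * min(m - 1.0, alpha - 1.0)
    ok = bool(np.all(w @ M <= -lam_cert * w + 1e-12))
    return ok, lam_cert
```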

5.5. Programme Lines (O1)–(O4) (Vanishing Tails)

Lemma 5 
(Exact projective consistency). Let $L \ge K$. If $z^{(L)}(t)$ solves (30) at level $L$, then $P_K z^{(L)}(t)$ solves the level-$K$ system with initial data $P_K z^{(L)}(0)$. In particular, for consistent initial data $z^{(K)}(0) = P_K z^{(K+1)}(0)$ we have $z^{(K)}(t) = P_K z^{(K+1)}(t)$ for all $t \ge 0$.
Proof. 
For diagonal $A$, the first $K$ coordinates of (30) depend only on the first $K$ coordinates, and $A_{K+1}$ restricted to the first $K$ indices equals $A_K$. Uniqueness of ODE solutions yields the identity. □
Proposition 4 
(Toy ladder satisfies (O1)–(O4)). Assume $m > 1$ and $\alpha > 1$. Then the toy ladder satisfies the Master Certificate programme lines on $[0, \infty)$:
(i)
(O1) holds with $\tau_K \equiv 0$ and $\alpha_K \equiv 0$;
(ii)
(O2) holds with uniform margin $\lambda = \lambda_{\mathrm{cert}}$ from (33);
(iii)
(O3) holds with $\beta_K \equiv 0$ and $\delta_K \equiv 0$ by Lemma 5;
(iv)
(O4) holds with $c_{M_1} = C_{M_1} = c_{M_2} = C_{M_2} = 1$ for $M_1(z) = \|x\|_2^2$ and $M_2(z) = \|\lambda\|_2^2$.
Proof. 
Items (i), (ii) follow from Lemma 4 and Proposition 3. Item (iii) is Lemma 5. Item (iv) holds because the readouts equal the ledgers. □

5.6. Consequence: Refinement-Limit Existence and Inherited Clock

Corollary 3 
(Toy model: unique refinement-limit trajectory and uniform rate inheritance). Assume $m > 1$ and $\alpha > 1$ and consistent initial data $z^{(K)}(0) = P_K z^{(\infty)}(0)$ for some $z^{(\infty)}(0) \in \ell^2 \times \ell^2$. Then there exists a unique refinement-limit trajectory $z^{(\infty)}(t) \in \ell^2 \times \ell^2$ with $z^{(K)}(t) = P_K z^{(\infty)}(t)$ for all $K$ and $t \ge 0$, and the total ledger satisfies
$$R_{\mathrm{tot}}^{(K)}(t) \le e^{-\lambda_{\mathrm{cert}} t} R_{\mathrm{tot}}^{(K)}(0), \qquad \lambda_{\mathrm{cert}} = 2 \min\{m - 1, \alpha - 1\}.$$
Proof. 
Apply Theorem 2 using Proposition 4. □

5.7. Numerical Sanity Check (Uniform in K)

Fix $m = 2.0$, $\alpha = 2.5$, $p = 1.0$, $T = 5.0$, and a forward-Euler time step $\Delta t = 10^{-3}$, so that $\lambda_{\mathrm{cert}} = 2$. Define the certificate ratio
$$\mathrm{Rat}^{(K)}(t) := \frac{R_{\mathrm{tot}}^{(K)}(t)}{e^{-\lambda_{\mathrm{cert}} t}\, R_{\mathrm{tot}}^{(K)}(0)}.$$
Table 1 reports $R_{\mathrm{tot}}^{(K)}(0)$, $R_{\mathrm{tot}}^{(K)}(T)$, $\max_{t \in [0,T]} \mathrm{Rat}^{(K)}(t)$, and $\mathrm{Rat}^{(K)}(T)$ for $K \in \{10, 20, 40, 80\}$.
Remark 2. 
The initial values increase mildly with $K$ because $z^{(K)}(0)$ is obtained by truncating a fixed ambient $\ell^2$ initial condition. The maximum ratio equals $1$ (attained at $t = 0$), while $\mathrm{Rat}^{(K)}(T) \approx 10^{-5}$, indicating the bound is conservative on this run.
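A sketch of a run of this sanity check follows. The ambient initial condition $z_i(0) \propto 1/i$ is an assumed illustrative choice (the text fixes some ambient $\ell^2$ datum but does not specify it), so the absolute values will differ from Table 1 while the qualitative ratio behaviour, maximum at $t = 0$ and rapid decay thereafter, is the same.

```python
import numpy as np

def certificate_ratio(K, m=2.0, alpha=2.5, p=1.0, T=5.0, dt=1e-3):
    """Forward-Euler run of the toy ladder, tracking the certificate
    ratio Rat(t) = R_tot(t) / (exp(-lam_cert * t) * R_tot(0)).
    Initial condition x_i = lam_i = 1/i is illustrative, not the paper's."""
    i = np.arange(1, K + 1, dtype=float)
    a = i ** (-p)
    x = 1.0 / i
    lam = 1.0 / i
    lam_cert = 2.0 * min(m - 1.0, alpha - 1.0)
    R0 = x @ x + lam @ lam
    max_ratio, t = 1.0, 0.0   # ratio is 1 at t = 0 by definition
    for _ in range(int(T / dt)):
        x, lam = x + dt * (-m * x - a * lam), lam + dt * (a * x - alpha * lam)
        t += dt
        ratio = (x @ x + lam @ lam) / (np.exp(-lam_cert * t) * R0)
        max_ratio = max(max_ratio, ratio)
    return R0, max_ratio, ratio
```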

5.8. Neural Instantiation (Protocol): A Width Ladder with Auditable Programme Lines

5.8.1. Setup: Width Ladder and a Declared Projection

Let $K$ index width (or another scalable capacity parameter). For each $K$, let $X^{(K)}$ encode model parameters together with any augmented optimizer state (momenta, running averages, constraint multipliers). Fix an explicit coarse-graining map $\Pi_K^{K+1}: X^{(K+1)} \to X^{(K)}$.

5.8.2. Practical Ledgers (Loggable During Training)

Choose a finite nonnegative ledger vector $r^{(K)}(t) \in \mathbb{R}_{\ge 0}^m$ that is loggable during training, e.g.
$$r^{(K)}(t) = \big( R_{\mathrm{L}}^{(K)}(t),\, R_{\mathrm{safe}}^{(K)}(t),\, R_{\mathrm{rob}}^{(K)}(t),\, R_{\mathrm{comp}}^{(K)}(t) \big).$$

5.8.3. Estimating Metzler Coefficients from Traces

In discrete time, a checker-friendly target is the inequality
$$r_{n+1}^{(K)} \le \big(I + h M^{(K)}\big)\, r_n^{(K)} + \zeta_n^{(K)}, \qquad \zeta_n^{(K)} \ge 0,$$
with $M^{(K)}$ Metzler. A Hurwitz witness can be supplied by checking $w^\top M^{(K)} \le -\lambda\, w^\top$ (Proposition 1) or, in the two-ledger case, the small-gain product inequality (Proposition 2).
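One way to audit this inequality from logged traces is to compute the slack of each step against a candidate $(I + hM^{(K)})$ and require that its positive part be covered by the declared tail budget. The sketch below assumes this slack-accounting convention; it is an illustration, not the paper's formal grammar.

```python
import numpy as np

def discrete_envelope_residual(traces, M, h):
    """Given logged ledger vectors r_n (rows of `traces`), compute
    slack_n = r_{n+1} - (I + h M) r_n. The discrete comparison
    inequality holds iff the positive part of every slack entry is
    charged to the nonnegative disturbance zeta_n."""
    I = np.eye(M.shape[0])
    A = I + h * M
    slack = traces[1:] - traces[:-1] @ A.T
    return np.maximum(slack, 0.0)  # part that must be covered by zeta_n
```

If the trace was generated exactly by $r_{n+1} = (I + hM) r_n$, the residual vanishes; any positive entries quantify how much pollution budget the certificate must declare.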

5.8.4. Empirical Checks for (O1)–(O4)

  • (O2) uniform margin: fit envelope rates for $R_{\mathrm{tot}}^{(K)}$ and test uniformity in $K$.
  • (O3) projective Cauchy: measure $\| x^{(K)} - \Pi_K^{K+1}(x^{(K+1)}) \|_{W^{(K)}}$ across checkpoints.
  • (O1) tail summability: quantify unmodeled remainder budgets $\alpha_K$ and test $\sum_K \alpha_K < \infty$.
  • (O4) dictionary conditioning: bound the dictionary constants $c_M, C_M$ uniformly in $K$.
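The (O3) check, for instance, reduces to a maximum of ruler distances over paired checkpoints. A minimal sketch, with `project` standing in for the declared coarse-graining map $\Pi_K^{K+1}$ (here an assumed callable) and the Euclidean norm standing in for the declared ruler:

```python
import numpy as np

def projective_mismatch(ckpt_K, ckpt_K1, project):
    """(O3) audit: the empirical budget delta_K is the maximum, over
    paired checkpoints, of the ruler distance between the level-K state
    and the projected level-(K+1) state."""
    return max(np.linalg.norm(xk - project(xk1))
               for xk, xk1 in zip(ckpt_K, ckpt_K1))
```

Plotting this quantity against $K$ and testing whether the resulting sequence is summable is the empirical content of the (O3) line.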

5.8.5. Interpretation

The protocol certifies training-time contraction and refinement stability in a declared ruler when the programme lines hold. Generalization, domain shift, and stochastic optimization effects are separate modules unless explicitly ledgerized.

6. Concluding Discussion and Outlook

The paper’s contribution is best read as a single verifier-facing implication. A producer supplies a finite certificate for a declared ladder and a declared ruler: (i) in-level one-clock evidence for the nonnegative ledger vector via a Metzler comparison system and a copositive witness, and (ii) cross-level programme-line budgets on the window [ 0 , T ] ensuring summable refinement tails, a K-uniform decay margin, summable projective inconsistencies, and a uniform dictionary linking the declared geometric ledger to reported readouts. Soundness means the checker does not need to inspect the optimizer internals or re-run training: validating the ruler/transfer maps, the witness inequalities, and the ladder budgets is sufficient to conclude that refinement does not introduce drift, that the ladder admits a unique refinement-limit learner on [ 0 , T ] , and that the same exponent class is inherited up to the explicit vanishing tails. In particular, any readout covered by the dictionary line inherits the same rate class, so the certified clock is not confined to an internal ledger but transports to declared external metrics.
Where certificates fail is correspondingly concentrated. In applications one typically sees (a) the uniform margin degrade with refinement, (b) dictionary constants blow up (conditioning changes the meaning of the reported metric), or (c) the cross-level tower budgets cease to be summable. Each failure has a direct diagnostic: fit levelwise envelopes and monitor the inferred rate versus $K$; track conditioning surrogates that upper bound $C_M / c_M$; and measure projected mismatches $\sup_{t \in [0,T]} \|z^{(K)}(t) - P_K z^{(K+1)}(t)\|_W$ along checkpoints. When the bottleneck is the comparison matrix, the design lever is monotone: fund diagonals (increase self-decay margins) and/or reduce injections (weaken couplings), with an exact boundary in the two-ledger small-gain case.
The claim boundary is deliberate. The certificate is a stability-and-limit guarantee in a declared ruler on a declared window; it does not by itself imply statistical generalization or out-of-distribution robustness, and it does not control stochastic-gradient noise unless such effects are ledgerized or bounded by a separate probabilistic argument. Those properties remain composable add-ons: one may import generalization modules (e.g. stability or PAC-Bayes) or robustness modules (e.g. DRO/shift models) and then connect them to the certified ledgers through additional dictionary links.
Several extensions preserve the same checker philosophy. One can formulate stochastic certificates where the programme lines hold with high probability and the clock becomes probabilistic; allow switched or time-varying Metzler envelopes with common copositive weights; permit slowly varying rulers W ( t ) under a separate “ruler-drift” budget; and replace scalar dictionary bounds by structured operator inequalities for richer readout transport. Finally, a useful empirical direction is a notion of certificate coverage: how much of an observed training trace is explained by the declared comparison model versus assigned to tail pollution, tracked as a function of refinement.

Notation

Symbol Type / Domain Meaning / Assumptions
Ambient ruler and refinement ladder
H Hilbert space Ambient realization space used to enforce a single ruler (Definition 2)
W operator on H Bounded, self-adjoint, strictly positive instrument operator defining z W 2 = z , W z H
· , · inner product Ambient Hilbert inner product on H
z W norm Instrument norm: z W : = z , W z on H
X ( K ) set / space State/parameter space at refinement level K (width, resolution, basis size, etc.)
K 0 integer Minimal refinement index considered; ladder runs over K K 0
i K map X ( K ) H Realization/embedding of level-K states into the ambient ruler space
P K projection on H Orthogonal projection onto i K ( X ( K ) ) H
Π K K + 1 map X ( K + 1 ) X ( K ) Coarse-graining / restriction map between adjacent refinement levels
Π K L map X ( L ) X ( K ) Multi-step projection Π K L : = Π K K + 1 Π L 1 L (when well-defined)
Training dynamics (discrete and continuous)
x ( K ) ( t ) curve in X ( K ) Continuous-time training trajectory at refinement level K
x n ( K ) sequence in X ( K ) Discrete-time iterates at refinement level K
Φ ( K ) map on X ( K ) Discrete update map: x n + 1 ( K ) = Φ ( K ) ( x n ( K ) )
V ( K ) vector field Continuous-time flow: x ˙ ( K ) ( t ) = V ( K ) ( x ( K ) ( t ) )
t R 0 Continuous time (or rescaled iteration-time)
n N Discrete iteration index
Ledgers and scalarizations
R i ( t ) R 0 Nonnegative ledger (risk, constraint violation, robustness proxy, etc.)
r ( t ) R 0 m Ledger vector r ( t ) = ( R 1 ( t ) , , R m ( t ) )
r ( K ) ( t ) R 0 m Level-K ledger vector along training at refinement K
R tot ( t ) R 0 Declared total ledger (scalarization) used for the certificate
w R > 0 m Positive weight vector defining R tot ( t ) = w r ( t )
W led SPD matrix Optional quadratic ledger ruler: R tot ( t ) = r ( t ) W led 2 = r ( t ) W led r ( t )
M finite set Family of reported readout metrics M ( · ) 0
c M , C M scalars Dictionary/observability constants in c M R ( K ) M ( x ( K ) ) C M R ( K ) (line (O4))
Metzler comparison system and one-clock quantities
M matrix Metzler comparison matrix: M i j 0 for i j (Definition 1)
M ( K ) matrix Level-K comparison matrix in r ˙ ( K ) M ( K ) r ( K )
λ i scalar Funded self-decay margin for ledger i when M i i = 2 λ i
η i j scalar Injection strength from ledger j to ledger i when M i j = η i j 0
e t M matrix Matrix exponential (positive for Metzler M)
spec ( M ) set Spectrum (eigenvalues) of M
μ ( M ) scalar Spectral abscissa: μ ( M ) : = max { z : z spec ( M ) }
Hurwitz property M Hurwitz μ ( M ) < 0
λ eff scalar Effective one-clock rate: λ eff : = μ ( M ) > 0 (when M is Hurwitz)
γ 1 2 scalar Gain γ 1 2 : = η 1 2 / ( 2 λ 1 ) in the 2 × 2 form
Refinement programme-line budgets (Master Certificate)
T R > 0 Declared certification horizon: programme lines (O1)–(O4) are required on [ 0 , T ]
τ K ( t ) R 0 Tail disturbance in (O1) on [ 0 , T ] : R ˙ ( K ) 2 λ K R ( K ) + τ K
α K R 0 Integrated pollution budget in (O1): 0 T τ K ( s ) d s α K with K α K <
β K R 0 Unresolved-tail budget in (O3): sup t [ 0 , T ] ( I P K ) z ( K + 1 ) ( t ) W β K with K β K <
δ K R 0 Projective mismatch budget in (O3): sup t [ 0 , T ] z ( K ) ( t ) P K z ( K + 1 ) ( t ) W δ K with K δ K <

Definitions

Entry Definition / Formula Role in the paper
Ladder geometry (“one ruler”)
Projective consistency Π K K + 2 = Π K K + 1 Π K + 1 K + 2 Encodes “same task” across refinement levels
One ruler (ambient instrument restriction) There exist ( H , W ) and realizations i K , P K such that i K ( Π K K + 1 x ) = P K ( i K + 1 x ) and x W ( K ) 2 = i K x , W i K x H Forbids moving goalposts (Definition 2)
Instrument norm and ruler square root z W : = z , W z H = W 1 / 2 z H Fixes the single measurement convention across all levels
Instrument contractivity (projection stability) P K z W z W on i K + 1 ( X ( K + 1 ) ) Ensures coarse projections do not inflate the ruler norm (Assumption A1)
Non-expansiveness across levels Π K K + 1 x W ( K ) x W ( K + 1 ) Basic stability of coarse-graining (Lemma 2)
Cross-level discrepancy (ambient) d K L ( x ( K ) , x ( L ) ) : = i K x ( K ) P K ( i L x ( L ) ) W Canonical “apples-to-apples” cross-level distance
Projective telescoping bound i K x ( K ) P K ( i L x ( L ) ) W j = K L 1 i j x ( j ) P j ( i j + 1 x ( j + 1 ) ) W Turns summable adjacent mismatches into a Cauchy tower (Lemma 3)
Refinement-limit learner (projective limit) x ( ) ( t ) H with P K x ( ) ( t ) = i K x ( K ) ( t ) (compatible Cauchy tower) Existence/uniqueness of a refinement-limit object (Theorem 2)
Ledgers, scalarizations, and readouts
Ledger vector r ( t ) = ( R 1 ( t ) , , R m ( t ) ) R 0 m Collects multiple nonnegative training-time quantities
Declared total ledger R tot ( t ) = w r ( t ) with w 0 Single scalar clock target for contraction
Quadratic ledger ruler (optional) R tot ( t ) = r ( t ) W led 2 = r ( t ) W led r ( t ) , W led 0 Alternative scalarization when a quadratic contract is preferred
Reported readouts / metrics M = { M } , with M ( · ) 0 External observables whose stability is transported
Dictionary (observability) line c M R ( K ) ( t ) M ( x ( K ) ( t ) ) C M R ( K ) ( t ) uniformly in K Transfers the certified clock to readouts ((O4))
Readout Lipschitz transport (optional) | M ( u ) M ( v ) | L M u v W on a declared bounded set Converts ruler convergence to readout convergence (optional strengthening)
Metzler comparison and one-clock reduction
Metzler matrix M R m × m with M i j 0 for i j Positivity structure for ledger couplings
Metzler comparison inequality r ˙ ( t ) M r ( t ) componentwise Auditable coupling model (Definition 1)
Semigroup positivity M Metzler e t M 0 entrywise for all t 0 Enables order-preserving comparison
Comparison principle (Duhamel form) If y solves y ˙ = M y , y ( 0 ) = r ( 0 ) and z : = y r , then z ( t ) = 0 t e ( t s ) M q ( s ) d s , q : = z ˙ M z 0 Correct proof mechanism for r ( t ) e t M r ( 0 ) (Lemma 1)
Funding + injection parametrization M i i = 2 λ i , M i j = η i j 0 ( i j ) Interpretable design levers (fund diagonals, reduce injections)
Spectral abscissa and effective rate μ ( M ) : = max { z : z spec ( M ) } ,    λ eff : = μ ( M ) if μ ( M ) < 0 Defines the certified one-clock exponent class
One-clock certificate (witness form) w 0 , λ > 0 with w M λ w w r ( t ) e λ t w r ( 0 ) Core reduction theorem (Theorem 1)
Hurwitz ⇒ witness (sharp rate) If μ ( M ) < 0 and M is Metzler, w 0 with w M μ ( M ) w Explains existence of witnesses / sharp clock (Proposition 1)
Two-ledger small-gain criterion For 2 × 2 funded+injection M, M Hurwitz η 1 2 η 2 1 < 4 λ 1 λ 2 Exact design boundary (Proposition 2)
Master Certificate programme lines on [ 0 , T ]
Certification horizon T > 0 (declared) All programme lines are audited on [ 0 , T ]
(O1) Tail-robust envelope R ˙ ( K ) 2 λ K R ( K ) + τ K ,    0 T τ K α K ,    K α K < Controls time-direction pollution on [ 0 , T ]
(O2) Uniform margin λ K λ > 0 independent of K One exponent class across refinement
(O3) Geometric tower budget sup t [ 0 , T ] ( I P K ) z ( K + 1 ) ( t ) W + z ( K ) ( t ) P K z ( K + 1 ) ( t ) W β K + δ K ,    K ( β K + δ K ) < Cauchy tower in one ruler (no drift) on [ 0 , T ]
(O4) Uniform dictionary on [ 0 , T ] c M R ( K ) ( t ) M ( x ( K ) ( t ) ) C M R ( K ) ( t ) uniformly in K and t [ 0 , T ] Transfers rates to reported metrics
Master Certificate (learning version) (O1)–(O4) ⇒ refinement-limit trajectory + rate inheritance on [ 0 , T ] Main soundness engine (Theorem 3)
Certificate artifact (what the prover ships)
Certificate artifact Declared ( H , W ) ; maps ( i K , P K , Π K K + 1 ) ; ledger definition R ( K ) ; budgets ( α K , β K , δ K ) ; margin λ ; dictionary constants ( c M , C M ) ; and (when applicable) a Metzler witness ( w , λ ) Concrete verifier-facing interface (the “proof-carrying” object)
Witness-finding LP (optional recipe) Find w R m , λ 0 s.t. w 1 , w M λ w , i w i = 1 Makes the one-clock witness actionable for ML/CS audiences
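In the two-ledger case, the witness-finding LP above can be approximated by a one-parameter search over normalized weights $w = (s, 1-s)$. The grid search below is a stand-in sketch for a proper LP solver, not the recipe itself:

```python
import numpy as np

def find_witness_grid(M, lam, grid=200):
    """Search for w = (s, 1-s), s in (0,1), satisfying the witness
    inequality w^T M <= -lam * w componentwise. Returns a feasible w
    or None; a coarse stand-in for the LP in the table above."""
    for s in np.linspace(1e-3, 1.0 - 1e-3, grid):
        w = np.array([s, 1.0 - s])
        if np.all(w @ M <= -lam * w + 1e-12):
            return w
    return None
```

For $M_{\mathrm{Toy}}$ with $m = 2.0$, $\alpha = 2.5$ the search succeeds at the certified rate $\lambda = 2$ and fails, as it must, for any rate exceeding the spectral abscissa.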

References

  1. Miettinen, K. Nonlinear Multiobjective Optimization; International Series in Operations Research & Management Science, Vol. 12; Kluwer Academic Publishers: Boston, MA, 1999.
  2. Sener, O.; Koltun, V. Multi-Task Learning as Multi-Objective Optimization. In Advances in Neural Information Processing Systems 2018, Vol. 31, pp. 525–536.
  3. Kaplan, J.; McCandlish, S.; Henighan, T.; Brown, T.B.; Chess, B.; Child, R.; Gray, S.; Radford, A.; Wu, J.; Amodei, D. Scaling Laws for Neural Language Models. arXiv 2020, arXiv:2001.08361.
  4. Ciarlet, P.G. The Finite Element Method for Elliptic Problems; Studies in Mathematics and Its Applications, Vol. 4; North-Holland Publishing Company: Amsterdam, 1978.
  5. Hackbusch, W. Multi-Grid Methods and Applications; Springer Series in Computational Mathematics, Vol. 4; Springer-Verlag: Berlin, 1985.
  6. Brenner, S.C.; Scott, L.R. The Mathematical Theory of Finite Element Methods, 3rd ed.; Texts in Applied Mathematics, Vol. 15; Springer: New York, 2008.
  7. Lax, P.D.; Richtmyer, R.D. Survey of the Stability of Linear Finite Difference Equations. Communications on Pure and Applied Mathematics 1956, 9, 267–293.
  8. Hille, E.; Phillips, R.S. Functional Analysis and Semi-Groups; American Mathematical Society Colloquium Publications, Vol. 31; American Mathematical Society: Providence, RI, 1957.
  9. Trotter, H.F. Approximation of Semi-Groups of Operators. Pacific Journal of Mathematics 1958, 8, 887–919.
  10. Pazy, A. Semigroups of Linear Operators and Applications to Partial Differential Equations; Applied Mathematical Sciences, Vol. 44; Springer: New York, 1983.
  11. Berman, A.; Plemmons, R.J. Nonnegative Matrices in the Mathematical Sciences; Academic Press, 1979.
  12. Farina, L.; Rinaldi, S. Positive Linear Systems: Theory and Applications; Pure and Applied Mathematics, Vol. 255; Wiley–Interscience: New York, 2000.
  13. Smith, H.L. Monotone Dynamical Systems: An Introduction to the Theory of Competitive and Cooperative Systems; Mathematical Surveys and Monographs, Vol. 41; American Mathematical Society: Providence, RI, 1995.
  14. Briat, C. Linear Parameter-Varying and Time-Delay Systems: Analysis, Observation, Filtering & Control; Advances in Delays and Dynamics; Springer: Berlin, Heidelberg, 2015.
  15. Kreyszig, E. Introductory Functional Analysis with Applications; Wiley Classics Library, Vol. 17; Wiley, 1989.
  16. Bousquet, O.; Elisseeff, A. Stability and Generalization. Journal of Machine Learning Research 2002, 2, 499–526.
  17. McAllester, D.A. PAC-Bayesian Model Averaging. In Proceedings of the Twelfth Annual Conference on Computational Learning Theory (COLT '99), New York, NY, USA, 1999; pp. 164–170.
  18. Catoni, O. PAC-Bayesian Supervised Classification: The Thermodynamics of Statistical Learning; Institute of Mathematical Statistics Lecture Notes–Monograph Series, Vol. 56; Institute of Mathematical Statistics: Beachwood, OH, 2007.
  19. Duchi, J.C.; Namkoong, H. Learning Models with Uniform Performance via Distributionally Robust Optimization. The Annals of Statistics 2021, 49, 1378–1406.
  20. Katz, G.; Barrett, C.; Dill, D.L.; Julian, K.; Kochenderfer, M.J. Reluplex: An Efficient SMT Solver for Verifying Deep Neural Networks. In Computer Aided Verification (CAV 2017), Part I; Majumdar, R., Kuncak, V., Eds.; Lecture Notes in Computer Science, Vol. 10426; Springer: Cham, 2017; pp. 97–117.
  21. Gehr, T.; Mirman, M.; Drachsler-Cohen, D.; Tsankov, P.; Chaudhuri, S.; Vechev, M. AI2: Safety and Robustness Certification of Neural Networks with Abstract Interpretation. In 2018 IEEE Symposium on Security and Privacy (SP), San Francisco, CA, USA, 2018; pp. 3–18.
  22. Seshia, S.A.; Sadigh, D.; Sastry, S.S. Toward Verified Artificial Intelligence. Communications of the ACM 2022, 65, 46–55. (An earlier technical version appeared as arXiv:1606.08514.)
  23. Necula, G.C. Proof-Carrying Code. In Proceedings of the 24th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL '97), New York, NY, USA, 1997; pp. 106–119.
  24. Coddington, E.A.; Levinson, N. Theory of Ordinary Differential Equations; McGraw–Hill: New York, 1955.
  25. Khalil, H.K. Nonlinear Systems, 3rd ed.; Prentice Hall: Upper Saddle River, NJ, 2002.
  26. Grönwall, T.H. Note on the Derivatives with Respect to a Parameter of the Solutions of a System of Differential Equations. Annals of Mathematics (2) 1919, 20, 292–296.
  27. Robbins, H.; Monro, S. A Stochastic Approximation Method. The Annals of Mathematical Statistics 1951, 22, 400–407.
  28. Benaïm, M. Dynamics of Stochastic Approximation Algorithms. In Séminaire de Probabilités XXXIII; Azéma, J., Émery, M., Ledoux, M., Yor, M., Eds.; Lecture Notes in Mathematics, Vol. 1709; Springer: Berlin, Heidelberg, 1999; pp. 1–68.
  29. Kushner, H.J.; Yin, G.G. Stochastic Approximation and Recursive Algorithms and Applications, 2nd ed.; Stochastic Modelling and Applied Probability, Vol. 35; Springer: New York, 2003.
  30. Bramble, J.H.; Pasciak, J.E.; Xu, J. Parallel Multilevel Preconditioners. Mathematics of Computation 1990, 55, 1–22.
  31. Thomée, V. Galerkin Finite Element Methods for Parabolic Problems, 2nd ed.; Springer Series in Computational Mathematics; Springer: Berlin, Heidelberg, 2006.
  32. Emmrich, E. Discrete Versions of Gronwall's Lemma and Their Application to the Numerical Analysis of Parabolic Problems. Preprint Reihe Mathematik 637; Technische Universität Berlin, 1999.
  33. Desoer, C.A.; Vidyasagar, M. Feedback Systems: Input-Output Properties; Academic Press: New York, 1975.
  34. Jiang, Z.P.; Teel, A.R.; Praly, L. Small-Gain Theorem for ISS Systems and Applications. IEEE Transactions on Automatic Control 1994, 39, 1609–1619.
Table 1. Numerical sanity check of the toy certificate $R_{\mathrm{tot}}^{(K)}(t) \le e^{-\lambda_{\mathrm{cert}} t} R_{\mathrm{tot}}^{(K)}(0)$ with $\lambda_{\mathrm{cert}} = 2$ for $m = 2.0$, $\alpha = 2.5$, $p = 1.0$, $T = 5.0$, $\Delta t = 10^{-3}$.
$K$   $R_{\mathrm{tot}}^{(K)}(0)$   $R_{\mathrm{tot}}^{(K)}(T)$   $\max_{t \in [0,T]} \mathrm{Rat}^{(K)}(t)$   $\mathrm{Rat}^{(K)}(T)$
10   $2.125857 \times 10^{-1}$   $1.091714 \times 10^{-10}$   $1.000000$   $1.131149 \times 10^{-5}$
20   $2.370105 \times 10^{-1}$   $1.247584 \times 10^{-10}$   $1.000000$   $1.159436 \times 10^{-5}$
40   $2.459643 \times 10^{-1}$   $1.351165 \times 10^{-10}$   $1.000000$   $1.209988 \times 10^{-5}$
80   $2.486140 \times 10^{-1}$   $1.378366 \times 10^{-10}$   $1.000000$   $1.221192 \times 10^{-5}$
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.

Subscribe

Disclaimer

Terms of Use

Privacy Policy

Privacy Settings

© 2026 MDPI (Basel, Switzerland) unless otherwise stated