Preprint
Article

This version is not peer-reviewed.

A Certificate Layer for Multi-Objective Training Across Refinement Scales

Submitted:

21 December 2025

Posted:

22 December 2025


Abstract
Modern training pipelines are governed by multiple coupled nonnegative metrics (performance, constraints, robustness, calibration, and compute budgets) and are rerun across refinement ladders (width/depth scaling, discretization, basis growth, and data-fidelity upgrades). This paper develops a certificate layer for such settings. First, a Metzler comparison system for the ledger vector admits a one-clock reduction: if a Hurwitz witness exists, then a declared positive scalarization contracts exponentially, with explicit two-ledger small-gain formulas. Second, a Master Certificate upgrades four auditable programme lines verified in a single declared ruler---summable tail disturbances, a uniform contraction margin, projective Cauchy consistency, and a uniform dictionary for reported readouts---to existence and uniqueness of a refinement-limit learner on the certification window, together with rate inheritance and readout transport. The framework yields a proof-carrying stability grammar for learning under updates and refinement, intended to compose with external generalization and robustness modules.

1. Introduction

Modern training pipelines are governed by multiple coupled nonnegative quantities: predictive performance, constraint and safety violations, robustness proxies, calibration errors, and compute/latency budgets. In multi-task and multi-constraint settings these ledgers can genuinely compete, so a single weighted loss is often only a proxy for what a pipeline is required to guarantee [1,2]. A second ubiquitous feature is refinement: the same pipeline is rerun across a ladder of resolutions (width/depth scaling [3], discretization refinement, feature-basis growth, data fidelity upgrades). In scientific computing, refinement limits are meaningful only once all levels are compared in a fixed energy norm and stable transfer maps are specified [4,5,6].

1.1. Problem and Certificate Goal

Problem (multi-ledger learning under refinement). Training is repeated across refinement levels, producing (i) learner trajectories/iterates and (ii) a vector of nonnegative ledgers tracking performance, constraints, robustness, calibration, and resource budgets. We seek auditable conditions under which:
(i)
a refinement-limit learner exists and is unique in a fixed, declared ruler, and
(ii)
the multi-ledger evolution admits a single certified contraction clock (a uniform exponential rate class) that does not degrade as the refinement index grows.
The formal ladder objects (state spaces, coarse maps/projections, and the declared ruler) are fixed once in Section 2. The two theorem engines that turn auditable inequalities into guarantees are proved in Sections 3 and 4, with instantiations in Section 5.

1.2. Context and Gap

Two adjacent traditions motivate the problem but do not, by themselves, provide a checker-facing interface.

Multiobjective learning (within a fixed representation).

Pareto structure and scalarization are classical in multiobjective optimization [1], and modern multi-task learning can be framed as multiobjective training with gradient-based trade-off schemes [2]. This line clarifies how to negotiate competing objectives at a given scale; it does not provide a refinement-consistent, verifier-friendly implication that survives changes of resolution.

Refinement limits (stability in a fixed norm).

In numerical analysis and multilevel methods, convergence across discretizations becomes meaningful only after fixing a single norm/ruler and stable transfer operators, as in stability–consistency paradigms and energy-norm FEM/multigrid constructions [4,5,6,7]. Semigroup approximation theory plays a related role for evolution families [8,9,10]. What is missing for modern training loops is a compact certificate interface that simultaneously (i) handles several interacting nonnegative quantities and (ii) remains coherent along a refinement ladder, returning a single auditable decay rate.

1.3. State of the Art, Approach, and Contributions

The paper sits at the intersection of three adjacent traditions.

State of the art (adjacent lines).

(i) Multiobjective learning and scalarization. The standard formal language for competing objectives is multiobjective optimization and Pareto trade-offs [1]; in modern ML, multi-task procedures can be interpreted as multiobjective updates with explicit trade-off mechanisms [2]. This line clarifies how to balance objectives at a fixed representation level, but it does not by itself provide a verifier-facing implication that survives changing resolution.
(ii) Positive systems and comparison principles. Metzler matrices, order-preserving comparison, and Perron–Frobenius structure form the classical backbone of positive linear systems and cooperative dynamics [11,12,13]. Copositive (linear) Lyapunov functions and robust/synthesis viewpoints for positive systems make the contraction certificates particularly witness-friendly [14]. What is missing for ML pipelines is an explicit “certificate grammar” that turns such comparison bounds into a shipped artifact and a small checker.
(iii) Refinement ladders and stable limits. In numerical analysis, meaningful convergence across discretizations requires a fixed measurement convention (energy norm) and stable transfer operators; this underlies stability–consistency paradigms and multilevel methods [4,5,6,7]. Abstract semigroup approximation theory provides a complementary operator-theoretic view of consistent approximations of evolution families [8,9,10]. These traditions explain why “one ruler” is non-negotiable, but they do not natively address coupled multi-ledger training objectives nor provide a checker interface for ML systems.

Approach (two theorem engines).

We separate the solution into two theorem engines: an in-level reduction that collapses many nonnegative ledgers to one certified clock, and a cross-level reduction that turns summable refinement discrepancies into a limit object in one ruler.

Engine A (positivity ⇒ one clock).

At each level $K$ we target a cooperative comparison inequality of Metzler type for the ledger vector, possibly with a nonnegative remainder. In this setting, positive-systems theory supplies checker-friendly copositive Lyapunov witnesses and Perron–Frobenius structure [11,12,13,14]. The certificate layer is organized so that a prover can ship a short witness $(w, \lambda)$ and a verifier can check the one-clock claim by a finite set of inequalities (Section 3).

Engine B (one ruler + summable discrepancies ⇒ refinement limit).

Across levels, we realize $X^{(K)}$ inside a common ambient ruler space and compare levels only after applying the declared coarse projection. A summable tower defect (projective Cauchy condition) forces a Cauchy tower in the ambient ruler and therefore a unique refinement-limit learner on the declared window (Section 4). This is the refinement-ladder analogue of the basic "summable increments ⇒ convergence" mechanism in Banach spaces [15].

Contributions (with pointers).

C1:
One-clock reduction for nonnegative ledgers. We develop a checker form of the Metzler/Hurwitz implication: a copositive witness yields exponential contraction of a declared scalar ledger, with an explicit two-ledger small-gain boundary and monotone design levers (Section 3).
C2:
Master Certificate across refinement ladders. We prove that four auditable programme lines in one ruler (tail-robust envelope, uniform margin, projective Cauchy tower, uniform dictionary) imply existence/uniqueness of a refinement-limit learner on $[0,T]$ together with rate inheritance and readout transport (Section 4).
C3:
Instantiations. We give (i) a fully checkable toy ladder with explicit constants and (ii) a width-ladder protocol sketch indicating how to populate the certificate from logged traces (Section 5).
C4:
Scope wall and composability. The proved guarantees are training-time stability and refinement-limit existence in a declared ruler. Statistical generalization, out-of-distribution shift, and stochastic-optimizer noise are not asserted unless they are explicitly ledgerized or imported as separate modules (e.g. stability/PAC-Bayes and robustness/verification frameworks) [16,17,18,19,20,21,22].

Certificate interface.

The paper adopts a proof-carrying viewpoint: the producer supplies a certificate artifact (declared ruler, transfer maps, ledger definitions, budgets, and—when applicable—a Metzler witness), and a consumer validates it with a small checker, analogous in spirit to proof-carrying code [23].

2. Problem: Multi-Metric Learning on Refinement Ladders

This section fixes the formal objects referenced by the certificate obligations. We specify (i) a refinement-indexed ladder of state spaces realized inside a single declared ambient ruler, (ii) coherent coarse-graining/projection maps along the ladder, (iii) training dynamics (discrete updates or continuous flows) on each level, and (iv) ledger/readout maps that turn states into multiple nonnegative metrics. This setup is chosen so that the programme lines (O1)–(O4) in Section 4 can be stated unambiguously and checked mechanically.

2.1. Learner State, Dynamics, and a Single Ambient Ruler

2.1.1. Refinement-Indexed State Spaces and One Ruler

Fix a refinement index set $\mathcal{K} = \{K_0, K_0+1, \dots\}$. For each $K \in \mathcal{K}$, let $X^{(K)}$ be a metric space of learner states at resolution $K$ (parameters, buffers, optimizer state, constraint multipliers, etc.). We assume the ladder embeds into an ambient normed space $(X^{(\infty)}, \|\cdot\|)$ with a single ruler:
$$X^{(K)} \subseteq X^{(\infty)} \quad \text{for all } K,$$
and we fix projections $P_K : X^{(\infty)} \to X^{(K)}$ such that
$$P_K P_L = P_K \ (K \le L), \qquad \|P_K z\| \le \|z\|, \qquad P_K z \to z \ \text{in } X^{(\infty)} \text{ as } K \to \infty.$$
The identities in (1) encode a coherent ladder geometry: coarse-graining is consistent along the ladder and contractive in the declared ruler.

2.1.2. Training Dynamics: Discrete Updates and Continuous Flows

At each K, training is described either by a discrete-time update map
$$x^{(K)}_{n+1} = \Phi^{(K)}\big(x^{(K)}_n, \xi_n\big), \qquad n = 0, 1, 2, \dots,$$
or by a continuous-time (possibly differential-inclusion) flow
$$\dot x^{(K)}(t) = F^{(K)}\big(x^{(K)}(t), t\big), \qquad t \in [0, T].$$
Here ξ n denotes exogenous randomness (mini-batch sampling, dropout masks, noise injections) when present. For (2) we assume Φ ( K ) ( · , ξ ) is measurable (often locally Lipschitz). For (3) we assume F ( K ) is measurable in time and locally Lipschitz in state, ensuring existence/uniqueness of trajectories on [ 0 , T ] [24,25]. Stochasticity can be absorbed into the certificate layer as a nonnegative disturbance budget; see Grönwall-type propagation [26] and stochastic approximation frameworks [27,28,29].

2.1.3. State Augmentation

The state x ( K ) may include optimizer momentum, running moments (Adam/EMA), constraint multipliers, safety filters, or internal solver iterates. Analytically, this is harmless: auxiliary variables can be stacked into a product space and measured with a product norm, and many stability properties are naturally expressed on the augmented state [25].

2.2. Ledgers and Reported Readouts

2.2.1. Ledger Maps and Trajectories

Fix an integer $m \ge 1$. For each refinement level $K$, a ledger family is a vector of nonnegative maps
$$R^{(K)} : X^{(K)} \to \mathbb{R}^m_{\ge 0}, \qquad R^{(K)}(x) = \big(R^{(K)}_1(x), \dots, R^{(K)}_m(x)\big).$$
Each $R^{(K)}_i$ is a measurable functional representing a tracked metric (risk proxy, constraint residual, robustness proxy, calibration error, compute/latency deficit, etc.). Given a trajectory $\tau \mapsto x^{(K)}(\tau)$ (discrete $\tau = n$ or continuous $\tau = t$), we write
$$r^{(K)}(\tau) := R^{(K)}\big(x^{(K)}(\tau)\big) \in \mathbb{R}^m_{\ge 0}.$$

2.2.2. Total Ledgers (Certificate-Compatible Scalarizations)

A total ledger is any positive weighting of the ledger vector:
$$R^{(K)}_{\mathrm{tot}}(\tau) := w^\top r^{(K)}(\tau), \qquad w > 0 \ \text{(entrywise)}.$$
There are two complementary interpretations:
(i)
Design aggregation: w is chosen by a practitioner to reflect priorities (a scalarization choice) [1,2].
(ii)
Certified aggregation: $w$ is produced by the one-clock reduction once a Metzler comparison inequality is established (a copositive Lyapunov weight), so that $R^{(K)}_{\mathrm{tot}}$ decays with a certified exponent when the comparison matrix is Hurwitz [11,12,14].

2.2.3. Reported Readouts and Dictionary Conditioning

Refinement changes representation, so not all measured quantities are automatically comparable across K. We distinguish:
  • Geometric ledgers defined by restriction of the ambient ruler (automatically consistent under refinement), as in stable discretization theory [4,6].
  • Reported/readout metrics depending on a K-dependent apparatus (validation sets, simulator fidelity, feature maps). These require an explicit dictionary condition later (programme line (O4)) to prevent ill-conditioning from faking improvement.

2.2.4. A Minimal Regularity Interface

To transport convergence in the ruler to convergence of reported readouts, it is often enough to assume local Lipschitz control on the certification window: for each $K$ and each $i$,
$$\big|R^{(K)}_i(x) - R^{(K)}_i(y)\big| \le L_i\, \|x - y\|,$$
with constants $L_i$ independent of $K$ on the relevant bounded set. Uniformity prevents refinement from silently changing the scale of what is being measured [8,10].

2.3. Cross-Level Maps and the “Same Task” Requirement

2.3.1. Projective Structure

Assume there exist coarse-graining/projection maps
$$\Pi_{K \leftarrow K+1} : X^{(K+1)} \to X^{(K)},$$
representing restriction, pruning, averaging, or projection of refined states to coarser representations. We require consistency (a projective system):
$$\Pi_{K \leftarrow K+2} = \Pi_{K \leftarrow K+1} \circ \Pi_{K+1 \leftarrow K+2} \quad \text{for all } K \ge K_0.$$
This encodes that all levels represent the same task object, just at different resolution; see, e.g., FEM and multilevel treatments [4,5,6] and extensions beyond nested families [30].

2.4. What Will Be Certified Later

The remainder of the paper will not assume that ledgers decay automatically. Instead, it will require auditable inequalities of the form
$$\dot r^{(K)}(\tau) \le M^{(K)} r^{(K)}(\tau) + d^{(K)}(\tau),$$
with $M^{(K)}$ Metzler and $d^{(K)} \ge 0$ a tail disturbance, and then impose uniformity/summability conditions across $K$ (programme lines (O1)–(O4) in Section 4). Under those obligations, a one-clock total ledger and a refinement-limit learner follow from positive-systems theory and projective Cauchy arguments [11,12,14].

3. Certificate Layer: One-Clock Reduction for Nonnegative Ledgers

3.1. Why Positivity Is the Right Abstraction

3.1.1. Nonnegative Ledgers and Injections

A ledger is any measurable quantity $R_i(t) \ge 0$ that a practitioner wants to track (risk, constraint violation, robustness proxy, calibration error, solver residual, etc.). We collect them into a vector
$$r(t) := \big(R_1(t), \dots, R_m(t)\big) \in \mathbb{R}^m_{\ge 0}.$$
Cross-effects appear as injections: if improving (or stabilizing) ledger j temporarily worsens ledger i, then R ˙ i may contain a nonnegative term proportional to R j .

3.1.2. Why Positive Comparison Is Canonical

Nonnegativity induces the partial order $u \le v$ iff $u_i \le v_i$ for all $i$ on $\mathbb{R}^m$. Many training-time inequalities have the funded-plus-injection form
$$\dot R_i(t) \le -(\text{funded decay margin})_i \cdot R_i(t) + \sum_{j \ne i} (\text{injection } i \leftarrow j) \cdot R_j(t),$$
with all injection coefficients nonnegative. This is exactly the comparison form of a positive linear system (Metzler dynamics). It admits: (i) comparison principles, (ii) Perron–Frobenius structure for the dominant mode, and (iii) copositive Lyapunov functions $V(r) = w^\top r$ with $w > 0$; see, e.g., [11,12,13,14].

3.1.3. How Metzler Bounds Arise in Practice (Derivation Template)

The certificate layer does not assume access to model internals; it assumes that one can bound ledger evolution by auditable inequalities. A common derivation pattern is:
(i)
write a differential (or finite-difference) inequality for each nonnegative ledger R i ;
(ii)
isolate a self-decay term $-2\lambda_i R_i$ (from dissipation, regularization, descent, contractive updates);
(iii)
bound cross-terms by nonnegative injections using operator norms or Lipschitz bounds and Young’s inequality, turning mixed products into sums of squares;
(iv)
collect coefficients into a Metzler matrix $M$ with $M_{ii} = -2\lambda_i$ and $M_{ij} = \eta_{i \leftarrow j} \ge 0$ for $i \ne j$.
Such reductions are standard in Lyapunov and comparison analyses; see, e.g., [25] for Lyapunov inequalities, [12,14] for positive-systems comparison, and [11] for the nonnegative-matrix structure behind Metzler semigroups. In discrete time, the same pattern yields a one-step domination
$$r^{(K)}_{n+1} \le (I + hM)\, r^{(K)}_n + \zeta^{(K)}_n, \qquad \zeta^{(K)}_n \ge 0,$$
and can be propagated by discrete Grönwall-type arguments for positive recursions [31,32].
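To make the discrete one-step domination concrete, the following minimal numeric sketch (not from the paper; the matrix, step size, weight, and disturbance schedule are illustrative choices) iterates $r_{n+1} = (I + hM)\, r_n + \zeta_n$ with a Hurwitz Metzler $M$ and a summable disturbance, and observes that a positively weighted total ledger contracts:

```python
# Illustrative discrete comparison recursion (constants are made up):
# r_{n+1} = (I + h M) r_n + zeta_n with M Hurwitz Metzler, zeta_n summable.

def step(r, M, h, zeta):
    """One discrete comparison step: r <- (I + h M) r + zeta (componentwise)."""
    m = len(r)
    return [
        r[i] + h * sum(M[i][j] * r[j] for j in range(m)) + zeta[i]
        for i in range(m)
    ]

# Two-ledger Metzler matrix: funded diagonals -2*lambda_i, injections eta.
lam1, lam2, eta12, eta21 = 1.0, 0.5, 0.3, 0.4   # eta12*eta21 < 4*lam1*lam2
M = [[-2 * lam1, eta12], [eta21, -2 * lam2]]

h = 0.01
r = [1.0, 2.0]
w = [1.0, 1.0]  # any positive weight suffices for this illustration

total0 = sum(wi * ri for wi, ri in zip(w, r))
for n in range(2000):
    zeta = [0.5 ** n * 1e-3] * 2          # summable tail disturbance
    r = step(r, M, h, zeta)

total = sum(wi * ri for wi, ri in zip(w, r))
print(total0, total)  # the weighted ledger has contracted substantially
```

Because `I + h*M` has nonnegative off-diagonals and positive diagonals at this step size, nonnegativity of the iterates is preserved, matching the positive-recursion setting of [31,32].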

3.2. Metzler Comparison Systems

3.2.1. Definition (Metzler Dominance)

Definition 1 
(Metzler comparison inequality). Let $r : [0, \infty) \to \mathbb{R}^m_{\ge 0}$ be absolutely continuous. A Metzler comparison system for $r$ is an inequality
$$\dot r(t) \le M r(t) \quad \text{for a.e. } t \ge 0,$$
where $M \in \mathbb{R}^{m \times m}$ is Metzler, i.e. $M_{ij} \ge 0$ for all $i \ne j$.

3.2.2. Semantics: Funded Diagonals and Injections

Write
$$M_{ii} = -2\lambda_i, \quad \lambda_i \ge 0, \qquad \text{and} \qquad M_{ij} = \eta_{i \leftarrow j} \ge 0 \ (i \ne j).$$
Then (7) reads componentwise as
$$\dot R_i(t) \le -2\lambda_i R_i(t) + \sum_{j \ne i} \eta_{i \leftarrow j} R_j(t).$$
We interpret $\lambda_i$ as a funded decay margin for ledger $i$ and $\eta_{i \leftarrow j}$ as the injection strength from ledger $j$ to ledger $i$.

3.2.3. Positivity of the Semigroup and Comparison

A basic fact is that Metzler matrices generate positive semigroups:
$$M \ \text{Metzler} \;\Longrightarrow\; e^{tM} \ge 0 \ \text{entrywise for all } t \ge 0,$$
see, e.g., [11,12].
Lemma 1 
(Linear comparison principle for Metzler dominance). Assume (7) and $r(0) \in \mathbb{R}^m_{\ge 0}$. Let $y$ solve the linear ODE $\dot y = M y$ with $y(0) = r(0)$. Then for all $t \ge 0$,
$$0 \le r(t) \le y(t) = e^{tM} r(0) \quad (\text{componentwise}).$$
Proof. 
Let $y(t) = e^{tM} r(0)$ solve $\dot y = M y$ with $y(0) = r(0)$, and set $z(t) := y(t) - r(t)$. Then $z(0) = 0$ and for a.e. $t \ge 0$,
$$\dot z(t) = \dot y(t) - \dot r(t) \ge M y(t) - M r(t) = M z(t).$$
Define the (componentwise) nonnegative forcing term
$$q(t) := \dot z(t) - M z(t) \ge 0 \quad \text{for a.e. } t \ge 0.$$
Then $z$ satisfies the Duhamel identity
$$z(t) = \int_0^t e^{(t-s)M} q(s)\, ds.$$
Because $M$ is Metzler, $e^{(t-s)M} \ge 0$ entrywise for all $t \ge s$, and since $q(s) \ge 0$ componentwise, the integrand is componentwise nonnegative. Hence $z(t) \ge 0$ componentwise for all $t \ge 0$, i.e. $r(t) \le y(t) = e^{tM} r(0)$ componentwise. □
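The comparison principle can be checked numerically. The sketch below (illustrative constants, forward Euler rather than an exact flow) evolves a trajectory obeying $\dot r = Mr - q$ with $q \ge 0$ (hence $\dot r \le Mr$) alongside the comparison solution $\dot y = My$, $y(0) = r(0)$, and confirms componentwise domination along the run:

```python
# Numeric sanity check of Lemma 1 (illustrative): with a nonnegative slack q,
# the dominated trajectory r stays componentwise below the comparison y.

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

M = [[-2.0, 0.3], [0.4, -1.0]]   # Metzler: nonnegative off-diagonals
r = [1.0, 2.0]
y = [1.0, 2.0]                    # y(0) = r(0)
h = 1e-3

dominated = True
for n in range(5000):
    q = [0.1 * ri for ri in r]    # nonnegative slack, so r' = M r - q <= M r
    Mr, My = matvec(M, r), matvec(M, y)
    r = [ri + h * (mi - qi) for ri, mi, qi in zip(r, Mr, q)]
    y = [yi + h * mi for yi, mi in zip(y, My)]
    dominated = dominated and all(ri <= yi + 1e-12 for ri, yi in zip(r, y))

print(dominated)  # componentwise r(t) <= y(t) held along the run
```

The discrete analogue of the Duhamel argument is visible here: $z_{n+1} = (I + hM) z_n + h q_n$ with $(I + hM) \ge 0$ entrywise and $z_0 = 0$ forces $z_n \ge 0$.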

3.2.4. From Componentwise Bounds to Certified Scalar Ledgers

A comparison inequality of the form (7) controls the vector ledger $r(t)$. To obtain a single certified "clock" usable by a checker, we scalarize $r(t)$ by a positive weight. Copositive scalarizations are natural in positive systems: if $w > 0$, then $R_{\mathrm{tot}}(t) := w^\top r(t)$ is itself a nonnegative ledger and preserves the cone order. This idea underlies linear copositive Lyapunov functions in positive systems [12,14] and is closely related to Perron–Frobenius theory for nonnegative matrices [11].

3.3. One-Clock Reduction

3.3.1. Spectral Abscissa and Effective Rate

Define the spectral abscissa
$$\mu(M) := \max\{\operatorname{Re} z : z \in \operatorname{spec}(M)\}.$$
For a Metzler matrix, $\mu(M)$ is real and governs the slowest exponential mode; see [11,12]. When $\mu(M) < 0$ we define the effective one-clock rate
$$\lambda_{\mathrm{eff}} := -\mu(M) > 0.$$

3.3.2. Checker form: A Copositive Lyapunov Witness

Theorem 1 
(One-clock reduction from a copositive Lyapunov witness). Assume $M$ is Metzler and (7) holds with $r(0) \in \mathbb{R}^m_{\ge 0}$. If there exist $w > 0$ and $\lambda > 0$ such that
$$w^\top M \le -\lambda\, w^\top \quad (\text{componentwise}),$$
then the scalar total ledger
$$R_{\mathrm{tot}}(t) := w^\top r(t)$$
obeys
$$R_{\mathrm{tot}}(t) \le e^{-\lambda t} R_{\mathrm{tot}}(0) \qquad \forall t \ge 0.$$
In particular, $M$ is Hurwitz and $\mu(M) \le -\lambda$.
Proof. 
Since $r(t) \ge 0$ componentwise and $w > 0$, the scalar ledger $R_{\mathrm{tot}}(t) = w^\top r(t)$ is nonnegative. Multiply (7) by $w^\top$ and use (9): for a.e. $t$,
$$\dot R_{\mathrm{tot}}(t) = w^\top \dot r(t) \le w^\top M r(t) \le -\lambda\, w^\top r(t) = -\lambda\, R_{\mathrm{tot}}(t).$$
Grönwall's inequality yields (11); see, e.g., [25,26]. The implication $\mu(M) \le -\lambda$ (hence Hurwitz) from the existence of a strict copositive linear Lyapunov inequality is standard for Metzler matrices; see [12,14]. □

3.3.3. From Hurwitzness to a Witness, and the Sharp Rate

The checker form (Theorem 1) is the verification primitive: a certificate can literally ship $(w, \lambda)$ and the checker verifies (9). The next result explains why such witnesses exist for Hurwitz Metzler matrices and how the sharp rate $\lambda_{\mathrm{eff}} = -\mu(M)$ appears.
Proposition 1 
(Hurwitz Metzler ⇒ existence of a strictly positive witness). Let $M$ be Metzler and assume $\mu(M) < 0$. Then there exists $w > 0$ such that
$$w^\top M \le \mu(M)\, w^\top \quad (\text{componentwise}).$$
Consequently, with $\lambda_{\mathrm{eff}} := -\mu(M) > 0$, the same $w$ satisfies $w^\top M \le -\lambda_{\mathrm{eff}}\, w^\top$ and is a witness in the sense of Theorem 1.
Proof. 
Choose $\beta > 0$ large enough that $A := M + \beta I$ is entrywise nonnegative (possible since $M$ has nonnegative off-diagonals; take $\beta \ge -\min_i M_{ii}$). Then $A \ge 0$ and
$$\mu(M) = \mu(A - \beta I) = \rho(A) - \beta,$$
where $\rho(A)$ is the spectral radius of $A$ (for nonnegative $A$, $\rho(A)$ is an eigenvalue by Perron–Frobenius; see [11]). Since $\mu(M) < 0$, we have $\rho(A) < \beta$.
For nonnegative $A$, the Collatz–Wielandt/Perron–Frobenius theory guarantees existence of a strictly positive left sub-eigenvector: there exists $w > 0$ with
$$w^\top A \le \rho(A)\, w^\top \quad (\text{componentwise}),$$
see [11] or [12]. Subtracting $\beta w^\top$ gives
$$w^\top M = w^\top (A - \beta I) \le (\rho(A) - \beta)\, w^\top = \mu(M)\, w^\top,$$
which is (12). Since $-\mu(M) = \lambda_{\mathrm{eff}}$, we obtain $w^\top M \le -\lambda_{\mathrm{eff}}\, w^\top$. □
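The shift-and-Perron construction in this proof can be sketched numerically. The following illustration (an assumed small example, not the paper's artifact) shifts $M$ to a nonnegative $A = M + \beta I$, approximates the left Perron vector of $A$ by power iteration on $A^\top$, and verifies the resulting copositive inequality $w^\top M \le \mu(M)\, w^\top$ up to tolerance:

```python
# Sketch of the witness construction (illustrative 2x2 Metzler matrix).

M = [[-2.0, 0.3], [0.4, -1.0]]
m = len(M)
beta = max(-M[i][i] for i in range(m)) + 1.0   # makes A = M + beta*I >= 0
A = [[M[i][j] + (beta if i == j else 0.0) for j in range(m)] for i in range(m)]

w = [1.0] * m
for _ in range(500):                            # power iteration on A^T
    w = [sum(A[i][j] * w[i] for i in range(m)) for j in range(m)]
    s = sum(w)
    w = [wi / s for wi in w]                    # normalize to avoid overflow

# At convergence w^T A = rho(A) w^T; estimate rho from the first column.
rho = sum(w[i] * A[i][0] for i in range(m)) / w[0]
mu = rho - beta                                 # mu(M) = rho(A) - beta

# Verify the copositive inequality (w^T M)_j <= (mu + tol) * w_j componentwise,
# with a small tolerance for the finite iteration.
tol = 1e-6
wM = [sum(w[i] * M[i][j] for i in range(m)) for j in range(m)]
ok = all(wM[j] <= (mu + tol) * w[j] for j in range(m))
print(mu, ok)
```

For irreducible $A$ the power iteration recovers the Perron vector; in reducible or degenerate cases a prover would fall back to the sub-eigenvector form (13) directly.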
Corollary 1 
(Sharp one-clock envelope from a Perron–Frobenius witness). Assume the hypotheses of Proposition 1 and let $w > 0$ satisfy (12). Then any $r$ satisfying (7) obeys
$$w^\top r(t) \le e^{\mu(M) t}\, w^\top r(0) = e^{-\lambda_{\mathrm{eff}} t}\, w^\top r(0), \qquad \lambda_{\mathrm{eff}} = -\mu(M) > 0.$$
Proof. 
Repeat the proof of Theorem 1 with (12) in place of (9):
$$\frac{d}{dt}\big(w^\top r(t)\big) = w^\top \dot r(t) \le w^\top M r(t) \le \mu(M)\, w^\top r(t),$$
and apply Grönwall [26]. □

3.3.4. Two-Ledger Closed Form (Explicit Small-Gain Boundary)

The special case m = 2 admits explicit eigenvalues and hence an exact small-gain criterion and closed-form effective rate. This provides a particularly transparent diagnostic boundary and is the main “design lever” used in the toy instantiation.
Proposition 2 
(Two-ledger small-gain criterion and explicit rate). Assume $r(t) = (R_1(t), R_2(t)) \in \mathbb{R}^2_{\ge 0}$ satisfies $\dot r(t) \le M r(t)$ with
$$M = \begin{pmatrix} -2\lambda_1 & \eta_{1 \leftarrow 2} \\ \eta_{2 \leftarrow 1} & -2\lambda_2 \end{pmatrix}, \qquad \lambda_1, \lambda_2 > 0, \quad \eta_{1 \leftarrow 2}, \eta_{2 \leftarrow 1} \ge 0.$$
Then $M$ is Hurwitz if and only if
$$\eta_{1 \leftarrow 2}\, \eta_{2 \leftarrow 1} < 4 \lambda_1 \lambda_2,$$
and in that case $\lambda_{\mathrm{eff}} := -\mu(M) > 0$ with the explicit formula
$$\lambda_{\mathrm{eff}} = (\lambda_1 + \lambda_2) - \sqrt{(\lambda_1 - \lambda_2)^2 + \eta_{1 \leftarrow 2}\, \eta_{2 \leftarrow 1}}.$$
Proof. 
For a real $2 \times 2$ matrix, Hurwitz stability is equivalent to $\operatorname{tr}(M) < 0$ and $\det(M) > 0$. Here $\operatorname{tr}(M) = -2(\lambda_1 + \lambda_2) < 0$ and $\det(M) = 4\lambda_1\lambda_2 - \eta_{1 \leftarrow 2}\eta_{2 \leftarrow 1}$, giving the criterion. The eigenvalues are $-(\lambda_1 + \lambda_2) \pm \sqrt{(\lambda_1 - \lambda_2)^2 + \eta_{1 \leftarrow 2}\eta_{2 \leftarrow 1}}$ and yield $\lambda_{\mathrm{eff}} = -\mu(M)$ as stated. □
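As a quick cross-check of the closed form (with illustrative numbers, not values from the paper), the formula for $\lambda_{\mathrm{eff}}$ can be compared against $-\mu(M)$ computed directly from the $2 \times 2$ characteristic polynomial:

```python
# Cross-check of the two-ledger closed form against direct eigenvalues.
import math

lam1, lam2, eta12, eta21 = 1.0, 0.5, 0.3, 0.4
assert eta12 * eta21 < 4 * lam1 * lam2          # small-gain criterion holds

# Closed form from the proposition.
lam_eff = (lam1 + lam2) - math.sqrt((lam1 - lam2) ** 2 + eta12 * eta21)

# Direct eigenvalues of M = [[-2*lam1, eta12], [eta21, -2*lam2]].
tr = -2 * (lam1 + lam2)
det = 4 * lam1 * lam2 - eta12 * eta21
disc = math.sqrt(tr * tr - 4 * det)
mu = (tr + disc) / 2.0                          # dominant (larger) eigenvalue
print(lam_eff, -mu)                             # the two agree
```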

3.3.5. Monotone Engineering Levers: Funded Diagonals vs. Injections

A distinctive advantage of Metzler comparison certificates is that they translate stability into monotone design parameters. Writing the comparison matrix as
$$M_{ii} = -2\lambda_i, \quad \lambda_i \ge 0, \qquad \text{and} \qquad M_{ij} = \eta_{i \leftarrow j} \ge 0 \ (i \ne j),$$
makes the two control knobs explicit: funded diagonals (increase $\lambda_i$, making $M_{ii}$ more negative) and injections (decrease $\eta_{i \leftarrow j}$, weakening cross-couplings).
Related "gain composition" ideas are classical in interconnected-system robustness [25,33,34]. In the Metzler setting, the ordering is literal: if $\tilde M \le M$ entrywise, then $\mu(\tilde M) \le \mu(M)$ and hence $\tilde\lambda_{\mathrm{eff}} \ge \lambda_{\mathrm{eff}}$, with generic strict improvement away from reducible/degenerate cases [11,12].
Corollary 2 
(Funding vs. injection interventions (witness-preserving monotonicity)). Assume (7). Suppose a witness $(w, \lambda)$ satisfies $w^\top M \le -\lambda w^\top$ with $w > 0$ and $\lambda > 0$. Let $\tilde M$ be another Metzler matrix with $\tilde M \le M$ entrywise (i.e. more negative diagonals and/or smaller off-diagonals). Then the same witness proves the same rate:
$$w^\top \tilde M \le w^\top M \le -\lambda\, w^\top,$$
and hence, under $\dot r \le \tilde M r$, one has $w^\top r(t) \le e^{-\lambda t}\, w^\top r(0)$ for all $t \ge 0$.
Proof. 
Entrywise monotonicity and $w > 0$ give $w^\top \tilde M \le w^\top M$. Apply Theorem 1 to $\tilde M$. □

3.3.6. A Checker View: What Must Be Provided and What Is Proved

For certificate checking, the key point is that Theorem 1 requires only an explicit witness: a positive weight vector $w > 0$ and a scalar $\lambda > 0$ such that $w^\top M \le -\lambda w^\top$. A checker validates this inequality componentwise and then accepts the implication $w^\top r(t) \le e^{-\lambda t}\, w^\top r(0)$, without spectral computations. This is the positive-systems analogue of "proof-carrying" evidence: a short witness certifies a global decay claim [12,14,23].
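A minimal checker in this spirit can be sketched as follows (an assumed interface, not the paper's artifact format): it validates the Metzler structure, the positivity of the witness, and the componentwise inequality, using only arithmetic comparisons.

```python
# Minimal one-clock witness checker (sketch; schema is assumed).

def check_one_clock_witness(M, w, lam, tol=0.0):
    """Return True iff (w, lam) is a valid copositive Lyapunov witness for M."""
    m = len(M)
    # Witness shape and sign checks.
    if lam <= 0 or len(w) != m or any(wi <= 0 for wi in w):
        return False
    # Metzler check: nonnegative off-diagonals.
    if any(M[i][j] < 0 for i in range(m) for j in range(m) if i != j):
        return False
    # Componentwise inequality: (w^T M)_j <= -lam * w_j for every column j.
    for j in range(m):
        col = sum(w[i] * M[i][j] for i in range(m))
        if col > -lam * w[j] + tol:
            return False
    return True

M = [[-2.0, 0.3], [0.4, -1.0]]
accepted = check_one_clock_witness(M, w=[1.0, 1.0], lam=0.5)
rejected = check_one_clock_witness(M, w=[1.0, 1.0], lam=2.0)  # rate too fast
print(accepted, rejected)
```

No eigenvalue computation appears anywhere, which is precisely the point of shipping $(w, \lambda)$: the verifier's work is a finite list of scalar comparisons.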

4. Master Certificate for Learning Under Refinement

We work on a user-declared certification window $[0,T]$ and in a single declared ruler $\|\cdot\|_W$. Training is run on each refinement level $K$ and produces a trajectory $x^{(K)}(\cdot)$ together with a declared nonnegative scalar ledger $R^{(K)}(\cdot)$. The goal is to ensure that refinement does not change the meaning of "small" and that stability survives as $K \to \infty$.
Two quantitative inputs drive the refinement-limit guarantee.
  • One-ruler Cauchy geometry. Cross-level comparisons are performed in the same instrument norm via the ambient realization (e.g. by comparing $z^{(K)} = i_K x^{(K)}$ to $P_K z^{(K+1)}$). If the adjacent discrepancies form a summable tail in $K$, then the tower is Cauchy in the ruler and determines a unique refinement-limit trajectory on $[0,T]$.
  • Tail-robust contraction on $[0,T]$. Each level admits an inhomogeneous decay inequality for $R^{(K)}$ with a margin bounded below uniformly in $K$ and a pollution budget whose integrated size is summable along the ladder. This yields a common exponential envelope on $[0,T]$, up to an explicit tail floor that vanishes with refinement.
Programme lines (O1)–(O4) record these requirements in auditable form (summable ladder budgets, a uniform margin, and a uniform dictionary linking $R^{(K)}$ to reported readouts). Under these lines, the Master Certificate theorem (Section 4.3) provides: a unique refinement-limit learner on $[0,T]$, a uniform exponent class for the ledger up to vanishing tails, and transport of the same rate to declared readouts.

4.1. Ladder Geometry and Refinement Limits (Analytic Core)

4.1.1. Ambient One-Ruler Structure and Projective Maps

We formalize the measurement contract by requiring that every refinement level is measured by restriction of a single ambient instrument. This is the analogue of fixing an energy norm across discretizations in stable refinement theory [4,5,6].
Definition 2 
(One ruler (ambient instrument restriction)). Fix a refinement ladder $\{X^{(K)}\}_{K \ge K_0}$. We say the ladder admits one ruler if there exist:
  • an ambient real Hilbert space $(H, \langle \cdot, \cdot \rangle_H)$,
  • linear realization maps $i_K : X^{(K)} \to H$ for each $K$,
  • bounded projections $P_K : H \to i_K(X^{(K)})$ for each $K$,
  • a bounded, self-adjoint, strictly positive operator $W : H \to H$ (the instrument),
  • coarse-graining maps $\Pi_{K \leftarrow K+1} : X^{(K+1)} \to X^{(K)}$ for each $K$,
such that for all $K \ge K_0$:
(i)
Compatibility (coarse-graining is realized by projection). For all $x \in X^{(K+1)}$,
$$i_K \Pi_{K \leftarrow K+1} x = P_K i_{K+1} x.$$
(ii)
Single instrument (restriction of one ambient ruler). For $x \in X^{(K)}$ define
$$\|x\|_{W(K)}^2 := \langle i_K x, W i_K x \rangle_H.$$
Equivalently, $\|z\|_W^2 := \langle z, W z \rangle_H$ is the ambient ruler and $\|\cdot\|_{W(K)}$ is its restriction to $i_K(X^{(K)})$.
We call $(H, W)$ the declared ruler and $(i_K, P_K)$ the ambient realization of the ladder.
Assumption A1 
(Instrument contractivity of coarse projections). For each $K \ge K_0$, the projection $P_K$ is contractive in the instrument norm on the next-level realization:
$$\|P_K z\|_W \le \|z\|_W \quad \text{for all } z \in i_{K+1}(X^{(K+1)}).$$
Lemma 2 
(Non-expansiveness of coarse-graining in one ruler). Assume Definition 2 and Assumption A1. Then for all $K \ge K_0$ and $x \in X^{(K+1)}$,
$$\|\Pi_{K \leftarrow K+1}(x)\|_{W(K)} \le \|x\|_{W(K+1)}.$$
Proof. 
Fix $K$ and $x \in X^{(K+1)}$ and set $z := i_{K+1} x$. By (14), $i_K(\Pi_{K \leftarrow K+1} x) = P_K z$. Hence
$$\|\Pi_{K \leftarrow K+1} x\|_{W(K)}^2 = \|P_K z\|_W^2 \le \|z\|_W^2 = \|x\|_{W(K+1)}^2,$$
where we used (16) and (15). Taking square roots gives (17). □

4.1.2. Cross-Level Discrepancy and Telescoping

Given $x^{(K)} \in X^{(K)}$ and $x^{(L)} \in X^{(L)}$ with $L \ge K$, we compare them in the same ruler by projecting the fine object down:
$$d_{K,L}\big(x^{(K)}, x^{(L)}\big) := \big\|i_K x^{(K)} - P_K(i_L x^{(L)})\big\|_W.$$
Lemma 3 
(Projective telescoping in one ruler). Assume Definition 2 and Assumption A1. Fix $T > 0$ and let $x^{(K)} : [0,T] \to X^{(K)}$. Then for any integers $L > K \ge K_0$ and any $t \in [0,T]$,
$$\big\|i_K x^{(K)}(t) - P_K(i_L x^{(L)}(t))\big\|_W \le \sum_{j=K}^{L-1} \big\|i_j x^{(j)}(t) - P_j(i_{j+1} x^{(j+1)}(t))\big\|_W.$$
Proof. 
Fix $t \in [0,T]$ and write $z^{(j)} := i_j x^{(j)}(t) \in H$. Using $P_K = P_K P_j$ for $j \ge K$ and contractivity of $P_K$ in $\|\cdot\|_W$ (Assumption A1), we obtain
$$\big\|z^{(K)} - P_K z^{(L)}\big\|_W \le \sum_{j=K}^{L-1} \big\|P_K\big(z^{(j)} - P_j z^{(j+1)}\big)\big\|_W \le \sum_{j=K}^{L-1} \big\|z^{(j)} - P_j z^{(j+1)}\big\|_W,$$
which is (19). □

4.1.3. Existence and Uniqueness of a Refinement-Limit Object

Theorem 2 
(Existence and uniqueness of a refinement-limit object). Assume the ladder admits one ruler (Definition 2) and satisfies the projective consistency $\Pi_{K \leftarrow K+2} = \Pi_{K \leftarrow K+1} \circ \Pi_{K+1 \leftarrow K+2}$. Let $\{x^{(K)}\}_{K \ge K_0}$ be a compatible tower, i.e. $x^{(K)} = \Pi_{K \leftarrow K+1}(x^{(K+1)})$ for all $K$. Assume there exists a nonnegative sequence $(a_K)_{K \ge K_0}$ with $\sum_{K \ge K_0} a_K < \infty$ such that for all $K \ge K_0$ and all $t \in [0,T)$,
$$\big\|(I - P_K)\, i_{K+1} x^{(K+1)}(t)\big\|_W + \big\|i_K x^{(K)}(t) - P_K i_{K+1} x^{(K+1)}(t)\big\|_W \le a_K.$$
Then for every $t \in [0,T)$ the sequence $\{i_K x^{(K)}(t)\}_{K \ge K_0}$ is Cauchy in $(H, \|\cdot\|_W)$ and converges to a unique limit $x^{(\infty)}(t) \in H$:
$$x^{(\infty)}(t) := \lim_{K \to \infty} i_K x^{(K)}(t) \quad \text{in } (H, \|\cdot\|_W).$$
Moreover, for every fixed $K \ge K_0$,
$$P_K x^{(\infty)}(t) = \lim_{L \to \infty} P_K i_L x^{(L)}(t) = i_K x^{(K)}(t) \qquad \forall t \in [0,T).$$
Proof. 
Fix $t \in [0,T)$ and let $L > K \ge K_0$. Write $z^{(j)} := i_j x^{(j)}(t) \in H$. Then
$$\|z^{(L)} - z^{(K)}\|_W \le \|z^{(L)} - P_K z^{(L)}\|_W + \|P_K z^{(L)} - z^{(K)}\|_W.$$
By telescoping and (20),
$$\|z^{(L)} - P_K z^{(L)}\|_W + \|P_K z^{(L)} - z^{(K)}\|_W \le \sum_{j=K}^{L-1} a_j.$$
Since $\sum_{j \ge K_0} a_j < \infty$, the right-hand side tends to $0$ as $K, L \to \infty$, so $\{i_K x^{(K)}(t)\}_K$ is Cauchy and converges in $(H, \|\cdot\|_W)$. Projective consistency (21) follows by applying $P_K$ to the tower identities and passing to the limit. □
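The Cauchy-tower mechanism can be illustrated in a toy instance (entirely assumed: $H$ is a truncated sequence space, $P_K$ keeps the first $K$ coordinates, and the level-$K$ state is the $K$-term truncation of a fixed square-summable target, so the adjacent discrepancies are summable by construction):

```python
# Toy Cauchy tower: nested truncations of a square-summable sequence.
import math

N = 200                                            # ambient truncation (toy)
target = [1.0 / (k + 1) ** 2 for k in range(N)]    # square-summable target

def z_level(K):
    """Realized level-K state: first K coordinates of the target, zero-padded."""
    return [target[k] if k < K else 0.0 for k in range(N)]

def norm(v):
    return math.sqrt(sum(x * x for x in v))

# Adjacent discrepancy a_K = ||z^(K) - P_K z^(K+1)|| + ||(I - P_K) z^(K+1)||.
a = []
for K in range(1, N - 1):
    zK, zK1 = z_level(K), z_level(K + 1)
    proj_mismatch = norm([zK[k] - zK1[k] for k in range(K)])   # = 0 (nested)
    unresolved = norm([zK1[k] for k in range(K, N)])           # tail at level K
    a.append(proj_mismatch + unresolved)

summable = sum(a)                                  # finite: tower is Cauchy
gap = norm([z_level(150)[k] - target[k] for k in range(N)])
print(summable, gap)                               # gap shrinks with the level
```

Here the role of (20) is played by the tail norms $a_K = 1/(K+1)^2$, and the limit object is the full target sequence.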
Remark 1 
(Where this theorem is used). The Master Certificate theorem (Section 4.3) uses Theorem 2 as the geometric engine that turns summable cross-level discrepancies into a refinement-limit object in the declared ruler. Concretely, in the Master Certificate proof one takes $a_K := \beta_K + \delta_K$ from (O3).

4.2. Programme Lines (O1)–(O4) on [ 0 , T ]

The programme lines (O1)–(O4) are the paper’s checkable obligations on a finite time window. They specify what a certificate must provide and what a verifier must validate: (i) a uniform-in-time contraction envelope for a declared total ledger up to a summable pollution budget, (ii) a K-uniform margin (one exponent class), (iii) a summable Cauchy tower in the declared ruler (no moving goalposts), and (iv) a uniform dictionary linking the declared ledger to reported readouts.

4.2.0.1. Certified time horizon.

Fix T > 0 . All programme lines (O1)–(O4) are required on the window [ 0 , T ] .

4.2.1. Objects on the Window: Trajectories, Ruler, and a Declared Total Ledger

For each refinement level $K \ge K_0$, let $x^{(K)} : [0,T] \to X^{(K)}$ be a training trajectory. Assume the ladder admits one ruler in the sense of Definition 2, with embeddings $i_K : X^{(K)} \to H$, projections $P_K : H \to i_K(X^{(K)})$, and an instrument $W$ inducing $\|z\|_W^2 = \langle z, W z \rangle$. Write the realized trajectory in the ambient ruler as
$$z^{(K)}(t) := i_K x^{(K)}(t) \in H.$$
Fix a nonnegative total geometric ledger $R^{(K)} : [0,T] \to \mathbb{R}_{\ge 0}$ (one scalar per level). This is the scalar object whose decay is certified; any additional structure is used only through inequalities.

(O1) Tail-robust contraction envelope (summable pollution budget)

There exist nonnegative functions $\tau_K \in L^1([0,T])$, numbers $\alpha_K \ge 0$ with $\sum_{K=K_0}^{\infty} \alpha_K < \infty$, and levelwise margins $\lambda_K > 0$ such that for a.e. $t \in [0,T]$,
$$\dot R^{(K)}(t) \le -2\lambda_K R^{(K)}(t) + \tau_K(t), \qquad \int_0^T \tau_K(s)\,ds \le \alpha_K, \qquad K \ge K_0.$$

(O2) Uniform margin (one exponent class across refinement)

There exists $\lambda > 0$ (independent of $K$) such that
$$\lambda_K \ge \lambda > 0 \quad \text{for all } K \ge K_0.$$

(O3) Projective Cauchy tower in one ruler (summable cross-level inconsistency)

There exist nonnegative sequences $(\beta_K)_{K \ge K_0}$ and $(\delta_K)_{K \ge K_0}$ such that
$$\sup_{t \in [0,T]} \Big( \underbrace{\|(I - P_K)\, z^{(K+1)}(t)\|_W}_{\text{unresolved tail at level } K} + \underbrace{\|z^{(K)}(t) - P_K z^{(K+1)}(t)\|_W}_{\text{projected mismatch}} \Big) \le \beta_K + \delta_K, \qquad \sum_{K=K_0}^{\infty} (\beta_K + \delta_K) < \infty.$$

(O4) Uniform dictionary / observability (reported readouts remain well-conditioned)

Let $\mathcal{M}$ be a finite family of nonnegative reported readouts/metrics $M(\cdot) \ge 0$. Assume there exist constants $c_M, C_M > 0$, independent of $K$, such that for all $t \in [0,T]$ and all $K \ge K_0$,
$$c_M\, R^{(K)}(t) \le M\big(x^{(K)}(t)\big) \le C_M\, R^{(K)}(t), \qquad M \in \mathcal{M}.$$

4.2.2. Checker Summary (What the Certificate Must Provide)

A certificate on $[0,T]$ consists of: the declared ruler $(H, W)$ and projections $(i_K, P_K)$, the scalar ledger $R^{(K)}$, the budgets $(\alpha_K)$ from (O1) and $(\beta_K, \delta_K)$ from (O3), the uniform margin $\lambda$ from (O2), and the dictionary constants $(c_M, C_M)$ from (O4). Soundness (proved next) upgrades these inequalities to a unique refinement-limit learner and rate inheritance on $[0,T]$.
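A minimal checker sketch for this artifact follows. The container and field names (`Certificate`, `alpha`, `beta_delta`, `lam`, `dictionary`) are illustrative assumptions, not part of the paper's formal grammar, and only a finite prefix of each budget sequence can be audited in practice:

```python
from dataclasses import dataclass
import math

@dataclass
class Certificate:
    """Hypothetical container for the shipped constants; field names
    are illustrative, not from the paper."""
    alpha: list        # (O1) integrated pollution budgets alpha_K
    beta_delta: list   # (O3) tower budgets beta_K + delta_K
    lam: float         # (O2) uniform margin lambda
    dictionary: dict   # (O4) {metric_name: (c_M, C_M)}

def check_certificate(cert, tol=1e-12):
    """Finite-prefix audit: nonnegativity of the budgets, a strictly
    positive margin, and well-ordered dictionary constants. Summability
    over all K cannot be decided from finitely many numbers, so this
    only validates the supplied prefix."""
    ok_o1 = all(a >= 0 for a in cert.alpha) and math.isfinite(sum(cert.alpha))
    ok_o2 = cert.lam > tol
    ok_o3 = all(b >= 0 for b in cert.beta_delta) and math.isfinite(sum(cert.beta_delta))
    ok_o4 = all(0 < c <= C for (c, C) in cert.dictionary.values())
    return ok_o1 and ok_o2 and ok_o3 and ok_o4
```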

4.3. Master Certificate Theorem on [ 0 , T ]

We now state the soundness implication in its final form. Importantly, the theorem below does not restate the programme lines; it only invokes them by reference.
Theorem 3 
(Master Certificate for Learning Under Refinement on $[0,T]$). Assume the ladder admits one ruler (Definition 2) and let $x^{(K)}: [0,T] \to X^{(K)}$ be trajectories with ambient realizations $z^{(K)}(t) := i_K x^{(K)}(t) \in H$. Let $R^{(K)}: [0,T] \to \mathbb{R}_{\ge 0}$ be a declared total geometric ledger. Assume the programme lines (O1)–(O4) from subsec:mc-olines-learning hold on $[0,T]$. Then:
(i)
Tail-robust levelwise envelope. With $\lambda$ as in (O2) and $\alpha_K$ as in (O1), for all $K \ge K_0$ and all $t \in [0,T]$,
$$R^{(K)}(t) \le e^{-2\lambda t} R^{(K)}(0) + \alpha_K.$$
(ii)
Existence and uniqueness of a refinement-limit trajectory. Define the geometric tower budget
$$a_K := \beta_K + \delta_K,$$
where $(\beta_K, \delta_K)$ are the sequences from (O3). Then $\sum_{K \ge K_0} a_K < \infty$, and there exists a unique trajectory $x^{(\infty)}: [0,T] \to H$ such that $x^{(\infty)}(t) = \lim_{K\to\infty} z^{(K)}(t)$ in $(H, \|\cdot\|_W)$, and
$$\sup_{t \in [0,T]} \|z^{(K)}(t) - P_K x^{(\infty)}(t)\|_W \le \sum_{j=K}^{\infty} a_j \qquad \forall K \ge K_0.$$
(iii)
Readout transport (rate inheritance at each level). For every $M \in \mathcal{M}$ (from (O4)), for all $K \ge K_0$ and all $t \in [0,T]$,
$$M\big(x^{(K)}(t)\big) \le C_M \big( e^{-2\lambda t} R^{(K)}(0) + \alpha_K \big).$$
Proof. 
Part (i) is the inhomogeneous Grönwall bound from (O1)–(O2). Part (ii) follows by applying Theorem 2 with $a_K := \beta_K + \delta_K$. Part (iii) is (O4) combined with (26). □

5. Instantiations

This section explains how the abstract certificate obligations can be realized in concrete training pipelines. The goal is not to advocate a particular architecture, but to show (i) how to choose a ruler, a ladder, and a ledger vector; (ii) how to derive a Metzler comparison system with explicit margins and injections; and (iii) how to audit the refinement programme lines (O1)–(O4) from logs, bounds, or subproofs. We present two representative instantiations: a fully checkable toy ladder where all constants can be computed in closed form, and a practical “width/refinement” protocol that describes what evidence a producer must ship so that a consumer can validate the Master Certificate without re-running training.
At a high level, every instantiation follows the same recipe. First fix a single ambient ruler (a norm/energy that is independent of refinement) and specify the projections that implement "no moving goalposts." Next define the nonnegative ledgers and establish an in-level comparison inequality $\dot r^{(K)} \le M^{(K)} r^{(K)} + d^{(K)}$ with $M^{(K)}$ Metzler, extracting diagonal funding margins and off-diagonal injections via standard bounds (e.g. Lipschitz/operator-norm and Young inequalities). Finally, verify the cross-level conditions: tail terms are summable ((O1)), the exponent class is uniform ((O2)), projected states are Cauchy across levels ((O3)), and reported readouts remain uniformly comparable to the ruler ((O4)). The output is a proof-carrying artifact consisting of the matrices, constants, and tail budgets that a checker can validate, thereby upgrading verification to a refinement-limit guarantee with a certified one-clock rate.

5.1. Setup: An $\ell^2$ Ladder with a Diagonal Constraint Operator

Let $\ell^2$ denote the real Hilbert space of square-summable sequences with inner product $\langle x, y \rangle = \sum_{i \ge 1} x_i y_i$ and norm $\|x\|_2^2 = \sum_{i \ge 1} x_i^2$. Define the ambient state space
$$X^{(\infty)} := \ell^2 \times \ell^2, \qquad z = (x, \lambda) \in X^{(\infty)}.$$
Fix $p > 0$ and define the bounded diagonal operator $A: \ell^2 \to \ell^2$ by
$$(Ax)_i := a_i x_i, \qquad a_i := i^{-p}.$$
Then $\|A\|_{\mathrm{op}} = \sup_i |a_i| = 1$.
For each $K \ge 1$ define the refinement spaces
$$X^{(K)} := \mathbb{R}^K \times \mathbb{R}^K \subset \ell^2 \times \ell^2,$$
embedded into $X^{(\infty)}$ by zero padding. Let $P_K$ be the orthogonal projection (truncation to the first $K$ coordinates), and set $A_K := P_K A P_K = \mathrm{diag}(a_1, \dots, a_K)$.

5.2. Dynamics: Damped Primal–Dual Flow

Fix damping parameters $m > 0$ and $\alpha > 0$. For each $K$, consider the linear flow on $X^{(K)}$:
$$\dot x^{(K)}(t) = -m\, x^{(K)}(t) - A_K \lambda^{(K)}(t), \qquad \dot \lambda^{(K)}(t) = A_K x^{(K)}(t) - \alpha\, \lambda^{(K)}(t), \qquad t \ge 0.$$
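This flow is straightforward to simulate; the forward-Euler sketch below is an illustrative integration (the step size, horizon, and random initial data are assumed choices, not part of the certificate), and it lets one observe the contraction of the total ledger $R_1^{(K)} + R_2^{(K)}$ directly.

```python
import numpy as np

def primal_dual_flow(K=20, m=2.0, alpha=2.5, p=1.0, T=5.0, dt=1e-3, seed=0):
    """Forward-Euler integration of the level-K damped primal-dual flow
    x' = -m x - A_K lam,  lam' = A_K x - alpha lam,  A_K = diag(i^-p).
    Returns the total ledger R_tot at t=0 and t=T."""
    rng = np.random.default_rng(seed)
    a = np.arange(1, K + 1, dtype=float) ** (-p)   # diagonal of A_K
    x = rng.normal(size=K)
    lam = rng.normal(size=K)
    R0 = x @ x + lam @ lam
    for _ in range(int(T / dt)):
        # simultaneous update: RHS uses the old (x, lam)
        x, lam = x + dt * (-m * x - a * lam), lam + dt * (a * x - alpha * lam)
    return R0, x @ x + lam @ lam
```

On this run the observed decay is much faster than the certified envelope $e^{-\lambda_{\mathrm{cert}} t}$, consistent with the conservatism noted in Section 5.7.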

5.3. Ledgers and Readouts

Define the two nonnegative ledgers and the ledger vector
$$R_1^{(K)}(t) := \|x^{(K)}(t)\|_2^2, \qquad R_2^{(K)}(t) := \|\lambda^{(K)}(t)\|_2^2, \qquad r^{(K)}(t) := \big(R_1^{(K)}(t), R_2^{(K)}(t)\big)^\top.$$
As readout metrics, take $\mathcal{M} = \{M_1, M_2\}$ with $M_1(z) = \|x\|_2^2$ and $M_2(z) = \|\lambda\|_2^2$, so that the readouts coincide with the ledgers.

5.4. Metzler Comparison and a Checker-Friendly Hurwitz Witness

Lemma 4 
(Metzler dominance (uniform in $K$)). Assume $p > 0$, so that $\|A\|_{\mathrm{op}} = 1$. For every $K$ and all $t \ge 0$, the ledger vector satisfies
$$\dot r^{(K)}(t) \le M_{\mathrm{Toy}}\, r^{(K)}(t),$$
where the same $2 \times 2$ Metzler matrix works for all $K$:
$$M_{\mathrm{Toy}} := \begin{pmatrix} -(2m - 1) & 1 \\ 1 & -(2\alpha - 1) \end{pmatrix}.$$
Proof. 
Differentiate the squared norms using (30):
$$\dot R_1^{(K)}(t) = 2 \big\langle x^{(K)}(t), \dot x^{(K)}(t) \big\rangle = -2m \|x^{(K)}(t)\|_2^2 - 2 \big\langle x^{(K)}(t), A_K \lambda^{(K)}(t) \big\rangle.$$
Using $\|A_K\|_{\mathrm{op}} \le \|A\|_{\mathrm{op}} = 1$ and Young's inequality $2ab \le a^2 + b^2$ gives
$$\big| 2 \big\langle x^{(K)}(t), A_K \lambda^{(K)}(t) \big\rangle \big| \le 2 \|x^{(K)}(t)\|_2 \, \|A_K\|_{\mathrm{op}} \|\lambda^{(K)}(t)\|_2 \le 2 \|x^{(K)}(t)\|_2 \|\lambda^{(K)}(t)\|_2 \le \|x^{(K)}(t)\|_2^2 + \|\lambda^{(K)}(t)\|_2^2.$$
Hence $\dot R_1^{(K)}(t) \le -(2m - 1) R_1^{(K)}(t) + R_2^{(K)}(t)$.
Similarly,
$$\dot R_2^{(K)}(t) = 2 \big\langle \lambda^{(K)}(t), \dot \lambda^{(K)}(t) \big\rangle = -2\alpha \|\lambda^{(K)}(t)\|_2^2 + 2 \big\langle \lambda^{(K)}(t), A_K x^{(K)}(t) \big\rangle \le -(2\alpha - 1) R_2^{(K)}(t) + R_1^{(K)}(t),$$
where we again used $\|A_K\|_{\mathrm{op}} \le 1$ and $2ab \le a^2 + b^2$ on the cross term. Collecting the two inequalities yields $\dot r^{(K)}(t) \le M_{\mathrm{Toy}}\, r^{(K)}(t)$. □
Proposition 3 
(Hurwitz witness and certified one-clock rate). Assume $m > 1$ and $\alpha > 1$. Let $w = (1, 1)^\top \gg 0$ and set
$$\lambda_{\mathrm{cert}} := 2 \min\{m - 1, \alpha - 1\} > 0.$$
Then $w^\top M_{\mathrm{Toy}} \le -\lambda_{\mathrm{cert}}\, w^\top$ componentwise, and hence
$$R_{\mathrm{tot}}^{(K)}(t) := w^\top r^{(K)}(t) = R_1^{(K)}(t) + R_2^{(K)}(t) \le e^{-\lambda_{\mathrm{cert}} t} R_{\mathrm{tot}}^{(K)}(0) \qquad \forall t \ge 0, \ \forall K.$$
Proof. 
Compute $w^\top M_{\mathrm{Toy}} = \big({-2(m-1)},\, {-2(\alpha-1)}\big) \le -2 \min\{m - 1, \alpha - 1\}\,(1, 1)$. Apply Theorem 1 to Lemma 4. □
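The witness inequality is checkable by direct arithmetic; a minimal sketch:

```python
import numpy as np

def witness_check(m=2.0, alpha=2.5):
    """Verify w^T M_Toy <= -lambda_cert w^T componentwise for w = (1, 1),
    returning the boolean verdict and the certified rate."""
    M = np.array([[-(2 * m - 1), 1.0],
                  [1.0, -(2 * alpha - 1)]])
    w = np.ones(2)
    lam_cert = 2.0 * min(m - 1.0, alpha - 1.0)
    ok = bool(np.all(w @ M <= -lam_cert * w + 1e-12))
    return ok, lam_cert
```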

5.5. Programme Lines (O1)–(O4) (Vanishing Tails)

Lemma 5 
(Exact projective consistency). Let $L \ge K$. If $z^{(L)}(t)$ solves (30) at level $L$, then $P_K z^{(L)}(t)$ solves the level-$K$ system with initial data $P_K z^{(L)}(0)$. In particular, for consistent initial data $z^{(K)}(0) = P_K z^{(K+1)}(0)$ we have $z^{(K)}(t) = P_K z^{(K+1)}(t)$ for all $t \ge 0$.
Proof. 
For diagonal $A$, the first $K$ coordinates of (30) depend only on the first $K$ coordinates, and $A_{K+1}$ restricted to the first $K$ indices equals $A_K$. Uniqueness of ODE solutions yields the identity. □
Proposition 4 
(Toy ladder satisfies (O1)–(O4)). Assume $m > 1$ and $\alpha > 1$. Then the toy ladder satisfies the Master Certificate programme lines on $[0, \infty)$:
(i)
(O1) holds with $\tau_K \equiv 0$ and $\alpha_K \equiv 0$;
(ii)
(O2) holds with uniform margin $\lambda = \lambda_{\mathrm{cert}}$ from (33);
(iii)
(O3) holds with $\beta_K \equiv 0$ and $\delta_K \equiv 0$ by Lemma 5;
(iv)
(O4) holds with $c_{M_1} = C_{M_1} = c_{M_2} = C_{M_2} = 1$ for $M_1(z) = \|x\|_2^2$ and $M_2(z) = \|\lambda\|_2^2$.
Proof. 
Items (i), (ii) follow from Lemma 4 and Proposition 3. Item (iii) is Lemma 5. Item (iv) holds because the readouts equal the ledgers. □

5.6. Consequence: Refinement-Limit Existence and Inherited Clock

Corollary 3 
(Toy model: unique refinement-limit trajectory and uniform rate inheritance). Assume $m > 1$ and $\alpha > 1$ and consistent initial data $z^{(K)}(0) = P_K z^{(\infty)}(0)$ for some $z^{(\infty)}(0) \in \ell^2 \times \ell^2$. Then there exists a unique refinement-limit trajectory $z^{(\infty)}(t) \in \ell^2 \times \ell^2$ with $z^{(K)}(t) = P_K z^{(\infty)}(t)$ for all $K$ and $t \ge 0$, and the total ledger satisfies
$$R_{\mathrm{tot}}^{(K)}(t) \le e^{-\lambda_{\mathrm{cert}} t} R_{\mathrm{tot}}^{(K)}(0), \qquad \lambda_{\mathrm{cert}} = 2 \min\{m - 1, \alpha - 1\}.$$
Proof. 
Apply Theorem 2 using Proposition 4. □

5.7. Numerical Sanity Check (Uniform in K)

Fix $m = 2.0$, $\alpha = 2.5$, $p = 1.0$, $T = 5.0$, and a forward-Euler time step $\Delta t = 10^{-3}$, so that $\lambda_{\mathrm{cert}} = 2$. Define the certificate ratio
$$\mathrm{Rat}^{(K)}(t) := \frac{R_{\mathrm{tot}}^{(K)}(t)}{e^{-\lambda_{\mathrm{cert}} t}\, R_{\mathrm{tot}}^{(K)}(0)}.$$
Table 1 reports $R_{\mathrm{tot}}^{(K)}(0)$, $R_{\mathrm{tot}}^{(K)}(T)$, $\max_{t \in [0,T]} \mathrm{Rat}^{(K)}(t)$, and $\mathrm{Rat}^{(K)}(T)$ for $K \in \{10, 20, 40, 80\}$.
Remark 2. 
The initial values increase mildly with $K$ because $z^{(K)}(0)$ is obtained by truncating a fixed ambient $\ell^2$ initial condition. The maximum ratio equals $1$ (attained at $t = 0$), while $\mathrm{Rat}^{(K)}(T) \approx 10^{-5}$, indicating the bound is conservative on this run.
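A sketch of a run of this sanity check follows. The ambient initial condition $z_i(0) \propto 1/i$ is an assumed illustrative choice (the text fixes some ambient $\ell^2$ datum but does not specify it), so the absolute values will differ from Table 1 while the qualitative ratio behaviour, maximum at $t = 0$ and rapid decay thereafter, is the same.

```python
import numpy as np

def certificate_ratio(K, m=2.0, alpha=2.5, p=1.0, T=5.0, dt=1e-3):
    """Forward-Euler run of the toy ladder, tracking the certificate
    ratio Rat(t) = R_tot(t) / (exp(-lam_cert * t) * R_tot(0)).
    Initial condition x_i = lam_i = 1/i is illustrative, not the paper's."""
    i = np.arange(1, K + 1, dtype=float)
    a = i ** (-p)
    x = 1.0 / i
    lam = 1.0 / i
    lam_cert = 2.0 * min(m - 1.0, alpha - 1.0)
    R0 = x @ x + lam @ lam
    max_ratio, t = 1.0, 0.0   # ratio is 1 at t = 0 by definition
    for _ in range(int(T / dt)):
        x, lam = x + dt * (-m * x - a * lam), lam + dt * (a * x - alpha * lam)
        t += dt
        ratio = (x @ x + lam @ lam) / (np.exp(-lam_cert * t) * R0)
        max_ratio = max(max_ratio, ratio)
    return R0, max_ratio, ratio
```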

5.8. Neural Instantiation (Protocol): A Width Ladder with Auditable Programme Lines

5.8.1. Setup: Width Ladder and a Declared Projection

Let $K$ index width (or another scalable capacity parameter). For each $K$, let $X^{(K)}$ encode model parameters together with any augmented optimizer state (momenta, running averages, constraint multipliers). Fix an explicit coarse-graining map $\Pi_K^{K+1}: X^{(K+1)} \to X^{(K)}$.

5.8.2. Practical Ledgers (Loggable During Training)

Choose a finite nonnegative ledger vector $r^{(K)}(t) \in \mathbb{R}_{\ge 0}^m$ that is loggable during training, e.g.
$$r^{(K)}(t) = \big( R_{\mathrm{L}}^{(K)}(t),\, R_{\mathrm{safe}}^{(K)}(t),\, R_{\mathrm{rob}}^{(K)}(t),\, R_{\mathrm{comp}}^{(K)}(t) \big).$$

5.8.3. Estimating Metzler Coefficients from Traces

In discrete time, a checker-friendly target is the inequality
$$r_{n+1}^{(K)} \le \big(I + h M^{(K)}\big)\, r_n^{(K)} + \zeta_n^{(K)}, \qquad \zeta_n^{(K)} \ge 0,$$
with $M^{(K)}$ Metzler. A Hurwitz witness can be supplied by checking $w^\top M^{(K)} \le -\lambda\, w^\top$ (Proposition 1) or, in the two-ledger case, the small-gain product inequality (Proposition 2).
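One way to audit this inequality from logged traces is to compute the slack of each step against a candidate $(I + hM^{(K)})$ and require that its positive part be covered by the declared tail budget. The sketch below assumes this slack-accounting convention; it is an illustration, not the paper's formal grammar.

```python
import numpy as np

def discrete_envelope_residual(traces, M, h):
    """Given logged ledger vectors r_n (rows of `traces`), compute
    slack_n = r_{n+1} - (I + h M) r_n. The discrete comparison
    inequality holds iff the positive part of every slack entry is
    charged to the nonnegative disturbance zeta_n."""
    I = np.eye(M.shape[0])
    A = I + h * M
    slack = traces[1:] - traces[:-1] @ A.T
    return np.maximum(slack, 0.0)  # part that must be covered by zeta_n
```

If the trace was generated exactly by $r_{n+1} = (I + hM) r_n$, the residual vanishes; any positive entries quantify how much pollution budget the certificate must declare.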

5.8.4. Empirical Checks for (O1)–(O4)

  • (O2) uniform margin: fit envelope rates for $R_{\mathrm{tot}}^{(K)}$ and test uniformity in $K$.
  • (O3) projective Cauchy: measure $\| x^{(K)} - \Pi_K^{K+1}(x^{(K+1)}) \|_{W^{(K)}}$ across checkpoints.
  • (O1) tail summability: quantify unmodeled remainder budgets $\alpha_K$ and test $\sum_K \alpha_K < \infty$.
  • (O4) dictionary conditioning: bound the dictionary constants $c_M, C_M$ uniformly in $K$.
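The (O3) check, for instance, reduces to a maximum of ruler distances over paired checkpoints. A minimal sketch, with `project` standing in for the declared coarse-graining map $\Pi_K^{K+1}$ (here an assumed callable) and the Euclidean norm standing in for the declared ruler:

```python
import numpy as np

def projective_mismatch(ckpt_K, ckpt_K1, project):
    """(O3) audit: the empirical budget delta_K is the maximum, over
    paired checkpoints, of the ruler distance between the level-K state
    and the projected level-(K+1) state."""
    return max(np.linalg.norm(xk - project(xk1))
               for xk, xk1 in zip(ckpt_K, ckpt_K1))
```

Plotting this quantity against $K$ and testing whether the resulting sequence is summable is the empirical content of the (O3) line.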

5.8.5. Interpretation

The protocol certifies training-time contraction and refinement stability in a declared ruler when the programme lines hold. Generalization, domain shift, and stochastic optimization effects are separate modules unless explicitly ledgerized.

6. Concluding Discussion and Outlook

The paper’s contribution is best read as a single verifier-facing implication. A producer supplies a finite certificate for a declared ladder and a declared ruler: (i) in-level one-clock evidence for the nonnegative ledger vector via a Metzler comparison system and a copositive witness, and (ii) cross-level programme-line budgets on the window [ 0 , T ] ensuring summable refinement tails, a K-uniform decay margin, summable projective inconsistencies, and a uniform dictionary linking the declared geometric ledger to reported readouts. Soundness means the checker does not need to inspect the optimizer internals or re-run training: validating the ruler/transfer maps, the witness inequalities, and the ladder budgets is sufficient to conclude that refinement does not introduce drift, that the ladder admits a unique refinement-limit learner on [ 0 , T ] , and that the same exponent class is inherited up to the explicit vanishing tails. In particular, any readout covered by the dictionary line inherits the same rate class, so the certified clock is not confined to an internal ledger but transports to declared external metrics.
Where certificates fail is correspondingly concentrated. In applications one typically sees (a) the uniform margin degrade with refinement, (b) dictionary constants blow up (conditioning changes the meaning of the reported metric), or (c) the cross-level tower budgets cease to be summable. Each failure has a direct diagnostic: fit levelwise envelopes and monitor the inferred rate versus $K$; track conditioning surrogates that upper bound $C_M / c_M$; and measure projected mismatches $\sup_{t \in [0,T]} \|z^{(K)}(t) - P_K z^{(K+1)}(t)\|_W$ along checkpoints. When the bottleneck is the comparison matrix, the design lever is monotone: fund diagonals (increase self-decay margins) and/or reduce injections (weaken couplings), with an exact boundary in the two-ledger small-gain case.
The claim boundary is deliberate. The certificate is a stability-and-limit guarantee in a declared ruler on a declared window; it does not by itself imply statistical generalization or out-of-distribution robustness, and it does not control stochastic-gradient noise unless such effects are ledgerized or bounded by a separate probabilistic argument. Those properties remain composable add-ons: one may import generalization modules (e.g. stability or PAC-Bayes) or robustness modules (e.g. DRO/shift models) and then connect them to the certified ledgers through additional dictionary links.
Several extensions preserve the same checker philosophy. One can formulate stochastic certificates where the programme lines hold with high probability and the clock becomes probabilistic; allow switched or time-varying Metzler envelopes with common copositive weights; permit slowly varying rulers W ( t ) under a separate “ruler-drift” budget; and replace scalar dictionary bounds by structured operator inequalities for richer readout transport. Finally, a useful empirical direction is a notion of certificate coverage: how much of an observed training trace is explained by the declared comparison model versus assigned to tail pollution, tracked as a function of refinement.

Notation

Symbol Type / Domain Meaning / Assumptions
Ambient ruler and refinement ladder
H Hilbert space Ambient realization space used to enforce a single ruler (Definition 2)
W operator on H Bounded, self-adjoint, strictly positive instrument operator defining z W 2 = z , W z H
· , · inner product Ambient Hilbert inner product on H
z W norm Instrument norm: z W : = z , W z on H
X ( K ) set / space State/parameter space at refinement level K (width, resolution, basis size, etc.)
K 0 integer Minimal refinement index considered; ladder runs over K K 0
i K map X ( K ) H Realization/embedding of level-K states into the ambient ruler space
P K projection on H Orthogonal projection onto i K ( X ( K ) ) H
Π K K + 1 map X ( K + 1 ) X ( K ) Coarse-graining / restriction map between adjacent refinement levels
Π K L map X ( L ) X ( K ) Multi-step projection Π K L : = Π K K + 1 Π L 1 L (when well-defined)
Training dynamics (discrete and continuous)
x ( K ) ( t ) curve in X ( K ) Continuous-time training trajectory at refinement level K
x n ( K ) sequence in X ( K ) Discrete-time iterates at refinement level K
Φ ( K ) map on X ( K ) Discrete update map: x n + 1 ( K ) = Φ ( K ) ( x n ( K ) )
V ( K ) vector field Continuous-time flow: x ˙ ( K ) ( t ) = V ( K ) ( x ( K ) ( t ) )
t R 0 Continuous time (or rescaled iteration-time)
n N Discrete iteration index
Ledgers and scalarizations
R i ( t ) R 0 Nonnegative ledger (risk, constraint violation, robustness proxy, etc.)
r ( t ) R 0 m Ledger vector r ( t ) = ( R 1 ( t ) , , R m ( t ) )
r ( K ) ( t ) R 0 m Level-K ledger vector along training at refinement K
R tot ( t ) R 0 Declared total ledger (scalarization) used for the certificate
w R > 0 m Positive weight vector defining R tot ( t ) = w r ( t )
W led SPD matrix Optional quadratic ledger ruler: R tot ( t ) = r ( t ) W led 2 = r ( t ) W led r ( t )
M finite set Family of reported readout metrics M ( · ) 0
c M , C M scalars Dictionary/observability constants in c M R ( K ) M ( x ( K ) ) C M R ( K ) (line (O4))
Metzler comparison system and one-clock quantities
M matrix Metzler comparison matrix: M i j 0 for i j (Definition 1)
M ( K ) matrix Level-K comparison matrix in r ˙ ( K ) M ( K ) r ( K )
λ i scalar Funded self-decay margin for ledger i when M i i = 2 λ i
η i j scalar Injection strength from ledger j to ledger i when M i j = η i j 0
e t M matrix Matrix exponential (positive for Metzler M)
spec ( M ) set Spectrum (eigenvalues) of M
μ ( M ) scalar Spectral abscissa: μ ( M ) : = max { z : z spec ( M ) }
Hurwitz property M Hurwitz μ ( M ) < 0
λ eff scalar Effective one-clock rate: λ eff : = μ ( M ) > 0 (when M is Hurwitz)
γ 1 2 scalar Gain γ 1 2 : = η 1 2 / ( 2 λ 1 ) in the 2 × 2 form
Refinement programme-line budgets (Master Certificate)
T R > 0 Declared certification horizon: programme lines (O1)–(O4) are required on [ 0 , T ]
τ K ( t ) R 0 Tail disturbance in (O1) on [ 0 , T ] : R ˙ ( K ) 2 λ K R ( K ) + τ K
α K R 0 Integrated pollution budget in (O1): 0 T τ K ( s ) d s α K with K α K <
β K R 0 Unresolved-tail budget in (O3): sup t [ 0 , T ] ( I P K ) z ( K + 1 ) ( t ) W β K with K β K <
δ K R 0 Projective mismatch budget in (O3): sup t [ 0 , T ] z ( K ) ( t ) P K z ( K + 1 ) ( t ) W δ K with K δ K <

Definitions

Entry Definition / Formula Role in the paper
Ladder geometry (“one ruler”)
Projective consistency Π K K + 2 = Π K K + 1 Π K + 1 K + 2 Encodes “same task” across refinement levels
One ruler (ambient instrument restriction) There exist ( H , W ) and realizations i K , P K such that i K ( Π K K + 1 x ) = P K ( i K + 1 x ) and x W ( K ) 2 = i K x , W i K x H Forbids moving goalposts (Definition 2)
Instrument norm and ruler square root z W : = z , W z H = W 1 / 2 z H Fixes the single measurement convention across all levels
Instrument contractivity (projection stability) P K z W z W on i K + 1 ( X ( K + 1 ) ) Ensures coarse projections do not inflate the ruler norm (Assumption A1)
Non-expansiveness across levels Π K K + 1 x W ( K ) x W ( K + 1 ) Basic stability of coarse-graining (Lemma 2)
Cross-level discrepancy (ambient) d K L ( x ( K ) , x ( L ) ) : = i K x ( K ) P K ( i L x ( L ) ) W Canonical “apples-to-apples” cross-level distance
Projective telescoping bound i K x ( K ) P K ( i L x ( L ) ) W j = K L 1 i j x ( j ) P j ( i j + 1 x ( j + 1 ) ) W Turns summable adjacent mismatches into a Cauchy tower (Lemma 3)
Refinement-limit learner (projective limit) x ( ) ( t ) H with P K x ( ) ( t ) = i K x ( K ) ( t ) (compatible Cauchy tower) Existence/uniqueness of a refinement-limit object (Theorem 2)
Ledgers, scalarizations, and readouts
Ledger vector r ( t ) = ( R 1 ( t ) , , R m ( t ) ) R 0 m Collects multiple nonnegative training-time quantities
Declared total ledger R tot ( t ) = w r ( t ) with w 0 Single scalar clock target for contraction
Quadratic ledger ruler (optional) R tot ( t ) = r ( t ) W led 2 = r ( t ) W led r ( t ) , W led 0 Alternative scalarization when a quadratic contract is preferred
Reported readouts / metrics M = { M } , with M ( · ) 0 External observables whose stability is transported
Dictionary (observability) line c M R ( K ) ( t ) M ( x ( K ) ( t ) ) C M R ( K ) ( t ) uniformly in K Transfers the certified clock to readouts ((O4))
Readout Lipschitz transport (optional) | M ( u ) M ( v ) | L M u v W on a declared bounded set Converts ruler convergence to readout convergence (optional strengthening)
Metzler comparison and one-clock reduction
Metzler matrix M R m × m with M i j 0 for i j Positivity structure for ledger couplings
Metzler comparison inequality r ˙ ( t ) M r ( t ) componentwise Auditable coupling model (Definition 1)
Semigroup positivity M Metzler e t M 0 entrywise for all t 0 Enables order-preserving comparison
Comparison principle (Duhamel form) If y solves y ˙ = M y , y ( 0 ) = r ( 0 ) and z : = y r , then z ( t ) = 0 t e ( t s ) M q ( s ) d s , q : = z ˙ M z 0 Correct proof mechanism for r ( t ) e t M r ( 0 ) (Lemma 1)
Funding + injection parametrization M i i = 2 λ i , M i j = η i j 0 ( i j ) Interpretable design levers (fund diagonals, reduce injections)
Spectral abscissa and effective rate μ ( M ) : = max { z : z spec ( M ) } ,    λ eff : = μ ( M ) if μ ( M ) < 0 Defines the certified one-clock exponent class
One-clock certificate (witness form) w 0 , λ > 0 with w M λ w w r ( t ) e λ t w r ( 0 ) Core reduction theorem (Theorem 1)
Hurwitz ⇒ witness (sharp rate) If μ ( M ) < 0 and M is Metzler, w 0 with w M μ ( M ) w Explains existence of witnesses / sharp clock (Proposition 1)
Two-ledger small-gain criterion For 2 × 2 funded+injection M, M Hurwitz η 1 2 η 2 1 < 4 λ 1 λ 2 Exact design boundary (Proposition 2)
Master Certificate programme lines on [ 0 , T ]
Certification horizon T > 0 (declared) All programme lines are audited on [ 0 , T ]
(O1) Tail-robust envelope R ˙ ( K ) 2 λ K R ( K ) + τ K ,    0 T τ K α K ,    K α K < Controls time-direction pollution on [ 0 , T ]
(O2) Uniform margin λ K λ > 0 independent of K One exponent class across refinement
(O3) Geometric tower budget sup t [ 0 , T ] ( I P K ) z ( K + 1 ) ( t ) W + z ( K ) ( t ) P K z ( K + 1 ) ( t ) W β K + δ K ,    K ( β K + δ K ) < Cauchy tower in one ruler (no drift) on [ 0 , T ]
(O4) Uniform dictionary on [ 0 , T ] c M R ( K ) ( t ) M ( x ( K ) ( t ) ) C M R ( K ) ( t ) uniformly in K and t [ 0 , T ] Transfers rates to reported metrics
Master Certificate (learning version) (O1)–(O4) ⇒ refinement-limit trajectory + rate inheritance on [ 0 , T ] Main soundness engine (Theorem 3)
Certificate artifact (what the prover ships)
Certificate artifact Declared ( H , W ) ; maps ( i K , P K , Π K K + 1 ) ; ledger definition R ( K ) ; budgets ( α K , β K , δ K ) ; margin λ ; dictionary constants ( c M , C M ) ; and (when applicable) a Metzler witness ( w , λ ) Concrete verifier-facing interface (the “proof-carrying” object)
Witness-finding LP (optional recipe) Find w R m , λ 0 s.t. w 1 , w M λ w , i w i = 1 Makes the one-clock witness actionable for ML/CS audiences
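In the two-ledger case, the witness-finding LP above can be approximated by a one-parameter search over normalized weights $w = (s, 1-s)$. The grid search below is a stand-in sketch for a proper LP solver, not the recipe itself:

```python
import numpy as np

def find_witness_grid(M, lam, grid=200):
    """Search for w = (s, 1-s), s in (0,1), satisfying the witness
    inequality w^T M <= -lam * w componentwise. Returns a feasible w
    or None; a coarse stand-in for the LP in the table above."""
    for s in np.linspace(1e-3, 1.0 - 1e-3, grid):
        w = np.array([s, 1.0 - s])
        if np.all(w @ M <= -lam * w + 1e-12):
            return w
    return None
```

For $M_{\mathrm{Toy}}$ with $m = 2.0$, $\alpha = 2.5$ the search succeeds at the certified rate $\lambda = 2$ and fails, as it must, for any rate exceeding the spectral abscissa.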

References

  1. Miettinen, K. Nonlinear Multiobjective Optimization; International Series in Operations Research & Management Science, Vol. 12; Kluwer Academic Publishers: Boston, MA, 1999.
  2. Sener, O.; Koltun, V. Multi-Task Learning as Multi-Objective Optimization. In Advances in Neural Information Processing Systems 2018, Vol. 31, pp. 525–536.
  3. Kaplan, J.; McCandlish, S.; Henighan, T.; Brown, T.B.; Chess, B.; Child, R.; Gray, S.; Radford, A.; Wu, J.; Amodei, D. Scaling Laws for Neural Language Models. arXiv 2020, arXiv:2001.08361.
  4. Ciarlet, P.G. The Finite Element Method for Elliptic Problems; Studies in Mathematics and Its Applications, Vol. 4; North-Holland Publishing Company: Amsterdam, 1978.
  5. Hackbusch, W. Multi-Grid Methods and Applications; Springer Series in Computational Mathematics, Vol. 4; Springer-Verlag: Berlin, 1985.
  6. Brenner, S.C.; Scott, L.R. The Mathematical Theory of Finite Element Methods, 3rd ed.; Texts in Applied Mathematics, Vol. 15; Springer: New York, 2008.
  7. Lax, P.D.; Richtmyer, R.D. Survey of the Stability of Linear Finite Difference Equations. Communications on Pure and Applied Mathematics 1956, 9, 267–293.
  8. Hille, E.; Phillips, R.S. Functional Analysis and Semi-Groups; American Mathematical Society Colloquium Publications, Vol. 31; American Mathematical Society: Providence, RI, 1957.
  9. Trotter, H.F. Approximation of Semi-Groups of Operators. Pacific Journal of Mathematics 1958, 8, 887–919.
  10. Pazy, A. Semigroups of Linear Operators and Applications to Partial Differential Equations; Applied Mathematical Sciences, Vol. 44; Springer: New York, 1983.
  11. Berman, A.; Plemmons, R.J. Nonnegative Matrices in the Mathematical Sciences; Academic Press, 1979.
  12. Farina, L.; Rinaldi, S. Positive Linear Systems: Theory and Applications; Pure and Applied Mathematics, Vol. 255; Wiley–Interscience: New York, 2000.
  13. Smith, H.L. Monotone Dynamical Systems: An Introduction to the Theory of Competitive and Cooperative Systems; Mathematical Surveys and Monographs, Vol. 41; American Mathematical Society: Providence, RI, 1995.
  14. Briat, C. Linear Parameter-Varying and Time-Delay Systems: Analysis, Observation, Filtering & Control; Advances in Delays and Dynamics; Springer: Berlin, Heidelberg, 2015.
  15. Kreyszig, E. Introductory Functional Analysis with Applications; Wiley Classics Library, Vol. 17; Wiley, 1989.
  16. Bousquet, O.; Elisseeff, A. Stability and Generalization. Journal of Machine Learning Research 2002, 2, 499–526.
  17. McAllester, D.A. PAC-Bayesian Model Averaging. In Proceedings of the Twelfth Annual Conference on Computational Learning Theory (COLT '99), New York, NY, USA, 1999; pp. 164–170.
  18. Catoni, O. PAC-Bayesian Supervised Classification: The Thermodynamics of Statistical Learning; Institute of Mathematical Statistics Lecture Notes–Monograph Series, Vol. 56; Institute of Mathematical Statistics: Beachwood, OH, 2007.
  19. Duchi, J.C.; Namkoong, H. Learning Models with Uniform Performance via Distributionally Robust Optimization. The Annals of Statistics 2021, 49, 1378–1406.
  20. Katz, G.; Barrett, C.; Dill, D.L.; Julian, K.; Kochenderfer, M.J. Reluplex: An Efficient SMT Solver for Verifying Deep Neural Networks. In Computer Aided Verification (CAV 2017), Part I; Majumdar, R., Kuncak, V., Eds.; Lecture Notes in Computer Science, Vol. 10426; Springer: Cham, 2017; pp. 97–117.
  21. Gehr, T.; Mirman, M.; Drachsler-Cohen, D.; Tsankov, P.; Chaudhuri, S.; Vechev, M. AI2: Safety and Robustness Certification of Neural Networks with Abstract Interpretation. In 2018 IEEE Symposium on Security and Privacy (SP), San Francisco, CA, USA, 2018; pp. 3–18.
  22. Seshia, S.A.; Sadigh, D.; Sastry, S.S. Toward Verified Artificial Intelligence. Communications of the ACM 2022, 65, 46–55. (An earlier technical version appeared as arXiv:1606.08514.)
  23. Necula, G.C. Proof-Carrying Code. In Proceedings of the 24th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL '97), New York, NY, USA, 1997; pp. 106–119.
  24. Coddington, E.A.; Levinson, N. Theory of Ordinary Differential Equations; McGraw–Hill: New York, 1955.
  25. Khalil, H.K. Nonlinear Systems, 3rd ed.; Prentice Hall: Upper Saddle River, NJ, 2002.
  26. Grönwall, T.H. Note on the Derivatives with Respect to a Parameter of the Solutions of a System of Differential Equations. Annals of Mathematics (2) 1919, 20, 292–296.
  27. Robbins, H.; Monro, S. A Stochastic Approximation Method. The Annals of Mathematical Statistics 1951, 22, 400–407.
  28. Benaïm, M. Dynamics of Stochastic Approximation Algorithms. In Séminaire de Probabilités XXXIII; Azéma, J., Émery, M., Ledoux, M., Yor, M., Eds.; Lecture Notes in Mathematics, Vol. 1709; Springer: Berlin, Heidelberg, 1999; pp. 1–68.
  29. Kushner, H.J.; Yin, G.G. Stochastic Approximation and Recursive Algorithms and Applications, 2nd ed.; Stochastic Modelling and Applied Probability, Vol. 35; Springer: New York, 2003.
  30. Bramble, J.H.; Pasciak, J.E.; Xu, J. Parallel Multilevel Preconditioners. Mathematics of Computation 1990, 55, 1–22.
  31. Thomée, V. Galerkin Finite Element Methods for Parabolic Problems, 2nd ed.; Springer Series in Computational Mathematics; Springer: Berlin, Heidelberg, 2006.
  32. Emmrich, E. Discrete Versions of Gronwall's Lemma and Their Application to the Numerical Analysis of Parabolic Problems. Preprint Reihe Mathematik 637; Technische Universität Berlin, 1999.
  33. Desoer, C.A.; Vidyasagar, M. Feedback Systems: Input-Output Properties; Academic Press: New York, 1975.
  34. Jiang, Z.P.; Teel, A.R.; Praly, L. Small-Gain Theorem for ISS Systems and Applications. IEEE Transactions on Automatic Control 1994, 39, 1609–1619.
Table 1. Numerical sanity check of the toy certificate $R_{\mathrm{tot}}^{(K)}(t) \le e^{-\lambda_{\mathrm{cert}} t} R_{\mathrm{tot}}^{(K)}(0)$ with $\lambda_{\mathrm{cert}} = 2$ for $m = 2.0$, $\alpha = 2.5$, $p = 1.0$, $T = 5.0$, $\Delta t = 10^{-3}$.
$K$   $R_{\mathrm{tot}}^{(K)}(0)$   $R_{\mathrm{tot}}^{(K)}(T)$   $\max_{t \in [0,T]} \mathrm{Rat}^{(K)}(t)$   $\mathrm{Rat}^{(K)}(T)$
10   $2.125857 \times 10^{-1}$   $1.091714 \times 10^{-10}$   $1.000000$   $1.131149 \times 10^{-5}$
20   $2.370105 \times 10^{-1}$   $1.247584 \times 10^{-10}$   $1.000000$   $1.159436 \times 10^{-5}$
40   $2.459643 \times 10^{-1}$   $1.351165 \times 10^{-10}$   $1.000000$   $1.209988 \times 10^{-5}$
80   $2.486140 \times 10^{-1}$   $1.378366 \times 10^{-10}$   $1.000000$   $1.221192 \times 10^{-5}$
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.

Subscribe

Disclaimer

Terms of Use

Privacy Policy

Privacy Settings

© 2026 MDPI (Basel, Switzerland) unless otherwise stated