Preprint
Article

This version is not peer-reviewed.

Spectral and Analytic Structure of the Nyman–Beurling–Báez–Duarte Approximation

Submitted: 08 March 2026
Posted: 10 March 2026

Abstract

We study the structural and analytic aspects of the B\'{a}ez--Duarte approximation problem within the Nyman--Beurling framework, which furnishes a functional-analytic equivalent of the Riemann Hypothesis (RH). Our work studies structural features of this framework; it does not prove RH. First (Rank-one collapse and Hilbert-space theory). The integer-dilate Gram matrix \( G_M=\frac{1}{3}\textbf{dd}^\top \) is rank-one, giving \( span\{r_1,\ldots,r_M\}=span\{x\} \) and fixed distance \( d_M=\frac12 \) for all M. We give the explicit Moore–Penrose pseudoinverse \( G_M^+ \) and the one-dimensional collapse of the optimisation problem. Second (Exact Gram matrix formula). We prove a fully rigorous closed-form expression for the inner products of the correct sawtooth basis: using the Bernoulli polynomial representation of the fractional part, \( \int_0^1\{jx\}\{kx\}\,dx = \frac{\gcd(j,k)^2}{jk}\Bigl(\frac{1}{12} + \frac{B_2(0)}{2}\Bigr) + \frac{1}{4}\Bigl(1-\frac{\gcd(j,k)}{j}\Bigr)\Bigl(1-\frac{\gcd(j,k)}{k}\Bigr) + E_{jk}, \) where \( E_{jk} \) is an explicit correction from higher Bernoulli terms, expressed via the Hurwitz zeta function. The arithmetic role of \( \gcd(j,k) \) is made precise. Third (Hardy-space bounds). Using the \( H^2(\Pi^+) \) reproducing kernel and the Mellin isometry, we prove: (a) the distance identity \( d_M^2=\|1/s-F_M^*(s)\zeta(s)/s\|_{H^2}^2 \); (b) an explicit lower bound \( d_M^2\ge\sum_\rho\frac{|F_M^*(\rho)|^2|\zeta'(\rho)|^{-2}}{|\rho|^2}\cdot c(\rho) \) from the zeros of \( \zeta \); and (c) a pointwise Hardy-space inequality relating \( d_M \) to the supremum of \( |1-F_M^*({\tfrac12}+it)\zeta({\tfrac12}+it)/({\tfrac12}+it)| \) on the critical line. Fourth (Kalman filtration stability). 
Under the observation model \( z_M=d_M+\varepsilon_M \) with $\varepsilon_M$ sub-Gaussian of variance \( \sigma^2 \), the Kalman estimator satisfies a rigorous oracle inequality \( \mathbf{E}|d_M^{KF}-d_M|^2\le \sigma^2 K_\infty(2-K_\infty)^{-1} \), with an almost-sure bound \( |d_M^{KF}-d_M|\le CM^{-\alpha} \) whenever \( |d_M-d|=O(M^{-\alpha}) \). Fifth (Möbius sparsity). We prove \( |c_k^*|=O(k^{-1+\varepsilon}) \) via Dirichlet series techniques and show that the coefficient sequence is bounded in \( \ell^2 \), with connections to the Möbius function made precise through the optimality conditions. Sixth (Structural Mellin theorem). We identify a hidden structural observation in the Mellin identity: the Gram kernel \( K_G(s,w)=\zeta(s+\bar w)/(s+\bar w) \) appears as the reproducing kernel of the Hardy space \( H^2(\Pi^+) \) restricted to the approximation subspace \( W_M \), and its singularity at \( s+\bar w=1 \) encodes the pole of \( \zeta \) while the zeros of \( \zeta \) in the critical strip contribute exactly as spectral obstructions. Disclaimer. This paper does not prove RH. All results are structural, computational, and analytic observations within the equivalent framework.


1. Introduction

1.1. The Riemann Hypothesis and Its Context

The Riemann Hypothesis (RH) asserts that every nontrivial zero \( \rho \) of the Riemann zeta function
\[ \zeta(s) = \sum_{n=1}^{\infty} n^{-s}, \qquad \mathrm{Re}(s) > 1, \]
satisfies \( \mathrm{Re}(\rho) = \tfrac12 \). Since Riemann’s memoir of 1859, the hypothesis has shaped the entire landscape of analytic number theory. Listed as one of the seven Millennium Prize Problems by the Clay Mathematics Institute [2], it remains unresolved.
Its importance stems not only from its elegance, but from its deep connections to the distribution of primes. Under RH, the prime-counting error satisfies \( |\psi(x) - x| = O(\sqrt{x}\,\log^2 x) \), where \( \psi(x) = \sum_{p^k \le x} \log p \) is the Chebyshev function. Any zero off the critical line would produce larger oscillations in the distribution of primes [1,12].

1.2. Equivalent Reformulations of RH

A rich ecosystem of equivalent reformulations has grown around RH, each illuminating a different facet of the hypothesis. Among the most celebrated are:
(i)
Nyman–Beurling criterion [6,7]: \( \mathbf{1} \in \overline{\mathrm{span}}^{L^2}\{ f_\theta : \theta \in (0,1) \} \) if and only if RH holds.
(ii)
Báez–Duarte strengthening [8]: it suffices to use the countable family \( \{ r_k(x) = \{x/k\} : k \in \mathbb{N} \} \).
(iii)
Li’s criterion [11]: RH is equivalent to the positivity of the coefficients \( \lambda_n = \sum_{\rho} \bigl[ 1 - (1 - 1/\rho)^n \bigr] \) for all \( n \ge 1 \).
(iv)
Robin’s criterion [16]: RH is equivalent to \( \sigma(n) < e^{\gamma} n \log\log n \) for all \( n \ge 5041 \), where \( \sigma(n) \) is the sum-of-divisors function and \( \gamma \) is the Euler–Mascheroni constant.
(v)
Weil’s explicit formula criterion: RH is equivalent to the positivity of certain explicit sums over primes and zeros.
The present paper focuses on the Nyman–Beurling–Báez–Duarte framework, which has the unique advantage of being both analytically rigorous and computationally tractable.

1.3. The Nyman–Beurling Framework: Background and Prior Work

The Nyman–Beurling criterion originated in Nyman’s 1950 Uppsala thesis [6], where he proved that \( \zeta \) has no zeros in the strip \( \tfrac12 < \mathrm{Re}(s) < 1 \) if and only if certain functions related to the fractional part of \( 1/x \) are dense in \( L^2(0,1) \). Beurling [7] reformulated and extended this, connecting it more cleanly to RH.
The approach gained momentum through Burnol’s [10] explicit product formula for the projection norm (Theorem 4) and Báez–Duarte’s [8] reduction to a countable basis \( \{ r_k(x) = \{x/k\} \} \). Báez–Duarte [9] and subsequent authors attempted numerical evaluations of \( d_M \) for moderate \( M \).
Prior computational investigations. Attempts to numerically study \( d_M \) have used a variety of strategies. Some implementations directly compute the Gram matrix \( (G_M)_{jk} = \langle r_j, r_k \rangle \) and solve the normal equations. Others discretise the inner products at finitely many quadrature nodes. A common source of error in these investigations, addressed thoroughly in this paper, is the failure to distinguish between the functions \( r_k(x) = \{x/k\} \) (which reduce to \( x/k \) on \( (0,1) \) and generate only the one-dimensional subspace \( \mathrm{span}\{x\} \)) and the genuine sawtooth basis \( \tilde r_k(x) = \{kx\} \) (which has \( k \) teeth on \( (0,1) \) and spans a genuinely \( k \)-dimensional function space on the subintervals of scale \( 1/k \)). This collapse is the central structural observation of Section 3 and Section 4.
The Mellin transform provides the bridge between the functional-analytic statement and the zero-set of \( \zeta \). Specifically, the Nyman–Beurling condition \( \mathbf{1} \in \overline{\mathrm{span}}^{L^2}\{f_\theta\} \) is equivalent, via the isometry \( \mathcal{M} : L^2(0,1) \to H^2(\Pi^+) \) and the identity \( \widehat{f_\theta}(s) = \theta^{s}\,\zeta(s)/s \), to the density of \( \{ \theta^{s}\,\zeta(s)/s : \theta \in (0,1) \} \) in \( H^2(\Pi^+) \), which in turn is equivalent to RH. This Hardy-space perspective is developed systematically in Section 9.

1.4. The Central Problem: Structural Degeneracy

A fundamental difficulty in the numerical study of the Báez–Duarte problem, one that has not been adequately addressed in the existing literature, concerns the behaviour of the basis functions on the unit interval \( (0,1) \).
For \( x \in (0,1) \) and integer \( k \ge 1 \): \( r_k(x) = \{x/k\} = x/k \), since \( 0 < x/k < 1 \). Therefore all functions \( r_1, r_2, \ldots \) are scalar multiples of \( x \), and
\[ \mathrm{span}\{r_1,\ldots,r_M\} = \mathrm{span}\{x\} \qquad \text{for all } M \ge 1. \]
The Gram matrix \( G_M \) is rank-one for every \( M \ge 2 \). This rank-one collapse is a fundamental structural obstruction: \( \mathbf{1} \notin \mathrm{span}\{x\} \), so \( d_M = \tfrac12 \) for all \( M \), and any numerical scheme claiming \( d_M \to 0 \) using these (degenerate) basis functions is erroneous.
This observation does not invalidate the Báez–Duarte theorem, which is a theorem about closures in \( L^2 \). Rather, it reveals that a naively discretised implementation, in which \( r_k \) is evaluated at points \( x \in (0,1) \) and treated as though these evaluations capture the genuine sawtooth structure, misses the essential nonlinearity. Correct numerical implementations must use the sawtooth functions \( \tilde r_k(x) = \{kx\} \) (or equivalently, \( \{x/k\} \) for \( x \in (0,k) \)), which are genuinely nonlinear and linearly independent.
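The collapse is easy to verify numerically. The following sketch (the helper names are ours, not from the paper's code) builds the exact integer-dilate Gram matrix \( (G_M)_{jk} = 1/(3jk) \) and evaluates the distance through the pseudoinverse route developed in Section 3:

```python
import numpy as np

# Exact integer-dilate Gram matrix (G_M)_{jk} = 1/(3jk) = (1/3) d d^T,
# with d = (1, 1/2, ..., 1/M), and the distance via the pseudoinverse.
def integer_dilate_gram(M):
    d = 1.0 / np.arange(1, M + 1)
    return np.outer(d, d) / 3.0

def distance_via_pinv(M):
    # d_M^2 = ||1||^2 - b^T G^+ b, with b_k = <1, r_k> = 1/(2k)
    G = integer_dilate_gram(M)
    b = 0.5 / np.arange(1, M + 1)
    return float(np.sqrt(1.0 - b @ np.linalg.pinv(G) @ b))

for M in (2, 10, 50):
    print(M, np.linalg.matrix_rank(integer_dilate_gram(M)),
          distance_via_pinv(M))
# rank stays 1 and the distance stays 1/2 for every M
```

Adding more integer dilates never enlarges the span: the rank and the distance are unchanged.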

1.5. Contributions and Outline

This paper makes the following contributions:
(i)
Rank-one collapse and Hilbert-space theory (Section 3 and Section 4): We prove \( G_M = \tfrac13 \mathbf{d}\mathbf{d}^\top \) is rank-one, derive the Moore–Penrose pseudoinverse \( G_M^+ = 3\,\|\mathbf{d}\|^{-4}\,\mathbf{d}\mathbf{d}^\top \), establish \( d_M = \tfrac12 \) for all \( M \), and explain the collapse of the approximation to a one-dimensional scalar problem.
(ii)
Structural degeneracy analysis for numerical implementations (Section 3, Section 4 and Section 5): We provide a detailed explanation of why incorrect basis functions cause structurally misleading numerical results, and characterise the correct sawtooth basis \( \tilde r_k(x) = \{kx\} \) needed for genuine approximation.
(iii)
Gram matrix spectral theory and stability (Section 6): We establish \( \kappa(\tilde G_M) = \Theta(M^2) \) via compact operator theory and Weyl’s law, prove an operator-stability theorem \( \|\tilde G_M - G_M^{\mathrm{cts}}\|_2 \le C_M/N \), and derive eigenvalue perturbation bounds.
(iv)
Numerically stable algorithms (Section 7): Truncated-SVD and economy-QR algorithms with rigorous backward-stability guarantees.
(v)
Kalman filtration theory (Section 8): Convergence preservation, smoothing-error bounds \( O(M^{-\alpha}) \), variance reduction, and a general EWMA theorem.
(vi)
Mellin-transform analytic theorems (Section 9): We prove the isometry \( d_M^2 = \| 1/s - F_M(s)\,\zeta(s)/s \|_{H^2}^2 \), derive lower bounds on \( d_M \) from values of the Dirichlet polynomial \( F_M \) at zeros of \( \zeta \), and prove a key lemma expressing inner products of sawtooth functions via the Hurwitz zeta function.
(vii)
Number-theoretic connections (Section 14, Section 15 and Section 16): Structural parallels with Li’s criterion, compressed sensing, and the Hilbert–Pólya philosophy.
What this paper does not prove. It does not establish d = 0 (equivalent to RH and still open), does not bound the convergence rate of d M unconditionally, and does not prove any component of Conjecture 1. All results are structural, computational, and analytic observations within the equivalent framework.

2. Background and Functional-Analytic Framework

2.1. The Riemann Zeta Function

For \( \mathrm{Re}(s) > 1 \), the Riemann zeta function is defined by the absolutely convergent Dirichlet series
\[ \zeta(s) = \sum_{n=1}^{\infty} n^{-s} = \prod_{p\ \text{prime}} \bigl(1 - p^{-s}\bigr)^{-1}. \]
By analytic continuation, \( \zeta \) extends to a meromorphic function on \( \mathbb{C} \) with a simple pole at \( s = 1 \) and trivial zeros at the negative even integers \( s = -2, -4, \ldots \)
The nontrivial zeros \( \rho = \beta + i\gamma \) satisfy \( 0 < \beta < 1 \) (the open critical strip). The Riemann Hypothesis asserts \( \beta = \tfrac12 \) for all of them.
Key analytic facts:
(i)
(Hardy [5]) Infinitely many nontrivial zeros lie on \( \mathrm{Re}(s) = \tfrac12 \).
(ii)
(Odlyzko [3]) The first \( 1.5 \times 10^{10} \) zeros (ordered by \( |\mathrm{Im}(\rho)| \)) all satisfy \( \mathrm{Re}(\rho) = \tfrac12 \).
(iii)
(Zero-free region) There exists a constant \( c > 0 \) such that \( \zeta(s) \ne 0 \) for \( \mathrm{Re}(s) \ge 1 - c/\log(|\mathrm{Im}(s)| + 2) \).

2.2. Hilbert Spaces and Projection Theory

We recall the key tools from functional analysis.
Theorem 1 
(Orthogonal Projection Theorem). Let \( H \) be a Hilbert space and \( V \subseteq H \) a closed subspace. For any \( f \in H \), there exists a unique \( \hat f \in V \) such that \( \|f - \hat f\| = \mathrm{dist}(f, V) \). Moreover, \( \hat f = P_V f \) where \( P_V : H \to V \) is the orthogonal projection, characterised by \( f - P_V f \perp V \).
Corollary 1 
(Best \( L^2 \) approximation). The best \( L^2(0,1) \) approximation to \( f \) from a closed subspace \( V = \mathrm{span}\{v_1,\ldots,v_M\} \) (with \( v_j \) linearly independent) is \( \hat f = \sum_{k=1}^M c_k^* v_k \), where the coefficients \( \mathbf{c}^* = (c_1^*,\ldots,c_M^*)^\top \) satisfy the normal equations \( G_M \mathbf{c}^* = \mathbf{b} \), with \( (G_M)_{jk} = \langle v_j, v_k \rangle \) and \( b_j = \langle f, v_j \rangle \). The minimum distance is \( \mathrm{dist}(f, V)^2 = \|f\|^2 - \mathbf{b}^\top G_M^{-1} \mathbf{b} \).
Remark 1. 
Corollary 1 requires \( G_M \) to be invertible, i.e., the \( v_j \) to be linearly independent. When \( G_M \) is rank-deficient (as in the integer-dilate case), the formula \( \|f\|^2 - \mathbf{b}^\top G_M^{-1}\mathbf{b} \) is undefined, and one must use the Moore–Penrose pseudoinverse instead.

2.3. Hardy Spaces and the Mellin Transform

The connection between the Nyman–Beurling criterion and RH is mediated by the Mellin transform.
Definition 1 
(Hardy space \( H^2(\Pi^+) \)). The Hardy space \( H^2(\Pi^+) \) consists of analytic functions \( F : \Pi^+ \to \mathbb{C} \) (where \( \Pi^+ = \{ s \in \mathbb{C} : \mathrm{Re}(s) > 0 \} \)) such that \( \sup_{\sigma > 0} \int_{-\infty}^{\infty} |F(\sigma + it)|^2\,dt < \infty \). With the inner product \( \langle F, G \rangle_{H^2} = \frac{1}{2\pi} \int_{-\infty}^{\infty} F(\tfrac12 + it)\,\overline{G(\tfrac12 + it)}\,dt \), \( H^2(\Pi^+) \) is a Hilbert space.
The key identity connecting the Nyman–Beurling basis to \( \zeta \) is:
\[ \widehat{f_\theta}(s) = \int_0^1 x^{s-1} f_\theta(x)\,dx = \theta^{s}\,\frac{\zeta(s)}{s}, \qquad \mathrm{Re}(s) > 1. \]
This shows that the Mellin images of the dilate functions \( f_\theta \) are scalar multiples of \( \zeta(s)/s \). The closure condition in \( L^2(0,1) \) thus translates to an analytic approximation condition in \( H^2(\Pi^+) \).

2.4. The Nyman–Beurling Criterion

Definition 2 
(Fractional-part dilates). For \( \theta \in (0,1) \), define
\[ f_\theta : (0,1) \to \mathbb{R}, \qquad f_\theta(x) = \{x/\theta\} = \frac{x}{\theta} - \Bigl\lfloor \frac{x}{\theta} \Bigr\rfloor. \]
Note that \( f_\theta \in L^2(0,1) \) since \( |f_\theta| \le 1 \) a.e. Let \( \mathcal{N} = \overline{\mathrm{span}}^{L^2}\{ f_\theta : \theta \in (0,1) \} \).
Theorem 2 
(Nyman–Beurling [6,7]). RH \( \Longleftrightarrow \) \( d_{\mathcal{N}} := \mathrm{dist}(\mathbf{1}, \mathcal{N}) = 0 \).
Remark 2 
(Logical status). Theorem 2 is an equivalence. Any computation showing \( d_M \to 0 \) confirms consistency with RH but does not constitute an independent proof. An independent proof would require establishing \( d = 0 \) by a direct analytic argument that does not invoke the equivalence.

2.5. The Báez–Duarte Formulation

Theorem 3 
(Báez–Duarte [8]). Let \( r_k(x) = \{x/k\} \) for \( k \in \mathbb{N} \) and \( \mathcal{B} = \overline{\mathrm{span}}^{L^2(0,1)}\{ r_k : k \ge 1 \} \). Then RH \( \Longleftrightarrow \mathbf{1} \in \mathcal{B} \).
The Báez–Duarte theorem is a strengthening of the Nyman–Beurling result because it replaces an uncountable parameter set \( \theta \in (0,1) \) with the countable set \( \{ 1/k : k \in \mathbb{N} \} \), making the problem amenable to computation.
The finite-dimensional approximation problem:
\[ d_M := \mathrm{dist}(\mathbf{1}, V_M), \qquad V_M = \mathrm{span}\{r_1,\ldots,r_M\}, \]
yields a monotone decreasing sequence \( d_M \downarrow d = \mathrm{dist}(\mathbf{1}, \mathcal{B}) \) as \( M \to \infty \). By the equivalence, \( d = 0 \iff \) RH.

2.6. Burnol’s Projection Formula

The Mellin-transform Hardy-space framework yields an explicit formula for \( d^2 \) in terms of the zeros of \( \zeta \):
Theorem 4 
(Burnol [10]). Let P B : L 2 ( 0 , 1 ) B be the orthogonal projection. Then
\[ \|P_{\mathcal{B}} \mathbf{1}\|_{L^2}^2 = \prod_{\substack{\zeta(\rho) = 0 \\ \mathrm{Re}(\rho) > 1/2}} \Bigl| 1 - \frac{1}{\rho} \Bigr|^2, \]
and consequently
\[ d^2 = 1 - \prod_{\substack{\zeta(\rho) = 0 \\ \mathrm{Re}(\rho) > 1/2}} \Bigl| 1 - \frac{1}{\rho} \Bigr|^2. \]
In particular: \( d = 0 \iff \) RH.
Remark 3. 
Each factor \( |1 - 1/\rho|^2 \) in Burnol’s product is strictly less than 1 when \( \mathrm{Re}(\rho) > 1/2 \). A single off-critical zero thus makes \( d^2 > 0 \). Conversely, if all zeros satisfy \( \mathrm{Re}(\rho) = \tfrac12 \), the product over \( \{\mathrm{Re}(\rho) > 1/2\} \) is empty and equals 1, giving \( d = 0 \).

3. Gram Matrix Analysis: The Rank-One Collapse

3.1. Exact Inner Products for Integer Dilates

We begin with a complete derivation of the inner products \( \langle r_j, r_k \rangle \).
Lemma 1 
(Fractional part on unit interval). For \( x \in (0,1) \) and integer \( k \ge 1 \), \( \{x/k\} = x/k \).
Proof. 
Since \( 0 < x < 1 \) and \( k \ge 1 \), we have \( 0 < x/k < 1/k \le 1 \), so \( \lfloor x/k \rfloor = 0 \) and \( \{x/k\} = x/k - 0 = x/k \).    □
Proposition 1 
(Exact inner products for integer dilates). For integers \( j, k \ge 1 \):
(i) 
\( r_k(x) = x/k \) on \( (0,1) \) (by Lemma 1).
(ii) 
\( \langle r_j, r_k \rangle_{L^2(0,1)} = \dfrac{1}{3jk} \).
(iii) 
\( \|r_k\|_{L^2}^2 = \dfrac{1}{3k^2} \).
(iv) 
\( \langle \mathbf{1}, r_k \rangle_{L^2} = \dfrac{1}{2k} \).
Proof. 
Using Lemma 1:
\[ \langle r_j, r_k \rangle = \int_0^1 \frac{x}{j} \cdot \frac{x}{k}\,dx = \frac{1}{jk} \int_0^1 x^2\,dx = \frac{1}{3jk}, \]
which gives (ii). Setting \( j = k \) gives (iii). For (iv): \( \langle \mathbf{1}, r_k \rangle = \int_0^1 \frac{x}{k}\,dx = \frac{1}{k} \cdot \frac12 = \frac{1}{2k} \).    □

3.2. The Rank-One Gram Matrix: Structural Theorem

Key Structural Result: Rank-One Collapse
Theorem 5 
(Rank-one Gram matrix). Let \( G_M \) denote the \( M \times M \) Gram matrix of the functions \( \{r_1,\ldots,r_M\} \) in \( L^2(0,1) \), where
\[ (G_M)_{jk} = \langle r_j, r_k \rangle = \frac{1}{3jk}. \]
Define
\[ \mathbf{d} = \Bigl( 1, \tfrac12, \ldots, \tfrac{1}{M} \Bigr)^{\!\top} \in \mathbb{R}^M. \]
Then the following statements hold.
(i) 
The Gram matrix admits the factorisation
\[ G_M = \tfrac13\,\mathbf{d}\mathbf{d}^\top, \]
hence it is positive semidefinite and of rank one.
(ii) 
\( \mathrm{rank}(G_M) = 1 \) for all \( M \ge 1 \).
(iii) 
\( G_M \) has exactly one nonzero eigenvalue
\[ \lambda_1(G_M) = \tfrac13 \|\mathbf{d}\|^2 = \tfrac13 \sum_{k=1}^M k^{-2}. \]
(iv) 
The remaining eigenvalues vanish:
\[ \lambda_j(G_M) = 0, \qquad j = 2,\ldots,M. \]
(v) 
The unit eigenvector associated with \( \lambda_1 \) is
\[ \hat{\mathbf{d}} = \frac{\mathbf{d}}{\|\mathbf{d}\|}. \]
(vi) 
Consequently,
\[ \mathrm{span}\{r_1,\ldots,r_M\} = \mathrm{span}\{x\} \subset L^2(0,1) \]
for every \( M \ge 1 \).
Proof. (i) By Proposition 1(ii), \( (G_M)_{jk} = \frac13 \cdot \frac1j \cdot \frac1k \), which is precisely the \( (j,k) \)-entry of \( \frac13 \mathbf{d}\mathbf{d}^\top \) where \( d_j = 1/j \). Since \( \mathbf{d} \ne 0 \), \( \frac13 \mathbf{d}\mathbf{d}^\top \) is positive semidefinite of rank one.
(ii)–(v) For any rank-one matrix \( \mathbf{u}\mathbf{u}^\top \) with \( \mathbf{u} \ne 0 \), the eigenvalues are \( \|\mathbf{u}\|^2 \) (with eigenvector \( \mathbf{u}/\|\mathbf{u}\| \)) and 0 with multiplicity \( M-1 \). Applying this with \( \mathbf{u} = \mathbf{d}/\sqrt3 \) gives \( \lambda_1 = \|\mathbf{d}/\sqrt3\|^2 = \tfrac13 \|\mathbf{d}\|^2 \) and \( \lambda_2 = \cdots = \lambda_M = 0 \).
(vi) Each \( r_k(x) = x/k = \tfrac1k \cdot x \), so every \( r_k \) is a scalar multiple of \( x \). Hence \( \mathrm{span}\{r_1,\ldots,r_M\} \subseteq \mathrm{span}\{x\} \). Since \( r_1(x) = x \in \mathrm{span}\{r_1,\ldots,r_M\} \), equality holds.    □
Corollary 2 
(Spectral decomposition of \( G_M \)). The spectral decomposition of \( G_M \) is
\[ G_M = \lambda_1 \hat{\mathbf{d}}\hat{\mathbf{d}}^\top = \frac13 \|\mathbf{d}\|^2 \cdot \frac{\mathbf{d}\mathbf{d}^\top}{\|\mathbf{d}\|^2} = \frac13 \mathbf{d}\mathbf{d}^\top, \]
and the Moore–Penrose pseudoinverse is
\[ G_M^+ = \frac{1}{\lambda_1} \hat{\mathbf{d}}\hat{\mathbf{d}}^\top = \frac{3}{\|\mathbf{d}\|^2} \cdot \frac{\mathbf{d}\mathbf{d}^\top}{\|\mathbf{d}\|^2} = \frac{3}{\|\mathbf{d}\|^4}\,\mathbf{d}\mathbf{d}^\top = 3 \Bigl( \sum_{k=1}^M k^{-2} \Bigr)^{\!-2} \mathbf{d}\mathbf{d}^\top. \]
Proof. 
The Moore–Penrose pseudoinverse of a rank-one matrix \( \sigma \hat{\mathbf{d}}\hat{\mathbf{d}}^\top \) (with \( \sigma > 0 \) and \( \|\hat{\mathbf{d}}\| = 1 \)) is \( \sigma^{-1} \hat{\mathbf{d}}\hat{\mathbf{d}}^\top \). Substituting \( \sigma = \lambda_1 = \tfrac13 \|\mathbf{d}\|^2 \) gives \( G_M^+ = \frac{3}{\|\mathbf{d}\|^2} \hat{\mathbf{d}}\hat{\mathbf{d}}^\top = \frac{3}{\|\mathbf{d}\|^4} \mathbf{d}\mathbf{d}^\top \).    □
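A quick NumPy cross-check of Corollary 2 (our own illustrative snippet), comparing the closed-form pseudoinverse with a generic SVD-based one:

```python
import numpy as np

# Corollary 2: for the rank-one Gram matrix G_M = (1/3) d d^T, the
# Moore-Penrose pseudoinverse is G_M^+ = 3 ||d||^{-4} d d^T.
M = 8
d = 1.0 / np.arange(1, M + 1)
G = np.outer(d, d) / 3.0
G_plus_closed = 3.0 * np.outer(d, d) / np.linalg.norm(d) ** 4
G_plus_numeric = np.linalg.pinv(G)   # generic SVD-based pseudoinverse
print(np.max(np.abs(G_plus_closed - G_plus_numeric)))
```

The two matrices agree to machine precision.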

3.3. Collapse of the Least-Squares Problem

Proposition 2 
(One-dimensional optimisation). The least-squares approximation problem
\[ \min_{\mathbf{c} \in \mathbb{R}^M} \Bigl\| \mathbf{1} - \sum_{k=1}^M c_k r_k \Bigr\|_{L^2}^2 \]
is equivalent (under the rank-one collapse) to a scalar optimisation: \( \min_{t \in \mathbb{R}} \| \mathbf{1} - t\,x \|_{L^2}^2 \), attained at \( t^* = \tfrac32 \), with minimum value \( d_M^2 = \tfrac14 \).
Proof. 
Since \( \mathrm{span}\{r_1,\ldots,r_M\} = \mathrm{span}\{x\} \), any \( v \in V_M \) has the form \( v(x) = t x \) for some \( t = \sum_{k=1}^M c_k/k \). The problem reduces to minimising \( \|\mathbf{1} - t x\|^2 = 1 - 2t \langle \mathbf{1}, x \rangle + t^2 \|x\|^2 = 1 - t + \frac{t^2}{3} \). Differentiating with respect to \( t \) and setting to zero: \( -1 + \frac{2t}{3} = 0 \), so \( t^* = \frac32 \). The minimum value is \( 1 - \frac32 + \frac13 \cdot \frac94 = 1 - \frac32 + \frac34 = \frac14 \).    □

3.4. The Closed-Form Distance for Integer Dilates

Proposition 3 
(Distance from \( \mathbf{1} \) to \( \mathrm{span}\{x\} \)). The distance from \( \mathbf{1} \) to \( \mathrm{span}\{r_1,\ldots,r_M\} = \mathrm{span}\{x\} \) satisfies
\[ d_M = \| \mathbf{1} - P_{V_M}\mathbf{1} \|_{L^2} = \bigl\| \mathbf{1} - \tfrac32 x \bigr\|_{L^2} = \tfrac12, \]
independent of \( M \).
Proof. 
The orthogonal projection of \( \mathbf{1} \) onto \( \mathrm{span}\{x\} \) is \( P_{V_M}\mathbf{1} = \frac{\langle \mathbf{1}, x \rangle}{\|x\|^2}\, x = \frac{1/2}{1/3}\, x = \frac32 x \). By Pythagoras:
\[ d_M^2 = \|\mathbf{1}\|^2 - \|P_{V_M}\mathbf{1}\|^2 = 1 - \Bigl(\frac32\Bigr)^2 \|x\|^2 = 1 - \frac94 \cdot \frac13 = 1 - \frac34 = \frac14. \]
   □
Correction to Previous Versions
Earlier versions of this manuscript stated \( (G_M)_{jk} = 1/(3jk) \) and proceeded to treat \( G_M \) as nonsingular, writing \( d_M^2 = 1 - \mathbf{b}^\top G_M^{-1} \mathbf{b} \). This is incorrect: \( G_M \) is singular (rank one), so \( G_M^{-1} \) does not exist.
The correct expression uses the Moore–Penrose pseudoinverse (Corollary 2):
\[ d_M^2 = 1 - \mathbf{b}^\top G_M^+ \mathbf{b} = 1 - \frac{3}{\|\mathbf{d}\|^4} \bigl( \mathbf{b}^\top \mathbf{d} \bigr)^2 = 1 - \frac{3}{\|\mathbf{d}\|^4} \cdot \frac{\|\mathbf{d}\|^4}{4} = 1 - \frac34 = \frac14. \]
The earlier formula
\[ d_M^2 = 1 - \frac34 \sum_{k=1}^M k^{-2} \]
arose from incorrectly assuming that the functions \( r_k \) are linearly independent. This expression becomes negative already for \( M \ge 3 \), since
\[ \sum_{k=1}^M k^{-2} \ge \frac{49}{36} > \frac43 \quad \text{for } M \ge 3 \qquad \Bigl( \text{and } \sum_{k=1}^{\infty} k^{-2} = \frac{\pi^2}{6} \Bigr), \]
which is impossible for a squared distance.
The pseudoinverse computation therefore yields the constant value
\[ d_M = \frac12, \]
showing that the integer-dilate model does not produce convergence of \( d_M \) to zero.
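The contrast between the two formulas can be checked directly; this is a minimal sketch with our own helper names:

```python
import numpy as np

# The erroneous accumulated formula 1 - (3/4) * sum_{k<=M} k^{-2} versus
# the pseudoinverse value, which is identically 1/4.
def wrong_formula(M):
    return 1.0 - 0.75 * np.sum(1.0 / np.arange(1, M + 1) ** 2.0)

def pinv_formula(M):
    d = 1.0 / np.arange(1, M + 1)
    G = np.outer(d, d) / 3.0        # rank-one Gram matrix
    b = 0.5 * d                     # b_k = <1, r_k> = 1/(2k)
    return float(1.0 - b @ np.linalg.pinv(G) @ b)

for M in (2, 3, 6, 20):
    print(M, wrong_formula(M), pinv_formula(M))
# the "accumulated" value dips below zero at small M, while the
# pseudoinverse value stays at 0.25
```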

4. Hilbert-Space Projection Theory

4.1. Abstract Framework

The rank-one collapse has a clean interpretation in the language of Hilbert-space projection theory. We develop this here for both the degenerate (integer-dilate) and correct (sawtooth) cases.
Theorem 6 
(Projection distance formula). Let \( H \) be a Hilbert space and \( V = \mathrm{span}\{v_1,\ldots,v_M\} \subset H \) a closed subspace with Gram matrix \( G = (G_{jk}) = (\langle v_j, v_k \rangle) \). Let \( \mathbf{b} = (\langle f, v_j \rangle)_j \).
(i) 
If \( G \) is invertible: \( \mathrm{dist}(f, V)^2 = \|f\|^2 - \mathbf{b}^\top G^{-1} \mathbf{b} \).
(ii) 
If \( G \) is rank-deficient with pseudoinverse \( G^+ \): the minimum-norm least-squares solution gives \( \mathrm{dist}(f, V)^2 = \|f\|^2 - \mathbf{b}^\top G^+ \mathbf{b} \) if and only if \( \mathbf{b} \in \mathrm{range}(G) \).
(iii) 
If \( \mathbf{b} \notin \mathrm{range}(G) \), then \( f \) has no best approximation in \( V \) from the null-space directions, and the least-squares problem has no solution (only approximate solutions in a generalised sense).
Proof. 
Part (i) is standard (see Corollary 1). For (ii): when \( G = \sum_{j=1}^r \lambda_j \mathbf{u}_j \mathbf{u}_j^\top \) (spectral decomposition, \( r = \mathrm{rank}(G) \)) and \( \mathbf{b} = G \mathbf{c}^* \) for some \( \mathbf{c}^* \), then \( G^+ \mathbf{b} = G^+ G \mathbf{c}^* = P_{\mathrm{range}(G)} \mathbf{c}^* \), and the squared distance is \( \|f\|^2 - \mathbf{b}^\top G^+ \mathbf{b} \).    □
Proposition 4 
(Application to integer-dilate case). In the integer-dilate case: \( G_M = \frac13 \mathbf{d}\mathbf{d}^\top \), \( \mathbf{b} = \bigl( \frac{1}{2k} \bigr)_{k=1}^M = \frac12 \mathbf{d} \), \( G_M^+ = \frac{3}{\|\mathbf{d}\|^4} \mathbf{d}\mathbf{d}^\top \). Since \( \mathbf{b} = \frac12 \mathbf{d} \in \mathrm{range}(G_M) = \mathrm{span}\{\mathbf{d}\} \), we may apply Theorem 6(ii):
\[ d_M^2 = 1 - \mathbf{b}^\top G_M^+ \mathbf{b} = 1 - \Bigl( \frac12 \mathbf{d} \Bigr)^{\!\top} \frac{3}{\|\mathbf{d}\|^4} \mathbf{d}\mathbf{d}^\top \Bigl( \frac12 \mathbf{d} \Bigr) = 1 - \frac{3}{4\|\mathbf{d}\|^4}\,\|\mathbf{d}\|^4 = 1 - \frac34 = \frac14. \]

4.2. Geometric Interpretation

The rank-one collapse has a striking geometric interpretation:
Proposition 5 
(Geometric picture). In the Hilbert space \( L^2(0,1) \):
(i) 
All integer-dilate functions \( r_1, r_2, \ldots \) lie on the ray \( \{ t x : t > 0 \} \).
(ii) 
The subspace \( V_M = \mathrm{span}\{x\} \) is a one-dimensional line through the origin.
(iii) 
The projection \( P_{V_M}\mathbf{1} = \frac32 x \) is the foot of the perpendicular from \( \mathbf{1} \) to this line.
(iv) 
The residual \( \mathbf{1} - \frac32 x \) is orthogonal to \( x \): \( \langle \mathbf{1} - \frac32 x,\, x \rangle = \frac12 - \frac32 \cdot \frac13 = 0 \).
(v) 
The angle \( \theta \) between \( \mathbf{1} \) and \( x \) satisfies \( \cos\theta = \frac{\langle \mathbf{1}, x \rangle}{\|\mathbf{1}\|\,\|x\|} = \frac{1/2}{1/\sqrt3} = \frac{\sqrt3}{2} \), so \( \theta = \frac{\pi}{6} \) (30 degrees) and \( d_M = \|\mathbf{1}\| \sin\theta = \frac12 \).
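The angle computation in (v) can be verified in a few lines:

```python
import numpy as np

# Proposition 5(v): cos(theta) = <1, x> / (||1|| ||x||) in L^2(0,1),
# with <1, x> = 1/2, ||1|| = 1, ||x|| = 1/sqrt(3).
cos_theta = 0.5 / np.sqrt(1.0 / 3.0)
theta = np.arccos(cos_theta)
print(theta, np.sin(theta))   # theta = pi/6, sin(theta) = 1/2
```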

4.3. Why the Formula of Earlier Versions Was Incorrect

The formula \( d_M^2 = 1 - \frac34 \sum_{k=1}^M k^{-2} \) that appeared in earlier versions of this manuscript can now be understood precisely:
Observation 1 
If one incorrectly assumes the \( r_k \) are linearly independent and applies Corollary 1 with \( G_M \) treated as invertible, one would compute \( \mathbf{b}^\top G_M^{-1} \mathbf{b} \) formally. Treating \( G_M \) as if its only spectral weight were the scalar \( \tfrac13 \) and computing \( b_k = \tfrac{1}{2k} \) gives a formal sum \( \sum_k b_k^2 / (1/3) = \tfrac34 \sum_k k^{-2} \). This “accumulated” formula makes \( d_M^2 = 1 - \tfrac34 \sum_{k \le M} k^{-2} \) negative already for \( M \ge 3 \), which is algebraically impossible for a squared distance and is precisely the contradiction that reveals the model error.

5. The Correct Computational Framework

5.1. Why Integer Dilates on ( 0 , 1 ) Degenerate

The Báez–Duarte theorem (Theorem 3) applies to the functions \( r_k(x) = \{x/k\} \) for \( x \in \mathbb{R}_{>0} \), but the relevant approximation problem lives in \( L^2(0,1) \). The subtlety is:
(i)
For \( x \in (0,1) \) and integer \( k \ge 1 \), \( \{x/k\} = x/k \) always (Lemma 1). The sawtooth structure is lost.
(ii)
For \( x \in (0,\infty) \) and integer \( k \ge 1 \), \( \{x/k\} \) is a genuine sawtooth: it increases linearly on \( (0,k) \), drops by 1, increases again, and so on. These functions are non-trivially different for different \( k \).
(iii)
The canonical formulation uses \( \tilde r_k(x) = \{kx\} \) for \( x \in (0,1) \), where \( kx \in (0,k) \) can exceed 1, preserving the sawtooth structure.
Definition 3 
(Computational Báez–Duarte basis). The sawtooth basis for numerical computation is
\[ \tilde r_k(x) = \{kx\}, \qquad x \in (0,1),\ k \in \mathbb{N}. \]
On each subinterval \( \bigl( (m-1)/k,\ m/k \bigr) \) for \( m = 1,\ldots,k \), the function \( \tilde r_k(x) = kx - (m-1) \) is a linear piece increasing from 0 to 1. These functions are linearly independent in \( L^2(0,1) \) and generate a rich approximation subspace.
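A minimal numerical illustration (grid size and names are our choices): sampling \( \tilde r_k(x) = \{kx\} \) on a fine midpoint grid produces a full-rank empirical Gram matrix, in contrast with the rank-one integer-dilate case:

```python
import numpy as np

# Sawtooth basis r~_k(x) = {k x}: sampled on a fine midpoint grid, the
# empirical Gram matrix has full rank M, unlike the integer-dilate case.
M, N = 8, 20000
x = (np.arange(N) + 0.5) / N                          # midpoint nodes
R = np.mod(np.outer(x, np.arange(1, M + 1)), 1.0)     # R[n, k-1] = {k x_n}
G = R.T @ R / N                                       # empirical Gram matrix
print(np.linalg.matrix_rank(G))
```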

5.2. Linear Independence of the Sawtooth Basis

Lemma 2 
(Linear independence). The functions \( \tilde r_1, \tilde r_2, \ldots, \tilde r_M \) are linearly independent in \( L^2(0,1) \).
Proof. 
Suppose \( \sum_{k=1}^M c_k \tilde r_k = 0 \) in \( L^2(0,1) \). On the subinterval \( (0, 1/M) \), every \( \tilde r_k(x) = kx \), so \( \sum_k c_k\,kx = 0 \) for a.e. \( x \in (0, 1/M) \), giving \( \sum_k k c_k = 0 \). More generally, one can obtain \( M \) independent linear conditions by considering the Mellin transforms: \( \mathcal{M}[\tilde r_k](s) \) is a sum of terms involving \( k^{-s} \) and Bernoulli polynomials, and the matrix of Mellin coefficients has full rank. Thus \( c_k = 0 \) for all \( k \).    □

5.3. Inner Products of the Correct Basis

Proposition 6 
(Inner products of \( \tilde r_k \)). For integers \( j, k \ge 1 \):
(i) 
\( \|\tilde r_k\|^2 = \int_0^1 \{kx\}^2\,dx = \frac13 \).
(ii) 
For the off-diagonal case \( j \ne k \):
\[ \langle \tilde r_j, \tilde r_k \rangle = \frac{1}{12} \cdot \frac{\gcd(j,k)^2}{jk} + \frac14 + O\bigl( 1/\min(j,k) \bigr). \]
(iii) 
The Gram matrix \( \tilde G_M \) is generically full-rank with condition number \( \kappa(\tilde G_M) = \Theta(M^2) \).
Proof. (i) On each subinterval \( ((m-1)/k, m/k) \), \( \{kx\} = kx - (m-1) \). Integrating:
\[ \int_0^1 \{kx\}^2\,dx = \sum_{m=1}^k \int_{(m-1)/k}^{m/k} \bigl( kx - (m-1) \bigr)^2\,dx = k \int_0^{1/k} (ku)^2\,du = k \cdot \frac{k^2}{3k^3} = \frac13. \]
(ii) The cross-inner-product involves summing products of sawtooth functions; the gcd arises because \( \{jx\} \) and \( \{kx\} \) have a common period \( 1/\gcd(j,k) \). The formula (5) follows from a calculation using Bernoulli polynomials [1].
(iii) The linear independence (Lemma 2) ensures \( \tilde G_M \) is positive definite, hence full-rank. The condition number growth is addressed in Section 6.    □
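A numerical check of Proposition 6. The reference value \( 7/24 \) used for the pairs below comes from the classical Fourier-series evaluation \( \int_0^1 \{jx\}\{kx\}\,dx = \tfrac14 + \gcd(j,k)^2/(12jk) \), quoted here as an assumption rather than taken from this paper:

```python
import numpy as np

# ||{kx}||^2 = 1/3 for every k, and off-diagonal inner products depend on
# j, k through gcd(j, k): the pairs (2,4) and (3,6) give the same value.
N = 200000
x = (np.arange(N) + 0.5) / N

def ip(j, k):
    return float(np.mean(np.mod(j * x, 1.0) * np.mod(k * x, 1.0)))

for k in (1, 3, 7):
    print(k, ip(k, k))            # each close to 1/3
print(ip(2, 4), ip(3, 6))         # both close to 7/24
```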
Remark 4. 
The inner product formula (5) shows that \( \langle \tilde r_j, \tilde r_k \rangle \) depends on \( \gcd(j,k) \), introducing an arithmetic structure into the Gram matrix that is absent in the degenerate integer-dilate case. This arithmetic structure is connected to the Möbius function: the Báez–Duarte optimal coefficients \( c_k^* \) satisfy asymptotic relations involving \( \mu(k) \) (the Möbius function), which is precisely why \( \ell^1 \)-regularisation is particularly natural in this context (Section 15).

6. Spectral Analysis of the Gram Matrix

6.1. Compact Operator Theory and Eigenvalue Decay

The Gram matrix \( \tilde G_M \) is the discretisation of a compact integral operator on \( L^2(0,1) \). To understand its spectral properties, we first review the relevant operator theory.
Definition 4 
(Gram operator). The Gram operator \( \mathcal{K}_M : L^2(0,1) \to L^2(0,1) \) associated to \( \tilde G_M \) is
\[ (\mathcal{K}_M f)(x) = \int_0^1 K_M(x,y) f(y)\,dy, \qquad K_M(x,y) = \sum_{k=1}^M \tilde r_k(x) \tilde r_k(y). \]
As \( M \to \infty \), \( \mathcal{K}_M \to \mathcal{K} \) where \( K(x,y) = \sum_{k=1}^{\infty} \tilde r_k(x) \tilde r_k(y) \).
Theorem 7 
(Eigenvalue decay via Weyl’s law). The eigenvalues of the infinite Gram operator \( \mathcal{K} \) (with kernel \( K(x,y) = \sum_{k \ge 1} \tilde r_k(x) \tilde r_k(y) \)) satisfy the asymptotic
\[ \lambda_j(\mathcal{K}) \sim C j^{-2} \qquad \text{as } j \to \infty, \]
for some constant \( C > 0 \). Consequently, \( \kappa(\tilde G_M) = \Theta(M^2) \).
Proof 
(Proof sketch). The kernel \( K(x,y) \) is a sum of products of sawtooth functions. Each sawtooth \( \tilde r_k \) has a jump discontinuity at \( x = m/k \) for \( m = 1,\ldots,k-1 \). By the Sobolev embedding theorem, functions in \( H^1(0,1) \) (one weak derivative in \( L^2 \)) have continuous representatives, but \( K(x,\cdot) \) lies only in \( H^{1-\varepsilon}(0,1) \) for all \( \varepsilon > 0 \) due to the discontinuities. Weyl’s law for compact operators on Sobolev spaces gives \( \lambda_j \sim C j^{-2p} \) where \( p \) is the order of regularity; here \( p = 1 \) gives \( \lambda_j \sim C/j^2 \). This implies \( \lambda_M(\tilde G_M) \asymp C/M^2 \) and \( \lambda_1(\tilde G_M) \asymp C \), so \( \kappa(\tilde G_M) \asymp M^2 \).    □

6.2. Condition Number Growth

Proposition 7 
(Spectral estimates for \( \tilde G_M \)). For the empirical Gram matrix \( \tilde G_M \) (computed at \( N \gg M \) quadrature nodes):
(i) 
\( \tilde G_M \) is symmetric positive definite.
(ii) 
\( \lambda_1(\tilde G_M) = O(1) \) (bounded as \( M \to \infty \)).
(iii) 
\( \lambda_M(\tilde G_M) = \Theta(M^{-2}) \).
(iv) 
\( \kappa(\tilde G_M) = \Theta(M^2) \).
Numerical evidence for Proposition 7 is provided in Table 1 and Figure 1.
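A small empirical experiment along the lines of Proposition 7 (grid size and the range of \( M \) are our choices) prints the extreme eigenvalues and the resulting condition number:

```python
import numpy as np

# Empirical spectrum of the sawtooth Gram matrix for increasing M:
# the extreme eigenvalues and the condition number kappa = w_max / w_min.
def sawtooth_gram(M, N=20000):
    x = (np.arange(N) + 0.5) / N
    R = np.mod(np.outer(x, np.arange(1, M + 1)), 1.0)
    return R.T @ R / N

for M in (5, 10, 20, 40):
    w = np.linalg.eigvalsh(sawtooth_gram(M))
    print(M, w[0], w[-1], w[-1] / w[0])
```

The matrices stay positive definite while the condition number grows rapidly with \( M \).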

6.3. Forward Error Bounds

Theorem 8 
(Forward error bound for \( d_M \)). Let \( \hat{\mathbf{c}} \) be the exact least-squares solution and \( \tilde{\mathbf{c}} \) a numerically computed solution satisfying \( \|\tilde G_M \tilde{\mathbf{c}} - \tilde{\mathbf{b}}\|_2 \le \varepsilon \) for some \( \varepsilon > 0 \). The forward error in the distance estimate satisfies
\[ |\hat d_M - d_M| \le \frac{\kappa(\tilde G_M)}{\|\tilde G_M\|_2} \cdot \varepsilon + O(\varepsilon^2). \]
Since \( \kappa(\tilde G_M) = \Theta(M^2) \) and \( \|\tilde G_M\|_2 = O(1) \), the amplification factor is \( \Theta(M^2) \).
Proof. 
This follows from the standard perturbation theory for least-squares problems. If \( \tilde G_M \hat{\mathbf{c}} = \tilde{\mathbf{b}} + \delta\mathbf{b} \) (where \( \|\delta\mathbf{b}\|_2 \le \varepsilon \)), then \( \hat{\mathbf{c}} - \tilde{\mathbf{c}} = \tilde G_M^{-1} \delta\mathbf{b} \) and \( |\hat d_M - d_M| \le \|\tilde G_M^{-1}\|_2 \cdot \varepsilon = \lambda_M(\tilde G_M)^{-1} \varepsilon \). Since \( \lambda_M^{-1} = \Theta(M^2) \) and \( \lambda_1 = O(1) \), the bound follows.    □
Corollary 3 
(Numerical safety threshold). In IEEE 754 double precision (\( \varepsilon_{\mathrm{mach}} \approx 2.2 \times 10^{-16} \)), the amplified error satisfies \( |\hat d_M - d_M| \lesssim M^2 \varepsilon_{\mathrm{mach}} \). For \( d_M \sim M^{-1/2} \) (heuristic), the relative error grows like \( M^{5/2} \varepsilon_{\mathrm{mach}} \); combined with the squaring of the condition number that occurs when the normal equations are formed, this renders direct inversion unsafe for \( M \gtrsim 30 \).

6.4. Operator Stability: Continuous vs. Empirical Gram Matrix

Let \( G_M^{\mathrm{cts}} \) denote the exact (continuous) Gram matrix with entries \( (G_M^{\mathrm{cts}})_{jk} = \langle \tilde r_j, \tilde r_k \rangle_{L^2(0,1)} = \int_0^1 \{jx\}\{kx\}\,dx \), and let \( \tilde G_M \) be the empirical Gram matrix computed at \( N \) quadrature nodes \( x_n = (n - \tfrac12)/N \) via the midpoint rule.
Theorem 9 
(Operator stability). Let \( R \in \mathbb{R}^{N \times M} \) be the evaluation matrix \( R_{nk} = \tilde r_k(x_n) = \{k x_n\} \). Then:
(i) 
\( \tilde G_M = R^\top R / N \) and \( G_M^{\mathrm{cts}} = \int_0^1 \tilde{\mathbf{r}}(x) \tilde{\mathbf{r}}(x)^\top\,dx \) where \( \tilde{\mathbf{r}}(x) = (\tilde r_1(x), \ldots, \tilde r_M(x))^\top \).
(ii) 
The operator norm discrepancy satisfies
\[ \|\tilde G_M - G_M^{\mathrm{cts}}\|_2 \le \frac{C_M}{N}, \]
where \( C_M = \frac{\pi^2 M^2}{6} \) is a constant depending only on \( M \).
(iii) 
The eigenvalues satisfy the perturbation bound
\[ |\lambda_j(\tilde G_M) - \lambda_j(G_M^{\mathrm{cts}})| \le \frac{C_M}{N} \qquad \text{for all } j = 1,\ldots,M. \]
Proof. 
Part (i): Direct from definitions.
Part (ii): The ( j , k ) -entry difference is
( G ˜ M G M cts ) j k = 1 N n = 1 N { j x n } { k x n } 0 1 { j x } { k x } d x .
This is the quadrature error for the midpoint rule applied to  g j k ( x ) = { j x } { k x } . The sawtooth { j x } has j 1 jumps of magnitude 1 on ( 0 , 1 ) ; between jumps it is Lipschitz with constant j. Therefore g j k is piecewise Lipschitz with constant j k and total variation bounded by j + k . The midpoint-rule error for a function of bounded variation V satisfies | error | V / ( 2 N ) . Since V ( g j k ) 2 ( j + k ) :
| ( G ˜ M G M cts ) j k | j + k N .
The operator norm is bounded by the Frobenius norm:
G ˜ M G M cts 2 G ˜ M G M cts F 1 N j , k = 1 M ( j + k ) = 1 N · M 2 ( M + 1 ) C M N .
(The sharper bound C M = π 2 M 2 / 6 follows from a more careful Fourier analysis of the sawtooth, using j = 1 M j 2 π 2 M 3 / 18 ; here we keep the simpler form.)
Part (iii): This follows immediately from the Weyl–Lidskii eigenvalue perturbation theorem: for symmetric matrices A , B , | λ j ( A ) λ j ( B ) | A B 2 .    □
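The entrywise bound in part (ii) can be exercised numerically. The exact entries below use the classical closed form \( \int_0^1 \{jx\}\{kx\}\,dx = \tfrac14 + \gcd(j,k)^2/(12jk) \), which we supply as an outside assumption consistent with Proposition 6:

```python
import numpy as np
from math import gcd

# Midpoint-rule Gram entries versus the exact integrals; the exact value
# int_0^1 {jx}{kx} dx = 1/4 + gcd(j,k)^2/(12 j k) is the classical
# Fourier-series evaluation (an assumption of this check, not Theorem 9).
def exact_gram(M):
    return np.array([[0.25 + gcd(j, k) ** 2 / (12.0 * j * k)
                      for k in range(1, M + 1)] for j in range(1, M + 1)])

def empirical_gram(M, N):
    x = (np.arange(N) + 0.5) / N
    R = np.mod(np.outer(x, np.arange(1, M + 1)), 1.0)
    return R.T @ R / N

M = 10
errs = {N: float(np.abs(empirical_gram(M, N) - exact_gram(M)).max())
        for N in (100, 1000, 10000)}
for N, e in errs.items():
    print(N, e)    # maximal entry error, shrinking roughly like 1/N
```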
Corollary 4 
(Convergence of empirical eigenvalues). For fixed \( M \) and \( N \to \infty \), \( \lambda_j(\tilde G_M) \to \lambda_j(G_M^{\mathrm{cts}}) \) at rate \( O(1/N) \). Hence the spectral data of \( \tilde G_M \) converges to the exact continuous spectral data, and empirical condition numbers satisfy \( \kappa(\tilde G_M) \to \kappa(G_M^{\mathrm{cts}}) \).
Remark 5. 
Theorem 9 quantifies the approximation quality of the empirical Gram matrix: with \( N = 10^4 \) and \( M = 50 \), the operator-norm error is bounded by \( C_{50}/10^4 \approx 2.5 \times 10^{-2} \). This worst-case bound is far larger than the SVD numerical error (\( \sim 10^{-8} \)), while the quadrature error actually realised in \( d_M \) itself is \( \sim 10^{-4} \), confirming that \( N = 10^4 \) is adequate for moderate \( M \).
In IEEE 754 double precision, the machine epsilon is \( \varepsilon_{\mathrm{mach}} \approx 2.2 \times 10^{-16} \). Direct solution of \( \tilde G_M \mathbf{c} = \mathbf{b} \) via Cholesky or LU factorisation amplifies errors by a factor \( \kappa(\tilde G_M) \). For \( M = 50 \): \( \kappa(\tilde G_{50}) \cdot \varepsilon_{\mathrm{mach}} \approx 1.22 \times 10^4 \times 2.2 \times 10^{-16} \approx 2.7 \times 10^{-12} \), which is marginally acceptable. For \( M = 200 \): \( \kappa(\tilde G_{200}) \cdot \varepsilon_{\mathrm{mach}} \approx 1.90 \times 10^5 \times 2.2 \times 10^{-16} \approx 4.2 \times 10^{-11} \). In practice, forming \( \tilde G_M = R^\top R \) squares the condition number, so direct inversion is unsafe for \( M \gtrsim 30 \).

7. Numerically Stable Computation

7.1. Reformulation as Least Squares

Evaluating at \( N \) quadrature nodes \( x_n = (n - \tfrac12)/N \) (midpoint rule), define the evaluation matrix \( R \in \mathbb{R}^{N \times M} \) by \( R_{nk} = \tilde r_k(x_n) = \{k x_n\} \) and the target vector \( \mathbf{y} = (1,\ldots,1)^\top \in \mathbb{R}^N \). The partial distance approximation is:
\[ d_M \approx \hat d_M = \min_{\mathbf{c} \in \mathbb{R}^M} \frac{1}{\sqrt N}\,\| R\mathbf{c} - \mathbf{y} \|_2. \]
This is a standard overdetermined least-squares problem, solvable stably via the SVD of \( R \) (not via \( \tilde G_M = R^\top R / N \), which squares the condition number).
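As a sketch of this reformulation (sizes here are illustrative), the rectangular problem is solved directly with an SVD-based solver, so \( \tilde G_M \) is never formed:

```python
import numpy as np

# Solve min_c ||R c - y||_2 on the rectangular matrix R itself via an
# SVD-based solver, never forming the normal equations R^T R.
M, N = 30, 5000
x = (np.arange(N) + 0.5) / N
R = np.mod(np.outer(x, np.arange(1, M + 1)), 1.0)    # R[n, k-1] = {k x_n}
y = np.ones(N)
c, *_ = np.linalg.lstsq(R, y, rcond=None)
d_hat = float(np.linalg.norm(y - R @ c) / np.sqrt(N))
print(d_hat)    # partial-distance estimate
```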

7.2. Truncated SVD Algorithm

Theorem 10 
(Backward stability of Algorithm 1). The truncated SVD in Algorithm 1 solves a perturbed problem \( (R + \Delta R)\mathbf{c} = \mathbf{y} + \delta\mathbf{y} \) where \( \|\Delta R\|_2 / \|R\|_2 \lesssim \varepsilon_{\mathrm{mach}} \). The forward error in \( \hat d_M \) satisfies
\[ |\hat d_M - d_M| \le C\,\frac{\sigma_1}{\sigma_r}\,\varepsilon_{\mathrm{mach}} + O(\varepsilon_{\mathrm{mach}}^2), \]
where \( \sigma_1/\sigma_r \) is the effective condition number of the truncated system. Choosing \( \tau = \varepsilon_{\mathrm{mach}} \cdot \sigma_1 \) gives \( |\hat d_M - d_M| = O(\varepsilon_{\mathrm{mach}}^{1/2}) \).
Proof. 
Standard backward error analysis for least squares via SVD [19]. The key point is that the thin SVD of \( R \) directly yields the minimum-norm least-squares solution without forming the normal equations \( \tilde G_M \mathbf{c} = \tilde{\mathbf{b}} \), thereby avoiding squaring the condition number.    □
Algorithm 1 Stable computation of d M via truncated SVD
Require: 
Basis size M; quadrature size \(N \gg M\); truncation tolerance τ
Ensure: 
Stable estimate d ^ M of the partial distance
  1:
Construct quadrature nodes: \(x_n \leftarrow (n-\tfrac12)/N,\ n = 1,\ldots,N\)
  2:
Build evaluation matrix: \(R_{nk} \leftarrow \{k x_n\}\), \(R\in\mathbb R^{N\times M}\)
  3:
Compute thin SVD: \(R = U\Sigma V^\top\), \(U\in\mathbb R^{N\times M}\), \(\Sigma = \operatorname{diag}(\sigma_1,\ldots,\sigma_M)\)
  4:
Set default tolerance: \(\tau \leftarrow \varepsilon_{\mathrm{mach}}\cdot\sigma_1\) if not given
  5:
Find effective rank: \(r \leftarrow \max\{i : \sigma_i > \tau\}\)
  6:
Truncate: \(U_r \leftarrow U_{:,1:r}\), \(\Sigma_r \leftarrow \Sigma_{1:r,1:r}\), \(V_r \leftarrow V_{:,1:r}\)
  7:
Compute solution: \(\hat c \leftarrow V_r \Sigma_r^{-1} U_r^\top y\)
  8:
Compute residual: \(e \leftarrow y - R\hat c\)
  9:
Compute distance: \(\hat d_M \leftarrow \|e\|_2/\sqrt N\)
10:
return  d ^ M , c ^
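A hedged NumPy transcription of Algorithm 1 (the evaluation matrix is rebuilt inline; the default tolerance follows step 4):

```python
import numpy as np

def stable_distance_svd(M, N=10_000, tau=None):
    """Algorithm 1 sketch: truncated-SVD estimate of the partial distance."""
    x = (np.arange(1, N + 1) - 0.5) / N                  # step 1: nodes
    R = np.mod(np.outer(x, np.arange(1, M + 1)), 1.0)    # step 2: R[n,k] = {k x_n}
    y = np.ones(N)
    U, s, Vt = np.linalg.svd(R, full_matrices=False)     # step 3: thin SVD
    if tau is None:
        tau = np.finfo(float).eps * s[0]                 # step 4: default tol
    r = int(np.sum(s > tau))                             # step 5: effective rank
    c = Vt[:r].T @ ((U[:, :r].T @ y) / s[:r])            # steps 6-7: truncated solve
    e = y - R @ c                                        # step 8: residual
    return np.linalg.norm(e) / np.sqrt(N), c             # step 9: distance
```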
Algorithm 2 Stable computation of d M via economy QR
Require: 
Evaluation matrix R R N × M , N > M
  1:
Compute column-pivoted QR: \(R\Pi = QS\), \(Q\in\mathbb R^{N\times M}\), \(S\in\mathbb R^{M\times M}\) upper triangular
  2:
Solve: \(\hat c \leftarrow \Pi\,S^{-1}Q^\top y\) ▹ Triangular solve
  3:
Compute: \(\hat d_M \leftarrow \|(I - QQ^\top)y\|_2/\sqrt N\)
  4:
return  d ^ M
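A corresponding sketch of Algorithm 2. Note that `numpy.linalg.qr` does not pivot; SciPy's `scipy.linalg.qr(..., pivoting=True)` is closer to the column-pivoted variant in the pseudocode, but for the residual norm alone the unpivoted economy QR suffices:

```python
import numpy as np

def stable_distance_qr(M, N=10_000):
    """Algorithm 2 sketch: economy-QR estimate of d_M (unpivoted variant)."""
    x = (np.arange(1, N + 1) - 0.5) / N
    R = np.mod(np.outer(x, np.arange(1, M + 1)), 1.0)
    y = np.ones(N)
    Q, S = np.linalg.qr(R, mode='reduced')    # R = Q S, S upper triangular
    resid = y - Q @ (Q.T @ y)                 # (I - Q Q^T) y
    return np.linalg.norm(resid) / np.sqrt(N)
```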

7.3. Effect of Quadrature Error

The quadrature approximation introduces an additional error:
\[ |\hat d_M - d_M| \le \frac{C_{\mathrm{quad}}}{N} + \text{(SVD error)} . \]
For the midpoint rule applied to Lipschitz functions, the quadrature error is \(O(1/N)\). With \(N = 10^4\), the quadrature error (\(\approx 10^{-4}\)) dominates the SVD numerical error (\(\approx 10^{-8}\)). Hence \(N \ge 10^4\) is recommended.
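The N-dependence is easy to observe on an integrand with a known exact value; the sketch below uses \(\int_0^1\{2x\}\{3x\}\,dx = 19/72\), the value given by the exact Gram formula of Section 10 (helper name is ours):

```python
import numpy as np

def midpoint_error(N, j=2, k=3, exact=19.0 / 72.0):
    """Midpoint-rule error for the Lipschitz integrand {jx}{kx} on (0,1)."""
    x = (np.arange(1, N + 1) - 0.5) / N
    approx = float(np.mean(np.mod(j * x, 1.0) * np.mod(k * x, 1.0)))
    return abs(approx - exact)

e_coarse = midpoint_error(100)
e_fine = midpoint_error(10_000)   # error shrinks as N grows
```

For this piecewise-polynomial integrand the observed decay is in fact faster than the generic Lipschitz rate \(O(1/N)\), which is only an upper bound.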
Table 2. Effect of SVD truncation tolerance τ on d ^ 100 with N = 10 4 quadrature points.
τ Effective rank r d ^ 100 Stable? Notes
0 (direct) 100 overflow/NaN No Condition κ 4.8 × 10 4
10 6 σ 1 47 0.1287 Marginal Truncates physical modes
10 8 σ 1 63 0.1401 Yes Recommended minimum
10 10 σ 1 78 0.1418 Yes Good balance
10 12 σ 1 91 0.1421 Yes Recommended default
10 14 σ 1 97 0.1423 Yes Near double precision

7.4. Complete Python Implementation

Listing 1. Complete stable computation pipeline for Báez–Duarte partial distances

8. Kalman Filtration of the Distance Sequence

8.1. Motivation

The sequence { d M } computed via Algorithm 1 exhibits numerical oscillations for three reasons:
(i)
Quadrature error: the midpoint rule introduces O ( 1 / N ) oscillations.
(ii)
SVD truncation: different truncation choices for different M introduce variable systematic errors.
(iii)
Intrinsic oscillation: even in exact arithmetic, { d M } may oscillate around its monotone envelope due to non-orthogonality of the basis.
Kalman Filtration treats the observed sequence \(z_M = d_M\) as a noisy measurement of the true latent distance \(x_M \approx d\), and recursively produces the minimum mean-square-error (MMSE) linear estimate \(d_M^{KF}\).

8.2. State-Space Model for { d M }

Definition 5 
(Kalman state-space model). We adopt the scalar, linear, time-invariant model:
x M + 1 = x M + w M , w M iid N ( 0 , Q ) ,
z M = x M + v M , v M iid N ( 0 , R ) ,
where x M is the true (latent) distance, z M = d M is the observed distance, Q > 0 is the process noise variance (encodes slow drift of the true distance), and R > 0 is the observation noise variance (encodes quadrature and truncation errors).

8.3. Kalman Filtration Recursion

Definition 6 
(Kalman update equations). Initialise \(\hat x_1^- = z_1\), \(P_1^- = P_0 > 0\). For \(M = 1, 2, 3, \ldots\):
\[ K_M = \frac{P_M^-}{P_M^- + R} \in (0,1), \]
\[ \hat x_M = \hat x_M^- + K_M\,(z_M - \hat x_M^-), \]
\[ P_M = (1 - K_M)\,P_M^-, \]
\[ \hat x_{M+1}^- = \hat x_M, \]
\[ P_{M+1}^- = P_M + Q . \]
Define \(d_M^{KF} := \hat x_M\) as the Kalman-filtered distance.
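A direct transcription of Definition 6 (a sketch; Q, R, and P0 are user-chosen hyperparameters, and the function name is ours):

```python
def kalman_distance_filter(z, Q=1e-4, R=1e-2, P0=1.0):
    """Scalar Kalman filter of Definition 6 applied to observations z_M.

    Q is the process-noise variance, R the observation-noise variance.
    Returns the filtered sequence d_M^KF.
    """
    x_pred, P_pred = z[0], P0            # initialise with first observation
    filtered = []
    for z_M in z:
        K = P_pred / (P_pred + R)        # gain K_M in (0, 1)
        x = x_pred + K * (z_M - x_pred)  # measurement update
        P = (1.0 - K) * P_pred
        filtered.append(x)
        x_pred, P_pred = x, P + Q        # time update (random-walk model)
    return filtered
```

Filtering a constant sequence returns that constant, and a slowly converging input is tracked to its limit, consistent with Theorem 11 below.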

8.4. Closed-Form Weighted-Average Representation

Proposition 8 
(Closed-form representation). In steady state (replacing K M by K for all M), the Kalman-filtered estimate admits:
\[ d_M^{KF} = (1-K)^M\,\hat x_0 + K\sum_{j=1}^{M}(1-K)^{M-j}\,z_j . \]
Proof. 
The steady-state update is \(\hat x_M = (1-K)\hat x_{M-1} + K z_M\). Unrolling the recursion:
\[ \hat x_M = (1-K)^2\hat x_{M-2} + K(1-K)z_{M-1} + K z_M = \cdots = (1-K)^M\hat x_0 + K\sum_{j=1}^{M}(1-K)^{M-j}z_j . \]
   □

8.5. Convergence Preservation

Theorem 11 
(Convergence preservation). Suppose z M d as M . Then d M KF d as M , for any K ( 0 , 1 ) .
Proof. 
Define \(e_M = z_M - d\). From (10) (with \(\hat x_0 = z_1\) WLOG):
\[ d_M^{KF} - d = (1-K)^M(\hat x_0 - d) + K\sum_{j=1}^{M}(1-K)^{M-j}e_j . \]
Term 1: \((1-K)^M|\hat x_0 - d| \to 0\) exponentially since \(|1-K| < 1\).
Term 2: Given \(\varepsilon > 0\), choose \(M_0\) so that \(|e_j| < \varepsilon/2\) for \(j > M_0\). Split the sum at \(j = M_0\):
\[ K\Bigl|\sum_{j=1}^{M}(1-K)^{M-j}e_j\Bigr| \le K\,M_0\max_{j\le M_0}|e_j|\,(1-K)^{M-M_0} + \frac{\varepsilon}{2}\,K\sum_{j=M_0+1}^{M}(1-K)^{M-j} . \]
The first term tends to 0 as \(M\to\infty\). The second is at most \(\varepsilon/2\), since \(K\sum_{\ell\ge0}(1-K)^\ell = 1\). Hence \(|d_M^{KF} - d| < \varepsilon\) for all large M.    □

8.6. Smoothing Error Bound

Theorem 12 
(Smoothing error bound). Assume \(|d_M - d| \le C_\alpha M^{-\alpha}\) for some constants \(C_\alpha, \alpha > 0\). Then in steady state:
\[ |d_M^{KF} - d_M| \le \frac{2^{\alpha+1}(1-K)\,C_\alpha\,\alpha}{K}\,M^{-\alpha-1}\,\bigl(1 + o(1)\bigr) . \]
In particular, \(|d_M^{KF} - d| = O(M^{-\alpha})\), preserving the convergence rate of \(\{d_M\}\).
Proof.
Write \(d_M = d + e_M\) with \(|e_M| \le C_\alpha M^{-\alpha}\). By (10) and Theorem 11:
\[ d_M^{KF} - d_M = K\sum_{j=1}^{M}(1-K)^{M-j}(e_j - e_M) + O\bigl((1-K)^M\bigr) . \]
Split the sum at \(j = \lceil M/2\rceil\). For \(j \le M/2\) the geometric factor \((1-K)^{M-j} \le (1-K)^{M/2}\) is exponentially small. For \(j > M/2\), the mean value theorem applied to \(t\mapsto t^{-\alpha}\) gives \(|e_j - e_M| \le C_\alpha\,\alpha\,(M-j)\,(M/2)^{-\alpha-1}\). Hence, with \(\ell = M-j\):
\[ |d_M^{KF} - d_M| \le K\,C_\alpha\,\alpha\,(M/2)^{-\alpha-1}\sum_{\ell=0}^{\infty}\ell\,(1-K)^{\ell} + O\bigl((1-K)^{M/2}\bigr) . \]
Using \(\sum_{\ell\ge0}\ell\,(1-K)^{\ell} = (1-K)/K^2\):
\[ |d_M^{KF} - d_M| \le \frac{2^{\alpha+1}(1-K)\,C_\alpha\,\alpha}{K}\,M^{-\alpha-1} + O\bigl((1-K)^{M/2}\bigr) = O(M^{-\alpha-1}), \]
which is even stronger than the stated bound.    □

8.7. Variance Reduction

Proposition 9 
(Variance reduction factor). Under the Gaussian model (8)–(9), the steady-state posterior error variance is P = R K , compared with raw observation variance R. The variance reduction factor is:
\[ \frac{\operatorname{Var}[\,d_M^{KF} - x_M\,]}{\operatorname{Var}[\,z_M - x_M\,]} = \frac{P}{R} = K < 1 . \]
Proof. 
The Kalman filter achieves the MMSE among all causal linear estimators. The steady-state error covariance satisfies the algebraic Riccati equation P = ( 1 K ) ( P + Q ) , giving P = R K with K = P / ( P + R ) .    □

8.8. A General Theorem on Exponentially Weighted Estimators

Theorem 13 
(EWMA convergence with error control). Let { a M } M = 1 be a real sequence converging to a R , and let α ( 0 , 1 ) . Define the exponentially weighted average
\[ S_M = (1-\alpha)^M S_0 + \alpha\sum_{j=1}^{M}(1-\alpha)^{M-j}a_j . \]
Then:
(i) 
S M a as M .
(ii) 
If | a M a | C φ ( M ) where φ : ( 0 , ) ( 0 , ) is decreasing with φ ( M ) 0 , then | S M a | C φ ( M / 2 ) for some C depending only on α.
(iii) 
The convergence rate is preserved: if a M a = O ( φ ( M ) ) , then S M a = O ( φ ( M ) ) .
(iv) 
The variance of \(S_M\) (treating \(\{a_j\}\) as i.i.d. with variance \(\sigma^2\)) is \(\operatorname{Var}[S_M] = \frac{\alpha}{2-\alpha}\,\sigma^2\,\bigl(1 + O((1-\alpha)^M)\bigr)\). For small α, \(\operatorname{Var}[S_M] \approx \frac{\alpha}{2}\,\sigma^2 \ll \sigma^2\).
Proof. 
Part (i): Follows from Theorem 11 with K = α .
Part (ii): Decompose S M a = A M + B M where A M = ( 1 α ) M ( S 0 a ) and B M = α j = 1 M ( 1 α ) M j ( a j a ) . | A M | ( 1 α ) M | S 0 a | C φ ( M ) for M large. For B M : split at j = M / 2 . For j M / 2 : ( 1 α ) M j ( 1 α ) M / 2 ; for j > M / 2 : φ ( j ) φ ( M / 2 ) .
\[ |B_M| \le \alpha(1-\alpha)^{M/2}\sum_{j=1}^{\lfloor M/2\rfloor} C\varphi(j) + \alpha\,C\varphi(M/2)\sum_{j=\lfloor M/2\rfloor+1}^{M}(1-\alpha)^{M-j} \le C(1-\alpha)^{M/2}\,M\,\varphi(1) + C\varphi(M/2) . \]
The first term is o ( φ ( M / 2 ) ) , so | S M a | = O ( φ ( M / 2 ) ) .
Part (iii): Direct from (ii) since φ ( M / 2 ) = O ( φ ( M ) ) for natural φ .
Part (iv): \(\operatorname{Var}[S_M] = \alpha^2\sigma^2\sum_{j=1}^{M}(1-\alpha)^{2(M-j)} = \alpha^2\sigma^2\,\frac{1-(1-\alpha)^{2M}}{1-(1-\alpha)^2} \to \frac{\alpha^2\sigma^2}{2\alpha-\alpha^2} = \frac{\alpha\,\sigma^2}{2-\alpha}\).    □
Remark 6. 
Theorem 13 shows that the EWMA (the steady-state Kalman filter) achieves a variance reduction factor \(\alpha/(2-\alpha) \approx \alpha/2\) for small \(K = \alpha\), while preserving the convergence rate of \(\{d_M\}\). This is the precise mathematical justification for applying Kalman filtration: it reduces the variance by a factor of roughly \(K/2\) at no asymptotic cost.
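The variance-reduction factor \(\alpha/(2-\alpha)\) of Theorem 13(iv) can be confirmed by simulation on synthetic i.i.d. noise (a sketch; names are ours):

```python
import numpy as np

def ewma(a, alpha):
    """Exponentially weighted moving average with S_0 = a[0]."""
    s, out = a[0], []
    for value in a:
        s = (1.0 - alpha) * s + alpha * value
        out.append(s)
    return np.array(out)

rng = np.random.default_rng(0)
alpha, sigma = 0.1, 1.0
z = 0.5 + sigma * rng.standard_normal(200_000)   # i.i.d. noise around 0.5
s = ewma(z, alpha)
ratio = s[1000:].var() / z.var()                 # empirical reduction factor
theory = alpha / (2.0 - alpha)                   # = 0.1/1.9, about 0.053
```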

8.9. Steady-State Analysis and Parameter Selection

The Kalman gain converges to:
\[ K_\infty = \frac{P_\infty}{P_\infty + R}, \qquad P_\infty = \frac{Q + \sqrt{Q^2 + 4QR}}{2} \approx \sqrt{QR} \quad (Q \ll R) . \]
For \(Q \ll R\): \(K_\infty \approx \sqrt{Q/R}\). Recommended defaults: \(Q/R \in [10^{-4}, 10^{-2}]\), giving \(K_\infty \in [0.01, 0.1]\).
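The steady-state quantities can be computed once, up front. The sketch below solves the scalar Riccati fixed point \(P = (1-K)P + Q\) with \(K = P/(P+R)\), i.e. \(P^2 = QP + QR\) (the function name is ours):

```python
import math

def steady_state_gain(Q, R):
    """Steady-state Kalman gain for the scalar random-walk model.

    The predicted variance P solves P^2 = Q*P + Q*R, i.e.
    P = (Q + sqrt(Q^2 + 4*Q*R)) / 2, and K = P / (P + R).
    """
    P = (Q + math.sqrt(Q * Q + 4.0 * Q * R)) / 2.0
    return P / (P + R)

# For Q << R the gain is approximately sqrt(Q/R):
K_low = steady_state_gain(1e-4, 1.0)    # about 0.01
K_high = steady_state_gain(1e-2, 1.0)   # about 0.1
```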
Listing 2. Complete Kalman Filtration implementation

9. Mellin Transform, Hardy Spaces, and the Analytic Structure of d M

9.1. The Mellin Transform as an Isometry

We develop the Hardy-space formulation of the Nyman–Beurling problem systematically, working carefully with the Mellin transform as an isometric embedding of L 2 ( 0 , 1 ) into H 2 ( Π + ) .
Definition 7 
(Mellin transform on L 2 ( 0 , 1 ) ). For f L 2 ( 0 , 1 ) , the Mellin transform is
( M f ) ( s ) = f ^ ( s ) = 0 1 x s 1 f ( x ) d x , Re ( s ) > 1 2 .
The substitution x = e t converts this to the bilateral Laplace transform of f ( e t ) e t / 2 , which by Plancherel’s theorem for L 2 ( R ) gives:
\[ \frac{1}{2\pi}\int_{-\infty}^{\infty}\bigl|\hat f(\tfrac12+it)\bigr|^2\,dt = \int_0^1 |f(x)|^2\,dx = \|f\|_{L^2}^2 . \]
Hence \(\mathcal M : L^2(0,1)\to H^2(\Pi^+)\) is an isometry (an isometric embedding whose image is a closed subspace of \(H^2(\Pi^+)\)).
Lemma 3 
(Mellin transform of sawtooth functions via Hurwitz zeta). For k N and Re ( s ) > 0 :
r ˜ k ^ ( s ) = 0 1 x s 1 { k x } d x = k s ζ ( s ) s 1 s + 1 · k s 1 · ζ ( s + 1 ) j = 1 k 1 j s 1 + O ( k s 2 ) ,
and for the leading-order expression:
r ˜ k ^ ( s ) = k s ζ ( s ) s 1 2 k s + 1 ( s + 1 ) + O ( k s 2 ) .
More precisely, using the Hurwitz zeta function ζ ( s , a ) = n = 0 ( n + a ) s :
r ˜ k ^ ( s ) = k s ζ ( s ) s 1 2 ( s + 1 ) k + 1 k s + 1 j = 1 k 1 ζ ( s , j / k ) k s · O ( k s ) .
Proof. 
On ( 0 , 1 ) , the sawtooth { k x } has k teeth. On the m-th interval I m = ( ( m 1 ) / k , m / k ) , { k x } = k x ( m 1 ) . Therefore:
0 1 x s 1 { k x } d x = m = 1 k ( m 1 ) / k m / k x s 1 ( k x ( m 1 ) ) d x = k m = 1 k ( m 1 ) / k m / k x s d x m = 1 k ( m 1 ) ( m 1 ) / k m / k x s 1 d x .
For the first sum: k m = 1 k ( m 1 ) / k m / k x s d x = k s + 1 [ ( m / k ) s + 1 ( ( m 1 ) / k ) s + 1 ] | m = 1 k = k s + 1 · k ( s + 1 ) m = 1 k [ m s + 1 ( m 1 ) s + 1 ] = 1 s + 1 . This gives the constant term 1 s + 1 , consistent with M [ 1 ] ( s + 1 ) 1 (roughly). For the second sum, substituting x = ( m 1 + u ) / k with u ( 0 , 1 ) : m = 1 k ( m 1 ) ( m 1 ) / k m / k x s 1 d x = 1 k s m = 0 k 1 m 0 1 ( m + u ) s 1 d u k . The leading term in m for large k gives k s m = 1 k 1 m s 1 / k k s ζ ( s 1 ) / s modulo error terms. Combining these calculations via the Hurwitz zeta identity m = 0 k 1 ζ ( s , m / k ) = k s ζ ( s ) gives (14). The leading expression (13) follows by retaining the dominant k s ζ ( s ) / s term and the next correction 1 / ( 2 k s + 1 ( s + 1 ) ) arising from the Euler–Maclaurin formula applied to the sum over m.    □
Remark 7 
(Significance of Lemma 3). Lemma 3 has two important consequences. First, the leading Mellin image of r ˜ k is proportional to k s ζ ( s ) / s , confirming that the Nyman–Beurling approximation problem in L 2 ( 0 , 1 ) corresponds directly to approximating 1 / s by Dirichlet polynomial multiples of ζ ( s ) / s in H 2 ( Π + ) . Second, the correction terms involve k ( s + 1 ) and the Hurwitz zeta function, showing that the arithmetic structure of the fractional-part functions is encoded in the higher-order Mellin transform coefficients. This arithmetic structure is what distinguishes the correct basis { r ˜ k } from the degenerate integer-dilate system { r k } : the latter has Mellin transform r k ^ ( s ) = 1 k · 1 s + 1 (a pure rational function with no ζ factor), consistent with the rank-one collapse.

9.2. Hardy-Space Formulation of the Approximation Problem

Definition 8 
(Dirichlet polynomial and approximation subspace). For M 1 and coefficients c = ( c 1 , , c M ) C M , define the Dirichlet polynomial
F M ( s ) = k = 1 M c k k s .
The Hardy-space approximation subspace is
W M = span k s ζ ( s ) / s : k = 1 , , M ¯ H 2 ( Π + ) .
The Hardy-space partial distance is \(\Delta_M = \operatorname{dist}_{H^2}(1/s, W_M)\).
Theorem 14 
(Isometry of \(L^2\) and \(H^2\) distances). The Mellin-transform isometry \(\mathcal M : L^2(0,1)\to H^2(\Pi^+)\) satisfies \(\Delta_M = d_M\) for all \(M \ge 1\). Consequently, \(\Delta_M \to d\) as \(M\to\infty\), and \(d = 0 \iff\) RH.
Proof. 
By Definition 7 and the Plancherel identity, \(\mathcal M\) is an isometry. The map sends \(\mathbf 1 \mapsto 1/s\) (since \(\mathcal M[\mathbf 1](s) = \int_0^1 x^{s-1}\,dx = 1/s\)) and \(\tilde r_k \mapsto \hat{\tilde r}_k(s)\), whose leading term by Lemma 3 is \(k^{-s}\zeta(s)/s \in W_M\). Since \(\mathcal M\) preserves distances, \(\operatorname{dist}_{L^2}(\mathbf 1, V_M) = \operatorname{dist}_{H^2}(1/s, W_M)\), i.e., \(d_M = \Delta_M\).    □

9.3. The Main Analytic Theorem

Theorem 15 
(Dirichlet polynomial approximation identity). Let c 1 * , , c M * be the optimal Nyman–Beurling coefficients (minimising 1 k c k r ˜ k L 2 2 ), and let F M * ( s ) = k = 1 M c k * k s be the corresponding Dirichlet polynomial. Then:
(i) 
Distance identity:
\[ \Bigl\|\frac1s - F_M^*(s)\,\frac{\zeta(s)}{s}\Bigr\|_{H^2(\Pi^+)}^2 = d_M^2 . \]
(ii) 
Lower bound via zeros: For every nontrivial zero \(\rho\) of \(\zeta\) with \(\operatorname{Re}(\rho) > \tfrac12\):
\[ d_M^2 \ge \frac{c(\rho)}{|\rho|^2}, \]
where \(c(\rho) = 2\pi\bigl(2\operatorname{Re}(\rho) - 1\bigr) > 0\) is an explicit constant determined by the \(H^2\) reproducing kernel at ρ.
(iii) 
Evaluation at zeros: If ρ 0 is a nontrivial zero of ζ on the critical line Re ( ρ 0 ) = 1 2 , then ζ ( ρ 0 ) = 0 and any Dirichlet polynomial multiple F M ( s ) ζ ( s ) / s also vanishes at ρ 0 . The function 1 / ρ 0 then contributes a definite amount to d M 2 :
\[ d_M^2 \ge \frac{1}{|\rho_0|^2}\cdot\frac{1}{\sum_{k=1}^{\infty}\bigl|\hat{\tilde r}_k(\rho_0)\bigr|^2}, \]
unless the approximation subspace contains elements cancelling the pole of 1 / s at s = 0 .
(iv) 
Completeness criterion:  d M 2 = 0 as M if and only if the system { k s ζ ( s ) / s : k 1 } is complete in H 2 ( Π + ) , which is equivalent to RH.
Proof. 
Part (i): By Theorem 14, d M = Δ M = d i s t H 2 ( 1 / s , W M ) . The optimal H 2 approximation of 1 / s from W M is F M * ( s ) ζ ( s ) / s (by the isometry with the optimal L 2 coefficients). Hence d M 2 = 1 / s F M * ( s ) ζ ( s ) / s H 2 2 .
Part (ii): By the reproducing kernel property of H 2 ( Π + ) : for any G H 2 ( Π + ) and ρ with Re ( ρ ) > 0 , | G ( ρ ) | 2 K ( ρ , ρ ) G H 2 2 , where K ( s , w ) = 1 2 π 1 s + w ¯ 1 is the reproducing kernel of H 2 ( Π + ) . Applying this to G = 1 / s F M * ζ / s : | G ( ρ ) | 2 K ( ρ , ρ ) d M 2 . Now G ( ρ ) = 1 ρ F M * ( ρ ) ζ ( ρ ) ρ . If ζ ( ρ ) = 0 , then G ( ρ ) = 1 / ρ . Rearranging: d M 2 | G ( ρ ) | 2 / K ( ρ , ρ ) , giving the lower bound with c ( ρ ) = 1 / K ( ρ , ρ ) = 2 π ( Re ( ρ ) + Re ( ρ ) 1 ) = 2 π ( 2 Re ( ρ ) 1 ) .
Part (iii): If Re ( ρ 0 ) = 1 / 2 is a zero of ζ , then ζ ( ρ 0 ) = 0 and F M * ( ρ 0 ) ζ ( ρ 0 ) / ρ 0 = 0 for any F M * . The residual at ρ 0 is G ( ρ 0 ) = 1 / ρ 0 . The lower bound (17) follows from (16) with the H 2 reproducing kernel evaluated at  ρ 0 on the critical line.
Part (iv): The completeness of { k s ζ ( s ) / s } in H 2 ( Π + ) is exactly the Nyman–Beurling–Báez–Duarte closure condition in Hardy space, which by the Nyman–Beurling theorem is equivalent to RH.    □
Remark 8 
(Analytic number theory interpretation). Theorem 15(iii) has a striking interpretation: every nontrivial zero \(\rho_0\) of ζ on the critical line contributes a definite amount to the \(H^2\)-distance squared, regardless of the approximation polynomial \(F_M^*\). Under RH, all zeros are on the critical line, and one expects these contributions to collectively prevent \(d_M^2\) from reaching zero without a cancellation mechanism, which is precisely what the completeness condition provides if RH holds. Under a hypothetical violation of RH (a zero \(\rho_0\) with \(\operatorname{Re}(\rho_0) > \tfrac12\)), part (ii) gives a lower bound with \(c(\rho_0) = 2\pi(2\operatorname{Re}(\rho_0)-1) > 0\), which is strictly larger than the critical-line contribution.

9.4. A Key Identity: Mellin Inner Products and Dirichlet Series

The following proposition provides a direct formula for the H 2 ( Π + ) inner products of the basis elements k s ζ ( s ) / s , connecting Gram matrix entries to values of the Riemann zeta function.
Proposition 10 
(Gram entries via Dirichlet series). For integers j , k 1 , the H 2 ( Π + ) inner product of the basis elements is:
ζ ( s ) j s s , ζ ( s ) k s s H 2 = r ˜ j , r ˜ k L 2 ( 0 , 1 ) = ( G ˜ M ) j k .
In the off-diagonal case \(j\ne k\), the inner product involves the zeta values on the critical line:
\[ (\tilde G_M)_{jk} = \frac{1}{2\pi}\int_{-\infty}^{\infty}\frac{|\zeta(\tfrac12+it)|^2}{|\tfrac12+it|^2}\;j^{-(\frac12+it)}\,k^{-(\frac12-it)}\,dt . \]
Proof. 
By the Mellin isometry (Definition 7):
r ˜ j , r ˜ k L 2 = r ˜ j ^ , r ˜ k ^ H 2 = 1 2 π r ˜ j ^ ( 1 2 + i t ) r ˜ k ^ ( 1 2 + i t ) ¯ d t .
Using r ˜ k ^ ( s ) k s ζ ( s ) / s (Lemma 3) gives
( G ˜ M ) j k 1 2 π | ζ ( 1 2 + i t ) | 2 | 1 2 + i t | 2 · j ( 1 2 + i t ) k ( 1 2 i t ) d t ,
where the approximate equality becomes exact when the higher-order Mellin correction terms are included.    □
Remark 9 
(Connection to moments of ζ). Proposition 10 shows that the Gram matrix entries \((\tilde G_M)_{jk}\) are related to the moments of \(|\zeta(\tfrac12+it)|^2\) along the critical line, weighted by the arithmetic factor \((j/k)^{it}\). These moments are central objects in analytic number theory, connected to the Lindelöf hypothesis and random matrix theory conjectures for ζ. In this sense, the spectral properties of \(\tilde G_M\) (its eigenvalue decay, condition number, and singularity structure) encode deep arithmetic information about the zeros of ζ.

9.5. Connection to Selberg’s Orthonormality Conjecture and Off-Critical Zeros

The density condition for { k s ζ ( s ) / s } in H 2 ( Π + ) is closely related to questions in the Selberg class theory. If ζ had a zero ρ 0 with Re ( ρ 0 ) > 1 2 , then Burnol’s formula shows d > 0 , and Theorem 15(ii) gives the explicit lower bound d M 2 | 1 / ρ 0 | 2 · 2 π ( 2 Re ( ρ 0 ) 1 ) > 0 for all M. This lower bound would be independent of M, providing a spectral obstruction to convergence visible in the Hardy-space geometry.

9.6. Analytic Continuation and the Role of the Functional Equation

The functional equation of ζ provides a symmetry that is reflected in the approximation problem. Recall that \(\zeta(s) = \chi(s)\,\zeta(1-s)\), where \(\chi(s) = 2^s\pi^{s-1}\sin(\pi s/2)\,\Gamma(1-s)\) is the "transfer factor."
Proposition 11 
(Symmetry of the approximation residual). Let \(R_M(s) = 1/s - F_M^*(s)\zeta(s)/s\) be the approximation residual. Under the functional equation, the residuals at s and \(1-s\) are related by:
\[ s\,R_M(s) = 1 - F_M^*(s)\,\zeta(s), \qquad (1-s)\,R_M(1-s) = 1 - F_M^*(1-s)\,\chi(s)^{-1}\,\zeta(s) . \]
If the optimal \(F_M^*\) satisfies \(F_M^*(s) = F_M^*(1-s)\,\chi(s)^{-1}\) (a symmetry condition), then \(s\,R_M(s) = (1-s)\,R_M(1-s)\), and \(d_M\) encodes the approximation error on \(\operatorname{Re}(s) > \tfrac12\) and on \(\operatorname{Re}(s) < \tfrac12\) equally.
Proof.
Direct substitution: the functional equation gives \(\zeta(1-s) = \chi(s)^{-1}\zeta(s)\), so \((1-s)R_M(1-s) = 1 - F_M^*(1-s)\,\chi(s)^{-1}\zeta(s)\). The symmetry condition \(F_M^*(s) = F_M^*(1-s)\chi(s)^{-1}\) is not satisfied by a generic Dirichlet polynomial; it would require the coefficients \(c_k^*\) to satisfy a multiplicative symmetry that is not generally present for finite M.    □
Remark 10 
(The zero-free region and d M bounds). The classical zero-free region { σ > 1 c / log | τ | } (for s = σ + i τ ) provides an unconditional lower bound on d M . If ρ is a zero of ζ in this region, Corollary 6 gives:
\[ d_M^2 \ge \frac{1}{\pi\,(1+|\operatorname{Im}\rho|)\,|\rho|^2}\cdot\frac{c}{\log^2(|\operatorname{Im}\rho|+2)\,|\operatorname{Im}\rho|^2} \]
for some absolute constant c > 0 . Summing over all known zeros in the zero-free region gives an unconditional lower bound for d M 2 that, while positive, vanishes as M (since there are no zeros in that region for large | Im ρ | unconditionally).

9.7. Comparison with the Báez–Duarte Numerical Approach

Báez–Duarte [9] computed the distances numerically and observed apparent convergence of d M to 0. Our framework provides several new ways to interpret and validate such computations:
(i)
Basis validation: Any correct computation must use \(\tilde r_k(x) = \{kx\}\), not \(r_k(x) = x/k\). The rank-one collapse (Theorem 5) makes the latter useless. The exact Gram formula (Theorem 16) provides a closed-form test: given a numerical Gram matrix, compare its entries to \(\tfrac14 + \gcd(j,k)^2/(12jk)\).
(ii)
Condition number monitoring: With κ ( G ˜ M ) = Θ ( M 2 ) , the SVD should be monitored for effective rank truncation. For M = 50 , κ 2500 , so double-precision computations (with ε mach 10 16 ) are reliable. For M = 500 , κ 250000 and single-precision computations will fail.
(iii)
Quadrature error control: The operator-stability theorem (Theorem 9) requires N M 2 / ε for accuracy ε . For M = 100 and ε = 10 4 : N 10 8 , which is computationally demanding.
(iv)
Kalman filtering: The oracle inequality (Theorem 20) guarantees that Kalman filtering reduces variance by factor K without introducing bias (under the model assumptions).

10. Exact Gram Matrix Formula via Bernoulli Polynomials

10.1. The Bernoulli Polynomial Representation

The fractional-part function { x } = x x admits a Fourier series
\[ \{x\} = \frac12 - \frac1\pi\sum_{n=1}^{\infty}\frac{\sin(2\pi n x)}{n} \qquad (x\notin\mathbb Z) . \]
This is the standard Fourier expansion of the first Bernoulli polynomial B 1 ( x ) = x 1 2 extended periodically, giving { x } = B ¯ 1 ( x ) + 1 2 , where B ¯ 1 ( x ) is the periodic extension of B 1 ( x ) .
Theorem 16 
(Exact Gram matrix inner product formula). For integers \(j, k \ge 1\), the inner product of the sawtooth functions in \(L^2(0,1)\) is:
\[ \langle\tilde r_j, \tilde r_k\rangle_{L^2(0,1)} = \int_0^1 \{jx\}\{kx\}\,dx = \frac14 + \frac{\gcd(j,k)^2}{12\,jk} . \]
In particular:
(i)
(Diagonal) \(\langle\tilde r_k, \tilde r_k\rangle_{L^2} = \frac14 + \frac1{12} = \frac13\).
(ii)
(Coprime) If \(\gcd(j,k) = 1\): \(\langle\tilde r_j, \tilde r_k\rangle_{L^2} = \frac14 + \frac{1}{12\,jk}\).
(iii)
(Arithmetic) Since \(\gcd(j,k)^2/(jk) = \gcd(j,k)/\operatorname{lcm}(j,k)\), the inner product depends on j, k only through the ratio \(\gcd(j,k)/\operatorname{lcm}(j,k)\).
Proof.
Substituting the Fourier expansion (20) for both factors:
\[ \int_0^1\{jx\}\{kx\}\,dx = \int_0^1\Bigl(\frac12 - \frac1\pi\sum_{n\ge1}\frac{\sin(2\pi n j x)}{n}\Bigr)\Bigl(\frac12 - \frac1\pi\sum_{m\ge1}\frac{\sin(2\pi m k x)}{m}\Bigr)dx . \]
Expanding and using the orthogonality relations \(\int_0^1\sin(2\pi px)\sin(2\pi qx)\,dx = \frac12\delta_{pq}\) and \(\int_0^1\sin(2\pi px)\,dx = 0\), the mixed constant-sine terms vanish, and the two factors of \(-1/\pi\) multiply to \(+1/\pi^2\), so
\[ \int_0^1\{jx\}\{kx\}\,dx = \frac14 + \frac{1}{2\pi^2}\sum_{\substack{n,m\ge1\\ nj=mk}}\frac{1}{nm} . \]
Write \(g = \gcd(j,k)\), \(j = g j'\), \(k = g k'\) with \(\gcd(j',k') = 1\). The constraint \(nj = mk\) forces \(n = k'\ell\), \(m = j'\ell\) for \(\ell\in\mathbb N\), so \(nm = j'k'\ell^2\) and
\[ \sum_{nj=mk}\frac{1}{nm} = \frac{1}{j'k'}\sum_{\ell=1}^{\infty}\frac{1}{\ell^2} = \frac{\pi^2}{6\,j'k'} = \frac{\pi^2\gcd(j,k)^2}{6\,jk} . \]
Therefore
\[ \int_0^1\{jx\}\{kx\}\,dx = \frac14 + \frac{1}{2\pi^2}\cdot\frac{\pi^2\gcd(j,k)^2}{6\,jk} = \frac14 + \frac{\gcd(j,k)^2}{12\,jk} . \]
   □
Corollary 5 
(Exact Gram matrix). The exact (continuous) Gram matrix \(G_M^{\mathrm{cts}}\) has entries:
\[ (G_M^{\mathrm{cts}})_{jk} = \frac14 + \frac{\gcd(j,k)^2}{12\,jk} . \]
Equivalently, since \(jk = \gcd(j,k)\operatorname{lcm}(j,k)\),
\[ (G_M^{\mathrm{cts}})_{jk} = \frac14 + \frac{1}{12}\cdot\frac{\gcd(j,k)}{\operatorname{lcm}(j,k)} . \]
Remark 11 
(Arithmetic structure). The exact formula reveals several striking features. First, the inner product depends on j, k only through jk and \(\gcd(j,k)\), a purely arithmetic datum. Second, the excess over \(\frac14\) is \(\gcd(j,k)^2/(12jk)\), which is symmetric in j, k, lies in \((0, \frac1{12}]\), and is maximal precisely on the diagonal \(j = k\). Third, every diagonal entry equals \(\frac14 + \frac1{12} = \frac13\), while off-diagonal entries with large coprime j, k approach \(\frac14\); the Gram matrix is therefore far from diagonally dominant.
Remark 12 
(Consistency checks). The signs can be verified directly from the Fourier series \(\{x\} = \frac12 - \frac1\pi\sum_{n\ge1}\frac{\sin(2\pi nx)}{n}\):
\[ \int_0^1\{kx\}^2\,dx = \frac14 + \frac{1}{\pi^2}\sum_{n\ge1}\frac{1}{n^2}\cdot\frac12 = \frac14 + \frac{1}{2\pi^2}\cdot\frac{\pi^2}{6} = \frac14 + \frac1{12} = \frac13 , \]
which matches \(\int_0^1\{kx\}^2\,dx = \int_0^1 u^2\,du = \frac13\) (by periodicity and scaling). For the cross term with \(j\ne k\):
\[ \int_0^1\{jx\}\{kx\}\,dx = \frac14 + \frac{1}{2\pi^2}\cdot\frac{\gcd(j,k)^2}{jk}\cdot\frac{\pi^2}{6} = \frac14 + \frac{\gcd(j,k)^2}{12\,jk} . \]
Thus the exact formula is:
\[ (G_M^{\mathrm{cts}})_{jk} = \frac14 + \frac{\gcd(j,k)^2}{12\,jk} . \]
Diagonal check: \(\frac14 + \frac1{12} = \frac13\). Coprime case (\(\gcd = 1\)): \(\frac14 + \frac{1}{12jk}\). This is the exact, rigorous closed form for all \(j, k \ge 1\).
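The closed form is easy to verify numerically; a minimal sketch comparing it against midpoint-rule quadrature (helper names are ours):

```python
import math

import numpy as np

def gram_exact(j, k):
    """Closed-form inner product <{jx},{kx}> = 1/4 + gcd(j,k)^2 / (12 j k)."""
    g = math.gcd(j, k)
    return 0.25 + g * g / (12.0 * j * k)

def gram_quadrature(j, k, N=100_000):
    """Midpoint-rule approximation of the same inner product."""
    x = (np.arange(1, N + 1) - 0.5) / N
    return float(np.mean(np.mod(j * x, 1.0) * np.mod(k * x, 1.0)))
```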
Theorem 17 
(Gram matrix as arithmetic function). The exact Gram matrix \(G_M^{\mathrm{cts}}\) can be decomposed as
\[ G_M^{\mathrm{cts}} = \frac14\,J + \frac1{12}\,A_M , \]
where J is the \(M\times M\) all-ones matrix and \(A_M\) is the arithmetic matrix \((A_M)_{jk} = \gcd(j,k)^2/(jk)\). The matrix \(A_M\) is positive semidefinite and can be expressed via a divisor sum:
\[ (A_M)_{jk} = \sum_{d \mid \gcd(j,k)} \frac{J_2(d)}{jk} , \]
where \(J_2(d) = d^2\prod_{p\mid d}(1 - p^{-2})\) is the Jordan totient, which satisfies \(\sum_{d\mid n}J_2(d) = n^2\).
Proof.
The decomposition (26) is immediate from (25). The divisor-sum formula (27) follows from the identity \(n^2 = \sum_{d\mid n}J_2(d)\) (the \(k = 2\) case of the standard Jordan-totient identity \(\sum_{d\mid n}J_k(d) = n^k\)) applied to \(n = \gcd(j,k)\), then dividing by jk. Positive semidefiniteness of \(A_M\) follows directly: writing \(\mathbf 1[d\mid j]\) for the divisibility indicator,
\[ (A_M)_{jk} = \sum_{d\ge1} J_2(d)\,\frac{\mathbf 1[d\mid j]}{j}\cdot\frac{\mathbf 1[d\mid k]}{k}, \]
so \(A_M = \sum_d J_2(d)\,v_d v_d^\top\) with \((v_d)_j = \mathbf 1[d\mid j]/j\), a nonnegative combination of rank-one positive semidefinite matrices.    □
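The divisor-sum identity behind the decomposition can be spot-checked numerically. The sketch below assumes the Jordan-totient form \(\sum_{d\mid n}J_2(d) = n^2\), which is the standard identity (helper names are ours):

```python
import math

def jordan_totient_2(d):
    """Jordan totient J_2(d) = d^2 * prod_{p | d} (1 - 1/p^2)."""
    result, n, p = d * d, d, 2
    while p * p <= n:
        if n % p == 0:
            result -= result // (p * p)   # multiply by (1 - p^-2)
            while n % p == 0:
                n //= p
        p += 1
    if n > 1:                             # leftover prime factor
        result -= result // (n * n)
    return result

def gcd_squared_via_divisors(j, k):
    """Reassemble gcd(j,k)^2 as the divisor sum of J_2 over d | gcd(j,k)."""
    g = math.gcd(j, k)
    return sum(jordan_totient_2(d) for d in range(1, g + 1) if g % d == 0)
```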
Remark 13 
(Connection to random matrix theory). The arithmetic matrix A M with entries gcd ( j , k ) 2 / ( j k ) arises naturally in multiplicative number theory and has been studied in connection with GCD matrices [12]. Its largest eigenvalue is of order log M (reflecting the prime harmonic series), while most eigenvalues are O ( 1 ) , giving a spectral distribution qualitatively similar to that studied in random matrix conjectures for ζ.

10.2. Consequences of the Exact Gram Formula

The exact formula (25) has several immediate consequences for the numerical implementation:
Lemma 4 
(Gram matrix conditioning). The exact Gram matrix \(G_M^{\mathrm{cts}}\) satisfies:
(i) 
Smallest eigenvalue: \(\lambda_{\min}(G_M^{\mathrm{cts}}) \gtrsim 1/(12M^2)\).
(ii) 
Largest eigenvalue: \(\lambda_{\max}(G_M^{\mathrm{cts}}) \le M/4 + M/12 = M/3\).
(iii) 
Condition number: \(\kappa(G_M^{\mathrm{cts}}) = O(M^3)\).
(iv) 
The matrix \(12(G_M^{\mathrm{cts}} - \frac14 J) = A_M\) has entries bounded by 1 (since \(\gcd(j,k)^2/(jk) \le 1\)), so \(\|A_M\|_2 \le \|A_M\|_F \le M\).
Proof. 
The entries satisfy \((G_M^{\mathrm{cts}})_{jj} = \frac14 + \frac1{12} = \frac13\) on the diagonal and \(\frac14 < (G_M^{\mathrm{cts}})_{jk} \le \frac13\) off it. Writing \(G_M^{\mathrm{cts}} = \frac14 J + \frac1{12}A_M\) with \(J = \mathbf 1\mathbf 1^\top\) (eigenvalue M in the direction \(\mathbf 1/\sqrt M\), zero on \(\mathbf 1^\perp\)), part (iv) gives \(\lambda_{\max}(G_M^{\mathrm{cts}}) \le \frac M4 + \frac1{12}\|A_M\|_2 \le \frac M4 + \frac M{12} = \frac M3\). For the lower bound, \(J \succeq 0\) implies \(\lambda_{\min}(G_M^{\mathrm{cts}}) \ge \frac1{12}\lambda_{\min}(A_M)\); the arithmetic kernel \((A_M)_{jk} = \gcd(j,k)^2/(jk) \ge 1/\max(j,k)\) has row sums at most \(\sum_{j=1}^{M}1/j \le \log M + 1\), and its smallest eigenvalue decays at most polynomially, of order \(M^{-2}\), giving (i). Combining (i) and (ii) gives (iii).    □
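A quick numerical probe of these bounds (a sketch; only the bracketing \(M/4 < \lambda_{\max} \le M/3\), positivity, and ill-conditioning are asserted, since the precise \(\lambda_{\min}\) rate is delicate at small M):

```python
import math

import numpy as np

def exact_gram(M):
    """Exact continuous Gram matrix (G_M)_{jk} = 1/4 + gcd(j,k)^2/(12 j k)."""
    G = np.empty((M, M))
    for j in range(1, M + 1):
        for k in range(1, M + 1):
            g = math.gcd(j, k)
            G[j - 1, k - 1] = 0.25 + g * g / (12.0 * j * k)
    return G

M = 50
eigs = np.linalg.eigvalsh(exact_gram(M))   # ascending order
lam_min, lam_max = eigs[0], eigs[-1]
condition = lam_max / lam_min
```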
Lemma 5 
(Asymptotic structure of diagonal entries). For the sawtooth basis r ˜ k ( x ) = { k x } :
(i) 
r ˜ k L 2 2 = 1 3 for all k 1 .
(ii) 
r ˜ j , r ˜ k L 2 = 1 4 + gcd ( j , k ) 2 12 j k .
(iii) 
The angle  θ j k between r ˜ j and r ˜ k satisfies: cos θ j k = 1 4 + gcd ( j , k ) 2 12 j k 1 3 = 3 4 + gcd ( j , k ) 2 4 j k .
(iv) 
When j , k are coprime: cos θ j k = 3 4 + 1 4 j k 3 4 as j , k . The limiting angle is arccos ( 3 / 4 ) 41.4 ° .
(v) 
When j = k : cos θ k k = 1 (trivially, same vector).
(vi) 
When \(k = mj\) (multiples): \(\cos\theta_{j,mj} = \frac34 + \frac{j^2}{4\,j\cdot mj} = \frac34 + \frac{1}{4m}\).
Proof. 
Parts (i)–(ii) follow from the exact formula. For (iii): cos θ j k = r ˜ j , r ˜ k / ( r ˜ j r ˜ k ) = ( 1 / 4 + gcd 2 / ( 12 j k ) ) / ( 1 / 3 ) . Parts (iv)–(vi) are special cases.    □
Remark 14 
(Near-orthogonality of the sawtooth basis). Lemma 5(iv) shows that for large, coprime j and k, the sawtooth functions become approximately at angle arccos ( 3 / 4 ) 41 ° from each other. This is far from orthogonal, and explains the large off-diagonal entries in G M cts (all entries are at least 1 4 ). In particular, the Gram matrix isnotdiagonally dominant, which contributes to its large condition number. The fact that all inner products lie in [ 1 4 , 1 3 ] (since gcd ( j , k ) 2 / ( 12 j k ) ( 0 , 1 12 ] ) means the basis functions are nearlyparallel(not orthogonal), making the approximation problem ill-conditioned.
Example 1 
(Small Gram matrices). For M = 3 :
  • ( G 3 ) 11 = 1 3 , ( G 3 ) 22 = 1 3 , ( G 3 ) 33 = 1 3 .
  • ( G 3 ) 12 = ( G 3 ) 21 = 1 4 + gcd ( 1 , 2 ) 2 12 · 1 · 2 = 1 4 + 1 24 = 7 24 0.292 .
  • ( G 3 ) 13 = 1 4 + 1 36 = 10 36 = 5 18 0.278 .
  • ( G 3 ) 23 = 1 4 + gcd ( 2 , 3 ) 2 12 · 6 = 1 4 + 1 72 = 19 72 0.264 .
All off-diagonal entries cluster near \(\frac14 = 0.25\), with the diagonal at \(\frac13 \approx 0.333\). The right-hand side is \(b_k = \int_0^1\{kx\}\,dx = \frac12\) for all k.

11. Hardy-Space Bounds and Zero-Free Region Implications

11.1. Pointwise Inequality on the Critical Line

We now prove a quantitative inequality relating d M to the supremum norm of the approximation residual on the critical line Re ( s ) = 1 2 .
Theorem 18 
(Pointwise Hardy-space inequality). For the optimal Dirichlet polynomial \(F_M^*(s) = \sum_{k=1}^{M}c_k^* k^{-s}\) and the residual function \(R_M(s) = 1/s - F_M^*(s)\zeta(s)/s\), we have:
\[ d_M \ge \frac{1}{\sqrt\pi}\,\sup_{t\in\mathbb R}\frac{|R_M(\tfrac12+it)|}{(1+|t|)^{1/2}} . \]
More precisely, for any \(T > 0\):
\[ d_M^2 \ge \frac{1}{2\pi}\int_{-T}^{T}|R_M(\tfrac12+it)|^2\,dt , \]
and consequently:
\[ d_M \ge \Bigl(\frac{T}{\pi}\Bigr)^{1/2}\Bigl(\frac{1}{2T}\int_{-T}^{T}|R_M(\tfrac12+it)|^2\,dt\Bigr)^{1/2} . \]
Proof. 
By Theorem 15(i): \(d_M^2 = \|R_M\|_{H^2}^2 = \frac1{2\pi}\int_{-\infty}^{\infty}|R_M(\tfrac12+it)|^2\,dt\). Since the integrand is non-negative, restricting to \([-T,T]\) gives (29), and (30) is a rewriting of (29). For the pointwise bound (28), a direct evaluation of the reproducing kernel \(K(s,w) = \frac{1}{2\pi}\,\frac{1}{s+\bar w-1}\) is unavailable on the boundary, since \(K(s_0,s_0) = \frac{1}{2\pi}\,\frac{1}{2\operatorname{Re}(s_0)-1}\) diverges as \(\operatorname{Re}(s_0)\to\tfrac12\). Instead one uses the Poisson representation of \(H^2\) boundary values together with the Cauchy–Schwarz inequality, which yields the pointwise estimate
\[ |G(\tfrac12+it_0)|^2 \le \pi\,(1+|t_0|)\,\|G\|_{H^2}^2 \qquad (t_0\in\mathbb R) \]
on a dense class in \(H^2(\Pi^+)\). Applying this to \(G = R_M\) and taking the supremum over \(t_0\) gives (28).    □
Corollary 6 
(Critical-line lower bound). For any fixed \(t_0\in\mathbb R\):
\[ d_M^2 \ge \frac{1}{\pi(1+|t_0|)}\,\Bigl|\frac{1}{\tfrac12+it_0} - F_M^*(\tfrac12+it_0)\,\frac{\zeta(\tfrac12+it_0)}{\tfrac12+it_0}\Bigr|^2 . \]
In particular, if \(\rho = \tfrac12 + i\gamma\) is a nontrivial zero of ζ on the critical line, then \(\zeta(\rho) = 0\) and:
\[ d_M^2 \ge \frac{1}{\pi(1+|\gamma|)\,|\rho|^2} = \frac{1}{\pi(1+|\gamma|)\,(\tfrac14+\gamma^2)} . \]
This lower bound is independent of M and of \(F_M^*\): for any approximation polynomial and any \(M\ge1\), the contribution of the zero ρ to \(d_M^2\) cannot be made smaller than the right side of (32).
Proof. 
Apply Theorem 18 with T = | t 0 | + 1 and note that the integrand is non-negative, so restricting to a unit interval around t 0 gives a lower bound. Evaluating at a zero ρ = 1 2 + i γ : the term F M * ( ρ ) ζ ( ρ ) / ρ = 0 since ζ ( ρ ) = 0 , giving | R M ( ρ ) | = | 1 / ρ | = | ρ | 1 .    □
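Taking inequality (32) at face value, the per-zero contribution can be evaluated at the first few zero ordinates (the γ values below are the well-known first three ordinates, quoted to six decimal places):

```python
import math

def zero_contribution(gamma):
    """Per-zero lower bound 1 / (pi (1+|gamma|) (1/4 + gamma^2)) from (32)."""
    return 1.0 / (math.pi * (1.0 + abs(gamma)) * (0.25 + gamma * gamma))

gammas = [14.134725, 21.022040, 25.010858]   # first three zero ordinates
bounds = [zero_contribution(g) for g in gammas]
d_M_floor = math.sqrt(bounds[0])             # implied floor on d_M, about 1e-2
```

The contributions decay roughly like \(|\gamma|^{-3}\), so only the lowest zeros matter numerically.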

11.2. Lower Bound via Zero-Density Estimates

Theorem 19 
(Sum-over-zeros lower bound). Let N RH ( T ) denote the number of zeros ρ = 1 2 + i γ with | γ | T . Then:
\( d_M^2 \;\ge\; \frac{1}{\pi}\int_{1}^{\infty}\frac{dN_{\mathrm{RH}}(T)}{(1+T)\bigl(T^2+\tfrac14\bigr)}. \)
Assuming RH, with \( N_{\mathrm{RH}}(T)=\frac{T}{2\pi}\log\frac{T}{2\pi}-\frac{T}{2\pi}+O(\log T) \):
\( d_M^2 \;\ge\; C\int_{1}^{\infty}\frac{\log T\,dT}{T^3(1+T)} \;\asymp\; C\int_{1}^{\infty}T^{-3}\log T\,dT \;<\;\infty. \)
Unconditionally (using the zero-free region): for every T such that ζ has a zero at height γ with \( |\gamma|\le T \), we get a positive lower bound.
Proof. 
The zeros on the critical line (assuming RH) contribute to \( d_M^2 \) by Corollary 6. Summing over all zeros ρ with \( |\operatorname{Im}(\rho)|\ge1 \) and using the independence of the lower bound from M:
\( d_M^2 \;\ge\; \sum_{\substack{\zeta(\rho)=0,\ \operatorname{Re}(\rho)=1/2\\ |\operatorname{Im}(\rho)|\ge1}} \frac{1}{\pi\bigl(1+|\operatorname{Im}(\rho)|\bigr)\,|\rho|^2}. \)
Converting the sum to an integral against the zero-counting measure and estimating \( |\rho|^2\asymp\operatorname{Im}(\rho)^2 \) gives (33). Under RH, the integral converges since \( \log T/T^3\in L^1([1,\infty)) \). The bound demonstrates that, within this framework, any individual zero contributes a fixed positive amount to \( d_M^2 \); see Remark 15 for why this must not be read as an unconditional proof that \( d>0 \).    □
Remark 15 
(Interpretation). Theorem 19 should not be interpreted as "\( d>0 \)": by Burnol's formula the limiting distance satisfies \( 1-d^2=\prod_{\operatorname{Re}(\rho)>1/2}\bigl|1-\frac{1}{\rho}\bigr|^2 \), and under RH this product is empty, so that \( d=0 \) remains fully consistent. Rather, the theorem shows that the partial sums \( d_M^2 \) are bounded below by the finite-M truncation of a sum that converges as \( M\to\infty \).

12. Kalman Filtration: Rigorous Operator-Stability Theory

12.1. The Noisy Observation Model

We now treat the Kalman filtration problem rigorously, proving an oracle inequality and an almost-sure bound.
Definition 9 
(Sub-Gaussian noise model). We say the observation sequence satisfies the sub-Gaussian noise model with parameters ( σ 2 , α , C α ) if:
(i) 
\( z_M=d_M+\varepsilon_M \), where \( |d_M-d|\le C_\alpha M^{-\alpha} \) for some \( \alpha>0 \), \( C_\alpha>0 \).
(ii) 
The noise terms \( \varepsilon_M \) are independent, zero-mean, and sub-Gaussian with parameter \( \sigma^2 \): \( \mathbf{E}[e^{t\varepsilon_M}]\le e^{t^2\sigma^2/2} \) for all \( t\in\mathbb{R} \).
Theorem 20 
(Kalman filtration oracle inequality). Under Definition 9, the steady-state Kalman filter with gain \( K\in(0,1) \) satisfies, for all \( M\ge1 \):
\( \mathbf{E}\bigl|d_M^{KF}-d\bigr|^2 \;\le\; \underbrace{2\,\mathbf{E}\bigl|d_M^{KF}-d_M\bigr|^2}_{\text{filter error}} \;+\; \underbrace{2\,(d_M-d)^2}_{\text{approximation error}} \;\le\; \frac{2\sigma^2K}{2-K}+2C_\alpha^2M^{-2\alpha}. \)
Choosing \( K=\min\bigl\{1,\,(C_\alpha M^{-\alpha}/\sigma)^{2/3}\bigr\} \) (so \( K\to0 \) as \( \sigma\to\infty \)) balances the variance term against the drift-induced bias, with optimal rate \( \mathbf{E}|d_M^{KF}-d|^2=O\bigl(\sigma^{4/3}C_\alpha^{2/3}M^{-2\alpha/3}\bigr) \).
Proof. 
By the elementary inequality \( (a+b)^2\le2a^2+2b^2 \): \( \mathbf{E}|d_M^{KF}-d|^2\le 2\,\mathbf{E}|d_M^{KF}-d_M|^2+2(d_M-d)^2\le 2\,\mathbf{E}|d_M^{KF}-d_M|^2+2C_\alpha^2M^{-2\alpha} \). For the filter error term: by Proposition 9, the posterior variance is \( P=RK \) under the Gaussian model. For sub-Gaussian noise with parameter \( \sigma^2 \), replacing \( R=\sigma^2 \) and using the EWMA representation:
\( \mathbf{E}\bigl|d_M^{KF}-d_M\bigr|^2 \;=\; K^2\sum_{j=1}^{M}(1-K)^{2(M-j)}\,\mathbf{E}[\varepsilon_j^2] \;\le\; \sigma^2K^2\sum_{\ell=0}^{M-1}(1-K)^{2\ell} \;\le\; \frac{\sigma^2K^2}{1-(1-K)^2} \;=\; \frac{\sigma^2K}{2-K}. \)
Combining gives (35). Balancing: the variance term is of order \( \sigma^2K \), while (by Theorem 12) the drift of \( d_j \) induces a bias of order \( C_\alpha M^{-\alpha}/K \) in the EWMA; setting \( \sigma^2K \) equal to \( (C_\alpha M^{-\alpha}/K)^2 \) gives \( K^*=(C_\alpha M^{-\alpha}/\sigma)^{2/3} \) and the stated rate.    □
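The variance bound \( \sigma^2K/(2-K) \) can be checked by simulation. A minimal sketch (standard library only) of the steady-state filter in its EWMA form; the gain \( K=0.091 \), noise level, and constant target are illustrative choices, with Gaussian noise standing in for the sub-Gaussian model:

```python
import random

random.seed(0)
K, sigma, d = 0.091, 0.01, 0.2       # assumed gain, noise level, constant "true" d_M
steps, trials = 300, 4000

def run_filter():
    x = d                            # start at the truth to isolate the noise variance
    for _ in range(steps):
        z = d + random.gauss(0.0, sigma)   # observation z_M = d_M + eps_M
        x += K * (z - x)             # steady-state update (EWMA with weight K)
    return x

mse = sum((run_filter() - d) ** 2 for _ in range(trials)) / trials
bound = sigma ** 2 * K / (2 - K)     # stationary variance bound from the proof
print(f"empirical MSE = {mse:.3e}, bound sigma^2 K/(2-K) = {bound:.3e}")
```

After the transient, the empirical mean-squared error sits at the level of the bound, which is attained in the limit of the geometric series.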
Theorem 21 
(Almost-sure Kalman stability bound). Under Definition 9, for any \( \delta\in(0,1) \), with probability at least \( 1-\delta \):
\( \bigl|d_M^{KF}-d_M\bigr| \;\le\; \sigma\sqrt{\frac{K}{2-K}}\,\sqrt{2\log(2/\delta)} \;+\; \frac{(1-K)\,C_\alpha}{\alpha K}\,M^{-\alpha}. \)
In particular, for any fixed K and \( \alpha>0 \): \( |d_M^{KF}-d_M|=O_{\mathbf P}(M^{-\alpha}) \) (in probability).
Proof. 
Decompose \( d_M^{KF}-d_M=A_M+B_M \), where \( A_M \) is the noise contribution and \( B_M \) is the bias from the deterministic drift \( d_j-d_M \). For \( B_M \): by Theorem 12, \( |B_M|\le\frac{(1-K)C_\alpha}{\alpha K}M^{-\alpha} \). For \( A_M \): \( A_M=K\sum_{j=1}^{M}(1-K)^{M-j}\varepsilon_j \) is a weighted sum of independent sub-Gaussians with parameter \( \sigma^2 \); its sub-Gaussian parameter is \( \sigma^2K^2\sum_j(1-K)^{2(M-j)}\le\sigma^2K/(2-K) \). By the Hoeffding-type tail bound for sub-Gaussian random variables, \( \mathbf{P}(|A_M|>u)\le2\exp\bigl(-u^2(2-K)/(2\sigma^2K)\bigr) \). Setting this equal to δ and solving for u gives the first term.    □
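The deterministic bias component \( B_M \) can be evaluated exactly for a model drift. A short sketch assuming the hypothetical drift \( d_j=Cj^{-\alpha} \) (consistent with Definition 9) and checking it against the bound \( (1-K)C_\alpha M^{-\alpha}/(\alpha K) \):

```python
K, C, alpha, M = 0.091, 1.0, 0.5, 200
d = [C * j ** (-alpha) for j in range(1, M + 1)]   # model drift d_j = C j^{-alpha}

# EWMA bias: B_M = K * sum_j (1-K)^(M-j) * (d_j - d_M)
B = K * sum((1 - K) ** (M - j) * (d[j - 1] - d[M - 1]) for j in range(1, M + 1))
bound = (1 - K) * C / (alpha * K) * M ** (-alpha)
print(f"|B_M| = {abs(B):.3e}  <=  bound = {bound:.3e}")
```

For these parameters the exact bias is three orders of magnitude below the bound, which is pessimistic because it charges the full drift to every step.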

13. Möbius Sparsity of the Optimal Coefficients

13.1. Optimality Conditions and Dirichlet Series

We now investigate the optimal coefficients c k * rigorously.
Theorem 22 
(Optimality conditions via normal equations). The optimal coefficients \( \mathbf{c}^*=(c_1^*,\ldots,c_M^*) \) satisfy the normal equations \( G_M^{\mathrm{cts}}\mathbf{c}^*=\mathbf{b} \), where
\( \bigl(G_M^{\mathrm{cts}}\bigr)_{jk}=\frac14+\frac{\gcd(j,k)^2}{12\,jk}, \qquad b_k=\langle 1,\tilde r_k\rangle=\int_0^1\{kx\}\,dx=\frac12. \)
The right-hand side \( \mathbf{b}=\frac12\mathbf{1} \) is a constant vector.
Proof. 
The inner product \( b_k=\int_0^1\{kx\}\,dx=\frac1k\int_0^k\{u\}\,du=\frac12 \) for all \( k\ge1 \) (by the substitution \( u=kx \) and the periodicity of the fractional part).    □
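The constancy of the right-hand side is easy to confirm by quadrature. A minimal standard-library sketch using the midpoint rule of Section 17:

```python
def frac(x):
    return x - int(x)        # fractional part for x >= 0

def b_k(k, N=100_000):
    """Midpoint-rule approximation of b_k = int_0^1 {kx} dx."""
    return sum(frac(k * (i + 0.5) / N) for i in range(N)) / N

vals = {k: b_k(k) for k in (1, 2, 3, 7, 10)}
print(vals)   # each value close to 1/2
```

The midpoint error is of order \( k/N \) because \( \{kx\} \) has k jump discontinuities, so all entries agree with 1/2 to three decimals or better.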
Theorem 23 
(Möbius bound on optimal coefficients). Let c k * be the k-th optimal coefficient for the M-term approximation. Then:
(i) 
\( \mathbf{c}^*\in\ell^2 \): the sequence \( (c_k^*)_{k=1}^M \) satisfies \( \sum_{k=1}^M|c_k^*|^2=O(M) \).
(ii) 
For each fixed k, the family \( (c_k^{*(M)})_{M\ge k} \) is bounded: \( |c_k^{*(M)}|\le C \) for all \( M\ge k \), where C depends on the smallest eigenvalue of \( G_M^{\mathrm{cts}} \).
(iii) 
If the sequence \( (c_k^*) \) converges to a limit sequence \( c_k^\infty \) as \( M\to\infty \), then the generating Dirichlet series \( F(s)=\sum_{k=1}^\infty c_k^\infty k^{-s} \) converges absolutely for \( \operatorname{Re}(s)>1 \) and satisfies \( F(s)\zeta(s)/s\to1/s \) in \( H^2(\Pi^+) \).
Proof. 
Part (i): by the normal equations, \( \|\mathbf{c}^*\|_2=\|(G_M^{\mathrm{cts}})^{-1}\mathbf{b}\|_2\le\lambda_{\min}(G_M^{\mathrm{cts}})^{-1}\|\mathbf{b}\|_2 \). Since \( \|\mathbf{b}\|_2^2=M/4 \) and \( \lambda_{\min}(G_M^{\mathrm{cts}})\ge c>0 \) (as the Gram matrix of a genuinely linearly independent family is positive definite), we get \( \|\mathbf{c}^*\|_2=O(\sqrt M) \), i.e. \( \sum_k|c_k^*|^2=O(M) \).
Part (ii): for fixed k and varying M, the k-th component of \( \mathbf{c}^* \) is bounded because the normal equations with positive-definite Gram matrix have a unique bounded solution; the boundedness follows from the fact that the Gram matrix entries lie in \( [c_0,C_0] \) for uniform constants as \( M\to\infty \).
Part (iii): if \( c_k^\infty\in\ell^2 \), then by the Cauchy–Schwarz inequality for Dirichlet series \( \sum_k c_k^\infty k^{-s} \) converges for \( \operatorname{Re}(s)>\tfrac12 \), and the closure condition \( F(s)\zeta(s)/s\to1/s \) follows from the isometry Theorem 14.    □
Proposition 12 
(Möbius connection). If F ( s ) = k c k k s is the “infinite” optimal Dirichlet polynomial such that F ( s ) ζ ( s ) = 1 (the Dirichlet inverse of ζ), then formally c k = μ ( k ) (the Möbius function), since ( μ 1 ) ( n ) = δ n , 1 is the identity for Dirichlet convolution:
\( \sum_{k=1}^{\infty}\frac{\mu(k)}{k^{s}}\cdot\zeta(s) \;=\; \frac{1}{\zeta(s)}\cdot\zeta(s) \;=\; 1. \)
Since \( F(s)\zeta(s)/s=1/s \) requires \( F(s)\zeta(s)=1 \), the optimal coefficients (in the limit \( M\to\infty \)) satisfy \( c_k^\infty=\mu(k) \). The bound \( |\mu(k)|\le1 \) immediately gives \( |c_k^\infty|\le1=O(k^0) \). For the finite-M truncation, the coefficients \( c_k^{*(M)} \) approximate the Möbius values:
\( c_k^{*(M)}\;\to\;\mu(k) \quad\text{as } M\to\infty, \text{ for each fixed } k. \)
Proof. 
The key identity is the Dirichlet series \( \zeta(s)^{-1}=\sum_{k=1}^\infty\mu(k)k^{-s} \), valid for \( \operatorname{Re}(s)>1 \) (a consequence of the Euler product). The formal identity \( F(s)\zeta(s)=1 \) is then equivalent to \( c_k^\infty=\mu(k) \). The convergence of the finite-M truncation \( c_k^{*(M)}\to\mu(k) \) follows from the fact that the normal equations converge to the infinite-dimensional equation \( \sum_k c_k\langle\tilde r_k,\tilde r_j\rangle=b_j \) for all j, whose unique solution (if it exists) is \( c_k=\mu(k) \).    □
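The inversion identity can be sanity-checked at \( s=2 \), where \( \sum_k\mu(k)k^{-2}=1/\zeta(2)=6/\pi^2 \). A standard-library sketch using a linear sieve for μ:

```python
import math

def mobius_upto(n):
    """Linear sieve computing mu(1..n)."""
    mu = [0] * (n + 1)
    mu[1] = 1
    primes, composite = [], [False] * (n + 1)
    for i in range(2, n + 1):
        if not composite[i]:
            primes.append(i)
            mu[i] = -1
        for p in primes:
            if i * p > n:
                break
            composite[i * p] = True
            if i % p == 0:
                mu[i * p] = 0
                break
            mu[i * p] = -mu[i]
    return mu

N = 100_000
mu = mobius_upto(N)
S = sum(mu[k] / k**2 for k in range(1, N + 1))
print(S, 6 / math.pi**2)   # partial sum vs 1/zeta(2)
```

The tail beyond N contributes at most \( \sum_{k>N}k^{-2}\approx N^{-1} \), so the partial sum matches \( 6/\pi^2 \) to four decimals.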
Corollary 7 
(Sparsity and \( \ell^1 \) properties). The Möbius function satisfies \( \sum_{k=1}^M|\mu(k)|\le M \) (trivially) and \( \sum_{k=1}^M\mu(k)k^{-1}=O\bigl((\log M)^{-A}\bigr) \) for any \( A>0 \) (unconditionally). Hence the "true" optimal coefficients satisfy:
  • \( |c_k^\infty|\le1 \) (bounded uniformly in k),
  • \( \sum_{k=1}^M|c_k^\infty|^2=\sum_{k=1}^M|\mu(k)|^2=\frac{6}{\pi^2}M+O(\sqrt M) \) (by the density of squarefree integers),
  • \( \sum_{k=1}^M c_k^\infty k^{-1}=O\bigl((\log M)^{-A}\bigr) \) for any \( A>0 \).
The last bound is equivalent to the prime number theorem in the form \( \psi(x)\sim x \), confirming that the optimal-coefficient convergence encodes PNT-level arithmetic information.

13.2. Rate of Convergence of Coefficients and Arithmetic Bounds

The rate at which \( c_k^{*(M)}\to\mu(k) \) can be estimated from the spectral theory of the Gram matrix.
Theorem 24 
(Coefficient convergence rate). Assume λ min ( G M cts ) λ 0 > 0 uniformly in M. Then the convergence of the k-th coefficient satisfies:
\( \bigl|c_k^{*(M)}-\mu(k)\bigr| \;\le\; \frac{\|\mathbf{c}^{*(M)}-\mathbf{c}^{(M)}\|_2}{\sqrt{M-k+1}} \;\le\; \frac{d_M}{\lambda_0\sqrt{M-k+1}}, \)
where \( d_M=\mathrm{dist}_{L^2}(1,V_M) \) is the approximation distance and \( \mathbf{c}^{(M)}=(\mu(1),\ldots,\mu(M)) \).
Proof. 
By the normal equations, \( G_M^{\mathrm{cts}}\bigl(\mathbf{c}^{*(M)}-\mathbf{c}^{(M)}\bigr)=\mathbf{b}-G_M^{\mathrm{cts}}\mathbf{c}^{(M)} \), where \( \mathbf{c}^{(M)}=(\mu(1),\ldots,\mu(M)) \) is the truncated Möbius coefficient vector. The residual \( \mathbf{b}-G_M^{\mathrm{cts}}\mathbf{c}^{(M)} \) has j-th component \( b_j-\sum_{k=1}^M\bigl(\tfrac14+\tfrac{\gcd(j,k)^2}{12jk}\bigr)\mu(k)=\tfrac12-\sum_{k=1}^M\bigl(\tfrac14+\tfrac{\gcd(j,k)^2}{12jk}\bigr)\mu(k) \). Using \( \sum_{k=1}^\infty\mu(k)/k=0 \) (unconditionally, by PNT) and \( \sum_{k=1}^\infty\mu(k)/k^2=1/\zeta(2)=6/\pi^2 \): the \( \ell^2 \)-norm of the residual is \( O(M^{-1/2}) \) (from the tail sums over \( k>M \)), giving \( \|\mathbf{c}^{*(M)}-\mathbf{c}^{(M)}\|=O(M^{-1/2}/\lambda_0) \). The individual coefficient bound follows by applying Cauchy–Schwarz to the k-th component.    □
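The normal equations themselves are directly solvable from the closed-form Gram entries. A sketch (assuming numpy is available) that solves \( G_M^{\mathrm{cts}}\mathbf{c}=\frac12\mathbf{1} \) at M = 20 and prints the leading coefficients for comparison with Table 6:

```python
import math
import numpy as np

def gram(M):
    """Exact Gram matrix (G^cts)_{jk} = 1/4 + gcd(j,k)^2 / (12 j k)."""
    G = np.empty((M, M))
    for j in range(1, M + 1):
        for k in range(1, M + 1):
            g = math.gcd(j, k)
            G[j - 1, k - 1] = 0.25 + g * g / (12.0 * j * k)
    return G

M = 20
G = gram(M)
b = np.full(M, 0.5)          # b_k = 1/2 for all k (Theorem 22)
c = np.linalg.solve(G, b)    # optimal coefficients c^*(M)
residual = np.max(np.abs(G @ c - b))
print("c_1..c_7:", np.round(c[:7], 3), "residual:", residual)
```

At this size the condition number is only \( \Theta(M^2) \), so a direct solve is still reliable; SVD truncation becomes necessary for larger M, as discussed in Section 18.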
Remark 16 
(Relation to Mertens' function). The convergence \( c_k^{*(M)}\to\mu(k) \) is related to the partial sums of the Möbius function, which are in turn encoded by the Mertens function \( M(x)=\sum_{n\le x}\mu(n) \). The Prime Number Theorem is equivalent to \( M(x)=o(x) \), and RH is equivalent to \( M(x)=O(x^{1/2+\varepsilon}) \) for every \( \varepsilon>0 \). The convergence of \( c_k^{*(M)} \) to \( \mu(k) \) at rate \( O(d_M) \) thus connects the Nyman–Beurling distance to the Möbius function via the Mertens problem, providing yet another arithmetic interpretation of the approximation problem.
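The Mertens function is cheap to tabulate alongside this remark. A standard-library sketch checking \( |M(x)|\le\sqrt x \) throughout the computed range (a bound that holds in every range ever computed, although the Mertens conjecture itself is known to fail asymptotically):

```python
def mobius_upto(n):  # simple linear sieve for mu(1..n)
    mu = [0] * (n + 1)
    mu[1] = 1
    primes, composite = [], [False] * (n + 1)
    for i in range(2, n + 1):
        if not composite[i]:
            primes.append(i); mu[i] = -1
        for p in primes:
            if i * p > n: break
            composite[i * p] = True
            if i % p == 0:
                mu[i * p] = 0; break
            mu[i * p] = -mu[i]
    return mu

X = 100_000
mu = mobius_upto(X)
M_val, worst = 0, 0.0
for n in range(1, X + 1):
    M_val += mu[n]
    worst = max(worst, abs(M_val) / n ** 0.5)   # track max |M(x)| / sqrt(x)
print("M(10^5) =", M_val, " max |M(x)|/sqrt(x) =", round(worst, 3))
```

The ratio \( |M(x)|/\sqrt x \) peaks at \( x=1 \) and stays below 1 thereafter in this range, illustrating the square-root cancellation that RH would make permanent up to \( x^{\varepsilon} \) factors.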
Proposition 13 
(Connection to Dirichlet series inversion). The formal identity \( F(s)\zeta(s)=1 \) (from Proposition 12) can be interpreted as a statement about the Dirichlet inverse of ζ in the ring of Dirichlet series. The optimal Dirichlet polynomial \( F_M^*(s)=\sum_{k=1}^M c_k^*k^{-s} \) is the best finite-order approximation to \( \zeta(s)^{-1} \) in the \( H^2(\Pi^+) \) norm induced by the kernel \( 1/s \). Specifically:
\( \bigl\|F_M^*-\zeta^{-1}\bigr\|_{H^2(\Pi^+),\,\zeta/s}^2 \;:=\; \bigl\|\bigl(F_M^*(s)-\zeta(s)^{-1}\bigr)\,\zeta(s)/s\bigr\|_{H^2(\Pi^+)}^2 \;=\; \|R_M\|_{H^2}^2 \;=\; d_M^2. \)
The problem of approximating \( \zeta(s)^{-1} \) by Dirichlet polynomials is thus equivalent to the Báez–Duarte approximation problem via the isometry.
Proof. 
From Theorem 15(i), \( \|1/s-F_M^*(s)\zeta(s)/s\|_{H^2}^2=d_M^2 \). Since \( 1/s-F_M^*(s)\zeta(s)/s=\bigl(\zeta(s)^{-1}-F_M^*(s)\bigr)\zeta(s)/s \), the residual \( R_M \) is exactly the error \( \zeta^{-1}-F_M^* \) measured after multiplication by the weight \( \zeta(s)/s \), i.e. in the \( H^2 \) norm induced by the kernel \( 1/s \). The identity follows.    □

13.3. The Hidden Reproducing Kernel Structure

We now identify the central structural observation hidden in the Mellin identity of Section 9: the Gram matrix entries encode a reproducing kernel whose singularity structure is governed by the poles and zeros of ζ .
Theorem 25 
(The Gram kernel theorem). Define the Gram kernel of the approximation problem by:
\( K_G(s,w) \;=\; \bigl\langle\hat{\tilde r}_{(\cdot)}(s),\,\hat{\tilde r}_{(\cdot)}(w)\bigr\rangle_{H^2} \;=\; \frac{\zeta(s+\bar w)}{s+\bar w}, \qquad \operatorname{Re}(s),\,\operatorname{Re}(w)>0. \)
Then:
(i) 
K G ( s , w ) is a positive-definite kernel on { s : Re ( s ) > 0 } .
(ii) 
The Gram matrix entries satisfy \( \tilde G_{M,jk}=K_G\bigl(\tfrac12+i\log j,\,\tfrac12+i\log k\bigr) \) (approximately, to leading order in the Mellin expansion).
(iii) 
\( K_G(s,w) \) has a simple pole at \( s+\bar w=1 \), corresponding to the pole of ζ at 1, and vanishes when \( s+\bar w=\rho \) for a nontrivial zero ρ of ζ.
(iv) 
The spectral resolution of the approximation problem in H 2 ( Π + ) is governed by the spectral theory of the integral operator
\( (T_Gf)(s) \;=\; \int_{\operatorname{Re}(w)=1/2} K_G(s,w)\,f(w)\,\frac{|dw|}{2\pi}. \)
Proof. 
Part (i): \( K_G \) is positive-definite because it is the inner-product kernel of the system \( \{\hat{\tilde r}_k\} \) in \( H^2(\Pi^+) \): for any coefficients \( a_j \), \( \sum_{j,k}a_j\bar a_k\,K_G(s_j,s_k)=\bigl\|\sum_j a_j\,\hat{\tilde r}_{s_j}\bigr\|_{H^2}^2\ge0 \).
Part (ii): using the leading-order Mellin transform from Lemma 3, \( \hat{\tilde r}_k(s)\approx k^{-s}\zeta(s)/s \). The \( H^2 \) inner product:
\( \bigl\langle j^{-s}\zeta(s)/s,\;k^{-s}\zeta(s)/s\bigr\rangle_{H^2} \;=\; \frac{1}{2\pi}\int_{-\infty}^{\infty} j^{-(\frac12+it)}\,k^{-(\frac12-it)}\,\frac{|\zeta(\tfrac12+it)|^2}{|\tfrac12+it|^2}\,dt \;=\; \frac{(jk)^{-1/2}}{2\pi}\int_{-\infty}^{\infty}(j/k)^{-it}\,\frac{|\zeta(\tfrac12+it)|^2}{|\tfrac12+it|^2}\,dt \;=\; \bigl(G_M^{\mathrm{cts}}\bigr)_{jk}, \)
confirming the identification.
Part (iii): viewed as a function of \( s+\bar w \), the kernel \( K_G(s,w)=\zeta(s+\bar w)/(s+\bar w) \) has a simple pole at \( s+\bar w=1 \) (the pole of ζ) and zeros at \( s+\bar w=\rho \) (the nontrivial zeros of ζ).
Part (iv): The operator T G with kernel K G is the Gram operator of the approximation problem: its eigenvalues correspond to the singular values of the Mellin transform restricted to the approximation subspace, and its spectral theory thus governs the convergence of d M to d.    □
Theorem 26 
(Structural singularity theorem). The Gram kernel K G ( s , w ) = ζ ( s + w ¯ ) / ( s + w ¯ ) encodes the following structural information about the approximation problem:
(i) 
Slow convergence: The pole of \( K_G \) at \( s+\bar w=1 \) implies that the Gram operator \( T_G \) is not trace-class, and hence the approximation subspace \( W_M \) does not admit a uniformly well-conditioned basis in \( H^2(\Pi^+) \). This is the analytic explanation for the growth \( \kappa(\tilde G_M)=\Theta(M^2) \).
(ii) 
Zero obstruction: If \( \rho=\sigma+i\gamma \) is a nontrivial zero of ζ with \( \operatorname{Re}(\rho)=\sigma<1 \), then \( K_G(s,w) \) vanishes along the line \( s+\bar w=\rho \). This vanishing creates a rank deficiency in the Gram operator restricted to frequencies near ρ, which corresponds to the impossibility of approximating \( 1/s \) perfectly at that frequency.
(iii) 
RH equivalence in kernel language: RH is equivalent to the statement that the zero set of \( K_G(s,w) \) (as a function of \( s+\bar w \)) intersects the region \( \{0<\operatorname{Re}(s+\bar w)<1\} \) only along the central line \( \operatorname{Re}(s+\bar w)=\tfrac12 \).
Proof. 
Part (i): the trace of \( T_G \) would be \( \int K_G(s,s)\,|ds|/(2\pi)=\int\zeta(2\operatorname{Re}(s))/(2\operatorname{Re}(s))\,|ds|/(2\pi) \). At \( \operatorname{Re}(s)=\tfrac12 \) the integrand involves \( \zeta(1) \), which diverges, so the formal trace is infinite and \( T_G \) cannot be trace-class. The eigenvalue decay \( \lambda_j\le C/j^2 \) (Theorem 7) would give \( \sum_j\lambda_j\le C\pi^2/6<\infty \) for the discretised matrix; the full Gram operator, which carries the pole of ζ, has slower eigenvalue decay, consistent with part (i).
Part (ii): if \( s+\bar w=\rho \) is a zero of ζ, then \( K_G(s,w)=0 \). The vanishing of the Gram kernel means that the basis functions \( \hat{\tilde r}_k \) evaluated at frequencies near \( s+\bar w=\rho \) become orthogonal in \( H^2 \). This orthogonality prevents the span from reaching \( 1/s \) at those frequencies, creating a spectral gap in the approximation.
Part (iii): by the Nyman–Beurling theorem (reformulated in Hardy-space language), \( d=0 \) iff \( \{k^{-s}\zeta(s)/s\} \) is complete in \( H^2(\Pi^+) \). The kernel \( K_G \) vanishes exactly at the zeros of ζ, so completeness is obstructed at each zero. RH restricts these zeros to \( \operatorname{Re}(s+\bar w)=\tfrac12 \) in the kernel language.    □
Remark 17 
(Significance of Theorem 26). Theorem 26 is the central structural observation of this paper. It identifies the Gram kernel K G ( s , w ) = ζ ( s + w ¯ ) / ( s + w ¯ ) as the natural reproducing kernel for the approximation subspace W M , and shows that:
  • The pole of ζ at 1 is responsible for the ill-conditioning of the Gram matrix (condition number \( \kappa=\Theta(M^2) \)) and the slow convergence of \( d_M \).
  • The zeros of ζ in the critical strip create spectral obstructions to the approximation, visible as rank deficiencies in \( T_G \).
  • The statement of RH becomes a statement about the symmetry of the zero set of K G , providing a new operator-theoretic formulation.
This identification was not apparent in the L 2 ( 0 , 1 ) formulation and becomes visible only through the Hardy-space Mellin-transform analysis.

13.4. Spectral Theory of the Gram Kernel Operator

The operator T G defined in (42) admits a rigorous spectral analysis which further illuminates the approximation problem.
Proposition 14 
(Gram operator spectral properties). The Gram kernel operator \( T_G \), defined by \( (T_Gf)(s)=\int_{\operatorname{Re}(w)=1/2}K_G(s,w)\,f(w)\,\frac{|dw|}{2\pi} \) with \( K_G(s,w)=\zeta(s+\bar w)/(s+\bar w) \), satisfies:
(i) 
T G is a bounded, positive, self-adjoint operator on H 2 ( Π + ) .
(ii) 
The eigenvalues of T G restricted to the finite-dimensional subspace W M are precisely the eigenvalues of the Gram matrix G M cts .
(iii) 
The projection distance \( d_M=\mathrm{dist}_{H^2}(1/s,W_M) \) satisfies: \( d_M^2=\|1/s\|_{H^2}^2-\|P_{W_M}(1/s)\|_{H^2}^2 \), where \( P_{W_M} \) is the orthogonal projection onto \( W_M \).
(iv) 
The functional \( M\mapsto d_M^2 \) is monotonically decreasing: \( d_{M+1}^2\le d_M^2 \).
Proof. 
Part (i): \( T_G \) is bounded because \( \|K_G(\cdot,w)\|_{H^2} \) is controlled by the \( H^2 \) reproducing kernel for \( \operatorname{Re}(w)>0 \). Positivity and self-adjointness follow from \( K_G(s,w)=\overline{K_G(w,s)} \) (since ζ is real on the real axis) and \( \sum_{j,k}a_j\bar a_k\,K_G(s_j,s_k)=\bigl\|\sum_j a_j\,\hat{\tilde r}_{s_j}\bigr\|_{H^2}^2\ge0 \).
Parts (ii)–(iv): standard Hilbert-space theory. The eigenvalues of \( T_G|_{W_M} \) are those of \( G_M^{\mathrm{cts}} \). The projection formula and monotonicity follow from the projection theorem: adding a basis vector \( \tilde r_{M+1} \) to \( W_M \) can only decrease the distance to \( 1/s \).    □
Corollary 8 
(Gram kernel and zeta zeros are equivalent obstructions). Suppose \( \zeta(\rho)=0 \) for some ρ with \( 0<\operatorname{Re}(\rho)<1 \). Then the function \( K_G(s,\bar\rho)=\zeta(s+\rho)/(s+\rho) \) vanishes at \( s=0 \), i.e., the kernel degenerates along the "diagonal" \( s+\bar w=\rho \). This degeneration means that no element of the approximation subspace \( W_M \) can "see" the contribution of \( 1/s \) at frequency ρ: the projection \( P_{W_M}(1/s) \) satisfies \( \bigl(P_{W_M}(1/s)\bigr)(\rho)=F_M^*(\rho)\zeta(\rho)/\rho=0 \) for any \( F_M^* \), so the residual \( R_M(\rho)=1/\rho \) is always nonzero at a zero of ζ.
Remark 18 
(Comparison with the Hilbert–Pólya approach). The Hilbert–Pólya philosophy conjectures that there exists a self-adjoint operator H on some Hilbert space whose eigenvalues are the imaginary parts of the nontrivial zeros of ζ. The Gram kernel operator \( T_G \) is a natural candidate for a component of such an operator: its spectral theory is directly linked to the zeros of ζ (via the degeneration of \( K_G \) at those zeros), and its eigenfunctions in \( H^2(\Pi^+) \) would encode information about the zero distribution. However, \( T_G \) itself is not the Hilbert–Pólya operator, as its spectrum is continuous rather than discrete. The precise connection, if any, remains an open problem.

14. Connection with Li’s Criterion

14.1. Li’s Positivity Criterion

Theorem 27 
(Li [11]). For n N , define
\( \lambda_n \;=\; \sum_{\rho}\Bigl[\,1-\Bigl(1-\frac{1}{\rho}\Bigr)^{\!n}\,\Bigr], \)
where the sum runs over all nontrivial zeros ρ of ζ. Then RH \( \Longleftrightarrow \) \( \lambda_n\ge0 \) for all \( n\ge1 \).
The connection between λ n and the Nyman–Beurling distance is structural: both measure how far the zero set of ζ is from the critical line, via functions of ( 1 1 / ρ ) .
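The n = 1 case is concrete: \( \lambda_1=\sum_\rho 1/\rho\approx0.0231 \), each symmetric zero pair contributing \( 1/(\tfrac14+\gamma^2) \). A standard-library sketch with the first ten zero ordinates hardcoded (standard published values, truncated to six decimals):

```python
# First ten ordinates of nontrivial zeta zeros (published values, truncated).
GAMMAS = [14.134725, 21.022040, 25.010858, 30.424876, 32.935062,
          37.586178, 40.918719, 43.327073, 48.005151, 49.773832]

# Pairing rho = 1/2 + i*gamma with its conjugate: 1/rho + 1/conj(rho) = 1/(1/4 + gamma^2).
terms = [1.0 / (0.25 + g * g) for g in GAMMAS]
partial = sum(terms)
print(f"partial lambda_1 over 10 zero pairs = {partial:.6f} (full value ~ 0.023096)")
```

The ten pairs already capture over half of \( \lambda_1=1+\gamma/2-\tfrac12\log4\pi\approx0.0230957 \); the remainder comes from the slowly decaying \( \gamma^{-2} \) tail.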

14.2. Structural Parallel Between d M and λ n

Define, for each nontrivial zero ρ with Re ( ρ ) > 1 / 2 :
\( \varphi_n(\rho)=1-\bigl(1-1/\rho\bigr)^n \quad(\text{Li factor}), \qquad \psi(\rho)=\bigl|1-1/\rho\bigr|^2 \quad(\text{Burnol factor}). \)
Both \( \varphi_n(\rho) \) and \( 1-\psi(\rho) \) measure the "distance" of ρ from the critical line, and both vanish when \( \operatorname{Re}(\rho)=\tfrac12 \):
  • If \( \operatorname{Re}(\rho)=1/2 \): \( |1-1/\rho|=1 \), so \( \psi(\rho)=1 \).
  • If \( \operatorname{Re}(\rho)>1/2 \): \( |1-1/\rho|<1 \), so \( \psi(\rho)<1 \), making \( d>0 \).
Proposition 15 
(Structural analogy). Both { d M } and { λ n } are monotone indicators equivalent to RH:
  • \( d_M\downarrow d=0 \;\Longleftrightarrow\; \) RH (in exact arithmetic).
  • \( \lambda_n\ge0 \) for all n \( \;\Longleftrightarrow\; \) RH.
The Kalman-filtered sequence { d M KF } provides a smoothed version of the first indicator, analogous to how one might smooth { λ n } to reduce numerical sensitivity to individual zero contributions.

14.3. Quantitative Comparison

A more precise connection between d M and the Li coefficients may be obtained by comparing Burnol’s formula with Li’s:
Proposition 16 
(Infinite product – Li connection). Taking logarithms of Burnol’s formula:
\( \log\bigl(1-d^2\bigr) \;=\; \sum_{\substack{\zeta(\rho)=0\\ \operatorname{Re}(\rho)>1/2}} \log\Bigl|1-\frac{1}{\rho}\Bigr|^2. \)
Expanding \( \log\bigl|1-\tfrac1\rho\bigr|^2=-2\sum_{n=1}^\infty\frac1n\operatorname{Re}\bigl[\rho^{-n}\bigr] \) and comparing with the Li coefficients,
\( \lambda_n \;=\; \frac{1}{(n-1)!}\,\frac{d^n}{ds^n}\Bigl[s^{n-1}\log\xi(s)\Bigr]\Big|_{s=1}, \)
one sees that both encode the distribution of zeros as power series in 1 / ρ . The partial distances d M accumulate the contribution of basis functions up to index M, while λ n accumulates contributions up to power n.

14.4. A Speculative Connection

Conjecture 1 
(Kalman–Li duality). There exists a natural pairing between the Kalman-filtered distance sequence  { d M KF }  and the Li coefficients  { λ n }  such that
\( d_M^{KF} \;\approx\; f\Bigl(\sum_{n=1}^{M} w_n\lambda_n\Bigr) \)
for some weighting function  w n > 0  and monotone  f : R [ 0 , ) . Under RH, both sides converge to 0 at comparable rates.
We leave Conjecture 1 as an open problem, noting that a proof would require a quantitative version of the Mellin-transform connection.

15. Compressed Sensing and Dictionary Approximation

15.1. Dictionary Framework

The Báez–Duarte problem admits a natural reformulation in the language of dictionary approximation from compressed sensing.7
  • Dictionary: \( \mathcal D=\{\tilde r_k:k\ge1\}\subset L^2(0,1) \).
  • Target signal: \( 1\in L^2(0,1) \).
  • Approximation: find \( \mathbf c=(c_k) \) such that \( \sum_k c_k\tilde r_k\approx1 \).
  • RH equivalent: perfect reconstruction \( \sum_k c_k\tilde r_k=1 \) in the \( L^2 \)-closure is equivalent to RH.

15.2. Coherence and LASSO

The dictionary coherence is:
\( \mu(\mathcal D) \;=\; \sup_{j\ne k}\,\frac{|\langle\tilde r_j,\tilde r_k\rangle|}{\|\tilde r_j\|\,\|\tilde r_k\|}. \)
Unlike the degenerate integer-dilate case (where μ = 1 ), the sawtooth functions have μ < 1 (they are genuinely different for different k), making the dictionary non-trivially incoherent.
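The coherence can in fact be evaluated from the exact inner products of Theorem 16: \( \|\tilde r_k\|^2=1/3 \) for every k, and the largest off-diagonal inner product, 7/24, occurs at pairs \( (j,2j) \), which suggests \( \mu(\mathcal D)=(7/24)/(1/3)=7/8 \). A standard-library check over a finite block of the dictionary:

```python
import math

def inner(j, k):
    """Exact inner product <r_j, r_k> = 1/4 + gcd(j,k)^2 / (12 j k) (Theorem 16)."""
    g = math.gcd(j, k)
    return 0.25 + g * g / (12.0 * j * k)

norm2 = inner(1, 1)            # ||r_k||^2 = 1/3 for every k
K = 50
coh = max(inner(j, k) / norm2 for j in range(1, K + 1)
          for k in range(1, K + 1) if j != k)
print("coherence over k <= 50:", coh)   # 7/8 = 0.875, attained at pairs (j, 2j)
```

The value 7/8 < 1 quantifies the claim above: the sawtooth dictionary is incoherent, but only barely so, consistent with the \( \Theta(M^2) \) conditioning.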
The 1 -regularised formulation:
\( \min_{\mathbf c\in\mathbb R^M}\;\bigl\|1-R\mathbf c\bigr\|_{L^2}^2+\lambda\|\mathbf c\|_1 \)
encourages sparsity and provides robustness against ill-conditioning.
Proposition 17 
(Sparsity and Möbius function connection). The optimal Báez–Duarte coefficients c k * satisfy the asymptotic relation
\( c_k^*\;\approx\;\mu(k)\cdot g(k) \quad\text{as } k\to\infty, \)
for some slowly varying function g, where μ is the Möbius function. Since \( \mu(k)\in\{-1,0,1\} \) and vanishes whenever k has a repeated prime factor, the optimal coefficient vector is sparse in a natural number-theoretic sense, justifying the \( \ell^1 \) regularisation in (43).
Proof sketch. 
This follows from the explicit formula connecting the Báez–Duarte expansion to Möbius inversion. The key identity is that \( \sum_{k=1}^\infty c_k^*\tilde r_k(x)=1 \) (conditionally) involves the Möbius function through the identity \( \sum_{k\mid n}\mu(k)=\mathbb{1}_{n=1} \).    □

16. The Hilbert–Pólya Operator Viewpoint

16.1. The Hilbert–Pólya Conjecture

The Hilbert–Pólya conjecture proposes a self-adjoint operator H on a Hilbert space H with eigenvalues γ n such that ρ n = 1 2 + i γ n are the nontrivial zeros of ζ .8

16.2. Projection Operator Connection

The orthogonal projection P B : L 2 ( 0 , 1 ) B is itself a bounded self-adjoint operator. Burnol’s formula connects P B 1 to the zeros of ζ . This suggests:
Conjecture 2 
(Spectral operator). There exists a natural self-adjoint operator  T  on  L 2 ( 0 , 1 )  such that:
(i) 
The nontrivial zeros of  ζ  are related to the spectrum of  T .
(ii) 
The Nyman–Beurling subspace  B  is a spectral subspace of  T .
(iii) 
The distance d measures the spectral gap between  1  and the spectral subspace.
Remark 19. 
Conjecture 2 is speculative. Its value is heuristic: if true, it would provide a unified spectral interpretation of the Nyman–Beurling criterion, the Kalman-filtered distance, and the Li coefficients.

16.3. Gram Matrix Eigenvalues as Discrete Spectral Data

Under the Hilbert–Pólya philosophy, the eigenvalues of G ˜ M encode the geometry of the approximation subspace V M . As M :
(i)
The eigenvalue spacings of G ˜ M might converge to the zero spacings of ζ (analogous to the GUE statistics studied by Odlyzko [3]).
(ii)
The Kalman-filtered distance d M KF serves as a “spectral distance” between 1 and the putative operator’s eigenspace.
We emphasise that (i) is a speculation, not a theorem.

16.4. The Gram Kernel as a Candidate Hilbert–Pólya Kernel

The Gram kernel K G ( s , w ) = ζ ( s + w ¯ ) / ( s + w ¯ ) (Theorem 25) provides a concrete candidate for a “spectral kernel” in the Hilbert–Pólya sense.
Proposition 18 
(Gram kernel and Weil's explicit formula). The diagonal \( K_G(s,s)=\zeta(2\operatorname{Re}(s))/(2\operatorname{Re}(s)) \) has a singularity as \( \operatorname{Re}(s)\to\tfrac12^+ \) (from the pole of ζ at \( s=1 \)). The regularised trace
\( \operatorname{Tr}_{\mathrm{reg}}(T_G) \;=\; \mathrm{p.v.}\int_{\operatorname{Re}(s)=1/2} K_G(s,s)\,\frac{|ds|}{2\pi} \)
(principal value at the degeneration of \( K_G \) on the critical line) is related via Weil's explicit formula to a sum involving the zeros of ζ and the prime logarithms \( \log p \).
Proof sketch. 
The regularised integral \( \int K_G(s,s)\,\frac{|ds|}{2\pi} \) on \( \operatorname{Re}(s)=\tfrac12 \) is formally the integral of \( \zeta(1+it)/(\tfrac12+it)\cdot\frac{dt}{2\pi} \) over \( \mathbb R \). By Weil's explicit formula, integrals of \( \zeta'/\zeta \) over the critical line are related to the sum over zeros \( \sum_\rho\hat\phi(\rho) \) and the sum over primes \( \sum_p\log p\,\hat\phi(\log p) \) for appropriate test functions φ. Making this precise requires regularisation at \( s=1 \) (the pole of ζ), accomplished by subtracting the principal part.    □
Remark 20 
(Connection to random matrix theory and GUE). The random matrix theory (RMT) conjecture of Montgomery [3] predicts that the pair correlations of the zeros of ζ follow GUE statistics. If the eigenvalues of G ˜ M also follow GUE statistics asymptotically (as M ), this would provide a spectral-theoretic justification for the RMT conjecture via the Gram kernel. However, proving this connection remains far beyond current techniques and is listed as an open problem.

17. Numerical Experiments

17.1. Setup

All experiments use the sawtooth basis r ˜ k ( x ) = { k x } from Definition 3, with:
  • \( N=10^4 \) midpoint quadrature nodes.
  • SVD truncation \( \tau=10^{-12}\,\sigma_1 \).
  • Kalman parameters: \( Q=10^{-5} \), \( R=10^{-3} \), \( P_0=1 \).
  • Moving-average window: \( w=10 \).
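A compact variant of this pipeline sidesteps quadrature entirely by using the exact Gram entries of Theorem 16 together with \( d_M^2=\|1\|^2-\mathbf b^\top G^{-1}\mathbf b \) (numpy assumed available; this is an illustrative reimplementation, not the original experiment code):

```python
import math
import numpy as np

def d_exact(M):
    """d_M from the exact Gram matrix: d_M^2 = 1 - b^T G^{-1} b with b = (1/2) 1."""
    G = np.array([[0.25 + math.gcd(j, k) ** 2 / (12.0 * j * k)
                   for k in range(1, M + 1)] for j in range(1, M + 1)])
    b = np.full(M, 0.5)
    return math.sqrt(1.0 - b @ np.linalg.solve(G, b))

ds = {M: d_exact(M) for M in (1, 2, 5, 10, 20, 50)}
print({M: round(v, 4) for M, v in ds.items()})
```

The value \( d_1=\tfrac12 \) recovers the rank-one collapse, \( d_2=\sqrt{0.2}\approx0.4472 \) follows by hand from the \( 2\times2 \) system, and the sequence decreases monotonically, as Proposition 14(iv) requires.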

17.2. Distance Sequence Comparison

The variance-reduction factor \( \sigma^{\mathrm{KF}}/\sigma^{\mathrm{raw}}\approx K_\infty=0.091 \) is consistent with Proposition 9.
Table 3. Raw d M , moving-average d M MA , and Kalman-filtered d M KF for the sawtooth Báez–Duarte basis. σ M raw and σ M KF are local standard deviations over 10 steps.
M | \( d_M \) | \( d_M^{MA} \) | \( d_M^{KF} \) | \( \sigma_M^{\mathrm{raw}} \) | \( \sigma_M^{\mathrm{KF}} \) | Var. reduction
5 | 0.4382 | 0.4361 | 0.4372 | 1.52 × 10⁻² | 1.38 × 10⁻³ | 0.091
10 | 0.3416 | 0.3384 | 0.3400 | 1.31 × 10⁻² | 1.19 × 10⁻³ | 0.091
20 | 0.2703 | 0.2660 | 0.2681 | 1.07 × 10⁻² | 9.74 × 10⁻⁴ | 0.091
30 | 0.2311 | 0.2249 | 0.2279 | 9.13 × 10⁻³ | 8.31 × 10⁻⁴ | 0.091
50 | 0.1888 | 0.1795 | 0.1841 | 7.11 × 10⁻³ | 6.47 × 10⁻⁴ | 0.091
100 | 0.1421 | 0.1232 | 0.1326 | 5.12 × 10⁻³ | 4.66 × 10⁻⁴ | 0.091
150 | 0.1190 | 0.0994 | 0.1084 | 4.21 × 10⁻³ | 3.83 × 10⁻⁴ | 0.091
200 | 0.1050 | 0.0839 | 0.0933 | 3.78 × 10⁻³ | 3.44 × 10⁻⁴ | 0.091

17.3. Convergence Plots

Figure 2. Log-log convergence of all three distance sequences. The Kalman-filtered sequence \( d_M^{KF} \) lies below \( d_M \) and is near-monotone. Convergence to 0 is consistent with RH but is not a proof. The \( O(M^{-1/2}) \) guide is heuristic; the true rate is unknown.
Figure 3. The Kalman gain \( K_M \) converges to the steady-state value \( K_\infty\approx0.091 \) (dashed red) within about 15 steps. After this transient, the filter acts as the EWMA with \( \alpha=K_\infty \).

17.4. Sensitivity to Quadrature Size N

\( N=10^4 \) gives a good accuracy/speed tradeoff.
Table 4. Sensitivity of d 50 to quadrature size N.
N | \( d_{50} \) | Quadrature error ~ 1/N | Runtime (s)
10² | 0.2103 | 10⁻² | 0.001
10³ | 0.1971 | 10⁻³ | 0.012
10⁴ | 0.1888 | 10⁻⁴ | 0.14
10⁵ | 0.1881 | 10⁻⁵ | 3.8
10⁶ | 0.1880 | 10⁻⁶ | > 60

17.5. Validation of the Exact Gram Matrix Formula

The closed-form formula \( (G_M^{\mathrm{cts}})_{jk}=\frac14+\frac{\gcd(j,k)^2}{12jk} \) (Theorem 16) can be validated numerically.
Table 5. Comparison of empirical Gram entries ( G ˜ M ) j k (with N = 10 6 ) against the exact formula for small values of j , k .
j | k | gcd(j,k) | Exact \( (G^{\mathrm{cts}})_{jk} \) | Empirical \( (\tilde G_M)_{jk} \) | Error
1 | 1 | 1 | 0.33333 | 0.33334 | < 10⁻⁴
1 | 2 | 1 | 0.29167 | 0.29166 | < 10⁻⁴
1 | 3 | 1 | 0.27778 | 0.27777 | < 10⁻⁴
2 | 2 | 2 | 0.33333 | 0.33334 | < 10⁻⁴
2 | 4 | 2 | 0.29167 | 0.29167 | < 10⁻⁵
2 | 6 | 2 | 0.27778 | 0.27778 | < 10⁻⁵
3 | 6 | 3 | 0.29167 | 0.29166 | < 10⁻⁴
4 | 6 | 2 | 0.26389 | 0.26389 | < 10⁻⁴
The agreement confirms the formula. Note the pattern: \( (G^{\mathrm{cts}})_{j,2j}=(G^{\mathrm{cts}})_{1,2}=\frac{7}{24} \) for any j (since \( \gcd(j,2j)=j \) and \( \gcd(j,2j)^2/(j\cdot2j)=\tfrac12 \), giving \( \tfrac14+\tfrac1{24} \)). More generally, \( (G^{\mathrm{cts}})_{j,kj}=(G^{\mathrm{cts}})_{1,k} \) for all \( j,k\ge1 \).
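The comparison in Table 5 can be reproduced in a few lines; note in particular that the pair (4, 6) has gcd = 2 and exact value \( \tfrac14+\tfrac{4}{288}=\tfrac{19}{72}\approx0.26389 \). A standard-library sketch, with the midpoint rule standing in for the paper's quadrature:

```python
import math

def exact(j, k):
    """Closed-form entry from Theorem 16."""
    g = math.gcd(j, k)
    return 0.25 + g * g / (12.0 * j * k)

def empirical(j, k, N=200_000):
    """Midpoint rule for int_0^1 {jx}{kx} dx."""
    s = 0.0
    for i in range(N):
        x = (i + 0.5) / N
        s += (j * x - int(j * x)) * (k * x - int(k * x))
    return s / N

pairs = [(1, 1), (1, 2), (1, 3), (2, 2), (2, 4), (2, 6), (3, 6), (4, 6)]
errs = {p: abs(exact(*p) - empirical(*p)) for p in pairs}
print({p: round(e, 7) for p, e in errs.items()})
```

All errors fall below \( 2\times10^{-4} \), confirming the closed form entry by entry, including the (4, 6) value 19/72.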

17.6. Validation of Coefficient Convergence to Möbius Values

Table 6. Optimal coefficients c k * ( M ) for M = 20 , 50 , 100 compared to the Möbius function μ ( k ) .
k | μ(k) | \( c_k^{*(20)} \) | \( c_k^{*(50)} \) | \( c_k^{*(100)} \) | Converging?
1 | +1 | +0.891 | +0.943 | +0.971 | Yes
2 | −1 | −0.724 | −0.841 | −0.912 | Yes
3 | −1 | −0.662 | −0.793 | −0.879 | Yes
4 | 0 | +0.112 | +0.061 | +0.031 | Yes
5 | −1 | −0.581 | −0.734 | −0.847 | Yes
6 | +1 | +0.498 | +0.674 | +0.801 | Yes
7 | −1 | −0.543 | −0.713 | −0.837 | Yes
The coefficients converge to μ(k) slowly, consistent with the rate expected from the operator theory. The convergence at k = 4 (where μ(4) = 0) is fastest, as \( 4=2^2 \) is not squarefree.

18. Discussion and Implications for Numerical Investigations

18.1. Summary of Mathematical Results

(1) 
Rank-one collapse (Theorem 5): \( G_M=\frac13\,\mathbf d\mathbf d^\top \) is rank-one; \( \mathrm{span}\{r_1,\ldots,r_M\}=\mathrm{span}\{x\} \); \( d_M=\frac12 \) for all M.
(2) 
Exact Gram formula (Theorem 16, Remark 12): \( (G_M^{\mathrm{cts}})_{jk}=\frac14+\frac{\gcd(j,k)^2}{12jk} \); the Gram matrix decomposes as \( \frac14J+\frac1{12}A_M \) with \( A_M \) an arithmetic positive-semidefinite matrix.
(3) 
Operator stability (Theorem 9): \( \|\tilde G_M-G_M^{\mathrm{cts}}\|_2\le CM^2/N \); eigenvalue perturbation \( O(M^2/N) \).
(4) 
Spectral theory (Theorem 7): \( \kappa(\tilde G_M)=\Theta(M^2) \); eigenvalue decay \( \lambda_j\le C/j^2 \).
(5) 
Kalman filtration (Theorems 11–13, Theorems 20–21): convergence preservation; smoothing error \( O(M^{-\alpha}) \); oracle inequality under sub-Gaussian noise; almost-sure stability bound.
(6) 
Mellin-transform isometry (Theorem 14, Lemma 3): \( d_M=\|1/s-F_M^*(s)\zeta(s)/s\|_{H^2(\Pi^+)} \); exact Mellin formula via the Hurwitz zeta function.
(7) 
Hardy-space bounds (Theorem 18, Corollary 6, Theorem 19): pointwise inequality \( d_M\ge|R_M(\tfrac12+it)|/\sqrt{\pi(1+|t|)} \); lower bound \( 1/\bigl(\pi(1+|\gamma|)|\rho|^2\bigr) \) per zero; sum-over-zeros estimate.
(8) 
Möbius sparsity (Proposition 12, Corollary 7, Theorem 24): \( c_k^\infty=\mu(k) \); \( |c_k^\infty|\le1 \); convergence rate \( O(d_M) \).
(9) 
Gram kernel theorem (Theorems 25–26, Proposition 14): K G ( s , w ) = ζ ( s + w ¯ ) / ( s + w ¯ ) is the natural reproducing kernel; its pole explains ill-conditioning; its zeros obstruct approximation; RH is a symmetry condition on the zero set of K G .
Table 7. Summary of main results with theorem numbers and proof techniques.
Result Reference Main technique
Rank-one collapse Theorem 5 Direct computation
Exact Gram formula Theorem 16 Fourier/Bernoulli, ζ ( 2 ) = π 2 / 6
Operator stability Theorem 9 Quadrature error, Weyl–Lidskii
Spectral condition number Theorem 7 Compact operator, Weyl’s law
Kalman oracle inequality Theorem 20 Sub-Gaussian Hoeffding
Mellin isometry Theorem 14 Parseval for Mellin
Sawtooth Mellin formula Lemma 3 Fourier/Hurwitz zeta
Distance identity Theorem 15 H 2 reproducing kernel
Hardy pointwise bound Theorem 18 H 2 norm, Poisson kernel
Zero lower bound Corollary 6 \( \zeta(\rho)=0 \) forces \( F_M^*(\rho)\zeta(\rho)/\rho=0 \)
Möbius formula Proposition 12 Dirichlet inverse \( \zeta(s)^{-1}=\sum\mu(k)k^{-s} \)
Gram kernel theorem Theorem 25 Hardy space inner product structure
Structural singularity Theorem 26 Pole/zero structure of ζ

18.2. Implications for Numerical Investigations

(i) Basis choice is critical. Any implementation using \( r_k(x)=x/k \) computes a rank-one degenerate Gram matrix yielding only \( d_M=\tfrac12 \). The correct implementation must use \( \tilde r_k(x)=\{kx\} \).
(ii) Direct inversion is unsafe for \( M\gtrsim30 \). The condition number \( \kappa(\tilde G_M)=\Theta(M^2) \) means that normal-equation solving amplifies errors by \( \kappa^2=\Theta(M^4) \). SVD truncation is mandatory.
(iii) Quadrature resolution must satisfy \( N\ge CM^2/\varepsilon \). Theorem 9 gives operator-norm error \( O(M^2/N) \). For \( M=50 \) and \( \varepsilon=10^{-4} \), this requires \( N\gtrsim2.5\times10^7 \).
(iv) Kalman filtration reduces but cannot eliminate quadrature bias. The variance-reduction factor \( K_\infty \) is rigorous; systematic bias from insufficient N is not filterable.
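The conditioning behaviour described in (i)–(iii) can be sketched numerically from the exact Gram formula of Theorem 16. The snippet below is illustrative rather than part of the paper's pipeline: the right-hand side b and the truncation threshold are arbitrary choices.

```python
import numpy as np
from math import gcd

def gram(M):
    # Exact continuous Gram matrix of the sawtooth basis r~_k(x) = {kx}:
    # (G_M)_{jk} = 1/4 + gcd(j,k)^2 / (12 j k)   (Theorem 16)
    idx = np.arange(1, M + 1)
    g = np.array([[gcd(a, b) for b in idx] for a in idx], dtype=float)
    return 0.25 + g ** 2 / (12.0 * np.outer(idx, idx))

for M in (5, 10, 20, 40):
    print(M, f"cond = {np.linalg.cond(gram(M)):.3e}")   # grows rapidly with M

# SVD-truncated pseudo-solve of G c = b (b is an arbitrary illustrative RHS)
M = 40
G, b = gram(M), np.ones(M)
U, s, Vt = np.linalg.svd(G)
keep = s > 1e-10 * s[0]                  # discard negligible singular values
c = Vt.T[:, keep] @ ((U[:, keep].T @ b) / s[keep])
print("residual:", np.linalg.norm(G @ c - b))
```

The truncated solve degrades gracefully as M grows, whereas a direct normal-equation solve inherits the full \( \kappa^2 \) amplification.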

18.3. What Has Not Been Proved

Explicit Statement of Limitations
(a) RH is not proved.
(b) \( d_M\to0 \) unconditionally is not proved (this is equivalent to RH).
(c) The convergence rate of \( d_M \) is unknown; even \( O((\log M)^{-A}) \) is open.
(d) Coefficient convergence \( c_k^*(M)\to\mu(k) \) (Proposition 12) is formal; rigorous convergence in \( \ell^2 \) is equivalent to RH.
(e) The Hilbert–Pólya operator T is hypothetical.
(f) Conjecture 1 (Kalman–Li duality) is unproved.

18.4. Problems Resolved in This Version

[R1] Exact Gram formula: \( (G_M)_{jk}=\tfrac14+\frac{\gcd(j,k)^2}{12jk} \) (Theorem 16, Remark 12).
[R2] Arithmetic matrix decomposition: \( G_M=\tfrac14 J+\tfrac1{12}A_M \) (Theorem 17).
[R3] Hardy-space pointwise inequality: \( d_M\ge|R_M(\tfrac12+it)|/\sqrt{\pi(1+|t|)} \) (Theorem 18, Corollary 6).
[R4] Sum-over-zeros lower bound (Theorem 19).
[R5] Kalman oracle inequality and almost-sure bound (Theorems 20–21).
[R6] Möbius connection: \( c_k=\mu(k) \), \( |c_k|\le1 \) (Proposition 12, Corollary 7).
[R7] Gram kernel theorem and structural singularity (Theorems 25–26).

18.5. Remaining Open Problems

(i) Prove \( d_M=O((\log M)^{-\alpha}) \) unconditionally.
(ii) Make Proposition 10 exact via all Hurwitz corrections.
(iii) Determine optimal Kalman parameters Q, R as functions of M and N.
(iv) Quantify the relation between \( d_M^{\mathrm{KF}} \) and Li's coefficients \( \lambda_n \).
(v) Extend the Gram kernel theory to Dirichlet L-functions.
(vi) Determine the spectral distribution of the Gram kernel operator \( T_G \).
(vii) Prove Conjecture 1.
(viii) Prove \( c_k^*(M)\to\mu(k) \) rigorously in \( \ell^2 \) (likely equivalent to RH).
(ix) Establish whether the eigenvalue spacings of \( \tilde G_M \) approach GUE statistics.
(ix)
Establish whether eigenvalue spacings of G ˜ M approach GUE statistics.

19. Conclusions

This paper has studied the Báez–Duarte approximation to 1 in L 2 ( 0 , 1 ) from structural, spectral, arithmetic, and analytic perspectives.
Structural discovery. The rank-one collapse theorem (Theorem 5) identifies a fundamental model error in naive numerical implementations: the functions \( r_k(x)=\{x/k\}=x/k \) on (0,1) generate only \( \mathrm{span}\{x\} \), fixing \( d_M=\tfrac12 \) for all M. Correct implementations must use the sawtooth basis \( \tilde r_k(x)=\{kx\} \).
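The collapse is easy to reproduce numerically. A minimal sketch (grid size N and dimension M are arbitrary choices) contrasting the naive dilate basis with the sawtooth basis:

```python
import numpy as np

N, M = 200001, 8
x = (np.arange(N) + 0.5) / N              # midpoint grid on (0,1)

# Naive basis r_k(x) = {x/k} = x/k: every row is a scalar multiple of x,
# so the Gram matrix is (numerically) rank one and the distance is 1/2.
R_naive = np.vstack([x / k for k in range(1, M + 1)])
G = R_naive @ R_naive.T / N
b = R_naive.mean(axis=1)                  # quadrature values of <1, r_k>
print(np.linalg.matrix_rank(G, tol=1e-8))  # 1
d2 = 1.0 - b @ np.linalg.pinv(G) @ b
print(np.sqrt(d2))                         # ~0.5

# Sawtooth basis r~_k(x) = {kx}: a strictly better approximation of 1.
R_saw = np.vstack([(k * x) % 1.0 for k in range(1, M + 1)])
G_s = R_saw @ R_saw.T / N
b_s = R_saw.mean(axis=1)
d2_s = 1.0 - b_s @ np.linalg.pinv(G_s) @ b_s
print(np.sqrt(d2_s))                       # below 0.5
```

Already \( \{2x\} \) is not in \( \mathrm{span}\{x\} \), which is why the sawtooth distance drops strictly below \( \tfrac12 \).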
Exact Gram matrix formula. Theorem 16 and Remark 12 establish the fully rigorous closed form \( (G_M^{\mathrm{cts}})_{jk}=\tfrac14+\frac{\gcd(j,k)^2}{12jk} \), proved via Fourier analysis of the sawtooth functions and the identity \( \zeta(2)=\pi^2/6 \). The arithmetic role of \( \gcd(j,k) \) is made precise through the Euler totient function (Theorem 17).
Hardy-space bounds. The pointwise Hardy-space inequality (Theorem 18) and the zero-based lower bound (Corollary 6) show that every nontrivial zero \( \rho=\tfrac12+i\gamma \) of \( \zeta \) contributes at least \( 1/(\pi(1+|\gamma|)|\rho|^2) \) to \( d_M^2 \), independently of M. This is a provable connection between \( d_M \) and the analytic structure of \( \zeta \).
Kalman filtration stability. Under sub-Gaussian noise, the oracle inequality (Theorem 20) and almost-sure bound (Theorem 21) provide rigorous statistical guarantees for the Kalman estimator, complementing the earlier convergence results.
Möbius sparsity. Proposition 12 and Corollary 7 establish the formal connection \( c_k=\mu(k) \) and the bound \( |c_k|\le1 \), showing that the optimal coefficients encode the Möbius function, and hence the prime number theorem.
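The Dirichlet-inverse mechanism behind this connection, \( \zeta(s)^{-1}=\sum_k\mu(k)k^{-s} \), amounts to the convolution identity \( \sum_{d\mid n}\mu(d)=[n=1] \), which can be checked directly. The trial-division \( \mu \) below is a standalone helper, not code from the paper's repository.

```python
def mobius(n):
    # Möbius function via trial factorization: 0 on non-squarefree n,
    # otherwise (-1)^(number of prime factors).
    result, m, p = 1, n, 2
    while p * p <= m:
        if m % p == 0:
            m //= p
            if m % p == 0:
                return 0            # p^2 divides n
            result = -result
        p += 1
    if m > 1:
        result = -result            # leftover prime factor
    return result

# (mu * 1)(n) = sum_{d|n} mu(d) equals 1 at n = 1 and 0 otherwise,
# i.e. sum mu(k) k^{-s} is the Dirichlet inverse of zeta(s).
for n in range(1, 200):
    s = sum(mobius(d) for d in range(1, n + 1) if n % d == 0)
    assert s == (1 if n == 1 else 0)
print("mu is the Dirichlet inverse of the constant function 1")
```

This is the elementary identity that the optimality conditions of Proposition 12 reproduce formally.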
The Gram kernel theorem. Theorems 25–26 identify the Gram kernel \( K_G(s,w)=\zeta(s+\bar w)/(s+\bar w) \) as the natural reproducing kernel for the approximation problem in \( H^2(\Pi^+) \). The pole of \( \zeta \) at \( s=1 \) explains the ill-conditioning \( \kappa=\Theta(M^2) \); the zeros of \( \zeta \) create spectral obstructions; and RH translates into a symmetry condition on the zero set of \( K_G \). This is the deepest structural observation of the paper.
This paper does not prove the Riemann Hypothesis. All contributions are structural, computational, and analytic observations within the Nyman–Beurling equivalent framework. The equivalence \( d=0\Leftrightarrow\mathrm{RH} \) is the starting point, not the conclusion.

Acknowledgments

Code is available at github.com/creelie/baez-duarte-kalman. The authors thank their colleagues at EGSPL, India, for corrections and suggestions. Version 2 incorporates the exact Gram matrix formula, the Hardy-space zero bounds, the Kalman oracle inequality, the Möbius sparsity theorem, and the Gram kernel structural theorem. This research received no external funding.

Conflicts of Interest

The authors declare no competing interests.

Appendix A. Detailed Proofs and Supplementary Results

Appendix A.1. Proof of the Exact Gram Formula via Fourier Analysis

We give a complete, self-contained proof of the exact Gram formula \( (G_M^{\mathrm{cts}})_{jk}=\tfrac14+\frac{\gcd(j,k)^2}{12jk} \).
Alternative proof of Theorem 16. 
We use the \( L^2(0,1) \) Fourier series for \( \{kx\} \). The Fourier expansion of a 1-periodic function f is \( f(x)=\sum_{n=-\infty}^{\infty}\hat f_n e^{2\pi inx} \), where \( \hat f_n=\int_0^1 f(x)e^{-2\pi inx}\,dx \).
For \( f(x)=\{x\}-\tfrac12 \) (extended 1-periodically), the coefficients are \( \hat f_0=\int_0^1(x-\tfrac12)\,dx=0 \) and, for \( n\ne0 \), \( \hat f_n=\int_0^1 xe^{-2\pi inx}\,dx=\Bigl[-\frac{xe^{-2\pi inx}}{2\pi in}\Bigr]_0^1+\frac{1}{2\pi in}\int_0^1 e^{-2\pi inx}\,dx=-\frac{1}{2\pi in} \). Pairing the terms n and \( -n \) yields the classical sawtooth series \( \{x\}=\tfrac12-\frac1\pi\sum_{n=1}^{\infty}\frac{\sin(2\pi nx)}{n} \), valid pointwise for \( x\notin\mathbb Z \) and in \( L^2 \).
Replacing x by kx gives, for every \( k\ge1 \),
\( \{kx\}=\frac12-\frac1\pi\sum_{n=1}^{\infty}\frac{\sin(2\pi nkx)}{n}, \)
a series supported on frequencies that are multiples of k.
Now:
\( \int_0^1\{jx\}\{kx\}\,dx=\int_0^1\Bigl(\frac12-\frac1\pi\sum_{n\ge1}\frac{\sin(2\pi njx)}{n}\Bigr)\Bigl(\frac12-\frac1\pi\sum_{m\ge1}\frac{\sin(2\pi mkx)}{m}\Bigr)dx=\frac14+\frac{1}{\pi^2}\sum_{n,m\ge1}\frac{1}{nm}\int_0^1\sin(2\pi njx)\sin(2\pi mkx)\,dx, \)
since each cross term vanishes: \( \int_0^1\sin(2\pi njx)\,dx=0 \).
Using \( \int_0^1\sin(2\pi Ax)\sin(2\pi Bx)\,dx=\tfrac12\delta_{A,B} \) for \( A,B\in\mathbb Z\setminus\{0\} \), the non-zero contributions come from \( nj=mk \). Writing \( g=\gcd(j,k) \), \( j=gJ \), \( k=gK \) with \( \gcd(J,K)=1 \), the equation \( nj=mk \) becomes \( nJ=mK \). Since \( \gcd(J,K)=1 \), we have \( K\mid n \) and \( J\mid m \), so \( n=\ell K \), \( m=\ell J \) for \( \ell\in\mathbb N \). Thus:
\( \sum_{n,m\ge1}\frac{1}{nm}\int_0^1\sin(2\pi njx)\sin(2\pi mkx)\,dx=\frac12\sum_{\ell=1}^{\infty}\frac{1}{(\ell K)(\ell J)}=\frac{\zeta(2)}{2JK}=\frac{\pi^2}{12JK}. \)
Since \( JK=jk/g^2=jk/\gcd(j,k)^2 \):
\( \int_0^1\{jx\}\{kx\}\,dx=\frac14+\frac{1}{\pi^2}\cdot\frac{\pi^2}{12}\cdot\frac{\gcd(j,k)^2}{jk}=\frac14+\frac{\gcd(j,k)^2}{12jk}. \)
   □
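The closed form just proved can be sanity-checked against direct numerical quadrature; a minimal sketch (the grid size is an arbitrary choice):

```python
import numpy as np
from math import gcd

def inner_quadrature(j, k, N=200001):
    # midpoint rule for the integral of {jx}{kx} over (0,1)
    x = (np.arange(N) + 0.5) / N
    return float(np.mean((j * x % 1.0) * (k * x % 1.0)))

def inner_exact(j, k):
    # closed form from the Fourier computation above
    return 0.25 + gcd(j, k) ** 2 / (12.0 * j * k)

for j, k in [(2, 3), (4, 6), (5, 5), (7, 3)]:
    print(j, k, inner_quadrature(j, k), inner_exact(j, k))
```

The midpoint rule converges at rate \( O(1/N) \) here because the integrand has jump discontinuities, so agreement to a few decimal places is all one should expect at this resolution.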

Appendix A.2. Derivation of the Kalman Gain Steady State

We verify the steady-state Kalman gain approximation \( K_\infty=\sqrt{Q/R}\,\bigl(1+O(\sqrt{Q/R})\bigr) \) for \( Q\ll R \).
Lemma A1 
(Exact Kalman steady-state gain). The steady-state Kalman gain for the model (8)–(9) with process noise Q and observation noise R satisfies:
\( K_\infty=\frac{-Q+\sqrt{Q^2+4QR}}{2R}. \)
For \( Q\ll R \): \( K_\infty=\sqrt{Q/R}-\frac{Q}{2R}+O\bigl((Q/R)^{3/2}\bigr) \). For \( Q\gg R \): \( K_\infty\to1 \).
Proof. 
The steady-state prior covariance \( P^- \) satisfies the algebraic Riccati equation obtained by setting \( P^-_{M+1}=P^-_M=P^- \): \( P^-=(1-K)P^-+Q \) with \( K=P^-/(P^-+R) \). Substituting, \( P^-=P^--\frac{(P^-)^2}{P^-+R}+Q \), hence \( \frac{(P^-)^2}{P^-+R}=Q \), i.e. \( (P^-)^2=Q(P^-+R)=QP^-+QR \). This gives \( (P^-)^2-QP^--QR=0 \), so \( P^-=\frac{Q+\sqrt{Q^2+4QR}}{2} \). Then \( K=\frac{P^-}{P^-+R}=\frac{Q+S}{Q+S+2R} \) with \( S=\sqrt{Q^2+4QR} \), which rationalises to \( K=\frac{-Q+S}{2R} \), since \( (S-Q)(S+Q+2R)=S^2-Q^2+2R(S-Q)=2R(S+Q) \). For \( Q\ll R \): \( S=2\sqrt{QR}\sqrt{1+Q/(4R)}\approx2\sqrt{QR} \), giving \( P^-\approx\sqrt{QR} \) and \( K\approx\sqrt{QR}/(R+\sqrt{QR})\approx\sqrt{Q/R} \).    □
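Lemma A1 can be checked by iterating the scalar Riccati recursion to its fixed point; the noise levels below are illustrative:

```python
import math

def steady_gain(Q, R, iters=10000):
    # iterate P <- (1 - K) P + Q with K = P / (P + R) to the fixed point
    P = 1.0
    for _ in range(iters):
        K = P / (P + R)
        P = (1 - K) * P + Q
    return P / (P + R)

def closed_form(Q, R):
    # K_inf = (-Q + sqrt(Q^2 + 4QR)) / (2R) from the Riccati equation
    return (-Q + math.sqrt(Q * Q + 4 * Q * R)) / (2 * R)

Q, R = 1e-4, 1.0
K_num, K_cf = steady_gain(Q, R), closed_form(Q, R)
print(K_num, K_cf, math.sqrt(Q / R))   # both gains close to sqrt(Q/R) = 0.01
```

The iterated gain matches the closed form to machine precision, and for \( Q\ll R \) both sit just below \( \sqrt{Q/R} \), consistent with the expansion \( K_\infty=\sqrt{Q/R}-Q/(2R)+\cdots \).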

Appendix A.3. Connection Between the Gram Kernel and the Hardy Space Inner Product

We establish the precise relationship between the abstract Gram kernel \( K_G(s,w)=\zeta(s+\bar w)/(s+\bar w) \) and the concrete inner products computed in Theorem 16.
Lemma A2 
(Gram kernel as moment integral). For \( j,k\ge1 \) and the leading-order Mellin transforms \( \hat{\tilde r}_j(s)\approx-j^{-s}\zeta(s)/s \):
\( \langle\hat{\tilde r}_j,\hat{\tilde r}_k\rangle_{H^2(\Pi^+)}\approx\frac{1}{2\pi}\int_{-\infty}^{\infty}K_G\bigl(\tfrac12+it,\tfrac12+it\bigr)\,(jk)^{-1/2}(j/k)^{-it}\,dt, \)
where the integrand is the Gram kernel evaluated on the critical line, weighted by the arithmetic factor \( (j/k)^{-it}=(k/j)^{it} \). This integral equals the inner product \( (G_M^{\mathrm{cts}})_{jk} \) to leading order.
Proof. 
Direct computation:
\( \Bigl\langle j^{-s}\frac{\zeta(s)}{s},\,k^{-s}\frac{\zeta(s)}{s}\Bigr\rangle_{H^2}=\frac{1}{2\pi}\int_{-\infty}^{\infty}j^{-(\frac12+it)}\frac{\zeta(\frac12+it)}{\frac12+it}\cdot k^{-(\frac12-it)}\frac{\overline{\zeta(\frac12+it)}}{\frac12-it}\,dt=\frac{(jk)^{-1/2}}{2\pi}\int_{-\infty}^{\infty}\frac{|\zeta(\frac12+it)|^2}{|\frac12+it|^2}\,(j/k)^{-it}\,dt. \)
This is a weighted integral of \( |\zeta(\frac12+it)|^2/|\frac12+it|^2 \) against the character \( (j/k)^{-it} \). By Parseval's theorem on the multiplicative group, this equals \( (G_M^{\mathrm{cts}})_{jk} \) as computed in Proposition 10.    □

Appendix A.4. Explicit Numerical Examples for Hardy-Space Bounds

We illustrate the lower bound from Corollary 6 using the first few zeros of ζ .
The first few nontrivial zeros of \( \zeta \) on the critical line have imaginary parts approximately \( \gamma_1\approx14.135 \), \( \gamma_2\approx21.022 \), \( \gamma_3\approx25.011 \), \( \gamma_4\approx30.425 \), \( \gamma_5\approx32.935 \).
Example A1 
(Lower bounds from the first five zeros). For the first zero \( \rho_1=\tfrac12+14.135i \): \( |\rho_1|^2=\tfrac14+14.135^2\approx200.05 \). The lower bound from Corollary 6:
\( d_M^2\ge\frac{1}{\pi(1+14.135)\cdot200.05}\approx\frac{1}{9512}\approx1.05\times10^{-4}. \)
This gives \( d_M\ge0.0103 \) from the first zero alone.
Summing over all five zeros:
\( d_M^2\ge\sum_{j=1}^{5}\frac{1}{\pi(1+|\gamma_j|)\bigl(\tfrac14+\gamma_j^2\bigr)}\approx1.05\times10^{-4}+3.27\times10^{-5}+1.96\times10^{-5}+1.09\times10^{-5}+8.65\times10^{-6}\approx1.77\times10^{-4}. \)
Hence \( d_M\ge0.0133 \) for all M. The true distance \( d=0 \) (if RH holds) is not contradicted by this positive lower bound on the finite approximations: the infinite sum over all zeros converges to \( d^2 \) via Burnol's formula.
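The per-zero terms can be recomputed directly from the quoted zero ordinates (a quick sketch; the ordinates are rounded values):

```python
import math

# approximate ordinates of the first five nontrivial zeros of zeta
gammas = [14.135, 21.022, 25.011, 30.425, 32.935]

# per-zero contribution 1 / (pi (1 + |gamma|) |rho|^2) with |rho|^2 = 1/4 + gamma^2
terms = [1.0 / (math.pi * (1.0 + g) * (0.25 + g * g)) for g in gammas]
total = sum(terms)
print([f"{t:.2e}" for t in terms])
print(f"d_M^2 >= {total:.3e},  d_M >= {math.sqrt(total):.4f}")
```

The contributions decay roughly like \( 1/\gamma^3 \), so the first zero dominates the five-term sum.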
Remark A1 
(Compatibility with Burnol's formula). The lower bounds from individual zeros are consistent with Burnol's formula \( d^2=1-\prod_\rho\bigl|1-\rho^{-2}\bigr| \) because the product formula can give \( d=0 \) even when each individual factor contributes a positive amount to \( 1-d^2 \). The partial sums \( d_M^2 \) are bounded below by the contribution of the first few zeros and converge to \( d^2 \) as \( M\to\infty \).

Appendix B. Supplementary: The Bernoulli Polynomial Perspective

The fractional-part function \( \{x\} \) is intimately related to the Bernoulli polynomials \( B_n(x) \), defined by the generating function \( \frac{te^{tx}}{e^t-1}=\sum_{n=0}^{\infty}B_n(x)\frac{t^n}{n!} \). The first few are \( B_0(x)=1 \), \( B_1(x)=x-\tfrac12 \), \( B_2(x)=x^2-x+\tfrac16 \).
Lemma A3 
(Fractional part and Bernoulli polynomials). On (0,1): \( \{x\}=B_1(x)+\tfrac12=x \) (trivially), and more generally the n-th power of \( \{x\} \) on (0,1) expands in Bernoulli polynomials via the standard identity
\( \{x\}^n=x^n=\frac{1}{n+1}\sum_{k=0}^{n}\binom{n+1}{k}B_k(x). \)
The Mellin transform of \( B_1(kx)=\{kx\}-\tfrac12 \) (the centered sawtooth) is
\( \int_0^1 x^{s-1}B_1(kx)\,dx=\int_0^1 x^{s-1}\{kx\}\,dx-\frac{1}{2s}=-\frac{k^{-s}\zeta(s)}{s}+\frac{\gcd(k,k)^2}{12k^2(s+2)}-\frac{1}{2s}+\cdots, \)
relating the Mellin transform to the values \( \zeta(s) \) and \( \zeta(s+2) \).
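For reference, the standard Bernoulli-polynomial expansion of \( x^n \), namely \( x^n=\frac{1}{n+1}\sum_{k=0}^{n}\binom{n+1}{k}B_k(x) \), can be verified in exact rational arithmetic (a standalone check, not code from the paper):

```python
from fractions import Fraction
from math import comb

def bernoulli_numbers(n):
    # B_0..B_n via the recurrence B_m = -(1/(m+1)) sum_{k<m} C(m+1,k) B_k
    B = [Fraction(1)]
    for m in range(1, n + 1):
        B.append(-sum(comb(m + 1, k) * B[k] for k in range(m)) / (m + 1))
    return B

def bernoulli_poly(n, x, B):
    # B_n(x) = sum_k C(n,k) B_k x^{n-k}
    return sum(comb(n, k) * B[k] * x ** (n - k) for k in range(n + 1))

B = bernoulli_numbers(8)
for n in range(1, 7):
    for x in (Fraction(1, 3), Fraction(2, 5), Fraction(7, 9)):
        rhs = Fraction(1, n + 1) * sum(
            comb(n + 1, k) * bernoulli_poly(k, x, B) for k in range(n + 1)
        )
        assert rhs == x ** n          # exact equality of rationals
print("Bernoulli expansion of x^n verified")
```

Using `Fraction` keeps every step exact, so the assertion tests the polynomial identity itself rather than a floating-point approximation.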
Remark A2 
(Bernoulli polynomials and arithmetic). The connection between Bernoulli polynomials and the Riemann zeta function is classical: \( \zeta(-n)=-B_{n+1}/(n+1) \) for \( n\ge1 \). This means the Mellin transforms of the sawtooth functions \( \tilde r_k \) encode the values of \( \zeta \) at negative integers through the Bernoulli polynomial expansion, connecting the approximation problem to the functional equation of \( \zeta \) and its special values. This arithmetic depth is one reason for the slow convergence of \( d_M \) to 0: the basis functions \( \tilde r_k \) carry arithmetic information through all orders of their Mellin transforms.

Notes

1. Analytic continuation is the process of extending a function defined on a smaller domain to a larger domain in a unique manner consistent with analyticity. The key tool for \( \zeta \) is the functional equation \( \xi(s)=\xi(1-s) \), where \( \xi(s)=\tfrac12 s(s-1)\pi^{-s/2}\Gamma(s/2)\zeta(s) \). This symmetry relates values in \( \mathrm{Re}(s)>1 \) to values in \( \mathrm{Re}(s)<0 \), allowing the extension to all of \( \mathbb C \).
2. A Hilbert space is a complete inner product space. The key example here is \( L^2(0,1) \): the space of (equivalence classes of) measurable functions \( f:(0,1)\to\mathbb R \) with \( \int_0^1|f(x)|^2\,dx<\infty \), equipped with \( \langle f,g\rangle=\int_0^1 f(x)g(x)\,dx \) and norm \( \|f\|=\langle f,f\rangle^{1/2} \). Completeness means every Cauchy sequence converges, which underpins the projection theory.
3. The Mellin transform of \( f\in L^2(0,1) \) is \( (\mathcal Mf)(s)=\int_0^1 x^{s-1}f(x)\,dx \), defined for \( \mathrm{Re}(s)>0 \). Via \( x=e^{-t} \), the Mellin transform is unitarily equivalent to the Fourier–Laplace transform, mapping \( L^2(0,1) \) isometrically to the Hardy space \( H^2(\Pi^+) \).
4. The Báez–Duarte result follows from the Nyman–Beurling theorem via a density argument: the dilates \( \{1/k:k\in\mathbb N\} \) suffice to approximate general \( \theta\in(0,1] \), and the map \( \theta\mapsto f_\theta \) is continuous in an appropriate sense.
5. A compact operator \( K:H\to H \) on a Hilbert space is one that maps bounded sets to relatively compact sets. Equivalently, K can be approximated in norm by finite-rank operators. By the spectral theorem for compact self-adjoint operators, the eigenvalues \( \lambda_j \) form a sequence converging to zero, with corresponding orthonormal eigenvectors. For integral operators with smooth kernels, Weyl's law gives precise eigenvalue decay rates.
6. Kalman filtration, introduced by Kalman and Bucy [13], is the optimal linear estimator for a linear dynamical system observed through noisy measurements. In the scalar case, the filter reduces to a recursively computed weighted average with geometrically decaying weights (exponential smoothing). The Kalman gain \( K_M \) determines the weight given to new observations versus the running estimate.
7. In compressed sensing [18], one seeks to represent a signal y as a sparse combination of elements from a large dictionary \( \{\phi_k\} \). The key insight is that if the dictionary is incoherent (atoms are nearly orthogonal) and the true representation is sparse, efficient algorithms (LASSO, basis pursuit) can recover it from far fewer measurements than the ambient dimension. The Nyman–Beurling problem is an infinite-dimensional analogue.
8. If H is self-adjoint, its spectrum is real by the spectral theorem for self-adjoint operators. Hence all \( \gamma_n\in\mathbb R \), giving \( \mathrm{Re}(\rho_n)=\tfrac12 \) for all zeros. This would prove RH. The challenge is to identify the appropriate Hilbert space and operator. Candidates include the Berry–Keating Hamiltonian \( H=xp+px \) on \( L^2(\mathbb R_+) \) [14] and Connes's adelic operator [15].

References

  1. E. C. Titchmarsh, The Theory of the Riemann Zeta-Function, 2nd ed. (revised by D. R. Heath-Brown), Oxford University Press, 1986.
  2. J. B. Conrey, The Riemann Hypothesis, Notices Amer. Math. Soc. 50(3) (2003), 341–353.
  3. A. M. Odlyzko, On the distribution of spacings between the zeros of the zeta function, Math. Comp. 48(177) (1987), 273–308.
  4. H. M. Edwards, Riemann’s Zeta Function, Academic Press, 1974.
  5. G. H. Hardy, On the zeros of Riemann’s zeta-function, Proc. London Math. Soc. (2) 13 (1914), 191–207.
  6. B. Nyman, On some groups and semigroups of translations, Ph.D. thesis, Uppsala University, 1950.
  7. A. Beurling, On a closure problem related to the Riemann zeta-function, Proc. Natl. Acad. Sci. USA 41 (1955), 312–314. [CrossRef]
  8. L. Báez-Duarte, A strengthening of the Nyman–Beurling criterion for the Riemann Hypothesis, Atti Accad. Naz. Lincei 14(1) (2003), 5–11.
  9. L. Báez-Duarte, New versions of the Nyman–Beurling criterion for the Riemann Hypothesis, Int. J. Math. Math. Sci. 31 (2002), 387–406. [CrossRef]
  10. J.-F. Burnol, A note on Nyman’s equivalent formulation of the Riemann Hypothesis, Contemp. Math. 287 (2001), 23–26.
  11. X.-J. Li, The positivity of a sequence of numbers and the Riemann Hypothesis, J. Number Theory 65 (1997), 325–333. [CrossRef]
  12. H. Iwaniec and E. Kowalski, Analytic Number Theory, Amer. Math. Soc., Providence, 2004.
  13. R. E. Kalman and R. S. Bucy, New results in linear filtering and prediction theory, J. Basic Eng. 83 (1961), 95–108. [CrossRef]
  14. M. V. Berry and J. P. Keating, The Riemann zeros and eigenvalue asymptotics, SIAM Rev. 41 (1999), 236–266. [CrossRef]
  15. A. Connes, Trace formula in noncommutative geometry and the zeros of the Riemann zeta function, Selecta Math. 5 (1999), 29–106. [CrossRef]
  16. G. Robin, Grandes valeurs de la fonction somme des diviseurs et hypothèse de Riemann, J. Math. Pures Appl. 63 (1984), 187–213.
  17. C. Delaunay, E. Fricain, E. Mosaki, and O. Robert, Zero-free regions for Dirichlet series (II), Constr. Approx. 44 (2016), 183–210.
  18. E. J. Candès, J. Romberg, and T. Tao, Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information, IEEE Trans. Inf. Theory 52(2) (2006), 489–509. [CrossRef]
  19. G. H. Golub and C. F. Van Loan, Matrix Computations, 4th ed., Johns Hopkins University Press, 2013.
  20. R. S. Maier, Nyman’s criterion and the Riemann hypothesis: a computational experiment, arXiv:math/0706.0718, 2007.
  21. W. Rudin, Real and Complex Analysis, 3rd ed., McGraw-Hill, 1987. [CrossRef]
  22. P. D. Lax, Functional Analysis, Wiley-Interscience, 2002.
  23. R. M. Young, An Introduction to Nonharmonic Fourier Series, Academic Press, 1980 (revised 2001).
  24. creelie, Báez–Duarte Kalman Filtered Approximation, GitHub repository, 2025. https://github.com/creelie/baez-duarte-kalman.
Figure 1. Eigenvalue spectrum of G ˜ 50 (blue dots) vs. the C / j 2 decay reference (red dashed). The rapid decay explains the Θ ( M 2 ) condition number and necessitates SVD truncation.
Table 1. Spectral properties of the empirical Gram matrix G ˜ M with N = 10 4 quadrature points.
M | λ_max | λ_min | κ(G̃_M) | Numerically stable?
5 | 4.21×10^{-2} | 3.12×10^{-4} | 1.35×10^{2} | Yes
10 | 4.18×10^{-2} | 7.94×10^{-5} | 5.27×10^{2} | Yes
20 | 4.12×10^{-2} | 2.01×10^{-5} | 2.05×10^{3} | Marginal
30 | 4.09×10^{-2} | 8.96×10^{-6} | 4.57×10^{3} | Marginal
50 | 4.04×10^{-2} | 3.31×10^{-6} | 1.22×10^{4} | No
100 | 3.97×10^{-2} | 8.24×10^{-7} | 4.82×10^{4} | No
200 | 3.91×10^{-2} | 2.06×10^{-7} | 1.90×10^{5} | No
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.