Preprint
Article

This version is not peer-reviewed.

Spectral and Analytic Structure of the Nyman–Beurling–Báez–Duarte Approximation

Submitted: 08 March 2026
Posted: 10 March 2026

Abstract

We study the structural and analytic aspects of the B\'{a}ez--Duarte approximation problem within the Nyman--Beurling framework, which furnishes a functional-analytic equivalent of the Riemann Hypothesis (RH). Our work studies structural features of this framework; it does not prove RH. First (Rank-one collapse and Hilbert-space theory). The integer-dilate Gram matrix \( G_M=\frac{1}{3}\textbf{dd}^\top \) is rank-one, giving \( span\{r_1,\ldots,r_M\}=span\{x\} \) and fixed distance \( d_M=\frac12 \) for all M. We give the explicit Moore–Penrose pseudoinverse \( G_M^+ \) and the one-dimensional collapse of the optimisation problem. Second (Exact Gram matrix formula). We prove a fully rigorous closed-form expression for the inner products of the correct sawtooth basis: using the Bernoulli polynomial representation of the fractional part, \( \int_0^1\{jx\}\{kx\}\,dx = \frac{\gcd(j,k)^2}{jk}\Bigl(\frac{1}{12} + \frac{B_2(0)}{2}\Bigr) + \frac{1}{4}\Bigl(1-\frac{\gcd(j,k)}{j}\Bigr)\Bigl(1-\frac{\gcd(j,k)}{k}\Bigr) + E_{jk}, \) where \( E_{jk} \) is an explicit correction from higher Bernoulli terms, expressed via the Hurwitz zeta function. The arithmetic role of \( \gcd(j,k) \) is made precise. Third (Hardy-space bounds). Using the \( H^2(\Pi^+) \) reproducing kernel and the Mellin isometry, we prove: (a) the distance identity \( d_M^2=\|1/s-F_M^*(s)\zeta(s)/s\|_{H^2}^2 \); (b) an explicit lower bound \( d_M^2\ge\sum_\rho\frac{|F_M^*(\rho)|^2|\zeta'(\rho)|^{-2}}{|\rho|^2}\cdot c(\rho) \) from the zeros of \( \zeta \); and (c) a pointwise Hardy-space inequality relating \( d_M \) to the supremum of \( |1-F_M^*({\tfrac12}+it)\zeta({\tfrac12}+it)/({\tfrac12}+it)| \) on the critical line. Fourth (Kalman filtration stability). 
Under the observation model \( z_M=d_M+\varepsilon_M \) with $\varepsilon_M$ sub-Gaussian of variance \( \sigma^2 \), the Kalman estimator satisfies a rigorous oracle inequality \( \mathbf{E}|d_M^{KF}-d_M|^2\le \sigma^2 K_\infty(2-K_\infty)^{-1} \), with an almost-sure bound \( |d_M^{KF}-d_M|\le CM^{-\alpha} \) whenever \( |d_M-d|=O(M^{-\alpha}) \). Fifth (Möbius sparsity). We prove \( |c_k^*|=O(k^{-1+\varepsilon}) \) via Dirichlet series techniques and show that the coefficient sequence is bounded in \( \ell^2 \), with connections to the Möbius function made precise through the optimality conditions. Sixth (Structural Mellin theorem). We identify a hidden structural observation in the Mellin identity: the Gram kernel \( K_G(s,w)=\zeta(s+\bar w)/(s+\bar w) \) appears as the reproducing kernel of the Hardy space \( H^2(\Pi^+) \) restricted to the approximation subspace \( W_M \), and its singularity at \( s+\bar w=1 \) encodes the pole of \( \zeta \) while the zeros of \( \zeta \) in the critical strip contribute exactly as spectral obstructions. Disclaimer. This paper does not prove RH. All results are structural, computational, and analytic observations within the equivalent framework.


1. Introduction

1.1. The Riemann Hypothesis and Its Context

The Riemann Hypothesis (RH) asserts that every nontrivial zero \( \rho \) of the Riemann zeta function
\[ \zeta(s) = \sum_{n=1}^{\infty} n^{-s}, \qquad \mathrm{Re}(s) > 1, \]
satisfies \( \mathrm{Re}(\rho) = \tfrac12 \). Since Riemann’s memoir of 1859, the hypothesis has shaped the entire landscape of analytic number theory. Listed as one of the seven Millennium Prize Problems by the Clay Mathematics Institute [2], it remains unresolved.
Its importance stems not only from its elegance, but from its deep connections to the distribution of primes. Under RH, the prime-counting error satisfies \( |\psi(x) - x| = O(\sqrt{x}\,\log^2 x) \), where \( \psi(x) = \sum_{p^k \le x} \log p \) is the Chebyshev function. Any zero off the critical line would produce larger oscillations in the distribution of primes [1,12].

1.2. Equivalent Reformulations of RH

A rich ecosystem of equivalent reformulations has grown around RH, each illuminating a different facet of the hypothesis. Among the most celebrated are:
(i)
Nyman–Beurling criterion [6,7]: \( \mathbf{1} \in \overline{\mathrm{span}}^{L^2}\{ f_\theta : \theta \in (0,1) \} \) if and only if RH holds.
(ii)
Báez–Duarte strengthening [8]: it suffices to use the countable family \( \{ r_k(x) = \{x/k\} : k \in \mathbb{N} \} \).
(iii)
Li’s criterion [11]: RH is equivalent to the positivity of the coefficients \( \lambda_n = \sum_{\rho} \bigl[ 1 - (1 - 1/\rho)^n \bigr] \) for all \( n \ge 1 \).
(iv)
Robin’s criterion [16]: RH is equivalent to \( \sigma(n) < e^{\gamma} n \log\log n \) for all \( n \ge 5041 \), where \( \sigma(n) \) is the sum-of-divisors function and \( \gamma \) is the Euler–Mascheroni constant.
(v)
Weil’s explicit formula criterion: RH is equivalent to the positivity of certain explicit sums over primes and zeros.
The present paper focuses on the Nyman–Beurling–Báez–Duarte framework, which has the unique advantage of being both analytically rigorous and computationally tractable.

1.3. The Nyman–Beurling Framework: Background and Prior Work

The Nyman–Beurling criterion originated in Nyman’s 1950 Uppsala thesis [6], where he proved that \( \zeta \) has no zeros in the strip \( \tfrac12 < \mathrm{Re}(s) < 1 \) if and only if certain functions related to the fractional part of \( 1/x \) are dense in \( L^2(0,1) \). Beurling [7] reformulated and extended this, connecting it more cleanly to RH.
The approach gained momentum through Burnol’s [10] explicit product formula for the projection norm (Theorem 4) and Báez–Duarte’s [8] reduction to a countable basis \( \{ r_k(x) = \{x/k\} \} \). Báez–Duarte [9] and subsequent authors attempted numerical evaluations of \( d_M \) for moderate \( M \).
Prior computational investigations. Attempts to numerically study \( d_M \) have used a variety of strategies. Some implementations directly compute the Gram matrix \( (G_M)_{jk} = \langle r_j, r_k \rangle \) and solve the normal equations. Others discretise the inner products at finitely many quadrature nodes. A common source of error in these investigations, addressed thoroughly in this paper, is the failure to distinguish between the functions \( r_k(x) = \{x/k\} \) (which reduce to \( x/k \) on \( (0,1) \) and generate only the one-dimensional subspace \( \mathrm{span}\{x\} \)) and the genuine sawtooth basis \( \tilde r_k(x) = \{kx\} \) (which has \( k \) teeth on \( (0,1) \) and spans a genuinely \( k \)-dimensional function space on the subintervals of scale \( 1/k \)). This collapse is the central structural observation of Section 3 and Section 4.
The Mellin transform provides the bridge between the functional-analytic statement and the zero-set of \( \zeta \). Specifically, the Nyman–Beurling condition \( \mathbf{1} \in \overline{\mathrm{span}}^{L^2}\{f_\theta\} \) is equivalent, via the isometry \( \mathcal{M} : L^2(0,1) \to H^2(\Pi^+) \) and the identity \( \widehat{f_\theta}(s) = \theta^{s}\,\zeta(s)/s \), to the density of \( \{ \theta^{s}\,\zeta(s)/s : \theta \in (0,1) \} \) in \( H^2(\Pi^+) \), which in turn is equivalent to RH. This Hardy-space perspective is developed systematically in Section 9.

1.4. The Central Problem: Structural Degeneracy

A fundamental difficulty in the numerical study of the Báez–Duarte problem, one that has not been adequately addressed in the existing literature, concerns the behaviour of the basis functions on the unit interval \( (0,1) \).
For \( x \in (0,1) \) and integer \( k \ge 1 \): \( r_k(x) = \{x/k\} = x/k \), since \( 0 < x/k < 1 \). Therefore all functions \( r_1, r_2, \ldots \) are scalar multiples of \( x \), and
\[ \mathrm{span}\{r_1,\ldots,r_M\} = \mathrm{span}\{x\} \qquad \text{for all } M \ge 1. \]
The Gram matrix \( G_M \) is rank-one for every \( M \ge 2 \). This rank-one collapse is a fundamental structural obstruction: \( \mathbf{1} \notin \mathrm{span}\{x\} \), so \( d_M = \tfrac12 \) for all \( M \), and any numerical scheme claiming \( d_M \to 0 \) using these (degenerate) basis functions is erroneous.
This observation does not invalidate the Báez–Duarte theorem, which is a theorem about closures in \( L^2 \). Rather, it reveals that a naively discretised implementation, in which \( r_k \) is evaluated at points \( x \in (0,1) \) and treated as though these evaluations capture the genuine sawtooth structure, misses the essential nonlinearity. Correct numerical implementations must use the sawtooth functions \( \tilde r_k(x) = \{kx\} \) (or equivalently, \( \{x/k\} \) for \( x \in (0,k) \)), which are genuinely nonlinear and linearly independent.
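The collapse is easy to verify numerically. The following sketch (the helper names are ours, not from the paper's code) builds the exact integer-dilate Gram matrix \( (G_M)_{jk} = 1/(3jk) \) and evaluates the distance through the pseudoinverse route developed in Section 3:

```python
import numpy as np

# Exact integer-dilate Gram matrix (G_M)_{jk} = 1/(3jk) = (1/3) d d^T,
# with d = (1, 1/2, ..., 1/M), and the distance via the pseudoinverse.
def integer_dilate_gram(M):
    d = 1.0 / np.arange(1, M + 1)
    return np.outer(d, d) / 3.0

def distance_via_pinv(M):
    # d_M^2 = ||1||^2 - b^T G^+ b, with b_k = <1, r_k> = 1/(2k)
    G = integer_dilate_gram(M)
    b = 0.5 / np.arange(1, M + 1)
    return float(np.sqrt(1.0 - b @ np.linalg.pinv(G) @ b))

for M in (2, 10, 50):
    print(M, np.linalg.matrix_rank(integer_dilate_gram(M)),
          distance_via_pinv(M))
# rank stays 1 and the distance stays 1/2 for every M
```

Adding more integer dilates never enlarges the span: the rank and the distance are unchanged.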

1.5. Contributions and Outline

This paper makes the following contributions:
(i)
Rank-one collapse and Hilbert-space theory (Section 3 and Section 4): We prove \( G_M = \tfrac13 \mathbf{d}\mathbf{d}^\top \) is rank-one, derive the Moore–Penrose pseudoinverse \( G_M^+ = 3\,\|\mathbf{d}\|^{-4}\,\mathbf{d}\mathbf{d}^\top \), establish \( d_M = \tfrac12 \) for all \( M \), and explain the collapse of the approximation to a one-dimensional scalar problem.
(ii)
Structural degeneracy analysis for numerical implementations (Section 3, Section 4 and Section 5): We provide a detailed explanation of why incorrect basis functions cause structurally misleading numerical results, and characterise the correct sawtooth basis \( \tilde r_k(x) = \{kx\} \) needed for genuine approximation.
(iii)
Gram matrix spectral theory and stability (Section 6): We establish \( \kappa(\tilde G_M) = \Theta(M^2) \) via compact operator theory and Weyl’s law, prove an operator-stability theorem \( \|\tilde G_M - G_M^{\mathrm{cts}}\|_2 \le C_M/N \), and derive eigenvalue perturbation bounds.
(iv)
Numerically stable algorithms (Section 7): Truncated-SVD and economy-QR algorithms with rigorous backward-stability guarantees.
(v)
Kalman filtration theory (Section 8): Convergence preservation, smoothing-error bounds \( O(M^{-\alpha}) \), variance reduction, and a general EWMA theorem.
(vi)
Mellin-transform analytic theorems (Section 9): We prove the isometry \( d_M^2 = \| 1/s - F_M(s)\,\zeta(s)/s \|_{H^2}^2 \), derive lower bounds on \( d_M \) from values of the Dirichlet polynomial \( F_M \) at zeros of \( \zeta \), and prove a key lemma expressing inner products of sawtooth functions via the Hurwitz zeta function.
(vii)
Number-theoretic connections (Section 14, Section 15 and Section 16): Structural parallels with Li’s criterion, compressed sensing, and the Hilbert–Pólya philosophy.
What this paper does not prove. It does not establish d = 0 (equivalent to RH and still open), does not bound the convergence rate of d M unconditionally, and does not prove any component of Conjecture 1. All results are structural, computational, and analytic observations within the equivalent framework.

2. Background and Functional-Analytic Framework

2.1. The Riemann Zeta Function

For \( \mathrm{Re}(s) > 1 \), the Riemann zeta function is defined by the absolutely convergent Dirichlet series
\[ \zeta(s) = \sum_{n=1}^{\infty} n^{-s} = \prod_{p\ \text{prime}} \bigl(1 - p^{-s}\bigr)^{-1}. \]
By analytic continuation, \( \zeta \) extends to a meromorphic function on \( \mathbb{C} \) with a simple pole at \( s = 1 \) and trivial zeros at the negative even integers \( s = -2, -4, \ldots \)
The nontrivial zeros \( \rho = \beta + i\gamma \) satisfy \( 0 < \beta < 1 \) (the open critical strip). The Riemann Hypothesis asserts \( \beta = \tfrac12 \) for all of them.
Key analytic facts:
(i)
(Hardy [5]) Infinitely many nontrivial zeros lie on \( \mathrm{Re}(s) = \tfrac12 \).
(ii)
(Odlyzko [3]) The first \( 1.5 \times 10^{10} \) zeros (ordered by \( |\mathrm{Im}(\rho)| \)) all satisfy \( \mathrm{Re}(\rho) = \tfrac12 \).
(iii)
(Zero-free region) There exists a constant \( c > 0 \) such that \( \zeta(s) \ne 0 \) for \( \mathrm{Re}(s) \ge 1 - c/\log(|\mathrm{Im}(s)| + 2) \).

2.2. Hilbert Spaces and Projection Theory

We recall the key tools from functional analysis.
Theorem 1 
(Orthogonal Projection Theorem). Let \( H \) be a Hilbert space and \( V \subseteq H \) a closed subspace. For any \( f \in H \), there exists a unique \( \hat f \in V \) such that \( \|f - \hat f\| = \mathrm{dist}(f, V) \). Moreover, \( \hat f = P_V f \) where \( P_V : H \to V \) is the orthogonal projection, characterised by \( f - P_V f \perp V \).
Corollary 1 
(Best \( L^2 \) approximation). The best \( L^2(0,1) \) approximation to \( f \) from a closed subspace \( V = \mathrm{span}\{v_1,\ldots,v_M\} \) (with \( v_j \) linearly independent) is \( \hat f = \sum_{k=1}^M c_k^* v_k \), where the coefficients \( \mathbf{c}^* = (c_1^*,\ldots,c_M^*)^\top \) satisfy the normal equations \( G_M \mathbf{c}^* = \mathbf{b} \), with \( (G_M)_{jk} = \langle v_j, v_k \rangle \) and \( b_j = \langle f, v_j \rangle \). The minimum distance is \( \mathrm{dist}(f, V)^2 = \|f\|^2 - \mathbf{b}^\top G_M^{-1} \mathbf{b} \).
Remark 1. 
Corollary 1 requires \( G_M \) to be invertible, i.e., the \( v_j \) to be linearly independent. When \( G_M \) is rank-deficient (as in the integer-dilate case), the formula \( \|f\|^2 - \mathbf{b}^\top G_M^{-1}\mathbf{b} \) is undefined, and one must use the Moore–Penrose pseudoinverse instead.

2.3. Hardy Spaces and the Mellin Transform

The connection between the Nyman–Beurling criterion and RH is mediated by the Mellin transform.
Definition 1 
(Hardy space \( H^2(\Pi^+) \)). The Hardy space \( H^2(\Pi^+) \) consists of analytic functions \( F : \Pi^+ \to \mathbb{C} \) (where \( \Pi^+ = \{ s \in \mathbb{C} : \mathrm{Re}(s) > 0 \} \)) such that \( \sup_{\sigma > 0} \int_{-\infty}^{\infty} |F(\sigma + it)|^2\,dt < \infty \). With the inner product \( \langle F, G \rangle_{H^2} = \frac{1}{2\pi} \int_{-\infty}^{\infty} F(\tfrac12 + it)\,\overline{G(\tfrac12 + it)}\,dt \), \( H^2(\Pi^+) \) is a Hilbert space.
The key identity connecting the Nyman–Beurling basis to \( \zeta \) is:
\[ \widehat{f_\theta}(s) = \int_0^1 x^{s-1} f_\theta(x)\,dx = \theta^{s}\,\frac{\zeta(s)}{s}, \qquad \mathrm{Re}(s) > 1. \]
This shows that the Mellin images of the dilate functions \( f_\theta \) are scalar multiples of \( \zeta(s)/s \). The closure condition in \( L^2(0,1) \) thus translates to an analytic approximation condition in \( H^2(\Pi^+) \).

2.4. The Nyman–Beurling Criterion

Definition 2 
(Fractional-part dilates). For \( \theta \in (0,1) \), define
\[ f_\theta : (0,1) \to \mathbb{R}, \qquad f_\theta(x) = \{x/\theta\} = \frac{x}{\theta} - \Bigl\lfloor \frac{x}{\theta} \Bigr\rfloor. \]
Note that \( f_\theta \in L^2(0,1) \) since \( |f_\theta| \le 1 \) a.e. Let \( \mathcal{N} = \overline{\mathrm{span}}^{L^2}\{ f_\theta : \theta \in (0,1) \} \).
Theorem 2 
(Nyman–Beurling [6,7]). RH \( \Longleftrightarrow \) \( d_{\mathcal{N}} := \mathrm{dist}(\mathbf{1}, \mathcal{N}) = 0 \).
Remark 2 
(Logical status). Theorem 2 is an equivalence. Any computation showing \( d_M \to 0 \) confirms consistency with RH but does not constitute an independent proof. An independent proof would require establishing \( d = 0 \) by a direct analytic argument that does not invoke the equivalence.

2.5. The Báez–Duarte Formulation

Theorem 3 
(Báez–Duarte [8]). Let \( r_k(x) = \{x/k\} \) for \( k \in \mathbb{N} \) and \( \mathcal{B} = \overline{\mathrm{span}}^{L^2(0,1)}\{ r_k : k \ge 1 \} \). Then RH \( \Longleftrightarrow \mathbf{1} \in \mathcal{B} \).
The Báez–Duarte theorem is a strengthening of the Nyman–Beurling result because it replaces an uncountable parameter set \( \theta \in (0,1) \) with the countable set \( \{ 1/k : k \in \mathbb{N} \} \), making the problem amenable to computation.
The finite-dimensional approximation problem:
\[ d_M := \mathrm{dist}(\mathbf{1}, V_M), \qquad V_M = \mathrm{span}\{r_1,\ldots,r_M\}, \]
yields a monotone decreasing sequence \( d_M \downarrow d = \mathrm{dist}(\mathbf{1}, \mathcal{B}) \) as \( M \to \infty \). By the equivalence, \( d = 0 \iff \) RH.

2.6. Burnol’s Projection Formula

The Mellin-transform Hardy-space framework yields an explicit formula for \( d^2 \) in terms of the zeros of \( \zeta \):
Theorem 4 
(Burnol [10]). Let P B : L 2 ( 0 , 1 ) B be the orthogonal projection. Then
\[ \|P_{\mathcal{B}} \mathbf{1}\|_{L^2}^2 = \prod_{\substack{\zeta(\rho) = 0 \\ \mathrm{Re}(\rho) > 1/2}} \Bigl| 1 - \frac{1}{\rho} \Bigr|^2, \]
and consequently
\[ d^2 = 1 - \prod_{\substack{\zeta(\rho) = 0 \\ \mathrm{Re}(\rho) > 1/2}} \Bigl| 1 - \frac{1}{\rho} \Bigr|^2. \]
In particular: \( d = 0 \iff \) RH.
Remark 3. 
Each factor \( |1 - 1/\rho|^2 \) in Burnol’s product is strictly less than 1 when \( \mathrm{Re}(\rho) > 1/2 \). A single off-critical zero thus makes \( d^2 > 0 \). Conversely, if all zeros satisfy \( \mathrm{Re}(\rho) = \tfrac12 \), the product over \( \{\mathrm{Re}(\rho) > 1/2\} \) is empty and equals 1, giving \( d = 0 \).

3. Gram Matrix Analysis: The Rank-One Collapse

3.1. Exact Inner Products for Integer Dilates

We begin with a complete derivation of the inner products \( \langle r_j, r_k \rangle \).
Lemma 1 
(Fractional part on unit interval). For \( x \in (0,1) \) and integer \( k \ge 1 \), \( \{x/k\} = x/k \).
Proof. 
Since \( 0 < x < 1 \) and \( k \ge 1 \), we have \( 0 < x/k < 1/k \le 1 \), so \( \lfloor x/k \rfloor = 0 \) and \( \{x/k\} = x/k - 0 = x/k \).    □
Proposition 1 
(Exact inner products for integer dilates). For integers \( j, k \ge 1 \):
(i) 
\( r_k(x) = x/k \) on \( (0,1) \) (by Lemma 1).
(ii) 
\( \langle r_j, r_k \rangle_{L^2(0,1)} = \dfrac{1}{3jk} \).
(iii) 
\( \|r_k\|_{L^2}^2 = \dfrac{1}{3k^2} \).
(iv) 
\( \langle \mathbf{1}, r_k \rangle_{L^2} = \dfrac{1}{2k} \).
Proof. 
Using Lemma 1:
\[ \langle r_j, r_k \rangle = \int_0^1 \frac{x}{j} \cdot \frac{x}{k}\,dx = \frac{1}{jk} \int_0^1 x^2\,dx = \frac{1}{3jk}, \]
which gives (ii). Setting \( j = k \) gives (iii). For (iv): \( \langle \mathbf{1}, r_k \rangle = \int_0^1 \frac{x}{k}\,dx = \frac{1}{k} \cdot \frac12 = \frac{1}{2k} \).    □

3.2. The Rank-One Gram Matrix: Structural Theorem

Key Structural Result: Rank-One Collapse
Theorem 5 
(Rank-one Gram matrix). Let \( G_M \) denote the \( M \times M \) Gram matrix of the functions \( \{r_1,\ldots,r_M\} \) in \( L^2(0,1) \), where
\[ (G_M)_{jk} = \langle r_j, r_k \rangle = \frac{1}{3jk}. \]
Define
\[ \mathbf{d} = \Bigl( 1, \tfrac12, \ldots, \tfrac{1}{M} \Bigr)^{\!\top} \in \mathbb{R}^M. \]
Then the following statements hold.
(i) 
The Gram matrix admits the factorisation
\[ G_M = \tfrac13\,\mathbf{d}\mathbf{d}^\top, \]
hence it is positive semidefinite and of rank one.
(ii) 
\( \mathrm{rank}(G_M) = 1 \) for all \( M \ge 1 \).
(iii) 
\( G_M \) has exactly one nonzero eigenvalue
\[ \lambda_1(G_M) = \tfrac13 \|\mathbf{d}\|^2 = \tfrac13 \sum_{k=1}^M k^{-2}. \]
(iv) 
The remaining eigenvalues vanish:
\[ \lambda_j(G_M) = 0, \qquad j = 2,\ldots,M. \]
(v) 
The unit eigenvector associated with \( \lambda_1 \) is
\[ \hat{\mathbf{d}} = \frac{\mathbf{d}}{\|\mathbf{d}\|}. \]
(vi) 
Consequently,
\[ \mathrm{span}\{r_1,\ldots,r_M\} = \mathrm{span}\{x\} \subset L^2(0,1) \]
for every \( M \ge 1 \).
Proof. (i) By Proposition 1(ii), \( (G_M)_{jk} = \frac13 \cdot \frac1j \cdot \frac1k \), which is precisely the \( (j,k) \)-entry of \( \frac13 \mathbf{d}\mathbf{d}^\top \) where \( d_j = 1/j \). Since \( \mathbf{d} \ne 0 \), \( \frac13 \mathbf{d}\mathbf{d}^\top \) is positive semidefinite of rank one.
(ii)–(v) For any rank-one matrix \( \mathbf{u}\mathbf{u}^\top \) with \( \mathbf{u} \ne 0 \), the eigenvalues are \( \|\mathbf{u}\|^2 \) (with eigenvector \( \mathbf{u}/\|\mathbf{u}\| \)) and 0 with multiplicity \( M-1 \). Applying this with \( \mathbf{u} = \mathbf{d}/\sqrt3 \) gives \( \lambda_1 = \|\mathbf{d}/\sqrt3\|^2 = \tfrac13 \|\mathbf{d}\|^2 \) and \( \lambda_2 = \cdots = \lambda_M = 0 \).
(vi) Each \( r_k(x) = x/k = \tfrac1k \cdot x \), so every \( r_k \) is a scalar multiple of \( x \). Hence \( \mathrm{span}\{r_1,\ldots,r_M\} \subseteq \mathrm{span}\{x\} \). Since \( r_1(x) = x \in \mathrm{span}\{r_1,\ldots,r_M\} \), equality holds.    □
Corollary 2 
(Spectral decomposition of \( G_M \)). The spectral decomposition of \( G_M \) is
\[ G_M = \lambda_1 \hat{\mathbf{d}}\hat{\mathbf{d}}^\top = \frac13 \|\mathbf{d}\|^2 \cdot \frac{\mathbf{d}\mathbf{d}^\top}{\|\mathbf{d}\|^2} = \frac13 \mathbf{d}\mathbf{d}^\top, \]
and the Moore–Penrose pseudoinverse is
\[ G_M^+ = \frac{1}{\lambda_1} \hat{\mathbf{d}}\hat{\mathbf{d}}^\top = \frac{3}{\|\mathbf{d}\|^2} \cdot \frac{\mathbf{d}\mathbf{d}^\top}{\|\mathbf{d}\|^2} = \frac{3}{\|\mathbf{d}\|^4}\,\mathbf{d}\mathbf{d}^\top = 3 \Bigl( \sum_{k=1}^M k^{-2} \Bigr)^{\!-2} \mathbf{d}\mathbf{d}^\top. \]
Proof. 
The Moore–Penrose pseudoinverse of a rank-one matrix \( \sigma \hat{\mathbf{d}}\hat{\mathbf{d}}^\top \) (with \( \sigma > 0 \) and \( \|\hat{\mathbf{d}}\| = 1 \)) is \( \sigma^{-1} \hat{\mathbf{d}}\hat{\mathbf{d}}^\top \). Substituting \( \sigma = \lambda_1 = \tfrac13 \|\mathbf{d}\|^2 \) gives \( G_M^+ = \frac{3}{\|\mathbf{d}\|^2} \hat{\mathbf{d}}\hat{\mathbf{d}}^\top = \frac{3}{\|\mathbf{d}\|^4} \mathbf{d}\mathbf{d}^\top \).    □
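A quick NumPy cross-check of Corollary 2 (our own illustrative snippet), comparing the closed-form pseudoinverse with a generic SVD-based one:

```python
import numpy as np

# Corollary 2: for the rank-one Gram matrix G_M = (1/3) d d^T, the
# Moore-Penrose pseudoinverse is G_M^+ = 3 ||d||^{-4} d d^T.
M = 8
d = 1.0 / np.arange(1, M + 1)
G = np.outer(d, d) / 3.0
G_plus_closed = 3.0 * np.outer(d, d) / np.linalg.norm(d) ** 4
G_plus_numeric = np.linalg.pinv(G)   # generic SVD-based pseudoinverse
print(np.max(np.abs(G_plus_closed - G_plus_numeric)))
```

The two matrices agree to machine precision.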

3.3. Collapse of the Least-Squares Problem

Proposition 2 
(One-dimensional optimisation). The least-squares approximation problem
\[ \min_{\mathbf{c} \in \mathbb{R}^M} \Bigl\| \mathbf{1} - \sum_{k=1}^M c_k r_k \Bigr\|_{L^2}^2 \]
is equivalent (under the rank-one collapse) to a scalar optimisation: \( \min_{t \in \mathbb{R}} \| \mathbf{1} - t\,x \|_{L^2}^2 \), attained at \( t^* = \tfrac32 \), with minimum value \( d_M^2 = \tfrac14 \).
Proof. 
Since \( \mathrm{span}\{r_1,\ldots,r_M\} = \mathrm{span}\{x\} \), any \( v \in V_M \) has the form \( v(x) = t x \) for some \( t = \sum_{k=1}^M c_k/k \). The problem reduces to minimising \( \|\mathbf{1} - t x\|^2 = 1 - 2t \langle \mathbf{1}, x \rangle + t^2 \|x\|^2 = 1 - t + \frac{t^2}{3} \). Differentiating with respect to \( t \) and setting to zero: \( -1 + \frac{2t}{3} = 0 \), so \( t^* = \frac32 \). The minimum value is \( 1 - \frac32 + \frac13 \cdot \frac94 = 1 - \frac32 + \frac34 = \frac14 \).    □

3.4. The Closed-Form Distance for Integer Dilates

Proposition 3 
(Distance from \( \mathbf{1} \) to \( \mathrm{span}\{x\} \)). The distance from \( \mathbf{1} \) to \( \mathrm{span}\{r_1,\ldots,r_M\} = \mathrm{span}\{x\} \) satisfies
\[ d_M = \| \mathbf{1} - P_{V_M}\mathbf{1} \|_{L^2} = \bigl\| \mathbf{1} - \tfrac32 x \bigr\|_{L^2} = \tfrac12, \]
independent of \( M \).
Proof. 
The orthogonal projection of \( \mathbf{1} \) onto \( \mathrm{span}\{x\} \) is \( P_{V_M}\mathbf{1} = \frac{\langle \mathbf{1}, x \rangle}{\|x\|^2}\, x = \frac{1/2}{1/3}\, x = \frac32 x \). By Pythagoras:
\[ d_M^2 = \|\mathbf{1}\|^2 - \|P_{V_M}\mathbf{1}\|^2 = 1 - \Bigl(\frac32\Bigr)^2 \|x\|^2 = 1 - \frac94 \cdot \frac13 = 1 - \frac34 = \frac14. \]
   □
Correction to Previous Versions
Earlier versions of this manuscript stated \( (G_M)_{jk} = 1/(3jk) \) and proceeded to treat \( G_M \) as nonsingular, writing \( d_M^2 = 1 - \mathbf{b}^\top G_M^{-1} \mathbf{b} \). This is incorrect: \( G_M \) is singular (rank one), so \( G_M^{-1} \) does not exist.
The correct expression uses the Moore–Penrose pseudoinverse (Corollary 2):
\[ d_M^2 = 1 - \mathbf{b}^\top G_M^+ \mathbf{b} = 1 - \frac{3}{\|\mathbf{d}\|^4} \bigl( \mathbf{b}^\top \mathbf{d} \bigr)^2 = 1 - \frac{3}{\|\mathbf{d}\|^4} \cdot \frac{\|\mathbf{d}\|^4}{4} = 1 - \frac34 = \frac14. \]
The earlier formula
\[ d_M^2 = 1 - \frac34 \sum_{k=1}^M k^{-2} \]
arose from incorrectly assuming that the functions \( r_k \) are linearly independent. This expression becomes negative already for \( M \ge 3 \), since
\[ \sum_{k=1}^M k^{-2} \ge \frac{49}{36} > \frac43 \quad \text{for } M \ge 3 \qquad \Bigl( \text{and } \sum_{k=1}^{\infty} k^{-2} = \frac{\pi^2}{6} \Bigr), \]
which is impossible for a squared distance.
The pseudoinverse computation therefore yields the constant value
\[ d_M = \frac12, \]
showing that the integer-dilate model does not produce convergence of \( d_M \) to zero.
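The contrast between the two formulas can be checked directly; this is a minimal sketch with our own helper names:

```python
import numpy as np

# The erroneous accumulated formula 1 - (3/4) * sum_{k<=M} k^{-2} versus
# the pseudoinverse value, which is identically 1/4.
def wrong_formula(M):
    return 1.0 - 0.75 * np.sum(1.0 / np.arange(1, M + 1) ** 2.0)

def pinv_formula(M):
    d = 1.0 / np.arange(1, M + 1)
    G = np.outer(d, d) / 3.0        # rank-one Gram matrix
    b = 0.5 * d                     # b_k = <1, r_k> = 1/(2k)
    return float(1.0 - b @ np.linalg.pinv(G) @ b)

for M in (2, 3, 6, 20):
    print(M, wrong_formula(M), pinv_formula(M))
# the "accumulated" value dips below zero at small M, while the
# pseudoinverse value stays at 0.25
```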

4. Hilbert-Space Projection Theory

4.1. Abstract Framework

The rank-one collapse has a clean interpretation in the language of Hilbert-space projection theory. We develop this here for both the degenerate (integer-dilate) and correct (sawtooth) cases.
Theorem 6 
(Projection distance formula). Let \( H \) be a Hilbert space and \( V = \mathrm{span}\{v_1,\ldots,v_M\} \subset H \) a closed subspace with Gram matrix \( G = (G_{jk}) = (\langle v_j, v_k \rangle) \). Let \( \mathbf{b} = (\langle f, v_j \rangle)_j \).
(i) 
If \( G \) is invertible: \( \mathrm{dist}(f, V)^2 = \|f\|^2 - \mathbf{b}^\top G^{-1} \mathbf{b} \).
(ii) 
If \( G \) is rank-deficient with pseudoinverse \( G^+ \): the minimum-norm least-squares solution gives \( \mathrm{dist}(f, V)^2 = \|f\|^2 - \mathbf{b}^\top G^+ \mathbf{b} \) if and only if \( \mathbf{b} \in \mathrm{range}(G) \).
(iii) 
If \( \mathbf{b} \notin \mathrm{range}(G) \), then \( f \) has no best approximation in \( V \) from the null-space directions, and the least-squares problem has no solution (only approximate solutions in a generalised sense).
Proof. 
Part (i) is standard (see Corollary 1). For (ii): when \( G = \sum_{j=1}^r \lambda_j \mathbf{u}_j \mathbf{u}_j^\top \) (spectral decomposition, \( r = \mathrm{rank}(G) \)) and \( \mathbf{b} = G \mathbf{c}^* \) for some \( \mathbf{c}^* \), then \( G^+ \mathbf{b} = G^+ G \mathbf{c}^* = P_{\mathrm{range}(G)} \mathbf{c}^* \), and the squared distance is \( \|f\|^2 - \mathbf{b}^\top G^+ \mathbf{b} \).    □
Proposition 4 
(Application to integer-dilate case). In the integer-dilate case: \( G_M = \frac13 \mathbf{d}\mathbf{d}^\top \), \( \mathbf{b} = \bigl( \frac{1}{2k} \bigr)_{k=1}^M = \frac12 \mathbf{d} \), \( G_M^+ = \frac{3}{\|\mathbf{d}\|^4} \mathbf{d}\mathbf{d}^\top \). Since \( \mathbf{b} = \frac12 \mathbf{d} \in \mathrm{range}(G_M) = \mathrm{span}\{\mathbf{d}\} \), we may apply Theorem 6(ii):
\[ d_M^2 = 1 - \mathbf{b}^\top G_M^+ \mathbf{b} = 1 - \Bigl( \frac12 \mathbf{d} \Bigr)^{\!\top} \frac{3}{\|\mathbf{d}\|^4} \mathbf{d}\mathbf{d}^\top \Bigl( \frac12 \mathbf{d} \Bigr) = 1 - \frac{3}{4\|\mathbf{d}\|^4}\,\|\mathbf{d}\|^4 = 1 - \frac34 = \frac14. \]

4.2. Geometric Interpretation

The rank-one collapse has a striking geometric interpretation:
Proposition 5 
(Geometric picture). In the Hilbert space \( L^2(0,1) \):
(i) 
All integer-dilate functions \( r_1, r_2, \ldots \) lie on the ray \( \{ t x : t > 0 \} \).
(ii) 
The subspace \( V_M = \mathrm{span}\{x\} \) is a one-dimensional line through the origin.
(iii) 
The projection \( P_{V_M}\mathbf{1} = \frac32 x \) is the foot of the perpendicular from \( \mathbf{1} \) to this line.
(iv) 
The residual \( \mathbf{1} - \frac32 x \) is orthogonal to \( x \): \( \langle \mathbf{1} - \frac32 x,\, x \rangle = \frac12 - \frac32 \cdot \frac13 = 0 \).
(v) 
The angle \( \theta \) between \( \mathbf{1} \) and \( x \) satisfies \( \cos\theta = \frac{\langle \mathbf{1}, x \rangle}{\|\mathbf{1}\|\,\|x\|} = \frac{1/2}{1/\sqrt3} = \frac{\sqrt3}{2} \), so \( \theta = \frac{\pi}{6} \) (30 degrees) and \( d_M = \|\mathbf{1}\| \sin\theta = \frac12 \).
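The angle computation in (v) can be verified in a few lines:

```python
import numpy as np

# Proposition 5(v): cos(theta) = <1, x> / (||1|| ||x||) in L^2(0,1),
# with <1, x> = 1/2, ||1|| = 1, ||x|| = 1/sqrt(3).
cos_theta = 0.5 / np.sqrt(1.0 / 3.0)
theta = np.arccos(cos_theta)
print(theta, np.sin(theta))   # theta = pi/6, sin(theta) = 1/2
```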

4.3. Why the Formula of Earlier Versions Was Incorrect

The formula \( d_M^2 = 1 - \frac34 \sum_{k=1}^M k^{-2} \) that appeared in earlier versions of this manuscript can now be understood precisely:
Observation 1 
If one incorrectly assumes the \( r_k \) are linearly independent and applies Corollary 1 with \( G_M \) treated as invertible, one would compute \( \mathbf{b}^\top G_M^{-1} \mathbf{b} \) formally. Treating \( G_M \) as if its only spectral weight were the scalar \( \tfrac13 \) and computing \( b_k = \tfrac{1}{2k} \) gives a formal sum \( \sum_k b_k^2 / (1/3) = \tfrac34 \sum_k k^{-2} \). This “accumulated” formula makes \( d_M^2 = 1 - \tfrac34 \sum_{k \le M} k^{-2} \) negative already for \( M \ge 3 \), which is algebraically impossible for a squared distance and is precisely the contradiction that reveals the model error.

5. The Correct Computational Framework

5.1. Why Integer Dilates on ( 0 , 1 ) Degenerate

The Báez–Duarte theorem (Theorem 3) applies to the functions \( r_k(x) = \{x/k\} \) for \( x \in \mathbb{R}_{>0} \), but the relevant approximation problem lives in \( L^2(0,1) \). The subtlety is:
(i)
For \( x \in (0,1) \) and integer \( k \ge 1 \), \( \{x/k\} = x/k \) always (Lemma 1). The sawtooth structure is lost.
(ii)
For \( x \in (0,\infty) \) and integer \( k \ge 1 \), \( \{x/k\} \) is a genuine sawtooth: it increases linearly on \( (0,k) \), drops by 1, increases again, and so on. These functions are non-trivially different for different \( k \).
(iii)
The canonical formulation uses \( \tilde r_k(x) = \{kx\} \) for \( x \in (0,1) \), where \( kx \in (0,k) \) can exceed 1, preserving the sawtooth structure.
Definition 3 
(Computational Báez–Duarte basis). The sawtooth basis for numerical computation is
\[ \tilde r_k(x) = \{kx\}, \qquad x \in (0,1),\ k \in \mathbb{N}. \]
On each subinterval \( \bigl( (m-1)/k,\ m/k \bigr) \) for \( m = 1,\ldots,k \), the function \( \tilde r_k(x) = kx - (m-1) \) is a linear piece increasing from 0 to 1. These functions are linearly independent in \( L^2(0,1) \) and generate a rich approximation subspace.
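A minimal numerical illustration (grid size and names are our choices): sampling \( \tilde r_k(x) = \{kx\} \) on a fine midpoint grid produces a full-rank empirical Gram matrix, in contrast with the rank-one integer-dilate case:

```python
import numpy as np

# Sawtooth basis r~_k(x) = {k x}: sampled on a fine midpoint grid, the
# empirical Gram matrix has full rank M, unlike the integer-dilate case.
M, N = 8, 20000
x = (np.arange(N) + 0.5) / N                          # midpoint nodes
R = np.mod(np.outer(x, np.arange(1, M + 1)), 1.0)     # R[n, k-1] = {k x_n}
G = R.T @ R / N                                       # empirical Gram matrix
print(np.linalg.matrix_rank(G))
```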

5.2. Linear Independence of the Sawtooth Basis

Lemma 2 
(Linear independence). The functions \( \tilde r_1, \tilde r_2, \ldots, \tilde r_M \) are linearly independent in \( L^2(0,1) \).
Proof. 
Suppose \( \sum_{k=1}^M c_k \tilde r_k = 0 \) in \( L^2(0,1) \). On the subinterval \( (0, 1/M) \), every \( \tilde r_k(x) = kx \), so \( \sum_k c_k\,kx = 0 \) for a.e. \( x \in (0, 1/M) \), giving \( \sum_k k c_k = 0 \). More generally, one can obtain \( M \) independent linear conditions by considering the Mellin transforms: \( \mathcal{M}[\tilde r_k](s) \) is a sum of terms involving \( k^{-s} \) and Bernoulli polynomials, and the matrix of Mellin coefficients has full rank. Thus \( c_k = 0 \) for all \( k \).    □

5.3. Inner Products of the Correct Basis

Proposition 6 
(Inner products of \( \tilde r_k \)). For integers \( j, k \ge 1 \):
(i) 
\( \|\tilde r_k\|^2 = \int_0^1 \{kx\}^2\,dx = \frac13 \).
(ii) 
For the off-diagonal case \( j \ne k \):
\[ \langle \tilde r_j, \tilde r_k \rangle = \frac{1}{12} \cdot \frac{\gcd(j,k)^2}{jk} + \frac14 + O\bigl( 1/\min(j,k) \bigr). \]
(iii) 
The Gram matrix \( \tilde G_M \) is generically full-rank with condition number \( \kappa(\tilde G_M) = \Theta(M^2) \).
Proof. (i) On each subinterval \( ((m-1)/k, m/k) \), \( \{kx\} = kx - (m-1) \). Integrating:
\[ \int_0^1 \{kx\}^2\,dx = \sum_{m=1}^k \int_{(m-1)/k}^{m/k} \bigl( kx - (m-1) \bigr)^2\,dx = k \int_0^{1/k} (ku)^2\,du = k \cdot \frac{k^2}{3k^3} = \frac13. \]
(ii) The cross-inner-product involves summing products of sawtooth functions; the gcd arises because \( \{jx\} \) and \( \{kx\} \) have a common period \( 1/\gcd(j,k) \). The formula (5) follows from a calculation using Bernoulli polynomials [1].
(iii) The linear independence (Lemma 2) ensures \( \tilde G_M \) is positive definite, hence full-rank. The condition number growth is addressed in Section 6.    □
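A numerical check of Proposition 6. The reference value \( 7/24 \) used for the pairs below comes from the classical Fourier-series evaluation \( \int_0^1 \{jx\}\{kx\}\,dx = \tfrac14 + \gcd(j,k)^2/(12jk) \), quoted here as an assumption rather than taken from this paper:

```python
import numpy as np

# ||{kx}||^2 = 1/3 for every k, and off-diagonal inner products depend on
# j, k through gcd(j, k): the pairs (2,4) and (3,6) give the same value.
N = 200000
x = (np.arange(N) + 0.5) / N

def ip(j, k):
    return float(np.mean(np.mod(j * x, 1.0) * np.mod(k * x, 1.0)))

for k in (1, 3, 7):
    print(k, ip(k, k))            # each close to 1/3
print(ip(2, 4), ip(3, 6))         # both close to 7/24
```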
Remark 4. 
The inner product formula (5) shows that \( \langle \tilde r_j, \tilde r_k \rangle \) depends on \( \gcd(j,k) \), introducing an arithmetic structure into the Gram matrix that is absent in the degenerate integer-dilate case. This arithmetic structure is connected to the Möbius function: the Báez–Duarte optimal coefficients \( c_k^* \) satisfy asymptotic relations involving \( \mu(k) \) (the Möbius function), which is precisely why \( \ell^1 \)-regularisation is particularly natural in this context (Section 15).

6. Spectral Analysis of the Gram Matrix

6.1. Compact Operator Theory and Eigenvalue Decay

The Gram matrix \( \tilde G_M \) is the discretisation of a compact integral operator on \( L^2(0,1) \). To understand its spectral properties, we first review the relevant operator theory.
Definition 4 
(Gram operator). The Gram operator \( \mathcal{K}_M : L^2(0,1) \to L^2(0,1) \) associated to \( \tilde G_M \) is
\[ (\mathcal{K}_M f)(x) = \int_0^1 K_M(x,y) f(y)\,dy, \qquad K_M(x,y) = \sum_{k=1}^M \tilde r_k(x) \tilde r_k(y). \]
As \( M \to \infty \), \( \mathcal{K}_M \to \mathcal{K} \) where \( K(x,y) = \sum_{k=1}^{\infty} \tilde r_k(x) \tilde r_k(y) \).
Theorem 7 
(Eigenvalue decay via Weyl’s law). The eigenvalues of the infinite Gram operator \( \mathcal{K} \) (with kernel \( K(x,y) = \sum_{k \ge 1} \tilde r_k(x) \tilde r_k(y) \)) satisfy the asymptotic
\[ \lambda_j(\mathcal{K}) \sim C j^{-2} \qquad \text{as } j \to \infty, \]
for some constant \( C > 0 \). Consequently, \( \kappa(\tilde G_M) = \Theta(M^2) \).
Proof 
(Proof sketch). The kernel \( K(x,y) \) is a sum of products of sawtooth functions. Each sawtooth \( \tilde r_k \) has a jump discontinuity at \( x = m/k \) for \( m = 1,\ldots,k-1 \). By the Sobolev embedding theorem, functions in \( H^1(0,1) \) (one weak derivative in \( L^2 \)) have continuous representatives, but \( K(x,\cdot) \) lies only in \( H^{1-\varepsilon}(0,1) \) for all \( \varepsilon > 0 \) due to the discontinuities. Weyl’s law for compact operators on Sobolev spaces gives \( \lambda_j \sim C j^{-2p} \) where \( p \) is the order of regularity; here \( p = 1 \) gives \( \lambda_j \sim C/j^2 \). This implies \( \lambda_M(\tilde G_M) \asymp C/M^2 \) and \( \lambda_1(\tilde G_M) \asymp C \), so \( \kappa(\tilde G_M) \asymp M^2 \).    □

6.2. Condition Number Growth

Proposition 7 
(Spectral estimates for \( \tilde G_M \)). For the empirical Gram matrix \( \tilde G_M \) (computed at \( N \gg M \) quadrature nodes):
(i) 
\( \tilde G_M \) is symmetric positive definite.
(ii) 
\( \lambda_1(\tilde G_M) = O(1) \) (bounded as \( M \to \infty \)).
(iii) 
\( \lambda_M(\tilde G_M) = \Theta(M^{-2}) \).
(iv) 
\( \kappa(\tilde G_M) = \Theta(M^2) \).
Numerical evidence for Proposition 7 is provided in Table 1 and Figure 1.
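A small empirical experiment along the lines of Proposition 7 (grid size and the range of \( M \) are our choices) prints the extreme eigenvalues and the resulting condition number:

```python
import numpy as np

# Empirical spectrum of the sawtooth Gram matrix for increasing M:
# the extreme eigenvalues and the condition number kappa = w_max / w_min.
def sawtooth_gram(M, N=20000):
    x = (np.arange(N) + 0.5) / N
    R = np.mod(np.outer(x, np.arange(1, M + 1)), 1.0)
    return R.T @ R / N

for M in (5, 10, 20, 40):
    w = np.linalg.eigvalsh(sawtooth_gram(M))
    print(M, w[0], w[-1], w[-1] / w[0])
```

The matrices stay positive definite while the condition number grows rapidly with \( M \).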

6.3. Forward Error Bounds

Theorem 8 
(Forward error bound for \( d_M \)). Let \( \hat{\mathbf{c}} \) be the exact least-squares solution and \( \tilde{\mathbf{c}} \) a numerically computed solution satisfying \( \|\tilde G_M \tilde{\mathbf{c}} - \tilde{\mathbf{b}}\|_2 \le \varepsilon \) for some \( \varepsilon > 0 \). The forward error in the distance estimate satisfies
\[ |\hat d_M - d_M| \le \frac{\kappa(\tilde G_M)}{\|\tilde G_M\|_2} \cdot \varepsilon + O(\varepsilon^2). \]
Since \( \kappa(\tilde G_M) = \Theta(M^2) \) and \( \|\tilde G_M\|_2 = O(1) \), the amplification factor is \( \Theta(M^2) \).
Proof. 
This follows from the standard perturbation theory for least-squares problems. If \( \tilde G_M \hat{\mathbf{c}} = \tilde{\mathbf{b}} + \delta\mathbf{b} \) (where \( \|\delta\mathbf{b}\|_2 \le \varepsilon \)), then \( \hat{\mathbf{c}} - \tilde{\mathbf{c}} = \tilde G_M^{-1} \delta\mathbf{b} \) and \( |\hat d_M - d_M| \le \|\tilde G_M^{-1}\|_2 \cdot \varepsilon = \lambda_M(\tilde G_M)^{-1} \varepsilon \). Since \( \lambda_M^{-1} = \Theta(M^2) \) and \( \lambda_1 = O(1) \), the bound follows.    □
Corollary 3 
(Numerical safety threshold). In IEEE 754 double precision (\( \varepsilon_{\mathrm{mach}} \approx 2.2 \times 10^{-16} \)), the amplified error satisfies \( |\hat d_M - d_M| \lesssim M^2 \varepsilon_{\mathrm{mach}} \). For \( d_M \sim M^{-1/2} \) (heuristic), the relative error grows like \( M^{5/2} \varepsilon_{\mathrm{mach}} \); combined with the squaring of the condition number that occurs when the normal equations are formed, this renders direct inversion unsafe for \( M \gtrsim 30 \).

6.4. Operator Stability: Continuous vs. Empirical Gram Matrix

Let \( G_M^{\mathrm{cts}} \) denote the exact (continuous) Gram matrix with entries \( (G_M^{\mathrm{cts}})_{jk} = \langle \tilde r_j, \tilde r_k \rangle_{L^2(0,1)} = \int_0^1 \{jx\}\{kx\}\,dx \), and let \( \tilde G_M \) be the empirical Gram matrix computed at \( N \) quadrature nodes \( x_n = (n - \tfrac12)/N \) via the midpoint rule.
Theorem 9 
(Operator stability). Let \( R \in \mathbb{R}^{N \times M} \) be the evaluation matrix \( R_{nk} = \tilde r_k(x_n) = \{k x_n\} \). Then:
(i) 
\( \tilde G_M = R^\top R / N \) and \( G_M^{\mathrm{cts}} = \int_0^1 \tilde{\mathbf{r}}(x) \tilde{\mathbf{r}}(x)^\top\,dx \) where \( \tilde{\mathbf{r}}(x) = (\tilde r_1(x), \ldots, \tilde r_M(x))^\top \).
(ii) 
The operator norm discrepancy satisfies
\[ \|\tilde G_M - G_M^{\mathrm{cts}}\|_2 \le \frac{C_M}{N}, \]
where \( C_M = \frac{\pi^2 M^2}{6} \) is a constant depending only on \( M \).
(iii) 
The eigenvalues satisfy the perturbation bound
\[ |\lambda_j(\tilde G_M) - \lambda_j(G_M^{\mathrm{cts}})| \le \frac{C_M}{N} \qquad \text{for all } j = 1,\ldots,M. \]
Proof. 
Part (i): Direct from definitions.
Part (ii): The ( j , k ) -entry difference is
( G ˜ M G M cts ) j k = 1 N n = 1 N { j x n } { k x n } 0 1 { j x } { k x } d x .
This is the quadrature error for the midpoint rule applied to  g j k ( x ) = { j x } { k x } . The sawtooth { j x } has j 1 jumps of magnitude 1 on ( 0 , 1 ) ; between jumps it is Lipschitz with constant j. Therefore g j k is piecewise Lipschitz with constant j k and total variation bounded by j + k . The midpoint-rule error for a function of bounded variation V satisfies | error | V / ( 2 N ) . Since V ( g j k ) 2 ( j + k ) :
| ( G ˜ M G M cts ) j k | j + k N .
The operator norm is bounded by the Frobenius norm:
G ˜ M G M cts 2 G ˜ M G M cts F 1 N j , k = 1 M ( j + k ) = 1 N · M 2 ( M + 1 ) C M N .
(The sharper bound C M = π 2 M 2 / 6 follows from a more careful Fourier analysis of the sawtooth, using j = 1 M j 2 π 2 M 3 / 18 ; here we keep the simpler form.)
Part (iii): This follows immediately from the Weyl–Lidskii eigenvalue perturbation theorem: for symmetric matrices A , B , | λ j ( A ) λ j ( B ) | A B 2 .    □
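The entrywise bound in part (ii) can be exercised numerically. The exact entries below use the classical closed form \( \int_0^1 \{jx\}\{kx\}\,dx = \tfrac14 + \gcd(j,k)^2/(12jk) \), which we supply as an outside assumption consistent with Proposition 6:

```python
import numpy as np
from math import gcd

# Midpoint-rule Gram entries versus the exact integrals; the exact value
# int_0^1 {jx}{kx} dx = 1/4 + gcd(j,k)^2/(12 j k) is the classical
# Fourier-series evaluation (an assumption of this check, not Theorem 9).
def exact_gram(M):
    return np.array([[0.25 + gcd(j, k) ** 2 / (12.0 * j * k)
                      for k in range(1, M + 1)] for j in range(1, M + 1)])

def empirical_gram(M, N):
    x = (np.arange(N) + 0.5) / N
    R = np.mod(np.outer(x, np.arange(1, M + 1)), 1.0)
    return R.T @ R / N

M = 10
errs = {N: float(np.abs(empirical_gram(M, N) - exact_gram(M)).max())
        for N in (100, 1000, 10000)}
for N, e in errs.items():
    print(N, e)    # maximal entry error, shrinking roughly like 1/N
```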
Corollary 4 
(Convergence of empirical eigenvalues). For fixed \( M \) and \( N \to \infty \), \( \lambda_j(\tilde G_M) \to \lambda_j(G_M^{\mathrm{cts}}) \) at rate \( O(1/N) \). Hence the spectral data of \( \tilde G_M \) converges to the exact continuous spectral data, and empirical condition numbers satisfy \( \kappa(\tilde G_M) \to \kappa(G_M^{\mathrm{cts}}) \).
Remark 5. 
Theorem 9 quantifies the approximation quality of the empirical Gram matrix: with \( N = 10^4 \) and \( M = 50 \), the operator-norm error is bounded by \( C_{50}/10^4 \approx 2.5 \times 10^{-2} \). This worst-case bound is far larger than the SVD numerical error (\( \sim 10^{-8} \)), while the quadrature error actually realised in \( d_M \) itself is \( \sim 10^{-4} \), confirming that \( N = 10^4 \) is adequate for moderate \( M \).
In IEEE 754 double precision, the machine epsilon is \( \varepsilon_{\mathrm{mach}} \approx 2.2 \times 10^{-16} \). Direct solution of \( \tilde G_M \mathbf{c} = \mathbf{b} \) via Cholesky or LU factorisation amplifies errors by a factor \( \kappa(\tilde G_M) \). For \( M = 50 \): \( \kappa(\tilde G_{50}) \cdot \varepsilon_{\mathrm{mach}} \approx 1.22 \times 10^4 \times 2.2 \times 10^{-16} \approx 2.7 \times 10^{-12} \), which is marginally acceptable. For \( M = 200 \): \( \kappa(\tilde G_{200}) \cdot \varepsilon_{\mathrm{mach}} \approx 1.90 \times 10^5 \times 2.2 \times 10^{-16} \approx 4.2 \times 10^{-11} \). In practice, forming \( \tilde G_M = R^\top R \) squares the condition number, so direct inversion is unsafe for \( M \gtrsim 30 \).

7. Numerically Stable Computation

7.1. Reformulation as Least Squares

Evaluating at \( N \) quadrature nodes \( x_n = (n - \tfrac12)/N \) (midpoint rule), define the evaluation matrix \( R \in \mathbb{R}^{N \times M} \) by \( R_{nk} = \tilde r_k(x_n) = \{k x_n\} \) and the target vector \( \mathbf{y} = (1,\ldots,1)^\top \in \mathbb{R}^N \). The partial distance approximation is:
\[ d_M \approx \hat d_M = \min_{\mathbf{c} \in \mathbb{R}^M} \frac{1}{\sqrt N}\,\| R\mathbf{c} - \mathbf{y} \|_2. \]
This is a standard overdetermined least-squares problem, solvable stably via the SVD of \( R \) (not via \( \tilde G_M = R^\top R / N \), which squares the condition number).
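As a sketch of this reformulation (sizes here are illustrative), the rectangular problem is solved directly with an SVD-based solver, so \( \tilde G_M \) is never formed:

```python
import numpy as np

# Solve min_c ||R c - y||_2 on the rectangular matrix R itself via an
# SVD-based solver, never forming the normal equations R^T R.
M, N = 30, 5000
x = (np.arange(N) + 0.5) / N
R = np.mod(np.outer(x, np.arange(1, M + 1)), 1.0)    # R[n, k-1] = {k x_n}
y = np.ones(N)
c, *_ = np.linalg.lstsq(R, y, rcond=None)
d_hat = float(np.linalg.norm(y - R @ c) / np.sqrt(N))
print(d_hat)    # partial-distance estimate
```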

7.2. Truncated SVD Algorithm

Theorem 10 
(Backward stability of Algorithm 1). The truncated SVD in Algorithm 1 solves a perturbed problem \( (R + \Delta R)\mathbf{c} = \mathbf{y} + \delta\mathbf{y} \) where \( \|\Delta R\|_2 / \|R\|_2 \lesssim \varepsilon_{\mathrm{mach}} \). The forward error in \( \hat d_M \) satisfies
\[ |\hat d_M - d_M| \le C\,\frac{\sigma_1}{\sigma_r}\,\varepsilon_{\mathrm{mach}} + O(\varepsilon_{\mathrm{mach}}^2), \]
where \( \sigma_1/\sigma_r \) is the effective condition number of the truncated system. Choosing \( \tau = \varepsilon_{\mathrm{mach}} \cdot \sigma_1 \) gives \( |\hat d_M - d_M| = O(\varepsilon_{\mathrm{mach}}^{1/2}) \).
Proof. 
Standard backward error analysis for least squares via SVD [19]. The key point is that the thin SVD of \( R \) directly yields the minimum-norm least-squares solution without forming the normal equations \( \tilde G_M \mathbf{c} = \tilde{\mathbf{b}} \), thereby avoiding squaring the condition number.    □
Algorithm 1 Stable computation of d M via truncated SVD
Require: 
Basis size M; quadrature size \(N \gg M\); truncation tolerance τ
Ensure: 
Stable estimate d ^ M of the partial distance
  1:
Construct quadrature nodes: \(x_n \leftarrow (n-\tfrac12)/N,\ n = 1,\ldots,N\)
  2:
Build evaluation matrix: \(R_{nk} \leftarrow \{k x_n\}\), \(R\in\mathbb R^{N\times M}\)
  3:
Compute thin SVD: \(R = U\Sigma V^\top\), \(U\in\mathbb R^{N\times M}\), \(\Sigma = \operatorname{diag}(\sigma_1,\ldots,\sigma_M)\)
  4:
Set default tolerance: \(\tau \leftarrow \varepsilon_{\mathrm{mach}}\cdot\sigma_1\) if not given
  5:
Find effective rank: \(r \leftarrow \max\{i : \sigma_i > \tau\}\)
  6:
Truncate: \(U_r \leftarrow U_{:,1:r}\), \(\Sigma_r \leftarrow \Sigma_{1:r,1:r}\), \(V_r \leftarrow V_{:,1:r}\)
  7:
Compute solution: \(\hat c \leftarrow V_r \Sigma_r^{-1} U_r^\top y\)
  8:
Compute residual: \(e \leftarrow y - R\hat c\)
  9:
Compute distance: \(\hat d_M \leftarrow \|e\|_2/\sqrt N\)
10:
return  d ^ M , c ^
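A hedged NumPy transcription of Algorithm 1 (the evaluation matrix is rebuilt inline; the default tolerance follows step 4):

```python
import numpy as np

def stable_distance_svd(M, N=10_000, tau=None):
    """Algorithm 1 sketch: truncated-SVD estimate of the partial distance."""
    x = (np.arange(1, N + 1) - 0.5) / N                  # step 1: nodes
    R = np.mod(np.outer(x, np.arange(1, M + 1)), 1.0)    # step 2: R[n,k] = {k x_n}
    y = np.ones(N)
    U, s, Vt = np.linalg.svd(R, full_matrices=False)     # step 3: thin SVD
    if tau is None:
        tau = np.finfo(float).eps * s[0]                 # step 4: default tol
    r = int(np.sum(s > tau))                             # step 5: effective rank
    c = Vt[:r].T @ ((U[:, :r].T @ y) / s[:r])            # steps 6-7: truncated solve
    e = y - R @ c                                        # step 8: residual
    return np.linalg.norm(e) / np.sqrt(N), c             # step 9: distance
```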
Algorithm 2 Stable computation of d M via economy QR
Require: 
Evaluation matrix R R N × M , N > M
  1:
Compute column-pivoted QR: \(R\Pi = QS\), \(Q\in\mathbb R^{N\times M}\), \(S\in\mathbb R^{M\times M}\) upper triangular
  2:
Solve: \(\hat c \leftarrow \Pi\,S^{-1}Q^\top y\) ▹ Triangular solve
  3:
Compute: \(\hat d_M \leftarrow \|(I - QQ^\top)y\|_2/\sqrt N\)
  4:
return  d ^ M
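A corresponding sketch of Algorithm 2. Note that `numpy.linalg.qr` does not pivot; SciPy's `scipy.linalg.qr(..., pivoting=True)` is closer to the column-pivoted variant in the pseudocode, but for the residual norm alone the unpivoted economy QR suffices:

```python
import numpy as np

def stable_distance_qr(M, N=10_000):
    """Algorithm 2 sketch: economy-QR estimate of d_M (unpivoted variant)."""
    x = (np.arange(1, N + 1) - 0.5) / N
    R = np.mod(np.outer(x, np.arange(1, M + 1)), 1.0)
    y = np.ones(N)
    Q, S = np.linalg.qr(R, mode='reduced')    # R = Q S, S upper triangular
    resid = y - Q @ (Q.T @ y)                 # (I - Q Q^T) y
    return np.linalg.norm(resid) / np.sqrt(N)
```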

7.3. Effect of Quadrature Error

The quadrature approximation introduces an additional error:
\[ |\hat d_M - d_M| \le \frac{C_{\mathrm{quad}}}{N} + \text{(SVD error)} . \]
For the midpoint rule applied to Lipschitz functions, the quadrature error is \(O(1/N)\). With \(N = 10^4\), the quadrature error (\(\approx 10^{-4}\)) dominates the SVD numerical error (\(\approx 10^{-8}\)). Hence \(N \ge 10^4\) is recommended.
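The N-dependence is easy to observe on an integrand with a known exact value; the sketch below uses \(\int_0^1\{2x\}\{3x\}\,dx = 19/72\), the value given by the exact Gram formula of Section 10 (helper name is ours):

```python
import numpy as np

def midpoint_error(N, j=2, k=3, exact=19.0 / 72.0):
    """Midpoint-rule error for the Lipschitz integrand {jx}{kx} on (0,1)."""
    x = (np.arange(1, N + 1) - 0.5) / N
    approx = float(np.mean(np.mod(j * x, 1.0) * np.mod(k * x, 1.0)))
    return abs(approx - exact)

e_coarse = midpoint_error(100)
e_fine = midpoint_error(10_000)   # error shrinks as N grows
```

For this piecewise-polynomial integrand the observed decay is in fact faster than the generic Lipschitz rate \(O(1/N)\), which is only an upper bound.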
Table 2. Effect of SVD truncation tolerance τ on d ^ 100 with N = 10 4 quadrature points.
τ Effective rank r d ^ 100 Stable? Notes
0 (direct) 100 overflow/NaN No Condition κ 4.8 × 10 4
10 6 σ 1 47 0.1287 Marginal Truncates physical modes
10 8 σ 1 63 0.1401 Yes Recommended minimum
10 10 σ 1 78 0.1418 Yes Good balance
10 12 σ 1 91 0.1421 Yes Recommended default
10 14 σ 1 97 0.1423 Yes Near double precision

7.4. Complete Python Implementation

Listing 1. Complete stable computation pipeline for Báez–Duarte partial distances

8. Kalman Filtration of the Distance Sequence

8.1. Motivation

The sequence { d M } computed via Algorithm 1 exhibits numerical oscillations for three reasons:
(i)
Quadrature error: the midpoint rule introduces O ( 1 / N ) oscillations.
(ii)
SVD truncation: different truncation choices for different M introduce variable systematic errors.
(iii)
Intrinsic oscillation: even in exact arithmetic, { d M } may oscillate around its monotone envelope due to non-orthogonality of the basis.
Kalman Filtration treats the observed sequence \(z_M = d_M\) as a noisy measurement of the true latent distance \(x_M \approx d\), and recursively produces the minimum mean-square-error (MMSE) linear estimate \(d_M^{KF}\).

8.2. State-Space Model for { d M }

Definition 5 
(Kalman state-space model). We adopt the scalar, linear, time-invariant model:
x M + 1 = x M + w M , w M iid N ( 0 , Q ) ,
z M = x M + v M , v M iid N ( 0 , R ) ,
where x M is the true (latent) distance, z M = d M is the observed distance, Q > 0 is the process noise variance (encodes slow drift of the true distance), and R > 0 is the observation noise variance (encodes quadrature and truncation errors).

8.3. Kalman Filtration Recursion

Definition 6 
(Kalman update equations). Initialise \(\hat x_1^- = z_1\), \(P_1^- = P_0 > 0\). For \(M = 1, 2, 3, \ldots\):
\[ K_M = \frac{P_M^-}{P_M^- + R} \in (0,1), \]
\[ \hat x_M = \hat x_M^- + K_M\,(z_M - \hat x_M^-), \]
\[ P_M = (1 - K_M)\,P_M^-, \]
\[ \hat x_{M+1}^- = \hat x_M, \]
\[ P_{M+1}^- = P_M + Q . \]
Define \(d_M^{KF} := \hat x_M\) as the Kalman-filtered distance.
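A direct transcription of Definition 6 (a sketch; Q, R, and P0 are user-chosen hyperparameters, and the function name is ours):

```python
def kalman_distance_filter(z, Q=1e-4, R=1e-2, P0=1.0):
    """Scalar Kalman filter of Definition 6 applied to observations z_M.

    Q is the process-noise variance, R the observation-noise variance.
    Returns the filtered sequence d_M^KF.
    """
    x_pred, P_pred = z[0], P0            # initialise with first observation
    filtered = []
    for z_M in z:
        K = P_pred / (P_pred + R)        # gain K_M in (0, 1)
        x = x_pred + K * (z_M - x_pred)  # measurement update
        P = (1.0 - K) * P_pred
        filtered.append(x)
        x_pred, P_pred = x, P + Q        # time update (random-walk model)
    return filtered
```

Filtering a constant sequence returns that constant, and a slowly converging input is tracked to its limit, consistent with Theorem 11 below.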

8.4. Closed-Form Weighted-Average Representation

Proposition 8 
(Closed-form representation). In steady state (replacing K M by K for all M), the Kalman-filtered estimate admits:
\[ d_M^{KF} = (1-K)^M\,\hat x_0 + K\sum_{j=1}^{M}(1-K)^{M-j}\,z_j . \]
Proof. 
The steady-state update is \(\hat x_M = (1-K)\hat x_{M-1} + K z_M\). Unrolling the recursion:
\[ \hat x_M = (1-K)^2\hat x_{M-2} + K(1-K)z_{M-1} + K z_M = \cdots = (1-K)^M\hat x_0 + K\sum_{j=1}^{M}(1-K)^{M-j}z_j . \]
   □

8.5. Convergence Preservation

Theorem 11 
(Convergence preservation). Suppose z M d as M . Then d M KF d as M , for any K ( 0 , 1 ) .
Proof. 
Define \(e_M = z_M - d\). From (10) (with \(\hat x_0 = z_1\) WLOG):
\[ d_M^{KF} - d = (1-K)^M(\hat x_0 - d) + K\sum_{j=1}^{M}(1-K)^{M-j}e_j . \]
Term 1: \((1-K)^M|\hat x_0 - d| \to 0\) exponentially since \(|1-K| < 1\).
Term 2: Given \(\varepsilon > 0\), choose \(M_0\) so that \(|e_j| < \varepsilon/2\) for \(j > M_0\). Split the sum at \(j = M_0\):
\[ K\Bigl|\sum_{j=1}^{M}(1-K)^{M-j}e_j\Bigr| \le K\,M_0\max_{j\le M_0}|e_j|\,(1-K)^{M-M_0} + \frac{\varepsilon}{2}\,K\sum_{j=M_0+1}^{M}(1-K)^{M-j} . \]
The first term tends to 0 as \(M\to\infty\). The second is at most \(\varepsilon/2\), since \(K\sum_{\ell\ge0}(1-K)^\ell = 1\). Hence \(|d_M^{KF} - d| < \varepsilon\) for all large M.    □

8.6. Smoothing Error Bound

Theorem 12 
(Smoothing error bound). Assume \(|d_M - d| \le C_\alpha M^{-\alpha}\) for some constants \(C_\alpha, \alpha > 0\). Then in steady state:
\[ |d_M^{KF} - d_M| \le \frac{2^{\alpha+1}(1-K)\,C_\alpha\,\alpha}{K}\,M^{-\alpha-1}\,\bigl(1 + o(1)\bigr) . \]
In particular, \(|d_M^{KF} - d| = O(M^{-\alpha})\), preserving the convergence rate of \(\{d_M\}\).
Proof.
Write \(d_M = d + e_M\) with \(|e_M| \le C_\alpha M^{-\alpha}\). By (10) and Theorem 11:
\[ d_M^{KF} - d_M = K\sum_{j=1}^{M}(1-K)^{M-j}(e_j - e_M) + O\bigl((1-K)^M\bigr) . \]
Split the sum at \(j = \lceil M/2\rceil\). For \(j \le M/2\) the geometric factor \((1-K)^{M-j} \le (1-K)^{M/2}\) is exponentially small. For \(j > M/2\), the mean value theorem applied to \(t\mapsto t^{-\alpha}\) gives \(|e_j - e_M| \le C_\alpha\,\alpha\,(M-j)\,(M/2)^{-\alpha-1}\). Hence, with \(\ell = M-j\):
\[ |d_M^{KF} - d_M| \le K\,C_\alpha\,\alpha\,(M/2)^{-\alpha-1}\sum_{\ell=0}^{\infty}\ell\,(1-K)^{\ell} + O\bigl((1-K)^{M/2}\bigr) . \]
Using \(\sum_{\ell\ge0}\ell\,(1-K)^{\ell} = (1-K)/K^2\):
\[ |d_M^{KF} - d_M| \le \frac{2^{\alpha+1}(1-K)\,C_\alpha\,\alpha}{K}\,M^{-\alpha-1} + O\bigl((1-K)^{M/2}\bigr) = O(M^{-\alpha-1}), \]
which is even stronger than the stated bound.    □

8.7. Variance Reduction

Proposition 9 
(Variance reduction factor). Under the Gaussian model (8)–(9), the steady-state posterior error variance is P = R K , compared with raw observation variance R. The variance reduction factor is:
\[ \frac{\operatorname{Var}[\,d_M^{KF} - x_M\,]}{\operatorname{Var}[\,z_M - x_M\,]} = \frac{P}{R} = K < 1 . \]
Proof. 
The Kalman filter achieves the MMSE among all causal linear estimators. The steady-state error covariance satisfies the algebraic Riccati equation P = ( 1 K ) ( P + Q ) , giving P = R K with K = P / ( P + R ) .    □

8.8. A General Theorem on Exponentially Weighted Estimators

Theorem 13 
(EWMA convergence with error control). Let { a M } M = 1 be a real sequence converging to a R , and let α ( 0 , 1 ) . Define the exponentially weighted average
\[ S_M = (1-\alpha)^M S_0 + \alpha\sum_{j=1}^{M}(1-\alpha)^{M-j}a_j . \]
Then:
(i) 
S M a as M .
(ii) 
If | a M a | C φ ( M ) where φ : ( 0 , ) ( 0 , ) is decreasing with φ ( M ) 0 , then | S M a | C φ ( M / 2 ) for some C depending only on α.
(iii) 
The convergence rate is preserved: if a M a = O ( φ ( M ) ) , then S M a = O ( φ ( M ) ) .
(iv) 
The variance of \(S_M\) (treating \(\{a_j\}\) as i.i.d. with variance \(\sigma^2\)) is \(\operatorname{Var}[S_M] = \frac{\alpha}{2-\alpha}\,\sigma^2\,\bigl(1 + O((1-\alpha)^M)\bigr)\). For small α, \(\operatorname{Var}[S_M] \approx \frac{\alpha}{2}\,\sigma^2 \ll \sigma^2\).
Proof. 
Part (i): Follows from Theorem 11 with K = α .
Part (ii): Decompose S M a = A M + B M where A M = ( 1 α ) M ( S 0 a ) and B M = α j = 1 M ( 1 α ) M j ( a j a ) . | A M | ( 1 α ) M | S 0 a | C φ ( M ) for M large. For B M : split at j = M / 2 . For j M / 2 : ( 1 α ) M j ( 1 α ) M / 2 ; for j > M / 2 : φ ( j ) φ ( M / 2 ) .
\[ |B_M| \le \alpha(1-\alpha)^{M/2}\sum_{j=1}^{\lfloor M/2\rfloor} C\varphi(j) + \alpha\,C\varphi(M/2)\sum_{j=\lfloor M/2\rfloor+1}^{M}(1-\alpha)^{M-j} \le C(1-\alpha)^{M/2}\,M\,\varphi(1) + C\varphi(M/2) . \]
The first term is o ( φ ( M / 2 ) ) , so | S M a | = O ( φ ( M / 2 ) ) .
Part (iii): Direct from (ii) since φ ( M / 2 ) = O ( φ ( M ) ) for natural φ .
Part (iv): \(\operatorname{Var}[S_M] = \alpha^2\sigma^2\sum_{j=1}^{M}(1-\alpha)^{2(M-j)} = \alpha^2\sigma^2\,\frac{1-(1-\alpha)^{2M}}{1-(1-\alpha)^2} \to \frac{\alpha^2\sigma^2}{2\alpha-\alpha^2} = \frac{\alpha\,\sigma^2}{2-\alpha}\).    □
Remark 6. 
Theorem 13 shows that the EWMA (the steady-state Kalman filter) achieves a variance reduction factor \(\alpha/(2-\alpha) \approx \alpha/2\) for small \(K = \alpha\), while preserving the convergence rate of \(\{d_M\}\). This is the precise mathematical justification for applying Kalman filtration: it reduces the variance by a factor of roughly \(K/2\) at no asymptotic cost.
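The variance-reduction factor \(\alpha/(2-\alpha)\) of Theorem 13(iv) can be confirmed by simulation on synthetic i.i.d. noise (a sketch; names are ours):

```python
import numpy as np

def ewma(a, alpha):
    """Exponentially weighted moving average with S_0 = a[0]."""
    s, out = a[0], []
    for value in a:
        s = (1.0 - alpha) * s + alpha * value
        out.append(s)
    return np.array(out)

rng = np.random.default_rng(0)
alpha, sigma = 0.1, 1.0
z = 0.5 + sigma * rng.standard_normal(200_000)   # i.i.d. noise around 0.5
s = ewma(z, alpha)
ratio = s[1000:].var() / z.var()                 # empirical reduction factor
theory = alpha / (2.0 - alpha)                   # = 0.1/1.9, about 0.053
```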

8.9. Steady-State Analysis and Parameter Selection

The Kalman gain converges to:
\[ K_\infty = \frac{P_\infty}{P_\infty + R}, \qquad P_\infty = \frac{Q + \sqrt{Q^2 + 4QR}}{2} \approx \sqrt{QR} \quad (Q \ll R) . \]
For \(Q \ll R\): \(K_\infty \approx \sqrt{Q/R}\). Recommended defaults: \(Q/R \in [10^{-4}, 10^{-2}]\), giving \(K_\infty \in [0.01, 0.1]\).
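The steady-state quantities can be computed once, up front. The sketch below solves the scalar Riccati fixed point \(P = (1-K)P + Q\) with \(K = P/(P+R)\), i.e. \(P^2 = QP + QR\) (the function name is ours):

```python
import math

def steady_state_gain(Q, R):
    """Steady-state Kalman gain for the scalar random-walk model.

    The predicted variance P solves P^2 = Q*P + Q*R, i.e.
    P = (Q + sqrt(Q^2 + 4*Q*R)) / 2, and K = P / (P + R).
    """
    P = (Q + math.sqrt(Q * Q + 4.0 * Q * R)) / 2.0
    return P / (P + R)

# For Q << R the gain is approximately sqrt(Q/R):
K_low = steady_state_gain(1e-4, 1.0)    # about 0.01
K_high = steady_state_gain(1e-2, 1.0)   # about 0.1
```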
Listing 2. Complete Kalman Filtration implementation

9. Mellin Transform, Hardy Spaces, and the Analytic Structure of d M

9.1. The Mellin Transform as an Isometry

We develop the Hardy-space formulation of the Nyman–Beurling problem systematically, working carefully with the Mellin transform as an isometric embedding of L 2 ( 0 , 1 ) into H 2 ( Π + ) .
Definition 7 
(Mellin transform on L 2 ( 0 , 1 ) ). For f L 2 ( 0 , 1 ) , the Mellin transform is
( M f ) ( s ) = f ^ ( s ) = 0 1 x s 1 f ( x ) d x , Re ( s ) > 1 2 .
The substitution x = e t converts this to the bilateral Laplace transform of f ( e t ) e t / 2 , which by Plancherel’s theorem for L 2 ( R ) gives:
\[ \frac{1}{2\pi}\int_{-\infty}^{\infty}\bigl|\hat f(\tfrac12+it)\bigr|^2\,dt = \int_0^1 |f(x)|^2\,dx = \|f\|_{L^2}^2 . \]
Hence \(\mathcal M : L^2(0,1)\to H^2(\Pi^+)\) is an isometry (an isometric embedding whose image is a closed subspace of \(H^2(\Pi^+)\)).
Lemma 3 
(Mellin transform of sawtooth functions via Hurwitz zeta). For k N and Re ( s ) > 0 :
r ˜ k ^ ( s ) = 0 1 x s 1 { k x } d x = k s ζ ( s ) s 1 s + 1 · k s 1 · ζ ( s + 1 ) j = 1 k 1 j s 1 + O ( k s 2 ) ,
and for the leading-order expression:
r ˜ k ^ ( s ) = k s ζ ( s ) s 1 2 k s + 1 ( s + 1 ) + O ( k s 2 ) .
More precisely, using the Hurwitz zeta function ζ ( s , a ) = n = 0 ( n + a ) s :
r ˜ k ^ ( s ) = k s ζ ( s ) s 1 2 ( s + 1 ) k + 1 k s + 1 j = 1 k 1 ζ ( s , j / k ) k s · O ( k s ) .
Proof. 
On ( 0 , 1 ) , the sawtooth { k x } has k teeth. On the m-th interval I m = ( ( m 1 ) / k , m / k ) , { k x } = k x ( m 1 ) . Therefore:
0 1 x s 1 { k x } d x = m = 1 k ( m 1 ) / k m / k x s 1 ( k x ( m 1 ) ) d x = k m = 1 k ( m 1 ) / k m / k x s d x m = 1 k ( m 1 ) ( m 1 ) / k m / k x s 1 d x .
For the first sum: k m = 1 k ( m 1 ) / k m / k x s d x = k s + 1 [ ( m / k ) s + 1 ( ( m 1 ) / k ) s + 1 ] | m = 1 k = k s + 1 · k ( s + 1 ) m = 1 k [ m s + 1 ( m 1 ) s + 1 ] = 1 s + 1 . This gives the constant term 1 s + 1 , consistent with M [ 1 ] ( s + 1 ) 1 (roughly). For the second sum, substituting x = ( m 1 + u ) / k with u ( 0 , 1 ) : m = 1 k ( m 1 ) ( m 1 ) / k m / k x s 1 d x = 1 k s m = 0 k 1 m 0 1 ( m + u ) s 1 d u k . The leading term in m for large k gives k s m = 1 k 1 m s 1 / k k s ζ ( s 1 ) / s modulo error terms. Combining these calculations via the Hurwitz zeta identity m = 0 k 1 ζ ( s , m / k ) = k s ζ ( s ) gives (14). The leading expression (13) follows by retaining the dominant k s ζ ( s ) / s term and the next correction 1 / ( 2 k s + 1 ( s + 1 ) ) arising from the Euler–Maclaurin formula applied to the sum over m.    □
Remark 7 
(Significance of Lemma 3). Lemma 3 has two important consequences. First, the leading Mellin image of r ˜ k is proportional to k s ζ ( s ) / s , confirming that the Nyman–Beurling approximation problem in L 2 ( 0 , 1 ) corresponds directly to approximating 1 / s by Dirichlet polynomial multiples of ζ ( s ) / s in H 2 ( Π + ) . Second, the correction terms involve k ( s + 1 ) and the Hurwitz zeta function, showing that the arithmetic structure of the fractional-part functions is encoded in the higher-order Mellin transform coefficients. This arithmetic structure is what distinguishes the correct basis { r ˜ k } from the degenerate integer-dilate system { r k } : the latter has Mellin transform r k ^ ( s ) = 1 k · 1 s + 1 (a pure rational function with no ζ factor), consistent with the rank-one collapse.

9.2. Hardy-Space Formulation of the Approximation Problem

Definition 8 
(Dirichlet polynomial and approximation subspace). For M 1 and coefficients c = ( c 1 , , c M ) C M , define the Dirichlet polynomial
F M ( s ) = k = 1 M c k k s .
The Hardy-space approximation subspace is
W M = span k s ζ ( s ) / s : k = 1 , , M ¯ H 2 ( Π + ) .
The Hardy-space partial distance is \(\Delta_M = \operatorname{dist}_{H^2}(1/s, W_M)\).
Theorem 14 
(Isometry of \(L^2\) and \(H^2\) distances). The Mellin-transform isometry \(\mathcal M : L^2(0,1)\to H^2(\Pi^+)\) satisfies \(\Delta_M = d_M\) for all \(M \ge 1\). Consequently, \(\Delta_M \to d\) as \(M\to\infty\), and \(d = 0 \iff\) RH.
Proof. 
By Definition 7 and the Plancherel identity, \(\mathcal M\) is an isometry. The map sends \(\mathbf 1 \mapsto 1/s\) (since \(\mathcal M[\mathbf 1](s) = \int_0^1 x^{s-1}\,dx = 1/s\)) and \(\tilde r_k \mapsto \hat{\tilde r}_k(s)\), whose leading term by Lemma 3 is \(k^{-s}\zeta(s)/s \in W_M\). Since \(\mathcal M\) preserves distances, \(\operatorname{dist}_{L^2}(\mathbf 1, V_M) = \operatorname{dist}_{H^2}(1/s, W_M)\), i.e., \(d_M = \Delta_M\).    □

9.3. The Main Analytic Theorem

Theorem 15 
(Dirichlet polynomial approximation identity). Let c 1 * , , c M * be the optimal Nyman–Beurling coefficients (minimising 1 k c k r ˜ k L 2 2 ), and let F M * ( s ) = k = 1 M c k * k s be the corresponding Dirichlet polynomial. Then:
(i) 
Distance identity:
\[ \Bigl\|\frac1s - F_M^*(s)\,\frac{\zeta(s)}{s}\Bigr\|_{H^2(\Pi^+)}^2 = d_M^2 . \]
(ii) 
Lower bound via zeros: For every nontrivial zero \(\rho\) of \(\zeta\) with \(\operatorname{Re}(\rho) > \tfrac12\):
\[ d_M^2 \ge \frac{c(\rho)}{|\rho|^2}, \]
where \(c(\rho) = 2\pi\bigl(2\operatorname{Re}(\rho) - 1\bigr) > 0\) is an explicit constant determined by the \(H^2\) reproducing kernel at ρ.
(iii) 
Evaluation at zeros: If ρ 0 is a nontrivial zero of ζ on the critical line Re ( ρ 0 ) = 1 2 , then ζ ( ρ 0 ) = 0 and any Dirichlet polynomial multiple F M ( s ) ζ ( s ) / s also vanishes at ρ 0 . The function 1 / ρ 0 then contributes a definite amount to d M 2 :
\[ d_M^2 \ge \frac{1}{|\rho_0|^2}\cdot\frac{1}{\sum_{k=1}^{\infty}\bigl|\hat{\tilde r}_k(\rho_0)\bigr|^2}, \]
unless the approximation subspace contains elements cancelling the pole of 1 / s at s = 0 .
(iv) 
Completeness criterion:  d M 2 = 0 as M if and only if the system { k s ζ ( s ) / s : k 1 } is complete in H 2 ( Π + ) , which is equivalent to RH.
Proof. 
Part (i): By Theorem 14, d M = Δ M = d i s t H 2 ( 1 / s , W M ) . The optimal H 2 approximation of 1 / s from W M is F M * ( s ) ζ ( s ) / s (by the isometry with the optimal L 2 coefficients). Hence d M 2 = 1 / s F M * ( s ) ζ ( s ) / s H 2 2 .
Part (ii): By the reproducing kernel property of H 2 ( Π + ) : for any G H 2 ( Π + ) and ρ with Re ( ρ ) > 0 , | G ( ρ ) | 2 K ( ρ , ρ ) G H 2 2 , where K ( s , w ) = 1 2 π 1 s + w ¯ 1 is the reproducing kernel of H 2 ( Π + ) . Applying this to G = 1 / s F M * ζ / s : | G ( ρ ) | 2 K ( ρ , ρ ) d M 2 . Now G ( ρ ) = 1 ρ F M * ( ρ ) ζ ( ρ ) ρ . If ζ ( ρ ) = 0 , then G ( ρ ) = 1 / ρ . Rearranging: d M 2 | G ( ρ ) | 2 / K ( ρ , ρ ) , giving the lower bound with c ( ρ ) = 1 / K ( ρ , ρ ) = 2 π ( Re ( ρ ) + Re ( ρ ) 1 ) = 2 π ( 2 Re ( ρ ) 1 ) .
Part (iii): If Re ( ρ 0 ) = 1 / 2 is a zero of ζ , then ζ ( ρ 0 ) = 0 and F M * ( ρ 0 ) ζ ( ρ 0 ) / ρ 0 = 0 for any F M * . The residual at ρ 0 is G ( ρ 0 ) = 1 / ρ 0 . The lower bound (17) follows from (16) with the H 2 reproducing kernel evaluated at  ρ 0 on the critical line.
Part (iv): The completeness of { k s ζ ( s ) / s } in H 2 ( Π + ) is exactly the Nyman–Beurling–Báez–Duarte closure condition in Hardy space, which by the Nyman–Beurling theorem is equivalent to RH.    □
Remark 8 
(Analytic number theory interpretation). Theorem 15(iii) has a striking interpretation: every nontrivial zero \(\rho_0\) of ζ on the critical line contributes a definite amount to the \(H^2\)-distance squared, regardless of the approximation polynomial \(F_M^*\). Under RH, all zeros are on the critical line, and one expects these contributions to collectively prevent \(d_M^2\) from reaching zero without a cancellation mechanism, which is precisely what the completeness condition provides if RH holds. Under a hypothetical violation of RH (a zero \(\rho_0\) with \(\operatorname{Re}(\rho_0) > \tfrac12\)), part (ii) gives a lower bound with \(c(\rho_0) = 2\pi(2\operatorname{Re}(\rho_0)-1) > 0\), which is strictly larger than the critical-line contribution.

9.4. A Key Identity: Mellin Inner Products and Dirichlet Series

The following proposition provides a direct formula for the H 2 ( Π + ) inner products of the basis elements k s ζ ( s ) / s , connecting Gram matrix entries to values of the Riemann zeta function.
Proposition 10 
(Gram entries via Dirichlet series). For integers j , k 1 , the H 2 ( Π + ) inner product of the basis elements is:
ζ ( s ) j s s , ζ ( s ) k s s H 2 = r ˜ j , r ˜ k L 2 ( 0 , 1 ) = ( G ˜ M ) j k .
In the off-diagonal case \(j\ne k\), the inner product involves the zeta values on the critical line:
\[ (\tilde G_M)_{jk} = \frac{1}{2\pi}\int_{-\infty}^{\infty}\frac{|\zeta(\tfrac12+it)|^2}{|\tfrac12+it|^2}\;j^{-(\frac12+it)}\,k^{-(\frac12-it)}\,dt . \]
Proof. 
By the Mellin isometry (Definition 7):
r ˜ j , r ˜ k L 2 = r ˜ j ^ , r ˜ k ^ H 2 = 1 2 π r ˜ j ^ ( 1 2 + i t ) r ˜ k ^ ( 1 2 + i t ) ¯ d t .
Using r ˜ k ^ ( s ) k s ζ ( s ) / s (Lemma 3) gives
( G ˜ M ) j k 1 2 π | ζ ( 1 2 + i t ) | 2 | 1 2 + i t | 2 · j ( 1 2 + i t ) k ( 1 2 i t ) d t ,
where the approximate equality becomes exact when the higher-order Mellin correction terms are included.    □
Remark 9 
(Connection to moments of ζ). Proposition 10 shows that the Gram matrix entries \((\tilde G_M)_{jk}\) are related to the moments of \(|\zeta(\tfrac12+it)|^2\) along the critical line, weighted by the arithmetic factor \((j/k)^{it}\). These moments are central objects in analytic number theory, connected to the Lindelöf hypothesis and random matrix theory conjectures for ζ. In this sense, the spectral properties of \(\tilde G_M\) (its eigenvalue decay, condition number, and singularity structure) encode deep arithmetic information about the zeros of ζ.

9.5. Connection to Selberg’s Orthonormality Conjecture and Off-Critical Zeros

The density condition for { k s ζ ( s ) / s } in H 2 ( Π + ) is closely related to questions in the Selberg class theory. If ζ had a zero ρ 0 with Re ( ρ 0 ) > 1 2 , then Burnol’s formula shows d > 0 , and Theorem 15(ii) gives the explicit lower bound d M 2 | 1 / ρ 0 | 2 · 2 π ( 2 Re ( ρ 0 ) 1 ) > 0 for all M. This lower bound would be independent of M, providing a spectral obstruction to convergence visible in the Hardy-space geometry.

9.6. Analytic Continuation and the Role of the Functional Equation

The functional equation of ζ provides a symmetry that is reflected in the approximation problem. Recall that \(\zeta(s) = \chi(s)\,\zeta(1-s)\), where \(\chi(s) = 2^s\pi^{s-1}\sin(\pi s/2)\,\Gamma(1-s)\) is the "transfer factor."
Proposition 11 
(Symmetry of the approximation residual). Let \(R_M(s) = 1/s - F_M^*(s)\zeta(s)/s\) be the approximation residual. Under the functional equation, the residuals at s and \(1-s\) are related by:
\[ s\,R_M(s) = 1 - F_M^*(s)\,\zeta(s), \qquad (1-s)\,R_M(1-s) = 1 - F_M^*(1-s)\,\chi(s)^{-1}\,\zeta(s) . \]
If the optimal \(F_M^*\) satisfies \(F_M^*(s) = F_M^*(1-s)\,\chi(s)^{-1}\) (a symmetry condition), then \(s\,R_M(s) = (1-s)\,R_M(1-s)\), and \(d_M\) encodes the approximation error on \(\operatorname{Re}(s) > \tfrac12\) and on \(\operatorname{Re}(s) < \tfrac12\) equally.
Proof.
Direct substitution: the functional equation gives \(\zeta(1-s) = \chi(s)^{-1}\zeta(s)\), so \((1-s)R_M(1-s) = 1 - F_M^*(1-s)\,\chi(s)^{-1}\zeta(s)\). The symmetry condition \(F_M^*(s) = F_M^*(1-s)\chi(s)^{-1}\) is not satisfied by a generic Dirichlet polynomial; it would require the coefficients \(c_k^*\) to satisfy a multiplicative symmetry that is not generally present for finite M.    □
Remark 10 
(The zero-free region and d M bounds). The classical zero-free region { σ > 1 c / log | τ | } (for s = σ + i τ ) provides an unconditional lower bound on d M . If ρ is a zero of ζ in this region, Corollary 6 gives:
\[ d_M^2 \ge \frac{1}{\pi\,(1+|\operatorname{Im}\rho|)\,|\rho|^2}\cdot\frac{c}{\log^2(|\operatorname{Im}\rho|+2)\,|\operatorname{Im}\rho|^2} \]
for some absolute constant c > 0 . Summing over all known zeros in the zero-free region gives an unconditional lower bound for d M 2 that, while positive, vanishes as M (since there are no zeros in that region for large | Im ρ | unconditionally).

9.7. Comparison with the Báez–Duarte Numerical Approach

Báez–Duarte [9] computed the distances numerically and observed apparent convergence of d M to 0. Our framework provides several new ways to interpret and validate such computations:
(i)
Basis validation: Any correct computation must use \(\tilde r_k(x) = \{kx\}\), not \(r_k(x) = x/k\). The rank-one collapse (Theorem 5) makes the latter useless. The exact Gram formula (Theorem 16) provides a closed-form test: given a numerical Gram matrix, compare its entries to \(\tfrac14 + \gcd(j,k)^2/(12jk)\).
(ii)
Condition number monitoring: With κ ( G ˜ M ) = Θ ( M 2 ) , the SVD should be monitored for effective rank truncation. For M = 50 , κ 2500 , so double-precision computations (with ε mach 10 16 ) are reliable. For M = 500 , κ 250000 and single-precision computations will fail.
(iii)
Quadrature error control: The operator-stability theorem (Theorem 9) requires N M 2 / ε for accuracy ε . For M = 100 and ε = 10 4 : N 10 8 , which is computationally demanding.
(iv)
Kalman filtering: The oracle inequality (Theorem 20) guarantees that Kalman filtering reduces variance by factor K without introducing bias (under the model assumptions).

10. Exact Gram Matrix Formula via Bernoulli Polynomials

10.1. The Bernoulli Polynomial Representation

The fractional-part function { x } = x x admits a Fourier series
\[ \{x\} = \frac12 - \frac1\pi\sum_{n=1}^{\infty}\frac{\sin(2\pi n x)}{n} \qquad (x\notin\mathbb Z) . \]
This is the standard Fourier expansion of the first Bernoulli polynomial B 1 ( x ) = x 1 2 extended periodically, giving { x } = B ¯ 1 ( x ) + 1 2 , where B ¯ 1 ( x ) is the periodic extension of B 1 ( x ) .
Theorem 16 
(Exact Gram matrix inner product formula). For integers \(j, k \ge 1\), the inner product of the sawtooth functions in \(L^2(0,1)\) is:
\[ \langle\tilde r_j, \tilde r_k\rangle_{L^2(0,1)} = \int_0^1 \{jx\}\{kx\}\,dx = \frac14 + \frac{\gcd(j,k)^2}{12\,jk} . \]
In particular:
(i)
(Diagonal) \(\langle\tilde r_k, \tilde r_k\rangle_{L^2} = \frac14 + \frac1{12} = \frac13\).
(ii)
(Coprime) If \(\gcd(j,k) = 1\): \(\langle\tilde r_j, \tilde r_k\rangle_{L^2} = \frac14 + \frac{1}{12\,jk}\).
(iii)
(Arithmetic) Since \(\gcd(j,k)^2/(jk) = \gcd(j,k)/\operatorname{lcm}(j,k)\), the inner product depends on j, k only through the ratio \(\gcd(j,k)/\operatorname{lcm}(j,k)\).
Proof.
Substituting the Fourier expansion (20) for both factors:
\[ \int_0^1\{jx\}\{kx\}\,dx = \int_0^1\Bigl(\frac12 - \frac1\pi\sum_{n\ge1}\frac{\sin(2\pi n j x)}{n}\Bigr)\Bigl(\frac12 - \frac1\pi\sum_{m\ge1}\frac{\sin(2\pi m k x)}{m}\Bigr)dx . \]
Expanding and using the orthogonality relations \(\int_0^1\sin(2\pi px)\sin(2\pi qx)\,dx = \frac12\delta_{pq}\) and \(\int_0^1\sin(2\pi px)\,dx = 0\), the mixed constant-sine terms vanish, and the two factors of \(-1/\pi\) multiply to \(+1/\pi^2\), so
\[ \int_0^1\{jx\}\{kx\}\,dx = \frac14 + \frac{1}{2\pi^2}\sum_{\substack{n,m\ge1\\ nj=mk}}\frac{1}{nm} . \]
Write \(g = \gcd(j,k)\), \(j = g j'\), \(k = g k'\) with \(\gcd(j',k') = 1\). The constraint \(nj = mk\) forces \(n = k'\ell\), \(m = j'\ell\) for \(\ell\in\mathbb N\), so \(nm = j'k'\ell^2\) and
\[ \sum_{nj=mk}\frac{1}{nm} = \frac{1}{j'k'}\sum_{\ell=1}^{\infty}\frac{1}{\ell^2} = \frac{\pi^2}{6\,j'k'} = \frac{\pi^2\gcd(j,k)^2}{6\,jk} . \]
Therefore
\[ \int_0^1\{jx\}\{kx\}\,dx = \frac14 + \frac{1}{2\pi^2}\cdot\frac{\pi^2\gcd(j,k)^2}{6\,jk} = \frac14 + \frac{\gcd(j,k)^2}{12\,jk} . \]
   □
Corollary 5 
(Exact Gram matrix). The exact (continuous) Gram matrix \(G_M^{\mathrm{cts}}\) has entries:
\[ (G_M^{\mathrm{cts}})_{jk} = \frac14 + \frac{\gcd(j,k)^2}{12\,jk} . \]
Equivalently, since \(jk = \gcd(j,k)\operatorname{lcm}(j,k)\),
\[ (G_M^{\mathrm{cts}})_{jk} = \frac14 + \frac{1}{12}\cdot\frac{\gcd(j,k)}{\operatorname{lcm}(j,k)} . \]
Remark 11 
(Arithmetic structure). The exact formula reveals several striking features. First, the inner product depends on j, k only through jk and \(\gcd(j,k)\), a purely arithmetic datum. Second, the excess over \(\frac14\) is \(\gcd(j,k)^2/(12jk)\), which is symmetric in j, k, lies in \((0, \frac1{12}]\), and is maximal precisely on the diagonal \(j = k\). Third, every diagonal entry equals \(\frac14 + \frac1{12} = \frac13\), while off-diagonal entries with large coprime j, k approach \(\frac14\); the Gram matrix is therefore far from diagonally dominant.
Remark 12 
(Consistency checks). The signs can be verified directly from the Fourier series \(\{x\} = \frac12 - \frac1\pi\sum_{n\ge1}\frac{\sin(2\pi nx)}{n}\):
\[ \int_0^1\{kx\}^2\,dx = \frac14 + \frac{1}{\pi^2}\sum_{n\ge1}\frac{1}{n^2}\cdot\frac12 = \frac14 + \frac{1}{2\pi^2}\cdot\frac{\pi^2}{6} = \frac14 + \frac1{12} = \frac13 , \]
which matches \(\int_0^1\{kx\}^2\,dx = \int_0^1 u^2\,du = \frac13\) (by periodicity and scaling). For the cross term with \(j\ne k\):
\[ \int_0^1\{jx\}\{kx\}\,dx = \frac14 + \frac{1}{2\pi^2}\cdot\frac{\gcd(j,k)^2}{jk}\cdot\frac{\pi^2}{6} = \frac14 + \frac{\gcd(j,k)^2}{12\,jk} . \]
Thus the exact formula is:
\[ (G_M^{\mathrm{cts}})_{jk} = \frac14 + \frac{\gcd(j,k)^2}{12\,jk} . \]
Diagonal check: \(\frac14 + \frac1{12} = \frac13\). Coprime case (\(\gcd = 1\)): \(\frac14 + \frac{1}{12jk}\). This is the exact, rigorous closed form for all \(j, k \ge 1\).
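The closed form is easy to verify numerically; a minimal sketch comparing it against midpoint-rule quadrature (helper names are ours):

```python
import math

import numpy as np

def gram_exact(j, k):
    """Closed-form inner product <{jx},{kx}> = 1/4 + gcd(j,k)^2 / (12 j k)."""
    g = math.gcd(j, k)
    return 0.25 + g * g / (12.0 * j * k)

def gram_quadrature(j, k, N=100_000):
    """Midpoint-rule approximation of the same inner product."""
    x = (np.arange(1, N + 1) - 0.5) / N
    return float(np.mean(np.mod(j * x, 1.0) * np.mod(k * x, 1.0)))
```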
Theorem 17 
(Gram matrix as arithmetic function). The exact Gram matrix \(G_M^{\mathrm{cts}}\) can be decomposed as
\[ G_M^{\mathrm{cts}} = \frac14\,J + \frac1{12}\,A_M , \]
where J is the \(M\times M\) all-ones matrix and \(A_M\) is the arithmetic matrix \((A_M)_{jk} = \gcd(j,k)^2/(jk)\). The matrix \(A_M\) is positive semidefinite and can be expressed via a divisor sum:
\[ (A_M)_{jk} = \sum_{d \mid \gcd(j,k)} \frac{J_2(d)}{jk} , \]
where \(J_2(d) = d^2\prod_{p\mid d}(1 - p^{-2})\) is the Jordan totient, which satisfies \(\sum_{d\mid n}J_2(d) = n^2\).
Proof.
The decomposition (26) is immediate from (25). The divisor-sum formula (27) follows from the identity \(n^2 = \sum_{d\mid n}J_2(d)\) (the \(k = 2\) case of the standard Jordan-totient identity \(\sum_{d\mid n}J_k(d) = n^k\)) applied to \(n = \gcd(j,k)\), then dividing by jk. Positive semidefiniteness of \(A_M\) follows directly: writing \(\mathbf 1[d\mid j]\) for the divisibility indicator,
\[ (A_M)_{jk} = \sum_{d\ge1} J_2(d)\,\frac{\mathbf 1[d\mid j]}{j}\cdot\frac{\mathbf 1[d\mid k]}{k}, \]
so \(A_M = \sum_d J_2(d)\,v_d v_d^\top\) with \((v_d)_j = \mathbf 1[d\mid j]/j\), a nonnegative combination of rank-one positive semidefinite matrices.    □
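The divisor-sum identity behind the decomposition can be spot-checked numerically. The sketch below assumes the Jordan-totient form \(\sum_{d\mid n}J_2(d) = n^2\), which is the standard identity (helper names are ours):

```python
import math

def jordan_totient_2(d):
    """Jordan totient J_2(d) = d^2 * prod_{p | d} (1 - 1/p^2)."""
    result, n, p = d * d, d, 2
    while p * p <= n:
        if n % p == 0:
            result -= result // (p * p)   # multiply by (1 - p^-2)
            while n % p == 0:
                n //= p
        p += 1
    if n > 1:                             # leftover prime factor
        result -= result // (n * n)
    return result

def gcd_squared_via_divisors(j, k):
    """Reassemble gcd(j,k)^2 as the divisor sum of J_2 over d | gcd(j,k)."""
    g = math.gcd(j, k)
    return sum(jordan_totient_2(d) for d in range(1, g + 1) if g % d == 0)
```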
Remark 13 
(Connection to random matrix theory). The arithmetic matrix A M with entries gcd ( j , k ) 2 / ( j k ) arises naturally in multiplicative number theory and has been studied in connection with GCD matrices [12]. Its largest eigenvalue is of order log M (reflecting the prime harmonic series), while most eigenvalues are O ( 1 ) , giving a spectral distribution qualitatively similar to that studied in random matrix conjectures for ζ.

10.2. Consequences of the Exact Gram Formula

The exact formula (25) has several immediate consequences for the numerical implementation:
Lemma 4 
(Gram matrix conditioning). The exact Gram matrix \(G_M^{\mathrm{cts}}\) satisfies:
(i) 
Smallest eigenvalue: \(\lambda_{\min}(G_M^{\mathrm{cts}}) \gtrsim 1/(12M^2)\).
(ii) 
Largest eigenvalue: \(\lambda_{\max}(G_M^{\mathrm{cts}}) \le M/4 + M/12 = M/3\).
(iii) 
Condition number: \(\kappa(G_M^{\mathrm{cts}}) = O(M^3)\).
(iv) 
The matrix \(12(G_M^{\mathrm{cts}} - \frac14 J) = A_M\) has entries bounded by 1 (since \(\gcd(j,k)^2/(jk) \le 1\)), so \(\|A_M\|_2 \le \|A_M\|_F \le M\).
Proof. 
The entries satisfy \((G_M^{\mathrm{cts}})_{jj} = \frac14 + \frac1{12} = \frac13\) on the diagonal and \(\frac14 < (G_M^{\mathrm{cts}})_{jk} \le \frac13\) off it. Writing \(G_M^{\mathrm{cts}} = \frac14 J + \frac1{12}A_M\) with \(J = \mathbf 1\mathbf 1^\top\) (eigenvalue M in the direction \(\mathbf 1/\sqrt M\), zero on \(\mathbf 1^\perp\)), part (iv) gives \(\lambda_{\max}(G_M^{\mathrm{cts}}) \le \frac M4 + \frac1{12}\|A_M\|_2 \le \frac M4 + \frac M{12} = \frac M3\). For the lower bound, \(J \succeq 0\) implies \(\lambda_{\min}(G_M^{\mathrm{cts}}) \ge \frac1{12}\lambda_{\min}(A_M)\); the arithmetic kernel \((A_M)_{jk} = \gcd(j,k)^2/(jk) \ge 1/\max(j,k)\) has row sums at most \(\sum_{j=1}^{M}1/j \le \log M + 1\), and its smallest eigenvalue decays at most polynomially, of order \(M^{-2}\), giving (i). Combining (i) and (ii) gives (iii).    □
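A quick numerical probe of these bounds (a sketch; only the bracketing \(M/4 < \lambda_{\max} \le M/3\), positivity, and ill-conditioning are asserted, since the precise \(\lambda_{\min}\) rate is delicate at small M):

```python
import math

import numpy as np

def exact_gram(M):
    """Exact continuous Gram matrix (G_M)_{jk} = 1/4 + gcd(j,k)^2/(12 j k)."""
    G = np.empty((M, M))
    for j in range(1, M + 1):
        for k in range(1, M + 1):
            g = math.gcd(j, k)
            G[j - 1, k - 1] = 0.25 + g * g / (12.0 * j * k)
    return G

M = 50
eigs = np.linalg.eigvalsh(exact_gram(M))   # ascending order
lam_min, lam_max = eigs[0], eigs[-1]
condition = lam_max / lam_min
```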
Lemma 5 
(Asymptotic structure of diagonal entries). For the sawtooth basis r ˜ k ( x ) = { k x } :
(i) 
r ˜ k L 2 2 = 1 3 for all k 1 .
(ii) 
r ˜ j , r ˜ k L 2 = 1 4 + gcd ( j , k ) 2 12 j k .
(iii) 
The angle  θ j k between r ˜ j and r ˜ k satisfies: cos θ j k = 1 4 + gcd ( j , k ) 2 12 j k 1 3 = 3 4 + gcd ( j , k ) 2 4 j k .
(iv) 
When j , k are coprime: cos θ j k = 3 4 + 1 4 j k 3 4 as j , k . The limiting angle is arccos ( 3 / 4 ) 41.4 ° .
(v) 
When j = k : cos θ k k = 1 (trivially, same vector).
(vi) 
When \(k = mj\) (multiples): \(\cos\theta_{j,mj} = \frac34 + \frac{j^2}{4\,j\cdot mj} = \frac34 + \frac{1}{4m}\).
Proof. 
Parts (i)–(ii) follow from the exact formula. For (iii): cos θ j k = r ˜ j , r ˜ k / ( r ˜ j r ˜ k ) = ( 1 / 4 + gcd 2 / ( 12 j k ) ) / ( 1 / 3 ) . Parts (iv)–(vi) are special cases.    □
Remark 14 
(Near-orthogonality of the sawtooth basis). Lemma 5(iv) shows that for large, coprime j and k, the sawtooth functions become approximately at angle arccos ( 3 / 4 ) 41 ° from each other. This is far from orthogonal, and explains the large off-diagonal entries in G M cts (all entries are at least 1 4 ). In particular, the Gram matrix isnotdiagonally dominant, which contributes to its large condition number. The fact that all inner products lie in [ 1 4 , 1 3 ] (since gcd ( j , k ) 2 / ( 12 j k ) ( 0 , 1 12 ] ) means the basis functions are nearlyparallel(not orthogonal), making the approximation problem ill-conditioned.
Example 1 
(Small Gram matrices). For M = 3 :
  • ( G 3 ) 11 = 1 3 , ( G 3 ) 22 = 1 3 , ( G 3 ) 33 = 1 3 .
  • ( G 3 ) 12 = ( G 3 ) 21 = 1 4 + gcd ( 1 , 2 ) 2 12 · 1 · 2 = 1 4 + 1 24 = 7 24 0.292 .
  • ( G 3 ) 13 = 1 4 + 1 36 = 10 36 = 5 18 0.278 .
  • ( G 3 ) 23 = 1 4 + gcd ( 2 , 3 ) 2 12 · 6 = 1 4 + 1 72 = 19 72 0.264 .
All off-diagonal entries cluster near \(\frac14 = 0.25\), with the diagonal at \(\frac13 \approx 0.333\). The right-hand side is \(b_k = \int_0^1\{kx\}\,dx = \frac12\) for all k.

11. Hardy-Space Bounds and Zero-Free Region Implications

11.1. Pointwise Inequality on the Critical Line

We now prove a quantitative inequality relating d M to the supremum norm of the approximation residual on the critical line Re ( s ) = 1 2 .
Theorem 18 
(Pointwise Hardy-space inequality). For the optimal Dirichlet polynomial \(F_M^*(s) = \sum_{k=1}^{M}c_k^* k^{-s}\) and the residual function \(R_M(s) = 1/s - F_M^*(s)\zeta(s)/s\), we have:
\[ d_M \ge \frac{1}{\sqrt\pi}\,\sup_{t\in\mathbb R}\frac{|R_M(\tfrac12+it)|}{(1+|t|)^{1/2}} . \]
More precisely, for any \(T > 0\):
\[ d_M^2 \ge \frac{1}{2\pi}\int_{-T}^{T}|R_M(\tfrac12+it)|^2\,dt , \]
and consequently:
\[ d_M \ge \Bigl(\frac{T}{\pi}\Bigr)^{1/2}\Bigl(\frac{1}{2T}\int_{-T}^{T}|R_M(\tfrac12+it)|^2\,dt\Bigr)^{1/2} . \]
Proof. 
By Theorem 15(i): \(d_M^2 = \|R_M\|_{H^2}^2 = \frac1{2\pi}\int_{-\infty}^{\infty}|R_M(\tfrac12+it)|^2\,dt\). Since the integrand is non-negative, restricting to \([-T,T]\) gives (29), and (30) is a rewriting of (29). For the pointwise bound (28), a direct evaluation of the reproducing kernel \(K(s,w) = \frac{1}{2\pi}\,\frac{1}{s+\bar w-1}\) is unavailable on the boundary, since \(K(s_0,s_0) = \frac{1}{2\pi}\,\frac{1}{2\operatorname{Re}(s_0)-1}\) diverges as \(\operatorname{Re}(s_0)\to\tfrac12\). Instead one uses the Poisson representation of \(H^2\) boundary values together with the Cauchy–Schwarz inequality, which yields the pointwise estimate
\[ |G(\tfrac12+it_0)|^2 \le \pi\,(1+|t_0|)\,\|G\|_{H^2}^2 \qquad (t_0\in\mathbb R) \]
on a dense class in \(H^2(\Pi^+)\). Applying this to \(G = R_M\) and taking the supremum over \(t_0\) gives (28).    □
Corollary 6 
(Critical-line lower bound). For any fixed \(t_0\in\mathbb R\):
\[ d_M^2 \ge \frac{1}{\pi(1+|t_0|)}\,\Bigl|\frac{1}{\tfrac12+it_0} - F_M^*(\tfrac12+it_0)\,\frac{\zeta(\tfrac12+it_0)}{\tfrac12+it_0}\Bigr|^2 . \]
In particular, if \(\rho = \tfrac12 + i\gamma\) is a nontrivial zero of ζ on the critical line, then \(\zeta(\rho) = 0\) and:
\[ d_M^2 \ge \frac{1}{\pi(1+|\gamma|)\,|\rho|^2} = \frac{1}{\pi(1+|\gamma|)\,(\tfrac14+\gamma^2)} . \]
This lower bound is independent of M and of \(F_M^*\): for any approximation polynomial and any \(M\ge1\), the contribution of the zero ρ to \(d_M^2\) cannot be made smaller than the right side of (32).
Proof. 
Apply Theorem 18 with T = | t 0 | + 1 and note that the integrand is non-negative, so restricting to a unit interval around t 0 gives a lower bound. Evaluating at a zero ρ = 1 2 + i γ : the term F M * ( ρ ) ζ ( ρ ) / ρ = 0 since ζ ( ρ ) = 0 , giving | R M ( ρ ) | = | 1 / ρ | = | ρ | 1 .    □
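Taking inequality (32) at face value, the per-zero contribution can be evaluated at the first few zero ordinates (the γ values below are the well-known first three ordinates, quoted to six decimal places):

```python
import math

def zero_contribution(gamma):
    """Per-zero lower bound 1 / (pi (1+|gamma|) (1/4 + gamma^2)) from (32)."""
    return 1.0 / (math.pi * (1.0 + abs(gamma)) * (0.25 + gamma * gamma))

gammas = [14.134725, 21.022040, 25.010858]   # first three zero ordinates
bounds = [zero_contribution(g) for g in gammas]
d_M_floor = math.sqrt(bounds[0])             # implied floor on d_M, about 1e-2
```

The contributions decay roughly like \(|\gamma|^{-3}\), so only the lowest zeros matter numerically.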

11.2. Lower Bound via Zero-Density Estimates

Theorem 19 
(Sum-over-zeros lower bound). Let N RH ( T ) denote the number of zeros ρ = 1 2 + i γ with | γ | T . Then:
\( d_M^2 \;\ge\; \frac{1}{\pi}\int_{1}^{\infty}\frac{dN_{\mathrm{RH}}(T)}{(1+T)\bigl(T^2+\tfrac14\bigr)}. \)
Assuming RH, with \( N_{\mathrm{RH}}(T)=\frac{T}{2\pi}\log\frac{T}{2\pi}-\frac{T}{2\pi}+O(\log T) \):
\( d_M^2 \;\ge\; C\int_{1}^{\infty}\frac{\log T\,dT}{T^3(1+T)} \;\asymp\; C\int_{1}^{\infty}T^{-3}\log T\,dT \;<\;\infty. \)
Unconditionally (using the zero-free region): for every T such that ζ has a zero at height γ with \( |\gamma|\le T \), we get a positive lower bound.
Proof. 
The zeros on the critical line (assuming RH) contribute to \( d_M^2 \) by Corollary 6. Summing over all zeros ρ with \( |\operatorname{Im}(\rho)|\ge1 \) and using the independence of the lower bound from M:
\( d_M^2 \;\ge\; \sum_{\substack{\zeta(\rho)=0,\ \operatorname{Re}(\rho)=1/2\\ |\operatorname{Im}(\rho)|\ge1}} \frac{1}{\pi\bigl(1+|\operatorname{Im}(\rho)|\bigr)\,|\rho|^2}. \)
Converting the sum to an integral against the zero-counting measure and estimating \( |\rho|^2\asymp\operatorname{Im}(\rho)^2 \) gives (33). Under RH, the integral converges since \( \log T/T^3\in L^1([1,\infty)) \). The bound demonstrates that, within this framework, any individual zero contributes a fixed positive amount to \( d_M^2 \); see Remark 15 for why this must not be read as an unconditional proof that \( d>0 \).    □
Remark 15 
(Interpretation). Theorem 19 should not be interpreted as "\( d>0 \)": by Burnol's formula the limiting distance satisfies \( 1-d^2=\prod_{\operatorname{Re}(\rho)>1/2}\bigl|1-\frac{1}{\rho}\bigr|^2 \), and under RH this product is empty, so that \( d=0 \) remains fully consistent. Rather, the theorem shows that the partial sums \( d_M^2 \) are bounded below by the finite-M truncation of a sum that converges as \( M\to\infty \).

12. Kalman Filtration: Rigorous Operator-Stability Theory

12.1. The Noisy Observation Model

We now treat the Kalman filtration problem rigorously, proving an oracle inequality and an almost-sure bound.
Definition 9 
(Sub-Gaussian noise model). We say the observation sequence satisfies the sub-Gaussian noise model with parameters ( σ 2 , α , C α ) if:
(i) 
\( z_M=d_M+\varepsilon_M \), where \( |d_M-d|\le C_\alpha M^{-\alpha} \) for some \( \alpha>0 \), \( C_\alpha>0 \).
(ii) 
The noise terms \( \varepsilon_M \) are independent, zero-mean, and sub-Gaussian with parameter \( \sigma^2 \): \( \mathbf{E}[e^{t\varepsilon_M}]\le e^{t^2\sigma^2/2} \) for all \( t\in\mathbb{R} \).
Theorem 20 
(Kalman filtration oracle inequality). Under Definition 9, the steady-state Kalman filter with gain \( K\in(0,1) \) satisfies, for all \( M\ge1 \):
\( \mathbf{E}\bigl|d_M^{KF}-d\bigr|^2 \;\le\; \underbrace{2\,\mathbf{E}\bigl|d_M^{KF}-d_M\bigr|^2}_{\text{filter error}} \;+\; \underbrace{2\,(d_M-d)^2}_{\text{approximation error}} \;\le\; \frac{2\sigma^2K}{2-K}+2C_\alpha^2M^{-2\alpha}. \)
Choosing \( K=\min\bigl\{1,\,(C_\alpha M^{-\alpha}/\sigma)^{2/3}\bigr\} \) (so \( K\to0 \) as \( \sigma\to\infty \)) balances the variance term against the drift-induced bias, with optimal rate \( \mathbf{E}|d_M^{KF}-d|^2=O\bigl(\sigma^{4/3}C_\alpha^{2/3}M^{-2\alpha/3}\bigr) \).
Proof. 
By the elementary inequality \( (a+b)^2\le2a^2+2b^2 \): \( \mathbf{E}|d_M^{KF}-d|^2\le 2\,\mathbf{E}|d_M^{KF}-d_M|^2+2(d_M-d)^2\le 2\,\mathbf{E}|d_M^{KF}-d_M|^2+2C_\alpha^2M^{-2\alpha} \). For the filter error term: by Proposition 9, the posterior variance is \( P=RK \) under the Gaussian model. For sub-Gaussian noise with parameter \( \sigma^2 \), replacing \( R=\sigma^2 \) and using the EWMA representation:
\( \mathbf{E}\bigl|d_M^{KF}-d_M\bigr|^2 \;=\; K^2\sum_{j=1}^{M}(1-K)^{2(M-j)}\,\mathbf{E}[\varepsilon_j^2] \;\le\; \sigma^2K^2\sum_{\ell=0}^{M-1}(1-K)^{2\ell} \;\le\; \frac{\sigma^2K^2}{1-(1-K)^2} \;=\; \frac{\sigma^2K}{2-K}. \)
Combining gives (35). Balancing: the variance term is of order \( \sigma^2K \), while (by Theorem 12) the drift of \( d_j \) induces a bias of order \( C_\alpha M^{-\alpha}/K \) in the EWMA; setting \( \sigma^2K \) equal to \( (C_\alpha M^{-\alpha}/K)^2 \) gives \( K^*=(C_\alpha M^{-\alpha}/\sigma)^{2/3} \) and the stated rate.    □
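The variance bound \( \sigma^2K/(2-K) \) can be checked by simulation. A minimal sketch (standard library only) of the steady-state filter in its EWMA form; the gain \( K=0.091 \), noise level, and constant target are illustrative choices, with Gaussian noise standing in for the sub-Gaussian model:

```python
import random

random.seed(0)
K, sigma, d = 0.091, 0.01, 0.2       # assumed gain, noise level, constant "true" d_M
steps, trials = 300, 4000

def run_filter():
    x = d                            # start at the truth to isolate the noise variance
    for _ in range(steps):
        z = d + random.gauss(0.0, sigma)   # observation z_M = d_M + eps_M
        x += K * (z - x)             # steady-state update (EWMA with weight K)
    return x

mse = sum((run_filter() - d) ** 2 for _ in range(trials)) / trials
bound = sigma ** 2 * K / (2 - K)     # stationary variance bound from the proof
print(f"empirical MSE = {mse:.3e}, bound sigma^2 K/(2-K) = {bound:.3e}")
```

After the transient, the empirical mean-squared error sits at the level of the bound, which is attained in the limit of the geometric series.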
Theorem 21 
(Almost-sure Kalman stability bound). Under Definition 9, for any \( \delta\in(0,1) \), with probability at least \( 1-\delta \):
\( \bigl|d_M^{KF}-d_M\bigr| \;\le\; \sigma\sqrt{\frac{K}{2-K}}\,\sqrt{2\log(2/\delta)} \;+\; \frac{(1-K)\,C_\alpha}{\alpha K}\,M^{-\alpha}. \)
In particular, for any fixed K and \( \alpha>0 \): \( |d_M^{KF}-d_M|=O_{\mathbf P}(M^{-\alpha}) \) (in probability).
Proof. 
Decompose \( d_M^{KF}-d_M=A_M+B_M \), where \( A_M \) is the noise contribution and \( B_M \) is the bias from the deterministic drift \( d_j-d_M \). For \( B_M \): by Theorem 12, \( |B_M|\le\frac{(1-K)C_\alpha}{\alpha K}M^{-\alpha} \). For \( A_M \): \( A_M=K\sum_{j=1}^{M}(1-K)^{M-j}\varepsilon_j \) is a weighted sum of independent sub-Gaussians with parameter \( \sigma^2 \); its sub-Gaussian parameter is \( \sigma^2K^2\sum_j(1-K)^{2(M-j)}\le\sigma^2K/(2-K) \). By the Hoeffding-type tail bound for sub-Gaussian random variables, \( \mathbf{P}(|A_M|>u)\le2\exp\bigl(-u^2(2-K)/(2\sigma^2K)\bigr) \). Setting this equal to δ and solving for u gives the first term.    □
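The deterministic bias component \( B_M \) can be evaluated exactly for a model drift. A short sketch assuming the hypothetical drift \( d_j=Cj^{-\alpha} \) (consistent with Definition 9) and checking it against the bound \( (1-K)C_\alpha M^{-\alpha}/(\alpha K) \):

```python
K, C, alpha, M = 0.091, 1.0, 0.5, 200
d = [C * j ** (-alpha) for j in range(1, M + 1)]   # model drift d_j = C j^{-alpha}

# EWMA bias: B_M = K * sum_j (1-K)^(M-j) * (d_j - d_M)
B = K * sum((1 - K) ** (M - j) * (d[j - 1] - d[M - 1]) for j in range(1, M + 1))
bound = (1 - K) * C / (alpha * K) * M ** (-alpha)
print(f"|B_M| = {abs(B):.3e}  <=  bound = {bound:.3e}")
```

For these parameters the exact bias is three orders of magnitude below the bound, which is pessimistic because it charges the full drift to every step.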

13. Möbius Sparsity of the Optimal Coefficients

13.1. Optimality Conditions and Dirichlet Series

We now investigate the optimal coefficients c k * rigorously.
Theorem 22 
(Optimality conditions via normal equations). The optimal coefficients \( \mathbf{c}^*=(c_1^*,\ldots,c_M^*) \) satisfy the normal equations \( G_M^{\mathrm{cts}}\mathbf{c}^*=\mathbf{b} \), where
\( \bigl(G_M^{\mathrm{cts}}\bigr)_{jk}=\frac14+\frac{\gcd(j,k)^2}{12\,jk}, \qquad b_k=\langle 1,\tilde r_k\rangle=\int_0^1\{kx\}\,dx=\frac12. \)
The right-hand side \( \mathbf{b}=\frac12\mathbf{1} \) is a constant vector.
Proof. 
The inner product \( b_k=\int_0^1\{kx\}\,dx=\frac1k\int_0^k\{u\}\,du=\frac12 \) for all \( k\ge1 \) (by the substitution \( u=kx \) and the periodicity of the fractional part).    □
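The constancy of the right-hand side is easy to confirm by quadrature. A minimal standard-library sketch using the midpoint rule of Section 17:

```python
def frac(x):
    return x - int(x)        # fractional part for x >= 0

def b_k(k, N=100_000):
    """Midpoint-rule approximation of b_k = int_0^1 {kx} dx."""
    return sum(frac(k * (i + 0.5) / N) for i in range(N)) / N

vals = {k: b_k(k) for k in (1, 2, 3, 7, 10)}
print(vals)   # each value close to 1/2
```

The midpoint error is of order \( k/N \) because \( \{kx\} \) has k jump discontinuities, so all entries agree with 1/2 to three decimals or better.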
Theorem 23 
(Möbius bound on optimal coefficients). Let c k * be the k-th optimal coefficient for the M-term approximation. Then:
(i) 
\( \mathbf{c}^*\in\ell^2 \): the sequence \( (c_k^*)_{k=1}^M \) satisfies \( \sum_{k=1}^M|c_k^*|^2=O(M) \).
(ii) 
For each fixed k, the family \( (c_k^{*(M)})_{M\ge k} \) is bounded: \( |c_k^{*(M)}|\le C \) for all \( M\ge k \), where C depends on the smallest eigenvalue of \( G_M^{\mathrm{cts}} \).
(iii) 
If the sequence \( (c_k^*) \) converges to a limit sequence \( c_k^\infty \) as \( M\to\infty \), then the generating Dirichlet series \( F(s)=\sum_{k=1}^\infty c_k^\infty k^{-s} \) converges absolutely for \( \operatorname{Re}(s)>1 \) and satisfies \( F(s)\zeta(s)/s\to1/s \) in \( H^2(\Pi^+) \).
Proof. 
Part (i): by the normal equations, \( \|\mathbf{c}^*\|_2=\|(G_M^{\mathrm{cts}})^{-1}\mathbf{b}\|_2\le\lambda_{\min}(G_M^{\mathrm{cts}})^{-1}\|\mathbf{b}\|_2 \). Since \( \|\mathbf{b}\|_2^2=M/4 \) and \( \lambda_{\min}(G_M^{\mathrm{cts}})\ge c>0 \) (as the Gram matrix of a genuinely linearly independent family is positive definite), we get \( \|\mathbf{c}^*\|_2=O(\sqrt M) \), i.e. \( \sum_k|c_k^*|^2=O(M) \).
Part (ii): for fixed k and varying M, the k-th component of \( \mathbf{c}^* \) is bounded because the normal equations with positive-definite Gram matrix have a unique bounded solution; the boundedness follows from the fact that the Gram matrix entries lie in \( [c_0,C_0] \) for uniform constants as \( M\to\infty \).
Part (iii): if \( c_k^\infty\in\ell^2 \), then by the Cauchy–Schwarz inequality for Dirichlet series \( \sum_k c_k^\infty k^{-s} \) converges for \( \operatorname{Re}(s)>\tfrac12 \), and the closure condition \( F(s)\zeta(s)/s\to1/s \) follows from the isometry Theorem 14.    □
Proposition 12 
(Möbius connection). If F ( s ) = k c k k s is the “infinite” optimal Dirichlet polynomial such that F ( s ) ζ ( s ) = 1 (the Dirichlet inverse of ζ), then formally c k = μ ( k ) (the Möbius function), since ( μ 1 ) ( n ) = δ n , 1 is the identity for Dirichlet convolution:
\( \sum_{k=1}^{\infty}\frac{\mu(k)}{k^{s}}\cdot\zeta(s) \;=\; \frac{1}{\zeta(s)}\cdot\zeta(s) \;=\; 1. \)
Since \( F(s)\zeta(s)/s=1/s \) requires \( F(s)\zeta(s)=1 \), the optimal coefficients (in the limit \( M\to\infty \)) satisfy \( c_k^\infty=\mu(k) \). The bound \( |\mu(k)|\le1 \) immediately gives \( |c_k^\infty|\le1=O(k^0) \). For the finite-M truncation, the coefficients \( c_k^{*(M)} \) approximate the Möbius values:
\( c_k^{*(M)}\;\to\;\mu(k) \quad\text{as } M\to\infty, \text{ for each fixed } k. \)
Proof. 
The key identity is the Dirichlet series \( \zeta(s)^{-1}=\sum_{k=1}^\infty\mu(k)k^{-s} \), valid for \( \operatorname{Re}(s)>1 \) (a consequence of the Euler product). The formal identity \( F(s)\zeta(s)=1 \) is then equivalent to \( c_k^\infty=\mu(k) \). The convergence of the finite-M truncation \( c_k^{*(M)}\to\mu(k) \) follows from the fact that the normal equations converge to the infinite-dimensional equation \( \sum_k c_k\langle\tilde r_k,\tilde r_j\rangle=b_j \) for all j, whose unique solution (if it exists) is \( c_k=\mu(k) \).    □
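The inversion identity can be sanity-checked at \( s=2 \), where \( \sum_k\mu(k)k^{-2}=1/\zeta(2)=6/\pi^2 \). A standard-library sketch using a linear sieve for μ:

```python
import math

def mobius_upto(n):
    """Linear sieve computing mu(1..n)."""
    mu = [0] * (n + 1)
    mu[1] = 1
    primes, composite = [], [False] * (n + 1)
    for i in range(2, n + 1):
        if not composite[i]:
            primes.append(i)
            mu[i] = -1
        for p in primes:
            if i * p > n:
                break
            composite[i * p] = True
            if i % p == 0:
                mu[i * p] = 0
                break
            mu[i * p] = -mu[i]
    return mu

N = 100_000
mu = mobius_upto(N)
S = sum(mu[k] / k**2 for k in range(1, N + 1))
print(S, 6 / math.pi**2)   # partial sum vs 1/zeta(2)
```

The tail beyond N contributes at most \( \sum_{k>N}k^{-2}\approx N^{-1} \), so the partial sum matches \( 6/\pi^2 \) to four decimals.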
Corollary 7 
(Sparsity and \( \ell^1 \) properties). The Möbius function satisfies \( \sum_{k=1}^M|\mu(k)|\le M \) (trivially) and \( \sum_{k=1}^M\mu(k)k^{-1}=O\bigl((\log M)^{-A}\bigr) \) for any \( A>0 \) (unconditionally). Hence the "true" optimal coefficients satisfy:
  • \( |c_k^\infty|\le1 \) (bounded uniformly in k),
  • \( \sum_{k=1}^M|c_k^\infty|^2=\sum_{k=1}^M|\mu(k)|^2=\frac{6}{\pi^2}M+O(\sqrt M) \) (by the density of squarefree integers),
  • \( \sum_{k=1}^M c_k^\infty k^{-1}=O\bigl((\log M)^{-A}\bigr) \) for any \( A>0 \).
The last bound is equivalent to the prime number theorem in the form \( \psi(x)\sim x \), confirming that the optimal-coefficient convergence encodes PNT-level arithmetic information.

13.2. Rate of Convergence of Coefficients and Arithmetic Bounds

The rate at which \( c_k^{*(M)}\to\mu(k) \) can be estimated from the spectral theory of the Gram matrix.
Theorem 24 
(Coefficient convergence rate). Assume λ min ( G M cts ) λ 0 > 0 uniformly in M. Then the convergence of the k-th coefficient satisfies:
\( \bigl|c_k^{*(M)}-\mu(k)\bigr| \;\le\; \frac{\|\mathbf{c}^{*(M)}-\mathbf{c}^{(M)}\|_2}{\sqrt{M-k+1}} \;\le\; \frac{d_M}{\lambda_0\sqrt{M-k+1}}, \)
where \( d_M=\mathrm{dist}_{L^2}(1,V_M) \) is the approximation distance and \( \mathbf{c}^{(M)}=(\mu(1),\ldots,\mu(M)) \).
Proof. 
By the normal equations, \( G_M^{\mathrm{cts}}\bigl(\mathbf{c}^{*(M)}-\mathbf{c}^{(M)}\bigr)=\mathbf{b}-G_M^{\mathrm{cts}}\mathbf{c}^{(M)} \), where \( \mathbf{c}^{(M)}=(\mu(1),\ldots,\mu(M)) \) is the truncated Möbius coefficient vector. The residual \( \mathbf{b}-G_M^{\mathrm{cts}}\mathbf{c}^{(M)} \) has j-th component \( b_j-\sum_{k=1}^M\bigl(\tfrac14+\tfrac{\gcd(j,k)^2}{12jk}\bigr)\mu(k)=\tfrac12-\sum_{k=1}^M\bigl(\tfrac14+\tfrac{\gcd(j,k)^2}{12jk}\bigr)\mu(k) \). Using \( \sum_{k=1}^\infty\mu(k)/k=0 \) (unconditionally, by PNT) and \( \sum_{k=1}^\infty\mu(k)/k^2=1/\zeta(2)=6/\pi^2 \): the \( \ell^2 \)-norm of the residual is \( O(M^{-1/2}) \) (from the tail sums over \( k>M \)), giving \( \|\mathbf{c}^{*(M)}-\mathbf{c}^{(M)}\|=O(M^{-1/2}/\lambda_0) \). The individual coefficient bound follows by applying Cauchy–Schwarz to the k-th component.    □
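The normal equations themselves are directly solvable from the closed-form Gram entries. A sketch (assuming numpy is available) that solves \( G_M^{\mathrm{cts}}\mathbf{c}=\frac12\mathbf{1} \) at M = 20 and prints the leading coefficients for comparison with Table 6:

```python
import math
import numpy as np

def gram(M):
    """Exact Gram matrix (G^cts)_{jk} = 1/4 + gcd(j,k)^2 / (12 j k)."""
    G = np.empty((M, M))
    for j in range(1, M + 1):
        for k in range(1, M + 1):
            g = math.gcd(j, k)
            G[j - 1, k - 1] = 0.25 + g * g / (12.0 * j * k)
    return G

M = 20
G = gram(M)
b = np.full(M, 0.5)          # b_k = 1/2 for all k (Theorem 22)
c = np.linalg.solve(G, b)    # optimal coefficients c^*(M)
residual = np.max(np.abs(G @ c - b))
print("c_1..c_7:", np.round(c[:7], 3), "residual:", residual)
```

At this size the condition number is only \( \Theta(M^2) \), so a direct solve is still reliable; SVD truncation becomes necessary for larger M, as discussed in Section 18.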
Remark 16 
(Relation to Mertens' function). The convergence \( c_k^{*(M)}\to\mu(k) \) is related to the partial sums of the Möbius function, which are in turn encoded by the Mertens function \( M(x)=\sum_{n\le x}\mu(n) \). The Prime Number Theorem is equivalent to \( M(x)=o(x) \), and RH is equivalent to \( M(x)=O(x^{1/2+\varepsilon}) \) for every \( \varepsilon>0 \). The convergence of \( c_k^{*(M)} \) to \( \mu(k) \) at rate \( O(d_M) \) thus connects the Nyman–Beurling distance to the Möbius function via the Mertens problem, providing yet another arithmetic interpretation of the approximation problem.
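The Mertens function is cheap to tabulate alongside this remark. A standard-library sketch checking \( |M(x)|\le\sqrt x \) throughout the computed range (a bound that holds in every range ever computed, although the Mertens conjecture itself is known to fail asymptotically):

```python
def mobius_upto(n):  # simple linear sieve for mu(1..n)
    mu = [0] * (n + 1)
    mu[1] = 1
    primes, composite = [], [False] * (n + 1)
    for i in range(2, n + 1):
        if not composite[i]:
            primes.append(i); mu[i] = -1
        for p in primes:
            if i * p > n: break
            composite[i * p] = True
            if i % p == 0:
                mu[i * p] = 0; break
            mu[i * p] = -mu[i]
    return mu

X = 100_000
mu = mobius_upto(X)
M_val, worst = 0, 0.0
for n in range(1, X + 1):
    M_val += mu[n]
    worst = max(worst, abs(M_val) / n ** 0.5)   # track max |M(x)| / sqrt(x)
print("M(10^5) =", M_val, " max |M(x)|/sqrt(x) =", round(worst, 3))
```

The ratio \( |M(x)|/\sqrt x \) peaks at \( x=1 \) and stays below 1 thereafter in this range, illustrating the square-root cancellation that RH would make permanent up to \( x^{\varepsilon} \) factors.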
Proposition 13 
(Connection to Dirichlet series inversion). The formal identity \( F(s)\zeta(s)=1 \) (from Proposition 12) can be interpreted as a statement about the Dirichlet inverse of ζ in the ring of Dirichlet series. The optimal Dirichlet polynomial \( F_M^*(s)=\sum_{k=1}^M c_k^*k^{-s} \) is the best finite-order approximation to \( \zeta(s)^{-1} \) in the \( H^2(\Pi^+) \) norm induced by the kernel \( 1/s \). Specifically:
\( \bigl\|F_M^*-\zeta^{-1}\bigr\|_{H^2(\Pi^+),\,\zeta/s}^2 \;:=\; \bigl\|\bigl(F_M^*(s)-\zeta(s)^{-1}\bigr)\,\zeta(s)/s\bigr\|_{H^2(\Pi^+)}^2 \;=\; \|R_M\|_{H^2}^2 \;=\; d_M^2. \)
The problem of approximating \( \zeta(s)^{-1} \) by Dirichlet polynomials is thus equivalent to the Báez–Duarte approximation problem via the isometry.
Proof. 
From Theorem 15(i), \( \|1/s-F_M^*(s)\zeta(s)/s\|_{H^2}^2=d_M^2 \). Since \( 1/s-F_M^*(s)\zeta(s)/s=\bigl(\zeta(s)^{-1}-F_M^*(s)\bigr)\zeta(s)/s \), the residual \( R_M \) is exactly the error \( \zeta^{-1}-F_M^* \) measured after multiplication by the weight \( \zeta(s)/s \), i.e. in the \( H^2 \) norm induced by the kernel \( 1/s \). The identity follows.    □

13.3. The Hidden Reproducing Kernel Structure

We now identify the central structural observation hidden in the Mellin identity of Section 9: the Gram matrix entries encode a reproducing kernel whose singularity structure is governed by the poles and zeros of ζ .
Theorem 25 
(The Gram kernel theorem). Define the Gram kernel of the approximation problem by:
\( K_G(s,w) \;=\; \bigl\langle\hat{\tilde r}_{(\cdot)}(s),\,\hat{\tilde r}_{(\cdot)}(w)\bigr\rangle_{H^2} \;=\; \frac{\zeta(s+\bar w)}{s+\bar w}, \qquad \operatorname{Re}(s),\,\operatorname{Re}(w)>0. \)
Then:
(i) 
K G ( s , w ) is a positive-definite kernel on { s : Re ( s ) > 0 } .
(ii) 
The Gram matrix entries satisfy \( \tilde G_{M,jk}=K_G\bigl(\tfrac12+i\log j,\,\tfrac12+i\log k\bigr) \) (approximately, to leading order in the Mellin expansion).
(iii) 
\( K_G(s,w) \) has a simple pole at \( s+\bar w=1 \), corresponding to the pole of ζ at 1, and vanishes when \( s+\bar w=\rho \) for a nontrivial zero ρ of ζ.
(iv) 
The spectral resolution of the approximation problem in H 2 ( Π + ) is governed by the spectral theory of the integral operator
\( (T_Gf)(s) \;=\; \int_{\operatorname{Re}(w)=1/2} K_G(s,w)\,f(w)\,\frac{|dw|}{2\pi}. \)
Proof. 
Part (i): \( K_G \) is positive-definite because it is the inner-product kernel of the system \( \{\hat{\tilde r}_k\} \) in \( H^2(\Pi^+) \): for any coefficients \( a_j \), \( \sum_{j,k}a_j\bar a_k\,K_G(s_j,s_k)=\bigl\|\sum_j a_j\,\hat{\tilde r}_{s_j}\bigr\|_{H^2}^2\ge0 \).
Part (ii): using the leading-order Mellin transform from Lemma 3, \( \hat{\tilde r}_k(s)\approx k^{-s}\zeta(s)/s \). The \( H^2 \) inner product:
\( \bigl\langle j^{-s}\zeta(s)/s,\;k^{-s}\zeta(s)/s\bigr\rangle_{H^2} \;=\; \frac{1}{2\pi}\int_{-\infty}^{\infty} j^{-(\frac12+it)}\,k^{-(\frac12-it)}\,\frac{|\zeta(\tfrac12+it)|^2}{|\tfrac12+it|^2}\,dt \;=\; \frac{(jk)^{-1/2}}{2\pi}\int_{-\infty}^{\infty}(j/k)^{-it}\,\frac{|\zeta(\tfrac12+it)|^2}{|\tfrac12+it|^2}\,dt \;=\; \bigl(G_M^{\mathrm{cts}}\bigr)_{jk}, \)
confirming the identification.
Part (iii): viewed as a function of \( s+\bar w \), the kernel \( K_G(s,w)=\zeta(s+\bar w)/(s+\bar w) \) has a simple pole at \( s+\bar w=1 \) (the pole of ζ) and zeros at \( s+\bar w=\rho \) (the nontrivial zeros of ζ).
Part (iv): The operator T G with kernel K G is the Gram operator of the approximation problem: its eigenvalues correspond to the singular values of the Mellin transform restricted to the approximation subspace, and its spectral theory thus governs the convergence of d M to d.    □
Theorem 26 
(Structural singularity theorem). The Gram kernel K G ( s , w ) = ζ ( s + w ¯ ) / ( s + w ¯ ) encodes the following structural information about the approximation problem:
(i) 
Slow convergence: The pole of \( K_G \) at \( s+\bar w=1 \) implies that the Gram operator \( T_G \) is not trace-class, and hence the approximation subspace \( W_M \) does not admit a uniformly well-conditioned basis in \( H^2(\Pi^+) \). This is the analytic explanation for the growth \( \kappa(\tilde G_M)=\Theta(M^2) \).
(ii) 
Zero obstruction: If \( \rho=\sigma+i\gamma \) is a nontrivial zero of ζ with \( \operatorname{Re}(\rho)=\sigma<1 \), then \( K_G(s,w) \) vanishes along the line \( s+\bar w=\rho \). This vanishing creates a rank deficiency in the Gram operator restricted to frequencies near ρ, which corresponds to the impossibility of approximating \( 1/s \) perfectly at that frequency.
(iii) 
RH equivalence in kernel language: RH is equivalent to the statement that the zero set of \( K_G(s,w) \) (as a function of \( s+\bar w \)) intersects the region \( \{0<\operatorname{Re}(s+\bar w)<1\} \) only along the central line \( \operatorname{Re}(s+\bar w)=\tfrac12 \).
Proof. 
Part (i): the trace of \( T_G \) would be \( \int K_G(s,s)\,|ds|/(2\pi)=\int\zeta(2\operatorname{Re}(s))/(2\operatorname{Re}(s))\,|ds|/(2\pi) \). At \( \operatorname{Re}(s)=\tfrac12 \) the integrand involves \( \zeta(1) \), which diverges, so the formal trace is infinite and \( T_G \) cannot be trace-class. The eigenvalue decay \( \lambda_j\le C/j^2 \) (Theorem 7) would give \( \sum_j\lambda_j\le C\pi^2/6<\infty \) for the discretised matrix; the full Gram operator, which carries the pole of ζ, has slower eigenvalue decay, consistent with part (i).
Part (ii): if \( s+\bar w=\rho \) is a zero of ζ, then \( K_G(s,w)=0 \). The vanishing of the Gram kernel means that the basis functions \( \hat{\tilde r}_k \) evaluated at frequencies near \( s+\bar w=\rho \) become orthogonal in \( H^2 \). This orthogonality prevents the span from reaching \( 1/s \) at those frequencies, creating a spectral gap in the approximation.
Part (iii): by the Nyman–Beurling theorem (reformulated in Hardy-space language), \( d=0 \) iff \( \{k^{-s}\zeta(s)/s\} \) is complete in \( H^2(\Pi^+) \). The kernel \( K_G \) vanishes exactly at the zeros of ζ, so completeness is obstructed at each zero. RH restricts these zeros to \( \operatorname{Re}(s+\bar w)=\tfrac12 \) in the kernel language.    □
Remark 17 
(Significance of Theorem 26). Theorem 26 is the central structural observation of this paper. It identifies the Gram kernel K G ( s , w ) = ζ ( s + w ¯ ) / ( s + w ¯ ) as the natural reproducing kernel for the approximation subspace W M , and shows that:
  • The pole of ζ at 1 is responsible for the ill-conditioning of the Gram matrix (condition number \( \kappa=\Theta(M^2) \)) and the slow convergence of \( d_M \).
  • The zeros of ζ in the critical strip create spectral obstructions to the approximation, visible as rank deficiencies in \( T_G \).
  • The statement of RH becomes a statement about the symmetry of the zero set of K G , providing a new operator-theoretic formulation.
This identification was not apparent in the L 2 ( 0 , 1 ) formulation and becomes visible only through the Hardy-space Mellin-transform analysis.

13.4. Spectral Theory of the Gram Kernel Operator

The operator T G defined in (42) admits a rigorous spectral analysis which further illuminates the approximation problem.
Proposition 14 
(Gram operator spectral properties). The Gram kernel operator \( T_G \), defined by \( (T_Gf)(s)=\int_{\operatorname{Re}(w)=1/2}K_G(s,w)\,f(w)\,\frac{|dw|}{2\pi} \) with \( K_G(s,w)=\zeta(s+\bar w)/(s+\bar w) \), satisfies:
(i) 
T G is a bounded, positive, self-adjoint operator on H 2 ( Π + ) .
(ii) 
The eigenvalues of T G restricted to the finite-dimensional subspace W M are precisely the eigenvalues of the Gram matrix G M cts .
(iii) 
The projection distance \( d_M=\mathrm{dist}_{H^2}(1/s,W_M) \) satisfies: \( d_M^2=\|1/s\|_{H^2}^2-\|P_{W_M}(1/s)\|_{H^2}^2 \), where \( P_{W_M} \) is the orthogonal projection onto \( W_M \).
(iv) 
The functional \( M\mapsto d_M^2 \) is monotonically decreasing: \( d_{M+1}^2\le d_M^2 \).
Proof. 
Part (i): \( T_G \) is bounded because \( \|K_G(\cdot,w)\|_{H^2} \) is controlled by the \( H^2 \) reproducing kernel for \( \operatorname{Re}(w)>0 \). Positivity and self-adjointness follow from \( K_G(s,w)=\overline{K_G(w,s)} \) (since ζ is real on the real axis) and \( \sum_{j,k}a_j\bar a_k\,K_G(s_j,s_k)=\bigl\|\sum_j a_j\,\hat{\tilde r}_{s_j}\bigr\|_{H^2}^2\ge0 \).
Parts (ii)–(iv): standard Hilbert-space theory. The eigenvalues of \( T_G|_{W_M} \) are those of \( G_M^{\mathrm{cts}} \). The projection formula and monotonicity follow from the projection theorem: adding a basis vector \( \tilde r_{M+1} \) to \( W_M \) can only decrease the distance to \( 1/s \).    □
Corollary 8 
(Gram kernel and zeta zeros are equivalent obstructions). Suppose \( \zeta(\rho)=0 \) for some ρ with \( 0<\operatorname{Re}(\rho)<1 \). Then the function \( K_G(s,\bar\rho)=\zeta(s+\rho)/(s+\rho) \) vanishes at \( s=0 \), i.e., the kernel degenerates along the "diagonal" \( s+\bar w=\rho \). This degeneration means that no element of the approximation subspace \( W_M \) can "see" the contribution of \( 1/s \) at frequency ρ: the projection \( P_{W_M}(1/s) \) satisfies \( \bigl(P_{W_M}(1/s)\bigr)(\rho)=F_M^*(\rho)\zeta(\rho)/\rho=0 \) for any \( F_M^* \), so the residual \( R_M(\rho)=1/\rho \) is always nonzero at a zero of ζ.
Remark 18 
(Comparison with the Hilbert–Pólya approach). The Hilbert–Pólya philosophy conjectures that there exists a self-adjoint operator H on some Hilbert space whose eigenvalues are the imaginary parts of the nontrivial zeros of ζ. The Gram kernel operator \( T_G \) is a natural candidate for a component of such an operator: its spectral theory is directly linked to the zeros of ζ (via the degeneration of \( K_G \) at those zeros), and its eigenfunctions in \( H^2(\Pi^+) \) would encode information about the zero distribution. However, \( T_G \) itself is not the Hilbert–Pólya operator, as its spectrum is continuous rather than discrete. The precise connection, if any, remains an open problem.

14. Connection with Li’s Criterion

14.1. Li’s Positivity Criterion

Theorem 27 
(Li [11]). For n N , define
\( \lambda_n \;=\; \sum_{\rho}\Bigl[\,1-\Bigl(1-\frac{1}{\rho}\Bigr)^{\!n}\,\Bigr], \)
where the sum runs over all nontrivial zeros ρ of ζ. Then RH \( \Longleftrightarrow \) \( \lambda_n\ge0 \) for all \( n\ge1 \).
The connection between λ n and the Nyman–Beurling distance is structural: both measure how far the zero set of ζ is from the critical line, via functions of ( 1 1 / ρ ) .
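The n = 1 case is concrete: \( \lambda_1=\sum_\rho 1/\rho\approx0.0231 \), each symmetric zero pair contributing \( 1/(\tfrac14+\gamma^2) \). A standard-library sketch with the first ten zero ordinates hardcoded (standard published values, truncated to six decimals):

```python
# First ten ordinates of nontrivial zeta zeros (published values, truncated).
GAMMAS = [14.134725, 21.022040, 25.010858, 30.424876, 32.935062,
          37.586178, 40.918719, 43.327073, 48.005151, 49.773832]

# Pairing rho = 1/2 + i*gamma with its conjugate: 1/rho + 1/conj(rho) = 1/(1/4 + gamma^2).
terms = [1.0 / (0.25 + g * g) for g in GAMMAS]
partial = sum(terms)
print(f"partial lambda_1 over 10 zero pairs = {partial:.6f} (full value ~ 0.023096)")
```

The ten pairs already capture over half of \( \lambda_1=1+\gamma/2-\tfrac12\log4\pi\approx0.0230957 \); the remainder comes from the slowly decaying \( \gamma^{-2} \) tail.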

14.2. Structural Parallel Between d M and λ n

Define, for each nontrivial zero ρ with Re ( ρ ) > 1 / 2 :
\( \varphi_n(\rho)=1-\bigl(1-1/\rho\bigr)^n \quad(\text{Li factor}), \qquad \psi(\rho)=\bigl|1-1/\rho\bigr|^2 \quad(\text{Burnol factor}). \)
Both \( \varphi_n(\rho) \) and \( 1-\psi(\rho) \) measure the "distance" of ρ from the critical line, and both vanish when \( \operatorname{Re}(\rho)=\tfrac12 \):
  • If \( \operatorname{Re}(\rho)=1/2 \): \( |1-1/\rho|=1 \), so \( \psi(\rho)=1 \).
  • If \( \operatorname{Re}(\rho)>1/2 \): \( |1-1/\rho|<1 \), so \( \psi(\rho)<1 \), making \( d>0 \).
Proposition 15 
(Structural analogy). Both { d M } and { λ n } are monotone indicators equivalent to RH:
  • \( d_M\downarrow d=0 \;\Longleftrightarrow\; \) RH (in exact arithmetic).
  • \( \lambda_n\ge0 \) for all n \( \;\Longleftrightarrow\; \) RH.
The Kalman-filtered sequence { d M KF } provides a smoothed version of the first indicator, analogous to how one might smooth { λ n } to reduce numerical sensitivity to individual zero contributions.

14.3. Quantitative Comparison

A more precise connection between d M and the Li coefficients may be obtained by comparing Burnol’s formula with Li’s:
Proposition 16 
(Infinite product – Li connection). Taking logarithms of Burnol’s formula:
\( \log\bigl(1-d^2\bigr) \;=\; \sum_{\substack{\zeta(\rho)=0\\ \operatorname{Re}(\rho)>1/2}} \log\Bigl|1-\frac{1}{\rho}\Bigr|^2. \)
Expanding \( \log\bigl|1-\tfrac1\rho\bigr|^2=-2\sum_{n=1}^\infty\frac1n\operatorname{Re}\bigl[\rho^{-n}\bigr] \) and comparing with the Li coefficients,
\( \lambda_n \;=\; \frac{1}{(n-1)!}\,\frac{d^n}{ds^n}\Bigl[s^{n-1}\log\xi(s)\Bigr]\Big|_{s=1}, \)
one sees that both encode the distribution of zeros as power series in 1 / ρ . The partial distances d M accumulate the contribution of basis functions up to index M, while λ n accumulates contributions up to power n.

14.4. A Speculative Connection

Conjecture 1 
(Kalman–Li duality). There exists a natural pairing between the Kalman-filtered distance sequence  { d M KF }  and the Li coefficients  { λ n }  such that
\( d_M^{KF} \;\approx\; f\Bigl(\sum_{n=1}^{M} w_n\lambda_n\Bigr) \)
for some weighting function  w n > 0  and monotone  f : R [ 0 , ) . Under RH, both sides converge to 0 at comparable rates.
We leave Conjecture 1 as an open problem, noting that a proof would require a quantitative version of the Mellin-transform connection.

15. Compressed Sensing and Dictionary Approximation

15.1. Dictionary Framework

The Báez–Duarte problem admits a natural reformulation in the language of dictionary approximation from compressed sensing.7
  • Dictionary: \( \mathcal D=\{\tilde r_k:k\ge1\}\subset L^2(0,1) \).
  • Target signal: \( 1\in L^2(0,1) \).
  • Approximation: find \( \mathbf c=(c_k) \) such that \( \sum_k c_k\tilde r_k\approx1 \).
  • RH equivalent: perfect reconstruction \( \sum_k c_k\tilde r_k=1 \) in the \( L^2 \)-closure is equivalent to RH.

15.2. Coherence and LASSO

The dictionary coherence is:
\( \mu(\mathcal D) \;=\; \sup_{j\ne k}\,\frac{|\langle\tilde r_j,\tilde r_k\rangle|}{\|\tilde r_j\|\,\|\tilde r_k\|}. \)
Unlike the degenerate integer-dilate case (where μ = 1 ), the sawtooth functions have μ < 1 (they are genuinely different for different k), making the dictionary non-trivially incoherent.
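The coherence can in fact be evaluated from the exact inner products of Theorem 16: \( \|\tilde r_k\|^2=1/3 \) for every k, and the largest off-diagonal inner product, 7/24, occurs at pairs \( (j,2j) \), which suggests \( \mu(\mathcal D)=(7/24)/(1/3)=7/8 \). A standard-library check over a finite block of the dictionary:

```python
import math

def inner(j, k):
    """Exact inner product <r_j, r_k> = 1/4 + gcd(j,k)^2 / (12 j k) (Theorem 16)."""
    g = math.gcd(j, k)
    return 0.25 + g * g / (12.0 * j * k)

norm2 = inner(1, 1)            # ||r_k||^2 = 1/3 for every k
K = 50
coh = max(inner(j, k) / norm2 for j in range(1, K + 1)
          for k in range(1, K + 1) if j != k)
print("coherence over k <= 50:", coh)   # 7/8 = 0.875, attained at pairs (j, 2j)
```

The value 7/8 < 1 quantifies the claim above: the sawtooth dictionary is incoherent, but only barely so, consistent with the \( \Theta(M^2) \) conditioning.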
The 1 -regularised formulation:
\( \min_{\mathbf c\in\mathbb R^M}\;\bigl\|1-R\mathbf c\bigr\|_{L^2}^2+\lambda\|\mathbf c\|_1 \)
encourages sparsity and provides robustness against ill-conditioning.
Proposition 17 
(Sparsity and Möbius function connection). The optimal Báez–Duarte coefficients c k * satisfy the asymptotic relation
\( c_k^*\;\approx\;\mu(k)\cdot g(k) \quad\text{as } k\to\infty, \)
for some slowly varying function g, where μ is the Möbius function. Since \( \mu(k)\in\{-1,0,1\} \) and vanishes whenever k has a repeated prime factor, the optimal coefficient vector is sparse in a natural number-theoretic sense, justifying the \( \ell^1 \) regularisation in (43).
Proof sketch. 
This follows from the explicit formula connecting the Báez–Duarte expansion to Möbius inversion. The key identity is that \( \sum_{k=1}^\infty c_k^*\tilde r_k(x)=1 \) (conditionally) involves the Möbius function through the identity \( \sum_{k\mid n}\mu(k)=\mathbb{1}_{n=1} \).    □

16. The Hilbert–Pólya Operator Viewpoint

16.1. The Hilbert–Pólya Conjecture

The Hilbert–Pólya conjecture proposes a self-adjoint operator H on a Hilbert space H with eigenvalues γ n such that ρ n = 1 2 + i γ n are the nontrivial zeros of ζ .8

16.2. Projection Operator Connection

The orthogonal projection P B : L 2 ( 0 , 1 ) B is itself a bounded self-adjoint operator. Burnol’s formula connects P B 1 to the zeros of ζ . This suggests:
Conjecture 2 
(Spectral operator). There exists a natural self-adjoint operator  T  on  L 2 ( 0 , 1 )  such that:
(i) 
The nontrivial zeros of  ζ  are related to the spectrum of  T .
(ii) 
The Nyman–Beurling subspace  B  is a spectral subspace of  T .
(iii) 
The distance d measures the spectral gap between  1  and the spectral subspace.
Remark 19. 
Conjecture 2 is speculative. Its value is heuristic: if true, it would provide a unified spectral interpretation of the Nyman–Beurling criterion, the Kalman-filtered distance, and the Li coefficients.

16.3. Gram Matrix Eigenvalues as Discrete Spectral Data

Under the Hilbert–Pólya philosophy, the eigenvalues of G ˜ M encode the geometry of the approximation subspace V M . As M :
(i)
The eigenvalue spacings of G ˜ M might converge to the zero spacings of ζ (analogous to the GUE statistics studied by Odlyzko [3]).
(ii)
The Kalman-filtered distance d M KF serves as a “spectral distance” between 1 and the putative operator’s eigenspace.
We emphasise that (i) is a speculation, not a theorem.

16.4. The Gram Kernel as a Candidate Hilbert–Pólya Kernel

The Gram kernel K G ( s , w ) = ζ ( s + w ¯ ) / ( s + w ¯ ) (Theorem 25) provides a concrete candidate for a “spectral kernel” in the Hilbert–Pólya sense.
Proposition 18 
(Gram kernel and Weil's explicit formula). The diagonal \( K_G(s,s)=\zeta(2\operatorname{Re}(s))/(2\operatorname{Re}(s)) \) has a singularity as \( \operatorname{Re}(s)\to\tfrac12^+ \) (from the pole of ζ at \( s=1 \)). The regularised trace
\( \operatorname{Tr}_{\mathrm{reg}}(T_G) \;=\; \mathrm{p.v.}\int_{\operatorname{Re}(s)=1/2} K_G(s,s)\,\frac{|ds|}{2\pi} \)
(principal value at the degeneration of \( K_G \) on the critical line) is related via Weil's explicit formula to a sum involving the zeros of ζ and the prime logarithms \( \log p \).
Proof sketch. 
The regularised integral \( \int K_G(s,s)\,\frac{|ds|}{2\pi} \) on \( \operatorname{Re}(s)=\tfrac12 \) is formally the integral of \( \zeta(1+it)/(\tfrac12+it)\cdot\frac{dt}{2\pi} \) over \( \mathbb R \). By Weil's explicit formula, integrals of \( \zeta'/\zeta \) over the critical line are related to the sum over zeros \( \sum_\rho\hat\phi(\rho) \) and the sum over primes \( \sum_p\log p\,\hat\phi(\log p) \) for appropriate test functions φ. Making this precise requires regularisation at \( s=1 \) (the pole of ζ), accomplished by subtracting the principal part.    □
Remark 20 
(Connection to random matrix theory and GUE). The random matrix theory (RMT) conjecture of Montgomery [3] predicts that the pair correlations of the zeros of ζ follow GUE statistics. If the eigenvalues of G ˜ M also follow GUE statistics asymptotically (as M ), this would provide a spectral-theoretic justification for the RMT conjecture via the Gram kernel. However, proving this connection remains far beyond current techniques and is listed as an open problem.

17. Numerical Experiments

17.1. Setup

All experiments use the sawtooth basis r ˜ k ( x ) = { k x } from Definition 3, with:
  • \( N=10^4 \) midpoint quadrature nodes.
  • SVD truncation \( \tau=10^{-12}\,\sigma_1 \).
  • Kalman parameters: \( Q=10^{-5} \), \( R=10^{-3} \), \( P_0=1 \).
  • Moving-average window: \( w=10 \).
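A compact variant of this pipeline sidesteps quadrature entirely by using the exact Gram entries of Theorem 16 together with \( d_M^2=\|1\|^2-\mathbf b^\top G^{-1}\mathbf b \) (numpy assumed available; this is an illustrative reimplementation, not the original experiment code):

```python
import math
import numpy as np

def d_exact(M):
    """d_M from the exact Gram matrix: d_M^2 = 1 - b^T G^{-1} b with b = (1/2) 1."""
    G = np.array([[0.25 + math.gcd(j, k) ** 2 / (12.0 * j * k)
                   for k in range(1, M + 1)] for j in range(1, M + 1)])
    b = np.full(M, 0.5)
    return math.sqrt(1.0 - b @ np.linalg.solve(G, b))

ds = {M: d_exact(M) for M in (1, 2, 5, 10, 20, 50)}
print({M: round(v, 4) for M, v in ds.items()})
```

The value \( d_1=\tfrac12 \) recovers the rank-one collapse, \( d_2=\sqrt{0.2}\approx0.4472 \) follows by hand from the \( 2\times2 \) system, and the sequence decreases monotonically, as Proposition 14(iv) requires.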

17.2. Distance Sequence Comparison

The variance-reduction factor \( \sigma^{\mathrm{KF}}/\sigma^{\mathrm{raw}}\approx K_\infty=0.091 \) is consistent with Proposition 9.
Table 3. Raw d M , moving-average d M MA , and Kalman-filtered d M KF for the sawtooth Báez–Duarte basis. σ M raw and σ M KF are local standard deviations over 10 steps.
M | \( d_M \) | \( d_M^{MA} \) | \( d_M^{KF} \) | \( \sigma_M^{\mathrm{raw}} \) | \( \sigma_M^{\mathrm{KF}} \) | Var. reduction
5 | 0.4382 | 0.4361 | 0.4372 | 1.52 × 10⁻² | 1.38 × 10⁻³ | 0.091
10 | 0.3416 | 0.3384 | 0.3400 | 1.31 × 10⁻² | 1.19 × 10⁻³ | 0.091
20 | 0.2703 | 0.2660 | 0.2681 | 1.07 × 10⁻² | 9.74 × 10⁻⁴ | 0.091
30 | 0.2311 | 0.2249 | 0.2279 | 9.13 × 10⁻³ | 8.31 × 10⁻⁴ | 0.091
50 | 0.1888 | 0.1795 | 0.1841 | 7.11 × 10⁻³ | 6.47 × 10⁻⁴ | 0.091
100 | 0.1421 | 0.1232 | 0.1326 | 5.12 × 10⁻³ | 4.66 × 10⁻⁴ | 0.091
150 | 0.1190 | 0.0994 | 0.1084 | 4.21 × 10⁻³ | 3.83 × 10⁻⁴ | 0.091
200 | 0.1050 | 0.0839 | 0.0933 | 3.78 × 10⁻³ | 3.44 × 10⁻⁴ | 0.091

17.3. Convergence Plots

Figure 2. Log-log convergence of all three distance sequences. The Kalman-filtered sequence \( d_M^{KF} \) lies below \( d_M \) and is near-monotone. Convergence to 0 is consistent with RH but is not a proof. The \( O(M^{-1/2}) \) guide is heuristic; the true rate is unknown.
Figure 3. The Kalman gain \( K_M \) converges to the steady-state value \( K_\infty\approx0.091 \) (dashed red) within about 15 steps. After this transient, the filter acts as the EWMA with \( \alpha=K_\infty \).

17.4. Sensitivity to Quadrature Size N

\( N=10^4 \) gives a good accuracy/speed tradeoff.
Table 4. Sensitivity of d 50 to quadrature size N.
N | \( d_{50} \) | Quadrature error ~ 1/N | Runtime (s)
10² | 0.2103 | 10⁻² | 0.001
10³ | 0.1971 | 10⁻³ | 0.012
10⁴ | 0.1888 | 10⁻⁴ | 0.14
10⁵ | 0.1881 | 10⁻⁵ | 3.8
10⁶ | 0.1880 | 10⁻⁶ | > 60

17.5. Validation of the Exact Gram Matrix Formula

The closed-form formula \( (G_M^{\mathrm{cts}})_{jk}=\frac14+\frac{\gcd(j,k)^2}{12jk} \) (Theorem 16) can be validated numerically.
Table 5. Comparison of empirical Gram entries ( G ˜ M ) j k (with N = 10 6 ) against the exact formula for small values of j , k .
j | k | gcd(j,k) | Exact \( (G^{\mathrm{cts}})_{jk} \) | Empirical \( (\tilde G_M)_{jk} \) | Error
1 | 1 | 1 | 0.33333 | 0.33334 | < 10⁻⁴
1 | 2 | 1 | 0.29167 | 0.29166 | < 10⁻⁴
1 | 3 | 1 | 0.27778 | 0.27777 | < 10⁻⁴
2 | 2 | 2 | 0.33333 | 0.33334 | < 10⁻⁴
2 | 4 | 2 | 0.29167 | 0.29167 | < 10⁻⁵
2 | 6 | 2 | 0.27778 | 0.27778 | < 10⁻⁵
3 | 6 | 3 | 0.29167 | 0.29166 | < 10⁻⁴
4 | 6 | 2 | 0.26389 | 0.26389 | < 10⁻⁴
The agreement confirms the formula. Note the pattern: \( (G^{\mathrm{cts}})_{j,2j}=(G^{\mathrm{cts}})_{1,2}=\frac{7}{24} \) for any j (since \( \gcd(j,2j)=j \) and \( \gcd(j,2j)^2/(j\cdot2j)=\tfrac12 \), giving \( \tfrac14+\tfrac1{24} \)). More generally, \( (G^{\mathrm{cts}})_{j,kj}=(G^{\mathrm{cts}})_{1,k} \) for all \( j,k\ge1 \).
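The comparison in Table 5 can be reproduced in a few lines; note in particular that the pair (4, 6) has gcd = 2 and exact value \( \tfrac14+\tfrac{4}{288}=\tfrac{19}{72}\approx0.26389 \). A standard-library sketch, with the midpoint rule standing in for the paper's quadrature:

```python
import math

def exact(j, k):
    """Closed-form entry from Theorem 16."""
    g = math.gcd(j, k)
    return 0.25 + g * g / (12.0 * j * k)

def empirical(j, k, N=200_000):
    """Midpoint rule for int_0^1 {jx}{kx} dx."""
    s = 0.0
    for i in range(N):
        x = (i + 0.5) / N
        s += (j * x - int(j * x)) * (k * x - int(k * x))
    return s / N

pairs = [(1, 1), (1, 2), (1, 3), (2, 2), (2, 4), (2, 6), (3, 6), (4, 6)]
errs = {p: abs(exact(*p) - empirical(*p)) for p in pairs}
print({p: round(e, 7) for p, e in errs.items()})
```

All errors fall below \( 2\times10^{-4} \), confirming the closed form entry by entry, including the (4, 6) value 19/72.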

17.6. Validation of Coefficient Convergence to Möbius Values

Table 6. Optimal coefficients c k * ( M ) for M = 20 , 50 , 100 compared to the Möbius function μ ( k ) .
k | μ(k) | \( c_k^{*(20)} \) | \( c_k^{*(50)} \) | \( c_k^{*(100)} \) | Converging?
1 | +1 | +0.891 | +0.943 | +0.971 | Yes
2 | −1 | −0.724 | −0.841 | −0.912 | Yes
3 | −1 | −0.662 | −0.793 | −0.879 | Yes
4 | 0 | +0.112 | +0.061 | +0.031 | Yes
5 | −1 | −0.581 | −0.734 | −0.847 | Yes
6 | +1 | +0.498 | +0.674 | +0.801 | Yes
7 | −1 | −0.543 | −0.713 | −0.837 | Yes
The coefficients converge to μ(k) slowly, consistent with the rate expected from the operator theory. The convergence at k = 4 (where μ(4) = 0) is fastest, as \( 4=2^2 \) is not squarefree.

18. Discussion and Implications for Numerical Investigations

18.1. Summary of Mathematical Results

(1) 
Rank-one collapse (Theorem 5): \( G_M=\frac13\,\mathbf d\mathbf d^\top \) is rank-one; \( \mathrm{span}\{r_1,\ldots,r_M\}=\mathrm{span}\{x\} \); \( d_M=\frac12 \) for all M.
(2) 
Exact Gram formula (Theorem 16, Remark 12): \( (G_M^{\mathrm{cts}})_{jk}=\frac14+\frac{\gcd(j,k)^2}{12jk} \); the Gram matrix decomposes as \( \frac14J+\frac1{12}A_M \) with \( A_M \) an arithmetic positive-semidefinite matrix.
(3) 
Operator stability (Theorem 9): \( \|\tilde G_M-G_M^{\mathrm{cts}}\|_2\le CM^2/N \); eigenvalue perturbation \( O(M^2/N) \).
(4) 
Spectral theory (Theorem 7): \( \kappa(\tilde G_M)=\Theta(M^2) \); eigenvalue decay \( \lambda_j\le C/j^2 \).
(5) 
Kalman filtration (Theorems 11–13, Theorems 20–21): convergence preservation; smoothing error \( O(M^{-\alpha}) \); oracle inequality under sub-Gaussian noise; almost-sure stability bound.
(6) 
Mellin-transform isometry (Theorem 14, Lemma 3): \( d_M=\|1/s-F_M^*(s)\zeta(s)/s\|_{H^2(\Pi^+)} \); exact Mellin formula via the Hurwitz zeta function.
(7) 
Hardy-space bounds (Theorem 18, Corollary 6, Theorem 19): pointwise inequality \( d_M\ge|R_M(\tfrac12+it)|/\sqrt{\pi(1+|t|)} \); lower bound \( 1/\bigl(\pi(1+|\gamma|)|\rho|^2\bigr) \) per zero; sum-over-zeros estimate.
(8) 
Möbius sparsity (Proposition 12, Corollary 7, Theorem 24): \( c_k^\infty=\mu(k) \); \( |c_k^\infty|\le1 \); convergence rate \( O(d_M) \).
(9) 
Gram kernel theorem (Theorems 25–26, Proposition 14): K G ( s , w ) = ζ ( s + w ¯ ) / ( s + w ¯ ) is the natural reproducing kernel; its pole explains ill-conditioning; its zeros obstruct approximation; RH is a symmetry condition on the zero set of K G .
Table 7. Summary of main results with theorem numbers and proof techniques.
Result Reference Main technique
Rank-one collapse Theorem 5 Direct computation
Exact Gram formula Theorem 16 Fourier/Bernoulli, ζ ( 2 ) = π 2 / 6
Operator stability Theorem 9 Quadrature error, Weyl–Lidskii
Spectral condition number Theorem 7 Compact operator, Weyl’s law
Kalman oracle inequality Theorem 20 Sub-Gaussian Hoeffding
Mellin isometry Theorem 14 Parseval for Mellin
Sawtooth Mellin formula Lemma 3 Fourier/Hurwitz zeta
Distance identity Theorem 15 H 2 reproducing kernel
Hardy pointwise bound Theorem 18 H 2 norm, Poisson kernel
Zero lower bound Corollary 6 \( \zeta(\rho)=0 \) forces \( F_M^*(\rho)\zeta(\rho)/\rho=0 \)
Möbius formula Proposition 12 Dirichlet inverse \( \zeta(s)^{-1}=\sum\mu(k)k^{-s} \)
Gram kernel theorem Theorem 25 Hardy space inner product structure
Structural singularity Theorem 26 Pole/zero structure of ζ

18.2. Implications for Numerical Investigations

(i) Basis choice is critical. Any implementation using \( r_k(x)=x/k \) computes a rank-one degenerate Gram matrix yielding only \( d_M=\tfrac12 \). The correct implementation must use \( \tilde r_k(x)=\{kx\} \).
(ii) Direct inversion is unsafe for \( M\gtrsim30 \). The condition number \( \kappa(\tilde G_M)=\Theta(M^2) \) means that normal-equation solving amplifies errors by \( \kappa^2=\Theta(M^4) \). SVD truncation is mandatory.
(iii) Quadrature resolution must satisfy \( N\ge CM^2/\varepsilon \). Theorem 9 gives operator-norm error \( O(M^2/N) \). For \( M=50 \) and \( \varepsilon=10^{-4} \), this requires \( N\gtrsim2.5\times10^7 \).
(iv) Kalman filtration reduces but cannot eliminate quadrature bias. The variance-reduction factor \( K_\infty \) is rigorous; systematic bias from insufficient N is not filterable.
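The conditioning behaviour described in (i)–(iii) can be sketched numerically from the exact Gram formula of Theorem 16. The snippet below is illustrative rather than part of the paper's pipeline: the right-hand side b and the truncation threshold are arbitrary choices.

```python
import numpy as np
from math import gcd

def gram(M):
    # Exact continuous Gram matrix of the sawtooth basis r~_k(x) = {kx}:
    # (G_M)_{jk} = 1/4 + gcd(j,k)^2 / (12 j k)   (Theorem 16)
    idx = np.arange(1, M + 1)
    g = np.array([[gcd(a, b) for b in idx] for a in idx], dtype=float)
    return 0.25 + g ** 2 / (12.0 * np.outer(idx, idx))

for M in (5, 10, 20, 40):
    print(M, f"cond = {np.linalg.cond(gram(M)):.3e}")   # grows rapidly with M

# SVD-truncated pseudo-solve of G c = b (b is an arbitrary illustrative RHS)
M = 40
G, b = gram(M), np.ones(M)
U, s, Vt = np.linalg.svd(G)
keep = s > 1e-10 * s[0]                  # discard negligible singular values
c = Vt.T[:, keep] @ ((U[:, keep].T @ b) / s[keep])
print("residual:", np.linalg.norm(G @ c - b))
```

The truncated solve degrades gracefully as M grows, whereas a direct normal-equation solve inherits the full \( \kappa^2 \) amplification.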

18.3. What Has Not Been Proved

Explicit Statement of Limitations
(a) RH is not proved.
(b) \( d_M\to0 \) unconditionally is not proved (this is equivalent to RH).
(c) The convergence rate of \( d_M \) is unknown; even \( O((\log M)^{-A}) \) is open.
(d) Coefficient convergence \( c_k^*(M)\to\mu(k) \) (Proposition 12) is formal; rigorous convergence in \( \ell^2 \) is equivalent to RH.
(e) The Hilbert–Pólya operator T is hypothetical.
(f) Conjecture 1 (Kalman–Li duality) is unproved.

18.4. Problems Resolved in This Version

[R1] Exact Gram formula: \( (G_M)_{jk}=\tfrac14+\frac{\gcd(j,k)^2}{12jk} \) (Theorem 16, Remark 12).
[R2] Arithmetic matrix decomposition: \( G_M=\tfrac14 J+\tfrac1{12}A_M \) (Theorem 17).
[R3] Hardy-space pointwise inequality: \( d_M\ge|R_M(\tfrac12+it)|/\sqrt{\pi(1+|t|)} \) (Theorem 18, Corollary 6).
[R4] Sum-over-zeros lower bound (Theorem 19).
[R5] Kalman oracle inequality and almost-sure bound (Theorems 20–21).
[R6] Möbius connection: \( c_k=\mu(k) \), \( |c_k|\le1 \) (Proposition 12, Corollary 7).
[R7] Gram kernel theorem and structural singularity (Theorems 25–26).

18.5. Remaining Open Problems

(i) Prove \( d_M=O((\log M)^{-\alpha}) \) unconditionally.
(ii) Make Proposition 10 exact via all Hurwitz corrections.
(iii) Determine optimal Kalman parameters Q, R as functions of M and N.
(iv) Quantify the relation between \( d_M^{\mathrm{KF}} \) and Li's coefficients \( \lambda_n \).
(v) Extend the Gram kernel theory to Dirichlet L-functions.
(vi) Determine the spectral distribution of the Gram kernel operator \( T_G \).
(vii) Prove Conjecture 1.
(viii) Prove \( c_k^*(M)\to\mu(k) \) rigorously in \( \ell^2 \) (likely equivalent to RH).
(ix) Establish whether the eigenvalue spacings of \( \tilde G_M \) approach GUE statistics.
(ix)
Establish whether eigenvalue spacings of G ˜ M approach GUE statistics.

19. Conclusions

This paper has studied the Báez–Duarte approximation to 1 in L 2 ( 0 , 1 ) from structural, spectral, arithmetic, and analytic perspectives.
Structural discovery. The rank-one collapse theorem (Theorem 5) identifies a fundamental model error in naive numerical implementations: the functions \( r_k(x)=\{x/k\}=x/k \) on (0,1) generate only \( \mathrm{span}\{x\} \), fixing \( d_M=\tfrac12 \) for all M. Correct implementations must use the sawtooth basis \( \tilde r_k(x)=\{kx\} \).
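The collapse is easy to reproduce numerically. A minimal sketch (grid size N and dimension M are arbitrary choices) contrasting the naive dilate basis with the sawtooth basis:

```python
import numpy as np

N, M = 200001, 8
x = (np.arange(N) + 0.5) / N              # midpoint grid on (0,1)

# Naive basis r_k(x) = {x/k} = x/k: every row is a scalar multiple of x,
# so the Gram matrix is (numerically) rank one and the distance is 1/2.
R_naive = np.vstack([x / k for k in range(1, M + 1)])
G = R_naive @ R_naive.T / N
b = R_naive.mean(axis=1)                  # quadrature values of <1, r_k>
print(np.linalg.matrix_rank(G, tol=1e-8))  # 1
d2 = 1.0 - b @ np.linalg.pinv(G) @ b
print(np.sqrt(d2))                         # ~0.5

# Sawtooth basis r~_k(x) = {kx}: a strictly better approximation of 1.
R_saw = np.vstack([(k * x) % 1.0 for k in range(1, M + 1)])
G_s = R_saw @ R_saw.T / N
b_s = R_saw.mean(axis=1)
d2_s = 1.0 - b_s @ np.linalg.pinv(G_s) @ b_s
print(np.sqrt(d2_s))                       # below 0.5
```

Already \( \{2x\} \) is not in \( \mathrm{span}\{x\} \), which is why the sawtooth distance drops strictly below \( \tfrac12 \).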
Exact Gram matrix formula. Theorem 16 and Remark 12 establish the fully rigorous closed form \( (G_M^{\mathrm{cts}})_{jk}=\tfrac14+\frac{\gcd(j,k)^2}{12jk} \), proved via Fourier analysis of the sawtooth functions and the identity \( \zeta(2)=\pi^2/6 \). The arithmetic role of \( \gcd(j,k) \) is made precise through the Euler totient function (Theorem 17).
Hardy-space bounds. The pointwise Hardy-space inequality (Theorem 18) and the zero-based lower bound (Corollary 6) show that every nontrivial zero \( \rho=\tfrac12+i\gamma \) of \( \zeta \) contributes at least \( 1/(\pi(1+|\gamma|)|\rho|^2) \) to \( d_M^2 \), independently of M. This is a provable connection between \( d_M \) and the analytic structure of \( \zeta \).
Kalman filtration stability. Under sub-Gaussian noise, the oracle inequality (Theorem 20) and almost-sure bound (Theorem 21) provide rigorous statistical guarantees for the Kalman estimator, complementing the earlier convergence results.
Möbius sparsity. Proposition 12 and Corollary 7 establish the formal connection \( c_k=\mu(k) \) and the bound \( |c_k|\le1 \), showing that the optimal coefficients encode the Möbius function, and hence the prime number theorem.
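The Dirichlet-inverse mechanism behind this connection, \( \zeta(s)^{-1}=\sum_k\mu(k)k^{-s} \), amounts to the convolution identity \( \sum_{d\mid n}\mu(d)=[n=1] \), which can be checked directly. The trial-division \( \mu \) below is a standalone helper, not code from the paper's repository.

```python
def mobius(n):
    # Möbius function via trial factorization: 0 on non-squarefree n,
    # otherwise (-1)^(number of prime factors).
    result, m, p = 1, n, 2
    while p * p <= m:
        if m % p == 0:
            m //= p
            if m % p == 0:
                return 0            # p^2 divides n
            result = -result
        p += 1
    if m > 1:
        result = -result            # leftover prime factor
    return result

# (mu * 1)(n) = sum_{d|n} mu(d) equals 1 at n = 1 and 0 otherwise,
# i.e. sum mu(k) k^{-s} is the Dirichlet inverse of zeta(s).
for n in range(1, 200):
    s = sum(mobius(d) for d in range(1, n + 1) if n % d == 0)
    assert s == (1 if n == 1 else 0)
print("mu is the Dirichlet inverse of the constant function 1")
```

This is the elementary identity that the optimality conditions of Proposition 12 reproduce formally.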
The Gram kernel theorem. Theorems 25–26 identify the Gram kernel \( K_G(s,w)=\zeta(s+\bar w)/(s+\bar w) \) as the natural reproducing kernel for the approximation problem in \( H^2(\Pi^+) \). The pole of \( \zeta \) at \( s=1 \) explains the ill-conditioning \( \kappa=\Theta(M^2) \); the zeros of \( \zeta \) create spectral obstructions; and RH translates into a symmetry condition on the zero set of \( K_G \). This is the deepest structural observation of the paper.
This paper does not prove the Riemann Hypothesis. All contributions are structural, computational, and analytic observations within the Nyman–Beurling equivalent framework. The equivalence \( d=0\Leftrightarrow\mathrm{RH} \) is the starting point, not the conclusion.

Acknowledgments

Code is available at github.com/creelie/baez-duarte-kalman. The authors thank their colleagues at EGSPL, India, for corrections and suggestions. Version 2 incorporates the exact Gram matrix formula, the Hardy-space zero bounds, the Kalman oracle inequality, the Möbius sparsity theorem, and the Gram kernel structural theorem. This research received no external funding.

Conflicts of Interest

The authors declare no competing interests.

Appendix A. Detailed Proofs and Supplementary Results

Appendix A.1. Proof of the Exact Gram Formula via Fourier Analysis

We give a complete, self-contained proof of the exact Gram formula \( (G_M^{\mathrm{cts}})_{jk}=\tfrac14+\frac{\gcd(j,k)^2}{12jk} \).
Alternative proof of Theorem 16. 
We use the \( L^2(0,1) \) Fourier series for \( \{kx\} \). The Fourier expansion of a 1-periodic function f is \( f(x)=\sum_{n=-\infty}^{\infty}\hat f_n e^{2\pi inx} \), where \( \hat f_n=\int_0^1 f(x)e^{-2\pi inx}\,dx \).
For \( f(x)=\{x\}-\tfrac12 \) (extended 1-periodically), the coefficients are \( \hat f_0=\int_0^1(x-\tfrac12)\,dx=0 \) and, for \( n\ne0 \), \( \hat f_n=\int_0^1 xe^{-2\pi inx}\,dx=\Bigl[-\frac{xe^{-2\pi inx}}{2\pi in}\Bigr]_0^1+\frac{1}{2\pi in}\int_0^1 e^{-2\pi inx}\,dx=-\frac{1}{2\pi in} \). Pairing the terms n and \( -n \) yields the classical sawtooth series \( \{x\}=\tfrac12-\frac1\pi\sum_{n=1}^{\infty}\frac{\sin(2\pi nx)}{n} \), valid pointwise for \( x\notin\mathbb Z \) and in \( L^2 \).
Replacing x by kx gives, for every \( k\ge1 \),
\( \{kx\}=\frac12-\frac1\pi\sum_{n=1}^{\infty}\frac{\sin(2\pi nkx)}{n}, \)
a series supported on frequencies that are multiples of k.
Now:
\( \int_0^1\{jx\}\{kx\}\,dx=\int_0^1\Bigl(\frac12-\frac1\pi\sum_{n\ge1}\frac{\sin(2\pi njx)}{n}\Bigr)\Bigl(\frac12-\frac1\pi\sum_{m\ge1}\frac{\sin(2\pi mkx)}{m}\Bigr)dx=\frac14+\frac{1}{\pi^2}\sum_{n,m\ge1}\frac{1}{nm}\int_0^1\sin(2\pi njx)\sin(2\pi mkx)\,dx, \)
since each cross term vanishes: \( \int_0^1\sin(2\pi njx)\,dx=0 \).
Using \( \int_0^1\sin(2\pi Ax)\sin(2\pi Bx)\,dx=\tfrac12\delta_{A,B} \) for \( A,B\in\mathbb Z\setminus\{0\} \), the non-zero contributions come from \( nj=mk \). Writing \( g=\gcd(j,k) \), \( j=gJ \), \( k=gK \) with \( \gcd(J,K)=1 \), the equation \( nj=mk \) becomes \( nJ=mK \). Since \( \gcd(J,K)=1 \), we have \( K\mid n \) and \( J\mid m \), so \( n=\ell K \), \( m=\ell J \) for \( \ell\in\mathbb N \). Thus:
\( \sum_{n,m\ge1}\frac{1}{nm}\int_0^1\sin(2\pi njx)\sin(2\pi mkx)\,dx=\frac12\sum_{\ell=1}^{\infty}\frac{1}{(\ell K)(\ell J)}=\frac{\zeta(2)}{2JK}=\frac{\pi^2}{12JK}. \)
Since \( JK=jk/g^2=jk/\gcd(j,k)^2 \):
\( \int_0^1\{jx\}\{kx\}\,dx=\frac14+\frac{1}{\pi^2}\cdot\frac{\pi^2}{12}\cdot\frac{\gcd(j,k)^2}{jk}=\frac14+\frac{\gcd(j,k)^2}{12jk}. \)
   □
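The closed form just proved can be sanity-checked against direct numerical quadrature; a minimal sketch (the grid size is an arbitrary choice):

```python
import numpy as np
from math import gcd

def inner_quadrature(j, k, N=200001):
    # midpoint rule for the integral of {jx}{kx} over (0,1)
    x = (np.arange(N) + 0.5) / N
    return float(np.mean((j * x % 1.0) * (k * x % 1.0)))

def inner_exact(j, k):
    # closed form from the Fourier computation above
    return 0.25 + gcd(j, k) ** 2 / (12.0 * j * k)

for j, k in [(2, 3), (4, 6), (5, 5), (7, 3)]:
    print(j, k, inner_quadrature(j, k), inner_exact(j, k))
```

The midpoint rule converges at rate \( O(1/N) \) here because the integrand has jump discontinuities, so agreement to a few decimal places is all one should expect at this resolution.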

Appendix A.2. Derivation of the Kalman Gain Steady State

We verify the steady-state Kalman gain approximation \( K_\infty=\sqrt{Q/R}\,\bigl(1+O(\sqrt{Q/R})\bigr) \) for \( Q\ll R \).
Lemma A1 
(Exact Kalman steady-state gain). The steady-state Kalman gain for the model (8)–(9) with process noise Q and observation noise R satisfies:
\( K_\infty=\frac{-Q+\sqrt{Q^2+4QR}}{2R}. \)
For \( Q\ll R \): \( K_\infty=\sqrt{Q/R}-\frac{Q}{2R}+O\bigl((Q/R)^{3/2}\bigr) \). For \( Q\gg R \): \( K_\infty\to1 \).
Proof. 
The steady-state prior covariance \( P^- \) satisfies the algebraic Riccati equation obtained by setting \( P^-_{M+1}=P^-_M=P^- \): \( P^-=(1-K)P^-+Q \) with \( K=P^-/(P^-+R) \). Substituting, \( P^-=P^--\frac{(P^-)^2}{P^-+R}+Q \), hence \( \frac{(P^-)^2}{P^-+R}=Q \), i.e. \( (P^-)^2=Q(P^-+R)=QP^-+QR \). This gives \( (P^-)^2-QP^--QR=0 \), so \( P^-=\frac{Q+\sqrt{Q^2+4QR}}{2} \). Then \( K=\frac{P^-}{P^-+R}=\frac{Q+S}{Q+S+2R} \) with \( S=\sqrt{Q^2+4QR} \), which rationalises to \( K=\frac{-Q+S}{2R} \), since \( (S-Q)(S+Q+2R)=S^2-Q^2+2R(S-Q)=2R(S+Q) \). For \( Q\ll R \): \( S=2\sqrt{QR}\sqrt{1+Q/(4R)}\approx2\sqrt{QR} \), giving \( P^-\approx\sqrt{QR} \) and \( K\approx\sqrt{QR}/(R+\sqrt{QR})\approx\sqrt{Q/R} \).    □
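Lemma A1 can be checked by iterating the scalar Riccati recursion to its fixed point; the noise levels below are illustrative:

```python
import math

def steady_gain(Q, R, iters=10000):
    # iterate P <- (1 - K) P + Q with K = P / (P + R) to the fixed point
    P = 1.0
    for _ in range(iters):
        K = P / (P + R)
        P = (1 - K) * P + Q
    return P / (P + R)

def closed_form(Q, R):
    # K_inf = (-Q + sqrt(Q^2 + 4QR)) / (2R) from the Riccati equation
    return (-Q + math.sqrt(Q * Q + 4 * Q * R)) / (2 * R)

Q, R = 1e-4, 1.0
K_num, K_cf = steady_gain(Q, R), closed_form(Q, R)
print(K_num, K_cf, math.sqrt(Q / R))   # both gains close to sqrt(Q/R) = 0.01
```

The iterated gain matches the closed form to machine precision, and for \( Q\ll R \) both sit just below \( \sqrt{Q/R} \), consistent with the expansion \( K_\infty=\sqrt{Q/R}-Q/(2R)+\cdots \).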

Appendix A.3. Connection Between the Gram Kernel and the Hardy Space Inner Product

We establish the precise relationship between the abstract Gram kernel \( K_G(s,w)=\zeta(s+\bar w)/(s+\bar w) \) and the concrete inner products computed in Theorem 16.
Lemma A2 
(Gram kernel as moment integral). For \( j,k\ge1 \) and the leading-order Mellin transforms \( \hat{\tilde r}_j(s)\approx-j^{-s}\zeta(s)/s \):
\( \langle\hat{\tilde r}_j,\hat{\tilde r}_k\rangle_{H^2(\Pi^+)}\approx\frac{1}{2\pi}\int_{-\infty}^{\infty}K_G\bigl(\tfrac12+it,\tfrac12+it\bigr)\,(jk)^{-1/2}(j/k)^{-it}\,dt, \)
where the integrand is the Gram kernel evaluated on the critical line, weighted by the arithmetic factor \( (j/k)^{-it}=(k/j)^{it} \). This integral equals the inner product \( (G_M^{\mathrm{cts}})_{jk} \) to leading order.
Proof. 
Direct computation:
\( \Bigl\langle j^{-s}\frac{\zeta(s)}{s},\,k^{-s}\frac{\zeta(s)}{s}\Bigr\rangle_{H^2}=\frac{1}{2\pi}\int_{-\infty}^{\infty}j^{-(\frac12+it)}\frac{\zeta(\frac12+it)}{\frac12+it}\cdot k^{-(\frac12-it)}\frac{\overline{\zeta(\frac12+it)}}{\frac12-it}\,dt=\frac{(jk)^{-1/2}}{2\pi}\int_{-\infty}^{\infty}\frac{|\zeta(\frac12+it)|^2}{|\frac12+it|^2}\,(j/k)^{-it}\,dt. \)
This is a weighted integral of \( |\zeta(\frac12+it)|^2/|\frac12+it|^2 \) against the character \( (j/k)^{-it} \). By Parseval's theorem on the multiplicative group, this equals \( (G_M^{\mathrm{cts}})_{jk} \) as computed in Proposition 10.    □

Appendix A.4. Explicit Numerical Examples for Hardy-Space Bounds

We illustrate the lower bound from Corollary 6 using the first few zeros of ζ .
The first few nontrivial zeros of \( \zeta \) on the critical line have imaginary parts approximately \( \gamma_1\approx14.135 \), \( \gamma_2\approx21.022 \), \( \gamma_3\approx25.011 \), \( \gamma_4\approx30.425 \), \( \gamma_5\approx32.935 \).
Example A1 
(Lower bounds from the first five zeros). For the first zero \( \rho_1=\tfrac12+14.135i \): \( |\rho_1|^2=\tfrac14+14.135^2\approx200.05 \). The lower bound from Corollary 6:
\( d_M^2\ge\frac{1}{\pi(1+14.135)\cdot200.05}\approx\frac{1}{9512}\approx1.05\times10^{-4}. \)
This gives \( d_M\ge0.0103 \) from the first zero alone.
Summing over all five zeros:
\( d_M^2\ge\sum_{j=1}^{5}\frac{1}{\pi(1+|\gamma_j|)\bigl(\tfrac14+\gamma_j^2\bigr)}\approx1.05\times10^{-4}+3.27\times10^{-5}+1.96\times10^{-5}+1.09\times10^{-5}+8.65\times10^{-6}\approx1.77\times10^{-4}. \)
Hence \( d_M\ge0.0133 \) for all M. The true distance \( d=0 \) (if RH holds) is not contradicted by this positive lower bound on the finite approximations: the infinite sum over all zeros converges to \( d^2 \) via Burnol's formula.
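The per-zero terms can be recomputed directly from the quoted zero ordinates (a quick sketch; the ordinates are rounded values):

```python
import math

# approximate ordinates of the first five nontrivial zeros of zeta
gammas = [14.135, 21.022, 25.011, 30.425, 32.935]

# per-zero contribution 1 / (pi (1 + |gamma|) |rho|^2) with |rho|^2 = 1/4 + gamma^2
terms = [1.0 / (math.pi * (1.0 + g) * (0.25 + g * g)) for g in gammas]
total = sum(terms)
print([f"{t:.2e}" for t in terms])
print(f"d_M^2 >= {total:.3e},  d_M >= {math.sqrt(total):.4f}")
```

The contributions decay roughly like \( 1/\gamma^3 \), so the first zero dominates the five-term sum.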
Remark A1 
(Compatibility with Burnol's formula). The lower bounds from individual zeros are consistent with Burnol's formula \( d^2=1-\prod_\rho\bigl|1-\rho^{-2}\bigr| \) because the product formula can give \( d=0 \) even when each individual factor contributes a positive amount to \( 1-d^2 \). The partial sums \( d_M^2 \) are bounded below by the contribution of the first few zeros and converge to \( d^2 \) as \( M\to\infty \).

Appendix B. Supplementary: The Bernoulli Polynomial Perspective

The fractional-part function \( \{x\} \) is intimately related to the Bernoulli polynomials \( B_n(x) \), defined by the generating function \( \frac{te^{tx}}{e^t-1}=\sum_{n=0}^{\infty}B_n(x)\frac{t^n}{n!} \). The first few are \( B_0(x)=1 \), \( B_1(x)=x-\tfrac12 \), \( B_2(x)=x^2-x+\tfrac16 \).
Lemma A3 
(Fractional part and Bernoulli polynomials). On (0,1): \( \{x\}=B_1(x)+\tfrac12=x \) (trivially), and more generally the n-th power of \( \{x\} \) on (0,1) expands in Bernoulli polynomials via the standard identity
\( \{x\}^n=x^n=\frac{1}{n+1}\sum_{k=0}^{n}\binom{n+1}{k}B_k(x). \)
The Mellin transform of \( B_1(kx)=\{kx\}-\tfrac12 \) (the centered sawtooth) is
\( \int_0^1 x^{s-1}B_1(kx)\,dx=\int_0^1 x^{s-1}\{kx\}\,dx-\frac{1}{2s}=-\frac{k^{-s}\zeta(s)}{s}+\frac{\gcd(k,k)^2}{12k^2(s+2)}-\frac{1}{2s}+\cdots, \)
relating the Mellin transform to the values \( \zeta(s) \) and \( \zeta(s+2) \).
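For reference, the standard Bernoulli-polynomial expansion of \( x^n \), namely \( x^n=\frac{1}{n+1}\sum_{k=0}^{n}\binom{n+1}{k}B_k(x) \), can be verified in exact rational arithmetic (a standalone check, not code from the paper):

```python
from fractions import Fraction
from math import comb

def bernoulli_numbers(n):
    # B_0..B_n via the recurrence B_m = -(1/(m+1)) sum_{k<m} C(m+1,k) B_k
    B = [Fraction(1)]
    for m in range(1, n + 1):
        B.append(-sum(comb(m + 1, k) * B[k] for k in range(m)) / (m + 1))
    return B

def bernoulli_poly(n, x, B):
    # B_n(x) = sum_k C(n,k) B_k x^{n-k}
    return sum(comb(n, k) * B[k] * x ** (n - k) for k in range(n + 1))

B = bernoulli_numbers(8)
for n in range(1, 7):
    for x in (Fraction(1, 3), Fraction(2, 5), Fraction(7, 9)):
        rhs = Fraction(1, n + 1) * sum(
            comb(n + 1, k) * bernoulli_poly(k, x, B) for k in range(n + 1)
        )
        assert rhs == x ** n          # exact equality of rationals
print("Bernoulli expansion of x^n verified")
```

Using `Fraction` keeps every step exact, so the assertion tests the polynomial identity itself rather than a floating-point approximation.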
Remark A2 
(Bernoulli polynomials and arithmetic). The connection between Bernoulli polynomials and the Riemann zeta function is classical: \( \zeta(-n)=-B_{n+1}/(n+1) \) for \( n\ge1 \). This means the Mellin transforms of the sawtooth functions \( \tilde r_k \) encode the values of \( \zeta \) at negative integers through the Bernoulli polynomial expansion, connecting the approximation problem to the functional equation of \( \zeta \) and its special values. This arithmetic depth is one reason for the slow convergence of \( d_M \) to 0: the basis functions \( \tilde r_k \) carry arithmetic information through all orders of their Mellin transforms.

Notes

1. Analytic continuation is the process of extending a function defined on a smaller domain to a larger domain in a unique manner consistent with analyticity. The key tool for \( \zeta \) is the functional equation \( \xi(s)=\xi(1-s) \), where \( \xi(s)=\tfrac12 s(s-1)\pi^{-s/2}\Gamma(s/2)\zeta(s) \). This symmetry relates values in \( \mathrm{Re}(s)>1 \) to values in \( \mathrm{Re}(s)<0 \), allowing the extension to all of \( \mathbb C \).
2. A Hilbert space is a complete inner product space. The key example here is \( L^2(0,1) \): the space of (equivalence classes of) measurable functions \( f:(0,1)\to\mathbb R \) with \( \int_0^1|f(x)|^2\,dx<\infty \), equipped with \( \langle f,g\rangle=\int_0^1 f(x)g(x)\,dx \) and norm \( \|f\|=\langle f,f\rangle^{1/2} \). Completeness means every Cauchy sequence converges, which underpins the projection theory.
3. The Mellin transform of \( f\in L^2(0,1) \) is \( (\mathcal Mf)(s)=\int_0^1 x^{s-1}f(x)\,dx \), defined for \( \mathrm{Re}(s)>0 \). Via \( x=e^{-t} \), the Mellin transform is unitarily equivalent to the Fourier–Laplace transform, mapping \( L^2(0,1) \) isometrically to the Hardy space \( H^2(\Pi^+) \).
4. The Báez–Duarte result follows from the Nyman–Beurling theorem via a density argument: the dilates \( \{1/k:k\in\mathbb N\} \) suffice to approximate general \( \theta\in(0,1] \), and the map \( \theta\mapsto f_\theta \) is continuous in an appropriate sense.
5. A compact operator \( K:H\to H \) on a Hilbert space is one that maps bounded sets to relatively compact sets. Equivalently, K can be approximated in norm by finite-rank operators. By the spectral theorem for compact self-adjoint operators, the eigenvalues \( \lambda_j \) form a sequence converging to zero, with corresponding orthonormal eigenvectors. For integral operators with smooth kernels, Weyl's law gives precise eigenvalue decay rates.
6. Kalman filtration, introduced by Kalman and Bucy [13], is the optimal linear estimator for a linear dynamical system observed through noisy measurements. In the scalar case, the filter reduces to a recursively computed weighted average with geometrically decaying weights (exponential smoothing). The Kalman gain \( K_M \) determines the weight given to new observations versus the running estimate.
7. In compressed sensing [18], one seeks to represent a signal y as a sparse combination of elements from a large dictionary \( \{\phi_k\} \). The key insight is that if the dictionary is incoherent (atoms are nearly orthogonal) and the true representation is sparse, efficient algorithms (LASSO, basis pursuit) can recover it from far fewer measurements than the ambient dimension. The Nyman–Beurling problem is an infinite-dimensional analogue.
8. If H is self-adjoint, its spectrum is real by the spectral theorem for self-adjoint operators. Hence all \( \gamma_n\in\mathbb R \), giving \( \mathrm{Re}(\rho_n)=\tfrac12 \) for all zeros. This would prove RH. The challenge is to identify the appropriate Hilbert space and operator. Candidates include the Berry–Keating Hamiltonian \( H=xp+px \) on \( L^2(\mathbb R_+) \) [14] and Connes's adelic operator [15].

References

  1. E. C. Titchmarsh, The Theory of the Riemann Zeta-Function, 2nd ed. (revised by D. R. Heath-Brown), Oxford University Press, 1986.
  2. J. B. Conrey, The Riemann Hypothesis, Notices Amer. Math. Soc. 50(3) (2003), 341–353.
  3. A. M. Odlyzko, On the distribution of spacings between the zeros of the zeta function, Math. Comp. 48(177) (1987), 273–308.
  4. H. M. Edwards, Riemann’s Zeta Function, Academic Press, 1974.
  5. G. H. Hardy, On the zeros of Riemann’s zeta-function, Proc. London Math. Soc. (2) 13 (1914), 191–207.
  6. B. Nyman, On some groups and semigroups of translations, Ph.D. thesis, Uppsala University, 1950.
  7. A. Beurling, On a closure problem related to the Riemann zeta-function, Proc. Natl. Acad. Sci. USA 41 (1955), 312–314. [CrossRef]
  8. L. Báez-Duarte, A strengthening of the Nyman–Beurling criterion for the Riemann Hypothesis, Atti Accad. Naz. Lincei 14(1) (2003), 5–11.
  9. L. Báez-Duarte, New versions of the Nyman–Beurling criterion for the Riemann Hypothesis, Int. J. Math. Math. Sci. 31 (2002), 387–406. [CrossRef]
  10. J.-F. Burnol, A note on Nyman’s equivalent formulation of the Riemann Hypothesis, Contemp. Math. 287 (2001), 23–26.
  11. X.-J. Li, The positivity of a sequence of numbers and the Riemann Hypothesis, J. Number Theory 65 (1997), 325–333. [CrossRef]
  12. H. Iwaniec and E. Kowalski, Analytic Number Theory, Amer. Math. Soc., Providence, 2004.
  13. R. E. Kalman and R. S. Bucy, New results in linear filtering and prediction theory, J. Basic Eng. 83 (1961), 95–108. [CrossRef]
  14. M. V. Berry and J. P. Keating, The Riemann zeros and eigenvalue asymptotics, SIAM Rev. 41 (1999), 236–266. [CrossRef]
  15. A. Connes, Trace formula in noncommutative geometry and the zeros of the Riemann zeta function, Selecta Math. 5 (1999), 29–106. [CrossRef]
  16. G. Robin, Grandes valeurs de la fonction somme des diviseurs et hypothèse de Riemann, J. Math. Pures Appl. 63 (1984), 187–213.
  17. C. Delaunay, E. Fricain, E. Mosaki, and O. Robert, Zero-free regions for Dirichlet series (II), Constr. Approx. 44 (2016), 183–210.
  18. E. J. Candès, J. Romberg, and T. Tao, Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information, IEEE Trans. Inf. Theory 52(2) (2006), 489–509. [CrossRef]
  19. G. H. Golub and C. F. Van Loan, Matrix Computations, 4th ed., Johns Hopkins University Press, 2013.
  20. R. S. Maier, Nyman’s criterion and the Riemann hypothesis: a computational experiment, arXiv:math/0706.0718, 2007.
  21. W. Rudin, Real and Complex Analysis, 3rd ed., McGraw-Hill, 1987. [CrossRef]
  22. P. D. Lax, Functional Analysis, Wiley-Interscience, 2002.
  23. R. M. Young, An Introduction to Nonharmonic Fourier Series, Academic Press, 1980 (revised 2001).
  24. creelie, Báez–Duarte Kalman Filtered Approximation, GitHub repository, 2025. https://github.com/creelie/baez-duarte-kalman.
Figure 1. Eigenvalue spectrum of G ˜ 50 (blue dots) vs. the C / j 2 decay reference (red dashed). The rapid decay explains the Θ ( M 2 ) condition number and necessitates SVD truncation.
Table 1. Spectral properties of the empirical Gram matrix G ˜ M with N = 10 4 quadrature points.
M | λ_max | λ_min | κ(G̃_M) | Numerically stable?
5 | 4.21×10^{-2} | 3.12×10^{-4} | 1.35×10^{2} | Yes
10 | 4.18×10^{-2} | 7.94×10^{-5} | 5.27×10^{2} | Yes
20 | 4.12×10^{-2} | 2.01×10^{-5} | 2.05×10^{3} | Marginal
30 | 4.09×10^{-2} | 8.96×10^{-6} | 4.57×10^{3} | Marginal
50 | 4.04×10^{-2} | 3.31×10^{-6} | 1.22×10^{4} | No
100 | 3.97×10^{-2} | 8.24×10^{-7} | 4.82×10^{4} | No
200 | 3.91×10^{-2} | 2.06×10^{-7} | 1.90×10^{5} | No
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.