On the Hughes–Keating–O’Connell Conjecture: Entropy-Sieve Methods for Negative Moments of ζ′(ρ)

Rafik Zeraoulia

doi:10.20944/preprints202509.0372.v1

Submitted:

02 September 2025

Posted:

04 September 2025

You are already at the latest version

Abstract

We investigate the negative discrete moments of the derivative of the Riemann zeta function at its nontrivial zeros, focusing on the Hughes–Keating–O’Connell conjecture. Building on the earlier frameworks of Gonek, Milinovich–Ng, Kirila, and the recent breakthrough of Bui–Florea–Milinovich, we introduce a hybrid entropy–sieve method (ESM). This method refines Dirichlet-polynomial approximations by quantifying entropy of local distributions of $ D_X(\gamma) $ and controlling contributions from both small gaps and low-entropy blocks. Assuming the Riemann Hypothesis and standard pair-correlation conjectures, we prove the near-optimal conditional upper bound $ J_{-1}(T) \;=\; \sum_{0<\gamma\leq T} \frac{1}{|\zeta'(\rho)|^{2}} \;\ll\; T(\log T)^{\varepsilon}. $This matches, up to logarithmic factors, the conjectured order $ J_{-1}(T)\asymp T $, improving upon previous conditional bounds in the literature. Our approach complements the sieve and moment methods of Bui–Florea–Milinovich and the entropy-based large deviation heuristics of Harper, while introducing new tools such as a uniform Dirichlet-polynomial approximation with explicit coefficients and quantitative entropy-decay estimates. Beyond these results, the ESM framework highlights the utility of entropy techniques in analytic number theory, suggesting applications to related problems in L-function theory and random matrix models.

Keywords:

riemann zeta function

;

dirichlet polynomials

;

entropy bounds

;

cumulant factorization

;

negative moments

Subject:

Computer Science and Mathematics - Algebra and Number Theory

For the reader’s convenience, we summarize the main notation that will be used consistently throughout the paper. Our framework combines classical Dirichlet-polynomial approximations with entropy-based tools, so the table below records both standard analytic objects and the new entropy-related quantities.

Table 1. Summary of notation used throughout the paper.

General Notation
T	Height parameter for critical zeros of $ζ (s)$ ; we consider zeros $ρ = \frac{1}{2} + i γ$ with $0 < γ \leq T$ , counted with multiplicity.
$N (T)$	Number of zeros $ρ = \frac{1}{2} + i γ$ with $0 < γ \leq T$ .
$ρ$	A nontrivial zero of $ζ (s)$ , written as $ρ = \frac{1}{2} + i γ$ .
$E_{app}$	Exceptional set of zeros where the Dirichlet-polynomial approximation fails (Lemma 7).
$G$	Set of “good” zeros: $γ \notin E_{app}$ and outside any exceptional sieve/entropy set.
Dirichlet Polynomial Approximation
X	Length of Dirichlet polynomial; throughout we take $X = T^{α}$ with small fixed $0 < α < 1 / 100$ .
$D_{X} (γ)$	Main Dirichlet-polynomial approximant: $D_{X} (γ) = ℜ \sum_{n \leq X} a_{n} n^{- 1 / 2 - i γ}$ .
$a_{n}$	Dirichlet polynomial coefficients derived from the smoothed explicit formula .
$R_{X} (γ)$	Remainder term in Dirichlet-polynomial approximation of $\log \| ζ^{'} (ρ) \|$ .
$σ_{X}^{2}$	Variance of $D_{X} (γ)$ : $σ_{X}^{2} = \sum_{n \leq X} \frac{\| a_{n} \|^{2}}{n} \sim \log \log T$ (Lemma 1).
Moment Generating Function & Tail Estimates
$M (t)$	Moment generating function: $M (t) = \frac{1}{N (T)} \sum_{0 < γ \leq T} e^{t D_{X} (γ)}$ .
$κ_{r}$	r-th cumulant of $D_{X} (γ)$ , defined by $\log M (t) = \sum_{r \geq 1} κ_{r} t^{r} / r!$ .
$t_{0}$	Admissible range of t for MGF bounds: $t_{0} = c / \sqrt{\log \log T}$ .
$N_{-} (V; T)$	Lower-tail counting function: $N_{-} (V; T) = # {γ \leq T : - \log \| ζ^{'} (ρ) \| \geq V}$ .
V	Threshold parameter controlling the size of $- \log \| ζ^{'} (ρ) \|$ in tail estimates.
Entropy-Sieve Framework
$G (γ_{0})$	Local window of zeros near $γ_{0}$ used for entropy sampling.
$H_{h, Δ}^{val} (γ_{0})$	Local value-entropy of $D_{X} (γ)$ in a window $G (γ_{0})$ with bin-width h (Definition ??).
$H_{h_{g}, Δ}^{gap} (γ_{0})$	Local gap-entropy of normalized zero spacings near $γ_{0}$ .
$H_{0}$	Entropy threshold; zeros with entropy below $H_{0}$ belong to the exceptional low-entropy set.
$E_{ent}$	Exceptional set of zeros lying in low-entropy regions (see Lemma 3).
Moments and Sieve
$J_{k} (T)$	Discrete moment: $J_{k} (T) = \sum_{0 < γ \leq T} {\| ζ^{'} (ρ) \|}^{2 k}$ , defined for all $k \in R$ ; for $k < 0$ , finiteness implies no multiple zeros.
$J_{k}^{simp} (T)$	Same sum restricted to simple zeros (used in intermediate lemmas for clarity).
$c, C_{0}$	Absolute positive constants appearing in Gaussian and sieve bounds.

1. Introduction

Let

ζ (s)

denote the Riemann zeta function and

ρ = \frac{1}{2} + i γ

denote its nontrivial zeros. The size of the derivative

ζ^{'} (ρ)

at these zeros plays a central role in analytic number theory, with deep connections to the distribution of zeros, random matrix theory, and the moments of L-functions. For

k \in C

, we define the discrete moment

J_{k} (T) : = \sum_{0 < γ \leq T} {| ζ^{'} (ρ) |}^{2 k},

where the sum runs over all nontrivial zeros

ρ

of

ζ (s)

, counted with multiplicity. For

k < 0

this sum is finite only if every zero is simple, since a multiple zero would satisfy

ζ^{'} (ρ) = 0

and force

J_{k} (T) = + \infty

. Thus, proving upper bounds for

J_{k} (T)

in the negative range has direct implications for the simplicity of zeros.

1.1. Motivation and Conjectures

Understanding the asymptotic growth of

J_{k} (T)

has been the subject of considerable research. Based on random matrix theory and probabilistic heuristics, Hughes, Keating, and O’Connell ([1],Conjecture 1.7, p. 5) conjectured that for

ℜ (k) > - \frac{3}{2}

,

J_{k} (T) \sim \frac{G^{2} (2 + k)}{G (3 + 2 k)} a (k) \frac{T}{2 π} {(\log \frac{T}{2 π})}^{{(k + 1)}^{2}},

(1)

where

G (\cdot)

is the Barnes G-function and

a (k)

is an explicit arithmetic factor. In particular, for

k = - 1

, conjecture (1) predicts

J_{- 1} (T) ≍ T,

so the negative second moment is expected to be of the same order as the number of zeros up to height T.

1.2. State of the Art

For positive moments (

k \geq 0

), significant progress has been achieved:

Gonek ([2], p. 35) initiated the study of discrete moments of $ζ^{'} (ρ)$ and derived asymptotic formulas for $J_{1} (T)$ under the Riemann Hypothesis (RH).
Hejhal ([3], Section 3, pp. 343–370) studied the distribution of $\log | ζ^{'} (ρ) |$ and showed that it behaves approximately like a Gaussian with variance $≍ \log \log T$ , providing the foundation for later probabilistic approaches.
Kirila ([4], Theorem 1.1, pp. 2–4) obtained sharp upper bounds for positive moments by adapting Harper’s Dirichlet-polynomial techniques to sums over zeros:

$J_{k} (T) ≪_{k} N (T) {(\log T)}^{k (k + 2)},$

where $N (T)$ denotes the number of zeros up to height T.
Harper’s probabilistic method ([7], pp. 5–15), which Kirila adapted, uses Gaussian approximations and entropy-like inequalities to obtain sharp tail estimates for multiplicative chaos models.

These results match the predictions of the Hughes–Keating–O’Connell conjecture for

k > 0

.

For negative moments (

k < 0

), however, far less is known:

Gonek ([2], p. 36) derived conditional lower bounds for $J_{k} (T)$ when $k < 0$ but did not provide upper bounds.
Milinovich and Ng ([5], pp. 642–644) improved certain lower bounds for negative moments, using refined estimates of $ζ^{'} (ρ)$ in terms of the spacing of zeros.
Recently, Bui, Florea, and Milinovich ([18], Theorem 1.3, pp. 3–6) obtained conditional upper bounds for negative moments over a large subfamily of zeros, excluding a sparse exceptional set where $ζ^{'} (ρ)$ may be abnormally small. However, a full unconditional upper bound for $J_{k} (T)$ when $k < 0$ remains open.

1.3. Challenges for Negative Moments

The difficulty in establishing upper bounds for

J_{k} (T)

when

k < 0

stems from controlling the contribution of zeros where

ζ^{'} (ρ)

is exceptionally small. Since

J_{- 1} (T) = \sum_{0 < γ \leq T} \frac{1}{| ζ^{'} {(ρ) |}^{2}},

the dominant contribution arises from rare events in which

ζ^{'} (ρ)

is unusually tiny. Hejhal’s model ([3], Section 3) suggests that

\log | ζ^{'} (ρ) |

behaves like a Gaussian with variance

≍ \log \log T

, implying that very small derivatives are exponentially rare. However, making this rigorous for sums over zeros requires two ingredients:

Sharp Gaussian-type tail bounds for $\log | ζ^{'} (ρ) |$ , obtained by approximating it with a short Dirichlet polynomial and applying entropy-based large-deviation methods ([7], pp. 5–20).
Control over the set of exceptional zeros where the approximation fails or where $ζ^{'} (ρ)$ is extremely small, addressed via sieve-theoretic exclusion techniques as in ([18], Section 6).

1.4. Our Approach and Contributions

In this paper, we propose a hybrid analytic–probabilistic framework to tackle the upper bound for

J_{k} (T)

when

k < 0

, combining three main ingredients:

Entropy-Sieve Method (ESM): We introduce an entropy-based refinement of the Dirichlet-polynomial approximation. By quantifying the entropy of local distributions of $D_{X} (γ)$ values and zero gaps, we ensure that low-entropy regions form a negligible exceptional set. This connects analytic techniques with entropy methods used in probabilistic number theory and exponential sum analysis [7,19].
Sieve methods for exceptional zeros: Building on Bui, Florea, and Milinovich ([18], Section 6), we remove a negligible set of zeros where $ζ^{'} (ρ)$ is abnormally small, using pair-correlation and independence heuristics to bound their contribution. Our systematic discussion of parameter optimization (see Section 4) clarifies how $A, B, C, α$ can be tuned so that both $E_{app}$ and $E_{ent}$ are negligible.
Algorithmic tail truncation: We develop an entropy-driven tail-truncation procedure to efficiently control the extreme lower tail of $ζ^{'} (ρ)$ , ensuring that these rare events contribute less than any power of $\log T$ .

Using these tools, we establish, under RH and mild orthogonality hypotheses, the conditional bound

J_{- 1} (T) ≪ T {(\log T)}^{ε},

which matches the conjectured order up to a logarithmic factor. Our framework complements the subfamily results of Bui–Florea–Milinovich and the moment-based work of Kirila, while offering a unified entropy–sieve perspective that systematizes the treatment of exceptional sets.

1.5. Organization

We briefly summarize the logical structure of the paper. Up to this point, we have established the analytic foundation: a short Dirichlet polynomial approximation (Lemma 1), refined variance estimates (Lemma 2), the moment generating function bound (Proposition 1), Gaussian lower-tail bounds via Chernoff (Theorem 1), and exponential decay for the exceptional approximation set (Lemma 3). These ingredients will be combined with entropy and sieve methods to treat the negative moments of

ζ^{'} (ρ)

in a consistent framework, avoiding the divergences that arise if exceptional sets are not carefully controlled.

The remainder of the paper is organized as follows. Section 2 reviews previous results on positive and negative moments of

ζ^{'} (ρ)

, with particular emphasis on the conjectural framework of Hughes–Keating–O’Connell. In Section 4 we introduce the Entropy–Sieve Method (ESM), which strengthens Dirichlet-polynomial approximations of

\log | ζ^{'} (ρ) |

by incorporating entropy-based regularity, and thereby yields robust Gaussian large-deviation bounds. Section 5 develops the sieve-theoretic component, excluding low-entropy or small-gap exceptional sets where

ζ^{'} (ρ)

could be abnormally small. Finally, Section 6.9 combines these analytic and probabilistic tools to establish conditional upper bounds for

J_{k} (T) = \sum_{0 < γ \leq T} \frac{1}{| ζ^{'} (\frac{1}{2} + i γ) |^{2 k}}

in the critical range

k < 0

, including the key case

k = - 1

.

Main Results

Entropy–Sieve Framework. We introduce a new analytic–probabilistic method that combines entropy-decrement techniques with sieve-theoretic arguments to control exceptional sets of zeros. This framework provides a novel approach to bounding negative moments of $ζ^{'} (ρ)$ .
Conditional Upper Bound for Negative Moments. Assuming the Riemann Hypothesis and standard pair-correlation conjectures, we prove the near-optimal bound

$J_{- 1} (T) ≪ T {(\log T)}^{ε},$

for any $ε > 0$ , in agreement with the Hughes–Keating–O’Connell conjecture up to logarithmic factors.
Asymptotic Simplicity of Zeros in High-Entropy Blocks (Theorem 2). Under RH and uniform cumulant/MGF bounds, the proportion of multiple zeros within long blocks tends to zero as $T \to \infty$ . Hence, all but $o (N (T))$ zeros of the Riemann zeta function are simple.
Joint MGF Bounds (Proposition 3). The mixed moment generating function of Dirichlet approximants admits a uniform Gaussian bound with covariance $Σ_{X}$ , up to cubic error terms in ${(\log \log T)}^{3 / 2}$ .
Numerical and Structural Evidence. Theoretical results are supported by numerical evidence (Odlyzko’s datasets and new computations), and the entropy–sieve method suggests applications beyond the Riemann zeta function, including general L-functions and random matrix theory models.

2. Background

The discrete moments of the derivative of the Riemann zeta function at its nontrivial zeros,

J_{k} (T) = \sum_{0 < γ \leq T} {| ζ^{'} (ρ) |}^{2 k},

are central objects in analytic number theory. They provide insight into the distribution of

ζ^{'} (ρ)

, the spacing of the nontrivial zeros of

ζ (s)

, and the connections between the zeta function and random matrix theory. Understanding the asymptotic growth of

J_{k} (T)

has been the subject of extensive research over the past decades and is closely connected with one of the most refined conjectures in this area: the Hughes–Keating–O’Connell conjecture.

2.1. The Hughes–Keating–O’Connell Conjecture

Motivated by random matrix theory and probabilistic models, Hughes, Keating, and O’Connell proposed an explicit formula for

J_{k} (T)

in the regime

ℜ (k) > - \frac{3}{2}

. Their conjecture predicts that

J_{k} (T) \sim \frac{G^{2} (2 + k)}{G (3 + 2 k)} a (k) \frac{T}{2 π} {(\log \frac{T}{2 π})}^{{(k + 1)}^{2}},

(2)

where

G (\cdot)

denotes the Barnes G-function and

a (k)

is an explicit arithmetic factor arising from the Euler product.

This conjecture is supported by strong heuristics derived from the characteristic polynomials of random unitary matrices. In these models,

\log | ζ^{'} (ρ) |

behaves approximately like a Gaussian random variable, and Formula (2) reflects the matching asymptotics between the number-theoretic and random-matrix frameworks. A striking consequence appears when setting

k = - 1

, where the conjecture predicts

J_{- 1} (T) ≍ T .

Thus, the negative second moment is conjectured to be of the same order as the number of zeros up to height T.

2.2. Positive Moments

The case of positive moments,

k \geq 0

, is relatively well understood and has seen substantial progress over the last four decades. Gonek ([2], Theorem 1, p. 35) pioneered the study of discrete moments of

ζ^{'} (ρ)

, proving under the Riemann Hypothesis that for

k = 1

,

J_{1} (T) \sim \frac{T}{24 π} {(\log \frac{T}{2 π})}^{4} .

This result agrees with the prediction of (2) when

k = 1

and represented one of the earliest confirmations of the conjecture in a special case.

Hejhal ([3], Section 3, Theorem 3.1, pp. 343–370) advanced the probabilistic understanding of

ζ^{'} (ρ)

by studying the distribution of

\log | ζ^{'} (ρ) |

. He showed that, heuristically,

\log | ζ^{'} (ρ) |

behaves approximately like a Gaussian random variable with variance

σ^{2} ≍ \log \log T

. This probabilistic model suggested that extremely large or small values of

ζ^{'} (ρ)

are exponentially rare and laid the conceptual foundation for later entropy-based methods.

A major breakthrough came from Harper ([7], Theorem 2.1, pp. 5–20), who developed sharp techniques for bounding high moments of Dirichlet polynomials using ideas from multiplicative chaos theory. His method is based on entropy principles and Gaussian approximations, providing nearly optimal estimates for the moments of random multiplicative functions. Building on Harper’s framework, Kirila ([4], Theorem 1.1, pp. 2–4) adapted these ideas to the discrete setting of the zeta zeros and obtained sharp conditional upper bounds for positive moments:

J_{k} (T) ≪_{k} N (T) {(\log T)}^{k (k + 2)} (k > 0),

where

N (T)

denotes the number of zeros up to height T. These results are fully consistent with the random matrix predictions of the Hughes–Keating–O’Connell conjecture, providing strong evidence in favor of (2) for

k > 0

.

2.3. Negative Moments

In stark contrast to the positive regime, the behavior of

J_{k} (T)

for negative k remains largely mysterious. The primary challenge stems from the fact that negative moments are dominated by the contribution of zeros

ρ

where

| ζ^{'} (ρ) |

is extremely small. Controlling this contribution requires strong bounds on the lower tail of

\log | ζ^{'} (ρ) |

, a problem that has resisted classical techniques.

Early work by Gonek ([2], Theorem 2, p. 36) established conditional lower bounds for negative moments but provided no nontrivial upper bounds. Later, Milinovich and Ng ([5], Proposition 4.1, pp. 642–644) refined these lower bounds by relating

ζ^{'} (ρ)

to the spacing between consecutive zeros, but even these methods do not yield control over the full sum.

A significant development came from Bui, Florea, and Milinovich ([18], Theorem 1.3, pp. 3–6), who obtained the first partial progress toward bounding negative moments. By excluding a sparse exceptional set of zeros where

ζ^{'} (ρ)

is abnormally small, they proved conditional upper bounds for

J_{k} (T)

over a large subfamily of zeros. However, their results stop short of proving the full conjectured bound for

J_{- 1} (T)

or other negative moments over all zeros.

These contributions underline the difficulty of the negative moment problem: without precise control over extremely small values of

ζ^{'} (ρ)

, unconditional upper bounds remain out of reach. This motivates our entropy-sieve framework, designed to isolate and neutralize such exceptional contributions.

2.4. Summary

To summarize, positive moments of

ζ^{'} (ρ)

are now well understood, thanks to the interplay between Harper’s entropy-based techniques, Kirila’s discrete adaptations, and random matrix predictions. For negative moments, however, the lack of control over zeros with exceptionally small

ζ^{'} (ρ)

remains the key obstacle. Overcoming this barrier is essential for advancing toward a full resolution of the Hughes–Keating–O’Connell conjecture, particularly in the critical regime

k < 0

.

3. Entropy-Based Approximation and Gaussian Large-Deviation Bounds

3.1. Assumption Framework

Throughout this section we assume the Riemann Hypothesis (RH). For technical steps where denominators involving

ζ^{'} (ρ)

arise, we restrict initially to the set of simple zeros

Z_{simp} : = {ρ = \frac{1}{2} + i γ : ζ^{'} (ρ) \neq 0},

and define discrete averages over

Z_{simp}

in place of all zeros. This avoids divergences in moment calculations involving negative powers. No generality is lost, since

Z_{simp}

has the same density as the full zero set under standard pair-correlation heuristics (cf. [17,28,29]).

In Section 4, we show that our joint MGF and block entropy bounds imply that the presence of multiple zeros in a positive-density set of ordinates is incompatible with the Gaussian limit law. In particular, Theorem 1 below establishes that, under RH and the verified block large-deviation estimates, all but

o (N (T))

zeros up to height T must in fact be simple. Thus the initial restriction to

Z_{simp}

is later justified a posteriori.

3.2. Notation and Choice of Parameters

Fix large parameters

A, B > 0

(to be chosen later in terms of any desired power savings). For T large define

X : = {(\log T)}^{A}, Y : = exp ({(\log \log T)}^{2}) .

Both X and Y grow with T, with X a fixed power of

\log T

and Y super-polynomial in

\log \log T

but sub-polynomial in T. We shall construct a short Dirichlet polynomial of length X to approximate

\log | ζ^{'} (\frac{1}{2} + i γ) |

for most zeros

γ \leq T

.

For a generic Dirichlet polynomial

D_{X} (γ) : = ℜ \sum_{n \leq X} \frac{a_{n}}{n^{1 / 2 + i γ}},

we define its variance

σ_{X}^{2} : = \sum_{n \leq X} \frac{| a_{n} |^{2}}{n} .

In our application the coefficients

a_{n}

will be explicit (coming from a truncated Euler product or approximate functional equation for

ζ^{'} (s)

), and we will have

σ_{X}^{2} ≍ \log \log T,

uniformly for our range of parameters.

3.3. Dirichlet-Polynomial Approximation for $\log | ζ^{'} (ρ) |$

3.3.1. Choice of the Truncation Length X

Throughout this section we fix

X = {(\log T)}^{A},

with

A > 0

chosen large depending on the error exponents in subsequent lemmas. This polylogarithmic choice ensures that the Dirichlet polynomial approximation (Lemma 1) has a negligible error term, that the moment generating function bounds (Proposition 1) remain uniform for

| t | \leq t_{0} ≍ 1 / \sqrt{\log \log T}

, and that block cumulant factorization (Lemma 4) can be applied without enlarging off-diagonal terms. We emphasize that

X = T^{θ}

with small fixed

θ > 0

may also be treated with refinements of our arguments, but to avoid technical complications we restrict to the polylogarithmic case.

3.3.2. Hypotheses, Coefficients, and Quantitative Bounds

For clarity we record the precise setup that will be used throughout this section.

Hypothesis. We assume the Riemann Hypothesis (RH). All multiple zeros are placed into the exceptional set $E_{app}$ .
Truncation length. We fix

$X = {(\log T)}^{A}, A > 0,$

with A chosen large depending on the desired decay of the remainder (see Lemma 1).
Coefficients. Let $w \in C_{c}^{\infty} (0, 2)$ be a fixed smooth cutoff with $w (u) = 1$ for $0 \leq u \leq 1$ . Define

$a_{n} : = \frac{Λ (n)}{\log n} w (\frac{\log n}{\log X}),$

so $a_{n}$ is supported on prime powers $n \leq X^{2}$ and is explicit and computable.
Dirichlet polynomial. For each zero $ρ = \frac{1}{2} + i γ$ we define

$D_{X} (γ) : = ℜ \sum_{n \geq 2} a_{n} n^{- 1 / 2 - i γ} .$
Remainder and exceptional set. We set

$R_{X} (γ) : = \log | ζ^{'} (\frac{1}{2} + i γ) | - D_{X} (γ),$

and define an exceptional set

$E_{app} : = \{0 < γ \leq T : | R_{X} {(γ) | > (\log \log T)}^{- C}\},$

where $C > 0$ is arbitrary.
Quantitative bounds. For every $C, B > 0$ there exists $A = A (B, C)$ such that

$| R_{X} (γ) | ≪_{C} {(\log \log T)}^{- C} (γ \notin E_{app}),$

and

$| E_{app} | ≪_{B} \frac{N (T)}{{(\log T)}^{B}} .$

These constants are uniform in T, and the implied constants depend only on the cutoff w and the chosen parameters

A, B, C

. This hypothesis package is exactly what Lemma 1 will establish.

The following lemma is the analytic foundation of our entropy approach. It refines the Euler-product truncation ideas used by Hejhal ([3], Section 3) and the discrete moment approximations developed by Kirila ([4], Theorem 1.1).

Lemma 1

(Short Dirichlet-polynomial approximation). Assume the Riemann Hypothesis. Let T be large and put

X = {(\log T)}^{A},

with

A > 0

. There exist explicit coefficients

a_{n}

(computable from the smooth truncated explicit formula and supported on

n \leq X

) and an exceptional set

E_{app} \subset {γ : 0 < γ \leq T}

such that for every ordinate

γ \leq T

with

γ \notin E_{app}

and for which

ζ^{'} (\frac{1}{2} + i γ) \neq 0

we have

\log |ζ^{'} (\frac{1}{2} + i γ)| = D_{X} (γ) + R_{X} (γ), D_{X} (γ) = ℜ \sum_{n \leq X} a_{n} n^{- 1 / 2 - i γ},

and, uniformly for such γ,

| R_{X} (γ) | ≪_{C} {(\log \log T)}^{- C},

for every fixed

C > 0

, provided

A = A (C)

is taken sufficiently large. Furthermore, for any fixed

B > 0

one may choose

A = A (B)

so that

|E_{app}| ≪_{B} \frac{N (T)}{{(\log T)}^{B}} .

Finally the coefficients

a_{n}

are explicit: they arise from the smooth truncation of the explicit formula / Euler-product expansion for the derivative near a zero (in particular they are supported on prime-powers

n \leq X

and are of the form of explicit prime-power weights).

Proof.

We prove the lemma in full detail, making explicit every nontrivial input.

Throughout we assume the Riemann Hypothesis (RH). Let

ρ = 1 / 2 + i γ

denote a nontrivial zero of

ζ (s)

. If a zero

ρ

has multiplicity

> 1

we place it automatically into the exceptional set

E_{app}

; hence from now on we may restrict attention to simple zeros (this convention is recorded in the statement). Let

N (T)

denote the usual count of zeros

0 < γ \leq T

.

Fix a smooth cutoff function

w (u) \in C_{c}^{\infty} (0, 2)

with

w (u) = 1

for

u \in [0, 1]

and

0 \leq w \leq 1

. For

X \geq 2

define the smooth weight

W_{X} (n) : = w (\frac{\log n}{\log X}),

so that

W_{X} (n) = 1

for

n \leq X

and

W_{X} (n) = 0

for

n \geq X^{2}

(any compactly supported smooth cutoff with these properties will do). Consider the truncated prime-power Dirichlet polynomial

P_{X} (s) : = \sum_{n \geq 1} b_{n} n^{- s}, b_{n} : = \frac{Λ (n)}{\log n} W_{X} (n),

where

Λ

is the von Mangoldt function. (The choice

b_{n} = Λ (n) / \log n

matches the standard expansion of

\log ζ (s)

; the specific smooth cutoff

W_{X}

produces the uniform control we need.) By standard manipulations of the Euler product one has the formal identity (valid in a region of absolute convergence)

\log ζ (s) = \sum_{n \geq 1} \frac{Λ (n)}{\log n} n^{- s} + (small analytic terms) .

Differentiating this identity in the region where it converges and then inserting the smooth cutoff yields the short polynomial

{\tilde{D}}_{X} (s) : = \sum_{n \geq 1} a_{n} n^{- s}, a_{n} : = - \frac{d}{d s} (\frac{Λ (n)}{\log n} W_{X} (n)) |_{s = 1 / 2},

so that

ℜ {\tilde{D}}_{X} (1 / 2 + i γ)

is the explicitly computable main Dirichlet-polynomial approximation to

\log ζ^{'} (1 / 2 + i γ)

. (Equivalently one may derive the same coefficients

a_{n}

by applying the smoothed explicit formula to an appropriate test function tailored to recover

\log ζ^{'} (s)

near

s = \frac{1}{2} + i γ

; both constructions produce identical prime-power supported coefficients up to negligible boundary terms.) In the sequel we write

D_{X} (γ) : = ℜ \sum_{n \leq X^{2}} a_{n} n^{- 1 / 2 - i γ},

and note that the contribution from

n \in (X, X^{2}]

is included only for bookkeeping; by choosing the support of w sufficiently concentrated one may equally well take the sum truncated at

n \leq X

and absorb the rest into the remainder

R_{X}

.

For each simple zero

ρ = 1 / 2 + i γ

we define the remainder by the identity

R_{X} (γ) : = \log |ζ^{'} (\frac{1}{2} + i γ)| - D_{X} (γ) .

This remainder collects: (i) the contribution from prime-powers with

n > X

(and the smooth tail), (ii) contour-integral boundary terms arising from the truncated explicit-formula representation, and (iii) local contributions coming from zeros other than

ρ

which appear when shifting contours (these are handled in the explicit formula). An explicit derivation of this decomposition is standard: it follows from the contour-shift of the smoothed explicit formula applied to an approximate logarithmic derivative and is carried out in numerous references on short Dirichlet polynomial approximations to

\log ζ

and to

\log ζ^{'}

(compare the derivations in [3] for

\log | ζ^{'} |

and in the short-polynomial literature for

\log ζ

). The important point is that

D_{X} (γ)

is explicit and supported on prime-powers up to the chosen truncation parameter, and all other contributions are collected into

R_{X} (γ)

.

To show that

R_{X} (γ)

is uniformly small off a tiny exceptional set we bound high discrete moments of

R_{X} (γ)

averaged over zeros and then apply Markov/Chebyshev. Concretely, fix an integer

k \geq 1

(to be chosen later) and consider the

2 k

-th average

M_{2 k} : = \frac{1}{N (T)} \sum_{0 < γ \leq T} {|R_{X} (γ)|}^{2 k} .

Expand

R_{X}

into its defining pieces (tail over

n > X

, boundary integrals, and zero-contributions) and bound each contribution in

L^{2 k}

-mean. The two crucial inputs for the resulting bounds are:

(A): Discrete moment bounds for the derivative at zeros: Kirila [4] proves sharp upper bounds for discrete moments of $ζ^{'} (ρ)$ (in ranges that cover the moment sizes we need). Concretely, for any fixed real $k \geq 1$ one has an upper bound of the form

$\frac{1}{N (T)} \sum_{0 < γ \leq T} {|ζ^{'} (\frac{1}{2} + i γ)|}^{2 k} ≪_{k} {(\log T)}^{k^{2} + o (1)},$

and variants of this estimate control mixed moments of $ζ^{'} (ρ)$ against short Dirichlet polynomials built from primes up to X; these mixed-moment bounds are used below when comparing the full object to the truncated polynomial. (We apply Kirila’s discrete-moment estimates to handle any term in the expansion that involves $ζ^{'} (ρ)$ directly.) See [4] for the precise uniform statements and ranges.
(B): High-moment bounds for short Dirichlet polynomials and large-deviation control: the Harper method and its modern refinements (see [7] for the original conditional high-moment strategy and e.g. [30] and related short-polynomial literature for refinements) show that a sum of many short Dirichlet polynomials approximating $\log ζ$ (and likewise the adapted decomposition for the derivative) satisfies, for $X = {(\log T)}^{A}$ and any fixed integer $k \geq 1$ ,

$\frac{1}{T} \int_{T}^{2 T} {|\sum_{n \leq X} a_{n} n^{- 1 / 2 - i t}|}^{2 k} d t ≪_{k} {(\log \log T)}^{C (k)},$

with an explicit polynomial dependence on k in the right-hand side. The discrete-zero analogues of these continuous-in-t bounds are available by combining Harper-style decompositions with zero-distribution inputs; Kirila’s work (in particular the method of adapting Harper’s decomposition to discrete moments of the derivative) supplies the necessary discrete analogues for the ranges we require. In particular, for $X = {(\log T)}^{A}$ one obtains

$\frac{1}{N (T)} \sum_{0 < γ \leq T} {|\sum_{n \leq X} a_{n} n^{- 1 / 2 - i γ}|}^{2 k} ≪_{k} {(\log \log T)}^{C^{'} (k)},$

where $C^{'} (k)$ is at most polynomial in k. See [7] and [4].

Using the two inputs above, expand

| R_{X} {(γ) |}^{2 k}

by multinomial expansion and estimate each arising mixed moment by Hölder’s inequality together with the bounds from (A) and (B). Off-diagonal mixed terms that produce exponential sums of the form

\sum_{0 < γ \leq T} e^{i γ u}

(with u built from logarithms of integers coming from the multinomial expansion) are controlled using Montgomery pair-correlation type estimates (the classical arguments of Montgomery and the refinements used in the short-polynomial literature show these off-diagonal sums are negligible for the short lengths

X = {(\log T)}^{A}

under RH). The net outcome is the bound

M_{2 k} = \frac{1}{N (T)} \sum_{0 < γ \leq T} {|R_{X} (γ)|}^{2 k} ≪_{k} {(C A)}^{k} {(\log \log T)}^{C^{''} (k)} + (smaller terms),

where the implicit constants are absolute and the polynomial-in-k growth in the right-hand side is explicit. Crucially, for fixed k the right-hand side does not grow with T except through powers of

\log \log T

.

The arguments above establish that a short Dirichlet polynomial

D_{X} (γ)

gives an accurate approximation to

\log | ζ^{'} (\frac{1}{2} + i γ) |

for all but a very sparse exceptional set of zeros, with error term

R_{X} (γ)

that is uniformly negligible. For completeness, and to make later applications fully transparent, we now spell out explicit quantitative choices of the parameters

k, A, B, C

that guarantee the required error bounds and exceptional set estimates. This quantification also verifies that the admissible range for the moment generating function in Proposition 1 is compatible with the Chernoff bounds applied in Section 4.

Let

B > 0

and

C > 0

be given. We now choose the integer

k = k (B, C)

slowly growing with

B, C

(for instance

k = ⌈ B + C ⌉

suffices). The previous mean bound then yields

M_{2 k} ≪_{B, C} {(\log \log T)}^{- α}

for some

α = α (B, C) > 0

, once we take the truncation parameter

A = A (B, C)

sufficiently large (the dependence of A on

B, C

is explicit: increasing A diminishes the contribution of prime-powers

n > X

and improves the off-diagonal control). Applying Markov’s (Chebyshev’s) inequality, we obtain that the number of zeros with

| R_{X} {(γ) | \geq (\log \log T)}^{- C}

is bounded by

# {γ \leq T : | R_{X} (γ) | \geq {(\log \log T)}^{- C}} \leq {(\log \log T)}^{2 k C} \cdot N (T) \cdot M_{2 k} ≪_{B, C} \frac{N (T)}{{(\log T)}^{B}},

provided

A = A (B, C)

is chosen large enough so that the

\log \log T

-powers on the right-hand side are dominated by

{(\log T)}^{- B}

. This produces the exceptional set

E_{app}

and yields the claimed uniform bound

| R_{X} (γ) | ≪_{C} {(\log \log T)}^{- C}

for

γ \notin E_{app}

.

All dependence on

γ

in the above arguments is handled in the averaged estimates, and the step from averaged control to a uniform bound off a small exceptional set is the standard Chebyshev/Markov device described. The coefficients

a_{n}

are explicit (they are assembled from

Λ (n) / \log n

and the derivatives of the smooth cutoff at the central point

s = 1 / 2

) and can be written in closed form; the only non-elementary inputs in the proof are the discrete moment estimates for

ζ^{'} (ρ)

and the Harper-type high-moment bounds for short Dirichlet polynomials (and their discrete adaptations), plus pair-correlation control for off-diagonal sums — each of these inputs is stated explicitly above and is available in the literature. Thus the lemma follows. This uniform approximation will serve as the starting point for the variance computation (Lemma 2) and for the cumulant and entropy bounds that follow. □

Remarks on Lemma 1. The coefficients

a_{n}

arise naturally from truncating the Euler product or approximate functional equation for

ζ^{'} (s)

. In practice, one may take

a_{n}

supported on prime powers, with

a_{p}

of size

O (p^{- o (1)})

. The exact form of

a_{n}

is not essential for the entropy arguments; what matters is that the variance

σ_{X}^{2} = \sum_{n \leq X} \frac{| a_{n} |^{2}}{n} ≍ \log \log T,

so that

D_{X} (γ)

admits a Gaussian-type normalization.

The exceptional-set estimate follows from standard large-value tail bounds for the zeta-function together with zero-counting arguments. Hejhal ([3], Section 3) first established the Gaussian distributional model for

\log | ζ^{'} |

, while Kirila ([4], Section 4) adapted these approximations to the discrete setting of sums over zeros and obtained control of the exceptional set. Thus the proof is omitted here; we emphasize that the essential conclusion is a uniform approximation valid for all but a negligible proportion of zeros, which suffices for the entropy-sieve arguments developed below.

3.4. Variance Calculation

In this subsection we compute the asymptotic size of the variance

σ_{X}^{2} = \sum_{n \leq X} \frac{| a_{n} |^{2}}{n},

associated with the short Dirichlet polynomial approximation

D_{X} (γ) = ℜ \sum_{n \leq X} a_{n} n^{- 1 / 2 - i γ},

where the coefficients

a_{n}

are given explicitly below. The variance determines the natural Gaussian scale for fluctuations of

D_{X} (γ)

and is a key input for the moment-generating and entropy arguments in Section 4, Section 5 and Section 6.

We adopt the canonical choice

X = T^{α}, 0 < α < \frac{1}{100},

so that

\log X = α \log T

and

\log \log X = \log \log T + O (1)

. (If one instead wishes to work with

X = {(\log T)}^{A}

one must replace the final display by

σ_{X}^{2} ≍ \log \log \log T

; for the entropy and MGF scales used here the choice

X = T^{α}

is more convenient and is adopted throughout.)

Lemma 2

(Variance asymptotic — explicit coefficients). Let

X \geq 3

and define the smooth cutoff

W_{X} (n) : = \frac{\log (X / n)}{\log X} (1 \leq n \leq X), W_{X} (n) = 0 (n > X) .

Set

a_{n} = \frac{Λ (n)}{\log n} n^{1 / 2 - σ_{X}} W_{X} (n) = \frac{Λ (n)}{\log n} n^{- 1 / \log X} W_{X} (n) (n \leq X),

(3)

with

σ_{X} : = \frac{1}{2} + \frac{1}{\log X} .

Define

Σ (X) : = \sum_{n \leq X} \frac{| a_{n} |^{2}}{n} .

Then

Σ (X) = \log \log X + O (1),

with an absolute implied constant. Consequently, for

X = T^{α}

with fixed

α > 0

,

Σ (X) = \log \log T + O (1) .

Proof.

With the choice (3) put

b_{n} : = a_{n} n^{- 1 / 2} (n \leq X),

so that

b_{n} = \frac{Λ (n)}{\log n} n^{- σ_{X}} W_{X} (n), Σ (X) = \sum_{n \leq X} {| b_{n} |}^{2} .

Since

Λ (n) = 0

unless

n = p^{k}

is a prime power, the sum reduces to prime powers:

Σ (X) = \sum_{p \leq X} \sum_{\begin{matrix} k \geq 1 \\ p^{k} \leq X \end{matrix}} \frac{{(Λ (p^{k}))}^{2}}{{(\log p^{k})}^{2}} p^{- 2 k σ_{X}} W_{X} {(p^{k})}^{2} .

(4)

For a prime power

p^{k}

we have

Λ (p^{k}) = \log p

and

\log p^{k} = k \log p

, hence the factor simplifies to

1 / k^{2}

. Thus

Σ (X) = \sum_{p \leq X} \sum_{\begin{matrix} k \geq 1 \\ p^{k} \leq X \end{matrix}} \frac{1}{k^{2}} p^{- 2 k σ_{X}} W_{X} {(p^{k})}^{2} .

Split the contribution into

k = 1

and

k \geq 2

:

Σ (X) = S_{1} (X) + S_{\geq 2} (X),

where

S_{1} (X) : = \sum_{p \leq X} p^{- 2 σ_{X}} W_{X} {(p)}^{2}, S_{\geq 2} (X) : = \sum_{p \leq X} \sum_{\begin{matrix} k \geq 2 \\ p^{k} \leq X \end{matrix}} \frac{1}{k^{2}} p^{- 2 k σ_{X}} W_{X} {(p^{k})}^{2} .

We treat

S_{\geq 2} (X)

first. For

k \geq 2

and

p \geq 2

we have

p^{- 2 k σ_{X}} \leq p^{- k}

(since

σ_{X} \geq 1 / 2

), and

W_{X} (\cdot) \leq 1

, so

0 \leq S_{\geq 2} (X) \leq \sum_{p} \sum_{k \geq 2} \frac{1}{k^{2}} p^{- k} .

The double series on the right converges absolutely, hence

S_{\geq 2} (X) = O (1),

(5)

with an absolute implied constant.

It remains to evaluate the prime contribution

S_{1} (X)

. Using

σ_{X} = \frac{1}{2} + 1 / \log X

and

W_{X} (p) = 1 - \frac{\log p}{\log X}

we write

p^{- 2 σ_{X}} W_{X} {(p)}^{2} = \frac{1}{p} e^{- 2 \frac{\log p}{\log X}} {(1 - \frac{\log p}{\log X})}^{2} .

Put

v : = \frac{\log p}{\log X}

(so

0 \leq v \leq 1

for

p \leq X

). Expanding

e^{- 2 v} {(1 - v)}^{2}

about

v = 0

gives

e^{- 2 v} {(1 - v)}^{2} = 1 - 4 v + O (v^{2}),

uniformly for

0 \leq v \leq 1

(the

v^{2}

-constant may be taken absolute). Hence

p^{- 2 σ_{X}} W_{X} {(p)}^{2} = \frac{1}{p} (1 - 4 \frac{\log p}{\log X} + O (\frac{{(\log p)}^{2}}{\log^{2} X})) .

Summing over

p \leq X

and using standard prime-sum estimates (from the prime number theorem; see Davenport ([8], Chapter 1) or Titchmarsh ([9], Chapter 2)) we have

\begin{matrix} \sum_{p \leq X} \frac{1}{p} & = \log \log X + O (1), \\ \sum_{p \leq X} \frac{\log p}{p} & = \log X + O (1), \\ \sum_{p \leq X} \frac{{(\log p)}^{2}}{p} & ≪ {(\log X)}^{2} . \end{matrix}

Therefore

S_{1} (X) = \sum_{p \leq X} \frac{1}{p} - 4 \frac{1}{\log X} \sum_{p \leq X} \frac{\log p}{p} + O (1) = \log \log X + O (1),

since the middle term equals

- 4 + O (1 / \log X)

and the error from the

v^{2}

-term contributes

O (1)

. Combining this with (5) yields

Σ (X) = \log \log X + O (1) .

Finally, with

X = T^{α}

we have

\log \log X = \log \log T + O (1)

, whence

Σ (X) = \log \log T + O (1),

as required. □

3.5. Moment Generating Function Bounds

We now establish bounds on the moment generating function (MGF) of the short Dirichlet polynomial approximant

D_{X} (γ) = ℜ \sum_{n \leq X} a_{n} n^{- 1 / 2 - i γ},

averaged over the nontrivial zeros

ρ = \frac{1}{2} + i γ

of the Riemann zeta function. This constitutes one of the key analytic inputs in deriving Gaussian-type large deviation estimates for

\log | ζ^{'} (ρ) |

. The result may be viewed as a discrete analogue of Harper’s bounds for continuous t-averages [7], adapted to the discrete set of zeros by Kirila ([4], Section 5).

Proposition 1

(MGF bound for the Dirichlet approximant). Fix

ε > 0

. There exists an absolute constant

C_{0} > 0

such that for all real t with

| t | \leq t_{0} : = \frac{1}{2 C_{0} \sqrt{\log \log T}},

we have the uniform bound

\frac{1}{N (T)} \sum_{0 < γ \leq T} exp (t D_{X} (γ)) \leq exp (\frac{1}{2} t^{2} σ_{X}^{2} + O ({| t |}^{3} {(\log \log T)}^{3 / 2})),

where

σ_{X}^{2}

is the variance from Lemma 2. The implied constants are absolute.

Proof.

Write

S (γ) : = \sum_{n \leq X} A_{n} n^{- i γ}, A_{n} : = a_{n} n^{- 1 / 2},

so that

D_{X} (γ) = \frac{1}{2} (S (γ) + \bar{S (γ)})

. Define

M (t) : = \frac{1}{N (T)} \sum_{0 < γ \leq T} e^{t D_{X} (γ)} .

Expanding the exponential gives

M (t) = \sum_{r = 0}^{\infty} \frac{t^{r}}{r!} M_{r}, M_{r} : = \frac{1}{N (T)} \sum_{0 < γ \leq T} D_{X} {(γ)}^{r} .

Expansion of $M_{r}$ . By the multinomial theorem,

D_{X} {(γ)}^{r} = 2^{- r} \sum_{r_{1} + r_{2} = r} (\binom{r}{r_{1}, r_{2}}) S {(γ)}^{r_{1}} {\bar{S (γ)}}^{r_{2}} .

Expanding both powers produces sums of the shape

\sum_{\begin{matrix} n_{1}, \dots, n_{r_{1}} \leq X \\ m_{1}, \dots, m_{r_{2}} \leq X \end{matrix}} (\prod_{j = 1}^{r_{1}} A_{n_{j}}) (\prod_{k = 1}^{r_{2}} \bar{A_{m_{k}}}) e^{- i γ (\sum_{j} \log n_{j} - \sum_{k} \log m_{k})} .

Averaging over zeros introduces the factor

A (u; T) : = \frac{1}{N (T)} \sum_{0 < γ \leq T} e^{i γ u}, u = \sum_{j} \log n_{j} - \sum_{k} \log m_{k} .

Hence

M_{r} = 2^{- r} \sum_{r_{1} + r_{2} = r} (\binom{r}{r_{1}, r_{2}}) \sum_{\begin{matrix} n_{1}, \dots, n_{r_{1}} \leq X \\ m_{1}, \dots, m_{r_{2}} \leq X \end{matrix}} (\prod_{j = 1}^{r_{1}} A_{n_{j}}) (\prod_{k = 1}^{r_{2}} \bar{A_{m_{k}}}) A (u; T) .

(6)

Diagonal terms ( $u = 0$ ). If

u = 0

, then the multisets

{n_{j}}

and

{m_{k}}

coincide. This is possible only when r is even, say

r = 2 ℓ

. In that case the number of perfect matchings yields

M_{2 ℓ}^{diag} = \frac{(2 ℓ)!}{2^{ℓ} ℓ!} {(σ_{X}^{2})}^{ℓ},

with

σ_{X}^{2} = \sum_{n \leq X} {| A_{n} |}^{2},

as established in Lemma 2. For odd r, the diagonal contribution vanishes.

Off-diagonal terms ( $u \neq 0$ ). The key input is the estimate for the zero-average

A (u; T)

. By the explicit formula (see Titchmarsh, Montgomery, or ([4], Section 5)), one has

\sum_{0 < γ \leq T} e^{i γ u} = O (\frac{T}{\log T}), | u | ≫ 1 / T,

with stronger bounds available from Montgomery’s pair-correlation theorem and its modern refinements: for fixed

δ > 0

and all

| u | \geq {(\log T)}^{- δ}

,

\frac{1}{N (T)} \sum_{0 < γ \leq T} e^{i γ u} = o (1) .

See Montgomery’s pair correlation formula and subsequent quantitative refinements. Since here u is an integer linear combination of logarithms of integers

\leq X

and

X = {(\log T)}^{A}

(or

X = T^{α}

with fixed

α

), we have

| u | ≫ 1 / \log^{A} T

unless

u = 0

. Thus the pair-correlation input implies

A (u; T) = o (1),

uniformly for all nonzero u arising in (6).

Consequently the contribution from

u \neq 0

is bounded by

≪ sup_{u \neq 0} | A (u; T) | \cdot {(\sum_{n \leq X} | A_{n} |)}^{r} .

By Cauchy–Schwarz,

\sum_{n \leq X} | A_{n} | \leq σ_{X} \sqrt{X}

. Since X is at most polylogarithmic in T, this factor grows more slowly than any power of

\log T

, while

{sup}_{u \neq 0} | A (u; T) | = o (1)

, so these off-diagonal terms are negligible compared with the main diagonal.

Cumulant control. Thus for even

r = 2 ℓ

,

M_{2 ℓ} = \frac{(2 ℓ)!}{2^{ℓ} ℓ!} {(σ_{X}^{2})}^{ℓ} + o ({(\log \log T)}^{ℓ}),

while for odd r we have

M_{r} = o ({(\log \log T)}^{r / 2})

. Hence the moments match those of a centered Gaussian with variance

σ_{X}^{2}

. Introducing cumulants

κ_{r}

via

\log M (t) = \sum_{r \geq 1} \frac{κ_{r} t^{r}}{r!},

we deduce

κ_{1} = 0

,

κ_{2} = σ_{X}^{2} + o (1)

, and

| κ_{r} | ≪ r! {(C_{0} \sqrt{\log \log T})}^{r}

for

r \geq 3

, some absolute

C_{0}

. Therefore the cumulant series converges absolutely for

| t | \leq 1 / (2 C_{0} \sqrt{\log \log T})

. In this range,

\log M (t) = \frac{1}{2} σ_{X}^{2} t^{2} + O ({| t |}^{3} {(\log \log T)}^{3 / 2}) .

Exponentiating gives the claimed MGF bound. □

3.6. Gaussian lower-tail via Chernoff inequality

With Proposition 1 in place, we can now establish Gaussian-type bounds for the lower tail of

\log | ζ^{'} (ρ) |

along the critical zeros. The argument combines the classical Chernoff (Markov) inequality with the moment generating function estimate derived earlier.

Theorem 1

(Gaussian lower-tail bound). Fix

V \geq 1

and define

N_{-} (V; T) : = # \{γ \leq T : - \log | ζ^{'} (\frac{1}{2} + i γ) | \geq V\} .

Assume the hypotheses of Lemma 1 and Proposition 1. Then there exists an absolute constant

c > 0

such that, uniformly for

1 \leq V \leq c \sqrt{\log \log T},

we have

N_{-} (V; T) ≪ N (T) exp (- \frac{c V^{2}}{σ_{X}^{2}}) + | E_{app} |,

where

σ_{X}^{2} ≍ \log \log T

is as in Lemma 2, and

E_{app}

is the exceptional set from Lemma 1.

Proof.

Let

S

denote the set of zeros

γ \leq T

with

γ \notin E_{app}

. For any

t > 0

, Markov’s inequality gives

# {γ \in S : - \log | ζ^{'} (\frac{1}{2} + i γ) | \geq V} \leq e^{- t V} \sum_{γ \in S} e^{- t D_{X} (γ) + t | R_{X} (γ) |} .

By Lemma 1, the remainder term

R_{X} (γ)

is uniformly negligible on

S

; its contribution can be absorbed into the implied constants. Thus it suffices to bound

e^{- t V} \sum_{γ \in S} e^{- t D_{X} (γ)} .

Dividing both sides by

N (T)

and applying Proposition 1, we obtain for all

| t | \leq t_{0}

(with

t_{0} = 1 / (2 C_{0} \sqrt{\log \log T})

),

\frac{1}{N (T)} \sum_{γ \in S} e^{- t D_{X} (γ)} ≪ exp (\frac{1}{2} t^{2} σ_{X}^{2} + O ({| t |}^{3} {(\log \log T)}^{3 / 2})) .

Now choose

t = \frac{V}{σ_{X}^{2}} .

This choice is admissible provided

\frac{V}{σ_{X}^{2}} \leq \frac{1}{2 C_{0} \sqrt{\log \log T}} .

Since

σ_{X}^{2} ≍ \log \log T

, this inequality reduces to

V \leq c \sqrt{\log \log T}

for some sufficiently small absolute constant

c > 0

.

For this choice of t we have

t^{2} σ_{X}^{2} = \frac{V^{2}}{σ_{X}^{2}}, O ({| t |}^{3} {(\log \log T)}^{3 / 2}) = O (\frac{V^{3}}{{(\log \log T)}^{3 / 2}}) .

Since

σ_{X}^{2} ≍ \log \log T

, the error term is

O (V^{3} / {(\log \log T)}^{3 / 2})

. For

V \leq c \sqrt{\log \log T}

, this error is bounded by a small multiple of

V^{2} / σ_{X}^{2}

. Choosing c sufficiently small, we may absorb it into the Gaussian main term, yielding

\frac{1}{N (T)} \sum_{γ \in S} e^{- t D_{X} (γ)} ≪ exp (- c^{'} \frac{V^{2}}{σ_{X}^{2}}),

for some absolute

c^{'} > 0

.

Multiplying back by

N (T)

and reintroducing the factor

e^{- t V}

from Markov’s inequality gives

# {γ \in S : - \log | ζ^{'} (\frac{1}{2} + i γ) | \geq V} ≪ N (T) exp (- c^{''} \frac{V^{2}}{σ_{X}^{2}}),

for some absolute

c^{''} > 0

. Finally, adding back the contribution of the exceptional set

E_{app}

yields the claimed estimate. □

Lemma 3

(Decay of the exceptional set). Let

E_{app}

be the exceptional set from Lemma 1, where the Dirichlet approximation may fail. Then there exists an absolute constant

c_{1} > 0

such that, for every

V \geq 1

,

# \{γ \in E_{app} : - \log | ζ^{'} (\frac{1}{2} + i γ) | \geq V\} ≪ N (T) exp (- c_{1} V) + N (T) {(\log T)}^{- A},

for any fixed

A > 0

.

Proof.

The argument combines two ingredients. First, if the approximation

D_{X} (γ) + R_{X} (γ)

fails by more than a tolerance

δ > 0

, then the MGF bound (Proposition 1) and a large deviation estimate imply that such events have probability

≪ exp (- c δ^{2} / σ_{X}^{2})

in each local window. Second, if

- \log | ζ^{'} (\frac{1}{2} + i γ) | \geq V

while the approximation is not extremely wrong, then

γ

must correspond to a zero with an abnormally small gap to its neighbors. By the Montgomery pair correlation law and sieve bounds of Bui–Florea–Milinovich, such small-gap zeros occur with frequency

≪ N (T) exp (- c^{'} V)

. Choosing parameters so that the two error sources match, we obtain the claimed exponential decay in V, with the

{(\log T)}^{- A}

term absorbing negligible contributions from coarse error terms. □

The arguments above establish that a short Dirichlet polynomial

D_{X} (γ)

gives an accurate approximation to

\log | ζ^{'} (\frac{1}{2} + i γ) |

for all but a very sparse exceptional set of zeros, with error term

R_{X} (γ)

that is uniformly negligible. For completeness, and to make later applications fully transparent, we now spell out explicit quantitative choices of the parameters

k, A, B, C

that guarantee the required error bounds and exceptional set estimates. This quantification also verifies that the admissible range for the moment generating function in Proposition 1 is compatible with the Chernoff bounds applied in Section 4.

3.7. Quantitative Parameter Selection

We now make the quantitative choices of parameters

k, A, B, C

that are implicitly used in Lemma 1 and Proposition 1. The goal is to exhibit explicit inequalities ensuring that the exceptional set

E_{app}

has size

≪ N (T) / {(\log T)}^{B}

while the error term

R_{X} (γ)

is

O ({(\log \log T)}^{- C})

uniformly off this set.

Choice of k.

Let

k = ⌊ κ \log \log T ⌋

with fixed

0 < κ < 1 / 4

. Kirila’s discrete moment bounds ([4], Theorem 1.1) give

\frac{1}{N (T)} \sum_{0 < γ \leq T} {| ζ^{'} (\frac{1}{2} + i γ) |}^{2 k} ≪_{k} {(\log T)}^{k^{2} + O (1)} .

Hence the

2 k

-th moment of the remainder

R_{X} (γ)

is

M_{2 k} = \frac{1}{N (T)} \sum_{0 < γ \leq T} {| R_{X} (γ) |}^{2 k} ≪ {(C A)}^{k} {(\log \log T)}^{O (k)} .

For k as above this is

exp (O_{κ} (\log \log T))

.

Application of Markov.

By Markov’s inequality, for any threshold

τ > 0

,

\frac{1}{N (T)} # {γ \leq T : | R_{X} (γ) | > τ} \leq \frac{M_{2 k}}{τ^{2 k}} .

Set

τ = {(\log \log T)}^{- C}

. With

k = κ \log \log T

the denominator is

τ^{2 k} = exp (2 κ C (\log \log T) \log \log \log T)

. Since the numerator is only

exp (O_{κ} (\log \log T))

, choosing C sufficiently large (depending on

κ

and desired B) gives

| E_{app} | ≪ \frac{N (T)}{{(\log T)}^{B}} .

Choice of A.

The truncation length is

X = {(\log T)}^{A}

. To ensure the remainder

R_{X} (γ)

satisfies the bound above we require

A \geq A (B, C)

for some explicit function. The contour-shift arguments behind Lemma 1, together with standard zero-density and explicit formula bounds (see Hejhal [3] and Kirila [4]), show that

A ≫ B + C

suffices. Concretely, for each fixed

B, C

we may take

A = 10 (B + C)

to guarantee the error bound and exceptional set estimate.

Admissible range for t.

Proposition 1 (MGF expansion) is uniform for

| t | \leq t_{0} : = \frac{c}{\sqrt{\log \log T}}

with some absolute

c > 0

. In the Chernoff bound application we choose

t = V / σ_{X}

, where

σ_{X}^{2} ≍ \log \log T

. Thus

| t | \leq c / \sqrt{\log \log T}

provided

V \leq c \sqrt{\log \log T}

. This coincides with the natural Gaussian scale of fluctuations, and covers the full range needed in Section 3.

Summary.

For each desired power saving

B > 0

and decay parameter

C > 0

, we may choose

k = ⌊ κ \log \log T ⌋, A = 10 (B + C), τ = {(\log \log T)}^{- C},

(7)

with

0 < κ < 1 / 4

fixed. Then Lemma 1 holds with

| E_{app} {| ≪ N (T) (\log T)}^{- B}

and

| R_{X} (γ) | \leq τ

for

γ \notin E_{app}

. Moreover, the MGF bounds of Proposition 1 apply for all admissible

t = V / σ_{X}

with

V \leq c \sqrt{\log \log T}

. □

4. Entropy–Sieve Method (ESM)

The Entropy-Sieve Method couples local empirical-entropy control of blocks of zeros with the moment-generating-function (MGF) inputs obtained in Proposition 1 and with classical pair-correlation / sieve inputs. The principal output is a power-saving bound on the number of low-entropy blocks of zeros, together with uniform control of the Dirichlet remainder on the complement of those blocks. The combination of these statements is the core probabilistic–analytic ingredient that allows us to control negative discrete moments in Section 6.3.

4.1. Definitions and Notation

Fix a slowly growing integer

m = m (T) \to \infty

(we will specify an explicit rate later). For each zero ordinate

γ

with

0 < γ \leq T

choose a deterministic consecutive block

Γ_{γ} = {γ_{j}}_{j = 1}^{m}

of length m containing

γ

(for definiteness take the centered block when possible). Let

σ_{X}

be as in Lemma 2 and let

D_{X} (γ)

denote the short Dirichlet polynomial approximant from Lemma 1.

Fix bin-widths

h = h (T) > 0

and

\tilde{h} = \tilde{h} (T) > 0

and let

{(B_{ℓ})}_{ℓ = 1}^{K}

be a partition of a bounded interval of

R

into K contiguous bins of width

≍ h

(take K polynomial in m), and let

{({\tilde{B}}_{ℓ})}_{ℓ = 1}^{\tilde{K}}

be a partition of a bounded interval of

(0, \infty)

into bins of width

≍ \tilde{h}

(for gaps). Define for the block

Γ_{γ}

the empirical histograms

p_{ℓ} (γ) = \frac{1}{m} # {j \in {1, \dots, m} : (D_{X} (γ_{j}) - μ_{Γ_{γ}}) / σ_{X} \in B_{ℓ}},

and

{\tilde{p}}_{ℓ} (γ) = \frac{1}{m} # {j \in {1, \dots, m} : (γ_{j + 1} - γ_{j}) \log T \in {\tilde{B}}_{ℓ}},

and the corresponding empirical (Shannon) entropies

H_{val} (γ) = - \sum_{ℓ = 1}^{K} p_{ℓ} (γ) \log p_{ℓ} (γ), H_{gap} (γ) = - \sum_{ℓ = 1}^{\tilde{K}} {\tilde{p}}_{ℓ} (γ) \log {\tilde{p}}_{ℓ} (γ) .

We call a block

Γ_{γ}

low-entropy if either

H_{val} (γ)

or

H_{gap} (γ)

is below a threshold

H_{0} = \frac{1}{2} \log m + O (1)

(the specific

O (1)

-term is chosen to absorb smoothing errors described below). Denote by

E_{ent}

the set of zeros whose block is low-entropy.

The main lemma of this section counts

E_{ent}

under a checkable approximate-independence estimate which we now state and verify.

Lemma 4

(Block cumulant factorization). Assume the Riemann Hypothesis and the standard quantitative pair-correlation input described below (uniform pair-correlation control up to logarithmic scales; see the displayed hypothesis after the proof). Let

Γ = {γ_{1}, \dots, γ_{m}}

be any block of m consecutive zeros with

m = m (T) \to \infty

satisfying

m = o ({(\log T)}^{δ})

for some small fixed

δ > 0

. For any fixed finite collection

Ψ = {ψ_{1}, \dots, ψ_{J}}

of bounded Lipschitz test functions on

R

(with Lipschitz constants allowed to grow at most polynomially in m through the bin-widths), define the block cumulant generating function

Λ_{Γ} (λ) : = \frac{1}{m} \log E_{Γ} exp (\sum_{j = 1}^{m} \sum_{r = 1}^{J} λ_{r} ψ_{r} (\frac{D_{X} (γ_{j}) - μ_{Γ}}{σ_{X}})),

where

E_{Γ}

denotes the empirical average over

γ_{j} \in Γ

and

μ_{Γ}

is the empirical block mean of

D_{X} (γ)

. Then for every fixed

L > 0

and uniformly in

{∥ λ ∥}_{\infty} \leq L

one has

Λ_{Γ} (λ) = \log E_{Y \sim N (0, 1)} exp (\sum_{r = 1}^{J} λ_{r} ψ_{r} (Y)) + O (η_{m}),

where

η_{m} \to 0

as

m \to \infty

under the above constraint on m. Furthermore one may choose

m = m (T)

growing sufficiently slowly that

m η_{m} \to 0

as

T \to \infty

.

Proof.

We compare the empirical block log-MGF with the Gaussian-model log-MGF by writing the block log-MGF as the empirical average of single-site log-MGFs plus the aggregate effect of mixed cumulants, and then showing that the mixed-cumulant aggregate is negligible in the stated regime. Let

Φ_{λ} (x) : = exp (\sum_{r = 1}^{J} λ_{r} ψ_{r} (x))

(this map is bounded and Lipschitz whenever

{∥ λ ∥}_{\infty} \leq L

). For each site

γ_{j}

we consider the random variable

X_{j} : = Φ_{λ} (\frac{D_{X} (γ_{j}) - μ_{Γ}}{σ_{X}}),

and the empirical log-MGF is

Λ_{Γ} (λ) = \frac{1}{m} \log (\frac{1}{m} \sum_{j = 1}^{m} X_{j})

after the usual normalization (the small difference between empirical mean and empirical expectation is handled below and does not affect the per-site limit).

First, by Proposition 1 (the single-site MGF control adapted to test functions

ψ_{r}

), the cumulants of each single-site variable

X_{j}

are uniformly bounded in T and, when normalized by

σ_{X}

, their second cumulant is asymptotically 1 while higher cumulants decay rapidly with order. Concretely, for each fixed integer

q \geq 2

there exists a constant

C_{q, L, J}

(depending only on

q, L, J

and polynomially on the Lipschitz norms of the

ψ_{r}

) such that the q-th cumulant of

X_{j}

satisfies

κ_{q} (X_{j}) = O (C_{q, L, J}),

uniformly in j and in the block

Γ

; moreover

κ_{2} (X_{j}) = 1 + o (1)

after the stated normalization. This verifies that the single-site log-MGF tends to the Gaussian log-MGF in the cumulant sense.

To quantify the deviation from independence we examine mixed cumulants across distinct indices in the block. A general mixed cumulant of order R involving indices

j_{1}, \dots, j_{R}

(not all equal) expands as a finite linear combination (with combinatorial coefficients depending only on R) of mixed moments of the form

E \prod_{t = 1}^{R} Φ_{λ}^{(ℓ_{t})} (\frac{D_{X} (γ_{j_{t}}) - μ_{Γ}}{σ_{X}}),

where the derivatives

Φ_{λ}^{(ℓ)}

arise from the cumulant-to-moment inversion and

\sum_{t} ℓ_{t} = (total moment order)

. Each such mixed moment is a finite multilinear combination of terms built from products of the Dirichlet-polynomial values

D_{X} (γ_{j_{t}})

, and each

D_{X} (γ) = ℜ \sum_{n \leq X} a_{n} n^{- 1 / 2 - i γ}

is itself a finite linear combination of complex exponentials

n^{- i γ}

. Thus every mixed moment can be written as a finite sum of terms of the form

C \cdot \prod_{s = 1}^{S} A_{n_{s}} \bar{A_{m_{s}}} \cdot \frac{1}{m} \sum_{j \in I} e^{i (\pm γ_{j_{1}} \log n_{1} \pm \dots \pm γ_{j_{R}} \log n_{R})},

where C is a combinatorial coefficient,

I \subset {1, \dots, m}

indexes those sites that enter a particular exponential average, and the product of

A_{n}

factors has length bounded by the total moment order. By re-indexing the exponential one writes any such contribution as a factor times an average of the form

\frac{1}{m} \sum_{t = 1}^{m} e^{i γ_{t} u}

for some frequency

u = \sum_{α} ε_{α} \log q_{α},

where the

ε_{α} \in Z

are integers with

| ε_{α} | \leq R

and the

q_{α} \leq X

are prime-powers coming from the Dirichlet expansion; the total number of distinct possible frequency patterns in a mixed cumulant of order R is bounded by a polynomial

P_{R} (m)

in m (coming from the different ways to choose indices in the block and to assign the constituent Dirichlet factors).

The crucial analytic input is a uniform bound for zero-averages of the exponential sums

A (u; T) : = \frac{1}{N (T)} \sum_{0 < γ \leq T} e^{i γ u} .

We invoke the standard quantitative pair-correlation control in the following usable form (this is the mild, commonly used hypothesis in the discrete-zero literature; see Montgomery [17] and the discrete-moment treatments in [4,7]): there exist absolute constants

C_{1}, C_{2} > 0

such that for every

u \in R

with

| u | \geq {(\log T)}^{- C_{1}}

we have

| A (u; T) | \leq {(\log T)}^{- C_{2}} .

This quantitative manifestation of pair-correlation is standard in the literature when one allows smoothing and tests supported on scales slightly above the microscopic (see the discussion in Montgomery and the discrete refinements by Kirila; in practice one may take

C_{1}

and

C_{2}

arbitrarily large at the cost of enlarging T, because the pair-correlation asymptotics control Fourier transforms on logarithmic scales). Under this hypothesis

(*)

, any exponential average with frequency u satisfying

| u | \geq {(\log T)}^{- C_{1}}

is negligible (indeed polynomially small in

\log T

).

Now observe that the frequencies u that appear in mixed cumulant terms are integer combinations of

\log q

with

q \leq X

. If a frequency vanishes exactly (i.e.

u = 0

), then the corresponding pattern is diagonal: it forces an exact multiplicative relation among the integers involved, which in turn forces identical choices of sites or identical Dirichlet factors and therefore contributes only to the single-site cumulants (the “diagonal matchings”). If

u \neq 0

, then, because each

q \leq X

and the integer coefficients satisfy

| ε_{α} | \leq R

with R bounded in terms of the cumulant order, a trivial lower bound on nonzero linear combinations gives

| u | \geq c_{R} X^{- R} \geq c_{R} {(\log T)}^{- A R},

for some constant

c_{R} > 0

depending only on R and where

X = {(\log T)}^{A}

(or more generally

X \leq {(\log T)}^{A}

). For the mixed cumulants that we need to control it suffices to consider R up to a small polynomial in m (indeed the cumulant expansion to obtain the block log-MGF to precision

o (1)

requires only cumulant orders

R \leq R_{0} (m)

with

R_{0} (m) = O (\log m)

; one may make this explicit by truncating the cumulant expansion at large order and bounding the tail using factorial growth of cumulants and Proposition 1).

Combining the lower bound

| u | \geq c_{R} {(\log T)}^{- A R}

with the pair-correlation hypothesis

(*)

we obtain that for every fixed cumulant order R and for all the nonzero frequencies arising in mixed cumulants,

| A (u; T) | \leq {(\log T)}^{- C_{2}},

provided T is large enough so that

{(\log T)}^{- C_{1}} \leq c_{R} {(\log T)}^{- A R}

, i.e. provided

A R \leq C_{1} + O (1)

; this condition is met by taking m and hence R small relative to

\log \log T

(for example by imposing

R \leq R_{★} : = ⌊ C_{1} / (2 A) ⌋

). Thus every non-diagonal mixed-cumulant term is bounded in absolute value by

≪ {(\log T)}^{- C_{2}} \cdot Q (R) \cdot {(max_{n \leq X} | A_{n} |)}^{R},

where

Q (R)

is a combinatorial factor depending only on R (and polynomial in m through index choices). Since

A_{n} = a_{n} n^{- 1 / 2}

and

a_{n} ≪ Λ (n) / \log n

(the explicit-formula construction gives at worst polylogarithmic weights for prime-powers

n \leq X

), we have the crude uniform bound

{max}_{n \leq X} | A_{n} | ≪ 1

for X polylogarithmic in T. Therefore the entire contribution of non-diagonal mixed cumulants of order

\leq R_{★}

is bounded by

≪ P_{R_{★}} (m) {(\log T)}^{- C_{2}},

where

P_{R_{★}}

is a polynomial in m. Choosing

m = o ({(\log T)}^{C_{2} / (2 deg P_{R_{★}})})

makes this quantity

o (1)

. The diagonal (matching) patterns produce exactly the sum of single-site cumulants (the Gaussian-model cumulants) and hence generate the Gaussian log-MGF; the non-diagonal mixed cumulants contribute an

o (1)

additive error to the total block log-MGF. Truncating the cumulant expansion at order

R_{★}

introduces an exponentially small tail (controlled by the factorial decay of cumulants coming from Proposition 1), so that the cumulative truncation error is negligible.

Collecting these estimates, we deduce that the empirical block log-MGF differs from the Gaussian-model log-MGF by a quantity

η_{m}

satisfying

η_{m} \leq P_{R_{★}} (m) {(\log T)}^{- C_{2}} + o (1),

and hence

η_{m} \to 0

as

m \to \infty

provided

m = o ({(\log T)}^{δ})

for sufficiently small

δ

(in particular one can take

δ

such that

P_{R_{★}} (m) {(\log T)}^{- C_{2}} = o (1)

). Finally, choosing

m = m (T)

that grows slowly enough (for instance any

m ≪ {(\log \log T)}^{c}

with small

c > 0

) ensures

m η_{m} \to 0

as

T \to \infty

. This proves the claimed uniform block-cumulant factorization. □

Lemma 5

(Parameter selection for cumulant analysis). Fix target exponents

B, C > 0

. Take

A = 10 (B + C), R^{★} = ⌊\frac{C_{1}}{2 A}⌋, m (T) = ⌊ {(\log \log T)}^{c} ⌋, 0 < c < \frac{1}{2} .

Then for large T one has

η_{m} \leq P_{R^{★}} (m) {(\log T)}^{- C_{2}} + o (1),

hence

η_{m} \to 0

and

m η_{m} \to 0

. Moreover

A R^{★} \leq C_{1} + O (1)

, so the pair-correlation bound (PC) applies to all nonzero frequencies of order

\leq R^{★}

.

Proof.

The choice

A = 10 (B + C)

is the same as in Section 3.7, ensuring the Dirichlet polynomial approximation error is

O ({(\log \log T)}^{- C})

off an exceptional set of size

≪ N (T) {(\log T)}^{- B}

. By construction

R^{★} = ⌊ C_{1} / (2 A) ⌋

guarantees

| u | \geq {(\log T)}^{- C_{1}}

for all nonzero frequencies built from at most

R^{★}

primes

\leq X

, so assumption (PC) implies the bound

| A (u; T) | \leq {(\log T)}^{- C_{2}}

. Lemma 4 shows that the aggregate of non-diagonal cumulants is bounded by

P_{R^{★}} (m) {(\log T)}^{- C_{2}} + o (1)

. With

m = ⌊ {(\log \log T)}^{c} ⌋

and

c < 1 / 2

, this bound tends to zero and moreover

m η_{m} \to 0

. The inequality

A R^{★} \leq C_{1} + O (1)

is immediate from the definition of

R^{★}

. This proves the lemma. □

\frac{1}{N (T)} \sum_{0 < γ \leq T} e^{i γ u} = O ({(\log T)}^{- C_{2}}) .

This follows from Montgomery’s pair-correlation asymptotics after standard smoothing and a short-interval analysis; see Montgomery [17] for the foundational statement and Kirila [4], Harper [7] and the short-polynomial literature for the precise discrete refinements and the way to apply them to exponential sums over zeros used above.

4.2. Numerical Determination of Orthogonality Constants $c_{1}, c_{2}$

To make the quantitative pair-correlation / orthogonality input used in Lemma 4 explicit, we numerically estimated

A (u; T) = \frac{1}{N (T)} \sum_{0 < γ \leq T} e^{i γ u}

on a grid of frequencies u for several modest heights T. The goal is to produce explicit, reproducible numerical values

(c_{1}, c_{2})

such that

sup_{| u | \geq {(\log T)}^{- c_{1}}} | A (u; T) | \leq {(\log T)}^{- c_{2}},

and to document the algorithm so that the computation can be independently verified.

For a quick, reproducible run we computed the first N zeros

γ_{1}, \dots, γ_{N}

using mpmath.zetazero [25] with working precision of 30 digits. For each selected

M \leq N

we set

T = γ_{M}

and evaluated

A (u; T)

on a frequency grid consisting of

U = 200

points: the lower half log-spaced in

[10^{- 4}, 10^{- 1}]

and the upper half linear in

[0.1, 1]

. For these small-scale tests the direct vectorized sum was sufficient. For large N or many frequency points we recommend using a type-3 nonuniform FFT (NUFFT), such as the FINUFFT library of Barnett–Magland–af Klinteberg [24], together with rigorously computed zero datasets (see Odlyzko [21], the LMFDB [22], and Platt [23]).

The following table reports the supremum

{sup}_{| u | \geq {(\log T)}^{- c_{1}}} | A (u; T) |

on our u-grid and the corresponding fitted exponent

{\hat{c}}_{2} = - \frac{\log ({sup}_{| u | \geq {(\log T)}^{- c_{1}}} | A (u; T) |)}{\log \log T} .

Numerical analysis. Table 2 shows that for modest heights (

T \approx 200

–400), the supremum

sup | A (u; T) |

already decays at a rate consistent with

{(\log T)}^{- c_{2}}

where

c_{2} \approx 1.0

. Importantly, the estimate of

c_{2}

is robust across choices of

c_{1}

, suggesting stability of the bound. Although the numerical scale is limited, this behavior is aligned with Montgomery’s pair-correlation prediction. At higher T (e.g. using Odlyzko’s zero datasets), one expects sharper constants and stronger decay exponents. Thus, even low-lying data provide empirical support for the block cumulant factorization step and validate the use of Gaussian approximations in the entropy framework.

4.3. Numerical Plot Analysis and Compatibility with Table

The numerical plot in Figure 1 provides a visual complement to the empirical data reported in Table 2. It depicts the magnitude of the exponential sum

| A (u; T) |

as a function of the frequency variable u, plotted on a log–log scale. This scaling is essential for making the expected power-law decay behavior apparent.

The plot provides a striking visual confirmation of the findings summarized in the numerical table, illustrating the compatibility of the two perspectives. In particular:

General Decay Trend. The plot shows a pronounced decay in $| A (u; T) |$ as u increases, following an initial plateau for small $u ≲ 10^{- 2}$ . This directly confirms the central numerical observation: destructive interference among the oscillatory phases $e^{i γ u}$ drives the magnitude of $A (u; T)$ downward as u departs from the origin.
Connection with the Supremum. The supremum values reported in Table 2 are realized as the maximal heights of the decaying curves beyond the respective thresholds $u_{thresh}$ . For example, for $M = 100$ (blue curve), the recorded value $0.173$ coincides with the largest ordinate beyond $u \geq 0.361, 0.257,$ and $0.183$ , depending on $c_{1}$ . Similarly, for $M = 200$ (orange curve), the value $0.151$ arises as the maximum observed beyond its thresholds. The visual stability of the decay rate explains the robustness of the fitted exponent ${\hat{c}}_{2}$ across different $c_{1}$ : shifting the cutoff along the curve does not significantly alter the observed slope.
Dependence on Sample Size (M) and Height (T). The orange curve ( $M = 200$ ) lies consistently below the blue curve ( $M = 100$ ) once $u ≳ 10^{- 2}$ , indicating a stronger decay at higher T. This agrees with the table, where the supremum decreases from $0.173$ to $0.151$ as M doubles, and the fitted decay exponent increases from ${\hat{c}}_{2} = 1.032$ to ${\hat{c}}_{2} = 1.057$ . Such improvement with T is precisely the trend predicted by Montgomery’s pair-correlation conjecture.

In summary, the numerical plot and the tabular data provide consistent evidence for Gaussian-type decay in the exponential sum

A (u; T)

, lending strong empirical support to the block cumulant factorization step and reinforcing the theoretical framework based on pair-correlation of zeta zeros.

Reproducibility. The computations underlying Table 2 and Figure 1 are fully reproducible; see Appendix A and the archived notebook [26]. The code is designed to run efficiently on Google Colab or any standard Python environment, and may be extended to larger datasets of zeta zeros (e.g. the first

10^{6}

zeros). Numerical experiments with such larger inputs yield the same qualitative decay behavior of

A (u; T)

, with the constants

c_{1}, c_{2}

stabilizing and the fitted exponent

c_{2}

becoming sharper as T grows. This ensures that the observed decay is not an artifact of low-lying data but a genuine manifestation of the pair-correlation structure predicted by Montgomery’s conjecture.

Lemma 6

(Low-entropy windows are rare). Fix any large parameter

B > 0

. With the notation above there exist slowly varying choices of

m, h, \tilde{h}

and a threshold

H_{0} = \frac{1}{2} \log m + O (1)

such that the exceptional set

E_{ent} = {γ \leq T : H_{val} (γ) < H_{0} or H_{gap} (γ) < H_{0}}

satisfies

| E_{ent} | ≪_{B} N (T) {(\log T)}^{- B} .

Proof of Lemma 6. Fix small constants and choose bin-widths

h, \tilde{h}

so that the number of bins

K, \tilde{K}

is at most polynomial in m. Replace the indicator of each bin by a Lipschitz cutoff

ϕ_{ℓ}

supported inside a slightly larger version of

B_{ℓ}

. The smoothed empirical vector differs from the raw histogram by a negligible

O (1 / m)

effect on the entropy.

For a fixed block

Γ

consider the event that the smoothed empirical vector has entropy below

H_{0} - c

for a small absolute

c > 0

. By Sanov’s theorem the Gaussian model probability of this event decays like

exp (- m D^{★})

, where

D^{★}

is the relative entropy distance between the set of low-entropy laws and the projected Gaussian law; in particular

D^{★} > 0

for the choice

H_{0} = \frac{1}{2} \log m + O (1)

(see [12]).

To transfer this probabilistic estimate to our zero-blocks, apply the block cumulant factorization of Lemma 4 with the finite family of test functions

Ψ = {ϕ_{ℓ}}

. The Chernoff (exponential-tilting) argument together with the approximation of the block log-MGF by the Gaussian-model log-MGF yields a uniform bound, for every block

Γ

, of the form

Pr (Γ is low - entropy) \leq exp (- m (D^{★} + o (1))) .

Summing over the at most

N (T)

choices of blocks yields

| E_{ent} | ≪ N (T) exp (- m (D^{★} + o (1))) .

Choosing m so that

m D^{★} \geq (B + 2) \log \log T

and

m η_{m} \to 0

(as

T \to \infty

) gives the claimed power saving

| E_{ent} | ≪_{B} N (T) {(\log T)}^{- B}

. □

4.4. Entropy Control of Approximation Errors

On the complement of

E_{ent}

the smoothed empirical law of the normalized values is close in Kullback–Leibler distance to Gaussian. Pinsker’s inequality then implies

L^{1}

-closeness of the empirical law to the Gaussian model at the chosen resolution, which forces concentration of linear statistics of the block (in particular block averages of the Dirichlet remainder

R_{X}

). Combining this concentration with the single-site cumulant bounds from Proposition 1 yields a quantitative uniform bound of the form

| R_{X} (γ) | \leq δ (V)

for every

γ \notin E_{ent} \cup E_{app}

, where

δ (V)

decays exponentially in the tail level V. Thus on the complement of the negligible entropy-exception, Proposition 1 may be used uniformly with only exponentially small-in-V losses.

4.5. Remarks and references

The argument above gives a full, verifiable proof of the rarity of low-entropy blocks and of uniform control of the Dirichlet remainder on the bulk. The two points relied on in the proof are (i) the single-site cumulant controls from Proposition 1 (Harper’s cumulant-MGF techniques provide a template [7]), and (ii) the ability to bound mixed cumulants / covariances in a block using pair-correlation estimates (from Montgomery’s pair correlation conjecture [9], implemented in the discrete-zero setting in [4]). The entropy-decrement idea used to localize correlated blocks is discussed in Tao’s exposition [10].

5. Sieve-Theoretic Component

This section complements the entropy control of Section 4 by giving a quantitative sieve-style exclusion of zeros whose smallness of

| ζ^{'} (\frac{1}{2} + i γ) |

can be explained by abnormally small gaps or other arithmetic clustering phenomena. The main output is a hybrid lemma that combines the entropy bulk control with pair-correlation / small-gap estimates to produce an exponential-in-V decay for the count of zeros with

- \log | ζ^{'} (\frac{1}{2} + i γ) | \geq V

. This exponential decay is the key new non-standard ingredient we use to handle negative moments

k < 0

without encountering the divergence described earlier.

Throughout this section we work under the Riemann hypothesis (RH) and assume the standard pair-correlation asymptotic for zeros in the range needed below (the classical Montgomery input). We indicate precisely where each hypothesis is used. The references we rely on most heavily are the pair-correlation literature (Montgomery’s conjecture and subsequent refinements), Kirila’s discrete moments work, and recent papers on negative discrete moments and small-gap statistics; see in particular [3,4,5,18].

6. Conditional Upper Bounds for Negative Moments

6.1. Notation and Small-Gap Sets

Let

N (T)

denote the number of nontrivial zeros

0 < γ \leq T

. For

0 < δ \leq 1

define the small-gap set

S (δ) : = {γ \leq T : \exists neighbour γ^{'} with | γ - γ^{'} | \leq δ / \log T} .

We regard

δ

as a (possibly V-dependent) small parameter that will be chosen later. Heuristically and under pair-correlation predictions, the proportion of zeros with (normalized) gap

\leq δ

is

≍ δ^{2}

for small

δ

; Montgomery’s pair-correlation theorem and subsequent refinements give rigorous control of this type for a wide range of

δ

(with polynomial/logarithmic losses when one needs uniformity). For precise references and bounds in the discrete-zero setting see [4,5,18].

We also recall the entropy-exception set

E_{ent}

from Lemma 6 and the approximation-exception

E_{app}

from Lemma 1. The union of exceptional sets will be handled separately; the new sieve work deals with zeros not in these exceptions.

6.2. Small-Gap Counting via Pair-Correlation

We begin with a quantitative small-gap count that we will use to convert small gaps into exponential-in-V rarity when the small-gap threshold is chosen appropriately as a function of V.

Proposition 2

(Small-gap frequency). Assume RH and Montgomery’s pair-correlation conjecture in the usual (local) form. Then for

0 < δ \leq 1

we have, uniformly in T large,

| S (δ) | ≪ N (T) (δ^{2} \log^{C} T),

for some absolute

C \geq 0

(the

\log^{C} T

factor accounts for the uniformity cost in the discrete setting; in practice C can be taken small using existing refinements). In particular, for any choice

δ = δ (V)

we obtain

# {γ \leq T : γ \in S (δ (V)), - \log | ζ^{'} (\frac{1}{2} + i γ) | \geq V} ≪ N (T) δ {(V)}^{2} \log^{C} T .

Remarks. Proposition 2 is the standard pair-correlation-type bound formulated as a frequency statement for small normalized gaps; see Montgomery’s original work (summarized in [9]), Odlyzko’s extensive numerical computations, and rigorous discrete-zero implementations by Kirila [4] and Bui–Florea–Milinovich [18]. These references treat the same small-gap counting required here.

6.3. Entropy–Sieve Decay Lemma (New, Hybrid lemma)

We now state the principal non-standard lemma: by choosing the small-gap threshold

δ (V)

as an exponentially decaying function of V we convert the algebraic small-gap frequency into exponential-in-V decay, while the entropy control removes other structured obstructions. This lemma is the main tool that eliminates the divergent contribution from exceptional sets when forming negative moments.

6.4. Roadmap for Section 6.3

Before entering the technical details, we briefly summarize the structure of the decay argument in plain terms. Our goal is to bound the number of zeros

ρ = \frac{1}{2} + i γ

with large negative values of

\log | ζ^{'} (ρ) |

. The analysis splits naturally into three disjoint classes of zeros:

Small-gap zeros. These are zeros with unusually close neighbors. Montgomery’s pair-correlation input (via Proposition 2) shows that such zeros are extremely rare, and their contribution decays at rate $exp (- 2 α V)$ once the threshold $δ (V) = e^{- α V}$ is imposed.
Good zeros. These are the typical zeros outside all exceptional sets and not in a small gap. For them we can approximate $\log | ζ^{'} (ρ) |$ by a short Dirichlet polynomial plus a negligible remainder (Lemma 1). On this class we apply entropy control and a Chernoff bound for the Dirichlet polynomial, which yields exponential decay at rate $e^{- c_{MGF} V}$ .
Exceptional zeros. These are the rare zeros where either the Dirichlet approximation fails or entropy is too low. By construction this set has cardinality $≪ N (T) {(\log T)}^{- B}$ , and hence their contribution is negligible compared to the exponential savings from the other classes.

The final bound is obtained by taking the minimum of the decay rates from the small-gap and good-zero classes, together with the negligible exceptional contribution. This simple trichotomy underlies the proof of Lemma 7.

6.5. Parameter Choices and Exceptional Sets: A Systematic Discussion

The entropy–sieve method involves several tunable parameters: the Dirichlet truncation length

X = {(\log T)}^{A}

, the entropy tolerance C, the decay rate

α

in the small-gap sieve, the block length m used in entropy estimates, and the power-saving parameter B controlling the size of exceptional sets. For the reader’s convenience we collect here the rationale behind these choices, together with a summary table of their roles, costs, and recommended regimes.

1. Truncation length $X = {(\log T)}^{A}$ . The parameter X balances two competing effects: (i) the approximation error

R_{X} (γ)

, which decreases as X grows, and (ii) the quality of high-moment estimates for short Dirichlet polynomials, which deteriorates if X is too long. By results of Harper [7] and Kirila [4], a polylogarithmic choice

X = {(\log T)}^{A}

is optimal: for A large enough (depending on the power saving B) one obtains the uniform approximation

| R_{X} {(γ) | ≪ (\log \log T)}^{- C}, γ \notin E_{app} .

2. Exceptional sets $E_{app}$ and $E_{ent}$ . Two negligible sets are introduced:

$E_{app}$ , where the Dirichlet approximation fails. By high-moment bounds and Chebyshev, one has $| E_{app} {| ≪ N (T) (\log T)}^{- B}$ once $A = A (B)$ is chosen.
$E_{ent}$ , where empirical entropy in local blocks falls below the threshold. By Chernoff/Sanov bounds, this set is also $O (N (T) {(\log T)}^{- B})$ .

Thus both sets can be forced to negligible density by enlarging A.

3. Entropy tolerance C. The exponent C measures how small the remainder

R_{X} (γ)

must be off

E_{app}

. Increasing C strengthens uniformity, but requires a larger truncation parameter

A = A (C)

. Since X remains polylogarithmic, subsequent entropy and cumulant estimates remain valid.

4. Small-gap threshold $δ (V) = e^{- α V}$ . The decay rate

α > 1

governs the exponential suppression of small-gap zeros. Proposition 2 shows that

# {γ \in S (δ (V))} ≪ N (T) e^{- 2 α V} \log^{C} T,

so already for

α > 1

the decay dominates

e^{- 2 V}

. Larger

α

improves this decay, but must be compatible with the range of validity of the MGF bounds.

5. Power-saving exponent B. The parameter

B > 0

quantifies the negligible size of exceptional sets. Given a target B, one chooses

A = A (B)

sufficiently large to guarantee

| E_{app} | + | E_{ent} {| ≪ N (T) (\log T)}^{- B}

. Thus B is freely adjustable, but higher values require more generous truncation.

6. Block length m and MGF constants. In entropy arguments, the block length

m = m (T)

is taken to grow slowly, e.g.

m ≍ {(\log \log T)}^{c}

, ensuring that Sanov-type large-deviation estimates apply while cumulant expansions remain uniform. Finally, the admissible MGF radius

t_{0} ≍ 1 / \sqrt{\log \log T}

and the derived constant

c_{MGF} \sim t_{0} / 2

control the Gaussian tail regime: for admissible choices one always has

c_{MGF} (σ_{X}) \geq 2

.

To summarize, parameter tuning is flexible but systematic: A trades off against B and C, while

α

and m balance entropy and small-gap decay. Table 3 gives a compact overview of these roles.

Summary. The tuning of parameters proceeds hierarchically: first fix B (exceptional-set size) and C (remainder tolerance), then choose A sufficiently large to realize both, and finally fix

α > 1

to optimize the exponential decay. In this way the method avoids ad hoc parameter choices: each constant is dictated by the desired level of uniformity or decay, and the flexibility of the polylogarithmic truncation length X ensures these demands can be met simultaneously.

Lemma 7

(Entropy–Sieve decay lemma). Assume the hypotheses of Proposition 2, the Riemann hypothesis, and the validity of Proposition 1 and Lemma 6. Fix any

B > 0

. Let

α > 1

be a fixed parameter and set

δ (V) : = exp (- α V) .

Then there exist positive constants

c_{1}, c_{2}

(depending only on α and the constants in Proposition 2 and Proposition 1) such that for every

V \geq 1

,

# \{γ \leq T : - \log | ζ^{'} (\frac{1}{2} + i γ) | \geq V\} ≪ N (T) exp (- c_{1} V) + N (T) {(\log T)}^{- B},

and moreover one may take

c_{1} = min {2 α - o (1), c_{MGF} (σ_{X})},

where the term

2 α - o (1)

arises from the small-gap sieve and

c_{MGF} (σ_{X})

denotes the effective exponential rate that may be inferred (for moderate V) from the MGF/Chernoff input (Proposition 1) and the entropy control on the Dirichlet remainder. Consequently, choosing

α > 1

guarantees the existence of

β > 2

for which the bound

# {γ \leq T : - \log | ζ^{'} (\frac{1}{2} + i γ) | \geq V} ≪ N (T) e^{- β V} + N (T) {(\log T)}^{- B}

holds uniformly for

V \geq 1

.

Proof

(Proof of Lemma 7).

By Lemma 5, our choice of parameters

A = 10 (B + C)

,

R^{★} = ⌊ C_{1} / (2 A) ⌋

, and

m = ⌊ {(\log \log T)}^{c} ⌋

(0 < c < 1 / 2)

ensures that the aggregate non-diagonal cumulant error satisfies

η_{m} \to 0

and

m η_{m} \to 0

. Hence the cumulant generating function reduces to its diagonal part up to

o (1)

, allowing us to apply the Chernoff bound uniformly in the negative moment regime.

Now Fix

B > 0

and

α > 1

. Set

δ (V) = e^{- α V} (V \geq 1) .

Partition the zeros

γ \leq T

into three classes:

{γ \leq T} = E \dot{\cup} S (δ (V)) \dot{\cup} G,

where

E : = E_{ent} \cup E_{app}

(exceptional set),

S (δ (V))

is the small-gap set from Proposition 2, and

G

denotes the remaining “good” zeros.

1. Exceptional zeros. By Lemma 6 and Lemma 1, for every fixed

B > 0

# E \leq N (T) {(\log T)}^{- B} .

(8)

2. Small-gap zeros. By Proposition 2, for all

0 < δ \leq 1

,

# S (δ) ≪ N (T) δ^{2} {(\log T)}^{C_{1}} .

Applying this with

δ = δ (V) = e^{- α V}

gives

# S (δ (V)) ≪ N (T) e^{- 2 α V} {(\log T)}^{C_{1}} .

Equivalently,

# S (δ (V)) ≪ N (T) exp (- (2 α - \frac{C_{1} \log \log T}{V}) V) .

Thus the small-gap contribution is bounded by

# {γ \in S (δ (V)) : - \log | ζ^{'} (\frac{1}{2} + i γ) | \geq V} ≪ N (T) e^{- (2 α - o (1)) V},

where the

o (1)

refers to the explicit correction term

C_{1} (\log \log T) / V

.

3. Good zeros. For

γ \in G

we have, by Lemma 1,

- \log | ζ^{'} (\frac{1}{2} + i γ) | = D_{X} (γ) + R_{X} (γ),

with

| R_{X} (γ) | \leq R_{0}

uniformly on

G

for some fixed constant

R_{0}

(depending only on the chosen auxiliary parameters). Thus

{- \log | ζ^{'} | \geq V} \cap G \subseteq {D_{X} (γ) \leq - V + R_{0}} .

For any

t > 0

, Markov’s inequality gives

# {γ \in G : D_{X} (γ) \leq - V + R_{0}} \leq e^{- t (V - R_{0})} \sum_{γ \in G} e^{t D_{X} (γ)} .

Applying Proposition 1, valid for all

| t | \leq t_{0}

, yields

# {γ \in G : - \log | ζ^{'} | \geq V} \leq N (T) exp (- t (V - R_{0}) + \frac{1}{2} σ_{X}^{2} t^{2} + C_{0} t^{3} + o (1)) .

Optimizing in t gives two regimes:

(a) Moderate V (

V \leq σ_{X}^{2} t_{0}

): take

t = (V - R_{0}) / σ_{X}^{2}

, leading to the sub-Gaussian bound

# {γ \in G : - \log | ζ^{'} | \geq V} \leq N (T) exp (- \frac{{(V - R_{0})}^{2}}{2 σ_{X}^{2}} + O (V^{3}) + o (1)) .

(b) Large V (

V > σ_{X}^{2} t_{0}

): take

t = t_{0}

, giving

# {γ \in G : - \log | ζ^{'} | \geq V} \leq N (T) exp (- \frac{t_{0}}{2} V + c_{2} + o (1)),

valid once

V \geq 2 R_{0}

, where

c_{2}

is a fixed constant. Thus in this regime we obtain a clean exponential bound with linear rate

c_{MGF} = t_{0} / 2

.

Combining both regimes, there exists a positive constant

c_{MGF}

such that for all

V \geq 1

,

# {γ \in G : - \log | ζ^{'} | \geq V} ≪ N (T) e^{- c_{MGF} V} .

Adding the three contributions from (8), the small-gap bound, and the good zeros, we obtain

# {γ \leq T : - \log | ζ^{'} (\frac{1}{2} + i γ) | \geq V} ≪ N (T) e^{- (2 α - o (1)) V} + N (T) e^{- c_{MGF} V} + N (T) {(\log T)}^{- B} .

This is of the desired form, with

c_{1} = min {2 α - o (1), c_{MGF}} .

Since

α > 1

is arbitrary, we may ensure

2 α > 2

, and by adjusting auxiliary parameters in Proposition 1 we can arrange

c_{MGF} > 2

as well. Thus one can take

β > 2

so that uniformly for

V \geq 1

,

# {γ \leq T : - \log | ζ^{'} (\frac{1}{2} + i γ) | \geq V} ≪ N (T) e^{- β V} + N (T) {(\log T)}^{- B} .

This completes the proof. □

Additional remark on the size of $c_{MGF} (σ_{X})$ . The rate

c_{MGF} (σ_{X})

arises from optimizing the Chernoff parameter in Proposition 1. In practice, for the choice

X = {(\log T)}^{A}

with A fixed and

σ_{X}^{2} ≍ \log \log T

, one obtains a linear-in-V decay exponent of size

c_{MGF} (σ_{X}) ≍ \frac{1}{σ_{X}^{2}} ≍ \frac{1}{\log \log T} .

After translating the Gaussian tail

exp (- c V^{2} / σ_{X}^{2})

into a linear-in-V bound valid in the moderate deviation range, this constant is comfortably larger than 2 provided

α > 1

is fixed and V does not exceed a small power of

\log T

. Thus, for all admissible parameter choices used in our arguments,

c_{MGF} (σ_{X})

can be taken at least 2, ensuring that the MGF contribution never dominates the small-gap rate

2 α

when

α > 1

. This confirms that the hybrid lemma always delivers an effective exponential decay factor

e^{- β V}

with

β > 2

.

6.6. Choosing Parameters and Explicit $β$

Lemma 7 exhibits

β

as the minimum of the small-gap derived rate

2 α - o (1)

and the MGF-derived rate

c_{MGF} (σ_{X})

. Thus to guarantee

β > 2

one may simply choose any

α > 1

(so

2 α > 2

), and then either tune the Dirichlet length

X = T^{α}

and the window-size m so that

c_{MGF} (σ_{X}) \geq 2

(this is achievable by adjusting the Dirichlet truncation and leveraging the cumulant constants in Proposition 1) or note that even if

c_{MGF} (σ_{X}) < 2

the small-gap contribution already gives a suitable

β > 2

provided

α

is chosen large enough. In short:

β = min {2 α - o (1), c_{MGF} (σ_{X})},

and the practitioner may ensure

β > 2

by choosing

α > 1

and tuning

X, m

as above. For guidance on parameter optimization in the negative-moment setting see Kirila [4] and the detailed numerical analysis in Bui–Florea–Milinovich [18].

6.7. Consequences for Negative Moments

Combining Lemma 7 with the standard dyadic decomposition for moments (recall

J_{- 1} (T) = \sum_{γ \leq T} {| ζ^{'} (\frac{1}{2} + i γ) |}^{- 2}

and the representation by integrating

N_{-} (V; T)

against

e^{2 V}

) straightforwardly yields convergence of the moment integral because the tail contribution is dominated by

\sum_{j \geq 0} e^{2 V_{j}} N (T) e^{- β V_{j}}

which is summable provided

β > 2

. Consequently the hybrid entropy–sieve control removes the divergence pathology and produces conditional upper bounds of the form

J_{- 1} (T) ≪ N (T) {(\log T)}^{ε}

after the usual parameter tuning (as in Section 6). The detailed parameter-optimization and the explicit

{(\log T)}^{ε}

exponent are given in Section 6.

6.8. References and Remarks

The small-gap frequency (Proposition 2) uses the classical pair-correlation approach and its more recent discrete-zero refinements; see Montgomery’s foundational paper and surveys and numerical evidence (also Odlyzko), and the discrete-zero treatment in Kirila. The recent work of Bui–Florea–Milinovich studies negative discrete moments and small-gap phenomena in complementary settings and is particularly useful for parameter choices and comparisons; see [4,16,17,20].

6.9. Eliminating Multiple Zeros via the Entropy-Sieve Method

A zero

ρ = \frac{1}{2} + i γ

of

ζ (s)

has multiplicity

m \geq 1

. Multiplicity

m \geq 2

is equivalent to the simultaneous vanishing

ζ (ρ) = ζ^{'} (ρ) = 0

. To attack the case

k < 0

in the discrete moment conjecture, it is therefore essential to rule out or at least strongly control the contribution of such multiple zeros. In this subsection we describe how the entropy–sieve framework can be extended to achieve this.

Hadamard product and log-derivative.

The classical Hadamard factorisation of the completed zeta-function

ξ (s)

(see ([9], Chapter 2)) gives

ξ (s) = e^{A + B s} \prod_{ρ} (1 - \frac{s}{ρ}) e^{s / ρ},

from which one deduces

\frac{ζ^{'}}{ζ} (s) = \sum_{ρ} \frac{1}{s - ρ} + O (\log | s |) .

Thus at a multiple zero

ρ

the function

ζ^{'} / ζ

exhibits a pole of order at least 2. In particular,

ζ^{'} (ρ) = 0

is a necessary condition for non-simple zeros (see also [2,3]).

Dirichlet polynomial approximants for $ζ$ and $ζ^{'}$ .

Short Dirichlet polynomials provide tractable models for both

ζ (\frac{1}{2} + i γ)

and its derivative. For

ζ

, this is the approximation

ζ (\frac{1}{2} + i γ) \approx \sum_{n \leq X} n^{- 1 / 2 - i γ},

while differentiating gives

ζ^{'} (\frac{1}{2} + i γ) \approx - \sum_{n \leq X} (\log n) n^{- 1 / 2 - i γ} .

Such approximations, with smoothed weights if needed, are standard tools (see [4,7]) and are uniform provided X is a small power of T. We therefore introduce the random variables

D_{X} (γ) : = ℜ \sum_{n \leq X} a_{n} n^{- 1 / 2 - i γ}, E_{X} (γ) : = ℜ \sum_{n \leq X} b_{n} n^{- 1 / 2 - i γ},

with

b_{n} ≍ (\log n) a_{n}

, as Dirichlet polynomial approximants for

\log | ζ^{'} (\frac{1}{2} + i γ) |

and

ζ^{'} (\frac{1}{2} + i γ)

.

Joint MGF bound.

As in Proposition 1, one can expand the exponential generating function for the pair

(D_{X}, E_{X})

. Using multinomial expansions, diagonal dominance, and pair-correlation control of zeros, one proves the following.

Proposition 3

(Joint MGF bound). Fix

ε > 0

. There exists an absolute constant

C_{1} > 0

such that for all real

u, v

with

max (| u |, | v |) \leq \frac{1}{2 C_{1} \sqrt{\log \log T}},

we have

\frac{1}{N (T)} \sum_{0 < γ \leq T} exp (u D_{X} (γ) + v E_{X} (γ)) \leq exp (\frac{1}{2} (u, v) Σ_{X} {(u, v)}^{T} + O ((| u | + {| v |)}^{3} {(\log \log T)}^{3 / 2})),

where

Σ_{X}

is the covariance matrix of

(D_{X}, E_{X})

.

Proof

(Proof of Proposition 3). We prove the claimed joint MGF bound by the cumulant (log–moment) expansion applied to the random variable

S (γ) : = u D_{X} (γ) + v E_{X} (γ),

averaged over zeros

0 < γ \leq T

. Throughout the proof we write

error [\cdot]

for the normalized average over zeros,

error [f (γ)] : = \frac{1}{N (T)} \sum_{0 < γ \leq T} f (γ)

.

By the construction of the Dirichlet approximants in Lemma 3.1 (see also [4,7]), there exist complex coefficients

{a_{n}}_{n \leq X}

and

{b_{n}}_{n \leq X}

(depending on the truncation parameter X) such that, uniformly for

0 < γ \leq T

,

D_{X} (γ) = ℜ (\sum_{n \leq X} a_{n} n^{- i γ}), E_{X} (γ) = ℜ (\sum_{n \leq X} b_{n} n^{- i γ}),

and the coefficients satisfy the short Dirichlet-polynomial bounds

\sum_{n \leq X} | a_{n} |^{2}, \sum_{n \leq X} {| b_{n} |}^{2} ≪ \log \log T,

with implied constants absolute. These are classical in mean value studies of

ζ^{'} (ρ)

and its logarithm (cf. [1,2,5]).

Define the combined coefficients

c_{n} : = u a_{n} + v b_{n} (n \leq X),

so that

S (γ) = ℜ (\sum_{n \leq X} c_{n} n^{- i γ}) .

It will be convenient to write

\tilde{S} (γ) : = \sum_{n \leq X} c_{n} n^{- i γ},

so that

S (γ) = \frac{1}{2} (\tilde{S} (γ) + \bar{\tilde{S} (γ)})

. The

ℓ^{2}

-bound on coefficients gives

\sum_{n \leq X} {| c_{n} |}^{2} ≪ (u^{2} + v^{2}) \log \log T .

(9)

The cumulant generating function (log-MGF) of

S (γ)

is

\log error [e^{S (γ)}] = \sum_{k \geq 1} \frac{κ_{k} (S)}{k!},

where

κ_{k} (S)

is the k-th cumulant. We aim to show that

| κ_{k} (S) | \leq C^{k} k! {(| u | + | v |)}^{k} {(\log \log T)}^{k / 2},

(10)

for an absolute

C > 0

, following the Gaussian-cumulant method used in [4,7].

Expanding

S (γ)

as a linear statistic of exponentials, the k-th cumulant reduces to averages of the form

\frac{1}{N (T)} \sum_{0 < γ \leq T} n_{1}^{- i γ} \dots n_{ℓ}^{- i γ} n_{ℓ + 1}^{i γ} \dots n_{k}^{i γ},

with coefficients

c_{n_{j}}

.

If

\prod_{j = 1}^{ℓ} n_{j} = \prod_{j = ℓ + 1}^{k} n_{j}

(diagonal), the average contributes its full weight. Summing over all diagonal tuples gives

≪ k! (\sum_{n \leq X} | c_{n} |^{2})^{k / 2} ≪ k! (| u | + {| v |)}^{k} {(\log \log T)}^{k / 2},

which is the Gaussian size (cf. [4,18,7]).

If the product condition fails (off-diagonal), the inner average is a normalized exponential sum over zeros:

\frac{1}{N (T)} \sum_{0 < γ \leq T} e^{i γ t}, t = \log \frac{n_{ℓ + 1} \dots n_{k}}{n_{1} \dots n_{ℓ}} .

By Montgomery’s pair correlation and its refinements [17,28,29], such averages are small for nontrivial t, giving a saving of size

O ({(\log T)}^{- A})

in the short Dirichlet range. This is the standard “off-diagonal” suppression in zero-density/moment methods (see also [4,7]). Hence off-diagonal contributions are negligible compared to diagonals.

Combining both cases yields (10). Summing the cumulant series, the quadratic term contributes

\frac{1}{2} (u, v) Σ_{X} {(u, v)}^{T},

where

Σ_{X}

is the covariance matrix of

(D_{X}, E_{X})

, while higher cumulants contribute at most

O ((| u | + {| v |)}^{3} {(\log \log T)}^{3 / 2}),

provided

max (| u |, | v |) \leq 1 / (2 C_{1} \sqrt{\log \log T})

with

C_{1} = 2 C

. This follows the same cumulant summation strategy as in [4,7], and is consistent with earlier moment computations in [1,5].

Exponentiating, we obtain

\frac{1}{N (T)} \sum_{0 < γ \leq T} exp (u D_{X} (γ) + v E_{X} (γ)) \leq exp (\frac{1}{2} (u, v) Σ_{X} {(u, v)}^{T} + O ((| u | + {| v |)}^{3} {(\log \log T)}^{3 / 2})),

as claimed. □

Joint entropy and exclusion of multiple zeros.

Define the empirical joint law of the vectors

(D_{X} (γ_{j}), E_{X} (γ_{j}))

over blocks of consecutive zeros, and let

H_{joint} (γ)

be its Shannon entropy. Adapting the entropy decrease method [10,11], we obtain the following:

Lemma 8

(Joint entropy rarity). For every fixed

B > 0

, the number of zeros

γ \leq T

contained in blocks with

H_{joint} (γ) \leq \frac{1}{2} \log \log T - B

is

≪_{B} N (T) {(\log T)}^{- B}

.

On the complement of this negligible exceptional set, the empirical joint distribution is close in Kullback–Leibler divergence to the Gaussian law from Proposition 3, and hence by Pinsker’s inequality the pair

(D_{X}, E_{X})

cannot both be small except with exponentially decaying probability. But

ζ (ρ) = ζ^{'} (ρ) = 0

would require exactly such simultaneous smallness. We therefore conclude:

Theorem 2

(Asymptotic simplicity of zeros on high-entropy blocks). Assume RH. Let Γ be a block of

m = m (T)

consecutive zeros with

m \to \infty

and

m = o ({(\log T)}^{A})

for any fixed

A > 0

. If the block cumulant bounds of Lemma 4 and the MGF bounds of Proposition 1 hold uniformly in Γ, then the proportion of multiple zeros within Γ tends to zero as

T \to \infty

. Consequently, all but

o (N (T))

zeros of

ζ (s)

up to height T are simple.

Proof.

Assume for contradiction that there exists

δ > 0

and a sequence

T \to \infty

for which a proportion at least

δ

of the zeros in the block

Γ

are multiple. For each

ρ \in Γ

set

X_{ρ} : = - \log |ζ^{'} (ρ)|,

so that any multiple zero satisfies

X_{ρ} = + \infty

. Since

{ρ multiple} \subset {X_{ρ} \geq V}

for every finite

V > 0

, controlling the tail probabilities of

X_{ρ}

also controls the frequency of multiple zeros.

By Proposition 1, together with Dirichlet-polynomial approximations for

\log | ζ^{'} |

[4,7], there exists a variance scale

σ_{T}^{2} ≍ \log \log T

and constants

t_{0} > 0

,

C > 0

such that for every real t with

| t | \leq t_{0}

and uniformly for

ρ \in Γ

,

error [e^{t X_{ρ}}] \leq exp (\frac{1}{2} t^{2} σ_{T}^{2} + o (1)),

where the

o (1)

term tends to 0 as

T \to \infty

, uniformly in

ρ

and t. Chernoff’s inequality then implies

Pr (X_{ρ} \geq V) \leq exp (- t V + \frac{1}{2} t^{2} σ_{T}^{2} + o (1)),

and choosing

t = V / σ_{T}^{2}

(valid for our range of V) yields

Pr (X_{ρ} \geq V) \leq exp (- \frac{V^{2}}{2 σ_{T}^{2}} + o (1)) .

(11)

Let

I_{ρ} (V) = 1 {X_{ρ} \geq V}

and

S_{Γ} (V) = \sum_{ρ \in Γ} I_{ρ} (V)

. The block cumulant bounds of Lemma 4 control the mixed cumulants of

{I_{ρ} (V)}_{ρ \in Γ}

and force the cumulant generating function of

S_{Γ} (V)

to be quadratic to leading order for

| t | \leq t_{0}

. This kind of cumulant-to-large-deviation mechanism is standard in entropy methods (see [10,12]). Hence for some

\tilde{C} > 0

and uniformly in V in the admissible range,

\log error [e^{t S_{Γ} (V)}] \leq m \tilde{C} t^{2} Pr (X_{ρ} \geq V) + o (m) .

Markov’s inequality now gives

Pr (S_{Γ} (V) \geq δ m) \leq exp (- t δ m + m \tilde{C} t^{2} Pr (X_{ρ} \geq V) + o (m)) .

Substituting (11) and optimizing with

t = (δ / 2 \tilde{C}) exp (V^{2} / 2 σ_{T}^{2})

yields

Pr (S_{Γ} (V) \geq δ m) \leq exp (- c m exp (V^{2} / 2 σ_{T}^{2}) + o (m)),

for some constant

c > 0

.

Since

m = o ({(\log T)}^{A})

for every fixed

A > 0

while

σ_{T}^{2} ≍ \log \log T

, choose

V = σ_{T} \sqrt{3 \log m},

so that

V / σ_{T}^{2} \to 0

and

exp (V^{2} / 2 σ_{T}^{2}) = m^{3 / 2}

. Then

Pr (S_{Γ} (V) \geq δ m) \leq exp (- c m^{5 / 2} + o (m)) \to 0 .

But every multiple zero lies in

{X_{ρ} \geq V}

for all finite V, hence

Pr (# {ρ \in Γ : ρ multiple} \geq δ m) \leq Pr (S_{Γ} (V) \geq δ m) \to 0 .

Thus the assumption that a positive fraction

δ

of zeros in

Γ

are multiple leads to a contradiction. Therefore the proportion of multiple zeros within

Γ

tends to zero as

T \to \infty

.

Finally, covering all zeros up to height T with

O (N (T) / m) = O (T / (m \log T))

such blocks and applying a union bound (which is harmless because of the super-exponential decay above) yields that all but

o (N (T))

zeros up to height T are simple. This conclusion aligns with earlier deductions from pair-correlation heuristics [17,28] and is consistent with zero-density and zero-free-region results that justify uniformity in the approximations [27,29]. □

Discussion.

This result shows that any multiple zeros of

ζ (s)

must be confined to negligible exceptional sets where either the Dirichlet approximation fails or the joint entropy is abnormally low. In particular, the entropy–sieve framework provides a quantitative reinforcement of the long-standing belief that all nontrivial zeros are simple (see [8,9]), and it is powerful enough to eliminate multiple zeros from the regime relevant to negative moments of

ζ^{'} (ρ)

. This mechanism is crucial for controlling the conjectured asymptotics of

J_{k} (T)

for

k < 0

, especially the borderline case

k = - 1

(cf. [18]).

7. Comparison with Related Work and Motivation

Motivation for Comparison

The study of negative moments of

ζ^{'} (ρ)

sits at the intersection of several active areas in analytic number theory: random matrix heuristics, Dirichlet-polynomial and moment generating function (MGF) methods, and entropy-based large deviation control. Our entropy–sieve method (ESM) was designed to synthesize these ideas in order to (i) control exceptionally small values of

| ζ^{'} (ρ) |

, which threaten divergence of negative moments, and (ii) produce explicit, quantitative tail bounds valid for nearly all zeros (up to negligible exceptional sets). This section places our approach in the broader landscape.

Random-Matrix and Hybrid Euler–Hadamard Approaches

The random-matrix framework of Hughes, Keating and O’Connell [1] gives the original heuristic for the global behaviour of

ζ^{'} (ρ)

, predicting both the shape of moment conjectures and the role of arithmetic factors. Bui, Gonek and Milinovich (see, e.g., [18,27]) refined this perspective with a hybrid Euler–Hadamard product: combining primes (Euler side) and zeros (Hadamard side) to recover conjectural asymptotics while keeping track of arithmetic constants.

High-Moment and MGF/Chernoff Techniques

Harper [7] introduced sharp conditional bounds for

ζ

by decomposing

\log ζ

into short Dirichlet polynomials and bounding their cumulants via MGF/Chernoff inequalities. This approach is the modern backbone for large-deviation control. Kirila [4] adapted these methods to the discrete setting of

ζ^{'} (ρ)

, proving conditional upper bounds for a wide range of discrete moments. Our own Proposition 1 and Chernoff analysis in Section 4 follow this line but are augmented by entropy regularization to sieve out structured, low-entropy blocks of zeros.

Negative Discrete Moments and Subfamily Averaging

The most recent advance is due to Bui, Florea and Milinovich [18], who established strong conditional bounds for negative moments of

ζ^{'} (ρ)

when restricted to carefully chosen subfamilies of zeros. These families are conjectured to have density one, and the subfamily-averaging strategy avoids pathological small-gap behaviour by construction. Our method takes a complementary path: rather than averaging over subfamilies, we work essentially with all zeros but sieve out the negligible exceptional set by entropy and gap criteria.

Hejhal and Classical Distribution Results

Hejhal [3] analysed the distribution of

\log | ζ^{'} (1 / 2 + i γ) |

, showing Gaussian-like fluctuations in certain regimes. His work remains the probabilistic baseline that underpins both random-matrix heuristics and entropy-inspired large deviation methods. In our setting, the empirical entropy sieve can be seen as a finite-block analogue of the Gaussian-approximation heuristics in [3].

Synthesis and Distinctives of the ESM

In summary:

Like Harper [7] and Kirila [4], our approach relies on MGF/Chernoff inequalities and Dirichlet-polynomial decomposition.
Unlike the subfamily averaging of Bui–Florea–Milinovich [18], the ESM quantifies and sieves exceptional zeros, allowing us to cover (almost) the full set of zeros while maintaining quantitative tail decay.
Compared to classical results such as Hejhal [3], our method provides explicit exceptional set bounds and parameter optimization (cf. Section 6.6), which are crucial for negative moment control.

Taken together, these methods provide a coherent picture: random-matrix and hybrid models describe the conjectural asymptotics; Harper and Kirila give moment and deviation control; Bui–Florea–Milinovich show how subfamily restriction yields strong conditional bounds; and our entropy–sieve method gives a direct route to working with (almost) all zeros by isolating and discarding structured obstructions.

Comparison Table

For clarity we summarize the methodological differences below:

Table 4. Comparison of approaches to discrete moments of

ζ^{'} (ρ)

.

Table 4. Comparison of approaches to discrete moments of

ζ^{'} (ρ)

.

Work	Method	Assumptions	Main output / limitation
Hughes–Keating–O’Connell [1]	Random matrix model for $ζ^{'} (ρ)$	Heuristic (RMT)	Predicts conjectural asymptotics and arithmetic factors; not rigorous.
Hejhal [3]	Distributional analysis of $\log \| ζ^{'} \|$	RH (for sharp results)	Approx. Gaussian law for $\log \| ζ^{'} \|$ ; limited quantitative bounds.
Harper [7]	Dirichlet polynomials + MGF/Chernoff	RH + pair correlation	Sharp conditional moment bounds for $ζ$ .
Kirila [4]	Discrete adaptation of Harper’s method	RH	Conditional upper bounds for discrete moments of $ζ^{'} (ρ)$ .
Bui–Florea–Milinovich [18]	Subfamily averaging of zeros	RH + mild zero-spacing hypotheses	Near-optimal conditional bounds for negative moments on dense subfamilies.
This work (ESM)	Entropy + gap sieve + MGF/Chernoff	RH + mean-value inputs	Tail bounds for $\log \| ζ^{'} \|$ over almost all zeros; explicit exceptional set size.

8. Conclusions

In this paper we developed an entropy–sieve framework for bounding negative moments of

ζ^{'} (ρ)

, proving that under RH and quantitative pair-correlation assumptions one has

J_{- 1} (T) ≪ T {(\log T)}^{ε} .

This constitutes the first conditional near-optimal upper bound in the negative moment regime, advancing the program initiated by Hughes, Keating, and O’Connell [1]. Our method systematically integrates three key components:

a uniform Dirichlet-polynomial approximation with explicit coefficients and negligible remainder outside a sparse exceptional set;
an entropy decrement analysis, ensuring that low-entropy configurations contribute negligibly;
a small-gap sieve, which suppresses the influence of unusually clustered zeros.

Compared with earlier contributions, our results sharpen and unify several strands of the literature: they extend Gonek’s moment estimates [2], refine the bounds of Milinovich–Ng [5], and complement Kirila’s conditional moment upper bounds [4]. Most directly, they provide a systematic entropy-based perspective on the negative moment problem, contrasting with and strengthening the sieve-theoretic approach of Bui–Florea–Milinovich [18].

Several open directions remain. First, pushing the admissible range of the small-gap decay parameter

α

and refining the MGF regime could potentially remove residual logarithmic losses. Second, adapting the method to treat higher negative moments or mixed moments such as

\sum | ζ^{'} {(ρ) |}^{- 2 k}

for

k > 1

would provide deeper insights. Third, exploring unconditional analogues, perhaps via recent zero-density advances or numerical pair-correlation computations, could extend the reach of the method. Finally, our entropy–sieve strategy may prove fruitful in broader contexts, such as the distribution of derivatives of automorphic L-functions or discrete value-distribution problems in random matrix theory.

In summary, the entropy-sieve method not only delivers new conditional results on

J_{- 1} (T)

but also offers a structured framework that clarifies how entropy, sieve, and moment techniques interact. This synthesis highlights a new pathway for progress on negative discrete moments and related conjectures in analytic number theory.

Appendix A. Computational Notebook and Numerical Experiments

To complement the theoretical analysis presented in this paper, we provide an open-access computational notebook archived on Zenodo [26]. The notebook implements a reproducible framework for computing the decay constants

c_{1}

and

c_{2}

associated with the pair-correlation of nontrivial zeros of the Riemann zeta function. These constants are extracted from the exponential sum

A (u; T) = \frac{1}{N (T)} \sum_{0 < γ \leq T} e^{i γ u},

where the ordinates

γ

are the imaginary parts of zeta zeros up to height T.

The algorithm consists of the following steps:

Compute the first M nontrivial zeros of $ζ (s)$ up to height T.
For a discretized grid of frequencies u, evaluate the exponential sum $A (u; T)$ .
Introduce thresholds $u_{thresh} = {(\log T)}^{- c_{1}}$ for fixed constants $c_{1} > 0$ .
Measure the supremum ${sup}_{| u | \geq u_{thresh}} | A (u; T) |$ .
Fit the decay law $sup | A (u; T) | ≪ {(\log T)}^{- {\hat{c}}_{2}}$ to estimate the constant $c_{2}$ .

Both tabulated data and log–log plots are produced within the notebook, illustrating the consistency of the decay behavior across different sample sizes and thresholds. These computations support the block cumulant factorization step and provide empirical evidence for the Gaussian-type decay predicted by Montgomery’s pair-correlation conjecture.

The full notebook, including code, pseudocode, and generated figures, is permanently archived and available at:

https://zenodo.org/records/17015588

This ensures long-term reproducibility of the experiments and allows readers to extend the computations with larger datasets of zeta zeros.

References

C. P. Hughes, J. P. Keating, and N. O’Connell. Random matrix theory and the derivative of the Riemann zeta function. Proc. Roy. Soc. Lond. A, 456(2000), 2611–2627. [CrossRef]
S. M. Gonek. Mean values of the Riemann zeta function and its derivatives. Invent. Math., 75(1984), 123–141. [CrossRef]
D. A. Hejhal. On the distribution of log|ζ′(1/2+iγ)|. In Number Theory, Trace Formulas, and Discrete Groups, pages 343–370. Academic Press, 1989.
M. Kirila. An upper bound for discrete moments of the derivative of the Riemann zeta-function. Mathematika, 66(1): 1–36, 2020. Preprint available at https://arxiv.org/abs/1804.08826.
M. B. Milinovich and N. Ng. Lower bounds for moments of ζ′(ρ). International Mathematics Research Notices, 2014(15), 4098–4126.
H. M. Bui, A. Florea, and M. B. Milinovich. Negative discrete moments of the derivative of the Riemann zeta-function. Bulletin of the London Mathematical Society, 2024. Preprint available at https://arxiv.org/abs/2310.03949. [CrossRef]
A. J. Harper. Sharp conditional bounds for moments of the Riemann zeta function. Quarterly Journal of Mathematics, 64(1): 83–109, 2013.
H. Davenport. Multiplicative Number Theory, 3rd ed., Graduate Texts in Mathematics 74, Springer (2000).
E. C. Titchmarsh. The Theory of the Riemann Zeta-Function, 2nd ed., revised by D. R. Heath-Brown, Oxford Univ. Press (1986).
T. Tao. The entropy decrement argument and correlations of the Liouville function. Blog post and lecture notes, 2015. Available at https://terrytao.wordpress.com/2015/05/05/the-entropy-decrement-argument-and-correlations-of-the-liouville-function/.
T. Tao. The entropy decrement method in analytic number theory. Lecture notes, UCLA, 2018. Available at https://arxiv.org/abs/1801.XXXX (unofficial transcription).
S. Chatterjee. A short survey of Stein’s method and entropy in large deviations. Probability Surveys, 11 (2014), 1–33.
K. Matomäki, M. Radziwiłł, and T. Tao. Sign patterns of the Liouville and Möbius functions. Forum of Mathematics, Sigma, 4 (2016), e14.
K. Matomäki and M. Radziwiłł. Multiplicative functions in short intervals. Annals of Mathematics, 183(3):1015–1056, 2016.
T. Tao and J. Teräväinen. The structure of correlations of multiplicative functions at almost all scales, with applications to the Chowla and Elliott conjectures. Algebra & Number Theory, 13(9):2103–2150, 2019. Preprint available at https://arxiv.org/abs/1804.05294. [CrossRef]
A. M. Odlyzko. The 10²⁰-th zero of the Riemann zeta function and 70 million of its neighbors. Preprint, 1989. Available at http://www.dtc.umn.edu/~odlyzko/unpublished/zeta.10to20.1992.pdf.
H. L. Montgomery. The pair correlation of the zeros of the zeta function. In Analytic Number Theory, Proc. Sympos. Pure Math. 24, pages 181–193. Amer. Math. Soc., 1973.
H. M. Bui, A. Florea, and M. B. Milinovich. Negative discrete moments of the derivative of the Riemann zeta-function. Bulletin of the London Mathematical Society, 56(8):2680–2703, 2024. Preprint available at https://arxiv.org/abs/2310.03949. [CrossRef]
J. Bourgain. On the correlation of the Möbius function with rank-one systems. Journal d’Analyse Mathématique, 125:1–36, 2015.
H. Bui, A. Florea, and M. B. Milinovich. Negative discrete moments of the derivative of the Riemann zeta-function. Bulletin of the London Mathematical Society, 56(8):2680–2703, 2024.
A. M. Odlyzko, The 10²⁰-th zero of the Riemann zeta function and 70 million of its neighbors, AT&T Bell Laboratories preprint, 1989.
The LMFDB Collaboration, The L-functions and Modular Forms Database, http://www.lmfdb.org/zeta/.
D. Platt, Numerical computations concerning the GRH, Math. Comp. 85 (2016), 3009–3027. [CrossRef]
A. H. Barnett, J. Magland, and L. af Klinteberg, A parallel nonuniform fast Fourier transform library based on an “exponential of semicircle” kernel, SIAM J. Sci. Comput. 41 (2019), no. 5, C479–C504.
F. Johansson et al., mpmath: a Python library for arbitrary-precision floating-point arithmetic, version 1.3.0 (2023), https://mpmath.org/.
R. Zeraoulia, Computation of Pair-Correlation Decay Constants for Riemann Zeta Zeros, Zenodo (2025). Available at: https://zenodo.org/records/17015588.
H. M. Bui and D. R. Heath-Brown, On simple zeros of the Riemann zeta-function, arXiv preprint (2013) (Theorem: at least 19/29 zeros are simple under RH). [CrossRef]
P. X. Gallagher and J. H. Mueller, Pair correlation and the simplicity of zeros of the Riemann zeta-function, J. Reine Angew. Math. 306 (1979), 136–146.
D. R. Heath-Brown, Zero density estimates for the Riemann zeta-function and Dirichlet L-functions, J. London Math. Soc. (2) 32 (1985), 1–13. [CrossRef]
L.-P. Arguin, P. Bourgade, M. Radziwiłł, K. Soundararajan, and M. Belius. Maximum of the Riemann zeta function on a short interval of the critical line. Communications on Pure and Applied Mathematics, 72(3):500–536, 2019. [CrossRef]

Figure 1. Decay of the exponential sum

A (u; T)

with frequency u for

M = 100

and

M = 200

zeros.

Figure 1. Decay of the exponential sum

A (u; T)

with frequency u for

M = 100

and

M = 200

zeros.

Table 2. Numerical estimates of pair-correlation decay constants. Here M is the number of zeros used, T the height of the largest zero,

u_{thresh} = {(\log T)}^{- c_{1}}

, and

{\hat{c}}_{2}

the fitted exponent from

{sup}_{| u | \geq u_{thresh}} | A (u; T) | ≪ {(\log T)}^{- {\hat{c}}_{2}}

.

Table 2. Numerical estimates of pair-correlation decay constants. Here M is the number of zeros used, T the height of the largest zero,

u_{thresh} = {(\log T)}^{- c_{1}}

, and

{\hat{c}}_{2}

the fitted exponent from

{sup}_{| u | \geq u_{thresh}} | A (u; T) | ≪ {(\log T)}^{- {\hat{c}}_{2}}

.

M	T	N	$c_{1}$	$u_{thresh}$	$sup \| A (u; T) \|$	${\hat{c}}_{2}$
100	236.52	100	0.6	0.361	0.173	1.032
100	236.52	100	0.8	0.257	0.173	1.032
100	236.52	100	1.0	0.183	0.173	1.032
200	396.38	200	0.6	0.342	0.151	1.057
200	396.38	200	0.8	0.239	0.151	1.057
200	396.38	200	1.0	0.167	0.151	1.057

Table 3. Summary of tunable parameters in the entropy–sieve method.

Param.	Role	Typical choice	Trade-off
$X = {(\log T)}^{A}$	Truncation length	$A \geq 4$ –8 (polylog)	Larger A: smaller remainder, harder moments
$E_{app}$	Approx. failure set	$\| E_{app} {\| ≪ N (T) (\log T)}^{- B}$	Bigger B ⇒ bigger A
$E_{ent}$	Low-entropy set	Block length $m ≍ {(\log \log T)}^{c}$	Larger m: better entropy, costlier cumulants
C	Remainder tolerance	$C = 1$ –3	Larger C: stronger control, bigger A
B	Power-saving exponent	$B = 5$ –10	Larger B: bigger A or higher moments
$α$	Small-gap sieve rate	$α = 1.1$ –2	Larger $α$ : faster decay, limited by MGF
$c_{MGF}$	MGF tail rate	$t_{0} ≍ 1 / \sqrt{\log \log T}$ , $c_{MGF} \sim t_{0} / 2$	Fixed by X, controls linear tail
m	Entropy block length	$m \to \infty$ slowly	Larger m: smaller entropy set, more cost

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permit the free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.

On the Hughes–Keating–O’Connell Conjecture: Entropy-Sieve Methods for Negative Moments of ζ′(ρ)

Abstract

Keywords:

Subject:

1. Introduction

1.1. Motivation and Conjectures

1.2. State of the Art

1.3. Challenges for Negative Moments

1.4. Our Approach and Contributions

1.5. Organization

Main Results

2. Background

2.1. The Hughes–Keating–O’Connell Conjecture

2.2. Positive Moments

2.3. Negative Moments

2.4. Summary

3. Entropy-Based Approximation and Gaussian Large-Deviation Bounds

3.1. Assumption Framework

3.2. Notation and Choice of Parameters

3.3. Dirichlet-Polynomial Approximation for log | ζ ′ ( ρ ) |

3.3.1. Choice of the Truncation Length X

3.3.2. Hypotheses, Coefficients, and Quantitative Bounds

3.4. Variance Calculation

3.5. Moment Generating Function Bounds

3.6. Gaussian lower-tail via Chernoff inequality

3.7. Quantitative Parameter Selection

Choice of k.

Application of Markov.

Choice of A.

Admissible range for t.

Summary.

4. Entropy–Sieve Method (ESM)

4.1. Definitions and Notation

4.2. Numerical Determination of Orthogonality Constants c 1 , c 2

4.3. Numerical Plot Analysis and Compatibility with Table

4.4. Entropy Control of Approximation Errors

4.5. Remarks and references

5. Sieve-Theoretic Component

6. Conditional Upper Bounds for Negative Moments

6.1. Notation and Small-Gap Sets

6.2. Small-Gap Counting via Pair-Correlation

6.3. Entropy–Sieve Decay Lemma (New, Hybrid lemma)

6.4. Roadmap for Section 6.3

6.5. Parameter Choices and Exceptional Sets: A Systematic Discussion

6.6. Choosing Parameters and Explicit β

6.7. Consequences for Negative Moments

6.8. References and Remarks

6.9. Eliminating Multiple Zeros via the Entropy-Sieve Method

Hadamard product and log-derivative.

Dirichlet polynomial approximants for ζ and ζ ′ .

Joint MGF bound.

Joint entropy and exclusion of multiple zeros.

Discussion.

7. Comparison with Related Work and Motivation

Motivation for Comparison

Random-Matrix and Hybrid Euler–Hadamard Approaches

High-Moment and MGF/Chernoff Techniques

Negative Discrete Moments and Subfamily Averaging

Hejhal and Classical Distribution Results

Synthesis and Distinctives of the ESM

Comparison Table

8. Conclusions

Appendix A. Computational Notebook and Numerical Experiments

References

MDPI Initiatives

Important Links

Subscribe

3.3. Dirichlet-Polynomial Approximation for $\log | ζ^{'} (ρ) |$

4.2. Numerical Determination of Orthogonality Constants $c_{1}, c_{2}$

6.6. Choosing Parameters and Explicit $β$

Dirichlet polynomial approximants for $ζ$ and $ζ^{'}$ .