Preprint
Article

This version is not peer-reviewed.

On the Hughes–Keating–O’Connell Conjecture: Entropy-Sieve Methods for Negative Moments of ζ′(ρ)

Submitted:

02 September 2025

Posted:

04 September 2025

You are already at the latest version

Abstract
We investigate the negative discrete moments of the derivative of the Riemann zeta function at its nontrivial zeros, focusing on the Hughes–Keating–O’Connell conjecture. Building on the earlier frameworks of Gonek, Milinovich–Ng, Kirila, and the recent breakthrough of Bui–Florea–Milinovich, we introduce a hybrid entropy–sieve method (ESM). This method refines Dirichlet-polynomial approximations by quantifying entropy of local distributions of \( D_X(\gamma) \) and controlling contributions from both small gaps and low-entropy blocks. Assuming the Riemann Hypothesis and standard pair-correlation conjectures, we prove the near-optimal conditional upper bound \( J_{-1}(T) \;=\; \sum_{0<\gamma\leq T} \frac{1}{|\zeta'(\rho)|^{2}} \;\ll\; T(\log T)^{\varepsilon}. \)This matches, up to logarithmic factors, the conjectured order \( J_{-1}(T)\asymp T \), improving upon previous conditional bounds in the literature. Our approach complements the sieve and moment methods of Bui–Florea–Milinovich and the entropy-based large deviation heuristics of Harper, while introducing new tools such as a uniform Dirichlet-polynomial approximation with explicit coefficients and quantitative entropy-decay estimates. Beyond these results, the ESM framework highlights the utility of entropy techniques in analytic number theory, suggesting applications to related problems in L-function theory and random matrix models.
Keywords: 
;  ;  ;  ;  
For the reader’s convenience, we summarize the main notation that will be used consistently throughout the paper. Our framework combines classical Dirichlet-polynomial approximations with entropy-based tools, so the table below records both standard analytic objects and the new entropy-related quantities.
Table 1. Summary of notation used throughout the paper.
Table 1. Summary of notation used throughout the paper.
General Notation
T Height parameter for critical zeros of ζ ( s ) ; we consider zeros ρ = 1 2 + i γ with 0 < γ T , counted with multiplicity.
N ( T ) Number of zeros ρ = 1 2 + i γ with 0 < γ T .
ρ A nontrivial zero of ζ ( s ) , written as ρ = 1 2 + i γ .
E app Exceptional set of zeros where the Dirichlet-polynomial approximation fails (Lemma 7).
G Set of “good” zeros: γ E app and outside any exceptional sieve/entropy set.
Dirichlet Polynomial Approximation
X Length of Dirichlet polynomial; throughout we take X = T α with small fixed 0 < α < 1 / 100 .
D X ( γ ) Main Dirichlet-polynomial approximant: D X ( γ ) = n X a n n 1 / 2 i γ .
a n Dirichlet polynomial coefficients derived from the smoothed explicit formula .
R X ( γ ) Remainder term in Dirichlet-polynomial approximation of log | ζ ( ρ ) | .
σ X 2 Variance of D X ( γ ) : σ X 2 = n X | a n | 2 n log log T (Lemma 1).
Moment Generating Function & Tail Estimates
M ( t ) Moment generating function: M ( t ) = 1 N ( T ) 0 < γ T e t D X ( γ ) .
κ r r-th cumulant of D X ( γ ) , defined by log M ( t ) = r 1 κ r t r / r ! .
t 0 Admissible range of t for MGF bounds: t 0 = c / log log T .
N ( V ; T ) Lower-tail counting function: N ( V ; T ) = # { γ T : log | ζ ( ρ ) | V } .
V Threshold parameter controlling the size of log | ζ ( ρ ) | in tail estimates.
Entropy-Sieve Framework
G ( γ 0 ) Local window of zeros near γ 0 used for entropy sampling.
H h , Δ val ( γ 0 ) Local value-entropy of D X ( γ ) in a window G ( γ 0 ) with bin-width h (Definition ??).
H h g , Δ gap ( γ 0 ) Local gap-entropy of normalized zero spacings near γ 0 .
H 0 Entropy threshold; zeros with entropy below H 0 belong to the exceptional low-entropy set.
E ent Exceptional set of zeros lying in low-entropy regions (see Lemma 3).
Moments and Sieve
J k ( T ) Discrete moment: J k ( T ) = 0 < γ T | ζ ( ρ ) | 2 k , defined for all k R ; for k < 0 , finiteness implies no multiple zeros.
J k simp ( T ) Same sum restricted to simple zeros (used in intermediate lemmas for clarity).
c , C 0 Absolute positive constants appearing in Gaussian and sieve bounds.

1. Introduction

Let ζ ( s ) denote the Riemann zeta function and ρ = 1 2 + i γ denote its nontrivial zeros. The size of the derivative ζ ( ρ ) at these zeros plays a central role in analytic number theory, with deep connections to the distribution of zeros, random matrix theory, and the moments of L-functions. For k C , we define the discrete moment
J k ( T ) : = 0 < γ T | ζ ( ρ ) | 2 k ,
where the sum runs over all nontrivial zeros ρ of ζ ( s ) , counted with multiplicity. For k < 0 this sum is finite only if every zero is simple, since a multiple zero would satisfy ζ ( ρ ) = 0 and force J k ( T ) = + . Thus, proving upper bounds for J k ( T ) in the negative range has direct implications for the simplicity of zeros.

1.1. Motivation and Conjectures

Understanding the asymptotic growth of J k ( T ) has been the subject of considerable research. Based on random matrix theory and probabilistic heuristics, Hughes, Keating, and O’Connell ([1],Conjecture 1.7, p. 5) conjectured that for ( k ) > 3 2 ,
J k ( T ) G 2 ( 2 + k ) G ( 3 + 2 k ) a ( k ) T 2 π log T 2 π ( k + 1 ) 2 ,
where G ( · ) is the Barnes G-function and a ( k ) is an explicit arithmetic factor. In particular, for k = 1 , conjecture (1) predicts
J 1 ( T ) T ,
so the negative second moment is expected to be of the same order as the number of zeros up to height T.

1.2. State of the Art

For positive moments ( k 0 ), significant progress has been achieved:
  • Gonek ([2], p. 35) initiated the study of discrete moments of ζ ( ρ ) and derived asymptotic formulas for J 1 ( T ) under the Riemann Hypothesis (RH).
  • Hejhal ([3], Section 3, pp. 343–370) studied the distribution of log | ζ ( ρ ) | and showed that it behaves approximately like a Gaussian with variance log log T , providing the foundation for later probabilistic approaches.
  • Kirila ([4], Theorem 1.1, pp. 2–4) obtained sharp upper bounds for positive moments by adapting Harper’s Dirichlet-polynomial techniques to sums over zeros:
    J k ( T ) k N ( T ) ( log T ) k ( k + 2 ) ,
    where N ( T ) denotes the number of zeros up to height T.
  • Harper’s probabilistic method ([7], pp. 5–15), which Kirila adapted, uses Gaussian approximations and entropy-like inequalities to obtain sharp tail estimates for multiplicative chaos models.
These results match the predictions of the Hughes–Keating–O’Connell conjecture for k > 0 .
For negative moments ( k < 0 ), however, far less is known:
  • Gonek ([2], p. 36) derived conditional lower bounds for J k ( T ) when k < 0 but did not provide upper bounds.
  • Milinovich and Ng ([5], pp. 642–644) improved certain lower bounds for negative moments, using refined estimates of ζ ( ρ ) in terms of the spacing of zeros.
  • Recently, Bui, Florea, and Milinovich ([18], Theorem 1.3, pp. 3–6) obtained conditional upper bounds for negative moments over a large subfamily of zeros, excluding a sparse exceptional set where ζ ( ρ ) may be abnormally small. However, a full unconditional upper bound for J k ( T ) when k < 0 remains open.

1.3. Challenges for Negative Moments

The difficulty in establishing upper bounds for J k ( T ) when k < 0 stems from controlling the contribution of zeros where ζ ( ρ ) is exceptionally small. Since
J 1 ( T ) = 0 < γ T 1 | ζ ( ρ ) | 2 ,
the dominant contribution arises from rare events in which ζ ( ρ ) is unusually tiny. Hejhal’s model ([3], Section 3) suggests that log | ζ ( ρ ) | behaves like a Gaussian with variance log log T , implying that very small derivatives are exponentially rare. However, making this rigorous for sums over zeros requires two ingredients:
  • Sharp Gaussian-type tail bounds for log | ζ ( ρ ) | , obtained by approximating it with a short Dirichlet polynomial and applying entropy-based large-deviation methods ([7], pp. 5–20).
  • Control over the set of exceptional zeros where the approximation fails or where ζ ( ρ ) is extremely small, addressed via sieve-theoretic exclusion techniques as in ([18], Section 6).

1.4. Our Approach and Contributions

In this paper, we propose a hybrid analytic–probabilistic framework to tackle the upper bound for J k ( T ) when k < 0 , combining three main ingredients:
  • Entropy-Sieve Method (ESM): We introduce an entropy-based refinement of the Dirichlet-polynomial approximation. By quantifying the entropy of local distributions of D X ( γ ) values and zero gaps, we ensure that low-entropy regions form a negligible exceptional set. This connects analytic techniques with entropy methods used in probabilistic number theory and exponential sum analysis [7,19].
  • Sieve methods for exceptional zeros: Building on Bui, Florea, and Milinovich ([18], Section 6), we remove a negligible set of zeros where ζ ( ρ ) is abnormally small, using pair-correlation and independence heuristics to bound their contribution. Our systematic discussion of parameter optimization (see Section 4) clarifies how A , B , C , α can be tuned so that both E app and E ent are negligible.
  • Algorithmic tail truncation: We develop an entropy-driven tail-truncation procedure to efficiently control the extreme lower tail of ζ ( ρ ) , ensuring that these rare events contribute less than any power of log T .
Using these tools, we establish, under RH and mild orthogonality hypotheses, the conditional bound
J 1 ( T ) T ( log T ) ε ,
which matches the conjectured order up to a logarithmic factor. Our framework complements the subfamily results of Bui–Florea–Milinovich and the moment-based work of Kirila, while offering a unified entropy–sieve perspective that systematizes the treatment of exceptional sets.

1.5. Organization

We briefly summarize the logical structure of the paper. Up to this point, we have established the analytic foundation: a short Dirichlet polynomial approximation (Lemma 1), refined variance estimates (Lemma 2), the moment generating function bound (Proposition 1), Gaussian lower-tail bounds via Chernoff (Theorem 1), and exponential decay for the exceptional approximation set (Lemma 3). These ingredients will be combined with entropy and sieve methods to treat the negative moments of ζ ( ρ ) in a consistent framework, avoiding the divergences that arise if exceptional sets are not carefully controlled.
The remainder of the paper is organized as follows. Section 2 reviews previous results on positive and negative moments of ζ ( ρ ) , with particular emphasis on the conjectural framework of Hughes–Keating–O’Connell. In Section 4 we introduce the Entropy–Sieve Method (ESM), which strengthens Dirichlet-polynomial approximations of log | ζ ( ρ ) | by incorporating entropy-based regularity, and thereby yields robust Gaussian large-deviation bounds. Section 5 develops the sieve-theoretic component, excluding low-entropy or small-gap exceptional sets where ζ ( ρ ) could be abnormally small. Finally, Section 6.9 combines these analytic and probabilistic tools to establish conditional upper bounds for
J k ( T ) = 0 < γ T 1 | ζ ( 1 2 + i γ ) | 2 k
in the critical range k < 0 , including the key case k = 1 .

Main Results

  • Entropy–Sieve Framework. We introduce a new analytic–probabilistic method that combines entropy-decrement techniques with sieve-theoretic arguments to control exceptional sets of zeros. This framework provides a novel approach to bounding negative moments of ζ ( ρ ) .
  • Conditional Upper Bound for Negative Moments. Assuming the Riemann Hypothesis and standard pair-correlation conjectures, we prove the near-optimal bound
    J 1 ( T ) T ( log T ) ε ,
    for any ε > 0 , in agreement with the Hughes–Keating–O’Connell conjecture up to logarithmic factors.
  • Asymptotic Simplicity of Zeros in High-Entropy Blocks (Theorem 2). Under RH and uniform cumulant/MGF bounds, the proportion of multiple zeros within long blocks tends to zero as T . Hence, all but o ( N ( T ) ) zeros of the Riemann zeta function are simple.
  • Joint MGF Bounds (Proposition 3). The mixed moment generating function of Dirichlet approximants admits a uniform Gaussian bound with covariance Σ X , up to cubic error terms in ( log log T ) 3 / 2 .
  • Numerical and Structural Evidence. Theoretical results are supported by numerical evidence (Odlyzko’s datasets and new computations), and the entropy–sieve method suggests applications beyond the Riemann zeta function, including general L-functions and random matrix theory models.

2. Background

The discrete moments of the derivative of the Riemann zeta function at its nontrivial zeros,
J k ( T ) = 0 < γ T | ζ ( ρ ) | 2 k ,
are central objects in analytic number theory. They provide insight into the distribution of ζ ( ρ ) , the spacing of the nontrivial zeros of ζ ( s ) , and the connections between the zeta function and random matrix theory. Understanding the asymptotic growth of J k ( T ) has been the subject of extensive research over the past decades and is closely connected with one of the most refined conjectures in this area: the Hughes–Keating–O’Connell conjecture.

2.1. The Hughes–Keating–O’Connell Conjecture

Motivated by random matrix theory and probabilistic models, Hughes, Keating, and O’Connell proposed an explicit formula for J k ( T ) in the regime ( k ) > 3 2 . Their conjecture predicts that
J k ( T ) G 2 ( 2 + k ) G ( 3 + 2 k ) a ( k ) T 2 π log T 2 π ( k + 1 ) 2 ,
where G ( · ) denotes the Barnes G-function and a ( k ) is an explicit arithmetic factor arising from the Euler product.
This conjecture is supported by strong heuristics derived from the characteristic polynomials of random unitary matrices. In these models, log | ζ ( ρ ) | behaves approximately like a Gaussian random variable, and Formula (2) reflects the matching asymptotics between the number-theoretic and random-matrix frameworks. A striking consequence appears when setting k = 1 , where the conjecture predicts
J 1 ( T ) T .
Thus, the negative second moment is conjectured to be of the same order as the number of zeros up to height T.

2.2. Positive Moments

The case of positive moments, k 0 , is relatively well understood and has seen substantial progress over the last four decades. Gonek ([2], Theorem 1, p. 35) pioneered the study of discrete moments of ζ ( ρ ) , proving under the Riemann Hypothesis that for k = 1 ,
J 1 ( T ) T 24 π log T 2 π 4 .
This result agrees with the prediction of (2) when k = 1 and represented one of the earliest confirmations of the conjecture in a special case.
Hejhal ([3], Section 3, Theorem 3.1, pp. 343–370) advanced the probabilistic understanding of ζ ( ρ ) by studying the distribution of log | ζ ( ρ ) | . He showed that, heuristically, log | ζ ( ρ ) | behaves approximately like a Gaussian random variable with variance σ 2 log log T . This probabilistic model suggested that extremely large or small values of ζ ( ρ ) are exponentially rare and laid the conceptual foundation for later entropy-based methods.
A major breakthrough came from Harper ([7], Theorem 2.1, pp. 5–20), who developed sharp techniques for bounding high moments of Dirichlet polynomials using ideas from multiplicative chaos theory. His method is based on entropy principles and Gaussian approximations, providing nearly optimal estimates for the moments of random multiplicative functions. Building on Harper’s framework, Kirila ([4], Theorem 1.1, pp. 2–4) adapted these ideas to the discrete setting of the zeta zeros and obtained sharp conditional upper bounds for positive moments:
J k ( T ) k N ( T ) ( log T ) k ( k + 2 ) ( k > 0 ) ,
where N ( T ) denotes the number of zeros up to height T. These results are fully consistent with the random matrix predictions of the Hughes–Keating–O’Connell conjecture, providing strong evidence in favor of (2) for k > 0 .

2.3. Negative Moments

In stark contrast to the positive regime, the behavior of J k ( T ) for negative k remains largely mysterious. The primary challenge stems from the fact that negative moments are dominated by the contribution of zeros ρ where | ζ ( ρ ) | is extremely small. Controlling this contribution requires strong bounds on the lower tail of log | ζ ( ρ ) | , a problem that has resisted classical techniques.
Early work by Gonek ([2], Theorem 2, p. 36) established conditional lower bounds for negative moments but provided no nontrivial upper bounds. Later, Milinovich and Ng ([5], Proposition 4.1, pp. 642–644) refined these lower bounds by relating ζ ( ρ ) to the spacing between consecutive zeros, but even these methods do not yield control over the full sum.
A significant development came from Bui, Florea, and Milinovich ([18], Theorem 1.3, pp. 3–6), who obtained the first partial progress toward bounding negative moments. By excluding a sparse exceptional set of zeros where ζ ( ρ ) is abnormally small, they proved conditional upper bounds for J k ( T ) over a large subfamily of zeros. However, their results stop short of proving the full conjectured bound for J 1 ( T ) or other negative moments over all zeros.
These contributions underline the difficulty of the negative moment problem: without precise control over extremely small values of ζ ( ρ ) , unconditional upper bounds remain out of reach. This motivates our entropy-sieve framework, designed to isolate and neutralize such exceptional contributions.

2.4. Summary

To summarize, positive moments of ζ ( ρ ) are now well understood, thanks to the interplay between Harper’s entropy-based techniques, Kirila’s discrete adaptations, and random matrix predictions. For negative moments, however, the lack of control over zeros with exceptionally small ζ ( ρ ) remains the key obstacle. Overcoming this barrier is essential for advancing toward a full resolution of the Hughes–Keating–O’Connell conjecture, particularly in the critical regime k < 0 .

3. Entropy-Based Approximation and Gaussian Large-Deviation Bounds

3.1. Assumption Framework

Throughout this section we assume the Riemann Hypothesis (RH). For technical steps where denominators involving ζ ( ρ ) arise, we restrict initially to the set of simple zeros
Z simp : = { ρ = 1 2 + i γ : ζ ( ρ ) 0 } ,
and define discrete averages over Z simp in place of all zeros. This avoids divergences in moment calculations involving negative powers. No generality is lost, since Z simp has the same density as the full zero set under standard pair-correlation heuristics (cf. [17,28,29]).
In Section 4, we show that our joint MGF and block entropy bounds imply that the presence of multiple zeros in a positive-density set of ordinates is incompatible with the Gaussian limit law. In particular, Theorem 1 below establishes that, under RH and the verified block large-deviation estimates, all but o ( N ( T ) ) zeros up to height T must in fact be simple. Thus the initial restriction to Z simp is later justified a posteriori.

3.2. Notation and Choice of Parameters

Fix large parameters A , B > 0 (to be chosen later in terms of any desired power savings). For T large define
X : = ( log T ) A , Y : = exp ( log log T ) 2 .
Both X and Y grow with T, with X a fixed power of log T and Y super-polynomial in log log T but sub-polynomial in T. We shall construct a short Dirichlet polynomial of length X to approximate log | ζ ( 1 2 + i γ ) | for most zeros γ T .
For a generic Dirichlet polynomial
D X ( γ ) : = n X a n n 1 / 2 + i γ ,
we define its variance
σ X 2 : = n X | a n | 2 n .
In our application the coefficients a n will be explicit (coming from a truncated Euler product or approximate functional equation for ζ ( s ) ), and we will have
σ X 2 log log T ,
uniformly for our range of parameters.

3.3. Dirichlet-Polynomial Approximation for log | ζ ( ρ ) |

3.3.1. Choice of the Truncation Length X

Throughout this section we fix
X = ( log T ) A ,
with A > 0 chosen large depending on the error exponents in subsequent lemmas. This polylogarithmic choice ensures that the Dirichlet polynomial approximation (Lemma 1) has a negligible error term, that the moment generating function bounds (Proposition 1) remain uniform for | t | t 0 1 / log log T , and that block cumulant factorization (Lemma 4) can be applied without enlarging off-diagonal terms. We emphasize that X = T θ with small fixed θ > 0 may also be treated with refinements of our arguments, but to avoid technical complications we restrict to the polylogarithmic case.

3.3.2. Hypotheses, Coefficients, and Quantitative Bounds

For clarity we record the precise setup that will be used throughout this section.
  • Hypothesis. We assume the Riemann Hypothesis (RH). All multiple zeros are placed into the exceptional set E app .
  • Truncation length. We fix
    X = ( log T ) A , A > 0 ,
    with A chosen large depending on the desired decay of the remainder (see Lemma 1).
  • Coefficients. Let w C c ( 0 , 2 ) be a fixed smooth cutoff with w ( u ) = 1 for 0 u 1 . Define
    a n : = Λ ( n ) log n w log n log X ,
    so a n is supported on prime powers n X 2 and is explicit and computable.
  • Dirichlet polynomial. For each zero ρ = 1 2 + i γ we define
    D X ( γ ) : = n 2 a n n 1 / 2 i γ .
  • Remainder and exceptional set. We set
    R X ( γ ) : = log | ζ ( 1 2 + i γ ) | D X ( γ ) ,
    and define an exceptional set
    E app : = 0 < γ T : | R X ( γ ) | > ( log log T ) C ,
    where C > 0 is arbitrary.
  • Quantitative bounds. For every C , B > 0 there exists A = A ( B , C ) such that
    | R X ( γ ) | C ( log log T ) C ( γ E app ) ,
    and
    | E app | B N ( T ) ( log T ) B .
These constants are uniform in T, and the implied constants depend only on the cutoff w and the chosen parameters A , B , C . This hypothesis package is exactly what Lemma 1 will establish.
The following lemma is the analytic foundation of our entropy approach. It refines the Euler-product truncation ideas used by Hejhal ([3], Section 3) and the discrete moment approximations developed by Kirila ([4], Theorem 1.1).
 Lemma 1
(Short Dirichlet-polynomial approximation). Assume the Riemann Hypothesis. Let T be large and put
X = ( log T ) A ,
with A > 0 . There exist explicit coefficients a n (computable from the smooth truncated explicit formula and supported on n X ) and an exceptional set E app { γ : 0 < γ T } such that for every ordinate γ T with γ E app and for which ζ ( 1 2 + i γ ) 0 we have
log ζ ( 1 2 + i γ ) = D X ( γ ) + R X ( γ ) , D X ( γ ) = n X a n n 1 / 2 i γ ,
and, uniformly for such γ,
| R X ( γ ) | C ( log log T ) C ,
for every fixed C > 0 , provided A = A ( C ) is taken sufficiently large. Furthermore, for any fixed B > 0 one may choose A = A ( B ) so that
E app B N ( T ) ( log T ) B .
Finally the coefficients a n are explicit: they arise from the smooth truncation of the explicit formula / Euler-product expansion for the derivative near a zero (in particular they are supported on prime-powers n X and are of the form of explicit prime-power weights).
 Proof. 
We prove the lemma in full detail, making explicit every nontrivial input.
Throughout we assume the Riemann Hypothesis (RH). Let ρ = 1 / 2 + i γ denote a nontrivial zero of ζ ( s ) . If a zero ρ has multiplicity > 1 we place it automatically into the exceptional set E app ; hence from now on we may restrict attention to simple zeros (this convention is recorded in the statement). Let N ( T ) denote the usual count of zeros 0 < γ T .
Fix a smooth cutoff function w ( u ) C c ( 0 , 2 ) with w ( u ) = 1 for u [ 0 , 1 ] and 0 w 1 . For X 2 define the smooth weight
W X ( n ) : = w log n log X ,
so that W X ( n ) = 1 for n X and W X ( n ) = 0 for n X 2 (any compactly supported smooth cutoff with these properties will do). Consider the truncated prime-power Dirichlet polynomial
P X ( s ) : = n 1 b n n s , b n : = Λ ( n ) log n W X ( n ) ,
where Λ is the von Mangoldt function. (The choice b n = Λ ( n ) / log n matches the standard expansion of log ζ ( s ) ; the specific smooth cutoff W X produces the uniform control we need.) By standard manipulations of the Euler product one has the formal identity (valid in a region of absolute convergence)
log ζ ( s ) = n 1 Λ ( n ) log n n s + ( small analytic terms ) .
Differentiating this identity in the region where it converges and then inserting the smooth cutoff yields the short polynomial
D ˜ X ( s ) : = n 1 a n n s , a n : = d d s Λ ( n ) log n W X ( n ) | s = 1 / 2 ,
so that D ˜ X ( 1 / 2 + i γ ) is the explicitly computable main Dirichlet-polynomial approximation to log ζ ( 1 / 2 + i γ ) . (Equivalently one may derive the same coefficients a n by applying the smoothed explicit formula to an appropriate test function tailored to recover log ζ ( s ) near s = 1 2 + i γ ; both constructions produce identical prime-power supported coefficients up to negligible boundary terms.) In the sequel we write
D X ( γ ) : = n X 2 a n n 1 / 2 i γ ,
and note that the contribution from n ( X , X 2 ] is included only for bookkeeping; by choosing the support of w sufficiently concentrated one may equally well take the sum truncated at n X and absorb the rest into the remainder R X .
For each simple zero ρ = 1 / 2 + i γ we define the remainder by the identity
R X ( γ ) : = log ζ ( 1 2 + i γ ) D X ( γ ) .
This remainder collects: (i) the contribution from prime-powers with n > X (and the smooth tail), (ii) contour-integral boundary terms arising from the truncated explicit-formula representation, and (iii) local contributions coming from zeros other than ρ which appear when shifting contours (these are handled in the explicit formula). An explicit derivation of this decomposition is standard: it follows from the contour-shift of the smoothed explicit formula applied to an approximate logarithmic derivative and is carried out in numerous references on short Dirichlet polynomial approximations to log ζ and to log ζ (compare the derivations in [3] for log | ζ | and in the short-polynomial literature for log ζ ). The important point is that D X ( γ ) is explicit and supported on prime-powers up to the chosen truncation parameter, and all other contributions are collected into R X ( γ ) .
To show that R X ( γ ) is uniformly small off a tiny exceptional set we bound high discrete moments of R X ( γ ) averaged over zeros and then apply Markov/Chebyshev. Concretely, fix an integer k 1 (to be chosen later) and consider the 2 k -th average
M 2 k : = 1 N ( T ) 0 < γ T R X ( γ ) 2 k .
Expand R X into its defining pieces (tail over n > X , boundary integrals, and zero-contributions) and bound each contribution in L 2 k -mean. The two crucial inputs for the resulting bounds are:
(A)
Discrete moment bounds for the derivative at zeros: Kirila [4] proves sharp upper bounds for discrete moments of ζ ( ρ ) (in ranges that cover the moment sizes we need). Concretely, for any fixed real k 1 one has an upper bound of the form
1 N ( T ) 0 < γ T ζ ( 1 2 + i γ ) 2 k k ( log T ) k 2 + o ( 1 ) ,
and variants of this estimate control mixed moments of ζ ( ρ ) against short Dirichlet polynomials built from primes up to X; these mixed-moment bounds are used below when comparing the full object to the truncated polynomial. (We apply Kirila’s discrete-moment estimates to handle any term in the expansion that involves ζ ( ρ ) directly.) See [4] for the precise uniform statements and ranges.
(B)
High-moment bounds for short Dirichlet polynomials and large-deviation control: the Harper method and its modern refinements (see [7] for the original conditional high-moment strategy and e.g. [30] and related short-polynomial literature for refinements) show that a sum of many short Dirichlet polynomials approximating log ζ (and likewise the adapted decomposition for the derivative) satisfies, for X = ( log T ) A and any fixed integer k 1 ,
1 T T 2 T n X a n n 1 / 2 i t 2 k d t k ( log log T ) C ( k ) ,
with an explicit polynomial dependence on k in the right-hand side. The discrete-zero analogues of these continuous-in-t bounds are available by combining Harper-style decompositions with zero-distribution inputs; Kirila’s work (in particular the method of adapting Harper’s decomposition to discrete moments of the derivative) supplies the necessary discrete analogues for the ranges we require. In particular, for X = ( log T ) A one obtains
1 N ( T ) 0 < γ T n X a n n 1 / 2 i γ 2 k k ( log log T ) C ( k ) ,
where C ( k ) is at most polynomial in k. See [7] and [4].
Using the two inputs above, expand | R X ( γ ) | 2 k by multinomial expansion and estimate each arising mixed moment by Hölder’s inequality together with the bounds from (A) and (B). Off-diagonal mixed terms that produce exponential sums of the form 0 < γ T e i γ u (with u built from logarithms of integers coming from the multinomial expansion) are controlled using Montgomery pair-correlation type estimates (the classical arguments of Montgomery and the refinements used in the short-polynomial literature show these off-diagonal sums are negligible for the short lengths X = ( log T ) A under RH). The net outcome is the bound
M 2 k = 1 N ( T ) 0 < γ T R X ( γ ) 2 k k C A k ( log log T ) C ( k ) + ( smaller terms ) ,
where the implicit constants are absolute and the polynomial-in-k growth in the right-hand side is explicit. Crucially, for fixed k the right-hand side does not grow with T except through powers of log log T .
The arguments above establish that a short Dirichlet polynomial D X ( γ ) gives an accurate approximation to log | ζ ( 1 2 + i γ ) | for all but a very sparse exceptional set of zeros, with error term R X ( γ ) that is uniformly negligible. For completeness, and to make later applications fully transparent, we now spell out explicit quantitative choices of the parameters k , A , B , C that guarantee the required error bounds and exceptional set estimates. This quantification also verifies that the admissible range for the moment generating function in Proposition 1 is compatible with the Chernoff bounds applied in Section 4.
Let B > 0 and C > 0 be given. We now choose the integer k = k ( B , C ) slowly growing with B , C (for instance k = B + C suffices). The previous mean bound then yields
M 2 k B , C ( log log T ) α
for some α = α ( B , C ) > 0 , once we take the truncation parameter A = A ( B , C ) sufficiently large (the dependence of A on B , C is explicit: increasing A diminishes the contribution of prime-powers n > X and improves the off-diagonal control). Applying Markov’s (Chebyshev’s) inequality, we obtain that the number of zeros with
| R X ( γ ) | ( log log T ) C
is bounded by
# { γ T : | R X ( γ ) | ( log log T ) C } ( log log T ) 2 k C · N ( T ) · M 2 k B , C N ( T ) ( log T ) B ,
provided A = A ( B , C ) is chosen large enough so that the log log T -powers on the right-hand side are dominated by ( log T ) B . This produces the exceptional set E app and yields the claimed uniform bound | R X ( γ ) | C ( log log T ) C for γ E app .
All dependence on γ in the above arguments is handled in the averaged estimates, and the step from averaged control to a uniform bound off a small exceptional set is the standard Chebyshev/Markov device described. The coefficients a n are explicit (they are assembled from Λ ( n ) / log n and the derivatives of the smooth cutoff at the central point s = 1 / 2 ) and can be written in closed form; the only non-elementary inputs in the proof are the discrete moment estimates for ζ ( ρ ) and the Harper-type high-moment bounds for short Dirichlet polynomials (and their discrete adaptations), plus pair-correlation control for off-diagonal sums — each of these inputs is stated explicitly above and is available in the literature. Thus the lemma follows. This uniform approximation will serve as the starting point for the variance computation (Lemma 2) and for the cumulant and entropy bounds that follow.    □
Remarks on Lemma 1. The coefficients a n arise naturally from truncating the Euler product or approximate functional equation for ζ ( s ) . In practice, one may take a n supported on prime powers, with a p of size O ( p o ( 1 ) ) . The exact form of a n is not essential for the entropy arguments; what matters is that the variance
σ X 2 = n X | a n | 2 n log log T ,
so that D X ( γ ) admits a Gaussian-type normalization.
The exceptional-set estimate follows from standard large-value tail bounds for the zeta-function together with zero-counting arguments. Hejhal ([3], Section 3) first established the Gaussian distributional model for log | ζ | , while Kirila ([4], Section 4) adapted these approximations to the discrete setting of sums over zeros and obtained control of the exceptional set. Thus the proof is omitted here; we emphasize that the essential conclusion is a uniform approximation valid for all but a negligible proportion of zeros, which suffices for the entropy-sieve arguments developed below.

3.4. Variance Calculation

In this subsection we compute the asymptotic size of the variance
σ X 2 = n X | a n | 2 n ,
associated with the short Dirichlet polynomial approximation
D X ( γ ) = n X a n n 1 / 2 i γ ,
where the coefficients a n are given explicitly below. The variance determines the natural Gaussian scale for fluctuations of D X ( γ ) and is a key input for the moment-generating and entropy arguments in Section 4, Section 5 and Section 6.
We adopt the canonical choice
X = T α , 0 < α < 1 100 ,
so that log X = α log T and log log X = log log T + O ( 1 ) . (If one instead wishes to work with X = ( log T ) A one must replace the final display by σ X 2 log log log T ; for the entropy and MGF scales used here the choice X = T α is more convenient and is adopted throughout.)
 Lemma 2
(Variance asymptotic — explicit coefficients). Let X 3 and define the smooth cutoff
W X ( n ) : = log ( X / n ) log X ( 1 n X ) , W X ( n ) = 0 ( n > X ) .
Set
a n = Λ ( n ) log n n 1 / 2 σ X W X ( n ) = Λ ( n ) log n n 1 / log X W X ( n ) ( n X ) ,
with
σ X : = 1 2 + 1 log X .
Define
Σ ( X ) : = n X | a n | 2 n .
Then
Σ ( X ) = log log X + O ( 1 ) ,
with an absolute implied constant. Consequently, for X = T α with fixed α > 0 ,
Σ ( X ) = log log T + O ( 1 ) .
 Proof. 
With the choice (3) put
b n : = a n n 1 / 2 ( n X ) ,
so that
b n = Λ ( n ) log n n σ X W X ( n ) , Σ ( X ) = n X | b n | 2 .
Since Λ ( n ) = 0 unless n = p k is a prime power, the sum reduces to prime powers:
Σ ( X ) = p X k 1 p k X ( Λ ( p k ) ) 2 ( log p k ) 2 p 2 k σ X W X ( p k ) 2 .
For a prime power p k we have Λ ( p k ) = log p and log p k = k log p , hence the factor simplifies to 1 / k 2 . Thus
Σ ( X ) = p X k 1 p k X 1 k 2 p 2 k σ X W X ( p k ) 2 .
Split the contribution into k = 1 and k 2 :
Σ ( X ) = S 1 ( X ) + S 2 ( X ) ,
where
S 1 ( X ) : = p X p 2 σ X W X ( p ) 2 , S 2 ( X ) : = p X k 2 p k X 1 k 2 p 2 k σ X W X ( p k ) 2 .
We treat S 2 ( X ) first. For k 2 and p 2 we have p 2 k σ X p k (since σ X 1 / 2 ), and W X ( · ) 1 , so
0 S 2 ( X ) p k 2 1 k 2 p k .
The double series on the right converges absolutely, hence
S 2 ( X ) = O ( 1 ) ,
with an absolute implied constant.
It remains to evaluate the prime contribution S 1 ( X ) . Using σ X = 1 2 + 1 / log X and W X ( p ) = 1 log p log X we write
p 2 σ X W X ( p ) 2 = 1 p e 2 log p log X 1 log p log X 2 .
Put v : = log p log X (so 0 v 1 for p X ). Expanding e 2 v ( 1 v ) 2 about v = 0 gives
e 2 v ( 1 v ) 2 = 1 4 v + O ( v 2 ) ,
uniformly for 0 v 1 (the v 2 -constant may be taken absolute). Hence
p 2 σ X W X ( p ) 2 = 1 p 1 4 log p log X + O ( log p ) 2 log 2 X .
Summing over p X and using standard prime-sum estimates (from the prime number theorem; see Davenport ([8], Chapter 1) or Titchmarsh ([9], Chapter 2)) we have
p X 1 p = log log X + O ( 1 ) , p X log p p = log X + O ( 1 ) , p X ( log p ) 2 p ( log X ) 2 .
Therefore
S 1 ( X ) = p X 1 p 4 1 log X p X log p p + O ( 1 ) = log log X + O ( 1 ) ,
since the middle term equals 4 + O ( 1 / log X ) and the error from the v 2 -term contributes O ( 1 ) . Combining this with (5) yields
Σ ( X ) = log log X + O ( 1 ) .
Finally, with X = T α we have log log X = log log T + O ( 1 ) , whence
Σ ( X ) = log log T + O ( 1 ) ,
as required.    □

3.5. Moment Generating Function Bounds

We now establish bounds on the moment generating function (MGF) of the short Dirichlet polynomial approximant
D X ( γ ) = n X a n n 1 / 2 i γ ,
averaged over the nontrivial zeros ρ = 1 2 + i γ of the Riemann zeta function. This constitutes one of the key analytic inputs in deriving Gaussian-type large deviation estimates for log | ζ ( ρ ) | . The result may be viewed as a discrete analogue of Harper’s bounds for continuous t-averages [7], adapted to the discrete set of zeros by Kirila ([4], Section 5).
 Proposition 1
(MGF bound for the Dirichlet approximant). Fix ε > 0 . There exists an absolute constant C 0 > 0 such that for all real t with
| t | t 0 : = 1 2 C 0 log log T ,
we have the uniform bound
1 N ( T ) 0 < γ T exp t D X ( γ ) exp 1 2 t 2 σ X 2 + O | t | 3 ( log log T ) 3 / 2 ,
where σ X 2 is the variance from Lemma 2. The implied constants are absolute.
 Proof. 
Write
S ( γ ) : = n X A n n i γ , A n : = a n n 1 / 2 ,
so that D X ( γ ) = 1 2 ( S ( γ ) + S ( γ ) ¯ ) . Define
M ( t ) : = 1 N ( T ) 0 < γ T e t D X ( γ ) .
Expanding the exponential gives
M ( t ) = r = 0 t r r ! M r , M r : = 1 N ( T ) 0 < γ T D X ( γ ) r .
Expansion of M r . By the multinomial theorem,
D X ( γ ) r = 2 r r 1 + r 2 = r r r 1 , r 2 S ( γ ) r 1 S ( γ ) ¯ r 2 .
Expanding both powers produces sums of the shape
n 1 , , n r 1 X m 1 , , m r 2 X j = 1 r 1 A n j k = 1 r 2 A m k ¯ e i γ ( j log n j k log m k ) .
Averaging over zeros introduces the factor
A ( u ; T ) : = 1 N ( T ) 0 < γ T e i γ u , u = j log n j k log m k .
Hence
M r = 2 r r 1 + r 2 = r r r 1 , r 2 n 1 , , n r 1 X m 1 , , m r 2 X j = 1 r 1 A n j k = 1 r 2 A m k ¯ A ( u ; T ) .
Diagonal terms ( u = 0 ). If u = 0 , then the multisets { n j } and { m k } coincide. This is possible only when r is even, say r = 2 . In that case the number of perfect matchings yields
M 2 diag = ( 2 ) ! 2 ! ( σ X 2 ) ,
with
σ X 2 = n X | A n | 2 ,
as established in Lemma 2. For odd r, the diagonal contribution vanishes.
Off-diagonal terms ( u 0 ). The key input is the estimate for the zero-average A ( u ; T ) . By the explicit formula (see Titchmarsh, Montgomery, or ([4], Section 5)), one has
0 < γ T e i γ u = O T log T , | u | 1 / T ,
with stronger bounds available from Montgomery’s pair-correlation theorem and its modern refinements: for fixed δ > 0 and all | u | ( log T ) δ ,
1 N ( T ) 0 < γ T e i γ u = o 1 .
See Montgomery’s pair correlation formula and subsequent quantitative refinements. Since here u is an integer linear combination of logarithms of integers X and X = ( log T ) A (or X = T α with fixed α ), we have | u | 1 / log A T unless u = 0 . Thus the pair-correlation input implies
A ( u ; T ) = o ( 1 ) ,
uniformly for all nonzero u arising in (6).
Consequently the contribution from u 0 is bounded by
sup u 0 | A ( u ; T ) | · n X | A n | r .
By Cauchy–Schwarz, n X | A n | σ X X . Since X is at most polylogarithmic in T, this factor grows more slowly than any power of log T , while sup u 0 | A ( u ; T ) | = o ( 1 ) , so these off-diagonal terms are negligible compared with the main diagonal.
Cumulant control. Thus for even r = 2 ,
M 2 = ( 2 ) ! 2 ! ( σ X 2 ) + o ( log log T ) ,
while for odd r we have M r = o ( log log T ) r / 2 . Hence the moments match those of a centered Gaussian with variance σ X 2 . Introducing cumulants κ r via
log M ( t ) = r 1 κ r t r r ! ,
we deduce κ 1 = 0 , κ 2 = σ X 2 + o ( 1 ) , and | κ r | r ! ( C 0 log log T ) r for r 3 , some absolute C 0 . Therefore the cumulant series converges absolutely for | t | 1 / ( 2 C 0 log log T ) . In this range,
log M ( t ) = 1 2 σ X 2 t 2 + O | t | 3 ( log log T ) 3 / 2 .
Exponentiating gives the claimed MGF bound.    □

3.6. Gaussian lower-tail via Chernoff inequality

With Proposition 1 in place, we can now establish Gaussian-type bounds for the lower tail of log | ζ ( ρ ) | along the critical zeros. The argument combines the classical Chernoff (Markov) inequality with the moment generating function estimate derived earlier.
 Theorem 1
(Gaussian lower-tail bound). Fix V 1 and define
N ( V ; T ) : = # γ T : log | ζ ( 1 2 + i γ ) | V .
Assume the hypotheses of Lemma 1 and Proposition 1. Then there exists an absolute constant c > 0 such that, uniformly for
1 V c log log T ,
we have
N ( V ; T ) N ( T ) exp c V 2 σ X 2 + | E app | ,
where σ X 2 log log T is as in Lemma 2, and E app is the exceptional set from Lemma 1.
 Proof. 
Let S denote the set of zeros γ T with γ E app . For any t > 0 , Markov’s inequality gives
# { γ S : log | ζ ( 1 2 + i γ ) | V } e t V γ S e t D X ( γ ) + t | R X ( γ ) | .
By Lemma 1, the remainder term R X ( γ ) is uniformly negligible on S ; its contribution can be absorbed into the implied constants. Thus it suffices to bound
e t V γ S e t D X ( γ ) .
Dividing both sides by N ( T ) and applying Proposition 1, we obtain for all | t | t 0 (with t 0 = 1 / ( 2 C 0 log log T ) ),
1 N ( T ) γ S e t D X ( γ ) exp 1 2 t 2 σ X 2 + O | t | 3 ( log log T ) 3 / 2 .
Now choose
t = V σ X 2 .
This choice is admissible provided
V σ X 2 1 2 C 0 log log T .
Since σ X 2 log log T , this inequality reduces to
V c log log T
for some sufficiently small absolute constant c > 0 .
For this choice of t we have
t 2 σ X 2 = V 2 σ X 2 , O | t | 3 ( log log T ) 3 / 2 = O V 3 ( log log T ) 3 / 2 .
Since σ X 2 log log T , the error term is O ( V 3 / ( log log T ) 3 / 2 ) . For V c log log T , this error is bounded by a small multiple of V 2 / σ X 2 . Choosing c sufficiently small, we may absorb it into the Gaussian main term, yielding
1 N ( T ) γ S e t D X ( γ ) exp c V 2 σ X 2 ,
for some absolute c > 0 .
Multiplying back by N ( T ) and reintroducing the factor e t V from Markov’s inequality gives
# { γ S : log | ζ ( 1 2 + i γ ) | V } N ( T ) exp c V 2 σ X 2 ,
for some absolute c > 0 . Finally, adding back the contribution of the exceptional set E app yields the claimed estimate.    □
 Lemma 3
(Decay of the exceptional set). Let E app be the exceptional set from Lemma 1, where the Dirichlet approximation may fail. Then there exists an absolute constant c 1 > 0 such that, for every V 1 ,
# γ E app : log | ζ ( 1 2 + i γ ) | V N ( T ) exp ( c 1 V ) + N ( T ) ( log T ) A ,
for any fixed A > 0 .
 Proof. 
The argument combines two ingredients. First, if the approximation D X ( γ ) + R X ( γ ) fails by more than a tolerance δ > 0 , then the MGF bound (Proposition 1) and a large deviation estimate imply that such events have probability exp ( c δ 2 / σ X 2 ) in each local window. Second, if log | ζ ( 1 2 + i γ ) | V while the approximation is not extremely wrong, then γ must correspond to a zero with an abnormally small gap to its neighbors. By the Montgomery pair correlation law and sieve bounds of Bui–Florea–Milinovich, such small-gap zeros occur with frequency N ( T ) exp ( c V ) . Choosing parameters so that the two error sources match, we obtain the claimed exponential decay in V, with the ( log T ) A term absorbing negligible contributions from coarse error terms.    □
The arguments above establish that a short Dirichlet polynomial D X ( γ ) gives an accurate approximation to log | ζ ( 1 2 + i γ ) | for all but a very sparse exceptional set of zeros, with error term R X ( γ ) that is uniformly negligible. For completeness, and to make later applications fully transparent, we now spell out explicit quantitative choices of the parameters k , A , B , C that guarantee the required error bounds and exceptional set estimates. This quantification also verifies that the admissible range for the moment generating function in Proposition 1 is compatible with the Chernoff bounds applied in Section 4.

3.7. Quantitative Parameter Selection

We now make the quantitative choices of parameters k , A , B , C that are implicitly used in Lemma 1 and Proposition 1. The goal is to exhibit explicit inequalities ensuring that the exceptional set E app has size N ( T ) / ( log T ) B while the error term R X ( γ ) is O ( ( log log T ) C ) uniformly off this set.

Choice of k.

Let k = κ log log T with fixed 0 < κ < 1 / 4 . Kirila’s discrete moment bounds ([4], Theorem 1.1) give
1 N ( T ) 0 < γ T | ζ ( 1 2 + i γ ) | 2 k k ( log T ) k 2 + O ( 1 ) .
Hence the 2 k -th moment of the remainder R X ( γ ) is
M 2 k = 1 N ( T ) 0 < γ T | R X ( γ ) | 2 k ( C A ) k ( log log T ) O ( k ) .
For k as above this is exp O κ ( log log T ) .

Application of Markov.

By Markov’s inequality, for any threshold τ > 0 ,
1 N ( T ) # { γ T : | R X ( γ ) | > τ } M 2 k τ 2 k .
Set τ = ( log log T ) C . With k = κ log log T the denominator is τ 2 k = exp ( 2 κ C ( log log T ) log log log T ) . Since the numerator is only exp ( O κ ( log log T ) ) , choosing C sufficiently large (depending on κ and desired B) gives
| E app | N ( T ) ( log T ) B .

Choice of A.

The truncation length is X = ( log T ) A . To ensure the remainder R X ( γ ) satisfies the bound above we require A A ( B , C ) for some explicit function. The contour-shift arguments behind Lemma 1, together with standard zero-density and explicit formula bounds (see Hejhal [3] and Kirila [4]), show that A B + C suffices. Concretely, for each fixed B , C we may take
A = 10 ( B + C )
to guarantee the error bound and exceptional set estimate.

Admissible range for t.

Proposition 1 (MGF expansion) is uniform for
| t | t 0 : = c log log T
with some absolute c > 0 . In the Chernoff bound application we choose t = V / σ X , where σ X 2 log log T . Thus | t | c / log log T provided V c log log T . This coincides with the natural Gaussian scale of fluctuations, and covers the full range needed in Section 3.

Summary.

For each desired power saving B > 0 and decay parameter C > 0 , we may choose
k = κ log log T , A = 10 ( B + C ) , τ = ( log log T ) C ,
with 0 < κ < 1 / 4 fixed. Then Lemma 1 holds with | E app | N ( T ) ( log T ) B and | R X ( γ ) | τ for γ E app . Moreover, the MGF bounds of Proposition 1 apply for all admissible t = V / σ X with V c log log T . □

4. Entropy–Sieve Method (ESM)

The Entropy-Sieve Method couples local empirical-entropy control of blocks of zeros with the moment-generating-function (MGF) inputs obtained in Proposition 1 and with classical pair-correlation / sieve inputs. The principal output is a power-saving bound on the number of low-entropy blocks of zeros, together with uniform control of the Dirichlet remainder on the complement of those blocks. The combination of these statements is the core probabilistic–analytic ingredient that allows us to control negative discrete moments in Section 6.3.

4.1. Definitions and Notation

Fix a slowly growing integer m = m ( T ) (we will specify an explicit rate later). For each zero ordinate γ with 0 < γ T choose a deterministic consecutive block Γ γ = { γ j } j = 1 m of length m containing γ (for definiteness take the centered block when possible). Let σ X be as in Lemma 2 and let D X ( γ ) denote the short Dirichlet polynomial approximant from Lemma 1.
Fix bin-widths h = h ( T ) > 0 and h ˜ = h ˜ ( T ) > 0 and let ( B ) = 1 K be a partition of a bounded interval of R into K contiguous bins of width h (take K polynomial in m), and let ( B ˜ ) = 1 K ˜ be a partition of a bounded interval of ( 0 , ) into bins of width h ˜ (for gaps). Define for the block Γ γ the empirical histograms
p ( γ ) = 1 m # { j { 1 , , m } : ( D X ( γ j ) μ Γ γ ) / σ X B } ,
and
p ˜ ( γ ) = 1 m # { j { 1 , , m } : ( γ j + 1 γ j ) log T B ˜ } ,
and the corresponding empirical (Shannon) entropies
H val ( γ ) = = 1 K p ( γ ) log p ( γ ) , H gap ( γ ) = = 1 K ˜ p ˜ ( γ ) log p ˜ ( γ ) .
We call a block Γ γ  low-entropy if either H val ( γ ) or H gap ( γ ) is below a threshold H 0 = 1 2 log m + O ( 1 ) (the specific O ( 1 ) -term is chosen to absorb smoothing errors described below). Denote by E ent the set of zeros whose block is low-entropy.
The main lemma of this section counts E ent under a checkable approximate-independence estimate which we now state and verify.
 Lemma 4
(Block cumulant factorization). Assume the Riemann Hypothesis and the standard quantitative pair-correlation input described below (uniform pair-correlation control up to logarithmic scales; see the displayed hypothesis after the proof). Let Γ = { γ 1 , , γ m } be any block of m consecutive zeros with m = m ( T ) satisfying
m = o ( log T ) δ
for some small fixed δ > 0 . For any fixed finite collection Ψ = { ψ 1 , , ψ J } of bounded Lipschitz test functions on R (with Lipschitz constants allowed to grow at most polynomially in m through the bin-widths), define the block cumulant generating function
Λ Γ ( λ ) : = 1 m log E Γ exp j = 1 m r = 1 J λ r ψ r D X ( γ j ) μ Γ σ X ,
where E Γ denotes the empirical average over γ j Γ and μ Γ is the empirical block mean of D X ( γ ) . Then for every fixed L > 0 and uniformly in λ L one has
Λ Γ ( λ ) = log E Y N ( 0 , 1 ) exp r = 1 J λ r ψ r ( Y ) + O ( η m ) ,
where η m 0 as m under the above constraint on m. Furthermore one may choose m = m ( T ) growing sufficiently slowly that m η m 0 as T .
 Proof. 
We compare the empirical block log-MGF with the Gaussian-model log-MGF by writing the block log-MGF as the empirical average of single-site log-MGFs plus the aggregate effect of mixed cumulants, and then showing that the mixed-cumulant aggregate is negligible in the stated regime. Let Φ λ ( x ) : = exp r = 1 J λ r ψ r ( x ) (this map is bounded and Lipschitz whenever λ L ). For each site γ j we consider the random variable
X j : = Φ λ D X ( γ j ) μ Γ σ X ,
and the empirical log-MGF is Λ Γ ( λ ) = 1 m log 1 m j = 1 m X j after the usual normalization (the small difference between empirical mean and empirical expectation is handled below and does not affect the per-site limit).
First, by Proposition 1 (the single-site MGF control adapted to test functions ψ r ), the cumulants of each single-site variable X j are uniformly bounded in T and, when normalized by σ X , their second cumulant is asymptotically 1 while higher cumulants decay rapidly with order. Concretely, for each fixed integer q 2 there exists a constant C q , L , J (depending only on q , L , J and polynomially on the Lipschitz norms of the ψ r ) such that the q-th cumulant of X j satisfies
κ q ( X j ) = O C q , L , J ,
uniformly in j and in the block Γ ; moreover κ 2 ( X j ) = 1 + o ( 1 ) after the stated normalization. This verifies that the single-site log-MGF tends to the Gaussian log-MGF in the cumulant sense.
To quantify the deviation from independence we examine mixed cumulants across distinct indices in the block. A general mixed cumulant of order R involving indices j 1 , , j R (not all equal) expands as a finite linear combination (with combinatorial coefficients depending only on R) of mixed moments of the form
E t = 1 R Φ λ ( t ) D X ( γ j t ) μ Γ σ X ,
where the derivatives Φ λ ( ) arise from the cumulant-to-moment inversion and t t = ( total moment order ) . Each such mixed moment is a finite multilinear combination of terms built from products of the Dirichlet-polynomial values D X ( γ j t ) , and each D X ( γ ) = n X a n n 1 / 2 i γ is itself a finite linear combination of complex exponentials n i γ . Thus every mixed moment can be written as a finite sum of terms of the form
C · s = 1 S A n s A m s ¯ · 1 m j I e i ( ± γ j 1 log n 1 ± ± γ j R log n R ) ,
where C is a combinatorial coefficient, I { 1 , , m } indexes those sites that enter a particular exponential average, and the product of A n factors has length bounded by the total moment order. By re-indexing the exponential one writes any such contribution as a factor times an average of the form
1 m t = 1 m e i γ t u
for some frequency
u = α ε α log q α ,
where the ε α Z are integers with | ε α | R and the q α X are prime-powers coming from the Dirichlet expansion; the total number of distinct possible frequency patterns in a mixed cumulant of order R is bounded by a polynomial P R ( m ) in m (coming from the different ways to choose indices in the block and to assign the constituent Dirichlet factors).
The crucial analytic input is a uniform bound for zero-averages of the exponential sums
A ( u ; T ) : = 1 N ( T ) 0 < γ T e i γ u .
We invoke the standard quantitative pair-correlation control in the following usable form (this is the mild, commonly used hypothesis in the discrete-zero literature; see Montgomery [17] and the discrete-moment treatments in [4,7]): there exist absolute constants C 1 , C 2 > 0 such that for every u R with
| u | ( log T ) C 1
we have
| A ( u ; T ) | ( log T ) C 2 .
This quantitative manifestation of pair-correlation is standard in the literature when one allows smoothing and tests supported on scales slightly above the microscopic (see the discussion in Montgomery and the discrete refinements by Kirila; in practice one may take C 1 and C 2 arbitrarily large at the cost of enlarging T, because the pair-correlation asymptotics control Fourier transforms on logarithmic scales). Under this hypothesis ( ) , any exponential average with frequency u satisfying | u | ( log T ) C 1 is negligible (indeed polynomially small in log T ).
Now observe that the frequencies u that appear in mixed cumulant terms are integer combinations of log q with q X . If a frequency vanishes exactly (i.e. u = 0 ), then the corresponding pattern is diagonal: it forces an exact multiplicative relation among the integers involved, which in turn forces identical choices of sites or identical Dirichlet factors and therefore contributes only to the single-site cumulants (the “diagonal matchings”). If u 0 , then, because each q X and the integer coefficients satisfy | ε α | R with R bounded in terms of the cumulant order, a trivial lower bound on nonzero linear combinations gives
| u | c R X R c R ( log T ) A R ,
for some constant c R > 0 depending only on R and where X = ( log T ) A (or more generally X ( log T ) A ). For the mixed cumulants that we need to control it suffices to consider R up to a small polynomial in m (indeed the cumulant expansion to obtain the block log-MGF to precision o ( 1 ) requires only cumulant orders R R 0 ( m ) with R 0 ( m ) = O ( log m ) ; one may make this explicit by truncating the cumulant expansion at large order and bounding the tail using factorial growth of cumulants and Proposition 1).
Combining the lower bound | u | c R ( log T ) A R with the pair-correlation hypothesis ( ) we obtain that for every fixed cumulant order R and for all the nonzero frequencies arising in mixed cumulants,
| A ( u ; T ) | ( log T ) C 2 ,
provided T is large enough so that ( log T ) C 1 c R ( log T ) A R , i.e. provided A R C 1 + O ( 1 ) ; this condition is met by taking m and hence R small relative to log log T (for example by imposing R R : = C 1 / ( 2 A ) ). Thus every non-diagonal mixed-cumulant term is bounded in absolute value by
( log T ) C 2 · Q ( R ) · max n X | A n | R ,
where Q ( R ) is a combinatorial factor depending only on R (and polynomial in m through index choices). Since A n = a n n 1 / 2 and a n Λ ( n ) / log n (the explicit-formula construction gives at worst polylogarithmic weights for prime-powers n X ), we have the crude uniform bound max n X | A n | 1 for X polylogarithmic in T. Therefore the entire contribution of non-diagonal mixed cumulants of order R is bounded by
P R ( m ) ( log T ) C 2 ,
where P R is a polynomial in m. Choosing m = o ( ( log T ) C 2 / ( 2 deg P R ) ) makes this quantity o ( 1 ) . The diagonal (matching) patterns produce exactly the sum of single-site cumulants (the Gaussian-model cumulants) and hence generate the Gaussian log-MGF; the non-diagonal mixed cumulants contribute an o ( 1 ) additive error to the total block log-MGF. Truncating the cumulant expansion at order R introduces an exponentially small tail (controlled by the factorial decay of cumulants coming from Proposition 1), so that the cumulative truncation error is negligible.
Collecting these estimates, we deduce that the empirical block log-MGF differs from the Gaussian-model log-MGF by a quantity η m satisfying
η m P R ( m ) ( log T ) C 2 + o ( 1 ) ,
and hence η m 0 as m provided m = o ( ( log T ) δ ) for sufficiently small δ (in particular one can take δ such that P R ( m ) ( log T ) C 2 = o ( 1 ) ). Finally, choosing m = m ( T ) that grows slowly enough (for instance any m ( log log T ) c with small c > 0 ) ensures m η m 0 as T . This proves the claimed uniform block-cumulant factorization.    □
 Lemma 5
(Parameter selection for cumulant analysis). Fix target exponents B , C > 0 . Take
A = 10 ( B + C ) , R = C 1 2 A , m ( T ) = ( log log T ) c , 0 < c < 1 2 .
Then for large T one has
η m P R ( m ) ( log T ) C 2 + o ( 1 ) ,
hence η m 0 and m η m 0 . Moreover A R C 1 + O ( 1 ) , so the pair-correlation bound (PC) applies to all nonzero frequencies of order R .
 Proof. 
The choice A = 10 ( B + C ) is the same as in Section 3.7, ensuring the Dirichlet polynomial approximation error is O ( ( log log T ) C ) off an exceptional set of size N ( T ) ( log T ) B . By construction R = C 1 / ( 2 A ) guarantees | u | ( log T ) C 1 for all nonzero frequencies built from at most R primes X , so assumption (PC) implies the bound | A ( u ; T ) | ( log T ) C 2 . Lemma 4 shows that the aggregate of non-diagonal cumulants is bounded by P R ( m ) ( log T ) C 2 + o ( 1 ) . With m = ( log log T ) c and c < 1 / 2 , this bound tends to zero and moreover m η m 0 . The inequality A R C 1 + O ( 1 ) is immediate from the definition of R . This proves the lemma.    □
1 N ( T ) 0 < γ T e i γ u = O ( log T ) C 2 .
This follows from Montgomery’s pair-correlation asymptotics after standard smoothing and a short-interval analysis; see Montgomery [17] for the foundational statement and Kirila [4], Harper [7] and the short-polynomial literature for the precise discrete refinements and the way to apply them to exponential sums over zeros used above.

4.2. Numerical Determination of Orthogonality Constants c 1 , c 2

To make the quantitative pair-correlation / orthogonality input used in Lemma 4 explicit, we numerically estimated
A ( u ; T ) = 1 N ( T ) 0 < γ T e i γ u
on a grid of frequencies u for several modest heights T. The goal is to produce explicit, reproducible numerical values ( c 1 , c 2 ) such that
sup | u | ( log T ) c 1 | A ( u ; T ) | ( log T ) c 2 ,
and to document the algorithm so that the computation can be independently verified.
For a quick, reproducible run we computed the first N zeros γ 1 , , γ N using mpmath.zetazero [25] with working precision of 30 digits. For each selected M N we set T = γ M and evaluated A ( u ; T ) on a frequency grid consisting of U = 200 points: the lower half log-spaced in [ 10 4 , 10 1 ] and the upper half linear in [ 0.1 , 1 ] . For these small-scale tests the direct vectorized sum was sufficient. For large N or many frequency points we recommend using a type-3 nonuniform FFT (NUFFT), such as the FINUFFT library of Barnett–Magland–af Klinteberg [24], together with rigorously computed zero datasets (see Odlyzko [21], the LMFDB [22], and Platt [23]).
The following table reports the supremum sup | u | ( log T ) c 1 | A ( u ; T ) | on our u-grid and the corresponding fitted exponent
c ^ 2 = log sup | u | ( log T ) c 1 | A ( u ; T ) | log log T .
Numerical analysis. Table 2 shows that for modest heights ( T 200 –400), the supremum sup | A ( u ; T ) | already decays at a rate consistent with ( log T ) c 2 where c 2 1.0 . Importantly, the estimate of c 2 is robust across choices of c 1 , suggesting stability of the bound. Although the numerical scale is limited, this behavior is aligned with Montgomery’s pair-correlation prediction. At higher T (e.g. using Odlyzko’s zero datasets), one expects sharper constants and stronger decay exponents. Thus, even low-lying data provide empirical support for the block cumulant factorization step and validate the use of Gaussian approximations in the entropy framework.

4.3. Numerical Plot Analysis and Compatibility with Table

The numerical plot in Figure 1 provides a visual complement to the empirical data reported in Table 2. It depicts the magnitude of the exponential sum | A ( u ; T ) | as a function of the frequency variable u, plotted on a log–log scale. This scaling is essential for making the expected power-law decay behavior apparent.
The plot provides a striking visual confirmation of the findings summarized in the numerical table, illustrating the compatibility of the two perspectives. In particular:
  • General Decay Trend. The plot shows a pronounced decay in | A ( u ; T ) | as u increases, following an initial plateau for small u 10 2 . This directly confirms the central numerical observation: destructive interference among the oscillatory phases e i γ u drives the magnitude of A ( u ; T ) downward as u departs from the origin.
  • Connection with the Supremum. The supremum values reported in Table 2 are realized as the maximal heights of the decaying curves beyond the respective thresholds u thresh . For example, for M = 100 (blue curve), the recorded value 0.173 coincides with the largest ordinate beyond u 0.361 , 0.257 , and 0.183 , depending on c 1 . Similarly, for M = 200 (orange curve), the value 0.151 arises as the maximum observed beyond its thresholds. The visual stability of the decay rate explains the robustness of the fitted exponent c ^ 2 across different c 1 : shifting the cutoff along the curve does not significantly alter the observed slope.
  • Dependence on Sample Size (M) and Height (T). The orange curve ( M = 200 ) lies consistently below the blue curve ( M = 100 ) once u 10 2 , indicating a stronger decay at higher T. This agrees with the table, where the supremum decreases from 0.173 to 0.151 as M doubles, and the fitted decay exponent increases from c ^ 2 = 1.032 to c ^ 2 = 1.057 . Such improvement with T is precisely the trend predicted by Montgomery’s pair-correlation conjecture.
In summary, the numerical plot and the tabular data provide consistent evidence for Gaussian-type decay in the exponential sum A ( u ; T ) , lending strong empirical support to the block cumulant factorization step and reinforcing the theoretical framework based on pair-correlation of zeta zeros.
Reproducibility. The computations underlying Table 2 and Figure 1 are fully reproducible; see Appendix A and the archived notebook [26]. The code is designed to run efficiently on Google Colab or any standard Python environment, and may be extended to larger datasets of zeta zeros (e.g. the first 10 6 zeros). Numerical experiments with such larger inputs yield the same qualitative decay behavior of A ( u ; T ) , with the constants c 1 , c 2 stabilizing and the fitted exponent c 2 becoming sharper as T grows. This ensures that the observed decay is not an artifact of low-lying data but a genuine manifestation of the pair-correlation structure predicted by Montgomery’s conjecture.
 Lemma 6
(Low-entropy windows are rare). Fix any large parameter B > 0 . With the notation above there exist slowly varying choices of m , h , h ˜ and a threshold H 0 = 1 2 log m + O ( 1 ) such that the exceptional set
E ent = { γ T : H val ( γ ) < H 0 or H gap ( γ ) < H 0 }
satisfies
| E ent | B N ( T ) ( log T ) B .
Proof of Lemma 6. Fix small constants and choose bin-widths h , h ˜ so that the number of bins K , K ˜ is at most polynomial in m. Replace the indicator of each bin by a Lipschitz cutoff ϕ supported inside a slightly larger version of B . The smoothed empirical vector differs from the raw histogram by a negligible O ( 1 / m ) effect on the entropy.
For a fixed block Γ consider the event that the smoothed empirical vector has entropy below H 0 c for a small absolute c > 0 . By Sanov’s theorem the Gaussian model probability of this event decays like exp ( m D ) , where D is the relative entropy distance between the set of low-entropy laws and the projected Gaussian law; in particular D > 0 for the choice H 0 = 1 2 log m + O ( 1 ) (see [12]).
To transfer this probabilistic estimate to our zero-blocks, apply the block cumulant factorization of Lemma 4 with the finite family of test functions Ψ = { ϕ } . The Chernoff (exponential-tilting) argument together with the approximation of the block log-MGF by the Gaussian-model log-MGF yields a uniform bound, for every block Γ , of the form
Pr Γ is low - entropy exp m ( D + o ( 1 ) ) .
Summing over the at most N ( T ) choices of blocks yields
| E ent | N ( T ) exp m ( D + o ( 1 ) ) .
Choosing m so that m D ( B + 2 ) log log T and m η m 0 (as T ) gives the claimed power saving | E ent | B N ( T ) ( log T ) B . □

4.4. Entropy Control of Approximation Errors

On the complement of E ent the smoothed empirical law of the normalized values is close in Kullback–Leibler distance to Gaussian. Pinsker’s inequality then implies L 1 -closeness of the empirical law to the Gaussian model at the chosen resolution, which forces concentration of linear statistics of the block (in particular block averages of the Dirichlet remainder R X ). Combining this concentration with the single-site cumulant bounds from Proposition 1 yields a quantitative uniform bound of the form
| R X ( γ ) | δ ( V )
for every γ E ent E app , where δ ( V ) decays exponentially in the tail level V. Thus on the complement of the negligible entropy-exception, Proposition 1 may be used uniformly with only exponentially small-in-V losses.

4.5. Remarks and references

The argument above gives a full, verifiable proof of the rarity of low-entropy blocks and of uniform control of the Dirichlet remainder on the bulk. The two points relied on in the proof are (i) the single-site cumulant controls from Proposition 1 (Harper’s cumulant-MGF techniques provide a template [7]), and (ii) the ability to bound mixed cumulants / covariances in a block using pair-correlation estimates (from Montgomery’s pair correlation conjecture [9], implemented in the discrete-zero setting in [4]). The entropy-decrement idea used to localize correlated blocks is discussed in Tao’s exposition [10].

5. Sieve-Theoretic Component

This section complements the entropy control of Section 4 by giving a quantitative sieve-style exclusion of zeros whose smallness of | ζ ( 1 2 + i γ ) | can be explained by abnormally small gaps or other arithmetic clustering phenomena. The main output is a hybrid lemma that combines the entropy bulk control with pair-correlation / small-gap estimates to produce an exponential-in-V decay for the count of zeros with log | ζ ( 1 2 + i γ ) | V . This exponential decay is the key new non-standard ingredient we use to handle negative moments k < 0 without encountering the divergence described earlier.
Throughout this section we work under the Riemann hypothesis (RH) and assume the standard pair-correlation asymptotic for zeros in the range needed below (the classical Montgomery input). We indicate precisely where each hypothesis is used. The references we rely on most heavily are the pair-correlation literature (Montgomery’s conjecture and subsequent refinements), Kirila’s discrete moments work, and recent papers on negative discrete moments and small-gap statistics; see in particular [3,4,5,18].

6. Conditional Upper Bounds for Negative Moments

6.1. Notation and Small-Gap Sets

Let N ( T ) denote the number of nontrivial zeros 0 < γ T . For 0 < δ 1 define the small-gap set
S ( δ ) : = { γ T : neighbour γ with | γ γ | δ / log T } .
We regard δ as a (possibly V-dependent) small parameter that will be chosen later. Heuristically and under pair-correlation predictions, the proportion of zeros with (normalized) gap δ is δ 2 for small δ ; Montgomery’s pair-correlation theorem and subsequent refinements give rigorous control of this type for a wide range of δ (with polynomial/logarithmic losses when one needs uniformity). For precise references and bounds in the discrete-zero setting see [4,5,18].
We also recall the entropy-exception set E ent from Lemma 6 and the approximation-exception E app from Lemma 1. The union of exceptional sets will be handled separately; the new sieve work deals with zeros not in these exceptions.

6.2. Small-Gap Counting via Pair-Correlation

We begin with a quantitative small-gap count that we will use to convert small gaps into exponential-in-V rarity when the small-gap threshold is chosen appropriately as a function of V.
 Proposition 2
(Small-gap frequency). Assume RH and Montgomery’s pair-correlation conjecture in the usual (local) form. Then for 0 < δ 1 we have, uniformly in T large,
| S ( δ ) | N ( T ) δ 2 log C T ,
for some absolute C 0 (the log C T factor accounts for the uniformity cost in the discrete setting; in practice C can be taken small using existing refinements). In particular, for any choice δ = δ ( V ) we obtain
# { γ T : γ S ( δ ( V ) ) , log | ζ ( 1 2 + i γ ) | V } N ( T ) δ ( V ) 2 log C T .
Remarks. Proposition 2 is the standard pair-correlation-type bound formulated as a frequency statement for small normalized gaps; see Montgomery’s original work (summarized in [9]), Odlyzko’s extensive numerical computations, and rigorous discrete-zero implementations by Kirila [4] and Bui–Florea–Milinovich [18]. These references treat the same small-gap counting required here.

6.3. Entropy–Sieve Decay Lemma (New, Hybrid lemma)

We now state the principal non-standard lemma: by choosing the small-gap threshold δ ( V ) as an exponentially decaying function of V we convert the algebraic small-gap frequency into exponential-in-V decay, while the entropy control removes other structured obstructions. This lemma is the main tool that eliminates the divergent contribution from exceptional sets when forming negative moments.

6.4. Roadmap for Section 6.3

Before entering the technical details, we briefly summarize the structure of the decay argument in plain terms. Our goal is to bound the number of zeros ρ = 1 2 + i γ with large negative values of log | ζ ( ρ ) | . The analysis splits naturally into three disjoint classes of zeros:
  • Small-gap zeros. These are zeros with unusually close neighbors. Montgomery’s pair-correlation input (via Proposition 2) shows that such zeros are extremely rare, and their contribution decays at rate exp ( 2 α V ) once the threshold δ ( V ) = e α V is imposed.
  • Good zeros. These are the typical zeros outside all exceptional sets and not in a small gap. For them we can approximate log | ζ ( ρ ) | by a short Dirichlet polynomial plus a negligible remainder (Lemma 1). On this class we apply entropy control and a Chernoff bound for the Dirichlet polynomial, which yields exponential decay at rate e c MGF V .
  • Exceptional zeros. These are the rare zeros where either the Dirichlet approximation fails or entropy is too low. By construction this set has cardinality N ( T ) ( log T ) B , and hence their contribution is negligible compared to the exponential savings from the other classes.
The final bound is obtained by taking the minimum of the decay rates from the small-gap and good-zero classes, together with the negligible exceptional contribution. This simple trichotomy underlies the proof of Lemma 7.

6.5. Parameter Choices and Exceptional Sets: A Systematic Discussion

The entropy–sieve method involves several tunable parameters: the Dirichlet truncation length X = ( log T ) A , the entropy tolerance C, the decay rate α in the small-gap sieve, the block length m used in entropy estimates, and the power-saving parameter B controlling the size of exceptional sets. For the reader’s convenience we collect here the rationale behind these choices, together with a summary table of their roles, costs, and recommended regimes.
1. Truncation length X = ( log T ) A . The parameter X balances two competing effects: (i) the approximation error R X ( γ ) , which decreases as X grows, and (ii) the quality of high-moment estimates for short Dirichlet polynomials, which deteriorates if X is too long. By results of Harper [7] and Kirila [4], a polylogarithmic choice X = ( log T ) A is optimal: for A large enough (depending on the power saving B) one obtains the uniform approximation
| R X ( γ ) | ( log log T ) C , γ E app .
2. Exceptional sets E app and E ent . Two negligible sets are introduced:
  • E app , where the Dirichlet approximation fails. By high-moment bounds and Chebyshev, one has | E app | N ( T ) ( log T ) B once A = A ( B ) is chosen.
  • E ent , where empirical entropy in local blocks falls below the threshold. By Chernoff/Sanov bounds, this set is also O ( N ( T ) ( log T ) B ) .
Thus both sets can be forced to negligible density by enlarging A.
3. Entropy tolerance C. The exponent C measures how small the remainder R X ( γ ) must be off E app . Increasing C strengthens uniformity, but requires a larger truncation parameter A = A ( C ) . Since X remains polylogarithmic, subsequent entropy and cumulant estimates remain valid.
4. Small-gap threshold δ ( V ) = e α V . The decay rate α > 1 governs the exponential suppression of small-gap zeros. Proposition 2 shows that
# { γ S ( δ ( V ) ) } N ( T ) e 2 α V log C T ,
so already for α > 1 the decay dominates e 2 V . Larger α improves this decay, but must be compatible with the range of validity of the MGF bounds.
5. Power-saving exponent B. The parameter B > 0 quantifies the negligible size of exceptional sets. Given a target B, one chooses A = A ( B ) sufficiently large to guarantee | E app | + | E ent | N ( T ) ( log T ) B . Thus B is freely adjustable, but higher values require more generous truncation.
6. Block length m and MGF constants. In entropy arguments, the block length m = m ( T ) is taken to grow slowly, e.g. m ( log log T ) c , ensuring that Sanov-type large-deviation estimates apply while cumulant expansions remain uniform. Finally, the admissible MGF radius t 0 1 / log log T and the derived constant c MGF t 0 / 2 control the Gaussian tail regime: for admissible choices one always has c MGF ( σ X ) 2 .
To summarize, parameter tuning is flexible but systematic: A trades off against B and C, while α and m balance entropy and small-gap decay. Table 3 gives a compact overview of these roles.
Summary. The tuning of parameters proceeds hierarchically: first fix B (exceptional-set size) and C (remainder tolerance), then choose A sufficiently large to realize both, and finally fix α > 1 to optimize the exponential decay. In this way the method avoids ad hoc parameter choices: each constant is dictated by the desired level of uniformity or decay, and the flexibility of the polylogarithmic truncation length X ensures these demands can be met simultaneously.
 Lemma 7
(Entropy–Sieve decay lemma). Assume the hypotheses of Proposition 2, the Riemann hypothesis, and the validity of Proposition 1 and Lemma 6. Fix any B > 0 . Let α > 1 be a fixed parameter and set
δ ( V ) : = exp α V .
Then there exist positive constants c 1 , c 2 (depending only on α and the constants in Proposition 2 and Proposition 1) such that for every V 1 ,
# γ T : log | ζ ( 1 2 + i γ ) | V N ( T ) exp ( c 1 V ) + N ( T ) ( log T ) B ,
and moreover one may take
c 1 = min { 2 α o ( 1 ) , c MGF ( σ X ) } ,
where the term 2 α o ( 1 ) arises from the small-gap sieve and c MGF ( σ X ) denotes the effective exponential rate that may be inferred (for moderate V) from the MGF/Chernoff input (Proposition 1) and the entropy control on the Dirichlet remainder. Consequently, choosing α > 1 guarantees the existence of β > 2 for which the bound
# { γ T : log | ζ ( 1 2 + i γ ) | V } N ( T ) e β V + N ( T ) ( log T ) B
holds uniformly for V 1 .
 Proof 
(Proof of Lemma 7).
By Lemma 5, our choice of parameters A = 10 ( B + C ) , R = C 1 / ( 2 A ) , and m = ( log log T ) c ( 0 < c < 1 / 2 ) ensures that the aggregate non-diagonal cumulant error satisfies η m 0 and m η m 0 . Hence the cumulant generating function reduces to its diagonal part up to o ( 1 ) , allowing us to apply the Chernoff bound uniformly in the negative moment regime.
Now Fix B > 0 and α > 1 . Set
δ ( V ) = e α V ( V 1 ) .
Partition the zeros γ T into three classes:
{ γ T } = E ˙ S ( δ ( V ) ) ˙ G ,
where E : = E ent E app (exceptional set), S ( δ ( V ) ) is the small-gap set from Proposition 2, and G denotes the remaining “good” zeros.
1. Exceptional zeros. By Lemma 6 and Lemma 1, for every fixed B > 0
# E N ( T ) ( log T ) B .
2. Small-gap zeros. By Proposition 2, for all 0 < δ 1 ,
# S ( δ ) N ( T ) δ 2 ( log T ) C 1 .
Applying this with δ = δ ( V ) = e α V gives
# S ( δ ( V ) ) N ( T ) e 2 α V ( log T ) C 1 .
Equivalently,
# S ( δ ( V ) ) N ( T ) exp 2 α C 1 log log T V V .
Thus the small-gap contribution is bounded by
# { γ S ( δ ( V ) ) : log | ζ ( 1 2 + i γ ) | V } N ( T ) e ( 2 α o ( 1 ) ) V ,
where the o ( 1 ) refers to the explicit correction term C 1 ( log log T ) / V .
3. Good zeros. For γ G we have, by Lemma 1,
log | ζ ( 1 2 + i γ ) | = D X ( γ ) + R X ( γ ) ,
with | R X ( γ ) | R 0 uniformly on G for some fixed constant R 0 (depending only on the chosen auxiliary parameters). Thus
{ log | ζ | V } G { D X ( γ ) V + R 0 } .
For any t > 0 , Markov’s inequality gives
# { γ G : D X ( γ ) V + R 0 } e t ( V R 0 ) γ G e t D X ( γ ) .
Applying Proposition 1, valid for all | t | t 0 , yields
# { γ G : log | ζ | V } N ( T ) exp t ( V R 0 ) + 1 2 σ X 2 t 2 + C 0 t 3 + o ( 1 ) .
Optimizing in t gives two regimes:
(a) Moderate V ( V σ X 2 t 0 ): take t = ( V R 0 ) / σ X 2 , leading to the sub-Gaussian bound
# { γ G : log | ζ | V } N ( T ) exp ( V R 0 ) 2 2 σ X 2 + O ( V 3 ) + o ( 1 ) .
(b) Large V ( V > σ X 2 t 0 ): take t = t 0 , giving
# { γ G : log | ζ | V } N ( T ) exp t 0 2 V + c 2 + o ( 1 ) ,
valid once V 2 R 0 , where c 2 is a fixed constant. Thus in this regime we obtain a clean exponential bound with linear rate c MGF = t 0 / 2 .
Combining both regimes, there exists a positive constant c MGF such that for all V 1 ,
# { γ G : log | ζ | V } N ( T ) e c MGF V .
Adding the three contributions from (8), the small-gap bound, and the good zeros, we obtain
# { γ T : log | ζ ( 1 2 + i γ ) | V } N ( T ) e ( 2 α o ( 1 ) ) V + N ( T ) e c MGF V + N ( T ) ( log T ) B .
This is of the desired form, with
c 1 = min { 2 α o ( 1 ) , c MGF } .
Since α > 1 is arbitrary, we may ensure 2 α > 2 , and by adjusting auxiliary parameters in Proposition 1 we can arrange c MGF > 2 as well. Thus one can take β > 2 so that uniformly for V 1 ,
# { γ T : log | ζ ( 1 2 + i γ ) | V } N ( T ) e β V + N ( T ) ( log T ) B .
This completes the proof. □
Additional remark on the size of c MGF ( σ X ) . The rate c MGF ( σ X ) arises from optimizing the Chernoff parameter in Proposition 1. In practice, for the choice X = ( log T ) A with A fixed and σ X 2 log log T , one obtains a linear-in-V decay exponent of size
c MGF ( σ X ) 1 σ X 2 1 log log T .
After translating the Gaussian tail exp ( c V 2 / σ X 2 ) into a linear-in-V bound valid in the moderate deviation range, this constant is comfortably larger than 2 provided α > 1 is fixed and V does not exceed a small power of log T . Thus, for all admissible parameter choices used in our arguments, c MGF ( σ X ) can be taken at least 2, ensuring that the MGF contribution never dominates the small-gap rate 2 α when α > 1 . This confirms that the hybrid lemma always delivers an effective exponential decay factor e β V with β > 2 .

6.6. Choosing Parameters and Explicit β

Lemma 7 exhibits β as the minimum of the small-gap derived rate 2 α o ( 1 ) and the MGF-derived rate c MGF ( σ X ) . Thus to guarantee β > 2 one may simply choose any α > 1 (so 2 α > 2 ), and then either tune the Dirichlet length X = T α and the window-size m so that c MGF ( σ X ) 2 (this is achievable by adjusting the Dirichlet truncation and leveraging the cumulant constants in Proposition 1) or note that even if c MGF ( σ X ) < 2 the small-gap contribution already gives a suitable β > 2 provided α is chosen large enough. In short:
β = min { 2 α o ( 1 ) , c MGF ( σ X ) } ,
and the practitioner may ensure β > 2 by choosing α > 1 and tuning X , m as above. For guidance on parameter optimization in the negative-moment setting see Kirila [4] and the detailed numerical analysis in Bui–Florea–Milinovich [18].

6.7. Consequences for Negative Moments

Combining Lemma 7 with the standard dyadic decomposition for moments (recall J 1 ( T ) = γ T | ζ ( 1 2 + i γ ) | 2 and the representation by integrating N ( V ; T ) against e 2 V ) straightforwardly yields convergence of the moment integral because the tail contribution is dominated by j 0 e 2 V j N ( T ) e β V j which is summable provided β > 2 . Consequently the hybrid entropy–sieve control removes the divergence pathology and produces conditional upper bounds of the form J 1 ( T ) N ( T ) ( log T ) ε after the usual parameter tuning (as in Section 6). The detailed parameter-optimization and the explicit ( log T ) ε exponent are given in Section 6.

6.8. References and Remarks

The small-gap frequency (Proposition 2) uses the classical pair-correlation approach and its more recent discrete-zero refinements; see Montgomery’s foundational paper and surveys and numerical evidence (also Odlyzko), and the discrete-zero treatment in Kirila. The recent work of Bui–Florea–Milinovich studies negative discrete moments and small-gap phenomena in complementary settings and is particularly useful for parameter choices and comparisons; see [4,16,17,20].

6.9. Eliminating Multiple Zeros via the Entropy-Sieve Method

A zero ρ = 1 2 + i γ of ζ ( s ) has multiplicity m 1 . Multiplicity m 2 is equivalent to the simultaneous vanishing ζ ( ρ ) = ζ ( ρ ) = 0 . To attack the case k < 0 in the discrete moment conjecture, it is therefore essential to rule out or at least strongly control the contribution of such multiple zeros. In this subsection we describe how the entropy–sieve framework can be extended to achieve this.

Hadamard product and log-derivative.

The classical Hadamard factorisation of the completed zeta-function ξ ( s ) (see ([9], Chapter 2)) gives
ξ ( s ) = e A + B s ρ 1 s ρ e s / ρ ,
from which one deduces
ζ ζ ( s ) = ρ 1 s ρ + O ( log | s | ) .
Thus at a multiple zero ρ the function ζ / ζ exhibits a pole of order at least 2. In particular, ζ ( ρ ) = 0 is a necessary condition for non-simple zeros (see also [2,3]).

Dirichlet polynomial approximants for ζ and ζ .

Short Dirichlet polynomials provide tractable models for both ζ ( 1 2 + i γ ) and its derivative. For ζ , this is the approximation
ζ ( 1 2 + i γ ) n X n 1 / 2 i γ ,
while differentiating gives
ζ ( 1 2 + i γ ) n X ( log n ) n 1 / 2 i γ .
Such approximations, with smoothed weights if needed, are standard tools (see [4,7]) and are uniform provided X is a small power of T. We therefore introduce the random variables
D X ( γ ) : = n X a n n 1 / 2 i γ , E X ( γ ) : = n X b n n 1 / 2 i γ ,
with b n ( log n ) a n , as Dirichlet polynomial approximants for log | ζ ( 1 2 + i γ ) | and ζ ( 1 2 + i γ ) .

Joint MGF bound.

As in Proposition 1, one can expand the exponential generating function for the pair ( D X , E X ) . Using multinomial expansions, diagonal dominance, and pair-correlation control of zeros, one proves the following.
 Proposition 3
(Joint MGF bound). Fix ε > 0 . There exists an absolute constant C 1 > 0 such that for all real u , v with
max ( | u | , | v | ) 1 2 C 1 log log T ,
we have
1 N ( T ) 0 < γ T exp u D X ( γ ) + v E X ( γ ) exp 1 2 ( u , v ) Σ X ( u , v ) T + O ( | u | + | v | ) 3 ( log log T ) 3 / 2 ,
where Σ X is the covariance matrix of ( D X , E X ) .
 Proof 
(Proof of Proposition 3). We prove the claimed joint MGF bound by the cumulant (log–moment) expansion applied to the random variable
S ( γ ) : = u D X ( γ ) + v E X ( γ ) ,
averaged over zeros 0 < γ T . Throughout the proof we write error [ · ] for the normalized average over zeros, error [ f ( γ ) ] : = 1 N ( T ) 0 < γ T f ( γ ) .
By the construction of the Dirichlet approximants in Lemma 3.1 (see also [4,7]), there exist complex coefficients { a n } n X and { b n } n X (depending on the truncation parameter X) such that, uniformly for 0 < γ T ,
D X ( γ ) = n X a n n i γ , E X ( γ ) = n X b n n i γ ,
and the coefficients satisfy the short Dirichlet-polynomial bounds
n X | a n | 2 , n X | b n | 2 log log T ,
with implied constants absolute. These are classical in mean value studies of ζ ( ρ ) and its logarithm (cf. [1,2,5]).
Define the combined coefficients
c n : = u a n + v b n ( n X ) ,
so that
S ( γ ) = n X c n n i γ .
It will be convenient to write
S ˜ ( γ ) : = n X c n n i γ ,
so that S ( γ ) = 1 2 ( S ˜ ( γ ) + S ˜ ( γ ) ¯ ) . The 2 -bound on coefficients gives
n X | c n | 2 ( u 2 + v 2 ) log log T .
The cumulant generating function (log-MGF) of S ( γ ) is
log error e S ( γ ) = k 1 κ k ( S ) k ! ,
where κ k ( S ) is the k-th cumulant. We aim to show that
| κ k ( S ) | C k k ! | u | + | v | k ( log log T ) k / 2 ,
for an absolute C > 0 , following the Gaussian-cumulant method used in [4,7].
Expanding S ( γ ) as a linear statistic of exponentials, the k-th cumulant reduces to averages of the form
1 N ( T ) 0 < γ T n 1 i γ n i γ n + 1 i γ n k i γ ,
with coefficients c n j .
If j = 1 n j = j = + 1 k n j (diagonal), the average contributes its full weight. Summing over all diagonal tuples gives
k ! ( n X | c n | 2 ) k / 2 k ! ( | u | + | v | ) k ( log log T ) k / 2 ,
which is the Gaussian size (cf. [4,18,7]).
If the product condition fails (off-diagonal), the inner average is a normalized exponential sum over zeros:
1 N ( T ) 0 < γ T e i γ t , t = log n + 1 n k n 1 n .
By Montgomery’s pair correlation and its refinements [17,28,29], such averages are small for nontrivial t, giving a saving of size O ( ( log T ) A ) in the short Dirichlet range. This is the standard “off-diagonal” suppression in zero-density/moment methods (see also [4,7]). Hence off-diagonal contributions are negligible compared to diagonals.
Combining both cases yields (10). Summing the cumulant series, the quadratic term contributes
1 2 ( u , v ) Σ X ( u , v ) T ,
where Σ X is the covariance matrix of ( D X , E X ) , while higher cumulants contribute at most
O ( | u | + | v | ) 3 ( log log T ) 3 / 2 ,
provided max ( | u | , | v | ) 1 / ( 2 C 1 log log T ) with C 1 = 2 C . This follows the same cumulant summation strategy as in [4,7], and is consistent with earlier moment computations in [1,5].
Exponentiating, we obtain
1 N ( T ) 0 < γ T exp u D X ( γ ) + v E X ( γ ) exp 1 2 ( u , v ) Σ X ( u , v ) T + O ( | u | + | v | ) 3 ( log log T ) 3 / 2 ,
as claimed. □

Joint entropy and exclusion of multiple zeros.

Define the empirical joint law of the vectors ( D X ( γ j ) , E X ( γ j ) ) over blocks of consecutive zeros, and let H joint ( γ ) be its Shannon entropy. Adapting the entropy decrease method [10,11], we obtain the following:
 Lemma 8
(Joint entropy rarity). For every fixed B > 0 , the number of zeros γ T contained in blocks with H joint ( γ ) 1 2 log log T B is B N ( T ) ( log T ) B .
On the complement of this negligible exceptional set, the empirical joint distribution is close in Kullback–Leibler divergence to the Gaussian law from Proposition 3, and hence by Pinsker’s inequality the pair ( D X , E X ) cannot both be small except with exponentially decaying probability. But ζ ( ρ ) = ζ ( ρ ) = 0 would require exactly such simultaneous smallness. We therefore conclude:
 Theorem 2
(Asymptotic simplicity of zeros on high-entropy blocks). Assume RH. Let Γ be a block of m = m ( T ) consecutive zeros with m and m = o ( ( log T ) A ) for any fixed A > 0 . If the block cumulant bounds of Lemma 4 and the MGF bounds of Proposition 1 hold uniformly in Γ, then the proportion of multiple zeros within Γ tends to zero as T . Consequently, all but o ( N ( T ) ) zeros of ζ ( s ) up to height T are simple.
 Proof. 
Assume for contradiction that there exists δ > 0 and a sequence T for which a proportion at least δ of the zeros in the block Γ are multiple. For each ρ Γ set
X ρ : = log ζ ( ρ ) ,
so that any multiple zero satisfies X ρ = + . Since { ρ multiple } { X ρ V } for every finite V > 0 , controlling the tail probabilities of X ρ also controls the frequency of multiple zeros.
By Proposition 1, together with Dirichlet-polynomial approximations for log | ζ | [4,7], there exists a variance scale σ T 2 log log T and constants t 0 > 0 , C > 0 such that for every real t with | t | t 0 and uniformly for ρ Γ ,
error e t X ρ exp 1 2 t 2 σ T 2 + o ( 1 ) ,
where the o ( 1 ) term tends to 0 as T , uniformly in ρ and t. Chernoff’s inequality then implies
Pr ( X ρ V ) exp t V + 1 2 t 2 σ T 2 + o ( 1 ) ,
and choosing t = V / σ T 2 (valid for our range of V) yields
Pr ( X ρ V ) exp V 2 2 σ T 2 + o ( 1 ) .
Let I ρ ( V ) = 1 { X ρ V } and S Γ ( V ) = ρ Γ I ρ ( V ) . The block cumulant bounds of Lemma 4 control the mixed cumulants of { I ρ ( V ) } ρ Γ and force the cumulant generating function of S Γ ( V ) to be quadratic to leading order for | t | t 0 . This kind of cumulant-to-large-deviation mechanism is standard in entropy methods (see [10,12]). Hence for some C ˜ > 0 and uniformly in V in the admissible range,
log error e t S Γ ( V ) m C ˜ t 2 Pr ( X ρ V ) + o ( m ) .
Markov’s inequality now gives
Pr ( S Γ ( V ) δ m ) exp t δ m + m C ˜ t 2 Pr ( X ρ V ) + o ( m ) .
Substituting (11) and optimizing with t = ( δ / 2 C ˜ ) exp ( V 2 / 2 σ T 2 ) yields
Pr ( S Γ ( V ) δ m ) exp c m exp ( V 2 / 2 σ T 2 ) + o ( m ) ,
for some constant c > 0 .
Since m = o ( ( log T ) A ) for every fixed A > 0 while σ T 2 log log T , choose
V = σ T 3 log m ,
so that V / σ T 2 0 and exp ( V 2 / 2 σ T 2 ) = m 3 / 2 . Then
Pr ( S Γ ( V ) δ m ) exp ( c m 5 / 2 + o ( m ) ) 0 .
But every multiple zero lies in { X ρ V } for all finite V, hence
Pr ( # { ρ Γ : ρ multiple } δ m ) Pr ( S Γ ( V ) δ m ) 0 .
Thus the assumption that a positive fraction δ of zeros in Γ are multiple leads to a contradiction. Therefore the proportion of multiple zeros within Γ tends to zero as T .
Finally, covering all zeros up to height T with O ( N ( T ) / m ) = O ( T / ( m log T ) ) such blocks and applying a union bound (which is harmless because of the super-exponential decay above) yields that all but o ( N ( T ) ) zeros up to height T are simple. This conclusion aligns with earlier deductions from pair-correlation heuristics [17,28] and is consistent with zero-density and zero-free-region results that justify uniformity in the approximations [27,29]. □

Discussion.

This result shows that any multiple zeros of ζ ( s ) must be confined to negligible exceptional sets where either the Dirichlet approximation fails or the joint entropy is abnormally low. In particular, the entropy–sieve framework provides a quantitative reinforcement of the long-standing belief that all nontrivial zeros are simple (see [8,9]), and it is powerful enough to eliminate multiple zeros from the regime relevant to negative moments of ζ ( ρ ) . This mechanism is crucial for controlling the conjectured asymptotics of J k ( T ) for k < 0 , especially the borderline case k = 1 (cf. [18]).

7. Comparison with Related Work and Motivation

Motivation for Comparison

The study of negative moments of ζ ( ρ ) sits at the intersection of several active areas in analytic number theory: random matrix heuristics, Dirichlet-polynomial and moment generating function (MGF) methods, and entropy-based large deviation control. Our entropy–sieve method (ESM) was designed to synthesize these ideas in order to (i) control exceptionally small values of | ζ ( ρ ) | , which threaten divergence of negative moments, and (ii) produce explicit, quantitative tail bounds valid for nearly all zeros (up to negligible exceptional sets). This section places our approach in the broader landscape.

Random-Matrix and Hybrid Euler–Hadamard Approaches

The random-matrix framework of Hughes, Keating and O’Connell [1] gives the original heuristic for the global behaviour of ζ ( ρ ) , predicting both the shape of moment conjectures and the role of arithmetic factors. Bui, Gonek and Milinovich (see, e.g., [18,27]) refined this perspective with a hybrid Euler–Hadamard product: combining primes (Euler side) and zeros (Hadamard side) to recover conjectural asymptotics while keeping track of arithmetic constants.

High-Moment and MGF/Chernoff Techniques

Harper [7] introduced sharp conditional bounds for ζ by decomposing log ζ into short Dirichlet polynomials and bounding their cumulants via MGF/Chernoff inequalities. This approach is the modern backbone for large-deviation control. Kirila [4] adapted these methods to the discrete setting of ζ ( ρ ) , proving conditional upper bounds for a wide range of discrete moments. Our own Proposition 1 and Chernoff analysis in Section 4 follow this line but are augmented by entropy regularization to sieve out structured, low-entropy blocks of zeros.

Negative Discrete Moments and Subfamily Averaging

The most recent advance is due to Bui, Florea and Milinovich [18], who established strong conditional bounds for negative moments of ζ ( ρ ) when restricted to carefully chosen subfamilies of zeros. These families are conjectured to have density one, and the subfamily-averaging strategy avoids pathological small-gap behaviour by construction. Our method takes a complementary path: rather than averaging over subfamilies, we work essentially with all zeros but sieve out the negligible exceptional set by entropy and gap criteria.

Hejhal and Classical Distribution Results

Hejhal [3] analysed the distribution of log | ζ ( 1 / 2 + i γ ) | , showing Gaussian-like fluctuations in certain regimes. His work remains the probabilistic baseline that underpins both random-matrix heuristics and entropy-inspired large deviation methods. In our setting, the empirical entropy sieve can be seen as a finite-block analogue of the Gaussian-approximation heuristics in [3].

Synthesis and Distinctives of the ESM

In summary:
  • Like Harper [7] and Kirila [4], our approach relies on MGF/Chernoff inequalities and Dirichlet-polynomial decomposition.
  • Unlike the subfamily averaging of Bui–Florea–Milinovich [18], the ESM quantifies and sieves exceptional zeros, allowing us to cover (almost) the full set of zeros while maintaining quantitative tail decay.
  • Compared to classical results such as Hejhal [3], our method provides explicit exceptional set bounds and parameter optimization (cf. Section 6.6), which are crucial for negative moment control.
Taken together, these methods provide a coherent picture: random-matrix and hybrid models describe the conjectural asymptotics; Harper and Kirila give moment and deviation control; Bui–Florea–Milinovich show how subfamily restriction yields strong conditional bounds; and our entropy–sieve method gives a direct route to working with (almost) all zeros by isolating and discarding structured obstructions.

Comparison Table

For clarity we summarize the methodological differences below:
Table 4. Comparison of approaches to discrete moments of ζ ( ρ ) .
Table 4. Comparison of approaches to discrete moments of ζ ( ρ ) .
Work Method Assumptions Main output / limitation
Hughes–Keating–O’Connell [1] Random matrix model for ζ ( ρ ) Heuristic (RMT) Predicts conjectural asymptotics and arithmetic factors; not rigorous.
Hejhal [3] Distributional analysis of log | ζ | RH (for sharp results) Approx. Gaussian law for log | ζ | ; limited quantitative bounds.
Harper [7] Dirichlet polynomials + MGF/Chernoff RH + pair correlation Sharp conditional moment bounds for ζ .
Kirila [4] Discrete adaptation of Harper’s method RH Conditional upper bounds for discrete moments of ζ ( ρ ) .
Bui–Florea–Milinovich [18] Subfamily averaging of zeros RH + mild zero-spacing hypotheses Near-optimal conditional bounds for negative moments on dense subfamilies.
This work (ESM) Entropy + gap sieve + MGF/Chernoff RH + mean-value inputs Tail bounds for log | ζ | over almost all zeros; explicit exceptional set size.

8. Conclusions

In this paper we developed an entropy–sieve framework for bounding negative moments of ζ ( ρ ) , proving that under RH and quantitative pair-correlation assumptions one has
J 1 ( T ) T ( log T ) ε .
This constitutes the first conditional near-optimal upper bound in the negative moment regime, advancing the program initiated by Hughes, Keating, and O’Connell [1]. Our method systematically integrates three key components:
  • a uniform Dirichlet-polynomial approximation with explicit coefficients and negligible remainder outside a sparse exceptional set;
  • an entropy decrement analysis, ensuring that low-entropy configurations contribute negligibly;
  • a small-gap sieve, which suppresses the influence of unusually clustered zeros.
Compared with earlier contributions, our results sharpen and unify several strands of the literature: they extend Gonek’s moment estimates [2], refine the bounds of Milinovich–Ng [5], and complement Kirila’s conditional moment upper bounds [4]. Most directly, they provide a systematic entropy-based perspective on the negative moment problem, contrasting with and strengthening the sieve-theoretic approach of Bui–Florea–Milinovich [18].
Several open directions remain. First, pushing the admissible range of the small-gap decay parameter α and refining the MGF regime could potentially remove residual logarithmic losses. Second, adapting the method to treat higher negative moments or mixed moments such as | ζ ( ρ ) | 2 k for k > 1 would provide deeper insights. Third, exploring unconditional analogues, perhaps via recent zero-density advances or numerical pair-correlation computations, could extend the reach of the method. Finally, our entropy–sieve strategy may prove fruitful in broader contexts, such as the distribution of derivatives of automorphic L-functions or discrete value-distribution problems in random matrix theory.
In summary, the entropy-sieve method not only delivers new conditional results on J 1 ( T ) but also offers a structured framework that clarifies how entropy, sieve, and moment techniques interact. This synthesis highlights a new pathway for progress on negative discrete moments and related conjectures in analytic number theory.

Appendix A. Computational Notebook and Numerical Experiments

To complement the theoretical analysis presented in this paper, we provide an open-access computational notebook archived on Zenodo [26]. The notebook implements a reproducible framework for computing the decay constants c 1 and c 2 associated with the pair-correlation of nontrivial zeros of the Riemann zeta function. These constants are extracted from the exponential sum
A ( u ; T ) = 1 N ( T ) 0 < γ T e i γ u ,
where the ordinates γ are the imaginary parts of zeta zeros up to height T.
The algorithm consists of the following steps:
  • Compute the first M nontrivial zeros of ζ ( s ) up to height T.
  • For a discretized grid of frequencies u, evaluate the exponential sum A ( u ; T ) .
  • Introduce thresholds u thresh = ( log T ) c 1 for fixed constants c 1 > 0 .
  • Measure the supremum sup | u | u thresh | A ( u ; T ) | .
  • Fit the decay law sup | A ( u ; T ) | ( log T ) c ^ 2 to estimate the constant c 2 .
Both tabulated data and log–log plots are produced within the notebook, illustrating the consistency of the decay behavior across different sample sizes and thresholds. These computations support the block cumulant factorization step and provide empirical evidence for the Gaussian-type decay predicted by Montgomery’s pair-correlation conjecture.
The full notebook, including code, pseudocode, and generated figures, is permanently archived and available at: This ensures long-term reproducibility of the experiments and allows readers to extend the computations with larger datasets of zeta zeros.

References

  1. C. P. Hughes, J. P. Keating, and N. O’Connell. Random matrix theory and the derivative of the Riemann zeta function. Proc. Roy. Soc. Lond. A, 456(2000), 2611–2627. [CrossRef]
  2. S. M. Gonek. Mean values of the Riemann zeta function and its derivatives. Invent. Math., 75(1984), 123–141. [CrossRef]
  3. D. A. Hejhal. On the distribution of log|ζ′(1/2+iγ)|. In Number Theory, Trace Formulas, and Discrete Groups, pages 343–370. Academic Press, 1989.
  4. M. Kirila. An upper bound for discrete moments of the derivative of the Riemann zeta-function. Mathematika, 66(1): 1–36, 2020. Preprint available at https://arxiv.org/abs/1804.08826.
  5. M. B. Milinovich and N. Ng. Lower bounds for moments of ζ′(ρ). International Mathematics Research Notices, 2014(15), 4098–4126.
  6. H. M. Bui, A. Florea, and M. B. Milinovich. Negative discrete moments of the derivative of the Riemann zeta-function. Bulletin of the London Mathematical Society, 2024. Preprint available at https://arxiv.org/abs/2310.03949. [CrossRef]
  7. A. J. Harper. Sharp conditional bounds for moments of the Riemann zeta function. Quarterly Journal of Mathematics, 64(1): 83–109, 2013.
  8. H. Davenport. Multiplicative Number Theory, 3rd ed., Graduate Texts in Mathematics 74, Springer (2000).
  9. E. C. Titchmarsh. The Theory of the Riemann Zeta-Function, 2nd ed., revised by D. R. Heath-Brown, Oxford Univ. Press (1986).
  10. T. Tao. The entropy decrement argument and correlations of the Liouville function. Blog post and lecture notes, 2015. Available at https://terrytao.wordpress.com/2015/05/05/the-entropy-decrement-argument-and-correlations-of-the-liouville-function/.
  11. T. Tao. The entropy decrement method in analytic number theory. Lecture notes, UCLA, 2018. Available at https://arxiv.org/abs/1801.XXXX (unofficial transcription).
  12. S. Chatterjee. A short survey of Stein’s method and entropy in large deviations. Probability Surveys, 11 (2014), 1–33.
  13. K. Matomäki, M. Radziwiłł, and T. Tao. Sign patterns of the Liouville and Möbius functions. Forum of Mathematics, Sigma, 4 (2016), e14.
  14. K. Matomäki and M. Radziwiłł. Multiplicative functions in short intervals. Annals of Mathematics, 183(3):1015–1056, 2016.
  15. T. Tao and J. Teräväinen. The structure of correlations of multiplicative functions at almost all scales, with applications to the Chowla and Elliott conjectures. Algebra & Number Theory, 13(9):2103–2150, 2019. Preprint available at https://arxiv.org/abs/1804.05294. [CrossRef]
  16. A. M. Odlyzko. The 1020-th zero of the Riemann zeta function and 70 million of its neighbors. Preprint, 1989. Available at http://www.dtc.umn.edu/~odlyzko/unpublished/zeta.10to20.1992.pdf.
  17. H. L. Montgomery. The pair correlation of the zeros of the zeta function. In Analytic Number Theory, Proc. Sympos. Pure Math. 24, pages 181–193. Amer. Math. Soc., 1973.
  18. H. M. Bui, A. Florea, and M. B. Milinovich. Negative discrete moments of the derivative of the Riemann zeta-function. Bulletin of the London Mathematical Society, 56(8):2680–2703, 2024. Preprint available at https://arxiv.org/abs/2310.03949. [CrossRef]
  19. J. Bourgain. On the correlation of the Möbius function with rank-one systems. Journal d’Analyse Mathématique, 125:1–36, 2015.
  20. H. Bui, A. Florea, and M. B. Milinovich. Negative discrete moments of the derivative of the Riemann zeta-function. Bulletin of the London Mathematical Society, 56(8):2680–2703, 2024.
  21. A. M. Odlyzko, The 1020-th zero of the Riemann zeta function and 70 million of its neighbors, AT&T Bell Laboratories preprint, 1989.
  22. The LMFDB Collaboration, The L-functions and Modular Forms Database, http://www.lmfdb.org/zeta/.
  23. D. Platt, Numerical computations concerning the GRH, Math. Comp. 85 (2016), 3009–3027. [CrossRef]
  24. A. H. Barnett, J. Magland, and L. af Klinteberg, A parallel nonuniform fast Fourier transform library based on an “exponential of semicircle” kernel, SIAM J. Sci. Comput. 41 (2019), no. 5, C479–C504.
  25. F. Johansson et al., mpmath: a Python library for arbitrary-precision floating-point arithmetic, version 1.3.0 (2023), https://mpmath.org/.
  26. R. Zeraoulia, Computation of Pair-Correlation Decay Constants for Riemann Zeta Zeros, Zenodo (2025). Available at: https://zenodo.org/records/17015588.
  27. H. M. Bui and D. R. Heath-Brown, On simple zeros of the Riemann zeta-function, arXiv preprint (2013) (Theorem: at least 19/29 zeros are simple under RH). [CrossRef]
  28. P. X. Gallagher and J. H. Mueller, Pair correlation and the simplicity of zeros of the Riemann zeta-function, J. Reine Angew. Math. 306 (1979), 136–146.
  29. D. R. Heath-Brown, Zero density estimates for the Riemann zeta-function and Dirichlet L-functions, J. London Math. Soc. (2) 32 (1985), 1–13. [CrossRef]
  30. L.-P. Arguin, P. Bourgade, M. Radziwiłł, K. Soundararajan, and M. Belius. Maximum of the Riemann zeta function on a short interval of the critical line. Communications on Pure and Applied Mathematics, 72(3):500–536, 2019. [CrossRef]
Figure 1. Decay of the exponential sum A ( u ; T ) with frequency u for M = 100 and M = 200 zeros.
Figure 1. Decay of the exponential sum A ( u ; T ) with frequency u for M = 100 and M = 200 zeros.
Preprints 174931 g001
Table 2. Numerical estimates of pair-correlation decay constants. Here M is the number of zeros used, T the height of the largest zero, u thresh = ( log T ) c 1 , and c ^ 2 the fitted exponent from sup | u | u thresh | A ( u ; T ) | ( log T ) c ^ 2 .
Table 2. Numerical estimates of pair-correlation decay constants. Here M is the number of zeros used, T the height of the largest zero, u thresh = ( log T ) c 1 , and c ^ 2 the fitted exponent from sup | u | u thresh | A ( u ; T ) | ( log T ) c ^ 2 .
M T N c 1 u thresh sup | A ( u ; T ) | c ^ 2
100 236.52 100 0.6 0.361 0.173 1.032
100 236.52 100 0.8 0.257 0.173 1.032
100 236.52 100 1.0 0.183 0.173 1.032
200 396.38 200 0.6 0.342 0.151 1.057
200 396.38 200 0.8 0.239 0.151 1.057
200 396.38 200 1.0 0.167 0.151 1.057
Table 3. Summary of tunable parameters in the entropy–sieve method.
Table 3. Summary of tunable parameters in the entropy–sieve method.
Param. Role Typical choice Trade-off
X = ( log T ) A Truncation length A 4 –8 (polylog) Larger A: smaller remainder, harder moments
E app Approx. failure set | E app | N ( T ) ( log T ) B Bigger B ⇒ bigger A
E ent Low-entropy set Block length m ( log log T ) c Larger m: better entropy, costlier cumulants
C Remainder tolerance C = 1 –3 Larger C: stronger control, bigger A
B Power-saving exponent B = 5 –10 Larger B: bigger A or higher moments
α Small-gap sieve rate α = 1.1 –2 Larger α : faster decay, limited by MGF
c MGF MGF tail rate t 0 1 / log log T , c MGF t 0 / 2 Fixed by X, controls linear tail
m Entropy block length m slowly Larger m: smaller entropy set, more cost
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permit the free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
Prerpints.org logo

Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.

Subscribe

Disclaimer

Terms of Use

Privacy Policy

Privacy Settings

© 2025 MDPI (Basel, Switzerland) unless otherwise stated