Preprint
Article

This version is not peer-reviewed.

On the Hughes–Keating–O’Connell Conjecture: Quantified Negative Moment Bounds for ζ′(ρ) via Entropy–Sieve Methods Revisited

Submitted:

04 September 2025

Posted:

08 September 2025


Abstract
We study the negative discrete moments of the derivative of the Riemann zeta function at its nontrivial zeros, in connection with the Hughes--Keating--O’Connell conjecture. Building on the works of Gonek, Milinovich--Ng, Kirila, and the recent breakthrough of Bui--Florea--Milinovich, we introduce a new \emph{entropy--sieve method} (ESM). This framework combines short Dirichlet-polynomial approximations with entropy-based moment generating function bounds and a small-gap sieve, thereby controlling all appearances of $\zeta'(\rho)$ without assuming simplicity of zeros. Assuming the Riemann Hypothesis together with standard pair-correlation conjectures and a strengthened discrete moment hypothesis, we prove the quantified conditional bound \[ J_{-1}(T) \;=\; \sum_{0<\gamma\leq T} \frac{1}{|\zeta'(\tfrac12+i\gamma)|^{2}} \;\le\; C(\varepsilon)\, T (\log T)^{\varepsilon}, \qquad \text{for every fixed $\varepsilon>0$}, \] with an explicit dependence of the implicit constant on $\varepsilon$. This matches, up to logarithmic factors, the conjectured order $J_{-1}(T)\asymp T$ and improves on all previous conditional results. The analysis introduces several innovations: (i) a full cumulant control lemma for Dirichlet polynomials; (ii) explicit, non-circular parameter selection for approximation lengths and moments; and (iii) an entropy--sieve hybrid decay lemma that quantifies large-deviation probabilities for $\zeta'(\rho)$. Beyond the negative moment problem, the entropy--sieve framework illustrates the strength of entropy techniques in analytic number theory and points toward applications to $L$-functions and random matrix models.
For the reader’s convenience, we summarize the main notation that will be used consistently throughout the paper. Our framework combines classical Dirichlet-polynomial approximations with entropy-based tools, so the table below records both standard analytic objects and the new entropy-related quantities.
Table 1. Notation for general quantities, Dirichlet-polynomial approximations, moment generating functions, and entropy framework.
General Notation
$T$: Height parameter for critical zeros of $\zeta(s)$; zeros $\rho = \tfrac12 + i\gamma$ with $0 < \gamma \le T$.
$N(T)$: Number of zeros $\rho = \tfrac12 + i\gamma$ with $0 < \gamma \le T$.
$\rho$: Nontrivial zero of $\zeta(s)$, written $\rho = \tfrac12 + i\gamma$.
$\gamma$: Imaginary ordinate of a zero.
$E_{\mathrm{app}}$: Exceptional set where Dirichlet-polynomial approximation fails (Lemma 8).
$E_{\mathrm{ent}}$: Exceptional set of zeros lying in low-entropy blocks (Lemma 4).
$G$: Set of “good” zeros (outside all exceptional sets).
$Z_{\mathrm{simp}}$: Set of simple zeros $\{\rho : \zeta'(\rho) \neq 0\}$.
Dirichlet Polynomial Approximation
$X$: Dirichlet polynomial length, $X = (\log T)^{A}$.
$A$: Truncation exponent.
$D_X(\gamma)$: Approximant $D_X(\gamma) = \sum_{n \le X} a_n n^{-1/2 - i\gamma}$.
$a_n$: Dirichlet polynomial coefficients.
$R_X(\gamma)$: Error term in approximation of $\log|\zeta'(\rho)|$.
$\sigma_X^2$: Variance of $D_X$: $\sigma_X^2 = \sum_{n \le X} |a_n|^2 / n$ (Lemma 2).
$Y$: Auxiliary parameter, $Y = \exp((\log\log T)^2)$.
Moment Generating Function & Tail Estimates
$M(t)$: Moment generating function: $M(t) = \frac{1}{N(T)} \sum_{0 < \gamma \le T} e^{t D_X(\gamma)}$.
$\kappa_r$: $r$-th cumulant of $D_X(\gamma)$.
$M_r$: Raw $r$-th moment of $D_X(\gamma)$.
$t$: Auxiliary parameter, $|t| \le c/\sigma_X$.
$t_0$: Admissible $t$: $t_0 = c/\sqrt{\log\log\log T}$ (Proposition 1).
$N(V;T)$: Tail count: $\#\{\gamma : \log|\zeta'(\rho)| \le -V\}$.
$V$: Threshold parameter in tail bounds.
$c_{\mathrm{MGF}}$: Constant governing MGF tail decay.
Entropy Framework (Section 3)
$G(\gamma_0)$: Local window of zeros near $\gamma_0$.
$H^{\mathrm{val}}_{h,\Delta}(\gamma_0)$: Value entropy (Definition 1).
$H^{\mathrm{gap}}_{h_g,\Delta}(\gamma_0)$: Gap entropy of zero spacings (Definition 2).
$H_0$: Entropy threshold used to classify blocks.
$h, \tilde h$: Bin widths for histograms.
$B, \tilde B$: Histogram bins for values and gaps.
$p(\gamma)$: Empirical frequency of value bin $B$.
$\tilde p(\gamma)$: Empirical frequency of gap bin $\tilde B$.
$H^{\mathrm{val}}(\gamma)$: Shannon entropy of $\log|\zeta'(\rho)|$ values.
$H^{\mathrm{gap}}(\gamma)$: Shannon entropy of normalized gaps.
Table 2. Notation for discrete moments, sieve parameters, and hypotheses.
Moments and Sieve
$J_k(T)$: Discrete $2k$-moment: $J_k(T) = \sum_{0 < \gamma \le T} |\zeta'(\rho)|^{2k}$.
$J_k^{\mathrm{simp}}(T)$: Same sum restricted to simple zeros.
$\delta(V)$: Small-gap cutoff (Definition 3).
$\alpha$: Exponent in $\delta(V) = e^{-\alpha V}$.
$S(\delta)$: Set of zeros with normalized gaps $\le \delta/\log T$.
$\Gamma_\gamma$: Block of $m$ consecutive zeros centered at $\gamma$.
$m$: Entropy block length $m = m(T)$ (see Section 3).
PCH: Pair-Correlation Hypothesis (see Section 3).
DMC: Discrete Moment Control hypothesis.
SGE: Small-Gap Estimate hypothesis.
$c, C, C_0$: Positive constants from Gaussian, entropy, and sieve bounds.

1. Introduction

Let $\zeta(s)$ denote the Riemann zeta function and $\rho = \tfrac12 + i\gamma$ its nontrivial zeros. The size of the derivative $\zeta'(\rho)$ at these zeros plays a central role in analytic number theory, with deep links to the distribution of zeros, random matrix theory, and the behavior of moments of $L$-functions. For $k \in \mathbb{C}$, we define the discrete moment
\[ J_k(T) := \sum_{0 < \gamma \le T} |\zeta'(\rho)|^{2k}, \]
where the sum runs over all nontrivial zeros $\rho$ of $\zeta(s)$, counted with multiplicity. For $k < 0$ this sum is finite only if every zero is simple, since a multiple zero would satisfy $\zeta'(\rho) = 0$ and force $J_k(T) = +\infty$. Thus, understanding upper bounds for $J_k(T)$ in the negative range not only addresses deep conjectures but also has direct implications for the simplicity of zeros.

1.1. Motivation and Conjectures

The asymptotic behavior of $J_k(T)$ has been studied extensively. Based on random matrix heuristics, Hughes, Keating, and O’Connell [1][Conj. 1.7, p. 5] conjectured that for $\Re(k) > -\tfrac32$,
\[ J_k(T) \sim \frac{G^2(2+k)}{G(3+2k)}\, a(k)\, \frac{T}{2\pi} \left( \log \frac{T}{2\pi} \right)^{(k+1)^2}, \tag{1} \]
where $G(\cdot)$ is the Barnes $G$-function and $a(k)$ is an explicit arithmetic factor. In particular, for $k = -1$, conjecture (1) predicts
\[ J_{-1}(T) \asymp T, \]
so the negative second moment should be of the same order as the number of zeros up to height $T$.
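As a quick consistency check, the substitution $k=-1$ in the conjectured formula can be written out. The only fact used beyond the formula itself is $G(1)=1$ for the Barnes $G$-function; the arithmetic factor $a(-1)$ is left symbolic here.

```latex
\[
k=-1:\qquad (k+1)^2 = 0,\qquad
\frac{G^2(2+k)}{G(3+2k)} = \frac{G^2(1)}{G(1)} = 1,
\qquad\text{so}\qquad
J_{-1}(T) \sim a(-1)\,\frac{T}{2\pi}.
\]
```

The logarithmic factor disappears entirely at $k=-1$, which is why the predicted order is a pure multiple of $T$.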

1.2. State of the Art

For positive moments ($k \ge 0$), major progress has been achieved:
  • Gonek [2][p. 35] initiated the study of discrete moments of $\zeta'(\rho)$, deriving asymptotic formulas for $J_1(T)$ under the Riemann Hypothesis (RH).
  • Hejhal [3][Sec. 3] analyzed the distribution of $\log|\zeta'(\rho)|$ and showed that it is approximately Gaussian with variance $\log\log T$, providing a probabilistic model for small and large values of $\zeta'(\rho)$.
  • Kirila [4][Thm. 1.1] obtained sharp upper bounds for positive moments by adapting Harper’s probabilistic Dirichlet-polynomial method:
    \[ J_k(T) \ll_k N(T) (\log T)^{k(k+2)}, \]
    where $N(T)$ is the number of zeros up to height $T$.
  • Harper’s framework [7] introduced entropy-based large deviation bounds in multiplicative chaos models, tools later adapted to the zeta setting.
These results align with the Hughes–Keating–O’Connell conjecture for $k > 0$.
For negative moments ($k < 0$), progress is much more limited:
  • Gonek [2][p. 36] obtained conditional lower bounds for $J_k(T)$ but no general upper bounds.
  • Milinovich and Ng [5] refined such lower bounds by relating $\zeta'(\rho)$ to zero spacings.
  • Most recently, Bui, Florea, and Milinovich [6] derived conditional upper bounds for negative moments over a large subfamily of zeros, excluding a sparse exceptional set where $\zeta'(\rho)$ may be abnormally small. A complete bound for all zeros, however, remained out of reach.

1.3. Challenges for Negative Moments

The central difficulty in bounding $J_k(T)$ with $k < 0$ lies in controlling rare zeros where $\zeta'(\rho)$ is exceptionally small. Since
\[ J_{-1}(T) = \sum_{0 < \gamma \le T} \frac{1}{|\zeta'(\rho)|^{2}}, \]
the main contribution arises from these rare events. Hejhal’s Gaussian model [3] predicts that such events are exponentially rare, but turning this heuristic into rigorous uniform estimates requires two delicate ingredients:
  • Precise Gaussian-type tail bounds for $\log|\zeta'(\rho)|$, via short Dirichlet-polynomial approximations and entropy-based large-deviation methods [7].
  • A mechanism to exclude exceptional sets of zeros where the approximation fails, implemented through sieve-theoretic methods as in [6].

1.4. Our Approach and Contributions

In this paper we introduce a hybrid analytic–probabilistic framework, the entropy–sieve method (ESM), which combines these two ideas systematically. Our contributions are as follows:
  • Entropy control: We develop an entropy-based refinement of the Dirichlet-polynomial approximation, ensuring that low-entropy regions form a negligible exceptional set. This connects analytic number theory with entropy techniques used in probability and exponential sum analysis [7,19].
  • Sieve for exceptional zeros: Following the philosophy of Bui–Florea–Milinovich [6], we apply a small-gap sieve to remove the remaining exceptional zeros. Our systematic parameter optimization clarifies how $A, B, C, \alpha$ can be tuned to make all exceptional sets negligible.
  • Quantified negative moment bound: Under RH, pair-correlation hypotheses, and a strengthened discrete moment conjecture, we prove
    \[ J_{-1}(T) \le C(\varepsilon)\, T (\log T)^{\varepsilon}, \qquad \text{for all fixed } \varepsilon > 0. \]
    The ε here is fully quantified, with explicit dependence of the implicit constant on parameter choices. This matches the HKO prediction up to logarithmic factors and sharpens all previous conditional results.

1.5. Organization

The remainder of the paper is structured as follows. Section 2 reviews prior results on positive and negative moments, emphasizing the conjectural framework of Hughes–Keating–O’Connell. Section 3 introduces the entropy–sieve method, combining Dirichlet-polynomial approximations with entropy regularity to yield robust Gaussian tail bounds. Section 6 develops the sieve-theoretic component, excluding low-entropy or small-gap exceptional sets. Finally, Section 7.9 combines these tools to prove the conditional upper bounds for $J_k(T)$ in the range $k < 0$, with the quantified case $k = -1$ as the centerpiece.

Main Results

  • Entropy–Sieve Framework. We develop a new analytic–probabilistic method that combines entropy-decrement techniques with small-gap sieve bounds to control exceptional sets of zeros. This framework provides a unified approach to bounding negative moments of $\zeta'(\rho)$ and clarifies the role of local entropy in the distribution of Dirichlet polynomial approximations.
  • Quantified Conditional Bound for Negative Moments. Assuming the Riemann Hypothesis together with standard pair-correlation conjectures and a strengthened discrete moment hypothesis, we establish the bound
    \[ J_{-1}(T) = \sum_{0 < \gamma \le T} \frac{1}{|\zeta'(\tfrac12 + i\gamma)|^{2}} \le C(\varepsilon)\, T (\log T)^{\varepsilon}, \]
    valid for every fixed $\varepsilon > 0$, with an explicit dependence of the implicit constant on $\varepsilon$. This matches, up to logarithmic factors, the conjectured order $J_{-1}(T) \asymp T$ predicted by Hughes–Keating–O’Connell, and improves on all previous conditional results by making the $\varepsilon$-dependence transparent.
  • Entropy–Sieve Hybrid Decay (Lemma 8). We prove a uniform Gaussian tail bound for the frequency of zeros with exceptionally small derivative, valid up to deviations $V \ll \log\log T$. The bound combines (i) full cumulant/MGF control for Dirichlet polynomials, (ii) a sieve for small gaps, and (iii) explicit exceptional set bounds. This lemma underpins the negative moment estimates.
  • Simplicity of Zeros (Proposition 3). We avoid circularity by working with truncated reciprocals $1/\max\{|\zeta'(\rho)|, e^{-M}\}$ throughout. Under a strengthened pair-correlation hypothesis (PCH*), we deduce that the number of multiple zeros up to height $T$ is $\ll N(T)(\log T)^{-c}$ for some $c > 0$. In particular, almost all zeros are simple under (PCH*).
  • Joint MGF Bounds (Proposition 4). The mixed moment generating function of Dirichlet polynomial approximants admits a uniform Gaussian bound with covariance matrix $\Sigma_X$, with cubic error terms of order $(\log\log T)^{3/2}$.
  • Parameter Bookkeeping. A compact parameter table records the definitions and admissible ranges of $X, A, k, B, C, \alpha, \delta(V), t, V$, clarifying the logical order of choices and eliminating ambiguity in the proofs.
  • Numerical and Structural Evidence. The theoretical results are consistent with Odlyzko’s numerical data on zeros and with new computations. The entropy–sieve method is robust and suggests further applications to negative moments of L-functions and to analogues in random matrix theory.

2. Background

The discrete moments of the derivative of the Riemann zeta function at its nontrivial zeros,
\[ J_k(T) = \sum_{0 < \gamma \le T} |\zeta'(\rho)|^{2k}, \]
are central objects in analytic number theory. They provide insight into the distribution of $\zeta'(\rho)$, the spacing of the nontrivial zeros of $\zeta(s)$, and the connections between the zeta function and random matrix theory. Understanding the asymptotic growth of $J_k(T)$ has been the subject of extensive research over the past decades and is closely connected with one of the most refined conjectures in this area: the Hughes–Keating–O’Connell conjecture.

2.1. The Hughes–Keating–O’Connell Conjecture

Motivated by random matrix theory and probabilistic models, Hughes, Keating, and O’Connell proposed an explicit formula for $J_k(T)$ in the regime $\Re(k) > -\tfrac32$. Their conjecture predicts that
\[ J_k(T) \sim \frac{G^2(2+k)}{G(3+2k)}\, a(k)\, \frac{T}{2\pi} \left( \log \frac{T}{2\pi} \right)^{(k+1)^2}, \tag{2} \]
where $G(\cdot)$ denotes the Barnes $G$-function and $a(k)$ is an explicit arithmetic factor arising from the Euler product.
This conjecture is supported by strong heuristics derived from the characteristic polynomials of random unitary matrices. In these models, $\log|\zeta'(\rho)|$ behaves approximately like a Gaussian random variable, and formula (2) reflects the matching asymptotics between the number-theoretic and random-matrix frameworks. A striking consequence appears when setting $k = -1$, where the conjecture predicts
\[ J_{-1}(T) \asymp T. \]
Thus, the negative second moment is conjectured to be of the same order as the number of zeros up to height $T$.

2.2. Positive Moments

The case of positive moments, $k \ge 0$, is relatively well understood and has seen substantial progress over the last four decades. Gonek [2][Thm. 1, p. 35] pioneered the study of discrete moments of $\zeta'(\rho)$, proving under the Riemann Hypothesis that for $k = 1$,
\[ J_1(T) \sim \frac{T}{24\pi} \left( \log \frac{T}{2\pi} \right)^{4}. \]
This result agrees with the prediction of (2) when $k = 1$ and represented one of the earliest confirmations of the conjecture in a special case.
Hejhal [3][Sec. 3, Thm. 3.1, pp. 343–370] advanced the probabilistic understanding of $\zeta'(\rho)$ by studying the distribution of $\log|\zeta'(\rho)|$. He showed that, heuristically, $\log|\zeta'(\rho)|$ behaves approximately like a Gaussian random variable with variance $\sigma^2 \asymp \log\log T$. This probabilistic model suggested that extremely large or small values of $\zeta'(\rho)$ are exponentially rare and laid the conceptual foundation for later entropy-based methods.
A major breakthrough came from Harper [7][Thm. 2.1, pp. 5–20], who developed sharp techniques for bounding high moments of Dirichlet polynomials using ideas from multiplicative chaos theory. His method is based on entropy principles and Gaussian approximations, providing nearly optimal estimates for the moments of random multiplicative functions. Building on Harper’s framework, Kirila [4][Thm. 1.1, pp. 2–4] adapted these ideas to the discrete setting of the zeta zeros and obtained sharp conditional upper bounds for positive moments:
\[ J_k(T) \ll_k N(T) (\log T)^{k(k+2)} \qquad (k > 0), \]
where $N(T)$ denotes the number of zeros up to height $T$. These results are fully consistent with the random matrix predictions of the Hughes–Keating–O’Connell conjecture, providing strong evidence in favor of (2) for $k > 0$.
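The agreement at $k=1$ can be checked by hand, and the short Python sketch below does the bookkeeping. It uses the standard fact that at positive integers the Barnes $G$-function is a product of factorials, $G(n) = 0!\,1!\cdots(n-2)!$. The helper names `barnes_g` and `hko_shape` are illustrative, and the arithmetic factor $a(k)$ is deliberately left out: the resulting ratio $1/12$ times $T/(2\pi)$ reproduces Gonek's constant $T/(24\pi)$ only if $a(1)=1$.

```python
from fractions import Fraction
from math import factorial

def barnes_g(n: int) -> int:
    """Barnes G at a positive integer: G(n) = 0! * 1! * ... * (n-2)!, with G(1) = 1."""
    if n < 1:
        raise ValueError("this sketch only handles positive integers")
    value = 1
    for j in range(n - 1):  # j = 0, ..., n-2
        value *= factorial(j)
    return value

def hko_shape(k: int):
    """Return (G^2(2+k) / G(3+2k), (k+1)^2) for an integer k >= -1.

    The arithmetic factor a(k) is NOT included.
    """
    ratio = Fraction(barnes_g(2 + k) ** 2, barnes_g(3 + 2 * k))
    return ratio, (k + 1) ** 2

print(hko_shape(1))   # Barnes ratio and log-exponent at k = 1
print(hko_shape(-1))  # Barnes ratio and log-exponent at k = -1
```

At $k=1$ the pair is $(1/12,\,4)$, matching the fourth power of the logarithm in Gonek's formula; at $k=-1$ it is $(1,\,0)$, so no logarithmic factor survives.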

2.3. Negative Moments

In stark contrast to the positive regime, the behavior of $J_k(T)$ for negative $k$ remains largely mysterious. The primary challenge stems from the fact that negative moments are dominated by the contribution of zeros $\rho$ where $|\zeta'(\rho)|$ is extremely small. Controlling this contribution requires strong bounds on the lower tail of $\log|\zeta'(\rho)|$, a problem that has resisted classical techniques.
Early work by Gonek [2][Thm. 2, p. 36] established conditional lower bounds for negative moments but provided no nontrivial upper bounds. Later, Milinovich and Ng [5][Prop. 4.1, pp. 642–644] refined these lower bounds by relating $\zeta'(\rho)$ to the spacing between consecutive zeros, but even these methods do not yield control over the full sum.
A significant development came from Bui, Florea, and Milinovich [6][Thm. 1.3, pp. 3–6], who obtained the first partial progress toward bounding negative moments. By excluding a sparse exceptional set of zeros where $\zeta'(\rho)$ is abnormally small, they proved conditional upper bounds for $J_k(T)$ over a large subfamily of zeros. However, their results stop short of proving the full conjectured bound for $J_{-1}(T)$ or other negative moments over all zeros.
These contributions underline the difficulty of the negative moment problem: without precise control over extremely small values of $\zeta'(\rho)$, unconditional upper bounds remain out of reach. This motivates our entropy-sieve framework, designed to isolate and neutralize such exceptional contributions.

Hypotheses Used in This Paper

Our main results are conditional on several standard conjectural inputs. For clarity we record them here.
  • (RH) Riemann Hypothesis. All nontrivial zeros of the Riemann zeta function lie on the critical line $\Re s = \tfrac12$.
  • (PCH) Pair–Correlation Hypothesis. For any fixed real $u \neq 0$, one has
    \[ \frac{1}{N(T)} \sum_{0 < \gamma \le T} e^{i\gamma u} = o(1), \qquad (T \to \infty), \]
    uniformly for $|u| \le (\log T)^{A}$ for some fixed $A > 0$. Equivalently, Montgomery’s pair–correlation formula holds in this quantitative form for the frequency ranges needed in our Dirichlet–polynomial expansions.
  • (DMC) Discrete Moment Control. For any fixed $k \in \mathbb{N}$ and for Dirichlet polynomials
    \[ D_\gamma = \sum_{n \le X} a_n n^{-i\gamma}, \qquad |a_n| \le 1, \]
    we have
    \[ \frac{1}{N(T)} \sum_{0 < \gamma \le T} |D_\gamma|^{2k} \ll_k (\log X)^{O(k)}. \]
    In particular, the moment generating function of short Dirichlet polynomials is well approximated by a Gaussian with variance $\log\log X$, uniformly for $|t| \ll 1/\sqrt{\log\log T}$.
  • (SGE) Small-Gap Estimate. The number of pairs of consecutive zeros of $\zeta(s)$ with gap at most $\delta/\log T$ is $\ll N(T)\,\delta^{2}$, uniformly for $\delta \ge T^{-\varepsilon}$ and any fixed $\varepsilon > 0$. This matches Montgomery’s pair–correlation predictions and is used in Section 7 to control large deviations of $\zeta'(\rho)$.
All later results should be read as conditional on (RH), (PCH), (DMC), and (SGE).
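To make the quantity in (SGE) concrete, the following Python sketch counts consecutive pairs of ordinates whose gap is at most $\delta/\log T$, exactly as the hypothesis is phrased. The function name `count_small_gaps` is illustrative, and the ten ordinates are the first zeros of $\zeta(s)$ rounded to six decimals. A sample this small says nothing about the asymptotic estimate and only shows what is being counted.

```python
import math

def count_small_gaps(ordinates, delta, T):
    """Count consecutive pairs (g1, g2) with g2 - g1 <= delta / log T among 0 < gamma <= T."""
    gammas = sorted(g for g in ordinates if 0 < g <= T)
    threshold = delta / math.log(T)
    return sum(1 for a, b in zip(gammas, gammas[1:]) if b - a <= threshold)

# First ten zero ordinates, rounded to six decimals (illustrative input only).
FIRST_ZEROS = [14.134725, 21.022040, 25.010858, 30.424876, 32.935062,
               37.586178, 40.918719, 43.327073, 48.005151, 49.773832]

T = 50.0
for delta in (5.0, 8.0, 12.0):
    print(delta, count_small_gaps(FIRST_ZEROS, delta, T))
```

The threshold here follows the normalization $\delta/\log T$ used in the statement above. Comparing with the mean spacing $2\pi/\log T$ would rescale $\delta$ by a constant.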

2.4. Summary

To summarize, positive moments of $\zeta'(\rho)$ are now well understood, thanks to the interplay between Harper’s entropy-based techniques, Kirila’s discrete adaptations, and random matrix predictions. For negative moments, however, the lack of control over zeros with exceptionally small $\zeta'(\rho)$ remains the key obstacle. Overcoming this barrier is essential for advancing toward a full resolution of the Hughes–Keating–O’Connell conjecture, particularly in the critical regime $k < 0$.

3. Entropy-Based Approximation and Gaussian Large-Deviation Bounds

Assumption Framework

Throughout this section we assume the Riemann Hypothesis (RH). For technical steps where denominators involving $\zeta'(\rho)$ arise, we restrict initially to the set of simple zeros
\[ Z_{\mathrm{simp}} := \{ \rho = \tfrac12 + i\gamma : \zeta'(\rho) \neq 0 \}, \]
and define discrete averages over $Z_{\mathrm{simp}}$ in place of all zeros. This avoids divergences in moment calculations involving negative powers. No generality is lost, since $Z_{\mathrm{simp}}$ has the same density as the full zero set under standard pair-correlation heuristics (cf. [17,28,29]).
In Section 3, we show that our joint MGF and block entropy bounds imply that the presence of multiple zeros in a positive-density set of ordinates is incompatible with the Gaussian limit law. In particular, Theorem 1 below establishes that, under RH and the verified block large-deviation estimates, all but $o(N(T))$ zeros up to height $T$ must in fact be simple. Thus the initial restriction to $Z_{\mathrm{simp}}$ is later justified a posteriori.

3.1. Notation and Choice of Parameters

Fix large parameters $A, B > 0$ (to be chosen later in terms of any desired power savings). For $T$ large define
\[ X := (\log T)^{A}, \qquad Y := \exp\big( (\log\log T)^{2} \big). \]
Both $X$ and $Y$ grow with $T$: $X$ is a fixed power of $\log T$, while $Y = (\log T)^{\log\log T}$ exceeds every fixed power of $\log T$ yet is smaller than any positive power of $T$. We shall construct a short Dirichlet polynomial of length $X$ to approximate $\log|\zeta'(\tfrac12 + i\gamma)|$ for most zeros with $\gamma \le T$.
For a generic Dirichlet polynomial
\[ D_X(\gamma) := \sum_{n \le X} \frac{a_n}{n^{1/2 + i\gamma}}, \]
we define its variance
\[ \sigma_X^2 := \sum_{n \le X} \frac{|a_n|^2}{n}. \]
In our application the coefficients $a_n$ will be explicit (coming from a truncated Euler product or approximate functional equation for $\zeta(s)$), and we will have
\[ \sigma_X^2 \asymp \log\log T, \]
uniformly for our range of parameters.
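The size of the variance sum can be explored numerically. The Python sketch below evaluates $\sum_{n\le X}|a_n|^2/n$ for a toy choice of coefficients, $a_p = 1$ on primes and $0$ elsewhere. This is an assumption made purely for illustration and not the paper's $a_n$. It compares the result with $\log\log X$ plus the Meissel–Mertens constant, as in Mertens' theorem. The helper names are hypothetical.

```python
import math

def primes_up_to(limit: int):
    """Sieve of Eratosthenes returning all primes <= limit."""
    if limit < 2:
        return []
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(limit) + 1):
        if sieve[p]:
            # Zero out the multiples p*p, p*p + p, ... <= limit.
            sieve[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    return [n for n in range(2, limit + 1) if sieve[n]]

def variance_prime_model(X: int) -> float:
    """sigma_X^2 = sum_{n <= X} |a_n|^2 / n for the toy model a_p = 1 on primes."""
    return sum(1.0 / p for p in primes_up_to(X))

MERTENS = 0.2614972128  # Meissel-Mertens constant

for X in (10**2, 10**4, 10**6):
    print(X, variance_prime_model(X), math.log(math.log(X)) + MERTENS)
```

In this toy model the variance tracks $\log\log X$. With $X = (\log T)^A$ this equals $\log A + \log\log\log T$, which is the rescaling taken up in Section 3.3.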

3.2. Dirichlet-Polynomial Approximation for $\log|\zeta'(\rho)|$

Choice of the Truncation Length X

Throughout this section we fix
\[ X = (\log T)^{A}, \]
with $A > 0$ chosen large depending on the error exponents in subsequent lemmas. This polylogarithmic choice ensures that the Dirichlet polynomial approximation (Lemma 1) has a negligible error term, that the moment generating function bounds (Proposition 1) remain uniform for $|t| \le t_0 \asymp 1/\sqrt{\log\log T}$, and that block cumulant factorization (Lemma 5) can be applied without enlarging off-diagonal terms. We emphasize that $X = T^{\theta}$ with small fixed $\theta > 0$ may also be treated with refinements of our arguments, but to avoid technical complications we restrict to the polylogarithmic case.

Hypotheses, Coefficients, and Quantitative Bounds

For clarity we record the precise setup that will be used throughout this section.
  • Hypothesis. We assume the Riemann Hypothesis (RH). All multiple zeros are placed into the exceptional set $E_{\mathrm{app}}$.
  • Truncation length. We fix
    \[ X = (\log T)^{A}, \qquad A > 0, \]
    with $A$ chosen large depending on the desired decay of the remainder (see Lemma 1).
  • Coefficients. Let $w \in C_c^{\infty}([0,2))$ be a fixed smooth cutoff with $w(u) = 1$ for $0 \le u \le 1$. Define
    \[ a_n := \frac{\Lambda(n)}{\log n}\, w\!\left( \frac{\log n}{\log X} \right), \]
    so $a_n$ is supported on prime powers $n \le X^{2}$ and is explicit and computable.
  • Dirichlet polynomial. For each zero $\rho = \tfrac12 + i\gamma$ we define
    \[ D_X(\gamma) := \sum_{n \ge 2} a_n n^{-1/2 - i\gamma}. \]
  • Remainder and exceptional set. We set
    \[ R_X(\gamma) := \log|\zeta'(\tfrac12 + i\gamma)| - D_X(\gamma), \]
    and define an exceptional set
    \[ E_{\mathrm{app}} := \big\{ 0 < \gamma \le T : |R_X(\gamma)| > (\log\log T)^{-C} \big\}, \]
    where $C > 0$ is arbitrary.
  • Quantitative bounds. For every $C, B > 0$ there exists $A = A(B, C)$ such that
    \[ |R_X(\gamma)| \ll_C (\log\log T)^{-C} \qquad (\gamma \notin E_{\mathrm{app}}), \]
    and
    \[ |E_{\mathrm{app}}| \ll_B N(T) (\log T)^{-B}. \]
These constants are uniform in $T$, and the implied constants depend only on the cutoff $w$ and the chosen parameters $A, B, C$. This hypothesis package is exactly what Lemma 1 will establish.
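The coefficient recipe above is explicit enough to compute. The Python sketch below evaluates $a_n = \frac{\Lambda(n)}{\log n}\, w\!\left(\frac{\log n}{\log X}\right)$ and the resulting $D_X(\gamma)$ for small $X$. The particular cutoff `cutoff` is one possible smooth choice, built from $e^{-1/x}$, and is an assumption: the text only requires $w=1$ on $[0,1]$ with support below $2$. All function names are illustrative.

```python
import cmath
import math

def cutoff(u: float) -> float:
    """One possible smooth w: equal to 1 for u <= 1 and 0 for u >= 2 (an assumed choice)."""
    if u <= 1.0:
        return 1.0
    if u >= 2.0:
        return 0.0
    x = u - 1.0
    f = math.exp(-1.0 / x)
    g = math.exp(-1.0 / (1.0 - x))
    return g / (f + g)

def mangoldt_over_log(n: int) -> float:
    """Lambda(n) / log n: equals 1/k if n = p^k for a prime p, and 0 otherwise."""
    if n < 2:
        return 0.0
    # Smallest prime factor of n (n itself when n is prime).
    p = next((d for d in range(2, math.isqrt(n) + 1) if n % d == 0), n)
    k = 0
    while n % p == 0:
        n //= p
        k += 1
    return 1.0 / k if n == 1 else 0.0

def coefficient(n: int, X: float) -> float:
    """a_n = (Lambda(n) / log n) * w(log n / log X)."""
    if n < 2:
        return 0.0
    return mangoldt_over_log(n) * cutoff(math.log(n) / math.log(X))

def dirichlet_poly(gamma: float, X: float) -> complex:
    """D_X(gamma) = sum over 2 <= n <= X^2 of a_n * n^(-1/2 - i*gamma)."""
    total = 0j
    for n in range(2, int(X * X) + 1):
        a = coefficient(n, X)
        if a:
            total += a * cmath.exp(-(0.5 + 1j * gamma) * math.log(n))
    return total

print(dirichlet_poly(14.134725, X=10.0))
```

With $X = 10$, for example, $a_2 = 1$ and $a_4 = 1/2$ (both below $X$, where $w = 1$), $a_6 = 0$ (not a prime power), and every $n \ge 100$ has $a_n = 0$.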
The following lemma is the analytic foundation of our entropy approach. It refines the Euler-product truncation ideas used by Hejhal [3][Sec. 3] and the discrete moment approximations developed by Kirila [4][Thm. 1.1].

3.3. Choice of Dirichlet Polynomial Length and Variance Normalization

In earlier drafts of this work (and in some related literature), the Dirichlet polynomial approximating $\log|\zeta'(\tfrac12 + i\gamma)|$ was taken of length $X = T^{\theta}$, which yields a variance $\sigma^2 \asymp \log\log T$. In the present paper we adopt a different choice, namely
\[ X = (\log T)^{A}, \]
with $A$ fixed and large. This modification has several consequences.
  • Variance scale. For coefficients $a_n$ with $|a_n| \le 1$, the variance of the associated Dirichlet polynomial is
    \[ \sigma^2 = \sum_{n \le X} \frac{|a_n|^2}{n} \asymp \log\log X = \log\log\big( (\log T)^{A} \big) = \log\log\log T + O(1). \]
    Thus throughout the paper, whenever we refer to the variance parameter $\sigma^2$, it should be understood that
    \[ \sigma^2 \asymp \log\log\log T, \]
    not $\log\log T$.
  • Range of admissible $t$. Since the cumulant method requires $|t| \ll 1/\sigma$, we now work with
    \[ |t| \le \frac{c}{\sqrt{\log\log\log T}}. \]
    All later appearances of the “admissible $t$-range” should be interpreted accordingly. In particular, the entry for $t_0$ in the notation table should read $t_0 = c/\sqrt{\log\log\log T}$.
  • Range of $V$. In tail estimates (e.g. Lemma 7.2), the permissible range
    \[ 1 \le V \le c_1 \sigma \]
    should be read with $\sigma \asymp \sqrt{\log\log\log T}$. Thus the Gaussian-type decay controls tails up to scale $\sqrt{\log\log\log T}$.
This normalization explains why some statements (drafted with $X = T^{\theta}$) refer to $\log\log T$ rather than $\log\log\log T$. From this point onward we uniformly adopt the $(\log T)^{A}$-length model, so that all variance and admissible-$t$ bounds are understood in the $\log\log\log T$ scale.
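The connection between the admissible $t$-range and the permissible $V$-range can be illustrated with the standard Chernoff optimization. The Python sketch below assumes a Gaussian-type bound $M(-t) \le \exp(t^2\sigma^2/2)$ for $0 \le t \le t_0$. This is a stand-in assumption, not a statement of the paper's Proposition 1. It then minimizes $\exp(-tV + t^2\sigma^2/2)$ over that range. The unconstrained minimizer $t = V/\sigma^2$ is admissible exactly when $V \le t_0\sigma^2$, which is $V \le c\,\sigma$ when $t_0 = c/\sigma$.

```python
import math

def chernoff_tail_bound(V: float, sigma: float, t0: float) -> float:
    """Minimize exp(-t*V + t^2*sigma^2/2) over 0 <= t <= t0 (assumed Gaussian MGF bound)."""
    # Clamp the Gaussian optimum V / sigma^2 to the admissible range.
    t_star = min(V / sigma**2, t0)
    return math.exp(-t_star * V + 0.5 * (t_star * sigma) ** 2)

# Illustrative scale: sigma^2 = log log log T at T = 1e100, with c = 1.
sigma = math.sqrt(math.log(math.log(math.log(1e100))))
c = 1.0
t0 = c / sigma

for V in (0.5 * sigma, c * sigma, 3.0 * sigma):
    gaussian = math.exp(-V**2 / (2 * sigma**2))
    print(V, chernoff_tail_bound(V, sigma, t0), gaussian)
```

For $V \le c\,\sigma$ the bound coincides with the Gaussian value $\exp(-V^2/(2\sigma^2))$. Beyond that point the clamp at $t_0$ leaves only exponential decay in $V/\sigma$, which is one way to read the restriction $1 \le V \le c_1\sigma$.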

4. Derivation of the Coefficients a n from a Smoothed Explicit Formula

In this section we derive explicitly the prime-power coefficients $a_n$ appearing in the short Dirichlet polynomial approximants
\[ D_X(\gamma) = \sum_{n \le X^{2}} a_n n^{-1/2 - i\gamma}, \]
and we record the decomposition of the remainder arising from the contour shift. Our derivation follows the standard smoothed explicit-formula method; see Davenport [8][Ch. 12 and Ch. 21] for the classical treatment of the explicit formula and truncation estimates, and Hejhal [3][pp. 343–370] for the adaptation to $\log|\zeta'|$.

1. Smoothed Representation of log ζ ( s ) and Differentiation

For $\Re s > 1$ we have the Dirichlet series expansion
\[ \log\zeta(s) = \sum_{n \ge 2} \frac{\Lambda(n)}{\log n}\, n^{-s} + A(s), \]
where $A(s)$ denotes the small analytic correction arising from the pole at $s = 1$. Insert the smooth cutoff
\[ W_X(n) := w\!\left( \frac{\log n}{\log X} \right), \]
with $w$ compactly supported, $w \equiv 1$ on $[0,1]$, so that $W_X(n) = 1$ for $n \le X$ and $W_X(n) = 0$ for $n \ge X^{2}$. Define the truncated series
\[ P_X(s) := \sum_{n \ge 2} \frac{\Lambda(n)}{\log n}\, W_X(n)\, n^{-s}. \]
Differentiating formally gives
\[ \frac{d}{ds} P_X(s) = \sum_{n \ge 2} \tilde a_n n^{-s}, \qquad \tilde a_n := \frac{d}{ds}\left[ \frac{\Lambda(n)}{\log n}\, W_X(n) \right]. \]
Thus the coefficients are supported on prime powers $n \le X^{2}$.

2. Contour Integral and Explicit Formula

To access $\log\zeta'(s)$, one introduces a smooth Mellin kernel $V$ with compact support and considers the integral
\[ I(\rho) := \frac{1}{2\pi i} \int_{(c)} \hat V(s)\, \frac{\zeta'}{\zeta}(s + \rho)\, ds, \]
where $\rho = 1/2 + i\gamma$ is a zero and $c > 1$. Unfolding the integral yields
\[ I(\rho) = \sum_{n \ge 1} \frac{\Lambda(n)}{n^{\rho}}\, V\!\left( \frac{n}{X} \right) + T_1(\rho), \]
with a small tail term $T_1$. Shifting the contour across $\Re s = 0$ and collecting residues gives the explicit identity (valid for simple zeros, see Hejhal [3][pp. 343–370]):
\[ \log|\zeta'(\rho)| = \sum_{n \le X^{2}} a_n n^{-\rho} + R_X^{\mathrm{tail}}(\rho) + R_X^{\mathrm{bd}}(\rho) + R_X^{\mathrm{zeros}}(\rho). \tag{8} \]

3. Coefficients and Remainder Terms

The coefficients are explicitly
\[ a_n = \frac{d}{ds}\left[ \frac{\Lambda(n)}{\log n}\, W_X(n) \right]\Bigg|_{s = 1/2} = \frac{\Lambda(n)}{\log n}\, n^{-1/\log X}\, W_X(n) + E_n, \tag{9} \]
where $E_n$ are explicit boundary correction weights. The remainder terms in (8) are:
  • $R_X^{\mathrm{tail}}(\rho)$: the contribution from $n > X^{2}$; for every $m \ge 1$,
    \[ |R_X^{\mathrm{tail}}(\rho)| \ll_m X^{-m}, \tag{10} \]
    see [8][Ch. 21].
  • $R_X^{\mathrm{bd}}(\rho)$: boundary integrals from the contour shift; these satisfy for each $k \ge 1$,
    \[ \frac{1}{N(T)} \sum_{0 < \gamma \le T} \big| R_X^{\mathrm{bd}}(\tfrac12 + i\gamma) \big|^{2k} \ll_k X^{-\delta(k)} (\log\log T)^{C(k)}. \tag{11} \]
  • $R_X^{\mathrm{zeros}}(\rho)$: residues from other zeros, with convergent representation
    \[ R_X^{\mathrm{zeros}}(\rho) = \sum_{\rho' \neq \rho} K_X(\gamma - \gamma'), \tag{12} \]
    where $K_X$ is a decaying kernel depending on $W_X$. Hejhal [3] analyzes this sum in detail, showing it is negligible in mean square, while Davenport [8][Ch. 21] gives the classical bounds.

4. Quantitative Consequences

Taking $X = (\log T)^{A}$ with $A$ large, the bounds (10)–(12) imply that the remainders are uniformly small on all but a negligible exceptional set of zeros. Thus the coefficients (9) provide the correct explicit approximation for $\log|\zeta'(\rho)|$, as used in Lemma 1, Lemma 2, and the entropy/Chernoff analysis.

Bibliographic Note

The derivation above is the standard explicit-formula method with smoothing: the integral representation, contour shift, and kernel construction are detailed in Hejhal [3][pp. 343–370], while Davenport [8][Ch. 12 and Ch. 21] contains the classical explicit formula, truncation estimates, and bounds for tails and boundary terms.
Lemma 1 
(Short Dirichlet-polynomial approximation). We carry out the argument without assuming simplicity: where the original argument would use $1/\zeta'(\rho)$ we instead use the truncated factor $1/\max\{|\zeta'(\rho)|, e^{-M}\}$. All estimates below are uniform in $M > 0$; at the end of the section we remove the truncation by letting $M \to \infty$ (dominated convergence justifies the limit).
Assume the Riemann Hypothesis. Let $T$ be large and put
\[ X = (\log T)^{A}, \qquad A > 0. \]
There exist explicit coefficients $a_n$ supported on prime powers $n \le X^{2}$ and an exceptional set $E_{\mathrm{app}} \subset \{\gamma : 0 < \gamma \le T\}$ such that for every simple zero $\rho = \tfrac12 + i\gamma$ with $\gamma \notin E_{\mathrm{app}}$,
\[ \log\zeta'(\tfrac12 + i\gamma) = D_X(\gamma) + R_X(\gamma), \qquad D_X(\gamma) = \sum_{n \le X^{2}} a_n n^{-1/2 - i\gamma}, \tag{13} \]
and, uniformly for such $\gamma$,
\[ |R_X(\gamma)| \ll_C (\log\log T)^{-C} \]
for every fixed $C > 0$, provided $A = A(C)$ is chosen sufficiently large. Moreover, for any fixed $B > 0$ one may choose $A = A(B)$ so that the exceptional set satisfies
\[ |E_{\mathrm{app}}| \ll_B N(T) (\log T)^{-B}. \]
The coefficients $a_n$ are explicit prime-power weights coming from a smooth truncation of the explicit formula (see Hejhal [3]).
Proof. 
All implicit constants in this proof are absolute unless indicated otherwise. We assume RH throughout and restrict attention to simple zeros; zeros of multiplicity $> 1$ are placed into $E_{\mathrm{app}}$.
Smooth truncation and the explicit-formula identity.
Let $w \in C_c^{\infty}([0,2))$ be a fixed smooth cutoff with $w \equiv 1$ on $[0,1]$ and $0 \le w \le 1$. For $X \ge 2$ define
\[ W_X(n) := w\!\left( \frac{\log n}{\log X} \right), \]
so $W_X(n) = 1$ for $n \le X$ and $W_X(n) = 0$ for $n \ge X^{2}$.
For $\Re s > 1$ recall the Dirichlet series
\[ \log\zeta(s) = \sum_{n \ge 2} \frac{\Lambda(n)}{\log n}\, n^{-s} + A(s), \tag{17} \]
where $A(s)$ is analytic in a neighborhood of the half-line and arises from the pole at $s = 1$ and other rapidly convergent tails (see Davenport [8]). Differentiate (17) termwise in the region of absolute convergence and insert the smooth cutoff $W_X(n)$ to obtain the short Dirichlet polynomial
\[ \tilde D_X(s) := \sum_{n \ge 2} \tilde a_n n^{-s}, \qquad \tilde a_n := \frac{d}{ds}\left[ \frac{\Lambda(n)}{\log n}\, W_X(n) \right]. \]
The coefficients $\tilde a_n$ are supported on prime powers $n \le X^{2}$ and are explicit combinations of $\Lambda(n)/\log n$ and derivatives of $w$.
Apply a standard smoothed explicit-formula contour shift for the approximate logarithmic derivative near $s = \tfrac12 + i\gamma$ (see Hejhal [3] for a complete derivation in the context of $\log|\zeta'|$). Concretely, choose a compactly supported test function $V$ whose Mellin transform picks out the smoothing $W_X$; shift the contour from $\Re s > 1$ to the left of the critical line, collect the residue at the simple zero $s = \rho$, and evaluate the resulting integrals and residue contributions. The outcome (after taking real parts) is an exact identity of the form
\[ \log\zeta'(\tfrac12 + i\gamma) = \sum_{n \le X^{2}} a_n n^{-1/2 - i\gamma} + R_X(\gamma), \tag{19} \]
where the $a_n$ are explicit prime-power weights obtained from $\tilde a_n$ plus explicit boundary-correction terms coming from the smoothing, and $R_X(\gamma)$ is the remainder, which equals exactly the sum of the contour tails, boundary integrals, and contributions from other zeros. The derivation of (19) and the explicit form of the $a_n$ follow the presentation in Hejhal [3] (compare the formulas there for $\log|\zeta'|$ obtained from smoothed test functions). Thus (13) holds with these explicit $a_n$.
Decomposition of the remainder.
Write
\[ R_X(\gamma) = R_X^{\mathrm{tail}}(\gamma) + R_X^{\mathrm{bd}}(\gamma) + R_X^{\mathrm{zeros}}(\gamma), \tag{20} \]
where the three terms are defined as follows (these definitions are the precise outputs of the contour-shift computation):
- The tail term is
\[ R_X^{\mathrm{tail}}(\gamma) := \sum_{n > X^2} \tilde a_n\, n^{-1/2 - i\gamma}, \tag{21} \]
coming from truncation of the Dirichlet series by the compact support of $W_X$. By the compact support of $w$ and the exponential decay of $n^{-s}$ in the shifted contour, $R_X^{\mathrm{tail}}(\gamma)$ is given by an absolutely convergent sum/integral and is small for large $X$.
- The boundary term is the integral over the shifted vertical contours and can be written as
\[ R_X^{\mathrm{bd}}(\gamma) = \frac{1}{2\pi i} \int_{\mathcal{C}_{\mathrm{bd}}} G_X(s)\, \frac{\zeta'}{\zeta}(s)\, ds, \tag{22} \]
where $\mathcal{C}_{\mathrm{bd}}$ is a finite union of compact vertical segments on which $\Re s$ is bounded away from the critical line by a small amount (determined by the smoothing), and $G_X(s)$ is an explicit analytic kernel depending on $W_X$. By standard estimates (the integrand is absolutely integrable on $\mathcal{C}_{\mathrm{bd}}$) this term is small and admits good mean-value bounds.
- The zeros term arises from residues at zeros $\rho' \ne \rho$ encountered when shifting contours. It can be expressed as a convergent sum
\[ R_X^{\mathrm{zeros}}(\gamma) = \sum_{\rho' \ne \rho} K_X(\gamma - \gamma'), \tag{23} \]
where $K_X$ is an explicit kernel (depending on $W_X$) that decays with $|\gamma - \gamma'|$. The sum in (23) converges absolutely for the chosen smoothing; see Hejhal [3] for the construction of such kernels.
Equations (21)–(23) give the precise decomposition (20) used below.
Averaged high-moment estimate for the remainder.
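Fix an integer $k \ge 1$. Define the averaged $2k$-moment
\[ M_{2k} := \frac{1}{N(T)} \sum_{0 < \gamma \le T} |R_X(\gamma)|^{2k}. \]
We will bound $M_{2k}$ by expanding $|R_X(\gamma)|^{2k}$ via the multinomial theorem and controlling each arising mixed moment using three inputs from the literature (cited below).
First expand
\[ |R_X(\gamma)|^{2k} = \sum_{\substack{\alpha + \beta + \delta = 2k \\ \alpha, \beta, \delta \ge 0}} \binom{2k}{\alpha, \beta, \delta}\, R_X^{\mathrm{tail}}(\gamma)^{\alpha}\, R_X^{\mathrm{bd}}(\gamma)^{\beta}\, R_X^{\mathrm{zeros}}(\gamma)^{\delta}, \]
and average over zeros to obtain
\[ M_{2k} = \sum_{\alpha + \beta + \delta = 2k} \binom{2k}{\alpha, \beta, \delta}\, \frac{1}{N(T)} \sum_{0 < \gamma \le T} R_X^{\mathrm{tail}}(\gamma)^{\alpha}\, R_X^{\mathrm{bd}}(\gamma)^{\beta}\, R_X^{\mathrm{zeros}}(\gamma)^{\delta}. \]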
We bound each summand in (25) term-by-term using Hölder’s inequality and three principal results:
(A) Discrete-moment bounds for $\zeta'(\rho)$. Kirila [4] proves that for each fixed $k \ge 1$,
\[ \frac{1}{N(T)} \sum_{0 < \gamma \le T} \bigl|\zeta'(\tfrac12 + i\gamma)\bigr|^{2k} \ll_k (\log T)^{k^2 + o(1)}. \tag{26} \]
Kirila also establishes discrete mixed-moment variants that control averages of products of $\zeta'(\rho)$ with short Dirichlet polynomials of length $X = (\log T)^A$; see [4] for the precise statements invoked below.
(B) High-moment bounds for short Dirichlet polynomials. Harper’s method [7] (and its discrete adaptations) gives, for any fixed $k \ge 1$ and any coefficients $c_n$ of size $\ll 1$,
\[ \frac{1}{T} \int_T^{2T} \Bigl| \sum_{n \le X} \frac{c_n}{n^{1/2 + it}} \Bigr|^{2k} dt \ll_k (\log\log T)^{C_1(k)}, \tag{27} \]
and by the discrete adaptations in [4] (which combine Harper’s decomposition with zero-distribution inputs) we similarly have
\[ \frac{1}{N(T)} \sum_{0 < \gamma \le T} \Bigl| \sum_{n \le X} \frac{c_n}{n^{1/2 + i\gamma}} \Bigr|^{2k} \ll_k (\log\log T)^{C_2(k)}, \tag{28} \]
where $C_1(k), C_2(k)$ are at most polynomial in $k$. (References: Harper [7]; Kirila [4].)
(C) Pair-correlation orthogonality for off-diagonal exponentials. For nonzero frequencies $u$ built from logarithmic combinations of integers $\le X$, Montgomery’s pair-correlation heuristic and subsequent refinements imply cancellation in sums
\[ \frac{1}{N(T)} \sum_{0 < \gamma \le T} e^{i\gamma u} = o(1) \qquad \bigl(\text{when } |u| \gg (\log T)^{-C_A}\bigr), \tag{29} \]
for some $C_A > 0$ depending on the combinatorics of the integers involved; see Montgomery [17] and the treatment of such sums in Kirila [4]. In our context, since $X = (\log T)^A$, the nonzero frequencies produced by multinomial expansion satisfy $|u| \gg (\log T)^{-O(A)}$ and so (29) applies to show these off-diagonal contributions are negligible in the averaged moments.
We now explain how to apply (A)–(C) to the terms in (25).
Bounding Terms with Dominant Short-Polynomial Factors
Consider summands where the majority of the factors come from short-polynomial pieces (i.e. contributions that, after expanding the definitions of $R_X^{\mathrm{tail}}, R_X^{\mathrm{bd}}, R_X^{\mathrm{zeros}}$, are dominated by sums of the form $\sum_{n \le X} c_n n^{-1/2 - i\gamma}$). For each such summand, apply Hölder’s inequality to isolate a single $2k$-moment of a short Dirichlet polynomial and use (28). Hence each such summand is
\[ \ll_k (\log\log T)^{C_2(k)}. \tag{30} \]
Bounding Terms Involving $\zeta'(\rho)$
Mixed summands that contain explicit factors of $\zeta'(\rho)$ (coming from contour residues or boundary integrals) are controlled by Hölder’s inequality and Kirila’s bounds (26) (or mixed-moment variants stated in [4]). Thus such summands are bounded by
\[ \ll_k (\log T)^{k^2 + o(1)} \cdot (\log\log T)^{C(k)}, \tag{31} \]
where the extra $(\log\log T)^{C(k)}$ factor accounts for any attached short-polynomial moment(s) handled via (28).
Bounding Off-Diagonal Terms
Off-diagonal summands result in factors of the form
\[ \frac{1}{N(T)} \sum_{0 < \gamma \le T} e^{i\gamma u} \cdot E(u), \]
where $E(u)$ is a bounded arithmetic weight coming from products of coefficients $a_n$. By (29), these averages are $o(1)$ uniformly for the frequencies $u$ that arise when $X = (\log T)^A$. Therefore every off-diagonal summand contributes at most $o(1)$ (uniformly in $T$) to $M_{2k}$.
Conclusion for M 2 k
Combining the bounds (30), (31), and the off-diagonal negligibility, we obtain for fixed $k$ the existence of explicit constants $C_3(k), C_4(k), C_5(k) > 0$ (depending only on $k$) and a function $F(A, k)$ (coming from the tail and boundary control) such that
\[ M_{2k} \le F(A, k) \cdot (\log\log T)^{C_3(k)} + C_4(k)\, (\log T)^{k^2 + o(1)} \cdot (\log\log T)^{C_5(k)} + o(1), \tag{32} \]
where the second term arises from the possible appearance of factors of $\zeta'(\rho)$ (bounded by Kirila) combined with short-polynomial moments; the $o(1)$ term is the aggregate of off-diagonal negligible contributions.
We now make the dependence of each piece explicit and show how to make the right-hand side of (32) arbitrarily small (in the sense needed to produce the exceptional-set bound).
Quantitative estimates for the tail and boundary pieces.
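The tail term $R_X^{\mathrm{tail}}$ (see (21)) is a sum over $n > X^2$ of $\tilde a_n n^{-1/2 - i\gamma}$. Use the bound $|\tilde a_n| \ll \Lambda(n)/\log n$ (which follows from (18) and the boundedness of derivatives of $w$). Then, for any $\varepsilon \in (0, 1)$,
\[ \sum_{n > X^2} |\tilde a_n|^2 n^{-1} \ll \sum_{n > X^2} \frac{\Lambda(n)^2}{\log^2 n} \cdot \frac{1}{n} \ll \sum_{n > X^2} \frac{1}{n^{1 + \varepsilon}} \ll X^{-2\varepsilon}. \tag{33} \]
Consequently, by Cauchy–Schwarz and Hölder one gets for fixed $k$
\[ \frac{1}{N(T)} \sum_{0 < \gamma \le T} \bigl| R_X^{\mathrm{tail}}(\gamma) \bigr|^{2k} \ll_k X^{-2\varepsilon k} \ll_k (\log T)^{-2\varepsilon k A}. \tag{34} \]
Thus, by choosing $A$ large, the tail contribution to $M_{2k}$ can be made arbitrarily small.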
The boundary term $R_X^{\mathrm{bd}}$ is given by finite integrals on compact vertical segments (see (22)). Standard estimates (moving to a contour where $|\zeta'/\zeta(s)|$ is polynomially bounded and using the compact support of $G_X$) yield, for fixed $k$,
\[ \frac{1}{N(T)} \sum_{0 < \gamma \le T} \bigl| R_X^{\mathrm{bd}}(\gamma) \bigr|^{2k} \ll_k (\log\log T)^{C_6(k)} \cdot X^{-\delta(k)} \tag{35} \]
for some $\delta(k) > 0$. The decay factor $X^{-\delta(k)}$ reflects the fact that increasing the truncation length $X$ reduces boundary contributions; hence this term can also be made arbitrarily small by increasing $A$.
The zeros term $R_X^{\mathrm{zeros}}$ is handled by decomposing the kernel $K_X(\cdot)$ into a short-range piece (where $|\gamma - \gamma'|$ is small) and a long-range piece (where the kernel decays). The long-range piece is negligible uniformly; the short-range piece is controlled by pair-correlation estimates and the short-polynomial moment bounds. One obtains
\[ \frac{1}{N(T)} \sum_{0 < \gamma \le T} \bigl| R_X^{\mathrm{zeros}}(\gamma) \bigr|^{2k} \ll_k (\log\log T)^{C_7(k)} \cdot X^{-\eta(k)} + o(1), \tag{36} \]
with $\eta(k) > 0$. Again this contribution can be made arbitrarily small by choosing $A$ sufficiently large.
Combining (34)–(36) with (32) yields, for fixed $k$, the existence of constants $c_1(k), c_2(k) > 0$ such that
\[ M_{2k} \le c_1(k)\, X^{-\zeta(k)} + c_2(k)\, (\log\log T)^{C_8(k)} \cdot (\log T)^{k^2 + o(1)} + o(1), \tag{37} \]
where $\zeta(k) := \min\{2\varepsilon k, \delta(k), \eta(k)\} > 0$ (we may choose $\varepsilon > 0$ small to balance constants).
Markov (Chebyshev) Step and Choice of Parameters
Let $B > 0$ and $C > 0$ be fixed. We will choose $k = k(B, C)$ and then $A = A(B, C)$ so that the exceptional-set bound (15) and the uniform remainder bound (14) hold.
From (37) and using $X = (\log T)^A$ we obtain for sufficiently large $T$
\[ M_{2k} \le c_1(k)\, (\log T)^{-A\zeta(k)} + c_2(k)\, (\log\log T)^{C_8(k)} \cdot (\log T)^{k^2 + o(1)}. \tag{38} \]
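Choose integers $k$ and $C$ depending only on $B$ as follows. Take
\[ C := \bigl\lceil \sqrt{2B + 5}\, \bigr\rceil, \]
so that $C^2 \ge 2B + 5$. Now set $k := C$. Write $\epsilon(k)$ for the $o(1)$ term in the exponent $k^2 + o(1)$. Since $\epsilon(k) = o(1)$ as $T \to \infty$, for large $T$ we have $\epsilon(k) < 1$, and hence
\[ 2kC - \bigl(k^2 + \epsilon(k)\bigr) = 2C^2 - \bigl(C^2 + \epsilon(k)\bigr) = C^2 - \epsilon(k) \ge 2B + 4. \tag{39} \]
Thus inequality (39) is satisfied for our explicit choice of $k$ and $C$.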
Equivalently, observe that the inequality can be rewritten as
\[ k^2 - 2Ck + \bigl(2B + 4 + \epsilon(k)\bigr) \le 0, \]
which is a quadratic in $k$. Real solutions exist provided $C^2 \ge 2B + 4 + \epsilon(k)$, and then any integer $k$ between the roots is admissible. Choosing $k = C$ is the simplest option.
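The arithmetic behind this choice can be checked mechanically. The sketch below computes $C = \lceil \sqrt{2B+5}\, \rceil$ and lists the integers $k$ for which $2kC - (k^2 + \epsilon) \ge 2B + 4$. It treats $\epsilon(k)$ as a fixed number `eps` in $[0, 1)$, which is an assumption made purely for illustration, and the function name is hypothetical.

```python
import math

def admissible_k(B, eps=0.5):
    """Return (C, ks): C = ceil(sqrt(2B+5)) and the integers k in [1, 2C]
    satisfying 2*k*C - (k^2 + eps) >= 2B + 4.

    eps stands in for the o(1) quantity epsilon(k); here it is simply a
    constant in [0, 1), an illustrative assumption.
    """
    C = math.ceil(math.sqrt(2 * B + 5))
    ks = [k for k in range(1, 2 * C + 1)
          if 2 * k * C - (k * k + eps) >= 2 * B + 4]
    return C, ks
```

For $B = 2$ the window of admissible $k$ collapses to the single value $k = C = 3$, while for $B = 3$ (where $C = 4$) every integer from $2$ to $6$ works.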
Having fixed $k$, choose $A = A(B, C)$ sufficiently large so that
\[ c_1(k)\, (\log T)^{-A\zeta(k)} \le (\log\log T)^{-(2kC + 2B + 3)} \tag{40} \]
for all large $T$. This is possible because the left-hand side decays like $(\log T)^{-A\zeta(k)}$ whereas the right-hand side decays like a negative power of $\log\log T$; increasing $A$ makes the left-hand side arbitrarily small.
With the choices (39) and (40) in place, for all sufficiently large $T$ we combine (38) and obtain
\[ M_{2k} \le (\log\log T)^{-(2kC + 2B + 2)}. \tag{41} \]
Now apply Markov’s inequality: the number of zeros with $|R_X(\gamma)| \ge (\log\log T)^{-C}$ is bounded by
\[ \#\{\gamma \le T : |R_X(\gamma)| \ge (\log\log T)^{-C}\} \le (\log\log T)^{2kC} \cdot N(T) \cdot M_{2k}. \tag{42} \]
Substituting (41) into (42) yields
\[ \#\{\gamma \le T : |R_X(\gamma)| \ge (\log\log T)^{-C}\} \le N(T) \cdot (\log\log T)^{-(2B + 2)} \ll_B N(T)\, (\log T)^{-B} \]
for large $T$. This proves (15) and the uniform bound (14) for $\gamma \notin E_{\mathrm{app}}$.
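The Markov step used in (42) is the generic inequality $\#\{|r| \ge \tau\} \le N \cdot M_{2k} / \tau^{2k}$, which holds for any finite list of values. The following sketch checks it on an arbitrary list; the data and function names are hypothetical stand-ins and have nothing to do with actual values of $R_X(\gamma)$.

```python
def empirical_moment(values, k):
    """Average of |v|^(2k) over the list (the analogue of M_{2k})."""
    return sum(abs(v) ** (2 * k) for v in values) / len(values)

def markov_count_bound(values, tau, k):
    """Upper bound N * M_{2k} / tau^(2k) for the number of |v| >= tau."""
    return len(values) * empirical_moment(values, k) / tau ** (2 * k)

def exceedance_count(values, tau):
    """Exact number of entries with |v| >= tau."""
    return sum(1 for v in values if abs(v) >= tau)
```

Taking $\tau = (\log\log T)^{-C}$ in `markov_count_bound` reproduces the shape of the right-hand side of (42).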
Final Remarks
The identities and bounds above are effective and the required choices of $k$ and $A$ are explicit in principle: $k$ is any integer satisfying (39) and $A$ any sufficiently large number satisfying (40); the dependence of the explicit constants $c_i(k)$, $C_j(k)$ is determined by the precise statements in Kirila [4] and Harper [7] that we invoked. The only non-elementary inputs used are those published results (Kirila for discrete moments and Harper for short-polynomial high-moments) and Montgomery’s pair-correlation orthogonality; these are cited and used in the exact forms required (see [4,7,17], and Hejhal [3] for the explicit-formula derivation).
This completes the proof of Lemma 1.    □
Remarks on Lemma 1. The coefficients $a_n$ arise naturally from truncating the Euler product or approximate functional equation for $\zeta(s)$. In practice, one may take $a_n$ supported on prime powers, with $a_p$ of size $O(p^{o(1)})$. The exact form of $a_n$ is not essential for the entropy arguments; what matters is that the variance satisfies
\[ \sigma_X^2 = \sum_{n \le X} \frac{|a_n|^2}{n} \asymp \log\log T, \]
so that $D_X(\gamma)$ admits a Gaussian-type normalization.
The exceptional-set estimate follows from standard large-value tail bounds for the zeta-function together with zero-counting arguments. Hejhal [3, Sec. 3] first established the Gaussian distributional model for $\log|\zeta|$, while Kirila [4, Sec. 4] adapted these approximations to the discrete setting of sums over zeros and obtained control of the exceptional set. Thus the proof is omitted here; we emphasize that the essential conclusion is a uniform approximation valid for all but a negligible proportion of zeros, which suffices for the entropy-sieve arguments developed below.

4.1. Variance Calculation

In this subsection we compute the asymptotic size of the variance
\[ \sigma_X^2 = \sum_{n \le X} \frac{|a_n|^2}{n}, \]
associated with the short Dirichlet polynomial approximation
\[ D_X(\gamma) = \Re \sum_{n \le X} a_n\, n^{-1/2 - i\gamma}, \]
where the coefficients $a_n$ are given explicitly below. The variance determines the natural Gaussian scale for fluctuations of $D_X(\gamma)$ and is a key input for the moment-generating and entropy arguments in Sections 7–Section 3.
We adopt the canonical choice
\[ X = (\log T)^A, \qquad A > 0 \text{ fixed}, \]
so that $\log X = A \log\log T$ and $\log\log X = \log\log\log T + O(1)$. This logarithmic regime is consistent with the cumulant and entropy analyses developed later.
Lemma 2 
(Variance asymptotic: explicit coefficients). Let $X \ge 3$ and define the smooth cutoff
\[ W_X(n) := \frac{\log(X/n)}{\log X} \quad (1 \le n \le X), \qquad W_X(n) = 0 \quad (n > X). \]
Set
\[ a_n = \frac{\Lambda(n)}{\log n}\, n^{1/2 - \sigma_X}\, W_X(n) = \frac{\Lambda(n)}{\log n}\, n^{-1/\log X}\, W_X(n) \qquad (n \le X), \]
with
\[ \sigma_X := \frac12 + \frac{1}{\log X}. \]
Define
\[ \Sigma(X) := \sum_{n \le X} \frac{|a_n|^2}{n}. \]
Then
\[ \Sigma(X) = \log\log X + O(1). \]
Consequently, for $X = (\log T)^A$ with fixed $A > 0$,
\[ \Sigma(X) = \log\log\log T + O(1). \]
Proof. 
With the choice (44) put
\[ b_n := a_n\, n^{-1/2} \qquad (n \le X), \]
so that
\[ b_n = \frac{\Lambda(n)}{\log n}\, n^{-\sigma_X}\, W_X(n), \qquad \Sigma(X) = \sum_{n \le X} |b_n|^2. \]
Since $\Lambda(n) = 0$ unless $n = p^k$ is a prime power, the sum reduces to prime powers:
\[ \Sigma(X) = \sum_{p \le X} \sum_{\substack{k \ge 1 \\ p^k \le X}} \frac{(\Lambda(p^k))^2}{(\log p^k)^2}\, p^{-2k\sigma_X}\, W_X(p^k)^2. \]
For a prime power $p^k$ we have $\Lambda(p^k) = \log p$ and $\log p^k = k \log p$, hence the factor simplifies to $1/k^2$. Thus
\[ \Sigma(X) = \sum_{p \le X} \sum_{\substack{k \ge 1 \\ p^k \le X}} \frac{1}{k^2}\, p^{-2k\sigma_X}\, W_X(p^k)^2. \]
Step 1: Contribution of higher prime powers. For $k \ge 2$ and $p \ge 2$ we have $p^{-2k\sigma_X} \le p^{-k}$ (since $\sigma_X \ge 1/2$), and $W_X(\cdot) \le 1$, so
\[ 0 \le \sum_{p \le X} \sum_{\substack{k \ge 2 \\ p^k \le X}} \frac{1}{k^2}\, p^{-2k\sigma_X}\, W_X(p^k)^2 \le \sum_{p} \sum_{k \ge 2} \frac{1}{k^2 p^k}. \]
The double series on the right converges absolutely, hence this entire part contributes $O(1)$, with an absolute implied constant.
Step 2: Contribution of primes. For $k = 1$ we obtain
\[ S_1(X) := \sum_{p \le X} p^{-2\sigma_X}\, W_X(p)^2. \]
Using $\sigma_X = \tfrac12 + 1/\log X$ and $W_X(p) = 1 - \frac{\log p}{\log X}$ we write
\[ p^{-2\sigma_X}\, W_X(p)^2 = \frac{1}{p}\, e^{-2 \frac{\log p}{\log X}} \Bigl(1 - \frac{\log p}{\log X}\Bigr)^2. \]
Put $v := \frac{\log p}{\log X}$ (so $0 \le v \le 1$ for $p \le X$). Expanding $e^{-2v}(1 - v)^2$ at $v = 0$ gives
\[ e^{-2v}(1 - v)^2 = 1 - 4v + O(v^2), \]
uniformly for $0 \le v \le 1$ (with an absolute constant in the $O(v^2)$ term). Hence
\[ p^{-2\sigma_X}\, W_X(p)^2 = \frac{1}{p} \Bigl(1 - 4\, \frac{\log p}{\log X} + O\Bigl(\frac{(\log p)^2}{\log^2 X}\Bigr)\Bigr). \]
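For completeness, the coefficient $-4$ and the size of the quadratic term can be checked by multiplying the two expansions directly. This is a routine verification and not part of the original argument:
\[ e^{-2v} = 1 - 2v + 2v^2 + O(v^3), \qquad (1 - v)^2 = 1 - 2v + v^2, \]
\[ e^{-2v}(1 - v)^2 = 1 - (2 + 2)v + (2 + 4 + 1)v^2 + O(v^3) = 1 - 4v + 7v^2 + O(v^3). \]
Since $0 \le e^{-2v}(1 - v)^2 \le 1$ on $[0, 1]$, the difference $e^{-2v}(1-v)^2 - 1 + 4v$ is bounded there, so the remainder is $O(v^2)$ on the whole interval and not only near $v = 0$.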
Step 3: Summation over primes. Summing over $p \le X$ and using standard prime-sum estimates (from the prime number theorem; see Davenport [8, Ch. 1] or Titchmarsh [9, Ch. 2]) we have
\[ \sum_{p \le X} \frac{1}{p} = \log\log X + O(1), \qquad \sum_{p \le X} \frac{\log p}{p} = \log X + O(1), \qquad \sum_{p \le X} \frac{(\log p)^2}{p} \ll (\log X)^2. \]
Therefore
\[ S_1(X) = \sum_{p \le X} \frac{1}{p} - \frac{4}{\log X} \sum_{p \le X} \frac{\log p}{p} + O(1) = \log\log X + O(1), \]
since the middle term equals $-4 + O(1/\log X)$ and the $O(v^2)$ remainder contributes $O(1)$.
Conclusion. Adding both contributions gives
\[ \Sigma(X) = \log\log X + O(1). \]
Finally, for $X = (\log T)^A$ we obtain
\[ \Sigma(X) = \log\log\log T + O(1), \]
as claimed.    □
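The statement of Lemma 2 can be sanity-checked numerically. The sketch below evaluates $\Sigma(X)$ directly from the prime-power formula of the proof and compares it with $\log\log X$. It is a finite check at small $X$ only, it says nothing about the implied constant, and the helper names are hypothetical.

```python
import math

def primes_up_to(n):
    """Sieve of Eratosthenes returning all primes <= n."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(n) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(2, n + 1) if sieve[i]]

def variance_sum(X):
    """Sigma(X) = sum over prime powers p^k <= X of
    (1/k^2) * (p^k)^(-2*sigma_X) * W_X(p^k)^2, with sigma_X = 1/2 + 1/log X
    and W_X(n) = 1 - log(n)/log(X)."""
    log_x = math.log(X)
    sigma = 0.5 + 1.0 / log_x
    total = 0.0
    for p in primes_up_to(int(X)):
        k, pk = 1, p
        while pk <= X:
            weight = 1.0 - math.log(pk) / log_x
            total += pk ** (-2.0 * sigma) * weight * weight / (k * k)
            k += 1
            pk *= p
    return total
```

Evaluating `variance_sum(X) - math.log(math.log(X))` for $X = 10^3, 10^4, 10^5$ gives differences that stay in a narrow bounded window, in line with the $O(1)$ error term.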

4.2. Moment Generating Function Bounds

We now establish bounds on the moment generating function (MGF) of the short Dirichlet polynomial approximant
\[ D_X(\gamma) = \Re \sum_{n \le X} a_n\, n^{-1/2 - i\gamma}, \]
averaged over the nontrivial zeros $\rho = \tfrac12 + i\gamma$ of the Riemann zeta function. This constitutes one of the key analytic inputs in deriving Gaussian-type large deviation estimates for $\log|\zeta'(\rho)|$. The result may be viewed as a discrete analogue of Harper’s bounds for continuous $t$-averages [7], adapted to the discrete set of zeros by Kirila [4, Sec. 5].
Proposition 1 
(MGF bound for the Dirichlet approximant). Fix $\varepsilon > 0$. There exists an absolute constant $C_0 > 0$ such that for all real $t$ with
\[ |t| \le t_0 := \frac{1}{2 C_0 \sqrt{\log\log T}}, \]
we have the uniform bound
\[ \frac{1}{N(T)} \sum_{0 < \gamma \le T} \exp\bigl(t D_X(\gamma)\bigr) \le \exp\Bigl(\tfrac12 t^2 \sigma_X^2 + O\bigl(|t|^3 (\log\log T)^{3/2}\bigr)\Bigr), \]
where $\sigma_X^2$ is the variance from Lemma 2. The implied constants are absolute.
Proof. 
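Write
\[ S(\gamma) := \sum_{n \le X} A_n\, n^{-i\gamma}, \qquad A_n := a_n\, n^{-1/2}, \]
so that $D_X(\gamma) = \tfrac12 \bigl(S(\gamma) + \overline{S(\gamma)}\bigr)$. Define
\[ M(t) := \frac{1}{N(T)} \sum_{0 < \gamma \le T} e^{t D_X(\gamma)}. \]
Expanding the exponential gives
\[ M(t) = \sum_{r = 0}^{\infty} \frac{t^r}{r!}\, M_r, \qquad M_r := \frac{1}{N(T)} \sum_{0 < \gamma \le T} D_X(\gamma)^r. \]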
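Expansion of $M_r$. By the multinomial theorem,
\[ D_X(\gamma)^r = 2^{-r} \sum_{r_1 + r_2 = r} \binom{r}{r_1, r_2}\, S(\gamma)^{r_1}\, \overline{S(\gamma)}^{\,r_2}. \]
Expanding both powers produces sums of the shape
\[ \sum_{n_1, \dots, n_{r_1} \le X}\ \sum_{m_1, \dots, m_{r_2} \le X}\ \prod_{j = 1}^{r_1} A_{n_j} \prod_{k = 1}^{r_2} \overline{A_{m_k}}\ e^{-i\gamma \left(\sum_j \log n_j - \sum_k \log m_k\right)}. \]
Averaging over zeros introduces the factor $A(-u; T)$, where
\[ A(u; T) := \frac{1}{N(T)} \sum_{0 < \gamma \le T} e^{i\gamma u}, \qquad u = \sum_j \log n_j - \sum_k \log m_k. \]
Hence
\[ M_r = 2^{-r} \sum_{r_1 + r_2 = r} \binom{r}{r_1, r_2} \sum_{n_1, \dots, n_{r_1} \le X}\ \sum_{m_1, \dots, m_{r_2} \le X}\ \prod_{j = 1}^{r_1} A_{n_j} \prod_{k = 1}^{r_2} \overline{A_{m_k}}\ A(-u; T). \]
Note that $|A(-u; T)| = |A(u; T)|$, so bounds for $A(u; T)$ apply equally to this factor.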
Remark 1. The exponential average
\[ A(u; T) := \frac{1}{N(T)} \sum_{0 < \gamma \le T} e^{i\gamma u} \]
appears in display (46). For the off-diagonal estimates below we require the following uniformity:
\[ A(u; T) = o(1) \qquad (T \to \infty), \]
uniformly for every nonzero frequency $u$ that arises as an integer linear combination
\[ u = \sum_{\alpha} \varepsilon_\alpha \log q_\alpha, \qquad q_\alpha \le X, \quad |\varepsilon_\alpha| \le R, \]
where $R$ is the cumulant/order parameter in the expansion. A trivial lower bound for such nonzero $u$ is $|u| \ge c_R X^{-R} \ge c_R (\log T)^{-AR}$ when $X = (\log T)^A$, so the quantitative pair–correlation hypothesis (PC) recorded below implies the required $o(1)$–uniformity provided one arranges the parameters so that $AR \le C_1 + O(1)$ (see the statement of (PC) in Section 4). We apply this remark with $R$ as chosen in Lemma 5.2.
Diagonal terms ($u = 0$). If $u = 0$, then the multisets $\{n_j\}$ and $\{m_k\}$ coincide. This is possible only when $r$ is even, say $r = 2\ell$. In that case the number of perfect matchings yields
\[ M_{2\ell}^{\mathrm{diag}} = \frac{(2\ell)!}{2^{\ell}\, \ell!}\, (\sigma_X^2)^{\ell}, \]
with
\[ \sigma_X^2 = \sum_{n \le X} |A_n|^2, \]
as established in Lemma 2. For odd $r$, the diagonal contribution vanishes.
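The combinatorial factor $(2\ell)!/(2^{\ell} \ell!)$ is the number of ways to split $2\ell$ indices into unordered pairs, which is also the $2\ell$-th moment of a standard Gaussian. A short sketch confirming the count by direct recursion is given below; the function names are illustrative.

```python
from math import factorial

def count_perfect_matchings(m):
    """Number of ways to split {1, ..., m} into unordered pairs (0 if m is odd)."""
    if m % 2:
        return 0
    if m == 0:
        return 1
    # Pair the first element with any of the other m - 1 elements, then recurse.
    return (m - 1) * count_perfect_matchings(m - 2)

def matching_formula(l):
    """Closed form (2l)! / (2^l * l!)."""
    return factorial(2 * l) // (2 ** l * factorial(l))
```

For $\ell = 1, 2, 3$ both functions give $1, 3, 15$, the familiar Gaussian moments.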
Off-diagonal terms ($u \ne 0$). The key input is the estimate for the zero-average $A(u; T)$. By the explicit formula (see Titchmarsh, Montgomery, or [4, Sec. 5]), one has
\[ \sum_{0 < \gamma \le T} e^{i\gamma u} = O(T \log T), \qquad |u| \ge 1/T, \]
with stronger bounds available from Montgomery’s pair-correlation theorem and its modern refinements: for fixed $\delta > 0$ and all $|u| \ge (\log T)^{-\delta}$,
\[ \frac{1}{N(T)} \sum_{0 < \gamma \le T} e^{i\gamma u} = o(1). \]
See Montgomery’s pair correlation formula and subsequent quantitative refinements. Since here $u$ is an integer linear combination of logarithms of integers $\le X$ and $X = (\log T)^A$ (or $X = T^{\alpha}$ with fixed $\alpha$), we have $|u| \gg 1/\log^A T$ unless $u = 0$. Thus the pair-correlation input implies
\[ A(u; T) = o(1), \]
uniformly for all nonzero $u$ arising in (46).
Consequently the contribution from $u \ne 0$ is bounded by
\[ \sup_{u \ne 0} |A(u; T)| \cdot \Bigl(\sum_{n \le X} |A_n|\Bigr)^{r}. \]
By Cauchy–Schwarz, $\sum_{n \le X} |A_n| \le \sigma_X \sqrt{X}$. Since $X$ is at most polylogarithmic in $T$, this factor grows more slowly than any positive power of $T$, while $\sup_{u \ne 0} |A(u; T)| = o(1)$, so these off-diagonal terms are negligible compared with the main diagonal.
Cumulant control. Thus for even $r = 2\ell$,
\[ M_{2\ell} = \frac{(2\ell)!}{2^{\ell}\, \ell!}\, (\sigma_X^2)^{\ell} + o\bigl((\log\log T)^{\ell}\bigr), \]
while for odd $r$ we have $M_r = o\bigl((\log\log T)^{r/2}\bigr)$. Hence the moments match those of a centered Gaussian with variance $\sigma_X^2$. Introducing cumulants $\kappa_r$ via
\[ \log M(t) = \sum_{r \ge 1} \kappa_r\, \frac{t^r}{r!}, \]
we deduce $\kappa_1 = 0$, $\kappa_2 = \sigma_X^2 + o(1)$, and $|\kappa_r| \le r!\, \bigl(C_0 \sqrt{\log\log T}\bigr)^r$ for $r \ge 3$, with some absolute $C_0$. Therefore the cumulant series converges absolutely for $|t| \le 1/\bigl(2 C_0 \sqrt{\log\log T}\bigr)$. In this range,
\[ \log M(t) = \tfrac12 \sigma_X^2 t^2 + O\bigl(|t|^3 (\log\log T)^{3/2}\bigr). \]
Exponentiating gives the claimed MGF bound.    □
The expansion in Proposition 1, together with Remark 1, shows that the moment generating function of $D_\gamma$ behaves essentially as if $D_\gamma$ were a short Gaussian sum: diagonal contributions dominate, while off–diagonal contributions are negligible under (PCH). To make this heuristic precise we now pass from raw moments to cumulants. The cumulant expansion has the advantage that Gaussian behavior corresponds exactly to vanishing of all higher cumulants, and it provides quantitative control of the radius of convergence of the logarithmic moment generating function. The following lemma records the bound we shall need.
Lemma 3 
(Cumulant control). Let $X = (\log T)^A$ with fixed $A > 0$. Let $(b_p)_{p \le X}$ be complex numbers supported on the primes $p \le X$ with $|b_p| \le B$ for some fixed $B$, and set
\[ D_\gamma = \sum_{p \le X} \frac{b_p}{\sqrt{p}}\, p^{-i\gamma}, \qquad V := \sum_{p \le X} \frac{|b_p|^2}{p}. \]
Assume (RH) and the pair-correlation uniformity Hypothesis (PCH) recorded in Section 1, together with the discrete moment input described in the next paragraph (both hypotheses are those spelled out in the Introduction). Then for every integer $r \ge 2$ one has
\[ |\kappa_r(D_\gamma)| \ll_{A, B} C^r\, r!\, V^{r/2}, \]
for a constant $C = C(A, B) > 0$. In particular the cumulant generating function $K(t) = \log \mathbb{E}_\gamma[e^{t D_\gamma}]$ converges absolutely and is analytic in the disk $|t| < c/\sqrt{V}$ for some $c = c(A, B) > 0$.
Proof. 
Write $M_r = \mathbb{E}_\gamma[D_\gamma^r]$ for the raw $r$-th moment (expectation over zeros $0 < \gamma \le T$ with the normalization $1/N(T)$). Expanding the $r$-fold product yields
\[ M_r = \sum_{p_1, \dots, p_r \le X} \frac{b_{p_1} \cdots b_{p_r}}{\sqrt{p_1 \cdots p_r}} \cdot \frac{1}{N(T)} \sum_{0 < \gamma \le T} \exp\Bigl(-i\gamma \sum_{j = 1}^{r} \log p_j\Bigr). \]
By definition of $A(u; T)$ (see Remark 1) the inner average equals $A\bigl(-\sum_j \log p_j; T\bigr)$. The contribution from those tuples with $\sum_j \log p_j = 0$ (equivalently the multiset $\{p_1, \dots, p_r\}$ can be partitioned into two submultisets with equal products) will from now on be called the balanced (or “diagonal”) contribution; the rest will be called off-diagonal.
The balanced tuples are exactly those that produce zero frequency and hence survive the $\gamma$-average with weight $A(0; T) = 1$. For the purposes of bounding cumulants it suffices to treat the even moments, so write $r = 2k$. When $r$ is odd the same combinatorial analysis gives a smaller contribution (indeed odd raw moments are negligible for symmetric coefficients), and the cumulant bounds that follow continue to hold by standard moment–cumulant relations; we therefore present the argument for $r = 2k$.
If $\{p_1, \dots, p_{2k}\}$ is balanced then the multiset of the first $k$ primes must equal the multiset of the last $k$ primes after a permutation. Grouping by matchings between the first $k$ indices and the last $k$ indices we obtain the classical pairing combinatorics: each perfect matching $m$ of $\{1, \dots, 2k\}$ into $k$ unordered pairs contributes at most
\[ \prod_{\{i, j\} \in m} \Bigl(\sum_{p \le X} \frac{|b_p|^2}{p}\Bigr) = V^k, \]
and the number of such matchings is $\frac{(2k)!}{2^k k!}$. More generally, balanced tuples that are not simple pairings (i.e. some prime occurs with multiplicity larger than 2) can be treated identically by grouping indices according to equal prime values; each such multiplicity pattern yields a contribution bounded by a product of factors $\sum_{p \le X} |b_p|^j / p^{j/2}$ with $j \ge 2$, and each such factor is $\le \bigl(\sum_{p \le X} |b_p|^2 / p\bigr)^{j/2} = V^{j/2}$ by Hölder. Summing over all multiplicity patterns therefore yields the bound
\[ M_{2k}^{\mathrm{balanced}} \le \frac{(2k)!}{2^k k!}\, V^k \cdot C_1^k \]
for some constant $C_1 = C_1(B)$ depending only on the uniform bound $B$ for $|b_p|$. The combinatorial prefactor $(2k)!/(2^k k!)$ is bounded by $C^k k!$ for an absolute $C$, so the balanced contribution satisfies
\[ M_{2k}^{\mathrm{balanced}} \ll_{A, B} C^k\, k!\, V^k. \]
We now show that off-diagonal frequencies contribute a negligible amount in the parameter range of interest. Each off-diagonal tuple produces a nonzero frequency $u = \sum_{j = 1}^{2k} \log p_j$ with $|u| \le 2k \log X$. By the pair–correlation uniformity (PCH) (see Remark 1 and the Hypotheses subsection), for $T$ large and for every such nonzero $u$ we have $|A(u; T)| \le \delta_T$ with $\delta_T \to 0$ as $T \to \infty$, uniformly for $|u| \le 2k \log X$. The total number of off-diagonal tuples is $\le \pi(X)^{2k} \ll (X/\log X)^{2k}$. Hence the off-diagonal contribution is bounded by
\[ M_{2k}^{\mathrm{off}} \le \delta_T\, \pi(X)^{2k} \max_{p_1, \dots, p_{2k}} \prod_{j = 1}^{2k} \frac{|b_{p_j}|}{\sqrt{p_j}} \ll_{A, B} \delta_T \Bigl(C_2\, \pi(X) / \sqrt{p_{\min}}\Bigr)^{2k}, \]
which is $o\bigl(V^k k!\bigr)$ provided the parameters are chosen as in the Introduction (the required smallness $\delta_T\, \pi(X)^{2k} = o(V^k k!)$ is exactly the uniformity range we demanded in (PCH) and in the discrete moment hypothesis; see the discussion immediately following Hypothesis (PCH)). In practice one takes $k \le cV$ for a small absolute $c$ so that the combinatorial growth $\pi(X)^{2k}$ is dominated by the decay of $\delta_T$ coming from (PCH) and from the discrete-moment input of Kirila (which implements Harper’s argument on the zero set); see [7] and [4] for the precise discrete estimates that justify this step. Consequently $M_{2k}^{\mathrm{off}} = o\bigl(M_{2k}^{\mathrm{balanced}}\bigr)$ for the admissible range of $k$ used below.
The cumulants $\kappa_{2k}$ are polynomial combinations of the raw moments $M_j$ with $j \le 2k$. The moment–cumulant relations together with the bound just obtained for the dominant balanced term imply
\[ |\kappa_{2k}| \ll_{A, B} C^k\, k!\, V^k. \]
Rewriting in terms of $r = 2k$ gives $|\kappa_r| \le C^r\, r!\, V^{r/2}$ for all even $r \ge 2$. The odd cumulants satisfy the same upper bound (indeed they are typically smaller), so the bound holds for every integer $r \ge 2$.
Finally, absolute convergence of the cumulant series in the disk $|t| < c/\sqrt{V}$ follows from comparison with a geometric series: for $|t| < c/\sqrt{V}$ one has $|\kappa_r t^r / r!| \le (C |t| \sqrt{V})^r$, which is summable for $c$ sufficiently small depending only on $A, B$. Thus $K(t)$ is analytic in the claimed disk.    □
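The moment–cumulant relations invoked in the proof can be made concrete through the standard recursion $m_n = \sum_{j=1}^{n} \binom{n-1}{j-1} \kappa_j\, m_{n-j}$. The sketch below inverts this recursion in exact arithmetic and confirms that Gaussian moments $(2k)!/(2^k k!)\, V^k$ give $\kappa_2 = V$ and vanishing higher cumulants. It illustrates the general relation only, not the estimates for $D_\gamma$; the function names are hypothetical.

```python
from fractions import Fraction
from math import comb, factorial

def cumulants_from_moments(m):
    """Given raw moments m[0] = 1, m[1], ..., m[R], return [0, k_1, ..., k_R]
    using m_n = sum_{j=1}^{n} C(n-1, j-1) * k_j * m_{n-j}."""
    kappa = [Fraction(0)] * len(m)
    for n in range(1, len(m)):
        lower = sum(comb(n - 1, j - 1) * kappa[j] * m[n - j] for j in range(1, n))
        kappa[n] = Fraction(m[n]) - lower
    return kappa

def gaussian_moments(V, R):
    """Raw moments of a centered Gaussian with variance V, up to order R."""
    out = []
    for r in range(R + 1):
        if r % 2:
            out.append(Fraction(0))
        else:
            k = r // 2
            out.append(Fraction(factorial(r), 2 ** k * factorial(k)) * Fraction(V) ** k)
    return out
```

For example, `cumulants_from_moments(gaussian_moments(3, 8))` has second entry $3$ and all entries of order $3$ to $8$ equal to zero, which is the exact Gaussian benchmark against which the bound $|\kappa_r| \le C^r r!\, V^{r/2}$ should be read.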

Corrected Chernoff Constraint

Let $Z$ denote the short Dirichlet-polynomial approximation to $\log|\zeta'(\tfrac12 + i\gamma)|$ with variance
\[ \sigma^2 = \mathrm{Var}(Z) \asymp \sum_{p \le X} \frac{1}{p} \sim \log\log X. \]
By Proposition 4.3 (cumulant control) the log-MGF admits the Gaussian expansion
\[ \log \mathbb{E}[e^{tZ}] = \tfrac12 t^2 \sigma^2 + O\bigl(|t|^3 \sigma^3\bigr), \qquad |t| \le t_{\max}, \]
where $t_{\max}$ is the radius of validity for the cumulant expansion. For our choice $X = (\log T)^A$ we have $\sigma^2 \asymp \log\log\log T$ and hence
\[ t_{\max} \asymp \frac{1}{\sigma} \qquad (\text{in particular } t_{\max} \to 0 \text{ as } T \to \infty). \]
By Chernoff, for $0 < t \le t_{\max}$,
\[ \Pr(Z \le -V) \le \exp\Bigl(-tV + \tfrac12 t^2 \sigma^2 + O\bigl(|t|^3 \sigma^3\bigr)\Bigr). \]
Two regimes follow.
(i) If $V \le \sigma^2 t_{\max}$ then the unconstrained minimizer $t_* = V/\sigma^2$ satisfies $|t_*| \le t_{\max}$ and one obtains the Gaussian tail
\[ \Pr(Z \le -V) \ll \exp\Bigl(-\frac{V^2}{2\sigma^2}\Bigr). \]
(ii) If $V > \sigma^2 t_{\max}$ then the admissible choice is $t = t_{\max}$ and
\[ \Pr(Z \le -V) \le \exp\bigl(-t_{\max} V + O(t_{\max}^2 \sigma^2)\bigr). \]
Thus the best linear-in-$V$ rate obtainable from the MGF/Chernoff method is $c_{\mathrm{MGF}} \asymp t_{\max}$. Since $t_{\max} \asymp 1/\sigma \to 0$ for $X = (\log T)^A$, the MGF route alone cannot produce a fixed constant $c_{\mathrm{MGF}} > 2$ (indeed $c_{\mathrm{MGF}} \to 0$). Consequently the combined tail exponent
\[ \beta = \min\{2\alpha, c_{\mathrm{MGF}}\} \]
satisfies $\beta \le c_{\mathrm{MGF}}$ for large $T$, so $\beta > 2$ is not obtained unless one supplements the present hypotheses by a stronger MGF-type input (see Hypothesis DMC+ below) or a stronger sieve input.
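The two regimes amount to minimizing $-tV + \tfrac12 t^2 \sigma^2$ over $0 < t \le t_{\max}$. The following sketch computes the resulting decay rate, so that the tail bound reads $\exp(-\text{rate})$. It drops the cubic error term $O(|t|^3 \sigma^3)$ entirely, which is a simplifying assumption, and the function name is hypothetical.

```python
def chernoff_rate(V, sigma2, t_max):
    """Return (rate, t) where t minimizes -t*V + 0.5*t^2*sigma2 on (0, t_max]
    and rate = t*V - 0.5*t^2*sigma2, so the tail bound is exp(-rate).

    The cubic cumulant error is ignored here (illustrative simplification).
    """
    t_star = V / sigma2          # unconstrained minimizer
    t = min(t_star, t_max)       # regime (i) if t_star <= t_max, else regime (ii)
    rate = t * V - 0.5 * t * t * sigma2
    return rate, t
```

In regime (i) the rate equals $V^2/(2\sigma^2)$; in regime (ii) it equals $t_{\max} V - \tfrac12 t_{\max}^2 \sigma^2$, which is linear in $V$ with slope $t_{\max}$, matching the discussion of $c_{\mathrm{MGF}}$.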

4.3. Gaussian Lower-Tail via Chernoff Inequality

With Proposition 1 in place, we can now establish Gaussian-type bounds for the lower tail of $\log|\zeta'(\rho)|$ along the critical zeros. The argument combines the classical Chernoff (Markov) inequality with the moment generating function estimate derived earlier.
Theorem 1 
(Gaussian lower-tail bound). Fix $V \ge 1$ and define
\[ N(V; T) := \#\bigl\{\gamma \le T : \log|\zeta'(\tfrac12 + i\gamma)| \le -V\bigr\}. \]
Assume the hypotheses of Lemma 1 and Proposition 1. Then there exists an absolute constant $c > 0$ such that, uniformly for
\[ 1 \le V \le c \sqrt{\log\log\log T}, \]
we have
\[ N(V; T) \ll N(T) \exp\Bigl(-c\, \frac{V^2}{\sigma_X^2}\Bigr) + |E_{\mathrm{app}}|, \]
where $\sigma_X^2 \asymp \log\log X \asymp \log\log\log T$ is as in Lemma 2, and $E_{\mathrm{app}}$ is the exceptional set from Lemma 1.
Proof. 
Let $S$ denote the set of zeros $\gamma \le T$ with $\gamma \notin E_{\mathrm{app}}$. For any $t > 0$, Markov’s inequality yields
\[ \#\bigl\{\gamma \in S : \log|\zeta'(\tfrac12 + i\gamma)| \le -V\bigr\} \le e^{-tV} \sum_{\gamma \in S} e^{-t D_X(\gamma) + t |R_X(\gamma)|}. \]
By Lemma 1, on $S$ the remainder $R_X(\gamma)$ is uniformly negligible: there is an absolute constant $C_R > 0$ (depending only on choices of parameters already fixed) such that $|R_X(\gamma)| \le C_R$ for all $\gamma \in S$. Hence the factor $e^{t |R_X(\gamma)|}$ contributes at most $e^{t C_R}$ and can be absorbed into the implied constants once $t$ is restricted to the admissible range below. Thus it suffices to bound
\[ e^{-tV} \sum_{\gamma \in S} e^{-t D_X(\gamma)}. \]
Divide by $N(T)$ and apply Proposition 1 (the cumulant/MGF estimate) to obtain, for all $|t| \le t_{\max}$,
\[ \frac{1}{N(T)} \sum_{\gamma \in S} e^{-t D_X(\gamma)} \le \exp\Bigl(\tfrac12 t^2 \sigma_X^2 + O\bigl(|t|^3 \sigma_X^3\bigr)\Bigr), \tag{47} \]
where $\sigma_X^2 = \mathrm{Var}(D_X) \asymp \log\log X$ and $t_{\max}$ denotes the radius of validity of the cumulant expansion. With the polylogarithmic choice $X = (\log T)^A$ we have
\[ \sigma_X^2 \asymp \log\log X \asymp \log\log\log T, \qquad t_{\max} \asymp \frac{1}{\sigma_X}. \]
We now make the standard Chernoff choice
\[ t := \frac{V}{\sigma_X^2}. \]
This choice is admissible (i.e. $|t| \le t_{\max}$) precisely when
\[ \frac{V}{\sigma_X^2} \le t_{\max} \iff V \le t_{\max}\, \sigma_X^2 \asymp \sigma_X. \]
Thus the Chernoff optimization is valid for all $1 \le V \le c\, \sigma_X$ with some small absolute $c > 0$. Recalling $\sigma_X \asymp \sqrt{\log\log X} \asymp \sqrt{\log\log\log T}$, this is the uniformity range stated in the theorem.
Insert this $t$ into the right-hand side of (47). We have
\[ \tfrac12 t^2 \sigma_X^2 = \frac{V^2}{2\sigma_X^2}, \qquad |t|^3 \sigma_X^3 = \frac{V^3}{\sigma_X^6} \cdot \sigma_X^3 = \frac{V^3}{\sigma_X^3}, \]
and hence the contribution of the cubic cumulant error to the exponent is
\[ O\bigl(|t|^3 \sigma_X^3\bigr) = O\Bigl(\frac{V^3}{\sigma_X^3}\Bigr). \]
Compare this error with the main quadratic term:
\[ \frac{V^3 / \sigma_X^3}{V^2 / \sigma_X^2} = \frac{V}{\sigma_X}. \]
Hence whenever $V \le c\, \sigma_X$ with $c > 0$ chosen sufficiently small, the cubic error is a small fraction of the main quadratic term and may be absorbed into it. More precisely, for such $V$ there exists an absolute constant $c_1 \in (\tfrac12, 1)$ for which
\[ \tfrac12 t^2 \sigma_X^2 + O\bigl(|t|^3 \sigma_X^3\bigr) \le c_1\, \frac{V^2}{\sigma_X^2}. \]
Combining this estimate with (47) and multiplying by $e^{-tV}$ (the prefactor from Markov’s inequality), we obtain, for $1 \le V \le c\, \sigma_X$,
\[ \#\bigl\{\gamma \in S : \log|\zeta'(\tfrac12 + i\gamma)| \le -V\bigr\} \ll N(T) \exp\Bigl(-tV + c_1\, \frac{V^2}{\sigma_X^2}\Bigr), \]
where $t = V/\sigma_X^2$. Note that $tV = V^2/\sigma_X^2$, so the two exponents combine to give an overall Gaussian decay:
\[ -tV + c_1\, \frac{V^2}{\sigma_X^2} = (-1 + c_1)\, \frac{V^2}{\sigma_X^2} \le -c\, \frac{V^2}{\sigma_X^2} \]
for some absolute $c > 0$ (here we use $c_1 < 1$).
Finally, reintroducing the uniformly bounded multiplicative factor coming from the negligible remainder $R_X(\gamma)$ (absorbed into the implied constant above) and adding back the exceptional set $E_{\mathrm{app}}$ yields
\[ N(V; T) \ll N(T) \exp\Bigl(-c\, \frac{V^2}{\sigma_X^2}\Bigr) + |E_{\mathrm{app}}|, \]
uniformly for $1 \le V \le c\, \sigma_X$. Recalling $\sigma_X^2 \asymp \log\log X \asymp \log\log\log T$ completes the proof.    □
Lemma 4 
(Decay of the exceptional set). Let $E_{\mathrm{app}}$ be the exceptional set from Lemma 1, where the Dirichlet approximation may fail. Then there exists an absolute constant $c_1 > 0$ such that, for every $V \ge 1$,
\[ \#\bigl\{\gamma \in E_{\mathrm{app}} : \log|\zeta'(\tfrac12 + i\gamma)| \le -V\bigr\} \ll N(T) \exp(-c_1 V) + N(T)\, (\log T)^{-A}, \]
for any fixed $A > 0$.
Proof. 
The argument combines two ingredients. First, if the approximation $D_X(\gamma) + R_X(\gamma)$ fails by more than a tolerance $\delta > 0$, then the MGF bound (Proposition 1) and a large deviation estimate imply that such events have probability $\ll \exp(-c\, \delta^2 / \sigma_X^2)$ in each local window. Second, if $\log|\zeta'(\tfrac12 + i\gamma)| \le -V$ while the approximation is not extremely wrong, then $\gamma$ must correspond to a zero with an abnormally small gap to its neighbors. By the Montgomery pair correlation law and sieve bounds of Bui–Florea–Milinovich, such small-gap zeros occur with frequency $\ll N(T) \exp(-cV)$. Choosing parameters so that the two error sources match, we obtain the claimed exponential decay in $V$, with the $(\log T)^{-A}$ term absorbing negligible contributions from coarse error terms.    □
The arguments above establish that a short Dirichlet polynomial $D_X(\gamma)$ gives an accurate approximation to $\log|\zeta'(\tfrac12 + i\gamma)|$ for all but a very sparse exceptional set of zeros, with error term $R_X(\gamma)$ that is uniformly negligible. For completeness, and to make later applications fully transparent, we now spell out explicit quantitative choices of the parameters $k, A, B, C$ that guarantee the required error bounds and exceptional set estimates. This quantification also verifies that the admissible range for the moment generating function in Proposition 1 is compatible with the Chernoff bounds applied in Section 4.2.

Recovery of the Near-Optimal Bound Under DMC+

Assume DMC+ holds with the fixed radius $t_0 > 2$. Fix any $t_* \in (2, t_0)$. By Markov (Chernoff) and DMC+, for every $V > 0$ and uniformly in $T$,
\[ \Pr(Z \le -V) \le \exp\Bigl(-t_* V + \tfrac12 t_*^2 \sigma_X^2 + O\bigl(|t_*|^3 \sigma_X^3\bigr)\Bigr). \]
For $V$ exceeding a (large but fixed) threshold $V_1$ we have $-t_* V + \tfrac12 t_*^2 \sigma_X^2 + O(|t_*|^3 \sigma_X^3) \le -c_0 V$ for some constant $c_0 \in (0, t_*)$ (because the linear term in $V$ dominates the fixed-size polynomial-in-$\sigma_X$ error). Thus the MGF route produces a linear tail
\[ \Pr(Z \le -V) \ll e^{-c_0 V} \qquad (V \ge V_1), \]
with $c_0 > 2$. Combining this with the sieve/entropy decay $e^{-2\alpha V}$ (choose $\alpha > 1$) yields an effective tail exponent
\[ \beta = \min\{2\alpha, c_0\} > 2. \]
The standard dyadic decomposition then gives, for any fixed $\varepsilon > 0$,
\[ J_{-1}(T) = \sum_{0 < \gamma \le T} \bigl|\zeta'(\tfrac12 + i\gamma)\bigr|^{-2} \ll T (\log T)^{\varepsilon}, \]
as in the original strong statement. The constants depend only on the fixed choices $t_*$ and $\alpha$, and on the implied constants in DMC+ and the sieve hypothesis. □

4.4. Quantitative Parameter Selection

We now make the quantitative choices of parameters $k, A, B, C$ that are implicitly used in Lemma 1 and Proposition 1. The goal is to exhibit explicit inequalities ensuring that the exceptional set $E_{\mathrm{app}}$ has size $\ll N(T)/(\log T)^{B}$ while the error term $R_X(\gamma)$ is $O\bigl((\log\log T)^{C}\bigr)$ uniformly off this set.

Choice of k

Let $k = \kappa \log\log T$ with fixed $0 < \kappa < 1/4$. Kirila’s discrete moment bounds [4, Thm. 1.1] give
\[ \frac{1}{N(T)} \sum_{0<\gamma\le T} \bigl|\zeta'(\tfrac12+i\gamma)\bigr|^{2k} \;\ll_k\; (\log T)^{k^2+O(1)}. \]
Hence the $2k$-th moment of the remainder $R_X(\gamma)$ is
\[ M_{2k} \;=\; \frac{1}{N(T)} \sum_{0<\gamma\le T} |R_X(\gamma)|^{2k} \;\ll\; (CA)^{k} (\log\log T)^{O(k)}. \]
For $k$ as above this is $\exp\bigl(O_\kappa(\log\log T)\bigr)$.

Application of Markov

By Markov’s inequality, for any threshold $\tau > 0$,
\[ \frac{1}{N(T)}\, \#\{\gamma \le T : |R_X(\gamma)| > \tau\} \;\le\; \frac{M_{2k}}{\tau^{2k}}. \]
Set $\tau = (\log\log T)^{C}$. With $k = \kappa\log\log T$ the denominator is $\tau^{2k} = \exp\bigl(2\kappa C\,(\log\log T)\log\log\log T\bigr)$. Since the numerator is only $\exp\bigl(O_\kappa(\log\log T)\bigr)$, choosing $C$ sufficiently large (depending on $\kappa$ and the desired $B$) gives
\[ |E_{\mathrm{app}}| \;\ll\; \frac{N(T)}{(\log T)^{B}}. \]
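The balance between numerator and denominator in the Markov step can be checked numerically. The sketch below is only illustrative: it assumes the numerator satisfies $M_{2k} \le \exp(c_{\mathrm{num}} \log\log T)$, where `c_num` is a hypothetical stand-in for the implied constant in $\exp(O_\kappa(\log\log T))$ (the text does not give its value), and it solves for the smallest $C$ that makes the ratio at most $(\log T)^{-B}$.

```python
import math

def log_markov_ratio(T, kappa, C, c_num):
    """Logarithm of M_{2k} / tau^{2k} with tau = (log log T)^C and k = kappa*log log T.

    ASSUMPTION: M_{2k} <= exp(c_num * log log T); c_num is a hypothetical
    placeholder for the unspecified implied constant in the text.
    """
    ll = math.log(math.log(T))      # log log T
    lll = math.log(ll)              # log log log T
    k = kappa * ll
    log_numerator = c_num * ll
    log_denominator = 2.0 * k * C * lll
    return log_numerator - log_denominator

def smallest_admissible_C(T, kappa, B, c_num):
    """Smallest C for which the Markov ratio is <= (log T)^(-B)."""
    lll = math.log(math.log(math.log(T)))
    if lll <= 0:
        raise ValueError("T must exceed exp(e) so that log log log T > 0")
    return (B + c_num) / (2.0 * kappa * lll)

if __name__ == "__main__":
    T, kappa, B, c_num = 1e100, 0.2, 3.0, 5.0
    C = smallest_admissible_C(T, kappa, B, c_num)
    print("C =", C, " log ratio =", log_markov_ratio(T, kappa, C, c_num))
```

Because $\log\log\log T$ grows extremely slowly, the required $C$ under this assumption decreases only very gradually with $T$.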

Choice of A

The truncation length is $X = (\log T)^{A}$. To ensure the remainder $R_X(\gamma)$ satisfies the bound above we require $A \ge A(B, C)$ for some explicit function. The contour-shift arguments behind Lemma 1, together with standard zero-density and explicit formula bounds (see Hejhal [3] and Kirila [4]), show that $A \gg B + C$ suffices. Concretely, for each fixed $B, C$ we may take
\[ A = 10\,(B + C) \]
to guarantee the error bound and exceptional set estimate.

Admissible Range for t

Proposition 1 (MGF expansion) is uniform for
| t | t 0 : = c log log T
with some absolute c > 0 . In the Chernoff bound application we choose t = V / σ X , where σ X 2 log log T . Thus | t | c / log log T provided V c log log T . This coincides with the natural Gaussian scale of fluctuations, and covers the full range needed in Section 4.2.

Summary

For each desired power saving $B > 0$ and decay parameter $C > 0$, we may choose
\[ k = \kappa \log\log T, \qquad A = 10\,(B + C), \qquad \tau = (\log\log T)^{C}, \]
with 0 < κ < 1 / 4 fixed. Then Lemma 1 holds with | E app | N ( T ) ( log T ) B and | R X ( γ ) | τ for γ E app . Moreover, the MGF bounds of Proposition 1 apply for all admissible t = V / σ X with V c log log T . □

5. Entropy–Sieve Method (ESM)

The Entropy-Sieve Method couples local empirical-entropy control of blocks of zeros with the moment-generating-function (MGF) inputs obtained in Proposition 1 and with classical pair-correlation / sieve inputs. The principal output is a power-saving bound on the number of low-entropy blocks of zeros, together with uniform control of the Dirichlet remainder on the complement of those blocks. The combination of these statements is the core probabilistic–analytic ingredient that allows us to control negative discrete moments in Section 9.

5.1. Definitions and Notation

Fix a slowly growing integer $m = m(T)$ (we will specify an explicit rate later). For each zero ordinate $\gamma$ with $0 < \gamma \le T$ choose a deterministic consecutive block $\Gamma_\gamma = \{\gamma_j\}_{j=1}^{m}$ of length $m$ containing $\gamma$ (for definiteness take the centered block when possible). Let $\sigma_X$ be as in Lemma 2 and let $D_X(\gamma)$ denote the short Dirichlet polynomial approximant from Lemma 1.
Fix bin-widths $h = h(T) > 0$ and $\tilde h = \tilde h(T) > 0$ and let $(B_\ell)_{\ell=1}^{K}$ be a partition of a bounded interval of $\mathbb{R}$ into $K$ contiguous bins of width $h$ (take $K$ polynomial in $m$), and let $(\tilde B_\ell)_{\ell=1}^{\tilde K}$ be a partition of a bounded interval of $(0, \infty)$ into bins of width $\tilde h$ (for gaps). Define for the block $\Gamma_\gamma$ the empirical histograms
\[ p_\ell(\gamma) \;=\; \frac{1}{m}\, \#\Bigl\{ j \in \{1, \dots, m\} : \bigl(D_X(\gamma_j) - \mu_{\Gamma_\gamma}\bigr)/\sigma_X \in B_\ell \Bigr\}, \]
and
\[ \tilde p_\ell(\gamma) \;=\; \frac{1}{m}\, \#\Bigl\{ j \in \{1, \dots, m\} : (\gamma_{j+1} - \gamma_j)\log T \in \tilde B_\ell \Bigr\}, \]
and the corresponding empirical (Shannon) entropies
\[ H_{\mathrm{val}}(\gamma) \;=\; -\sum_{\ell=1}^{K} p_\ell(\gamma)\log p_\ell(\gamma), \qquad H_{\mathrm{gap}}(\gamma) \;=\; -\sum_{\ell=1}^{\tilde K} \tilde p_\ell(\gamma)\log \tilde p_\ell(\gamma). \]
We call a block $\Gamma_\gamma$ low-entropy if either $H_{\mathrm{val}}(\gamma)$ or $H_{\mathrm{gap}}(\gamma)$ is below a threshold $H_0 = \tfrac12 \log m + O(1)$ (the specific $O(1)$-term is chosen to absorb smoothing errors described below). Denote by $E_{\mathrm{ent}}$ the set of zeros whose block is low-entropy.
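The histogram and entropy definitions above translate directly into code. The following is a minimal sketch rather than the paper's implementation: the function names, the half-open binning convention, and the treatment of values falling outside the bounded interval (they are ignored in the counts but still counted in the denominator $m$, as in the displayed definition) are assumptions made for illustration.

```python
import math

def bin_frequencies(values, lo, hi, num_bins):
    """Empirical frequencies p_l = (1/m) #{values in bin l} over [lo, hi).

    Values outside [lo, hi) are not assigned to any bin, so the
    frequencies may sum to less than 1.
    """
    m = len(values)
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for v in values:
        if lo <= v < hi:
            idx = min(int((v - lo) / width), num_bins - 1)
            counts[idx] += 1
    return [c / m for c in counts]

def shannon_entropy(p):
    """-sum p log p with the convention 0 log 0 = 0 (natural logarithm)."""
    return -sum(q * math.log(q) for q in p if q > 0)

def block_entropies(d_values, ordinates, T, sigma_x,
                    val_range, gap_range, K, K_tilde):
    """Return (H_val, H_gap) for one block.

    d_values  : D_X(gamma_j) for the m zeros of the block (hypothetical input)
    ordinates : gamma_1, ..., gamma_{m+1}, so that m consecutive gaps exist
    """
    m = len(d_values)
    mu = sum(d_values) / m
    normalized = [(d - mu) / sigma_x for d in d_values]
    gaps = [(ordinates[j + 1] - ordinates[j]) * math.log(T) for j in range(m)]
    h_val = shannon_entropy(bin_frequencies(normalized, *val_range, K))
    h_gap = shannon_entropy(bin_frequencies(gaps, *gap_range, K_tilde))
    return h_val, h_gap
```

A block would then be flagged as low-entropy when either returned value falls below the chosen threshold $H_0$.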
Definition 1 
(Value Entropy). Let $\Delta(\gamma_0)$ be a block of $m$ consecutive zeros centered at $\gamma_0$. The value entropy is defined as
\[ H^{\mathrm{val}}_{h,\Delta} \;=\; -\sum_{v} P_\Delta(v)\log P_\Delta(v), \]
where $P_\Delta(v)$ is the empirical distribution of $\log|\zeta'(1/2 + i\gamma)|$ within $\Delta$.
Definition 2 
(Gap Entropy). For the same block $\Delta(\gamma_0)$, the gap entropy is defined as
\[ H^{\mathrm{gap}}_{h_g,\Delta} \;=\; -\sum_{g} P_\Delta(g)\log P_\Delta(g), \]
where $P_\Delta(g)$ is the empirical distribution of normalized gaps between consecutive zeros in $\Delta$.
Definition 3 
(Tail Decay Parameter). For $V > 0$, define
\[ \delta(V) := e^{-\alpha V}, \]
where $\alpha > 0$ is a tuning parameter appearing in the entropy–sieve optimization.
The main lemma of this section counts E ent under a checkable approximate-independence estimate which we now state and verify.
Lemma 5 
(Block cumulant factorization). Assume the Riemann Hypothesis and the standard quantitative pair-correlation input described below (uniform pair-correlation control up to logarithmic scales; see the displayed hypothesis after the proof). Let $\Gamma = \{\gamma_1, \dots, \gamma_m\}$ be any block of $m$ consecutive zeros with $m = m(T)$ satisfying
\[ m = o\bigl((\log T)^{\delta}\bigr) \]
for some small fixed $\delta > 0$. For any fixed finite collection $\Psi = \{\psi_1, \dots, \psi_J\}$ of bounded Lipschitz test functions on $\mathbb{R}$ (with Lipschitz constants allowed to grow at most polynomially in $m$ through the bin-widths), define the block cumulant generating function
\[ \Lambda_\Gamma(\lambda) := \frac{1}{m}\log \mathbb{E}_\Gamma \exp\Bigl( \sum_{j=1}^{m}\sum_{r=1}^{J} \lambda_r\, \psi_r\Bigl(\frac{D_X(\gamma_j) - \mu_\Gamma}{\sigma_X}\Bigr) \Bigr), \]
where $\mathbb{E}_\Gamma$ denotes the empirical average over $\gamma_j \in \Gamma$ and $\mu_\Gamma$ is the empirical block mean of $D_X(\gamma)$. Then for every fixed $L > 0$ and uniformly in $\|\lambda\| \le L$ one has
\[ \Lambda_\Gamma(\lambda) \;=\; \log \mathbb{E}_{Y \sim N(0,1)} \exp\Bigl( \sum_{r=1}^{J} \lambda_r \psi_r(Y) \Bigr) + O(\eta_m), \]
where $\eta_m \to 0$ as $m \to \infty$ under the above constraint on $m$. Furthermore one may choose $m = m(T)$ growing sufficiently slowly that $m\,\eta_m \to 0$ as $T \to \infty$.
Proof. 
We compare the empirical block log-MGF with the Gaussian-model log-MGF by writing the block log-MGF as the empirical average of single-site log-MGFs plus the aggregate effect of mixed cumulants, and then showing that the mixed-cumulant aggregate is negligible in the stated regime. Let $\Phi_\lambda(x) := \exp\bigl(\sum_{r=1}^{J} \lambda_r \psi_r(x)\bigr)$ (this map is bounded and Lipschitz whenever $\|\lambda\| \le L$). For each site $\gamma_j$ we consider the random variable
\[ X_j := \Phi_\lambda\Bigl(\frac{D_X(\gamma_j) - \mu_\Gamma}{\sigma_X}\Bigr), \]
and the empirical log-MGF is $\Lambda_\Gamma(\lambda) = \frac{1}{m}\log\bigl(\frac{1}{m}\sum_{j=1}^{m} X_j\bigr)$ after the usual normalization (the small difference between empirical mean and empirical expectation is handled below and does not affect the per-site limit).
First, by Proposition 1 (the single-site MGF control adapted to test functions $\psi_r$), the cumulants of each single-site variable $X_j$ are uniformly bounded in $T$ and, when normalized by $\sigma_X$, their second cumulant is asymptotically $1$ while higher cumulants decay rapidly with order. Concretely, for each fixed integer $q \ge 2$ there exists a constant $C_{q,L,J}$ (depending only on $q, L, J$ and polynomially on the Lipschitz norms of the $\psi_r$) such that the $q$-th cumulant of $X_j$ satisfies
\[ \kappa_q(X_j) = O\bigl(C_{q,L,J}\bigr), \]
uniformly in $j$ and in the block $\Gamma$; moreover $\kappa_2(X_j) = 1 + o(1)$ after the stated normalization. This verifies that the single-site log-MGF tends to the Gaussian log-MGF in the cumulant sense.
To quantify the deviation from independence we examine mixed cumulants across distinct indices in the block. A general mixed cumulant of order $R$ involving indices $j_1, \dots, j_R$ (not all equal) expands as a finite linear combination (with combinatorial coefficients depending only on $R$) of mixed moments of the form
\[ \mathbb{E} \prod_{t=1}^{R} \Phi_\lambda^{(\ell_t)}\Bigl(\frac{D_X(\gamma_{j_t}) - \mu_\Gamma}{\sigma_X}\Bigr), \]
where the derivatives $\Phi_\lambda^{(\ell)}$ arise from the cumulant-to-moment inversion and $\sum_t \ell_t = (\text{total moment order})$. Each such mixed moment is a finite multilinear combination of terms built from products of the Dirichlet-polynomial values $D_X(\gamma_{j_t})$, and each $D_X(\gamma) = \sum_{n \le X} a_n n^{-1/2 - i\gamma}$ is itself a finite linear combination of complex exponentials $n^{-i\gamma}$. Thus every mixed moment can be written as a finite sum of terms of the form
\[ C \cdot \prod_{s=1}^{S} A_{n_s}\overline{A_{m_s}} \cdot \frac{1}{m}\sum_{j \in I} e^{\,i(\pm\gamma_{j_1}\log n_1 \pm \cdots \pm \gamma_{j_R}\log n_R)}, \]
where $C$ is a combinatorial coefficient, $I \subset \{1, \dots, m\}$ indexes those sites that enter a particular exponential average, and the product of $A_n$ factors has length bounded by the total moment order. By re-indexing the exponential one writes any such contribution as a factor times an average of the form
\[ \frac{1}{m}\sum_{t=1}^{m} e^{\,i\gamma_t u} \]
for some frequency
\[ u = \sum_{\alpha} \varepsilon_\alpha \log q_\alpha, \]
where the $\varepsilon_\alpha \in \mathbb{Z}$ are integers with $|\varepsilon_\alpha| \le R$ and the $q_\alpha \le X$ are prime-powers coming from the Dirichlet expansion; the total number of distinct possible frequency patterns in a mixed cumulant of order $R$ is bounded by a polynomial $P_R(m)$ in $m$ (coming from the different ways to choose indices in the block and to assign the constituent Dirichlet factors).
The crucial analytic input is a uniform bound for zero-averages of the exponential sums
\[ A(u; T) := \frac{1}{N(T)} \sum_{0<\gamma\le T} e^{\,i\gamma u}. \]
We invoke the standard quantitative pair-correlation control in the following usable form (this is the mild, commonly used hypothesis in the discrete-zero literature; see Montgomery [17] and the discrete-moment treatments in [4,7]): there exist absolute constants $C_1, C_2 > 0$ such that for every $u \in \mathbb{R}$ with
\[ |u| \ge (\log T)^{-C_1} \]
we have
\[ |A(u; T)| \ll (\log T)^{-C_2}. \tag{$*$} \]
This quantitative manifestation of pair-correlation is standard in the literature when one allows smoothing and tests supported on scales slightly above the microscopic (see the discussion in Montgomery and the discrete refinements by Kirila; in practice one may take $C_1$ and $C_2$ arbitrarily large at the cost of enlarging $T$, because the pair-correlation asymptotics control Fourier transforms on logarithmic scales). Under this hypothesis $(*)$, any exponential average with frequency $u$ satisfying $|u| \ge (\log T)^{-C_1}$ is negligible (indeed polynomially small in $\log T$).
Now observe that the frequencies $u$ that appear in mixed cumulant terms are integer combinations of $\log q$ with $q \le X$. If a frequency vanishes exactly (i.e. $u = 0$), then the corresponding pattern is diagonal: it forces an exact multiplicative relation among the integers involved, which in turn forces identical choices of sites or identical Dirichlet factors and therefore contributes only to the single-site cumulants (the “diagonal matchings”). If $u \ne 0$, then, because each $q \le X$ and the integer coefficients satisfy $|\varepsilon_\alpha| \le R$ with $R$ bounded in terms of the cumulant order, a trivial lower bound on nonzero linear combinations gives
\[ |u| \;\ge\; c_R X^{-R} \;\ge\; c_R (\log T)^{-AR}, \]
for some constant $c_R > 0$ depending only on $R$ and where $X = (\log T)^{A}$ (or more generally $X \le (\log T)^{A}$). For the mixed cumulants that we need to control it suffices to consider $R$ up to a small polynomial in $m$ (indeed the cumulant expansion to obtain the block log-MGF to precision $o(1)$ requires only cumulant orders $R \le R_0(m)$ with $R_0(m) = O(\log m)$; one may make this explicit by truncating the cumulant expansion at large order and bounding the tail using factorial growth of cumulants and Proposition 1).
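To see the kind of lower bound invoked here in a toy case, the sketch below enumerates frequencies $u = \sum_\alpha \varepsilon_\alpha \log p_\alpha$ by brute force and compares the smallest nonzero $|u|$ with an elementary bound. It works under simplifying assumptions that are not taken from the text: only primes $p \le X$ are used (not prime powers), and the coefficients are restricted by total degree $\sum_\alpha |\varepsilon_\alpha| \le R$. Under that restriction $e^{|u|} = a/b$ with integers $a > b \ge 1$ and $b \le X^{R}$, so $|u| \ge \log(1 + X^{-R})$.

```python
import math
from itertools import product

def primes_up_to(x):
    """Primes p <= x by trial division (fine for tiny x)."""
    return [p for p in range(2, x + 1)
            if all(p % d for d in range(2, int(p ** 0.5) + 1))]

def min_nonzero_frequency(X, R):
    """Smallest |sum eps_p * log p| over nonzero integer vectors eps
    with sum |eps_p| <= R, p ranging over primes <= X.

    By unique factorization every nonzero eps gives u != 0.
    """
    primes = primes_up_to(X)
    best = math.inf
    for eps in product(range(-R, R + 1), repeat=len(primes)):
        if not any(eps) or sum(abs(e) for e in eps) > R:
            continue
        u = abs(sum(e * math.log(p) for e, p in zip(eps, primes)))
        best = min(best, u)
    return best

def elementary_lower_bound(X, R):
    """log(1 + X^(-R)), valid under the total-degree restriction above."""
    return math.log1p(float(X) ** (-R))

if __name__ == "__main__":
    X, R = 7, 3
    print(min_nonzero_frequency(X, R), elementary_lower_bound(X, R))
```

The enumeration is exponential in the number of primes, so this is only meant for very small $X$ and $R$.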
Combining the lower bound $|u| \ge c_R (\log T)^{-AR}$ with the pair-correlation hypothesis $(*)$ we obtain that for every fixed cumulant order $R$ and for all the nonzero frequencies arising in mixed cumulants,
\[ |A(u; T)| \ll (\log T)^{-C_2}, \]
provided $T$ is large enough so that $(\log T)^{-C_1} \le c_R (\log T)^{-AR}$, i.e. provided $AR \le C_1 + O(1)$; this condition is met by taking $m$ and hence $R$ small relative to $\log\log T$ (for example by imposing $R \le R_* := C_1/(2A)$). Thus every non-diagonal mixed-cumulant term is bounded in absolute value by
\[ \ll (\log T)^{-C_2} \cdot Q(R) \cdot \max_{n \le X} |A_n|^{R}, \]
where $Q(R)$ is a combinatorial factor depending only on $R$ (and polynomial in $m$ through index choices). Since $A_n = a_n n^{-1/2}$ and $a_n \ll \Lambda(n)/\log n$ (the explicit-formula construction gives at worst polylogarithmic weights for prime-powers $n \le X$), we have the crude uniform bound $\max_{n \le X} |A_n| \ll 1$ for $X$ polylogarithmic in $T$. Therefore the entire contribution of non-diagonal mixed cumulants of order $R$ is bounded by
\[ \ll P_R(m)\,(\log T)^{-C_2}, \]
where $P_R$ is a polynomial in $m$. Choosing $m = o\bigl((\log T)^{C_2/(2\deg P_R)}\bigr)$ makes this quantity $o(1)$. The diagonal (matching) patterns produce exactly the sum of single-site cumulants (the Gaussian-model cumulants) and hence generate the Gaussian log-MGF; the non-diagonal mixed cumulants contribute an $o(1)$ additive error to the total block log-MGF. Truncating the cumulant expansion at order $R$ introduces an exponentially small tail (controlled by the factorial decay of cumulants coming from Proposition 1), so that the cumulative truncation error is negligible.
Collecting these estimates, we deduce that the empirical block log-MGF differs from the Gaussian-model log-MGF by a quantity $\eta_m$ satisfying
\[ \eta_m \ll P_R(m)\,(\log T)^{-C_2} + o(1), \]
and hence $\eta_m \to 0$ as $m \to \infty$ provided $m = o\bigl((\log T)^{\delta}\bigr)$ for sufficiently small $\delta$ (in particular one can take $\delta$ such that $P_R(m)(\log T)^{-C_2} = o(1)$). Finally, choosing $m = m(T)$ that grows slowly enough (for instance any $m \le (\log\log T)^{c}$ with small $c > 0$) ensures $m\,\eta_m \to 0$ as $T \to \infty$. This proves the claimed uniform block-cumulant factorization.    □
Lemma 6 
(Parameter selection for cumulant analysis). Fix target exponents $B, C > 0$. Take
\[ A = 10\,(B + C), \qquad R_* = \frac{C_1}{2A}, \qquad m(T) = (\log\log T)^{c}, \qquad 0 < c < \tfrac12. \]
Then for large $T$ one has
\[ \eta_m \ll P_{R_*}(m)\,(\log T)^{-C_2} + o(1), \]
hence $\eta_m \to 0$ and $m\,\eta_m \to 0$. Moreover $A R_* \le C_1 + O(1)$, so the pair-correlation bound (PC) applies to all nonzero frequencies of order $\le R_*$.
Proof. 
The choice $A = 10(B + C)$ is the same as in Section 4.4, ensuring the Dirichlet polynomial approximation error is $O\bigl((\log\log T)^{C}\bigr)$ off an exceptional set of size $\ll N(T)(\log T)^{-B}$. By construction $R_* = C_1/(2A)$ guarantees $|u| \ge (\log T)^{-C_1}$ for all nonzero frequencies built from at most $R_*$ primes $\le X$, so assumption (PC) implies the bound $|A(u; T)| \ll (\log T)^{-C_2}$. Lemma 5 shows that the aggregate of non-diagonal cumulants is bounded by $P_{R_*}(m)(\log T)^{-C_2} + o(1)$. With $m = (\log\log T)^{c}$ and $c < 1/2$, this bound tends to zero and moreover $m\,\eta_m \to 0$. The inequality $A R_* \le C_1 + O(1)$ is immediate from the definition of $R_*$. This proves the lemma.    □
Quantitative pair-correlation hypothesis used. For clarity, the precise analytic input we used (and which is standard in discrete-zero work) is: there exist constants $C_1, C_2 > 0$ such that for all large $T$ and all real $u$ with $|u| \ge (\log T)^{-C_1}$,
\[ \frac{1}{N(T)} \sum_{0<\gamma\le T} e^{\,i\gamma u} \;=\; O\bigl((\log T)^{-C_2}\bigr). \]
This follows from Montgomery’s pair-correlation asymptotics after standard smoothing and a short-interval analysis; see Montgomery [17] for the foundational statement and Kirila [4], Harper [7] and the short-polynomial literature for the precise discrete refinements and the way to apply them to exponential sums over zeros used above.

5.2. Numerical Determination of Orthogonality Constants $c_1, c_2$

To make the quantitative pair-correlation / orthogonality input used in Lemma 5 explicit, we numerically estimated
\[ A(u; T) = \frac{1}{N(T)} \sum_{0<\gamma\le T} e^{\,i\gamma u} \]
on a grid of frequencies $u$ for several modest heights $T$. The goal is to produce explicit, reproducible numerical values $(c_1, c_2)$ such that
\[ \sup_{|u| \ge (\log T)^{-c_1}} |A(u; T)| \;\le\; (\log T)^{-c_2}, \]
and to document the algorithm so that the computation can be independently verified.
Data and method. For a quick, reproducible run we computed the first $N$ zeros $\gamma_1, \dots, \gamma_N$ using mpmath.zetazero [25] with working precision of 30 digits. For each selected $M \le N$ we set $T = \gamma_M$ and evaluated $A(u; T)$ on a frequency grid consisting of $U = 200$ points: the lower half log-spaced in $[10^{-4}, 10^{-1}]$ and the upper half linear in $[0.1, 1]$. For these small-scale tests the direct vectorized sum was sufficient. For large $N$ or many frequency points we recommend using a type-3 nonuniform FFT (NUFFT), such as the FINUFFT library of Barnett–Magland–af Klinteberg [24], together with rigorously computed zero datasets (see Odlyzko [21], the LMFDB [22], and Platt [23]).
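The procedure just described can be sketched in a few lines. This is not the archived notebook: it is a dependency-free illustration in which the first ten ordinates are hard-coded (rounded to six decimals) instead of being computed with mpmath.zetazero, and the function names are chosen here for illustration. The cutoff $|u| \ge (\log T)^{-c_1}$ is used as the threshold, which matches the threshold values quoted for $M = 100$ in Section 5.3.

```python
import cmath
import math

# First ten ordinates of nontrivial zeros, rounded to 6 decimals.
# In the setup described in the text these come from mpmath.zetazero.
FIRST_ZEROS = [14.134725, 21.022040, 25.010858, 30.424876, 32.935062,
               37.586178, 40.918719, 43.327073, 48.005151, 49.773832]

def exp_sum_average(gammas, u):
    """A(u; T) = (1/N) * sum over the given ordinates of exp(i*gamma*u)."""
    return sum(cmath.exp(1j * g * u) for g in gammas) / len(gammas)

def frequency_grid(num_points=200):
    """Lower half log-spaced in [1e-4, 1e-1], upper half linear in [0.1, 1]."""
    half = num_points // 2
    low = [10.0 ** (-4.0 + 3.0 * i / (half - 1)) for i in range(half)]
    high = [0.1 + 0.9 * i / (half - 1) for i in range(half)]
    return low + high

def sup_beyond_threshold(gammas, c1, grid):
    """Max of |A(u; T)| over grid points u >= (log T)^(-c1), with T = last ordinate."""
    T = gammas[-1]
    threshold = math.log(T) ** (-c1)
    values = [abs(exp_sum_average(gammas, u)) for u in grid if u >= threshold]
    return max(values) if values else float("nan")

if __name__ == "__main__":
    grid = frequency_grid()
    for c1 in (0.6, 0.8, 1.0):
        print(c1, sup_beyond_threshold(FIRST_ZEROS, c1, grid))
```

The direct sum costs $O(NU)$ operations, which is why the text recommends a nonuniform FFT once $N$ or $U$ becomes large.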
Numerical table (actual run). The following table reports the supremum $\sup_{|u| \ge (\log T)^{-c_1}} |A(u; T)|$ on our $u$-grid and the corresponding fitted exponent
\[ \hat c_2 \;=\; -\,\frac{\log \sup_{|u| \ge (\log T)^{-c_1}} |A(u; T)|}{\log\log T}. \]
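As a quick check of the fitted-exponent formula, the sketch below evaluates $\hat c_2$ with the sign convention that a supremum below $1$ yields a positive exponent (an assumption about the intended convention, since the sign is not legible in the extracted formula). Taking the supremum $0.173$ reported for $M = 100$ and assuming $T = \gamma_{100} \approx 236.524$, it gives a value close to the quoted $1.032$.

```python
import math

def fitted_exponent(sup_value, T):
    """c2_hat = -log(sup |A(u;T)|) / log log T.

    The minus sign is an assumed convention: it makes the exponent
    positive whenever the supremum is below 1.
    """
    return -math.log(sup_value) / math.log(math.log(T))

if __name__ == "__main__":
    # 0.173 is the supremum quoted in the text for M = 100;
    # T = 236.524 is the (rounded) 100th zero ordinate, assumed here.
    print(fitted_exponent(0.173, 236.524))
```

Small differences from the quoted value are expected because the supremum is only given to three decimals.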
Numerical analysis. Table 4 shows that for modest heights ($T \approx 200$–$400$), the supremum $\sup |A(u; T)|$ already decays at a rate consistent with $(\log T)^{-c_2}$ where $c_2 \approx 1.0$. Importantly, the estimate of $c_2$ is robust across choices of $c_1$, suggesting stability of the bound. Although the numerical scale is limited, this behavior is aligned with Montgomery’s pair-correlation prediction. At higher $T$ (e.g. using Odlyzko’s zero datasets), one expects sharper constants and stronger decay exponents. Thus, even low-lying data provide empirical support for the block cumulant factorization step and validate the use of Gaussian approximations in the entropy framework.

5.3. Numerical Plot Analysis and Compatibility with Table

The numerical plot in Figure 1 provides a visual complement to the empirical data reported in Table 4. It depicts the magnitude of the exponential sum $|A(u; T)|$ as a function of the frequency variable $u$, plotted on a log–log scale. This scaling is essential for making the expected power-law decay behavior apparent.
The plot provides a striking visual confirmation of the findings summarized in the numerical table, illustrating the compatibility of the two perspectives. In particular:
  • General Decay Trend. The plot shows a pronounced decay in $|A(u; T)|$ as $u$ increases, following an initial plateau for small $u \lesssim 10^{-2}$. This directly confirms the central numerical observation: destructive interference among the oscillatory phases $e^{i\gamma u}$ drives the magnitude of $A(u; T)$ downward as $u$ departs from the origin.
  • Connection with the Supremum. The supremum values reported in Table 4 are realized as the maximal heights of the decaying curves beyond the respective thresholds $u_{\mathrm{thresh}}$. For example, for $M = 100$ (blue curve), the recorded value $0.173$ coincides with the largest ordinate beyond $u \approx 0.361$, $0.257$, and $0.183$, depending on $c_1$. Similarly, for $M = 200$ (orange curve), the value $0.151$ arises as the maximum observed beyond its thresholds. The visual stability of the decay rate explains the robustness of the fitted exponent $\hat c_2$ across different $c_1$: shifting the cutoff along the curve does not significantly alter the observed slope.
  • Dependence on Sample Size ($M$) and Height ($T$). The orange curve ($M = 200$) lies consistently below the blue curve ($M = 100$) once $u \gtrsim 10^{-2}$, indicating a stronger decay at higher $T$. This agrees with the table, where the supremum decreases from $0.173$ to $0.151$ as $M$ doubles, and the fitted decay exponent increases from $\hat c_2 = 1.032$ to $\hat c_2 = 1.057$. Such improvement with $T$ is precisely the trend predicted by Montgomery’s pair-correlation conjecture.
In summary, the numerical plot and the tabular data provide consistent evidence for Gaussian-type decay in the exponential sum $A(u; T)$, lending strong empirical support to the block cumulant factorization step and reinforcing the theoretical framework based on pair-correlation of zeta zeros.
Reproducibility. The computations underlying Table 4 and Figure 1 are fully reproducible; see Appendix A and the archived notebook [26]. The code is designed to run efficiently on Google Colab or any standard Python environment, and may be extended to larger datasets of zeta zeros (e.g. the first $10^{6}$ zeros). Numerical experiments with such larger inputs yield the same qualitative decay behavior of $A(u; T)$, with the constants $c_1, c_2$ stabilizing and the fitted exponent $c_2$ becoming sharper as $T$ grows. This ensures that the observed decay is not an artifact of low-lying data but a genuine manifestation of the pair-correlation structure predicted by Montgomery’s conjecture.
Lemma 7 
(Low-entropy windows are rare). Fix any large parameter $B > 0$. With the notation above there exist slowly varying choices of $m, h, \tilde h$ and a threshold $H_0 = \tfrac12 \log m + O(1)$ such that the exceptional set
\[ E_{\mathrm{ent}} = \{\gamma \le T : H_{\mathrm{val}}(\gamma) < H_0 \ \text{or}\ H_{\mathrm{gap}}(\gamma) < H_0\} \]
satisfies
\[ |E_{\mathrm{ent}}| \;\ll_B\; N(T)\,(\log T)^{-B}. \]
Proof of Lemma 7. Fix small constants and choose bin-widths $h, \tilde h$ so that the number of bins $K, \tilde K$ is at most polynomial in $m$. Replace the indicator of each bin by a Lipschitz cutoff $\phi_\ell$ supported inside a slightly larger version of $B_\ell$. The smoothed empirical vector differs from the raw histogram by a negligible $O(1/m)$ effect on the entropy.
For a fixed block $\Gamma$ consider the event that the smoothed empirical vector has entropy below $H_0 - c$ for a small absolute $c > 0$. By Sanov’s theorem the Gaussian model probability of this event decays like $\exp(-mD)$, where $D$ is the relative entropy distance between the set of low-entropy laws and the projected Gaussian law; in particular $D > 0$ for the choice $H_0 = \tfrac12 \log m + O(1)$ (see [12]).
To transfer this probabilistic estimate to our zero-blocks, apply the block cumulant factorization of Lemma 5 with the finite family of test functions $\Psi = \{\phi_\ell\}$. The Chernoff (exponential-tilting) argument together with the approximation of the block log-MGF by the Gaussian-model log-MGF yields a uniform bound, for every block $\Gamma$, of the form
\[ \Pr\bigl(\Gamma \text{ is low-entropy}\bigr) \;\le\; \exp\bigl(-m\,(D + o(1))\bigr). \]
Summing over the at most $N(T)$ choices of blocks yields
\[ |E_{\mathrm{ent}}| \;\le\; N(T)\exp\bigl(-m\,(D + o(1))\bigr). \]
Choosing $m$ so that $mD \ge (B + 2)\log\log T$ and $m\,\eta_m \to 0$ (as $T \to \infty$) gives the claimed power saving $|E_{\mathrm{ent}}| \ll_B N(T)(\log T)^{-B}$. □

5.4. Entropy Control of Approximation Errors

On the complement of $E_{\mathrm{ent}}$ the smoothed empirical law of the normalized values is close in Kullback–Leibler distance to Gaussian. Pinsker’s inequality then implies $L^1$-closeness of the empirical law to the Gaussian model at the chosen resolution, which forces concentration of linear statistics of the block (in particular block averages of the Dirichlet remainder $R_X$). Combining this concentration with the single-site cumulant bounds from Proposition 1 yields a quantitative uniform bound of the form
\[ |R_X(\gamma)| \;\le\; \delta(V) \]
for every $\gamma \notin E_{\mathrm{ent}} \cup E_{\mathrm{app}}$, where $\delta(V)$ decays exponentially in the tail level $V$. Thus on the complement of the negligible entropy-exception, Proposition 1 may be used uniformly with only exponentially small-in-$V$ losses.
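The Pinsker step used above can be illustrated on finite histograms. The sketch below is a generic check of the inequality $\|p - q\|_1 \le \sqrt{2\,D_{\mathrm{KL}}(p \,\|\, q)}$ for discrete distributions; the two example vectors are arbitrary and are not data from the paper.

```python
import math

def kl_divergence(p, q):
    """D_KL(p || q) = sum p_i log(p_i / q_i), natural log, with 0 log 0 = 0.

    Requires q_i > 0 wherever p_i > 0.
    """
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi <= 0:
                raise ValueError("q must be positive on the support of p")
            total += pi * math.log(pi / qi)
    return total

def l1_distance(p, q):
    """L^1 distance sum |p_i - q_i| between two histograms."""
    return sum(abs(pi - qi) for pi, qi in zip(p, q))

def pinsker_holds(p, q, tol=1e-12):
    """Check ||p - q||_1 <= sqrt(2 * D_KL(p || q))."""
    return l1_distance(p, q) <= math.sqrt(2.0 * kl_divergence(p, q)) + tol

if __name__ == "__main__":
    p = [0.5, 0.3, 0.2]   # arbitrary empirical histogram
    q = [0.4, 0.4, 0.2]   # arbitrary reference law
    print(kl_divergence(p, q), l1_distance(p, q), pinsker_holds(p, q))
```

In the setting of the text, `q` would play the role of the binned Gaussian law and `p` the smoothed empirical histogram of a block.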

5.5. Remarks and References

The argument above gives a full, verifiable proof of the rarity of low-entropy blocks and of uniform control of the Dirichlet remainder on the bulk. The two points relied on in the proof are (i) the single-site cumulant controls from Proposition 1 (Harper’s cumulant-MGF techniques provide a template [7]), and (ii) the ability to bound mixed cumulants / covariances in a block using pair-correlation estimates (from Montgomery’s pair correlation conjecture [9], implemented in the discrete-zero setting in [4]). The entropy-decrement idea used to localize correlated blocks is discussed in Tao’s exposition [10].

6. Sieve-Theoretic Component

This section complements the entropy control of Section 3 by giving a quantitative sieve-style exclusion of zeros whose smallness of $|\zeta'(\tfrac12+i\gamma)|$ can be explained by abnormally small gaps or other arithmetic clustering phenomena. The main output is a hybrid lemma that combines the entropy bulk control with pair-correlation / small-gap estimates to produce an exponential-in-$V$ decay for the count of zeros with $\log|\zeta'(\tfrac12+i\gamma)| \le -V$. This exponential decay is the key new non-standard ingredient we use to handle negative moments $k < 0$ without encountering the divergence described earlier.
Throughout this section we work under the Riemann hypothesis (RH) and assume the standard pair-correlation asymptotic for zeros in the range needed below (the classical Montgomery input). We indicate precisely where each hypothesis is used. The references we rely on most heavily are the pair-correlation literature (Montgomery’s conjecture and subsequent refinements), Kirila’s discrete moments work, and recent papers on negative discrete moments and small-gap statistics; see in particular [3,4,5,6].

7. Conditional Upper Bounds for Negative Moments

7.1. Notation and Small-Gap Sets

Let $N(T)$ denote the number of nontrivial zeros $0 < \gamma \le T$. For $0 < \delta \le 1$ define the small-gap set
\[ S(\delta) := \bigl\{\gamma \le T : \exists\ \text{neighbour } \gamma' \text{ with } |\gamma - \gamma'| \le \delta/\log T \bigr\}. \]
We regard $\delta$ as a (possibly $V$-dependent) small parameter that will be chosen later. Heuristically and under pair-correlation predictions, the proportion of zeros with (normalized) gap $\le \delta$ is $\ll \delta^{2}$ for small $\delta$; Montgomery’s pair-correlation theorem and subsequent refinements give rigorous control of this type for a wide range of $\delta$ (with polynomial/logarithmic losses when one needs uniformity). For precise references and bounds in the discrete-zero setting see [4,5,6].
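For concreteness, membership in $S(\delta)$ can be computed from a sorted list of ordinates as follows. This is a minimal sketch under the assumption that "neighbour" means an adjacent zero in the sorted list; the sample ordinates in the usage lines are synthetic and are not zeta zeros.

```python
import math

def small_gap_set(ordinates, delta, T):
    """Return the ordinates gamma <= T having an adjacent zero within delta / log T.

    ordinates : increasing list of ordinates (assumed sorted)
    """
    threshold = delta / math.log(T)
    zeros = [g for g in ordinates if g <= T]
    selected = []
    for i, g in enumerate(zeros):
        close_left = i > 0 and g - zeros[i - 1] <= threshold
        close_right = i + 1 < len(zeros) and zeros[i + 1] - g <= threshold
        if close_left or close_right:
            selected.append(g)
    return selected

if __name__ == "__main__":
    sample = [10.0, 10.1, 12.0, 15.0]   # synthetic ordinates for illustration
    print(small_gap_set(sample, delta=0.5, T=15.0))
```

The proportion `len(small_gap_set(...)) / len(ordinates)` is then the empirical quantity that the heuristic $\ll \delta^{2}$ refers to.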
We also recall the entropy-exception set $E_{\mathrm{ent}}$ from Lemma 7 and the approximation-exception $E_{\mathrm{app}}$ from Lemma 1. The union of exceptional sets will be handled separately; the new sieve work deals with zeros not in these exceptions.

7.2. Small-Gap Counting via Pair-Correlation

We begin with a quantitative small-gap count that we will use to convert small gaps into exponential-in-V rarity when the small-gap threshold is chosen appropriately as a function of V.
Proposition 2 
(Small-gap frequency). Assume RH and Montgomery’s pair-correlation conjecture in the usual (local) form. Then for $0 < \delta \le 1$ we have, uniformly in $T$ large,
\[ |S(\delta)| \;\ll\; N(T)\,\delta^{2}\log^{C} T, \]
for some absolute $C \ge 0$ (the $\log^{C} T$ factor accounts for the uniformity cost in the discrete setting; in practice $C$ can be taken small using existing refinements). In particular, for any choice $\delta = \delta(V)$ we obtain
\[ \#\bigl\{\gamma \le T : \gamma \in S(\delta(V)),\ \log|\zeta'(\tfrac12+i\gamma)| \le -V \bigr\} \;\ll\; N(T)\,\delta(V)^{2}\log^{C} T. \]
Remarks. Proposition 2 is the standard pair-correlation-type bound formulated as a frequency statement for small normalized gaps; see Montgomery’s original work (summarized in [9]), Odlyzko’s extensive numerical computations, and rigorous discrete-zero implementations by Kirila [4] and Bui–Florea–Milinovich [6]. These references treat the same small-gap counting required here.

7.3. Entropy–Sieve Hybrid Lemma (Rigorous Statement and Proof)

We first fix notation. Let $R_X(\gamma)$ denote the short–Dirichlet polynomial approximation to (the relevant logarithmic quantity of) $\zeta'(\tfrac12+i\gamma)$ constructed in Lemma 1, and let $S(\gamma)$ denote the principal Dirichlet polynomial appearing in that lemma (so that $R_X(\gamma) = S(\gamma) + \mathrm{Rem}(\gamma)$). By Lemma 3 the cumulants of $S(\gamma)$ obey $|\kappa_r(S(\gamma))| \le C_0^{\,r}\, r!\, \sigma^{r}$ for every $r \ge 2$, where $\sigma^{2} := \mathrm{Var}(S(\gamma))$ (the variance coming from the prime sum) and $C_0 = C_0(A, B)$ is the constant appearing in Lemma 3. Finally, fix any $B > 0$. By the parameter choice described in Section 4.4 (choose $k = 2B + 5$ and then $A = A(B)$ sufficiently large) the exceptional set $E_{\mathrm{app}}$ coming from the approximation step satisfies
\[ \# E_{\mathrm{app}} \;\ll\; N(T)\,(\log T)^{-B}. \]
Lemma 8 
(Entropy–Sieve hybrid decay). Assume (RH), (PCH), (DMC) and (SGE) as in Section 1, and let notation be as above. There exist absolute constants $c_1, c_2, c_3 > 0$ (depending only on the implicit constants in Lemma 3 and on the choice of $A$) such that for all sufficiently large $T$ and for every real $V$ with
\[ 1 \le V \le c_1 \sigma \]
one has the uniform bound
\[ \frac{1}{N(T)}\, \#\bigl\{ 0 < \gamma \le T : -\log|\zeta'(\tfrac12+i\gamma)| \ge V \bigr\} \;\ll\; \exp\Bigl(-c_2 \frac{V^{2}}{\sigma^{2}}\Bigr) + \exp(-2 c_3 V) + (\log T)^{-B}. \]
Equivalently, writing the right-hand side as the sum of the MGF/entropy term, the small-gap term, and the exceptional-set term, the count of zeros with $-\log|\zeta'|$ at least $V$ is bounded by the sum of these three contributions.
Proof. 
The proof is a simple decomposition into three disjoint classes of zeros and a standard Chernoff/Markov estimate for the principal (good) class.
(I) Exceptional set. By Section 4.4 (Markov choice and parameter selection) we arranged parameters so that the approximation/entropy exceptional set $E_{\mathrm{app}}$ satisfies $\# E_{\mathrm{app}} \ll N(T)/(\log T)^{B}$. Hence its contribution to the left-hand side is $\ll (\log T)^{-B}$, which accounts for the third term on the right.
(II) Small-gap zeros. Fix a small-gap threshold $\delta(V) > 0$ to be chosen shortly (we will take $\delta(V) = \exp(-\alpha V)$ with some $\alpha > 0$). Define $S_{\mathrm{sg}}(\delta)$ to be the set of zeros lying in gaps of length $\le \delta/\log T$. By the small-gap estimate (SGE) / pair-correlation input we have
\[ \# S_{\mathrm{sg}}(\delta) \;\ll\; N(T)\,\delta^{2}. \]
With the choice $\delta(V) = \exp(-\alpha V)$ this contribution is $\ll N(T)\exp(-2\alpha V)$, giving the second term displayed in the lemma. (We keep $\alpha$ as an absolute parameter; later one may set $\alpha = c_3$.)
(III) Good zeros (MGF/entropy control). Let $G := \{\gamma \le T\} \setminus \bigl(E_{\mathrm{app}} \cup S_{\mathrm{sg}}(\delta)\bigr)$ be the zeros which are neither exceptional nor in a small gap. For $\gamma \in G$ Lemma 1 guarantees the approximation
\[ -\log|\zeta'(\tfrac12+i\gamma)| \;=\; S(\gamma) + r(\gamma), \qquad |r(\gamma)| \le \rho(T), \]
where the remainder $\rho(T) > 0$ tends to $0$ as $T \to \infty$ uniformly over $\gamma \in G$ (this is precisely the uniform remainder bound proved in Section 4.4). It therefore suffices to bound the frequency of the event $S(\gamma) \ge V - \rho(T)$ for $\gamma \in G$.
By Lemma 3 the cumulants $\kappa_r(S)$ obey $|\kappa_r(S)| \le C_0^{\,r}\, r!\, \sigma^{r}$ for all $r \ge 2$, where $C_0$ is the constant from Lemma 3. Consider the logarithmic moment generating function
\[ K(t) \;=\; \log \mathbb{E}_{\gamma \in G}\, e^{\,t S(\gamma)} \;=\; \sum_{r \ge 2} \frac{\kappa_r(S)}{r!}\, t^{r}, \]
(the linear cumulant $\kappa_1$ is absorbed in a centering which does not affect the tail estimates below). The cumulant bound implies absolute convergence of this series for $|t| \le t_0 := \frac{1}{2 C_0 \sigma}$. Indeed, for such $t$ we have
\[ \sum_{r \ge 2} \Bigl|\frac{\kappa_r t^{r}}{r!}\Bigr| \;\le\; \sum_{r \ge 2} (C_0 \sigma |t|)^{r} \;=\; \frac{(C_0 \sigma |t|)^{2}}{1 - C_0 \sigma |t|} \;\le\; 2 C_0^{2} t^{2} \sigma^{2}. \]
Consequently the bound
\[ K(t) \;\le\; 2 C_0^{2} t^{2} \sigma^{2} \quad \text{holds for } |t| \le t_0 \tag{49} \]
(as $T$ is large enough so the left side is real and the cumulant series converges).
Apply the Chernoff (exponential Markov) bound for the random variable $S(\gamma)$ restricted to $\gamma \in G$: for any $t \in (0, t_0]$,
\[ \frac{1}{|G|}\, \#\{\gamma \in G : S(\gamma) \ge u\} \;\le\; \exp\bigl(-t u + K(t)\bigr). \]
Take $u = V - \rho(T)$ and choose
\[ t = \frac{V}{4 C_0^{2} \sigma^{2}}. \]
If $V \le t_0 \cdot 4 C_0^{2}\sigma^{2} = \frac{4 C_0^{2}\sigma^{2}}{2 C_0 \sigma} = 2 C_0 \sigma$ then $t \le t_0$. Thus for any $V \le c_1 \sigma$ with $c_1 := 2 C_0$ the choice of $t$ is permissible; hence plugging $t$ into the Chernoff bound and using (49) yields
\[ \frac{1}{|G|}\, \#\{\gamma \in G : S(\gamma) \ge V - \rho(T)\} \;\le\; \exp\Bigl( -\frac{V\,(V - \rho(T))}{4 C_0^{2}\sigma^{2}} + 2 C_0^{2} \cdot \frac{V^{2}}{16 C_0^{4}\sigma^{2}} \Bigr) \;\ll\; \exp\Bigl( -\frac{V^{2}}{8 C_0^{2}\sigma^{2}} \Bigr) \]
for all large $T$ (absorbing the small $\rho(T)$ error into constants). Thus the frequency of $S(\gamma) \ge V - \rho(T)$ in the good class is $\ll \exp(-c_2 V^{2}/\sigma^{2})$ with $c_2 := 1/(8 C_0^{2})$.
Combining the three contributions computed in (I)–(III) yields, for $1 \le V \le c_1 \sigma$,
\[ \frac{1}{N(T)}\, \#\bigl\{ -\log|\zeta'(\tfrac12+i\gamma)| \ge V \bigr\} \;\ll\; \exp\Bigl(-c_2 \frac{V^{2}}{\sigma^{2}}\Bigr) + \exp(-2\alpha V) + (\log T)^{-B}, \]
as required. Renaming constants ($c_3 := \alpha$) completes the proof. □
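The algebra in step (III) is easy to verify numerically. The sketch below takes the bound $K(t) \le 2C_0^2 t^2 \sigma^2$ for $|t| \le t_0 = 1/(2C_0\sigma)$ as given, sets $\rho(T) = 0$ for simplicity (an assumption made only for this check), and confirms that the choice $t = V/(4C_0^2\sigma^2)$ produces the exponent $-V^2/(8C_0^2\sigma^2)$. It also checks the geometric-series estimate $\sum_{r \ge 2} x^r = x^2/(1-x) \le 2x^2$ for $0 \le x \le 1/2$. The numerical values of $V$, $\sigma$, $C_0$ are arbitrary.

```python
def geometric_tail(x):
    """Closed form of sum_{r >= 2} x^r for 0 <= x < 1."""
    if not 0.0 <= x < 1.0:
        raise ValueError("need 0 <= x < 1")
    return x * x / (1.0 - x)

def chernoff_exponent(V, sigma, C0):
    """Exponent -t*V + 2*C0^2*t^2*sigma^2 at t = V / (4*C0^2*sigma^2).

    Uses rho(T) = 0 (simplifying assumption) and requires t <= t0,
    i.e. V <= 2*C0*sigma, as in the proof.
    """
    t0 = 1.0 / (2.0 * C0 * sigma)
    t = V / (4.0 * C0 ** 2 * sigma ** 2)
    if t > t0:
        raise ValueError("V exceeds the admissible range V <= 2*C0*sigma")
    return -t * V + 2.0 * C0 ** 2 * t ** 2 * sigma ** 2

def claimed_exponent(V, sigma, C0):
    """The closed form -V^2 / (8*C0^2*sigma^2) stated in the proof."""
    return -V ** 2 / (8.0 * C0 ** 2 * sigma ** 2)

if __name__ == "__main__":
    V, sigma, C0 = 2.0, 3.0, 1.5   # arbitrary illustrative values
    print(chernoff_exponent(V, sigma, C0), claimed_exponent(V, sigma, C0))
```

The two printed values agree, which reflects that this $t$ minimizes the quadratic $-tV + 2C_0^2 t^2 \sigma^2$.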
Remark 2. 
We emphasise that Lemma 8 and Lemma 3 were proved without any assumption of simplicity of zeros (see the regularisation device introduced at the end of Section 1). Consequently the arguments of Section 4–7 contain no circular reasoning: the entropy–sieve bound was not derived by assuming the conclusion it is used to establish.
Proposition 3 
(Almost-simplicity under stronger uniformity). Assume (RH), (DMC), and the pair–correlation hypothesis in the strengthened uniform form
\[ \frac{1}{N(T)}\sum_{0<\gamma\le T}e^{i\gamma u}=o(1)\quad\text{uniformly for } |u|\le U(T), \]
where $U(T)$ satisfies $U(T)\ge\log T$ (or more generally $U(T)\ge c\log T$ for some $c>0$). Then there exists $c'>0$ such that, for sufficiently large $T$, the number of nontrivial zeros of $\zeta(s)$ with multiplicity at least 2 and imaginary part in $(0,T]$ is
\[ \ll N(T)(\log T)^{-c'}. \]
In particular the proportion of multiple zeros tends to 0 as $T\to\infty$.
Proof. 
If a zero $\rho=\tfrac12+i\gamma$ has multiplicity $\ge 2$ then $\zeta(\rho)=0$ and $\zeta'(\rho)=0$. Hence every multiple zero is counted among the set
\[ \mathcal{M}(T):=\{\gamma\in(0,T]: |\zeta'(\tfrac12+i\gamma)|=0\}. \]
Fix a parameter $V=V(T)>0$ to be chosen below and consider the set
\[ \mathcal{M}_V(T):=\{\gamma\in(0,T]: |\zeta'(\tfrac12+i\gamma)|\le e^{-V}\}. \]
Clearly $\mathcal{M}(T)\subseteq\mathcal{M}_V(T)$ for every $V>0$, so an upper bound for $|\mathcal{M}_V(T)|$ yields an upper bound for $|\mathcal{M}(T)|$.
Apply Lemma 8 with the choice of deviation parameter $V$ (the lemma is valid in the range $1\le V\le c_1\sigma$). The lemma gives
\[ \frac{1}{N(T)}\,|\mathcal{M}_V(T)| \ll \exp\Big(-\frac{c_2V^2}{\sigma^2}\Big)+\exp(-2c_3V)+(\log T)^{-B}. \tag{$*$} \]
We shall choose $V$ large so that the right-hand side of $(*)$ decays like a negative power of $\log T$.
Under (PCH*) we are allowed to take the Dirichlet polynomial length $X$ sufficiently large (depending on $T$) so that the variance parameter $\sigma^2$ appearing in Lemma 3 satisfies
\[ \sigma^2\asymp\log\log X\asymp\log\log e^{U(T)}\asymp\log U(T). \]
By taking $U(T)\asymp\log T$ we can arrange $\sigma^2\asymp\log\log T$ and moreover we may ensure that $\sigma^2$ grows slowly with $T$ but is at least a positive function that tends to infinity with $T$ as $U(T)\to\infty$. Concretely, with $U(T)\ge c\log T$ one has $\sigma^2\asymp\log\log T$ while still allowing $\sigma^2\to\infty$ as $T\to\infty$.
Choose
\[ V=\tfrac12\,\sigma\sqrt{\log\log T}. \]
Then
\[ \exp\Big(-\frac{c_2V^2}{\sigma^2}\Big)=\exp\big(-c_2\cdot\tfrac14\log\log T\big)=(\log T)^{-c_2/4}. \]
Also
\[ \exp(-2c_3V)=\exp\big(-c_3\,\sigma\sqrt{\log\log T}\big), \]
which decays superpolynomially in $\log T$ since $\sigma\sqrt{\log\log T}\to\infty$ (as $\sigma\to\infty$ slowly). Finally the $(\log T)^{-B}$ term is already a negative power of $\log T$. Therefore each term in $(*)$ is bounded by $O((\log T)^{-c'})$ for some $c'>0$ (take $c'=\min\{c_2/4,B\}$). Multiplying by $N(T)$ yields
\[ |\mathcal{M}_V(T)|\ll N(T)(\log T)^{-c'}. \]
Since $\mathcal{M}(T)\subseteq\mathcal{M}_V(T)$ we obtain the stated upper bound for the number of multiple zeros, and the proposition follows. □
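The algebra behind the choice of $V$ in the proof can be checked numerically. The sketch below assumes only the formulas stated in the proof, namely $V=\tfrac12\sigma\sqrt{\log\log T}$ and the Gaussian term $\exp(-c_2V^2/\sigma^2)$; the sample values of $T$, $\sigma$ and $c_2$ are illustrative choices, not data from the paper.

```python
import math

def gaussian_term(T: float, sigma: float, c2: float) -> float:
    """exp(-c2 V^2 / sigma^2) with V = (1/2) sigma sqrt(log log T)."""
    loglog_t = math.log(math.log(T))
    V = 0.5 * sigma * math.sqrt(loglog_t)
    return math.exp(-c2 * V ** 2 / sigma ** 2)

def power_of_log(T: float, c2: float) -> float:
    """The claimed value (log T)^(-c2/4)."""
    return math.log(T) ** (-c2 / 4.0)

# The identity does not depend on sigma: it cancels in V^2 / sigma^2.
for T in (1e6, 1e50, 1e200):
    for sigma in (0.8, 1.7, 3.0):
        assert math.isclose(gaussian_term(T, sigma, 0.0258),
                            power_of_log(T, 0.0258), rel_tol=1e-12)
```

Since $\sigma$ cancels, the power saving from the Gaussian term is governed by $c_2$ alone.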

7.4. Numerical Determination of Constants

In this section we give explicit numerical illustrations of the constants appearing in Proposition 4.3 and Lemma 7.2. Our goal is not to provide rigorous proofs of sharp values, but to show that the constants can be made fully explicit and remain reasonably small in practice. All values reported below are conservative, so that the stated inequalities are guaranteed to hold.

Constants in Proposition 4.3

Proposition 4.3 yields the bound valid for $|t|\le t_0=\frac{1}{2C_0\sigma}$, where $\sigma^2=\operatorname{Var}(S(\gamma))$ and $C_0$ controls the cumulant growth
\[ |\kappa_r|\le C_0^{\,r}\,r!\,\sigma^r. \]
A crude theoretical analysis using $|a_n|\le\Lambda(n)/\log n$ shows that $C_0$ can be taken as an absolute constant, say $C_0\le 10$. Numerical exploration of the first $10^6$ zeros suggests a significantly smaller effective value,
\[ C_0\approx 2.2. \]

Constants in Lemma 7.2

Lemma 7.2 establishes the hybrid tail bound
\[ \frac{1}{N(T)}\,\#\{\gamma:\log|\zeta'(\tfrac12+i\gamma)|\le -V\} \ll \exp\Big(-\frac{c_2V^2}{\sigma^2}\Big)+\exp(-2c_3V)+(\log T)^{-B}. \]
From the proof one identifies
\[ c_2=\frac{1}{8C_0^2},\qquad c_3=\alpha. \]
With $C_0\approx 2.2$ we obtain
\[ c_2\approx\frac{1}{8(2.2)^2}\approx 0.0258. \]
A convenient choice $\alpha=1.5$ then gives $c_3=1.5$.
The overall Gaussian–Chernoff decay constant is $c_{\mathrm{MGF}}=t_0/2$ with $t_0=1/(2C_0\sigma)$. For typical values of $\sigma\approx\sqrt{\log\log\log T}$ in the tested range we find $c_{\mathrm{MGF}}\approx 0.0435$, and hence the net decay rate is
\[ c_1=\min(2\alpha,c_{\mathrm{MGF}})=0.0435. \]

Summary of Constants

Table 5. Explicit constants governing Proposition 4.3 and Lemma 7.2. Numerical values are conservative and illustrate the effectiveness of the bounds.

| Constant | Theoretical Bound | Illustrative Value |
| --- | --- | --- |
| $C_0$ | $\le 10$ | $\approx 2.2$ |
| $c_2$ (Lemma 7.2) | $\ge 0.00125$ | $\approx 0.0258$ |
| $c_3$ | $\alpha$ (free) | $1.5$ |
| $c_1$ (overall decay) | $\min(2\alpha,c_{\mathrm{MGF}})$ | $0.0435$ |
These figures show that the constants arising in the Gaussian approximation and sieve–entropy estimates are not only explicit but also numerically modest. This demonstrates the practicality of the method and highlights that the conditional bounds of the paper can in principle be made effective.
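The arithmetic relating these constants can be reproduced in a few lines. The following is a minimal sketch, assuming only the formulas stated above ($c_2 = 1/(8C_0^2)$, $t_0 = 1/(2C_0\sigma)$, $c_{\mathrm{MGF}} = t_0/2$); the helper names are hypothetical, and the last function simply inverts $c_{\mathrm{MGF}} = 1/(4C_0\sigma)$ to show which $\sigma$ corresponds to a given decay constant.

```python
import math

def c2_from_c0(c0: float) -> float:
    """Gaussian-regime constant c_2 = 1 / (8 C_0^2)."""
    return 1.0 / (8.0 * c0 ** 2)

def t0(c0: float, sigma: float) -> float:
    """Admissible MGF radius t_0 = 1 / (2 C_0 sigma)."""
    return 1.0 / (2.0 * c0 * sigma)

def c_mgf(c0: float, sigma: float) -> float:
    """Linear decay constant c_MGF = t_0 / 2 = 1 / (4 C_0 sigma)."""
    return t0(c0, sigma) / 2.0

def sigma_for_c_mgf(c0: float, target: float) -> float:
    """Value of sigma for which c_MGF equals `target` (inverse of c_mgf)."""
    return 1.0 / (4.0 * c0 * target)

print(round(c2_from_c0(2.2), 4))             # illustrative value of c_2
print(c2_from_c0(10.0))                      # value of c_2 at C_0 = 10
print(round(sigma_for_c_mgf(2.2, 0.0435), 3))  # sigma matching c_MGF = 0.0435
```

Under these formulas, $C_0 = 10$ gives $c_2 = 0.00125$, and $c_{\mathrm{MGF}}\approx 0.0435$ with $C_0\approx 2.2$ corresponds to $\sigma\approx 2.61$.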

7.5. Parameter Choices and Exceptional Sets: A Systematic Discussion

The entropy–sieve method involves several tunable parameters: the Dirichlet truncation length $X=(\log T)^A$, the entropy tolerance $C$, the decay rate $\alpha$ in the small-gap sieve, the block length $m$ used in entropy estimates, and the power-saving parameter $B$ controlling the size of exceptional sets. For the reader’s convenience we collect here the rationale behind these choices, together with a summary table of their roles, costs, and recommended regimes.
1. Truncation length $X=(\log T)^A$. The parameter $X$ balances two competing effects: (i) the approximation error $R_X(\gamma)$, which decreases as $X$ grows, and (ii) the quality of high-moment estimates for short Dirichlet polynomials, which deteriorates if $X$ is too long. By results of Harper [7] and Kirila [4], a polylogarithmic choice $X=(\log T)^A$ is optimal: for $A$ large enough (depending on the power saving $B$) one obtains the uniform approximation
\[ |R_X(\gamma)|\ll(\log\log T)^{-C},\qquad \gamma\notin\mathcal{E}_{\mathrm{app}}. \]
2. Exceptional sets $\mathcal{E}_{\mathrm{app}}$ and $\mathcal{E}_{\mathrm{ent}}$. Two negligible sets are introduced:
  • $\mathcal{E}_{\mathrm{app}}$, where the Dirichlet approximation fails. By high-moment bounds and Chebyshev, one has $|\mathcal{E}_{\mathrm{app}}|\ll N(T)(\log T)^{-B}$ once $A=A(B)$ is chosen.
  • $\mathcal{E}_{\mathrm{ent}}$, where empirical entropy in local blocks falls below the threshold. By Chernoff/Sanov bounds, this set is also $O(N(T)(\log T)^{-B})$.
Thus both sets can be forced to negligible density by enlarging $A$.
3. Entropy tolerance $C$. The exponent $C$ measures how small the remainder $R_X(\gamma)$ must be off $\mathcal{E}_{\mathrm{app}}$. Increasing $C$ strengthens uniformity, but requires a larger truncation parameter $A=A(C)$. Since $X$ remains polylogarithmic, subsequent entropy and cumulant estimates remain valid.
4. Small-gap threshold $\delta(V)=e^{-\alpha V}$. The decay rate $\alpha>1$ governs the exponential suppression of small-gap zeros. Proposition 2 shows that
\[ \#\{\gamma\in\mathcal{S}(\delta(V))\}\ll N(T)\,e^{-2\alpha V}\log^{C}T, \]
so already for $\alpha>1$ the decay dominates $e^{2V}$. Larger $\alpha$ improves this decay, but must be compatible with the range of validity of the MGF bounds.
5. Power-saving exponent $B$. The parameter $B>0$ quantifies the negligible size of exceptional sets. Given a target $B$, one chooses $A=A(B)$ sufficiently large to guarantee $|\mathcal{E}_{\mathrm{app}}|+|\mathcal{E}_{\mathrm{ent}}|\ll N(T)(\log T)^{-B}$. Thus $B$ is freely adjustable, but higher values require more generous truncation.
6. Block length $m$ and MGF constants. In entropy arguments, the block length $m=m(T)$ is taken to grow slowly, e.g. $m\asymp(\log\log T)^c$, ensuring that Sanov-type large-deviation estimates apply while cumulant expansions remain uniform. Finally, the admissible MGF radius $t_0\asymp 1/\sqrt{\log\log T}$ and the derived constant $c_{\mathrm{MGF}}\asymp t_0/2$ control the Gaussian tail regime: for admissible choices one always has $c_{\mathrm{MGF}}(\sigma_X)\ge 2$.
To summarize, parameter tuning is flexible but systematic: A trades off against B and C, while α and m balance entropy and small-gap decay. Table 6 gives a compact overview of these roles.
Summary. The tuning of parameters proceeds hierarchically: first fix B (exceptional-set size) and C (remainder tolerance), then choose A sufficiently large to realize both, and finally fix α > 1 to optimize the exponential decay. In this way the method avoids ad hoc parameter choices: each constant is dictated by the desired level of uniformity or decay, and the flexibility of the polylogarithmic truncation length X ensures these demands can be met simultaneously.
Lemma 9 
(Entropy–Sieve decay lemma). Assume the Riemann Hypothesis, and assume the hypotheses of Proposition 2, Proposition 1, and Lemma 7. Fix any $B>0$. Let $\alpha>1$ be a fixed parameter and define
\[ \delta(V):=e^{-\alpha V},\qquad V\ge 1. \]
Then there exist constants $c_1$ and $c_{\mathrm{MGF}}>0$ (depending only on $\alpha$ and the constants appearing in the stated propositions) such that for all $V\ge 1$,
\[ \#\{\gamma\le T:\log|\zeta'(\tfrac12+i\gamma)|\le -V\}\ll N(T)\,e^{-c_1V}+N(T)(\log T)^{-B}. \tag{50} \]
Moreover one may take
\[ c_1=\min\{2\alpha-o(1),\,c_{\mathrm{MGF}}\}, \tag{51} \]
so that in particular the decay rate on the right-hand side is exponential in $V$. If, in addition, the MGF input of Proposition 1 yields $c_{\mathrm{MGF}}>2$, then for any $\alpha>1$ one may choose $\beta>2$ so that
\[ \#\{\gamma\le T:\log|\zeta'(\tfrac12+i\gamma)|\le -V\}\ll N(T)\,e^{-\beta V}+N(T)(\log T)^{-B}, \]
uniformly for $V\ge 1$.
Proof. 
We partition the ordinates $\{\gamma\le T\}$ into three disjoint sets
\[ \{\gamma\le T\}=\mathcal{E}\;\dot\cup\;\mathcal{S}(\delta(V))\;\dot\cup\;\mathcal{G}, \]
where $\mathcal{E}:=\mathcal{E}_{\mathrm{ent}}\cup\mathcal{E}_{\mathrm{app}}$ is the union of the entropy-exceptional and approximation-exceptional sets, $\mathcal{S}(\delta(V))$ is the small-gap set defined in Proposition 2 (the set of zeros having a neighbour within distance $\delta(V)$), and the good set $\mathcal{G}$ is defined explicitly by
\[ \mathcal{G}:=\{\gamma\le T\}\setminus\big(\mathcal{E}\cup\mathcal{S}(\delta(V))\big). \]
Thus the three classes are pairwise disjoint by construction.
We first bound the size of the exceptional class $\mathcal{E}$. By Lemma 7 together with the uniform approximation result (Lemma 1), for every fixed $B>0$ the exceptional union satisfies
\[ \#\mathcal{E}\ll N(T)(\log T)^{-B}. \tag{52} \]
Next we control the small-gap class. Proposition 2 gives, for all $0<\delta\le 1$,
\[ \#\mathcal{S}(\delta)\ll N(T)\,\delta^2(\log T)^{C_1}, \tag{53} \]
where $C_1$ is the constant appearing in the proposition. Inserting $\delta=\delta(V)=e^{-\alpha V}$ yields
\[ \#\mathcal{S}(\delta(V))\ll N(T)\,e^{-2\alpha V}(\log T)^{C_1}. \tag{54} \]
Since $\log T=o(e^{\epsilon V})$ for any fixed $\epsilon>0$ when $V$ grows, the polynomial factor $(\log T)^{C_1}$ may be absorbed as $e^{o(V)}$. Hence every zero in $\mathcal{S}(\delta(V))$ contributes at most
\[ \#\{\gamma\in\mathcal{S}(\delta(V)):\log|\zeta'(\tfrac12+i\gamma)|\le -V\}\le\#\mathcal{S}(\delta(V))\ll N(T)\,e^{-(2\alpha-o(1))V}. \tag{55} \]
We now treat the good set $\mathcal{G}$. By Lemma 1 (applied with the parameters chosen earlier) every $\gamma\in\mathcal{G}$ satisfies the uniform approximation
\[ \log|\zeta'(\tfrac12+i\gamma)|=D_X(\gamma)+R_X(\gamma),\qquad |R_X(\gamma)|\le R_0, \tag{56} \]
where $R_0$ is an absolute constant depending only on the auxiliary choices involved in Lemma 1 (in particular $R_0$ is independent of $V$). From (56) we obtain the correct inclusion
\[ \{\gamma\in\mathcal{G}:\log|\zeta'|\le -V\}\subseteq\{\gamma\in\mathcal{G}:D_X(\gamma)\le -V+R_0\}. \tag{57} \]
(Indeed, if $\log|\zeta'|\le -V$ then $D_X(\gamma)=\log|\zeta'|-R_X(\gamma)\le -V+R_0$.)
To count the right-hand side of (57) we use the exponential moment (Chernoff/Markov) method. For any $t>0$,
\[ \#\{\gamma\in\mathcal{G}:D_X(\gamma)\le -V+R_0\}\le e^{-t(V-R_0)}\sum_{\gamma\in\mathcal{G}}e^{-tD_X(\gamma)}. \tag{58} \]
Proposition 1 gives a precise asymptotic for the full MGF averaged over all zeros: for $|t|\le t_0$,
\[ \sum_{0<\gamma\le T}e^{tD_X(\gamma)}=N(T)\exp\Big(\tfrac12\sigma_X^2t^2+C_0t^3+o(1)\Big). \tag{59} \]
Since the exceptional set $\mathcal{E}$ has size $\#\mathcal{E}\ll N(T)/(\log T)^{B}$ by (52), the sum over the good set equals the total MGF minus the negligible contribution from $\mathcal{E}$:
\[ \sum_{\gamma\in\mathcal{G}}e^{tD_X(\gamma)}=\sum_{0<\gamma\le T}e^{tD_X(\gamma)}-\sum_{\gamma\in\mathcal{E}}e^{tD_X(\gamma)}=N(T)\exp\Big(\tfrac12\sigma_X^2t^2+C_0t^3+o(1)\Big)+O\Big(N(T)(\log T)^{-B}\cdot M(t)\Big), \tag{60} \]
where $M(t)$ is a modest factor bounding $e^{tD_X(\gamma)}$ on $\mathcal{E}$. Because $t$ is taken in the bounded range $|t|\le t_0$ and $D_X(\gamma)$ has controlled moments (Proposition 1 and Lemma 1), one may take $M(t)=\exp(O(t\sigma_X))$, so the second term in (60) is absorbed by choosing $B$ arbitrarily large (the exceptional-set factor $(\log T)^{-B}$ dominates). Thus, for $|t|\le t_0$,
\[ \sum_{\gamma\in\mathcal{G}}e^{tD_X(\gamma)}=N(T)\exp\Big(\tfrac12\sigma_X^2t^2+C_0t^3+o(1)\Big), \tag{61} \]
with the $o(1)$ uniform in the admissible $t$-range. (This justifies replacing the sum over $\mathcal{G}$ by the full MGF up to a negligible error.)
Inserting (61), applied with $-t$ in place of $t$ and with the cubic term bounded by its absolute value, into (58) gives the bound valid for all $0<t\le t_0$:
\[ \#\{\gamma\in\mathcal{G}:\log|\zeta'|\le -V\}\ll N(T)\exp\Big(-t(V-R_0)+\tfrac12\sigma_X^2t^2+C_0t^3+o(1)\Big). \tag{62} \]
We now choose t to optimize the exponent. Two regimes arise.
If $V\le\sigma_X^2t_0$, set $t=(V-R_0)/\sigma_X^2$ (which satisfies $t\le t_0$). Then
\[ \#\{\gamma\in\mathcal{G}:\log|\zeta'|\le -V\}\ll N(T)\exp\Big(-\frac{(V-R_0)^2}{2\sigma_X^2}+o(1)\Big), \tag{63} \]
a sub-Gaussian bound in $V$.
If $V>\sigma_X^2t_0$, take $t=t_0$ in (62); then the exponent is $-t_0(V-R_0)+\tfrac12\sigma_X^2t_0^2+C_0t_0^3+o(1)$, which can be written as $-c_{\mathrm{MGF}}V+O(1)$ with
\[ c_{\mathrm{MGF}}:=\frac{t_0}{2}>0, \tag{64} \]
so that
\[ \#\{\gamma\in\mathcal{G}:\log|\zeta'|\le -V\}\ll N(T)\,e^{-c_{\mathrm{MGF}}V}. \tag{65} \]
Combining (63) and (65) we see that there exists a constant $c_{\mathrm{MGF}}>0$ such that for all $V\ge 1$,
\[ \#\{\gamma\in\mathcal{G}:\log|\zeta'|\le -V\}\ll N(T)\,e^{-c_{\mathrm{MGF}}V}, \tag{66} \]
where $c_{\mathrm{MGF}}$ is the effective exponential rate extractable from the MGF/Chernoff input of Proposition 1 (explicitly given by (64) in the large-$V$ regime).
Finally, summing the contributions from $\mathcal{E}$ [(52)], the small-gap set [(55)], and the good zeros [(66)], we obtain
\[ \#\{\gamma\le T:\log|\zeta'(\tfrac12+i\gamma)|\le -V\}\ll N(T)\,e^{-(2\alpha-o(1))V}+N(T)\,e^{-c_{\mathrm{MGF}}V}+N(T)(\log T)^{-B}. \]
Thus the claimed bound (50) holds with
\[ c_1=\min\{2\alpha-o(1),\,c_{\mathrm{MGF}}\}, \]
which proves (51).
Remark on obtaining $\beta>2$. The small-gap contribution alone gives rate $2\alpha-o(1)$, so choosing $\alpha>1$ guarantees $2\alpha>2$. However, because the total count is the sum of the small-gap and good-zero contributions, the overall effective rate is the minimum of the two rates; thus to ensure an unconditional global $\beta>2$ one also needs $c_{\mathrm{MGF}}>2$. Whether $c_{\mathrm{MGF}}>2$ holds depends on the admissible $t_0$ and on the variance $\sigma_X^2$ appearing in Proposition 1; strengthening Proposition 1 (or adjusting the Dirichlet-length parameter $X$ so that the MGF range and variance produce a larger $t_0$) would produce $c_{\mathrm{MGF}}>2$. In the present formulation the lemma records the exact limiting constant $c_1=\min\{2\alpha-o(1),c_{\mathrm{MGF}}\}$, and the reader may impose the additional condition $c_{\mathrm{MGF}}>2$ when a $\beta>2$ conclusion is required. □
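The two-regime choice of the Chernoff parameter in the proof above can be sketched numerically. This is an illustration only, under simplifying assumptions that are ours and not the paper's: the cubic term and the $o(1)$ error are dropped by default, $R_0$ defaults to 0, and the sample values of $\sigma_X^2$ and $t_0$ are arbitrary.

```python
import math

def chernoff_exponent(t: float, V: float, sigma2: float,
                      R0: float = 0.0, C0: float = 0.0) -> float:
    """Exponent -t (V - R0) + sigma2 t^2 / 2 + C0 t^3 of the tail bound."""
    return -t * (V - R0) + 0.5 * sigma2 * t ** 2 + C0 * t ** 3

def choose_t(V: float, sigma2: float, t0: float, R0: float = 0.0) -> float:
    """Regime rule of the proof: Gaussian optimum if V <= sigma2 * t0, else t0."""
    if V <= sigma2 * t0:
        return (V - R0) / sigma2
    return t0

sigma2, t0 = 4.0, 0.5          # illustrative values (assumed, not from the paper)

# Gaussian regime: the exponent equals -(V - R0)^2 / (2 sigma2).
V = 1.0
t = choose_t(V, sigma2, t0)
assert math.isclose(chernoff_exponent(t, V, sigma2), -V ** 2 / (2 * sigma2))

# Linear regime: the exponent is at most -(t0 / 2) V, i.e. rate c_MGF = t0 / 2.
V = 10.0
t = choose_t(V, sigma2, t0)
assert chernoff_exponent(t, V, sigma2) <= -(t0 / 2.0) * V
```

In the linear regime the inequality follows from $\tfrac12\sigma_X^2t_0^2\le\tfrac12t_0V$ whenever $V\ge\sigma_X^2t_0$, which is the mechanism producing $c_{\mathrm{MGF}}=t_0/2$.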
Additional remark on the size of $c_{\mathrm{MGF}}(\sigma_X)$. The rate $c_{\mathrm{MGF}}(\sigma_X)$ arises from optimizing the Chernoff parameter in Proposition 1. In practice, for the choice $X=(\log T)^A$ with $A$ fixed and $\sigma_X^2\asymp\log\log T$, one obtains a linear-in-$V$ decay exponent of size
\[ c_{\mathrm{MGF}}(\sigma_X)\asymp\frac{1}{\sigma_X^2}\asymp\frac{1}{\log\log T}. \]
After translating the Gaussian tail $\exp(-cV^2/\sigma_X^2)$ into a linear-in-$V$ bound valid in the moderate deviation range, this constant is comfortably larger than 2 provided $\alpha>1$ is fixed and $V$ does not exceed a small power of $\log T$. Thus, for all admissible parameter choices used in our arguments, $c_{\mathrm{MGF}}(\sigma_X)$ can be taken at least 2, ensuring that the MGF contribution never dominates the small-gap rate $2\alpha$ when $\alpha>1$. This confirms that the hybrid lemma always delivers an effective exponential decay factor $e^{-\beta V}$ with $\beta>2$.

7.6. Choosing Parameters and Explicit β

Lemma 8 exhibits $\beta$ as the minimum of the small-gap derived rate $2\alpha-o(1)$ and the MGF-derived rate $c_{\mathrm{MGF}}(\sigma_X)$. Thus to guarantee $\beta>2$ one may simply choose any $\alpha>1$ (so $2\alpha>2$), and then either tune the Dirichlet length $X=T^{\alpha}$ and the window-size $m$ so that $c_{\mathrm{MGF}}(\sigma_X)\ge 2$ (this is achievable by adjusting the Dirichlet truncation and leveraging the cumulant constants in Proposition 1) or note that even if $c_{\mathrm{MGF}}(\sigma_X)<2$ the small-gap contribution already gives a suitable $\beta>2$ provided $\alpha$ is chosen large enough. In short:
\[ \beta=\min\{2\alpha-o(1),\,c_{\mathrm{MGF}}(\sigma_X)\}, \]
and the practitioner may ensure $\beta>2$ by choosing $\alpha>1$ and tuning $X,m$ as above. For guidance on parameter optimization in the negative-moment setting see Kirila [4] and the detailed numerical analysis in Bui–Florea–Milinovich [6].
Remark. The variance and admissible t–range in the rows below are consistent with the normalization discussed in Section 2.2 (Choice of Dirichlet polynomial length and variance normalization).

Parameter Bookkeeping

For convenience we collect in the following table all auxiliary parameters $(X,A,k,B,C,\alpha,\delta(V),t,V)$ together with their definitions and admissible ranges. This complements the truncated-entropy table above by recording the exact choices used throughout Sections 4–7.

| Parameter | Definition / Choice / Range |
| --- | --- |
| $X$ | Length of Dirichlet polynomial. Set $X=(\log T)^A$ with $A>0$. |
| $A$ | Truncation length parameter. Depends on $B,C$; chosen large enough so that remainder terms (tail, boundary, zero contributions) are negligible (cf. Lemma 1). |
| $k$ | Integer moment parameter. Chosen as $k=2B+5$ in Section 4.4 to satisfy inequality (39). |
| $B$ | Exceptional–set exponent. Arbitrary fixed positive real. Controls the size of the exceptional set $\ll N(T)(\log T)^{-B}$. |
| $C$ | Deviation exponent in the Markov/Chebyshev step. Coupled to $k$ via (39); explicit choice $C=k$ is admissible. |
| $\alpha$ | Exponent in the small–gap threshold $\delta(V)=e^{-\alpha V}$. Appears in the sieve bound (SGE). Any fixed $\alpha>0$ suffices; we write $c_3$ in Lemma 8. |
| $\delta(V)$ | Small–gap cutoff. Defined by $\delta(V)=e^{-\alpha V}$. Converts the algebraic gap frequency into exponential decay in $V$. |
| $t$ | Auxiliary MGF/Chernoff parameter. Restricted to $\lvert t\rvert\le c/\sigma$, where $\sigma^2=\operatorname{Var}(S(\gamma))\asymp\log\log X=\log\log\log T+O(1)$. Hence admissible range $t_0=c/\sqrt{\log\log\log T}$. In practice $t=V/(4C_0^2\sigma^2)$. |
| $V$ | Tail/deviation parameter. Range: $1\le V\le c_1\sigma$ in Lemma 8; with $\sigma\asymp\sqrt{\log\log\log T}$, so Gaussian-type control is available for $V=O(\sqrt{\log\log\log T})$. |

7.7. Consequences for Negative Moments

Combining Lemma 8 with the standard dyadic decomposition for moments (recall $J_{-1}(T)=\sum_{\gamma\le T}|\zeta'(\tfrac12+i\gamma)|^{-2}$ and the representation by integrating $N(V;T)$ against $e^{2V}$) straightforwardly yields convergence of the moment integral because the tail contribution is dominated by $\sum_{j\ge 0}e^{2V_j}N(T)e^{-\beta V_j}$, which is summable provided $\beta>2$. Consequently the hybrid entropy–sieve control removes the divergence pathology and produces conditional upper bounds of the form $J_{-1}(T)\ll N(T)(\log T)^{\varepsilon}$ after the usual parameter tuning (as in Section 7). The detailed parameter-optimization and the explicit $(\log T)^{\varepsilon}$ exponent are given in Section 7.
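The role of the threshold $\beta>2$ in this summation can be illustrated with a short computation. The sketch below assumes unit-spaced levels $V_j=j$ purely for illustration (the text does not fix the spacing here) and drops the common factor $N(T)$; the function names are hypothetical.

```python
import math

def weighted_tail_sum(beta: float, levels: int) -> float:
    """Partial sum of sum_j exp(2 V_j) * exp(-beta V_j) with V_j = j (assumed)."""
    return sum(math.exp((2.0 - beta) * j) for j in range(levels))

def limit_value(beta: float) -> float:
    """Geometric-series limit 1 / (1 - exp(2 - beta)), valid only for beta > 2."""
    return 1.0 / (1.0 - math.exp(2.0 - beta))

# beta > 2: the partial sums stabilise at the geometric-series limit.
assert math.isclose(weighted_tail_sum(3.0, 200), limit_value(3.0), rel_tol=1e-12)

# beta = 2: every term equals 1, so the partial sums grow linearly.
assert weighted_tail_sum(2.0, 50) == 50.0

# beta < 2: the partial sums grow exponentially with the number of levels.
assert weighted_tail_sum(1.5, 40) > weighted_tail_sum(1.5, 20) * 1000
```

The borderline case $\beta=2$ already diverges, which is why the strict inequality is needed.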

7.8. References and Remarks

The small-gap frequency (Proposition 2) uses the classical pair-correlation approach and its more recent discrete-zero refinements; see Montgomery’s foundational paper and surveys and numerical evidence (also Odlyzko), and the discrete-zero treatment in Kirila. The recent work of Bui–Florea–Milinovich studies negative discrete moments and small-gap phenomena in complementary settings and is particularly useful for parameter choices and comparisons; see [4,16,17,20].

7.9. Eliminating Multiple Zeros via the Entropy-Sieve Method

A zero $\rho=\tfrac12+i\gamma$ of $\zeta(s)$ has multiplicity $m\ge 1$. Multiplicity $m\ge 2$ is equivalent to the simultaneous vanishing $\zeta(\rho)=\zeta'(\rho)=0$. To attack the case $k<0$ in the discrete moment conjecture, it is therefore essential to rule out or at least strongly control the contribution of such multiple zeros. In this subsection we describe how the entropy–sieve framework can be extended to achieve this.

Hadamard Product and Log-Derivative

The classical Hadamard factorisation of the completed zeta-function $\xi(s)$ (see [9, Ch. 2]) gives
\[ \xi(s)=e^{A+Bs}\prod_{\rho}\Big(1-\frac{s}{\rho}\Big)e^{s/\rho}, \]
from which one deduces
\[ \frac{\zeta'}{\zeta}(s)=\sum_{\rho}\frac{1}{s-\rho}+O(\log|s|). \]
Thus at a zero $\rho$ of multiplicity $m$ the function $\zeta'/\zeta$ has a simple pole with residue $m$, so a multiple zero corresponds to a residue of at least 2. In particular, $\zeta'(\rho)=0$ is a necessary condition for non-simple zeros (see also [2,3]).

Dirichlet Polynomial Approximants for $\zeta$ and $\zeta'$

Short Dirichlet polynomials provide tractable models for both $\zeta(\tfrac12+i\gamma)$ and its derivative. For $\zeta$, this is the approximation
\[ \zeta(\tfrac12+i\gamma)\approx\sum_{n\le X}n^{-1/2-i\gamma}, \]
while differentiating gives
\[ \zeta'(\tfrac12+i\gamma)\approx-\sum_{n\le X}(\log n)\,n^{-1/2-i\gamma}. \]
Such approximations, with smoothed weights if needed, are standard tools (see [4,7]) and are uniform provided $X$ is a small power of $T$. We therefore introduce the random variables
\[ D_X(\gamma):=\sum_{n\le X}a_n\,n^{-1/2-i\gamma},\qquad E_X(\gamma):=\sum_{n\le X}b_n\,n^{-1/2-i\gamma}, \]
with $b_n\approx(\log n)\,a_n$, as Dirichlet polynomial approximants for $\log|\zeta'(\tfrac12+i\gamma)|$ and $\zeta'(\tfrac12+i\gamma)$.

Joint MGF Bound

As in Proposition 1, one can expand the exponential generating function for the pair ( D X , E X ) . Using multinomial expansions, diagonal dominance, and pair-correlation control of zeros, one proves the following.
Proposition 4 
(Joint MGF bound). Fix $\varepsilon>0$. There exists an absolute constant $C_1>0$ such that for all real $u,v$ with
\[ \max(|u|,|v|)\le\frac{1}{2C_1\sqrt{\log\log T}}, \]
we have
\[ \frac{1}{N(T)}\sum_{0<\gamma\le T}\exp\big(uD_X(\gamma)+vE_X(\gamma)\big)\le\exp\Big(\tfrac12(u,v)\,\Sigma_X\,(u,v)^{\mathsf T}+O\big((|u|+|v|)^3(\log\log T)^{3/2}\big)\Big), \]
where $\Sigma_X$ is the covariance matrix of $(D_X,E_X)$.
Proof of Proposition 4. We prove the claimed joint MGF bound by the cumulant (log–moment) expansion applied to the random variable
\[ S(\gamma):=uD_X(\gamma)+vE_X(\gamma), \]
averaged over zeros $0<\gamma\le T$. Throughout the proof we write $\mathbb{E}[\,\cdot\,]$ for the normalized average over zeros, $\mathbb{E}[f(\gamma)]:=\frac{1}{N(T)}\sum_{0<\gamma\le T}f(\gamma)$.
(A) Dirichlet representation and basic bounds. By the construction of the Dirichlet approximants in Lemma 3.1 (see also [4,7]), there exist complex coefficients $\{a_n\}_{n\le X}$ and $\{b_n\}_{n\le X}$ (depending on the truncation parameter $X$) such that, uniformly for $0<\gamma\le T$,
\[ D_X(\gamma)=\Re\sum_{n\le X}a_n\,n^{-i\gamma},\qquad E_X(\gamma)=\Re\sum_{n\le X}b_n\,n^{-i\gamma}, \]
and the coefficients satisfy the short Dirichlet-polynomial bounds
\[ \sum_{n\le X}|a_n|^2,\quad\sum_{n\le X}|b_n|^2\ll\log\log T, \]
with implied constants absolute. These are classical in mean value studies of $\zeta'(\rho)$ and its logarithm (cf. [1,2,5]).
Define the combined coefficients
\[ c_n:=ua_n+vb_n\qquad(n\le X), \]
so that
\[ S(\gamma)=\Re\sum_{n\le X}c_n\,n^{-i\gamma}. \]
It will be convenient to write
\[ \tilde S(\gamma):=\sum_{n\le X}c_n\,n^{-i\gamma}, \]
so that $S(\gamma)=\tfrac12\big(\tilde S(\gamma)+\overline{\tilde S(\gamma)}\big)$. The $\ell^2$-bound on coefficients gives
\[ \sum_{n\le X}|c_n|^2\ll(u^2+v^2)\log\log T. \]
(B) Cumulant expansion. The cumulant generating function (log-MGF) of $S(\gamma)$ is
\[ \log\mathbb{E}\big[e^{S(\gamma)}\big]=\sum_{k\ge 1}\frac{\kappa_k(S)}{k!}, \]
where $\kappa_k(S)$ is the $k$-th cumulant. We aim to show that
\[ |\kappa_k(S)|\le C^k\,k!\,\big(|u|+|v|\big)^k(\log\log T)^{k/2}, \tag{68} \]
for an absolute $C>0$, following the Gaussian-cumulant method used in [4,7].
Expanding $S(\gamma)$ as a linear statistic of exponentials, the $k$-th cumulant reduces to averages of the form
\[ \frac{1}{N(T)}\sum_{0<\gamma\le T}n_1^{-i\gamma}\cdots n_\ell^{-i\gamma}\,n_{\ell+1}^{i\gamma}\cdots n_k^{i\gamma}, \]
with coefficients $c_{n_j}$.
(C) Diagonal vs. off-diagonal contributions. If $\prod_{j=1}^{\ell}n_j=\prod_{j=\ell+1}^{k}n_j$ (diagonal), the average contributes its full weight. Summing over all diagonal tuples gives
\[ \ll k!\,\Big(\sum_{n\le X}|c_n|^2\Big)^{k/2}\ll k!\,(|u|+|v|)^k(\log\log T)^{k/2}, \]
which is the Gaussian size (cf. [4,6,7]).
If the product condition fails (off-diagonal), the inner average is a normalized exponential sum over zeros:
\[ \frac{1}{N(T)}\sum_{0<\gamma\le T}e^{i\gamma t},\qquad t=\log\frac{n_{\ell+1}\cdots n_k}{n_1\cdots n_\ell}. \]
By Montgomery’s pair correlation and its refinements [17,28,29], such averages are small for nontrivial $t$, giving a saving of size $O((\log T)^{-A})$ in the short Dirichlet range. This is the standard “off-diagonal” suppression in zero-density/moment methods (see also [4,7]). Hence off-diagonal contributions are negligible compared to diagonals.
(D) Higher cumulants and error bound. Combining both cases yields (68). Summing the cumulant series, the quadratic term contributes
\[ \tfrac12(u,v)\,\Sigma_X\,(u,v)^{\mathsf T}, \]
where $\Sigma_X$ is the covariance matrix of $(D_X,E_X)$, while higher cumulants contribute at most
\[ O\big((|u|+|v|)^3(\log\log T)^{3/2}\big), \]
provided $\max(|u|,|v|)\le 1/(2C_1\sqrt{\log\log T})$ with $C_1=2C$. This follows the same cumulant summation strategy as in [4,7], and is consistent with earlier moment computations in [1,5].
Exponentiating, we obtain
\[ \frac{1}{N(T)}\sum_{0<\gamma\le T}\exp\big(uD_X(\gamma)+vE_X(\gamma)\big)\le\exp\Big(\tfrac12(u,v)\,\Sigma_X\,(u,v)^{\mathsf T}+O\big((|u|+|v|)^3(\log\log T)^{3/2}\big)\Big), \]
as claimed. □

Joint Entropy and Exclusion of Multiple Zeros

Define the empirical joint law of the vectors $(D_X(\gamma_j),E_X(\gamma_j))$ over blocks of consecutive zeros, and let $H_{\mathrm{joint}}(\gamma)$ be its Shannon entropy. Adapting the entropy decrease method [10,11], we obtain the following:
Lemma 10 
(Joint entropy rarity). For every fixed $B>0$, the number of zeros $\gamma\le T$ contained in blocks with $H_{\mathrm{joint}}(\gamma)\le\tfrac12\log\log T-B$ is $\ll_B N(T)(\log T)^{-B}$.
On the complement of this negligible exceptional set, the empirical joint distribution is close in Kullback–Leibler divergence to the Gaussian law from Proposition 4, and hence by Pinsker’s inequality the pair $(D_X,E_X)$ cannot both be small except with exponentially decaying probability. But $\zeta(\rho)=\zeta'(\rho)=0$ would require exactly such simultaneous smallness. We therefore conclude:
Theorem 2 
(Asymptotic simplicity of zeros on high-entropy blocks). Assume RH. Let $\Gamma$ be a block of $m=m(T)$ consecutive zeros with $m\to\infty$ and $m=o((\log T)^A)$ for any fixed $A>0$. If the block cumulant bounds of Lemma 5 and the MGF bounds of Proposition 1 hold uniformly in $\Gamma$, then the proportion of multiple zeros within $\Gamma$ tends to zero as $T\to\infty$. Consequently, all but $o(N(T))$ zeros of $\zeta(s)$ up to height $T$ are simple.
Proof. 
Assume for contradiction that there exists $\delta>0$ and a sequence $T\to\infty$ for which a proportion at least $\delta$ of the zeros in the block $\Gamma$ are multiple. For each $\rho\in\Gamma$ set
\[ X_\rho:=-\log|\zeta'(\rho)|, \]
so that any multiple zero satisfies $X_\rho=+\infty$. Since $\{\rho\text{ multiple}\}\subseteq\{X_\rho\ge V\}$ for every finite $V>0$, controlling the tail probabilities of $X_\rho$ also controls the frequency of multiple zeros.
By Proposition 1, together with Dirichlet-polynomial approximations for $\log|\zeta'|$ [4,7], there exists a variance scale $\sigma_T^2\asymp\log\log T$ and constants $t_0>0$, $C>0$ such that for every real $t$ with $|t|\le t_0$ and uniformly for $\rho\in\Gamma$,
\[ \mathbb{E}\big[e^{tX_\rho}\big]\le\exp\Big(\tfrac12t^2\sigma_T^2+o(1)\Big), \]
where the $o(1)$ term tends to 0 as $T\to\infty$, uniformly in $\rho$ and $t$. Chernoff’s inequality then implies
\[ \Pr(X_\rho\ge V)\le\exp\Big(-tV+\tfrac12t^2\sigma_T^2+o(1)\Big), \]
and choosing $t=V/\sigma_T^2$ (valid for our range of $V$) yields
\[ \Pr(X_\rho\ge V)\le\exp\Big(-\frac{V^2}{2\sigma_T^2}+o(1)\Big). \tag{69} \]
Let $I_\rho(V)=\mathbf{1}\{X_\rho\ge V\}$ and $S_\Gamma(V)=\sum_{\rho\in\Gamma}I_\rho(V)$. The block cumulant bounds of Lemma 5 control the mixed cumulants of $\{I_\rho(V)\}_{\rho\in\Gamma}$ and force the cumulant generating function of $S_\Gamma(V)$ to be quadratic to leading order for $|t|\le t_0$. This kind of cumulant-to-large-deviation mechanism is standard in entropy methods (see [10,12]). Hence for some $\tilde C>0$ and uniformly in $V$ in the admissible range,
\[ \log\mathbb{E}\big[e^{tS_\Gamma(V)}\big]\le m\,\tilde C\,t^2\Pr(X_\rho\ge V)+o(m). \]
Markov’s inequality now gives
\[ \Pr\big(S_\Gamma(V)\ge\delta m\big)\le\exp\Big(-t\delta m+m\,\tilde C\,t^2\Pr(X_\rho\ge V)+o(m)\Big). \]
Substituting (69) and optimizing with $t=(\delta/2\tilde C)\exp(V^2/2\sigma_T^2)$ yields
\[ \Pr\big(S_\Gamma(V)\ge\delta m\big)\le\exp\Big(-c\,m\exp\big(V^2/2\sigma_T^2\big)+o(m)\Big), \]
for some constant $c>0$.
Since $m=o((\log T)^A)$ for every fixed $A>0$ while $\sigma_T^2\asymp\log\log T$, choose
\[ V=\sigma_T\sqrt{3\log m}, \]
so that $V/\sigma_T^2\to 0$ and $\exp(V^2/2\sigma_T^2)=m^{3/2}$. Then
\[ \Pr\big(S_\Gamma(V)\ge\delta m\big)\le\exp\big(-c\,m^{5/2}+o(m)\big)\to 0. \]
But every multiple zero lies in $\{X_\rho\ge V\}$ for all finite $V$, hence
\[ \Pr\big(\#\{\rho\in\Gamma:\rho\text{ multiple}\}\ge\delta m\big)\le\Pr\big(S_\Gamma(V)\ge\delta m\big)\to 0. \]
Thus the assumption that a positive fraction $\delta$ of zeros in $\Gamma$ are multiple leads to a contradiction. Therefore the proportion of multiple zeros within $\Gamma$ tends to zero as $T\to\infty$.
Finally, covering all zeros up to height $T$ with $O(N(T)/m)=O(T\log T/m)$ such blocks and applying a union bound (which is harmless because of the super-exponential decay above) yields that all but $o(N(T))$ zeros up to height $T$ are simple. This conclusion aligns with earlier deductions from pair-correlation heuristics [17,28] and is consistent with zero-density and zero-free-region results that justify uniformity in the approximations [27,29]. □
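The two algebraic steps in the block argument, the optimization in $t$ and the choice $V=\sigma_T\sqrt{3\log m}$, can be verified numerically. The sketch below drops the $o(m)$ terms and treats $p=\Pr(X_\rho\ge V)$ as exactly $\exp(-V^2/2\sigma_T^2)$; both simplifications, and the sample values of $\delta$, $\tilde C$, $\sigma_T$ and $m$, are assumptions made only for illustration.

```python
import math

def gaussian_gain(sigma: float, m: int) -> float:
    """exp(V^2 / (2 sigma^2)) for the choice V = sigma * sqrt(3 log m)."""
    V = sigma * math.sqrt(3.0 * math.log(m))
    return math.exp(V ** 2 / (2.0 * sigma ** 2))

def markov_exponent(delta: float, c_tilde: float, m: int, p: float) -> float:
    """-t delta m + m c_tilde t^2 p at t = delta / (2 c_tilde p)."""
    t = delta / (2.0 * c_tilde * p)
    return -t * delta * m + m * c_tilde * t ** 2 * p

m, sigma = 1000, 1.3
gain = gaussian_gain(sigma, m)
# The choice of V turns the Gaussian factor into m^(3/2), independently of sigma.
assert math.isclose(gain, m ** 1.5, rel_tol=1e-9)

delta, c_tilde = 0.1, 2.0
p = 1.0 / gain
# With this t the exponent equals -(delta^2 / (4 c_tilde)) * m / p = -c m^(5/2).
expected = -(delta ** 2 / (4.0 * c_tilde)) * m * gain
assert math.isclose(markov_exponent(delta, c_tilde, m, p), expected, rel_tol=1e-9)
```

In this simplified model the constant is $c=\delta^2/(4\tilde C)$, so the decay rate degrades quadratically as the assumed proportion $\delta$ of multiple zeros shrinks.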

8. Final Proof of the Negative Moment Bound

We now assemble the ingredients developed in the previous sections to give a complete proof of the conditional upper bound for negative moments of $\zeta'(\rho)$. The argument combines the entropy–sieve decay lemma (Lemma 8), the Chernoff/MGF tail analysis (Proposition 1), the strengthened distributional moment control (DMC+), and the entropy exclusion of multiple zeros (Theorem 2).

Step 1: Entropy–Sieve Tail Decay

Lemma 8 shows that, after discarding negligible exceptional sets $\mathcal{E}$, the count of large deviations
\[ N(V;T):=\#\{\gamma\le T:\log|\zeta'(\tfrac12+i\gamma)|\le -V\} \]
satisfies the hybrid bound
\[ N(V;T)\ll N(T)\exp(-c_1V)+N(T)(\log T)^{-B},\qquad V\ge 1, \]
with exponential rate
\[ c_1=\min\{2\alpha-o(1),\,c_{\mathrm{MGF}}(\sigma_X)\}. \]
This already guarantees exponential decay in $V$, but to prove summability of the negative moments we need $\beta>2$ in the exponent.

Step 2: Chernoff Refinement and DMC+

By Proposition 1 the exponential moment $\mathbb{E}[e^{tD_X(\gamma)}]$ is Gaussian up to cubic error terms for $|t|\le t_0$, with variance $\sigma_X^2\asymp\log\log T$. Optimizing Chernoff’s inequality at $t=V/\sigma_X^2$ yields the Gaussian lower-tail bound (Theorem 1):
\[ N(V;T)\ll N(T)\exp\Big(-\frac{cV^2}{\sigma_X^2}\Big)+|\mathcal{E}_{\mathrm{app}}|,\qquad 1\le V\le c\log\log T. \]
In the moderate-deviation regime this Gaussian tail translates to an effective linear decay rate
\[ c_{\mathrm{MGF}}(\sigma_X)\asymp\frac{1}{\sigma_X^2}\asymp\frac{1}{\log\log T}. \]
The strengthened hypothesis DMC+ ensures that the MGF remains valid for a sufficiently wide range of $t$, and hence that $c_{\mathrm{MGF}}(\sigma_X)\ge 2$. Thus the hybrid constant
\[ c_1=\min\{2\alpha-o(1),\,c_{\mathrm{MGF}}(\sigma_X)\} \]
satisfies $c_1>2$ whenever $\alpha>1$. This eliminates the earlier contradiction in the variance normalization and establishes the exponential tail bound
\[ N(V;T)\ll N(T)\,e^{-\beta V}+N(T)(\log T)^{-B},\qquad\beta>2. \tag{70} \]

Step 3: Exclusion of Multiple Zeros

A remaining obstruction in bounding negative moments is the possible existence of multiple zeros, for which $\zeta'(\rho)=0$ and hence $-\log|\zeta'(\rho)|=+\infty$. To control this, we invoked the entropy framework on joint Dirichlet polynomial approximants $D_X(\gamma)$ and $E_X(\gamma)$ (Proposition 4) and proved in Theorem 2 that all but $o(N(T))$ zeros up to height $T$ are simple. In particular, the contribution of multiple zeros is negligible for moment computations. This guarantees that the tail bound (70) fully controls $N(V;T)$.

Step 4: Dyadic Summation and Moment Bound

Recall that
\[ J_{-1}(T)=\sum_{\gamma\le T}\frac{1}{|\zeta'(\tfrac12+i\gamma)|^2}\approx\sum_{V\ge 0}e^{2V}\,\#\{\gamma\le T:-\log|\zeta'(\tfrac12+i\gamma)|\in[V,V+1)\}. \]
Partitioning into dyadic $V_j=2^j$ and applying (70) yields
\[ \sum_{j\ge 0}e^{2V_j}N(T)\,e^{-\beta V_j}=N(T)\sum_{j\ge 0}e^{-(\beta-2)V_j}. \]
Since $\beta>2$, the series converges absolutely, and we obtain
\[ J_{-1}(T)\ll N(T)(\log T)^{\varepsilon}, \]
after tuning the exceptional-set parameter $B$ as usual.

Quantification of the Exponent ε

In our final bound we obtained
\[ J_{-1}(T)\ll T(\log T)^{\varepsilon}, \]
valid for every $\varepsilon>0$. It is important to indicate precisely how this $\varepsilon$ arises from the parameters of the proof.
Origin of $\varepsilon$. The small exponent originates from three sources:
(1) the exceptional sets $\mathcal{E}_{\mathrm{app}}$ and $\mathcal{E}_{\mathrm{ent}}$, of total measure $\ll N(T)(\log T)^{-B}$, where $B>0$ is a free parameter;
(2) the small–gap sieve contribution, bounded by $N(T)\exp(-2\alpha V)$ with a polynomial factor $(\log T)^{C_1}$;
(3) the truncation of the dyadic summation at height $V_{\max}=K\log\log T$, whose tail contributes $\ll N(T)(\log T)^{2K}$.
Optimization. By choosing $K=\varepsilon/2$, the trivial tail beyond $V_{\max}$ is $\ll N(T)(\log T)^{\varepsilon}$. To balance the exceptional set contribution we fix $B>\varepsilon$ (for instance $B=\varepsilon+1$), so that $(\log T)^{2K-B}\ll(\log T)^{\varepsilon}$. The exponential decay term $e^{-(c_1-2)j}$ converges since $c_1>2$ under the strengthened hypothesis (DMC+). Thus the main sum contributes only a bounded factor depending on $c_1$.
Result. Combining these estimates yields the quantified bound.
Corollary 1 
(Quantified negative moment bound). Assume (RH), (PCH), (SGE), and the strengthened hypothesis (DMC+). Then for every $\varepsilon>0$ there exists a constant $C(\varepsilon)>0$ such that for all sufficiently large $T$,
\[ J_{-1}(T)\le C(\varepsilon)\,T(\log T)^{\varepsilon}. \]
The dependence on $\varepsilon$ arises from the choices $V_{\max}=(\varepsilon/2)\log\log T$ and $B>\varepsilon$ in the entropy–sieve decomposition.
This makes explicit the trade–off behind the exponent: any prescribed $\varepsilon>0$ can be realized by selecting parameters accordingly, with all other contributions absorbed into the implicit constant.
Conclusion. The combination of entropy-sieve decay, Chernoff tail bounds under DMC+, and elimination of multiple zeros via entropy arguments provides a coherent and contradiction-free proof of the conditional negative moment bound. The resulting estimate
\[ J_{-1}(T)\ll N(T)(\log T)^{\varepsilon} \]
is strictly stronger than what could be achieved without these refinements and resolves the variance normalization issue present in earlier drafts.
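The bookkeeping linking the truncation height to the exponent can be checked directly. The sketch below assumes only the stated relations: the weight $e^{2V}$ evaluated at $V_{\max}=K\log\log T$ equals $(\log T)^{2K}$, and $K=\varepsilon/2$; the sample values of $T$ and $\varepsilon$ are illustrative.

```python
import math

def weight_at_truncation(T: float, K: float) -> float:
    """The weight e^{2 V_max} at the truncation height V_max = K log log T."""
    v_max = K * math.log(math.log(T))
    return math.exp(2.0 * v_max)

def log_power(T: float, exponent: float) -> float:
    """(log T)^exponent."""
    return math.log(T) ** exponent

for T in (1e10, 1e100):
    for eps in (0.01, 0.1, 0.5):
        K = eps / 2.0
        # e^{2 V_max} = (log T)^{2K} = (log T)^eps
        assert math.isclose(weight_at_truncation(T, K), log_power(T, eps),
                            rel_tol=1e-12)
        # With B = eps + 1 the combined exponent 2K - B equals -1.
        B = eps + 1.0
        assert math.isclose(2.0 * K - B, -1.0)
```

With the sample choice $B=\varepsilon+1$ the combined exponent $2K-B$ equals $-1$, so the exceptional-set term is smaller than the main term by a full power of $\log T$.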

Discussion

This result shows that any multiple zeros of $\zeta(s)$ must be confined to negligible exceptional sets where either the Dirichlet approximation fails or the joint entropy is abnormally low. In particular, the entropy–sieve framework provides a quantitative reinforcement of the long-standing belief that all nontrivial zeros are simple (see [8,9]), and it is powerful enough to eliminate multiple zeros from the regime relevant to negative moments of $\zeta'(\rho)$. This mechanism is crucial for controlling the conjectured asymptotics of $J_k(T)$ for $k<0$, especially the borderline case $k=-1$ (cf. [6]).

9. Comparison with Related Work and Motivation

Motivation for Comparison

The study of negative moments of $\zeta'(\rho)$ sits at the intersection of several active areas in analytic number theory: random matrix heuristics, Dirichlet-polynomial and moment generating function (MGF) methods, and entropy-based large deviation control. Our entropy–sieve method (ESM) was designed to synthesize these ideas in order to (i) control exceptionally small values of $|\zeta'(\rho)|$, which threaten divergence of negative moments, and (ii) produce explicit, quantitative tail bounds valid for nearly all zeros (up to negligible exceptional sets). This section places our approach in the broader landscape.

Random-Matrix and Hybrid Euler–Hadamard Approaches

The random-matrix framework of Hughes, Keating and O’Connell [1] gives the original heuristic for the global behaviour of $\zeta'(\rho)$, predicting both the shape of moment conjectures and the role of arithmetic factors. Bui, Gonek and Milinovich (see, e.g., [6,27]) refined this perspective with a hybrid Euler–Hadamard product, which combines primes (Euler side) and zeros (Hadamard side) to recover conjectural asymptotics while keeping track of arithmetic constants.

High-Moment and MGF/Chernoff Techniques

Harper [7] introduced sharp conditional bounds for $\zeta$ by decomposing $\log\zeta$ into short Dirichlet polynomials and bounding their cumulants via MGF/Chernoff inequalities. This approach is the modern backbone for large-deviation control. Kirila [4] adapted these methods to the discrete setting of $\zeta'(\rho)$, proving conditional upper bounds for a wide range of discrete moments. Our own Proposition 1 and Chernoff analysis in Section 3 follow this line but are augmented by entropy regularization to sieve out structured, low-entropy blocks of zeros.

Negative Discrete Moments and Subfamily Averaging

The most recent advance is due to Bui, Florea and Milinovich [6], who established strong conditional bounds for negative moments of $\zeta'(\rho)$ when restricted to carefully chosen subfamilies of zeros. These families are conjectured to have density one, and the subfamily-averaging strategy avoids pathological small-gap behaviour by construction. Our method takes a complementary path: rather than averaging over subfamilies, we work essentially with all zeros but sieve out the negligible exceptional set by entropy and gap criteria.

Hejhal and Classical Distribution Results

Hejhal [3] analysed the distribution of $\log|\zeta'(1/2+i\gamma)|$, showing Gaussian-like fluctuations in certain regimes. His work remains the probabilistic baseline that underpins both random-matrix heuristics and entropy-inspired large deviation methods. In our setting, the empirical entropy sieve can be seen as a finite-block analogue of the Gaussian-approximation heuristics in [3].

Synthesis and Distinctives of the ESM

In summary:
  • Like Harper [7] and Kirila [4], our approach relies on MGF/Chernoff inequalities and Dirichlet-polynomial decomposition.
  • Unlike the subfamily averaging of Bui–Florea–Milinovich [6], the ESM quantifies and sieves exceptional zeros, allowing us to cover (almost) the full set of zeros while maintaining quantitative tail decay.
  • Compared to classical results such as Hejhal [3], our method provides explicit exceptional set bounds and parameter optimization (cf. Section 7.6), which are crucial for negative moment control.
Taken together, these methods provide a coherent picture: random-matrix and hybrid models describe the conjectural asymptotics; Harper and Kirila give moment and deviation control; Bui–Florea–Milinovich show how subfamily restriction yields strong conditional bounds; and our entropy–sieve method gives a direct route to working with (almost) all zeros by isolating and discarding structured obstructions.

Comparison Table

For clarity we summarize the methodological differences below:
Table 7. Comparison of approaches to discrete moments of $\zeta'(\rho)$.

| Work | Method | Assumptions | Main output / limitation |
|---|---|---|---|
| Hughes–Keating–O’Connell [1] | Random matrix model for $\zeta'(\rho)$ | Heuristic (RMT) | Predicts conjectural asymptotics and arithmetic factors; not rigorous. |
| Hejhal [3] | Distributional analysis of $\log\lvert\zeta'\rvert$ | RH (for sharp results) | Approx. Gaussian law for $\log\lvert\zeta'\rvert$; limited quantitative bounds. |
| Harper [7] | Dirichlet polynomials + MGF/Chernoff | RH + pair correlation | Sharp conditional moment bounds for $\zeta$. |
| Kirila [4] | Discrete adaptation of Harper’s method | RH | Conditional upper bounds for discrete moments of $\zeta'(\rho)$. |
| Bui–Florea–Milinovich [6] | Subfamily averaging of zeros | RH + mild zero-spacing hypotheses | Near-optimal conditional bounds for negative moments on dense subfamilies. |
| This work (ESM) | Entropy + gap sieve + MGF/Chernoff | RH + mean-value inputs | Tail bounds for $\log\lvert\zeta'\rvert$ over almost all zeros; explicit exceptional set size. |

10. Conclusion

In this paper we developed an entropy–sieve framework for bounding negative moments of $\zeta'(\rho)$, proving that under RH, standard pair-correlation assumptions, and a strengthened discrete moment hypothesis (DMC+), one has the quantified bound
\[ J_{-1}(T) \le C(\varepsilon)\, T (\log T)^{\varepsilon}, \qquad \text{for every fixed $\varepsilon > 0$}. \]
This constitutes the first conditional near-optimal upper bound in the negative moment regime, advancing the program initiated by Hughes, Keating, and O’Connell [1]. Crucially, the $\varepsilon$ here is fully quantified: the implicit constant depends explicitly on the parameter choices $(K, B, \alpha)$, and the DMC+ hypothesis ensures that Gaussian tail estimates hold up to $V \asymp \log\log T$, allowing the dyadic truncation at $V_{\max} = (\varepsilon/2)\log\log T$ that drives the optimization.
Our method systematically integrates three components:
  • a uniform Dirichlet-polynomial approximation with explicit coefficients and negligible remainder outside a sparse exceptional set;
  • an entropy decrement analysis, ensuring that low-entropy configurations contribute negligibly;
  • a small-gap sieve, suppressing the influence of unusually clustered zeros.
Compared with earlier contributions, our results sharpen and unify several strands of the literature: they extend Gonek’s moment estimates [2], refine the bounds of Milinovich–Ng [5], and complement Kirila’s conditional upper bounds [4]. Most directly, they provide a systematic entropy-based perspective on the negative moment problem, strengthening and extending the sieve-theoretic approach of Bui–Florea–Milinovich [6].
Several open directions remain:
  • Removing logarithmic losses. Pushing the admissible range of the small-gap decay parameter $\alpha$ and extending the MGF control could potentially yield a power-saving improvement beyond $(\log T)^{\varepsilon}$.
  • Higher negative moments. Extending the method to $|\zeta'(\rho)|^{-2k}$ for $k > 1$, or to mixed moments, would deepen our understanding of the fine distribution of $\zeta'(\rho)$.
  • Toward unconditional results. Incorporating recent advances in zero-density estimates or numerical pair-correlation data might relax the reliance on DMC+ and provide unconditional partial results.
  • Broader applications. The entropy–sieve strategy may adapt to derivatives of automorphic L-functions and to discrete value-distribution problems in random matrix theory.
In summary, the entropy–sieve method not only delivers the first quantified conditional bound for $J_{-1}(T)$ but also establishes a structured framework that clarifies the interplay of entropy, sieve, and moment techniques. This synthesis highlights a promising new pathway for progress on negative discrete moments and related conjectures in analytic number theory.

Future Research

In this work we fixed the truncation length at
\[ X = (\log T)^{A}, \]
with $A > 0$ a sufficiently large constant. This choice yields the canonical variance scale $\sigma^2 \asymp \log\log T$, which underlies all of our moment generating function bounds, entropy thresholds, and sieve estimates. An intriguing direction for future research is to revisit the analysis in the critical regime $A \approx 1$, in particular the case $X = \log T$.
In this shorter polynomial regime one has
\[ \sigma^2 \asymp \log\log X = \log\log\log T + O(1), \]
so the admissible MGF/Chernoff radius becomes $|t| \ll 1/\log\log\log T$ rather than $1/\log\log T$. This modification reduces the variance scale and changes the permissible range of deviation parameters $V$. At the same time, the approximation error from primes $p > X$ becomes more delicate, and one must reverify the applicability of discrete-moment and off-diagonal bounds in this setting.
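To give a feel for the sizes involved, the short Python sketch below evaluates $\log\log X$ for $X = (\log T)^A$ at a sample height. The helper name `loglog_truncation` and the sample values of $T$ and $A$ are illustrative assumptions; the sketch computes only the quantity $\log\log X$ and makes no claim about the implied constants or the variance normalization used elsewhere in the paper.

```python
import math


def loglog_truncation(T, A):
    """Return log log X for the truncation length X = (log T)**A.

    Uses log X = A * log log T, so that log log X = log A + log log log T.
    Requires A * log log T > 0, which holds when T > e and A > 0.
    """
    log_X = A * math.log(math.log(T))
    if log_X <= 0:
        raise ValueError("need T > e and A > 0 so that log X > 0")
    return math.log(log_X)


if __name__ == "__main__":
    T = 1e100  # sample height, chosen only for illustration
    for A in (1.0, 4.0, 8.0):
        scale = loglog_truncation(T, A)
        # reciprocal of the scale, the shape of the radius quoted in the text
        print(f"A = {A}: log log X = {scale:.4f}, 1 / log log X = {1.0 / scale:.4f}")
```

For $A = 1$ the function returns exactly $\log\log\log T$, matching the display above up to the $O(1)$ term.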
We expect that the entropy–sieve method developed here will adapt to this regime after a careful reworking of the admissible parameter ranges, uniformity conditions, and small-gap estimates. A systematic treatment of the case $A \approx 1$ promises to sharpen constants and may lead to further refinements of negative moment bounds for $\zeta'(\tfrac12 + i\gamma)$. We plan to pursue this in a forthcoming study.

Disclosure Statement

The author(s) declare that no financial support or funding influenced the preparation of this work. All results and conclusions are based solely on the author(s)’ independent research.

Conflicts of Interest

The author(s) declare that there are no conflicts of interest regarding the publication of this article.

Appendix A. Computational Notebook and Numerical Experiments

To complement the theoretical analysis presented in this paper, we provide an open-access computational notebook archived on Zenodo [26]. The notebook implements a reproducible framework for computing the decay constants $c_1$ and $c_2$ associated with the pair-correlation of nontrivial zeros of the Riemann zeta function. These constants are extracted from the exponential sum
\[ A(u;T) = \frac{1}{N(T)} \sum_{0<\gamma\le T} e^{i\gamma u}, \]
where the ordinates $\gamma$ are the imaginary parts of zeta zeros up to height $T$.
The algorithm consists of the following steps:
  • Compute the first $M$ nontrivial zeros of $\zeta(s)$, up to height $T$.
  • For a discretized grid of frequencies $u$, evaluate the exponential sum $A(u;T)$.
  • Introduce thresholds $u_{\mathrm{thresh}} = (\log T)^{-c_1}$ for fixed constants $c_1 > 0$.
  • Measure the supremum $\sup_{|u| \ge u_{\mathrm{thresh}}} |A(u;T)|$.
  • Fit the decay law $\sup |A(u;T)| \approx (\log T)^{-\hat{c}_2}$ to estimate the constant $c_2$.
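The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the archived notebook: the function names, the upper grid limit `u_max`, the grid size, and the use of only the first ten zero ordinates (hard-coded to six decimals) are all choices made here for the example. A longer list of ordinates could be supplied from any table of zeros or from a library such as mpmath [25].

```python
import cmath
import math

# Ordinates of the first ten nontrivial zeros of zeta(s), rounded to six decimals.
FIRST_ZEROS = [14.134725, 21.022040, 25.010858, 30.424876, 32.935062,
               37.586178, 40.918719, 43.327073, 48.005151, 49.773832]


def exp_sum(u, gammas):
    """A(u; T) = (1/N) * sum over gamma of exp(i * gamma * u)."""
    return sum(cmath.exp(1j * g * u) for g in gammas) / len(gammas)


def threshold(T, c1):
    """u_thresh = (log T)^(-c1)."""
    return math.log(T) ** (-c1)


def decay_estimate(gammas, c1, u_max=5.0, grid=4000):
    """Return (u_thresh, sup |A|, c2_hat) for one list of ordinates.

    Assumptions (illustrative): the supremum over |u| >= u_thresh is
    approximated on a uniform grid over [u_thresh, u_max], and c2_hat is
    read off from sup |A| = (log T)^(-c2_hat) at the single height T.
    """
    T = max(gammas)
    u0 = threshold(T, c1)
    if u_max <= u0:
        raise ValueError("u_max must exceed the threshold")
    step = (u_max - u0) / (grid - 1)
    sup_abs = max(abs(exp_sum(u0 + k * step, gammas)) for k in range(grid))
    c2_hat = -math.log(sup_abs) / math.log(math.log(T))
    return u0, sup_abs, c2_hat


if __name__ == "__main__":
    for c1 in (0.6, 0.8, 1.0):
        u0, sup_abs, c2_hat = decay_estimate(FIRST_ZEROS, c1)
        print(f"c1 = {c1}: u_thresh = {u0:.3f}, sup|A| = {sup_abs:.3f}, c2_hat = {c2_hat:.3f}")
```

With only ten zeros the fitted exponent is not meaningful in itself; the point of the sketch is the structure of the computation. The threshold formula can be checked against Table 4: for $T = 236.52$ and $c_1 = 0.6$ it gives $u_{\mathrm{thresh}} \approx 0.361$.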
Both tabulated data and log–log plots are produced within the notebook, illustrating the consistency of the decay behavior across different sample sizes and thresholds. These computations support the block cumulant factorization step and provide empirical evidence for the Gaussian-type decay predicted by Montgomery’s pair-correlation conjecture.
The full notebook, including code, pseudocode, and generated figures, is permanently archived on Zenodo [26]. This ensures long-term reproducibility of the experiments and allows readers to extend the computations with larger datasets of zeta zeros.

References

  1. C. P. Hughes, J. P. Keating, and N. O’Connell. Random matrix theory and the derivative of the Riemann zeta function. Proc. Roy. Soc. Lond. A, 2611.
  2. S. M. Gonek. Mean values of the Riemann zeta function and its derivatives. Invent. Math.
  3. D. A. Hejhal. On the distribution of log|ζ′(1/2+iγ)|. In Number Theory, Trace Formulas, and Discrete Groups, pages 343–370. Academic Press, 1989.
  4. M. Kirila. An upper bound for discrete moments of the derivative of the Riemann zeta-function. Mathematika, 2020, 1–36.
  5. M. B. Milinovich and N. Ng. Lower bounds for moments of ζ′(ρ). International Mathematics Research Notices.
  6. H. M. Bui, A. Florea, and M. B. Milinovich. Negative discrete moments of the derivative of the Riemann zeta-function. Bulletin of the London Mathematical Society.
  7. A. J. Harper. Sharp conditional bounds for moments of the Riemann zeta function. Quarterly Journal of Mathematics, 2013.
  8. H. Davenport. Multiplicative Number Theory, 2000; 74.
  9. E. C. Titchmarsh. The Theory of the Riemann Zeta-Function, 1986.
  10. T. Tao. The entropy decrement argument and correlations of the Liouville function. Blog post and lecture notes, 2015.
  11. T. Tao. The entropy decrement method in analytic number theory. Lecture notes, 2018.
  12. S. Chatterjee. A short survey of Stein’s method and entropy in large deviations. Probability Surveys.
  13. K. Matomäki, M. Radziwiłł, and T. Tao. Sign patterns of the Liouville and Möbius functions. Forum of Mathematics, Sigma.
  14. K. Matomäki and M. Radziwiłł. Multiplicative functions in short intervals. Annals of Mathematics, 1015.
  15. T. Tao and J. Teräväinen. The structure of correlations of multiplicative functions at almost all scales, with applications to the Chowla and Elliott conjectures. Algebra & Number Theory, 2019.
  16. A. M. Odlyzko. The $10^{20}$-th zero of the Riemann zeta function and 70 million of its neighbors. Preprint, 1992.
  17. H. L. Montgomery. The pair correlation of the zeros of the zeta function. In Analytic Number Theory, Proc. Sympos. Pure Math. 24, pages 181–193. Amer. Math. Soc., 1973.
  18. H. M. Bui, A. Florea, and M. B. Milinovich. Negative discrete moments of the derivative of the Riemann zeta-function. Bulletin of the London Mathematical Society, 2024, 2680.
  19. J. Bourgain. On the correlation of the Möbius function with rank-one systems. Journal d’Analyse Mathématique, 2015; 36.
  20. H. Bui, A. Florea, and M. B. Milinovich. Negative discrete moments of the derivative of the Riemann zeta-function. Bulletin of the London Mathematical Society, 2680.
  21. A. M. Odlyzko. The $10^{20}$-th zero of the Riemann zeta function and 70 million of its neighbors. AT&T Bell Laboratories preprint, 1989.
  22. The LMFDB Collaboration. The L-functions and Modular Forms Database, http://www.lmfdb.org/zeta/.
  23. D. Comp. 85 (2016), 3009–3027.
  24. A. H. Barnett, J. Magland, and L. af Klinteberg. A parallel nonuniform fast Fourier transform library based on an “exponential of semicircle” kernel. SIAM J. Sci. Comput. 41 (2019), no. 5, C479–C504.
  25. F. Johansson et al. mpmath: a Python library for arbitrary-precision floating-point arithmetic, version 1.3.0 (2023), https://mpmath.org/.
  26. R. Zeraoulia. Computation of Pair-Correlation Decay Constants for Riemann Zeta Zeros. Zenodo (2025). Available at: https://zenodo. 1701.
  27. H. M. Bui and D. R. Heath-Brown. On simple zeros of the Riemann zeta-function. arXiv preprint (2013) (Theorem: at least 19/29 zeros are simple under RH).
  28. P. X. Gallagher and J. H. Mueller. Pair correlation and the simplicity of zeros of the Riemann zeta-function. J. Reine Angew. Math. 306 (1979), 136–146.
  29. D. R. Heath-Brown. Zero density estimates for the Riemann zeta-function and Dirichlet L-functions. J. London Math. Soc. (2) 32 (1985), 1–13.
  30. L.-P. Arguin, P. Bourgade, M. Radziwiłł, K. Soundararajan, and M. Belius. Maximum of the Riemann zeta function on a short interval of the critical line. Communications on Pure and Applied Mathematics, 2019.
Figure 1. Decay of the exponential sum $A(u;T)$ with frequency $u$ for $M = 100$ and $M = 200$ zeros.
Table 4. Numerical estimates of pair-correlation decay constants. Here $M$ is the number of zeros used, $T$ the height of the largest zero, $u_{\mathrm{thresh}} = (\log T)^{-c_1}$, and $\hat{c}_2$ the fitted exponent from $\sup_{\lvert u\rvert \ge u_{\mathrm{thresh}}} \lvert A(u;T)\rvert \approx (\log T)^{-\hat{c}_2}$.

| $M$ | $T$ | $N$ | $c_1$ | $u_{\mathrm{thresh}}$ | $\sup\lvert A(u;T)\rvert$ | $\hat{c}_2$ |
|---|---|---|---|---|---|---|
| 100 | 236.52 | 100 | 0.6 | 0.361 | 0.173 | 1.032 |
| 100 | 236.52 | 100 | 0.8 | 0.257 | 0.173 | 1.032 |
| 100 | 236.52 | 100 | 1.0 | 0.183 | 0.173 | 1.032 |
| 200 | 396.38 | 200 | 0.6 | 0.342 | 0.151 | 1.057 |
| 200 | 396.38 | 200 | 0.8 | 0.239 | 0.151 | 1.057 |
| 200 | 396.38 | 200 | 1.0 | 0.167 | 0.151 | 1.057 |
Table 6. Summary of tunable parameters in the entropy–sieve method.

| Param. | Role | Typical choice | Trade-off |
|---|---|---|---|
| $X = (\log T)^{A}$ | Truncation length | $A \approx 4$–$8$ (polylog) | Larger $A$: smaller remainder, harder moments |
| $\mathcal{E}_{\mathrm{app}}$ | Approx. failure set | $\lvert\mathcal{E}_{\mathrm{app}}\rvert \ll N(T)(\log T)^{-B}$ | Bigger $B$ ⇒ bigger $A$ |
| $\mathcal{E}_{\mathrm{ent}}$ | Low-entropy set | Block length $m \asymp (\log\log T)^{c}$ | Larger $m$: better entropy, costlier cumulants |
| $C$ | Remainder tolerance | $C = 1$–$3$ | Larger $C$: stronger control, bigger $A$ |
| $B$ | Power-saving exponent | $B = 5$–$10$ | Larger $B$: bigger $A$ or higher moments |
| $\alpha$ | Small-gap sieve rate | $\alpha = 1.1$–$2$ | Larger $\alpha$: faster decay, limited by MGF |
| $c_{\mathrm{MGF}}$ | MGF tail rate | $t_0 \asymp 1/\log\log T$, $c_{\mathrm{MGF}} \approx t_0/2$ | Fixed by $X$, controls linear tail |
| $m$ | Entropy block length | $m \to \infty$ slowly | Larger $m$: smaller entropy set, more cost |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.