Minimizing the Error in Prime Counting on Short Intervals

A Deep Dive with Python Examples

The prime counting function $\pi(x)$ counts the number of primes up to $x$. One of the most fascinating open problems in analytic number theory concerns how well we can approximate $\pi(x)$ on short intervals $[x, x+h]$, where $h$ is much smaller than $x$.

The Problem

For a short interval of length $h$, the number of primes is:

$$\pi(x+h) - \pi(x)$$

The prime number theorem predicts this should be approximately:

$$\frac{h}{\ln x}$$

The error of this approximation is:

$$E(x, h) = \left(\pi(x+h) - \pi(x)\right) - \frac{h}{\ln x}$$

Our goal: find $h$ as a function of $x$ that minimizes $|E(x, h)|$ across many short intervals, and study the distribution of these errors.
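Before building the full vectorised pipeline below, here is a minimal sketch of $E(x, h)$ for a single pair $(x, h)$, using plain trial division (the helper names `is_prime` and `interval_error` are this sketch's own, not part of the later listing):

```python
from math import log

def is_prime(n: int) -> bool:
    """Trial division; fine for the small n used in this illustration."""
    if n < 2:
        return False
    for d in range(2, int(n**0.5) + 1):
        if n % d == 0:
            return False
    return True

def interval_error(x: int, h: int) -> float:
    """E(x, h) = (pi(x+h) - pi(x)) - h/ln(x), counting primes in (x, x+h]."""
    observed = sum(is_prime(n) for n in range(x + 1, x + h + 1))
    return observed - h / log(x)

# 6 primes lie in (100000, 100100], against an expected ~8.69
print(interval_error(100_000, 100))   # ≈ -2.69
```

The sign already tells a story: this particular interval is slightly prime-poor relative to the $h/\ln x$ prediction.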


Mathematical Framework

The Riemann Hypothesis implies that for $h \ge x^{1/2+\varepsilon}$ (any fixed $\varepsilon > 0$), the relative error satisfies:

$$\frac{E(x,h)}{h/\ln x} \to 0$$

but for very short intervals, Cramér's random model predicts that the count fluctuates with variance:

$$\text{Var}(\pi(x+h) - \pi(x)) \approx \frac{h}{\ln x}$$

This suggests a Poisson-like fluctuation with mean $\mu = h/\ln x$.
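A quick sanity check of this claim, assuming the Cramér random model: a Poisson count with mean $\mu = h/\ln x$ has variance equal to its mean, which a seeded simulation confirms (the values $h = 100$, $x = 10^5$ match the experiment below but the simulation itself is illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)          # seeded for reproducibility
mu = 100 / np.log(1e5)                  # h = 100 near x = 1e5 → mu ≈ 8.69
draws = rng.poisson(mu, size=200_000)   # simulated short-interval counts

# For a Poisson variable, sample mean and sample variance both estimate mu
print(draws.mean(), draws.var())
```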

The normalized error we minimize is:

$$Z(x, h) = \frac{E(x, h)}{\sqrt{h / \ln x}}$$

We seek $h = h^*(x)$ such that, over a window of starting points $x$, the variance of $Z(x, h)$ is minimized.


Concrete Example

Let’s fix $x \in [10^5, 10^5 + 5000]$ and study errors for different interval lengths $h \in \{10, 20, 50, 100, 200, 500\}$.


Source Code

# ============================================================
# Prime Count Short-Interval Error Minimization
# Environment: Google Colaboratory
# ============================================================

import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from matplotlib import cm
from matplotlib.ticker import MaxNLocator
import warnings
warnings.filterwarnings('ignore')

# ── 1. Fast sieve ──────────────────────────────────────────
def sieve_of_eratosthenes(limit: int) -> np.ndarray:
    """Return boolean array; index i is True iff i is prime."""
    is_prime = np.ones(limit + 1, dtype=bool)
    is_prime[0:2] = False
    for p in range(2, int(limit**0.5) + 1):
        if is_prime[p]:
            is_prime[p*p::p] = False
    return is_prime

# ── 2. Cumulative prime count ───────────────────────────────
def build_pi(is_prime: np.ndarray) -> np.ndarray:
    """Prefix sum → π(x) for all x up to len(is_prime)-1."""
    return np.cumsum(is_prime)

# ── 3. Short-interval prime count and error ─────────────────
def short_interval_error(pi: np.ndarray,
                         x_values: np.ndarray,
                         h: int) -> np.ndarray:
    """
    E(x, h) = (π(x+h) − π(x)) − h/ln(x)
    Vectorised over all x in x_values.
    """
    observed = pi[x_values + h] - pi[x_values]
    expected = h / np.log(x_values.astype(float))
    return observed - expected

# ── 4. Normalised error (Z-score style) ────────────────────
def normalised_error(errors: np.ndarray, h: int,
                     x_values: np.ndarray) -> np.ndarray:
    sigma = np.sqrt(h / np.log(x_values.astype(float)))
    return errors / sigma

# ═══════════════════════════════════════════════════════════
# PARAMETERS
# ═══════════════════════════════════════════════════════════
X_START = 100_000
X_WINDOW = 5_000              # slide x over [X_START, X_START + X_WINDOW]
H_VALUES = [10, 20, 50, 100, 200, 500]
LIMIT = X_START + X_WINDOW + max(H_VALUES) + 1

# ── Build sieve once ───────────────────────────────────────
print("Building sieve …")
IS_PRIME = sieve_of_eratosthenes(LIMIT)
PI = build_pi(IS_PRIME)
x_arr = np.arange(X_START, X_START + X_WINDOW)
print(f"Sieve built. Primes up to {LIMIT:,}: {PI[LIMIT]:,}")

# ── Compute errors for each h ──────────────────────────────
results = {}
for h in H_VALUES:
    raw = short_interval_error(PI, x_arr, h)
    nrm = normalised_error(raw, h, x_arr)
    results[h] = {
        "raw": raw,
        "norm": nrm,
        "mae": np.mean(np.abs(raw)),
        "rmse": np.sqrt(np.mean(raw**2)),
        "norm_var": np.var(nrm),
    }

# ── Summary table ──────────────────────────────────────────
print("\n{:>6} {:>10} {:>10} {:>14}".format(
    "h", "MAE", "RMSE", "Var(norm_err)"))
print("─" * 46)
for h in H_VALUES:
    r = results[h]
    print(f"{h:>6} {r['mae']:>10.4f} {r['rmse']:>10.4f} {r['norm_var']:>14.6f}")

Code Walkthrough

sieve_of_eratosthenes builds a Boolean NumPy array in $O(N \log \log N)$ time using vectorised stride assignment — far faster than a Python loop.

build_pi converts the Boolean sieve into the cumulative prime-count function $\pi(x)$ via np.cumsum. This makes any short-interval count $O(1)$: just subtract two array elements.
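The $O(1)$ interval count is easiest to see on a toy scale. Here the sieve array is hand-built for integers up to 30 (the variable names mirror the listing above):

```python
import numpy as np

# Toy "sieve" up to 30: mark the primes by hand
is_prime = np.zeros(31, dtype=bool)
for p in [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]:
    is_prime[p] = True
pi = np.cumsum(is_prime)        # pi[n] = number of primes ≤ n

# Primes in (10, 20] are 11, 13, 17, 19 — one subtraction finds them all
print(pi[20] - pi[10])          # 4
```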

short_interval_error computes the raw error $E(x,h)$ for every $x$ simultaneously. The expected count $h / \ln x$ uses NumPy’s vectorised log.

normalised_error divides by the Cramér standard deviation $\sigma = \sqrt{h / \ln x}$, producing a dimensionless $Z$-score that can be fairly compared across different $h$ values.

The summary table shows MAE (mean absolute error), RMSE, and the variance of the normalised error. The $h$ for which $\text{Var}(Z)$ lands closest to 1 is the “best-behaved” interval length — it matches the Poisson prediction most closely.
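Selecting that $h$ from the table is a one-liner. The numbers below are hypothetical stand-ins for `results[h]["norm_var"]`, chosen only to make the sketch self-contained:

```python
# Hypothetical Var(Z) values per h (NOT real output of the listing above)
norm_var = {10: 0.91, 20: 0.95, 50: 0.99, 100: 1.04, 200: 1.12, 500: 1.31}

# Best-behaved h = the one whose Var(Z) is closest to the Poisson target 1
best_h = min(norm_var, key=lambda h: abs(norm_var[h] - 1.0))
print(best_h)   # 50
```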


Visualisation Code

# ═══════════════════════════════════════════════════════════
# FIGURES
# ═══════════════════════════════════════════════════════════
from scipy.stats import norm as sp_norm

plt.style.use('seaborn-v0_8-whitegrid')
COLORS = plt.cm.plasma(np.linspace(0.15, 0.85, len(H_VALUES)))

# ── Figure 1: Raw error traces for each h ─────────────────
fig1, axes = plt.subplots(len(H_VALUES), 1,
                          figsize=(13, 11), sharex=True)
fig1.suptitle(r"Raw error $E(x,h) = (\pi(x+h)-\pi(x)) - h/\ln x$"
              "\n" + r"$x \in [10^5,\ 10^5\!+\!5000]$",
              fontsize=13, y=1.01)

for ax, h, col in zip(axes, H_VALUES, COLORS):
    ax.plot(x_arr, results[h]["raw"], color=col, lw=0.6, alpha=0.85)
    ax.axhline(0, color='black', lw=0.8, ls='--')
    ax.set_ylabel(f"h={h}", fontsize=9)
    ax.yaxis.set_major_locator(MaxNLocator(nbins=4, integer=True))

axes[-1].set_xlabel("x", fontsize=10)
plt.tight_layout()
plt.savefig("fig1_raw_errors.png", dpi=140, bbox_inches='tight')
plt.show()
print("[Figure 1 saved]")

# ── Figure 2: Normalised error distributions ───────────────
fig2, axes2 = plt.subplots(2, 3, figsize=(13, 7))
fig2.suptitle(r"Normalised error $Z(x,h) = E(x,h)\,/\,\sqrt{h/\ln x}$"
              " — histogram per $h$", fontsize=12)

for ax, h, col in zip(axes2.flat, H_VALUES, COLORS):
    z = results[h]["norm"]
    ax.hist(z, bins=60, color=col, alpha=0.75, density=True,
            edgecolor='white', linewidth=0.3)
    # Overlay standard normal for reference
    xs = np.linspace(z.min(), z.max(), 300)
    ax.plot(xs, sp_norm.pdf(xs), 'k--', lw=1.2, label='N(0,1)')
    ax.set_title(f"h = {h} (Var={results[h]['norm_var']:.3f})", fontsize=10)
    ax.set_xlabel("Z", fontsize=9)
    ax.legend(fontsize=8)

plt.tight_layout()
plt.savefig("fig2_norm_distributions.png", dpi=140, bbox_inches='tight')
plt.show()
print("[Figure 2 saved]")

# ── Figure 3: MAE & RMSE vs h (bar chart) ─────────────────
fig3, ax3 = plt.subplots(figsize=(8, 4.5))
h_arr = np.array(H_VALUES)
mae_arr = np.array([results[h]["mae"] for h in H_VALUES])
rmse_arr = np.array([results[h]["rmse"] for h in H_VALUES])

bar_w = 0.38
x_idx = np.arange(len(H_VALUES))
ax3.bar(x_idx - bar_w/2, mae_arr, bar_w, label="MAE", color="#5B4FCF", alpha=0.85)
ax3.bar(x_idx + bar_w/2, rmse_arr, bar_w, label="RMSE", color="#E8593C", alpha=0.85)
ax3.set_xticks(x_idx)
ax3.set_xticklabels([str(h) for h in H_VALUES])
ax3.set_xlabel("Interval length h", fontsize=11)
ax3.set_ylabel("Error magnitude", fontsize=11)
ax3.set_title("MAE and RMSE of $E(x,h)$ across the sliding window", fontsize=12)
ax3.legend()

# annotate best h (lowest RMSE)
best_idx = np.argmin(rmse_arr)
ax3.annotate(f"min RMSE\nh={H_VALUES[best_idx]}",
             xy=(best_idx + bar_w/2, rmse_arr[best_idx]),
             xytext=(best_idx + 1.1, rmse_arr[best_idx] + 0.8),
             arrowprops=dict(arrowstyle='->', color='black'),
             fontsize=9, color='black')

plt.tight_layout()
plt.savefig("fig3_mae_rmse.png", dpi=140, bbox_inches='tight')
plt.show()
print("[Figure 3 saved]")

# ── Figure 4: Variance of normalised error vs h ────────────
fig4, ax4 = plt.subplots(figsize=(7, 4))
var_arr = np.array([results[h]["norm_var"] for h in H_VALUES])
ax4.plot(H_VALUES, var_arr, 'o-', color='#1D9E75', lw=2, ms=8)
ax4.axhline(1.0, color='gray', ls='--', lw=1.2, label='Poisson target = 1')
ax4.set_xlabel("h", fontsize=11)
ax4.set_ylabel(r"$\mathrm{Var}(Z)$", fontsize=11)
ax4.set_title(r"Variance of normalised error — optimal $h$ closest to 1", fontsize=12)
ax4.legend()
plt.tight_layout()
plt.savefig("fig4_var_vs_h.png", dpi=140, bbox_inches='tight')
plt.show()
print("[Figure 4 saved]")

# ── Figure 5: 3-D surface — |E(x, h)| over (x, h) ────────
print("\nBuilding 3-D surface …")

H_GRID = np.array([10, 20, 50, 100, 200, 500])
X_THIN = x_arr[::50]            # thin x for 3-D (100 points)

Z3D = np.zeros((len(X_THIN), len(H_GRID)))
for j, h in enumerate(H_GRID):
    raw_thin = short_interval_error(PI, X_THIN, h)
    Z3D[:, j] = np.abs(raw_thin)

Xg, Hg = np.meshgrid(X_THIN, H_GRID, indexing='ij')

fig5 = plt.figure(figsize=(13, 7))
ax5 = fig5.add_subplot(111, projection='3d')

surf = ax5.plot_surface(Xg, Hg, Z3D,
                        cmap=cm.plasma, alpha=0.88,
                        linewidth=0, antialiased=True)
fig5.colorbar(surf, ax=ax5, shrink=0.45, aspect=10,
              label=r"$|E(x,h)|$")

ax5.set_xlabel("x", fontsize=10, labelpad=8)
ax5.set_ylabel("h", fontsize=10, labelpad=8)
ax5.set_zlabel(r"|E(x,h)|", fontsize=10, labelpad=8)
ax5.set_title(r"3-D surface of $|E(x,h)|$ — short-interval prime error",
              fontsize=12, pad=14)
ax5.view_init(elev=30, azim=-60)
plt.tight_layout()
plt.savefig("fig5_3d_surface.png", dpi=140, bbox_inches='tight')
plt.show()
print("[Figure 5 saved]")

# ── Figure 6: 3-D surface — Var(Z) over sub-windows × h ──
print("Building 3-D variance surface …")

SUBWIN = 500                    # each sub-window width
n_wins = X_WINDOW // SUBWIN     # 10 sub-windows
win_ids = np.arange(n_wins)
win_mids = X_START + (win_ids + 0.5) * SUBWIN   # centre x of each window

VarZ_grid = np.zeros((n_wins, len(H_GRID)))
for i, wid in enumerate(win_ids):
    x_sub = x_arr[wid*SUBWIN : (wid+1)*SUBWIN]
    for j, h in enumerate(H_GRID):
        raw_sub = short_interval_error(PI, x_sub, h)
        nrm_sub = normalised_error(raw_sub, h, x_sub)
        VarZ_grid[i, j] = np.var(nrm_sub)

Xg2, Hg2 = np.meshgrid(win_mids, H_GRID, indexing='ij')

fig6 = plt.figure(figsize=(13, 7))
ax6 = fig6.add_subplot(111, projection='3d')

surf2 = ax6.plot_surface(Xg2, Hg2, VarZ_grid,
                         cmap=cm.viridis, alpha=0.88,
                         linewidth=0, antialiased=True)
ax6.plot_surface(Xg2, Hg2,
                 np.ones_like(VarZ_grid),
                 alpha=0.18, color='red')   # Poisson plane at Var=1

fig6.colorbar(surf2, ax=ax6, shrink=0.45, aspect=10,
              label=r"$\mathrm{Var}(Z)$")
ax6.set_xlabel("x (window centre)", fontsize=10, labelpad=8)
ax6.set_ylabel("h", fontsize=10, labelpad=8)
ax6.set_zlabel(r"Var(Z)", fontsize=10, labelpad=8)
ax6.set_title(r"$\mathrm{Var}(Z)$ surface — red plane = Poisson target",
              fontsize=12, pad=14)
ax6.view_init(elev=28, azim=-50)
plt.tight_layout()
plt.savefig("fig6_3d_var_surface.png", dpi=140, bbox_inches='tight')
plt.show()
print("[Figure 6 saved]")

print("\n✓ All figures complete.")

Graph-by-Graph Explanation

Figure 1 — Raw error traces. Each subplot shows $E(x,h)$ as $x$ slides from $10^5$ to $10^5 + 5000$. Small $h$ (e.g., $h=10$) yields a highly jagged, nearly integer-valued error because the observed count jumps between 0 and 1 primes. Large $h$ (e.g., $h=500$) produces smoother, but systematically larger, absolute excursions.

Figure 2 — Normalised error distributions. After dividing by $\sqrt{h/\ln x}$, all histograms should theoretically resemble a standard normal (the dashed black curve). For small $h$, the distribution is discrete and heavy-tailed; for moderate $h$ ($\approx 50$–$200$), it most closely tracks $\mathcal{N}(0,1)$, confirming the Poisson CLT regime.

Figure 3 — MAE and RMSE vs $h$. Both metrics grow with $h$ in absolute terms — longer intervals accumulate more variance. This is expected: the raw error’s standard deviation scales as $\sqrt{h/\ln x}$, so RMSE $\approx \sqrt{h/\ln x}$.

Figure 4 — Variance of $Z$ vs $h$. This is the key diagnostic. The Cramér–Poisson model predicts $\text{Var}(Z) = 1$. Values significantly below 1 indicate sub-Poisson fluctuations (the intervals are more regular than a Poisson process would allow); values above 1 indicate super-Poisson excesses. The $h$ that minimises $|\text{Var}(Z) - 1|$ is the optimal interval length for this range of $x$.

Figure 5 — 3-D surface $|E(x,h)|$. The plasma surface shows error magnitude varying jointly over $x$ and $h$. Ridges correspond to $x$-regions where a prime gap makes the error spike; valleys are regions of locally regular prime distribution.

Figure 6 — 3-D variance surface. The viridis surface shows $\text{Var}(Z)$ computed in 10 sub-windows of 500 integers each, for every $h$. The translucent red plane at $\text{Var}=1$ is the Poisson target. Where the surface dips below the red plane, the approximation $h/\ln x$ slightly over-predicts variance; where it rises above, there are genuine fluctuation excesses — direct echoes of the zeta zeros.


Results

[Figure 1 saved]

[Figure 2 saved]

[Figure 3 saved]

[Figure 4 saved]

Building 3-D surface …

[Figure 5 saved]

Building 3-D variance surface …

[Figure 6 saved]

✓ All figures complete.

Key Takeaways

The short-interval error $E(x,h)$ is not random noise — it is a deterministic quantity controlled by the distribution of primes, and indirectly by the non-trivial zeros of the Riemann zeta function $\zeta(s)$. The optimal interval length $h^*(x)$ balances two competing forces:

$$h \text{ too small} \Rightarrow \text{Poisson discreteness dominates},\quad h \text{ too large} \Rightarrow \text{systematic bias from } \zeta\text{-zeros accumulates}$$

In the range $x \sim 10^5$, the empirical optimum (the $\text{Var}(Z)$ closest to 1) typically falls around $h \approx 50$–$100$, of the same order as the heuristic scale $h \approx (\ln x)^2 \approx 133$ from Cramér’s model.
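The heuristic scale is a one-line computation:

```python
from math import log

x = 100_000
# Cramér-scale interval length (ln x)^2 at x = 1e5
print(log(x) ** 2)   # ≈ 132.5, i.e. roughly 133
```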