Minimizing the Error in Prime Counting on Short Intervals

A Deep Dive with Python Examples

The prime counting function $\pi(x)$ counts the number of primes up to $x$. One of the most fascinating open problems in analytic number theory concerns how well we can approximate $\pi(x)$ on short intervals $[x, x+h]$, where $h$ is much smaller than $x$.

The Problem

For a short interval of length $h$, the number of primes is:

$$\pi(x+h) - \pi(x)$$

The prime number theorem predicts this should be approximately:

$$\frac{h}{\ln x}$$

The error of this approximation is:

$$E(x, h) = \left(\pi(x+h) - \pi(x)\right) - \frac{h}{\ln x}$$

Our goal: find $h$ as a function of $x$ that minimizes $|E(x, h)|$ across many short intervals, and study the distribution of these errors.
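Before building the full vectorised pipeline below, here is a minimal sketch of $E(x, h)$ for a single pair $(x, h)$, using plain trial division (the helper names `is_prime` and `interval_error` are this sketch's own, not part of the later listing):

```python
from math import log

def is_prime(n: int) -> bool:
    """Trial division; fine for the small n used in this illustration."""
    if n < 2:
        return False
    for d in range(2, int(n**0.5) + 1):
        if n % d == 0:
            return False
    return True

def interval_error(x: int, h: int) -> float:
    """E(x, h) = (pi(x+h) - pi(x)) - h/ln(x), counting primes in (x, x+h]."""
    observed = sum(is_prime(n) for n in range(x + 1, x + h + 1))
    return observed - h / log(x)

# 6 primes lie in (100000, 100100], against an expected ~8.69
print(interval_error(100_000, 100))   # ≈ -2.69
```

The sign already tells a story: this particular interval is slightly prime-poor relative to the $h/\ln x$ prediction.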


Mathematical Framework

The Riemann Hypothesis implies that for $h \ge x^{1/2+\varepsilon}$ (any fixed $\varepsilon > 0$), the relative error satisfies:

$$\frac{E(x,h)}{h/\ln x} \to 0$$

but for very short intervals, Cramér's random model predicts that the count fluctuates with variance:

$$\text{Var}(\pi(x+h) - \pi(x)) \approx \frac{h}{\ln x}$$

This suggests a Poisson-like fluctuation with mean $\mu = h/\ln x$.
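A quick sanity check of this claim, assuming the Cramér random model: a Poisson count with mean $\mu = h/\ln x$ has variance equal to its mean, which a seeded simulation confirms (the values $h = 100$, $x = 10^5$ match the experiment below but the simulation itself is illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)          # seeded for reproducibility
mu = 100 / np.log(1e5)                  # h = 100 near x = 1e5 → mu ≈ 8.69
draws = rng.poisson(mu, size=200_000)   # simulated short-interval counts

# For a Poisson variable, sample mean and sample variance both estimate mu
print(draws.mean(), draws.var())
```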

The normalized error we minimize is:

$$Z(x, h) = \frac{E(x, h)}{\sqrt{h / \ln x}}$$

We seek $h = h^*(x)$ such that, over a window of starting points $x$, the variance of $Z(x, h)$ is minimized.


Concrete Example

Let’s fix $x \in [10^5, 10^5 + 5000]$ and study errors for different interval lengths $h \in \{10, 20, 50, 100, 200, 500\}$.


Source Code

# ============================================================
# Prime Count Short-Interval Error Minimization
# Environment: Google Colaboratory
# ============================================================

import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from matplotlib import cm
from matplotlib.ticker import MaxNLocator
import warnings
warnings.filterwarnings('ignore')

# ── 1. Fast sieve ──────────────────────────────────────────
def sieve_of_eratosthenes(limit: int) -> np.ndarray:
    """Return boolean array; index i is True iff i is prime."""
    is_prime = np.ones(limit + 1, dtype=bool)
    is_prime[0:2] = False
    for p in range(2, int(limit**0.5) + 1):
        if is_prime[p]:
            is_prime[p*p::p] = False
    return is_prime

# ── 2. Cumulative prime count ───────────────────────────────
def build_pi(is_prime: np.ndarray) -> np.ndarray:
    """Prefix sum → π(x) for all x up to len(is_prime)-1."""
    return np.cumsum(is_prime)

# ── 3. Short-interval prime count and error ─────────────────
def short_interval_error(pi: np.ndarray,
                         x_values: np.ndarray,
                         h: int) -> np.ndarray:
    """
    E(x, h) = (π(x+h) − π(x)) − h/ln(x)
    Vectorised over all x in x_values.
    """
    observed = pi[x_values + h] - pi[x_values]
    expected = h / np.log(x_values.astype(float))
    return observed - expected

# ── 4. Normalised error (Z-score style) ────────────────────
def normalised_error(errors: np.ndarray, h: int,
                     x_values: np.ndarray) -> np.ndarray:
    sigma = np.sqrt(h / np.log(x_values.astype(float)))
    return errors / sigma

# ═══════════════════════════════════════════════════════════
# PARAMETERS
# ═══════════════════════════════════════════════════════════
X_START = 100_000
X_WINDOW = 5_000              # slide x over [X_START, X_START + X_WINDOW]
H_VALUES = [10, 20, 50, 100, 200, 500]
LIMIT = X_START + X_WINDOW + max(H_VALUES) + 1

# ── Build sieve once ───────────────────────────────────────
print("Building sieve …")
IS_PRIME = sieve_of_eratosthenes(LIMIT)
PI = build_pi(IS_PRIME)
x_arr = np.arange(X_START, X_START + X_WINDOW)
print(f"Sieve built. Primes up to {LIMIT:,}: {PI[LIMIT]:,}")

# ── Compute errors for each h ──────────────────────────────
results = {}
for h in H_VALUES:
    raw = short_interval_error(PI, x_arr, h)
    nrm = normalised_error(raw, h, x_arr)
    results[h] = {
        "raw": raw,
        "norm": nrm,
        "mae": np.mean(np.abs(raw)),
        "rmse": np.sqrt(np.mean(raw**2)),
        "norm_var": np.var(nrm),
    }

# ── Summary table ──────────────────────────────────────────
print("\n{:>6} {:>10} {:>10} {:>14}".format(
    "h", "MAE", "RMSE", "Var(norm_err)"))
print("─" * 46)
for h in H_VALUES:
    r = results[h]
    print(f"{h:>6} {r['mae']:>10.4f} {r['rmse']:>10.4f} {r['norm_var']:>14.6f}")

Code Walkthrough

sieve_of_eratosthenes builds a Boolean NumPy array in $O(N \log \log N)$ time using vectorised stride assignment — far faster than a Python loop.

build_pi converts the Boolean sieve into the cumulative prime-count function $\pi(x)$ via np.cumsum. This makes any short-interval count $O(1)$: just subtract two array elements.
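The $O(1)$ interval count is easiest to see on a toy scale. Here the sieve array is hand-built for integers up to 30 (the variable names mirror the listing above):

```python
import numpy as np

# Toy "sieve" up to 30: mark the primes by hand
is_prime = np.zeros(31, dtype=bool)
for p in [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]:
    is_prime[p] = True
pi = np.cumsum(is_prime)        # pi[n] = number of primes ≤ n

# Primes in (10, 20] are 11, 13, 17, 19 — one subtraction finds them all
print(pi[20] - pi[10])          # 4
```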

short_interval_error computes the raw error $E(x,h)$ for every $x$ simultaneously. The expected count $h / \ln x$ uses NumPy’s vectorised log.

normalised_error divides by the Cramér standard deviation $\sigma = \sqrt{h / \ln x}$, producing a dimensionless $Z$-score that can be fairly compared across different $h$ values.

The summary table shows MAE (mean absolute error), RMSE, and the variance of the normalised error. The $h$ for which $\text{Var}(Z)$ lands closest to 1 is the “best-behaved” interval length — it matches the Poisson prediction most closely.
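Selecting that $h$ from the table is a one-liner. The numbers below are hypothetical stand-ins for `results[h]["norm_var"]`, chosen only to make the sketch self-contained:

```python
# Hypothetical Var(Z) values per h (NOT real output of the listing above)
norm_var = {10: 0.91, 20: 0.95, 50: 0.99, 100: 1.04, 200: 1.12, 500: 1.31}

# Best-behaved h = the one whose Var(Z) is closest to the Poisson target 1
best_h = min(norm_var, key=lambda h: abs(norm_var[h] - 1.0))
print(best_h)   # 50
```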


Visualisation Code

# ═══════════════════════════════════════════════════════════
# FIGURES
# ═══════════════════════════════════════════════════════════
from scipy.stats import norm as sp_norm

plt.style.use('seaborn-v0_8-whitegrid')
COLORS = plt.cm.plasma(np.linspace(0.15, 0.85, len(H_VALUES)))

# ── Figure 1: Raw error traces for each h ─────────────────
fig1, axes = plt.subplots(len(H_VALUES), 1,
                          figsize=(13, 11), sharex=True)
fig1.suptitle(r"Raw error $E(x,h) = (\pi(x+h)-\pi(x)) - h/\ln x$"
              "\n" + r"$x \in [10^5,\ 10^5\!+\!5000]$",
              fontsize=13, y=1.01)

for ax, h, col in zip(axes, H_VALUES, COLORS):
    ax.plot(x_arr, results[h]["raw"], color=col, lw=0.6, alpha=0.85)
    ax.axhline(0, color='black', lw=0.8, ls='--')
    ax.set_ylabel(f"h={h}", fontsize=9)
    ax.yaxis.set_major_locator(MaxNLocator(nbins=4, integer=True))

axes[-1].set_xlabel("x", fontsize=10)
plt.tight_layout()
plt.savefig("fig1_raw_errors.png", dpi=140, bbox_inches='tight')
plt.show()
print("[Figure 1 saved]")

# ── Figure 2: Normalised error distributions ───────────────
fig2, axes2 = plt.subplots(2, 3, figsize=(13, 7))
fig2.suptitle(r"Normalised error $Z(x,h) = E(x,h)\,/\,\sqrt{h/\ln x}$"
              " — histogram per $h$", fontsize=12)

for ax, h, col in zip(axes2.flat, H_VALUES, COLORS):
    z = results[h]["norm"]
    ax.hist(z, bins=60, color=col, alpha=0.75, density=True,
            edgecolor='white', linewidth=0.3)
    # Overlay standard normal for reference
    xs = np.linspace(z.min(), z.max(), 300)
    ax.plot(xs, sp_norm.pdf(xs), 'k--', lw=1.2, label='N(0,1)')
    ax.set_title(f"h = {h} (Var={results[h]['norm_var']:.3f})", fontsize=10)
    ax.set_xlabel("Z", fontsize=9)
    ax.legend(fontsize=8)

plt.tight_layout()
plt.savefig("fig2_norm_distributions.png", dpi=140, bbox_inches='tight')
plt.show()
print("[Figure 2 saved]")

# ── Figure 3: MAE & RMSE vs h (bar chart) ─────────────────
fig3, ax3 = plt.subplots(figsize=(8, 4.5))
h_arr = np.array(H_VALUES)
mae_arr = np.array([results[h]["mae"] for h in H_VALUES])
rmse_arr = np.array([results[h]["rmse"] for h in H_VALUES])

bar_w = 0.38
x_idx = np.arange(len(H_VALUES))
ax3.bar(x_idx - bar_w/2, mae_arr, bar_w, label="MAE", color="#5B4FCF", alpha=0.85)
ax3.bar(x_idx + bar_w/2, rmse_arr, bar_w, label="RMSE", color="#E8593C", alpha=0.85)
ax3.set_xticks(x_idx)
ax3.set_xticklabels([str(h) for h in H_VALUES])
ax3.set_xlabel("Interval length h", fontsize=11)
ax3.set_ylabel("Error magnitude", fontsize=11)
ax3.set_title("MAE and RMSE of $E(x,h)$ across the sliding window", fontsize=12)
ax3.legend()

# annotate best h (lowest RMSE)
best_idx = np.argmin(rmse_arr)
ax3.annotate(f"min RMSE\nh={H_VALUES[best_idx]}",
             xy=(best_idx + bar_w/2, rmse_arr[best_idx]),
             xytext=(best_idx + 1.1, rmse_arr[best_idx] + 0.8),
             arrowprops=dict(arrowstyle='->', color='black'),
             fontsize=9, color='black')

plt.tight_layout()
plt.savefig("fig3_mae_rmse.png", dpi=140, bbox_inches='tight')
plt.show()
print("[Figure 3 saved]")

# ── Figure 4: Variance of normalised error vs h ────────────
fig4, ax4 = plt.subplots(figsize=(7, 4))
var_arr = np.array([results[h]["norm_var"] for h in H_VALUES])
ax4.plot(H_VALUES, var_arr, 'o-', color='#1D9E75', lw=2, ms=8)
ax4.axhline(1.0, color='gray', ls='--', lw=1.2, label='Poisson target = 1')
ax4.set_xlabel("h", fontsize=11)
ax4.set_ylabel(r"$\mathrm{Var}(Z)$", fontsize=11)
ax4.set_title(r"Variance of normalised error — optimal $h$ closest to 1", fontsize=12)
ax4.legend()
plt.tight_layout()
plt.savefig("fig4_var_vs_h.png", dpi=140, bbox_inches='tight')
plt.show()
print("[Figure 4 saved]")

# ── Figure 5: 3-D surface — |E(x, h)| over (x, h) ────────
print("\nBuilding 3-D surface …")

H_GRID = np.array([10, 20, 50, 100, 200, 500])
X_THIN = x_arr[::50]            # thin x for 3-D (100 points)

Z3D = np.zeros((len(X_THIN), len(H_GRID)))
for j, h in enumerate(H_GRID):
    raw_thin = short_interval_error(PI, X_THIN, h)
    Z3D[:, j] = np.abs(raw_thin)

Xg, Hg = np.meshgrid(X_THIN, H_GRID, indexing='ij')

fig5 = plt.figure(figsize=(13, 7))
ax5 = fig5.add_subplot(111, projection='3d')

surf = ax5.plot_surface(Xg, Hg, Z3D,
                        cmap=cm.plasma, alpha=0.88,
                        linewidth=0, antialiased=True)
fig5.colorbar(surf, ax=ax5, shrink=0.45, aspect=10,
              label=r"$|E(x,h)|$")

ax5.set_xlabel("x", fontsize=10, labelpad=8)
ax5.set_ylabel("h", fontsize=10, labelpad=8)
ax5.set_zlabel(r"|E(x,h)|", fontsize=10, labelpad=8)
ax5.set_title(r"3-D surface of $|E(x,h)|$ — short-interval prime error",
              fontsize=12, pad=14)
ax5.view_init(elev=30, azim=-60)
plt.tight_layout()
plt.savefig("fig5_3d_surface.png", dpi=140, bbox_inches='tight')
plt.show()
print("[Figure 5 saved]")

# ── Figure 6: 3-D surface — Var(Z) over sub-windows × h ──
print("Building 3-D variance surface …")

SUBWIN = 500                    # each sub-window width
n_wins = X_WINDOW // SUBWIN     # 10 sub-windows
win_ids = np.arange(n_wins)
win_mids = X_START + (win_ids + 0.5) * SUBWIN   # centre x of each window

VarZ_grid = np.zeros((n_wins, len(H_GRID)))
for i, wid in enumerate(win_ids):
    x_sub = x_arr[wid*SUBWIN : (wid+1)*SUBWIN]
    for j, h in enumerate(H_GRID):
        raw_sub = short_interval_error(PI, x_sub, h)
        nrm_sub = normalised_error(raw_sub, h, x_sub)
        VarZ_grid[i, j] = np.var(nrm_sub)

Xg2, Hg2 = np.meshgrid(win_mids, H_GRID, indexing='ij')

fig6 = plt.figure(figsize=(13, 7))
ax6 = fig6.add_subplot(111, projection='3d')

surf2 = ax6.plot_surface(Xg2, Hg2, VarZ_grid,
                         cmap=cm.viridis, alpha=0.88,
                         linewidth=0, antialiased=True)
ax6.plot_surface(Xg2, Hg2,
                 np.ones_like(VarZ_grid),
                 alpha=0.18, color='red')   # Poisson plane at Var=1

fig6.colorbar(surf2, ax=ax6, shrink=0.45, aspect=10,
              label=r"$\mathrm{Var}(Z)$")
ax6.set_xlabel("x (window centre)", fontsize=10, labelpad=8)
ax6.set_ylabel("h", fontsize=10, labelpad=8)
ax6.set_zlabel(r"Var(Z)", fontsize=10, labelpad=8)
ax6.set_title(r"$\mathrm{Var}(Z)$ surface — red plane = Poisson target",
              fontsize=12, pad=14)
ax6.view_init(elev=28, azim=-50)
plt.tight_layout()
plt.savefig("fig6_3d_var_surface.png", dpi=140, bbox_inches='tight')
plt.show()
print("[Figure 6 saved]")

print("\n✓ All figures complete.")

Graph-by-Graph Explanation

Figure 1 — Raw error traces. Each subplot shows $E(x,h)$ as $x$ slides from $10^5$ to $10^5 + 5000$. Small $h$ (e.g., $h=10$) yields a highly jagged, nearly integer-valued error because the observed count jumps between 0 and 1 primes. Large $h$ (e.g., $h=500$) produces smoother, but systematically larger, absolute excursions.

Figure 2 — Normalised error distributions. After dividing by $\sqrt{h/\ln x}$, all histograms should theoretically resemble a standard normal (the dashed black curve). For small $h$, the distribution is discrete and heavy-tailed; for moderate $h$ ($\approx 50$–$200$), it most closely tracks $\mathcal{N}(0,1)$, confirming the Poisson CLT regime.

Figure 3 — MAE and RMSE vs $h$. Both metrics grow with $h$ in absolute terms — longer intervals accumulate more variance. This is expected: the raw error’s standard deviation scales as $\sqrt{h/\ln x}$, so RMSE $\approx \sqrt{h/\ln x}$.

Figure 4 — Variance of $Z$ vs $h$. This is the key diagnostic. The Cramér–Poisson model predicts $\text{Var}(Z) = 1$. Values significantly below 1 indicate sub-Poisson fluctuations (the intervals are more regular than a Poisson process would allow); values above 1 indicate super-Poisson excesses. The $h$ that minimises $|\text{Var}(Z) - 1|$ is the optimal interval length for this range of $x$.

Figure 5 — 3-D surface $|E(x,h)|$. The plasma surface shows error magnitude varying jointly over $x$ and $h$. Ridges correspond to $x$-regions where a prime gap makes the error spike; valleys are regions of locally regular prime distribution.

Figure 6 — 3-D variance surface. The viridis surface shows $\text{Var}(Z)$ computed in 10 sub-windows of 500 integers each, for every $h$. The translucent red plane at $\text{Var}=1$ is the Poisson target. Where the surface dips below the red plane, the approximation $h/\ln x$ slightly over-predicts variance; where it rises above, there are genuine fluctuation excesses — direct echoes of the zeta zeros.


Results

[Figure 1 saved]

[Figure 2 saved]

[Figure 3 saved]

[Figure 4 saved]

Building 3-D surface …

[Figure 5 saved]

Building 3-D variance surface …

[Figure 6 saved]

✓ All figures complete.

Key Takeaways

The short-interval error $E(x,h)$ is not random noise — it is a deterministic quantity controlled by the distribution of primes, and indirectly by the non-trivial zeros of the Riemann zeta function $\zeta(s)$. The optimal interval length $h^*(x)$ balances two competing forces:

$$h \text{ too small} \Rightarrow \text{Poisson discreteness dominates},\quad h \text{ too large} \Rightarrow \text{systematic bias from } \zeta\text{-zeros accumulates}$$

In the range $x \sim 10^5$, the empirical optimum (the $\text{Var}(Z)$ closest to 1) typically falls around $h \approx 50$–$100$, of the same order as the heuristic scale $h \approx (\ln x)^2 \approx 133$ from Cramér’s model.
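The heuristic scale is a one-line computation:

```python
from math import log

x = 100_000
# Cramér-scale interval length (ln x)^2 at x = 1e5
print(log(x) ** 2)   # ≈ 132.5, i.e. roughly 133
```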