Large Deviation Optimization

Finding the Optimal Portfolio Under Rare Event Constraints

Large deviation theory gives us a rigorous framework for quantifying the probability of rare but catastrophic events — and, crucially, for optimizing decisions so that those probabilities are minimized or controlled. In this article, we work through a concrete portfolio optimization problem under a large deviation constraint, implement it in Python, and visualize the results in both 2D and 3D.


Problem Setup

Consider a portfolio of $n = 2$ risky assets with log-returns $X_1, X_2$ that are i.i.d. over $T$ periods. We allocate fraction $w$ to asset 1 and $(1-w)$ to asset 2. The portfolio log-return per period is:

$$R(w) = w X_1 + (1-w) X_2$$

We model $X_i \sim \mathcal{N}(\mu_i, \sigma_i^2)$, so $R(w) \sim \mathcal{N}(\mu(w), \sigma^2(w))$ with:

$$\mu(w) = w\mu_1 + (1-w)\mu_2, \quad \sigma^2(w) = w^2\sigma_1^2 + (1-w)^2\sigma_2^2 + 2w(1-w)\rho\sigma_1\sigma_2$$

The Large Deviation Rate Function

By Cramér's theorem (the i.i.d. special case of the Gärtner–Ellis theorem), the probability that the sample mean of $T$ i.i.d. returns falls below a threshold $\ell$ decays exponentially:

$$\mathbb{P}\!\left(\frac{1}{T}\sum_{t=1}^T R_t \leq \ell\right) \approx e^{-T \cdot I(w, \ell)}$$

where $I(w, \ell)$ is the large deviation rate function. For a Gaussian $R(w)$:

$$I(w, \ell) = \frac{(\mu(w) - \ell)^2}{2\sigma^2(w)}, \quad \ell < \mu(w)$$

A larger $I$ means the bad event is exponentially less likely.
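Before optimizing anything, it is worth checking this approximation numerically. The sketch below uses illustrative numbers (a hypothetical single asset, not the portfolio parameters used later) and compares the large deviation estimate $e^{-TI}$ against the exact Gaussian probability of the sample mean falling below $\ell$; the two agree on the log-probability-per-period scale, which is exactly what the theory promises.

```python
import numpy as np
from scipy.stats import norm

# Illustrative numbers only (hypothetical asset, not the article's portfolio)
mu, sigma, ell, T = 0.08, 0.15, -0.02, 252

I = (mu - ell)**2 / (2 * sigma**2)      # Gaussian rate function
log_ld = -I                             # log of e^{-T I}, per period

# Exact: sample mean of T i.i.d. N(mu, sigma^2) is N(mu, sigma^2 / T)
log_exact = np.log(norm.cdf((ell - mu) * np.sqrt(T) / sigma)) / T

# Same exponential order; the Chernoff bound makes e^{-T I} an upper bound
print(log_ld, log_exact)
```

Note that the large deviation estimate only captures the exponential order: the sub-exponential prefactor (the Gaussian Mills ratio) is ignored, which is why the two values agree closely but not exactly.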

Optimization Problem

We solve two linked problems:

Problem 1 — Maximize the rate function (minimize crash probability):

$$\max_{w \in [0,1]} \; I(w, \ell)$$

Problem 2 — Mean-Rate frontier (Pareto tradeoff between expected return and safety):

$$\max_{w \in [0,1]} \; \lambda \cdot \mu(w) + (1-\lambda) \cdot I(w, \ell), \quad \lambda \in [0,1]$$


Python Implementation

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import minimize_scalar
from mpl_toolkits.mplot3d import Axes3D  # registers the 3d projection on older Matplotlib
import warnings
warnings.filterwarnings('ignore')

# ── Parameters ──────────────────────────────────────────────────────────────
mu1, mu2 = 0.10, 0.06    # expected returns
sig1, sig2 = 0.20, 0.12  # volatilities
rho = 0.30               # correlation
ell = -0.02              # loss threshold (sample mean below this = "crash")
T = 252                  # horizon (trading days)

# ── Portfolio statistics ─────────────────────────────────────────────────────
def port_stats(w):
    """Mean and variance of portfolio return as a function of weight w."""
    mu = w * mu1 + (1 - w) * mu2
    var = (w**2 * sig1**2
           + (1 - w)**2 * sig2**2
           + 2 * w * (1 - w) * rho * sig1 * sig2)
    return mu, var

# ── Large deviation rate function ────────────────────────────────────────────
def rate_function(w, threshold=ell):
    """
    Gaussian large deviation rate I(w, ell) = (mu - ell)^2 / (2*sigma^2).
    Returns 0 when threshold >= mu (no deviation to speak of).
    """
    mu, var = port_stats(w)
    if np.isscalar(w):
        if threshold >= mu:
            return 0.0
        return (mu - threshold)**2 / (2 * var)
    # vectorised branch
    result = np.zeros_like(w, dtype=float)
    valid = threshold < mu
    result[valid] = (mu[valid] - threshold)**2 / (2 * var[valid])
    return result

# ── Problem 1: maximise I(w, ell) over w ∈ [0,1] ────────────────────────────
res1 = minimize_scalar(lambda w: -rate_function(w),
                       bounds=(0.0, 1.0), method='bounded')
w_opt = res1.x
I_opt = rate_function(w_opt)
mu_opt, _ = port_stats(w_opt)

print("=" * 55)
print("Problem 1 — Maximise Large Deviation Rate")
print(f"  Optimal weight  w* = {w_opt:.4f}")
print(f"  Rate function  I*  = {I_opt:.6f}")
print(f"  Expected return    = {mu_opt:.4f}")
print(f"  Crash prob ≈ exp(-T·I*) = {np.exp(-T * I_opt):.6e}")
print("=" * 55)

# ── Problem 2: Mean-Rate Pareto frontier ─────────────────────────────────────
lambdas = np.linspace(0, 1, 200)
w_pareto = np.zeros(len(lambdas))
for i, lam in enumerate(lambdas):
    obj = lambda w: -(lam * port_stats(w)[0] + (1 - lam) * rate_function(w))
    r = minimize_scalar(obj, bounds=(0.0, 1.0), method='bounded')
    w_pareto[i] = r.x

mu_pareto = np.array([port_stats(w)[0] for w in w_pareto])
I_pareto = np.array([rate_function(w) for w in w_pareto])

# ── Grid for 3D surface ──────────────────────────────────────────────────────
w_grid = np.linspace(0, 1, 300)
ell_grid = np.linspace(-0.10, mu1 - 1e-4, 300)
W, ELL = np.meshgrid(w_grid, ell_grid)
MU, VAR = port_stats(W)
# Element-wise rate on grid
I_grid = np.where(ELL < MU, (MU - ELL)**2 / (2 * VAR), 0.0)

# ════════════════════════════════════════════════════════════════════════════
# Figure layout: 2 rows × 2 cols
# ════════════════════════════════════════════════════════════════════════════
fig = plt.figure(figsize=(18, 14))
fig.patch.set_facecolor('#0e0e0e')

# ── Panel 1: I(w) for fixed ell ──────────────────────────────────────────────
ax1 = fig.add_subplot(2, 2, 1)
ax1.set_facecolor('#1a1a2e')
w_arr = np.linspace(0, 1, 500)
I_arr = rate_function(w_arr)
ax1.plot(w_arr, I_arr, color='#00d4ff', lw=2.5, label=r'$I(w,\ell)$')
ax1.axvline(w_opt, color='#ff6b6b', lw=2, ls='--',
            label=rf'$w^*={w_opt:.3f}$')
ax1.scatter([w_opt], [I_opt], color='#ffd700', s=100, zorder=5)
ax1.set_xlabel('Portfolio weight $w$', color='white')
ax1.set_ylabel(r'Rate function $I(w,\ell)$', color='white')
ax1.set_title(rf'Rate Function vs Weight ($\ell={ell}$)', color='white', fontsize=12)
ax1.legend(facecolor='#333', labelcolor='white')
ax1.tick_params(colors='white')
for sp in ax1.spines.values():
    sp.set_color('#555')

# ── Panel 2: Crash probability exp(-T·I) ─────────────────────────────────────
ax2 = fig.add_subplot(2, 2, 2)
ax2.set_facecolor('#1a1a2e')
crash_prob = np.exp(-T * I_arr)
ax2.semilogy(w_arr, crash_prob, color='#ff9f43', lw=2.5,
             label=r'$e^{-T \cdot I(w,\ell)}$')
ax2.axvline(w_opt, color='#ff6b6b', lw=2, ls='--',
            label=rf'$w^*={w_opt:.3f}$')
ax2.set_xlabel('Portfolio weight $w$', color='white')
ax2.set_ylabel('Crash probability (log scale)', color='white')
ax2.set_title('Crash Probability vs Weight', color='white', fontsize=12)
ax2.legend(facecolor='#333', labelcolor='white')
ax2.tick_params(colors='white')
ax2.yaxis.set_tick_params(which='both', colors='white')
for sp in ax2.spines.values():
    sp.set_color('#555')

# ── Panel 3: Mean–Rate Pareto frontier ───────────────────────────────────────
ax3 = fig.add_subplot(2, 2, 3)
ax3.set_facecolor('#1a1a2e')
sc = ax3.scatter(mu_pareto, I_pareto, c=lambdas, cmap='plasma',
                 s=15, zorder=3)
ax3.scatter([mu_opt], [I_opt], color='#ffd700', s=120, zorder=5,
            label=r'Safety-optimal $w^*$')
cbar = plt.colorbar(sc, ax=ax3)
cbar.set_label(r'$\lambda$ (return weight)', color='white')
cbar.ax.yaxis.set_tick_params(color='white')
plt.setp(plt.getp(cbar.ax.axes, 'yticklabels'), color='white')
ax3.set_xlabel(r'Expected return $\mu(w)$', color='white')
ax3.set_ylabel(r'Rate function $I(w,\ell)$', color='white')
ax3.set_title('Mean – Rate Pareto Frontier', color='white', fontsize=12)
ax3.legend(facecolor='#333', labelcolor='white')
ax3.tick_params(colors='white')
for sp in ax3.spines.values():
    sp.set_color('#555')

# ── Panel 4: 3D surface I(w, ell) ────────────────────────────────────────────
ax4 = fig.add_subplot(2, 2, 4, projection='3d')
ax4.set_facecolor('#0e0e0e')
surf = ax4.plot_surface(W, ELL, I_grid,
                        cmap='viridis', alpha=0.88,
                        linewidth=0, antialiased=True)
# mark the fixed-ell slice used in Problems 1 and 2
I_opt_arr = rate_function(w_arr, threshold=ell)
ax4.plot(w_arr, np.full_like(w_arr, ell), I_opt_arr,
         color='#ff6b6b', lw=2.5, label=rf'$\ell={ell}$ slice')
ax4.scatter([w_opt], [ell], [I_opt],
            color='#ffd700', s=80, zorder=5)
fig.colorbar(surf, ax=ax4, shrink=0.5, label='I(w, ℓ)', pad=0.1)
ax4.set_xlabel('Weight $w$', color='white', labelpad=8)
ax4.set_ylabel('Threshold $\\ell$', color='white', labelpad=8)
ax4.set_zlabel('Rate $I$', color='white', labelpad=8)
ax4.set_title('3D Rate Function Surface', color='white', fontsize=12)
ax4.tick_params(colors='white')
ax4.xaxis.pane.fill = False
ax4.yaxis.pane.fill = False
ax4.zaxis.pane.fill = False

plt.suptitle('Large Deviation Optimization — Portfolio Safety Analysis',
             color='white', fontsize=15, fontweight='bold', y=1.01)
plt.tight_layout()
plt.savefig('large_deviation_portfolio.png', dpi=150,
            bbox_inches='tight', facecolor='#0e0e0e')
plt.show()
print("Figure saved.")
print("Figure saved.")

Execution Results

=======================================================
Problem 1 — Maximise Large Deviation Rate
  Optimal weight  w* = 0.3303
  Rate function  I*  = 0.310134
  Expected return    = 0.0732
  Crash prob ≈ exp(-T·I*) = 1.143447e-34
=======================================================

Figure saved.

Code Walkthrough

Asset and Portfolio Model

mu1, mu2 = 0.10, 0.06
sig1, sig2 = 0.20, 0.12
rho = 0.30
ell = -0.02
T = 252

Asset 1 offers higher return but higher volatility; asset 2 is the defensive position. The loss threshold $\ell = -0.02$ marks the event that the sample-mean log-return over the horizon falls below $-2\%$ — a severe drawdown. The horizon $T = 252$ is one trading year.

port_stats(w) returns the exact Gaussian mean and variance for any weight $w$ using the bilinear covariance formula. Broadcasting over NumPy arrays makes it vectorisation-ready at zero extra cost.
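The broadcasting claim is easy to verify. A minimal sketch reproducing port_stats with the article's parameters, called once with a scalar and once with an array:

```python
import numpy as np

mu1, mu2 = 0.10, 0.06
sig1, sig2 = 0.20, 0.12
rho = 0.30

def port_stats(w):
    """Portfolio mean and variance; works unchanged for scalars and arrays."""
    mu = w * mu1 + (1 - w) * mu2
    var = (w**2 * sig1**2 + (1 - w)**2 * sig2**2
           + 2 * w * (1 - w) * rho * sig1 * sig2)
    return mu, var

mu_half, var_half = port_stats(0.5)                       # scalar call
mu_vec, var_vec = port_stats(np.array([0.0, 0.5, 1.0]))   # vectorised call
print(mu_half, var_half)
```

At $w = 0.5$ the hand computation gives $\mu = 0.08$ and $\sigma^2 = 0.01 + 0.0036 + 0.0036 = 0.0172$, and the array call reproduces the scalar results element-wise.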

The Rate Function

$$I(w, \ell) = \frac{(\mu(w) - \ell)^2}{2\sigma^2(w)}$$

This is the negative log-probability per unit time of the sample mean falling to $\ell$. It is zero when $\ell \geq \mu(w)$ (the threshold is above the mean, so the “bad” event is typical rather than rare). The code guards against this case explicitly with np.where on the grid and a scalar branch in the function.
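For reference, the scalar/array split can be collapsed into a single branch-free expression with np.where; the rate_function_v below is a hypothetical variant, not the article's listing. This is safe here because the portfolio variance is strictly positive on $[0,1]$, so the discarded branch never divides by zero.

```python
import numpy as np

mu1, mu2, sig1, sig2, rho = 0.10, 0.06, 0.20, 0.12, 0.30

def rate_function_v(w, threshold=-0.02):
    """Branch-free variant (hypothetical): np.where covers scalars and arrays."""
    w = np.asarray(w, dtype=float)
    mu = w * mu1 + (1 - w) * mu2
    var = (w**2 * sig1**2 + (1 - w)**2 * sig2**2
           + 2 * w * (1 - w) * rho * sig1 * sig2)
    # Zero rate when the threshold sits at or above the mean (event not rare)
    return np.where(threshold < mu, (mu - threshold)**2 / (2 * var), 0.0)

print(rate_function_v(0.0), rate_function_v(0.5, threshold=0.2))
```

One caveat of this style: np.where evaluates both branches eagerly, so it only works when the "invalid" branch cannot raise (as here, where the denominator is always positive).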

Problem 1 — Rate Maximisation

minimize_scalar with method='bounded' runs SciPy's bounded scalar minimiser (golden-section search with successive parabolic interpolation) on $[0,1]$. We negate $I$ because SciPy only minimises. The result $w^*$ is the portfolio that makes a drawdown below $\ell$ exponentially least likely for a given horizon $T$.
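A cheap way to gain confidence in the bounded optimiser is a dense-grid cross-check. The self-contained sketch below inlines the rate function and compares the optimiser's $w^*$ against the argmax over 100,001 grid points; they should agree to within the optimiser's tolerance.

```python
import numpy as np
from scipy.optimize import minimize_scalar

mu1, mu2, sig1, sig2, rho, ell = 0.10, 0.06, 0.20, 0.12, 0.30, -0.02

def rate(w):
    mu = w * mu1 + (1 - w) * mu2
    var = (w**2 * sig1**2 + (1 - w)**2 * sig2**2
           + 2 * w * (1 - w) * rho * sig1 * sig2)
    return (mu - ell)**2 / (2 * var)

# Bounded 1-D optimisation (negated because SciPy minimises)
res = minimize_scalar(lambda w: -rate(w), bounds=(0.0, 1.0), method='bounded')
w_star = res.x

# Dense-grid cross-check: brute-force argmax over [0, 1]
w_grid = np.linspace(0.0, 1.0, 100_001)
w_grid_star = w_grid[np.argmax(rate(w_grid))]
print(w_star, w_grid_star)
```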

Problem 2 — Pareto Sweep

For 200 values of the trade-off weight $\lambda$ we solve:

$$\max_w \; \lambda \mu(w) + (1-\lambda) I(w,\ell)$$

Each call is a one-dimensional bounded minimisation — cheap enough to run in a tight loop without parallelisation. The result is the mean–rate efficient frontier: every portfolio on it is Pareto-optimal in the sense that you cannot improve both expected return and crash-rate simultaneously.
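The endpoints of the sweep behave as expected: $\lambda = 1$ ignores safety and goes all-in on the higher-return asset, while $\lambda = 0$ recovers the Problem 1 safety optimum. A self-contained sketch of a single sweep step (pareto_weight is a hypothetical helper wrapping the loop body):

```python
import numpy as np
from scipy.optimize import minimize_scalar

mu1, mu2, sig1, sig2, rho, ell = 0.10, 0.06, 0.20, 0.12, 0.30, -0.02

def port_mu(w):
    return w * mu1 + (1 - w) * mu2

def rate(w):
    var = (w**2 * sig1**2 + (1 - w)**2 * sig2**2
           + 2 * w * (1 - w) * rho * sig1 * sig2)
    return (port_mu(w) - ell)**2 / (2 * var)

def pareto_weight(lam):
    """One step of the sweep: optimal weight for trade-off parameter lam."""
    obj = lambda w: -(lam * port_mu(w) + (1 - lam) * rate(w))
    return minimize_scalar(obj, bounds=(0.0, 1.0), method='bounded').x

# lam=1: return only (all-in on asset 1); lam=0: safety only (Problem 1)
print(pareto_weight(1.0), pareto_weight(0.0))
```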


Visualisation Breakdown

Panel 1 — Rate Function vs Weight

The blue curve shows $I(w, \ell)$ as $w$ sweeps from 0 (all asset 2) to 1 (all asset 1). The function is unimodal: too little weight in asset 1 leaves $\mu(w)$ close to $\ell$, shrinking the numerator; too much weight bloats $\sigma^2(w)$, inflating the denominator. The gold dot marks the sweet spot $w^*$.

Panel 2 — Crash Probability (log scale)

The crash probability $e^{-T \cdot I(w,\ell)}$ plotted on a log scale makes the exponential sensitivity stark. Near $w = 0$ the crash probability is orders of magnitude larger than at $w^*$. This is the essence of large deviation theory: tiny differences in rate produce enormous differences in rare-event probability over long horizons.

Panel 3 — Mean–Rate Pareto Frontier

The scatter plot traces the full trade-off as $\lambda$ varies. Points coloured toward purple (low $\lambda$) sit high on the rate axis — maximum safety portfolios. Points coloured toward yellow (high $\lambda$) shift right toward higher expected return at the cost of a lower rate. The gold dot marks the pure safety optimum from Problem 1.

Panel 4 — 3D Rate Surface $I(w, \ell)$

The surface sweeps both $w \in [0,1]$ and the threshold $\ell \in [-0.10, \mu_1)$. The shape reveals two key features:

  • As $\ell \to \mu(w)$ from below, $I \to 0$: the “bad” event ceases to be rare.
  • As $\ell \to -\infty$, $I$ grows quadratically: extreme crashes become super-exponentially unlikely.

The red curve is the fixed-$\ell$ slice used in Problems 1 and 2, with the gold dot at the optimum. The viridis colormap makes the ridge of high safety visible across the full $(\ell, w)$ plane.
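Both limits can be checked numerically. A minimal sketch at a fixed weight, using the exact portfolio statistics $\mu = 0.08$, $\sigma^2 = 0.0172$ at $w = 0.5$ (computed from the article's parameters):

```python
import numpy as np

# Exact portfolio mean and variance at w = 0.5 for the article's parameters
mu, var = 0.08, 0.0172

def rate(ell):
    return np.where(ell < mu, (mu - ell)**2 / (2 * var), 0.0)

# Limit 1: as ell -> mu from below, the event stops being rare and I -> 0
near_mu = float(rate(mu - 1e-6))

# Limit 2: quadratic growth — doubling the distance mu - ell quadruples I
ratio = float(rate(mu - 0.2) / rate(mu - 0.1))
print(near_mu, ratio)
```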


Key Takeaways

The large deviation rate function $I(w, \ell)$ compresses all the probabilistic risk of a portfolio into a single, optimisation-friendly scalar. Maximising it is equivalent to minimising the exponential decay rate of crash probability — a much sharper criterion than variance minimisation (Markowitz) because it directly targets the tail. The Pareto frontier in Panel 3 shows that a pure safety portfolio and a pure return portfolio are generally different, and that the trade-off is smooth and navigable. For a practitioner, $w^*$ from Problem 1 is the starting point for any tail-risk-aware allocation.