Large Deviation Optimization

Finding the Optimal Portfolio Under Rare Event Constraints

Large deviation theory gives us a rigorous framework for quantifying the probability of rare but catastrophic events — and, crucially, for optimizing decisions so that those probabilities are minimized or controlled. In this article, we work through a concrete portfolio optimization problem under a large deviation constraint, implement it in Python, and visualize the results in both 2D and 3D.


Problem Setup

Consider a portfolio of $n = 2$ risky assets with log-returns $X_1, X_2$ that are i.i.d. over $T$ periods. We allocate fraction $w$ to asset 1 and $(1-w)$ to asset 2. The portfolio log-return per period is:

$$R(w) = w X_1 + (1-w) X_2$$

We model $X_i \sim \mathcal{N}(\mu_i, \sigma_i^2)$, so $R(w) \sim \mathcal{N}(\mu(w), \sigma^2(w))$ with:

$$\mu(w) = w\mu_1 + (1-w)\mu_2, \quad \sigma^2(w) = w^2\sigma_1^2 + (1-w)^2\sigma_2^2 + 2w(1-w)\rho\sigma_1\sigma_2$$

The Large Deviation Rate Function

By Cramér's theorem (the i.i.d. special case of the Gärtner–Ellis theorem), the probability that the sample mean of $T$ i.i.d. returns falls below a threshold $\ell$ decays exponentially:

$$\mathbb{P}\!\left(\frac{1}{T}\sum_{t=1}^T R_t \leq \ell\right) \approx e^{-T \cdot I(w, \ell)}$$

where $I(w, \ell)$ is the large deviation rate function. For a Gaussian $R(w)$:

$$I(w, \ell) = \frac{(\mu(w) - \ell)^2}{2\sigma^2(w)}, \quad \ell < \mu(w)$$

A larger $I$ means the bad event is exponentially less likely.
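Before optimizing anything, it is worth checking this approximation numerically. The sketch below uses illustrative numbers (a hypothetical single asset, not the portfolio parameters used later) and compares the large deviation estimate $e^{-TI}$ against the exact Gaussian probability of the sample mean falling below $\ell$; the two agree on the log-probability-per-period scale, which is exactly what the theory promises.

```python
import numpy as np
from scipy.stats import norm

# Illustrative numbers only (hypothetical asset, not the article's portfolio)
mu, sigma, ell, T = 0.08, 0.15, -0.02, 252

I = (mu - ell)**2 / (2 * sigma**2)      # Gaussian rate function
log_ld = -I                             # log of e^{-T I}, per period

# Exact: sample mean of T i.i.d. N(mu, sigma^2) is N(mu, sigma^2 / T)
log_exact = np.log(norm.cdf((ell - mu) * np.sqrt(T) / sigma)) / T

# Same exponential order; the Chernoff bound makes e^{-T I} an upper bound
print(log_ld, log_exact)
```

Note that the large deviation estimate only captures the exponential order: the sub-exponential prefactor (the Gaussian Mills ratio) is ignored, which is why the two values agree closely but not exactly.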

Optimization Problem

We solve two linked problems:

Problem 1 — Maximize the rate function (minimize crash probability):

$$\max_{w \in [0,1]} \; I(w, \ell)$$

Problem 2 — Mean-Rate frontier (Pareto tradeoff between expected return and safety):

$$\max_{w \in [0,1]} \; \lambda \cdot \mu(w) + (1-\lambda) \cdot I(w, \ell), \quad \lambda \in [0,1]$$


Python Implementation

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import minimize_scalar
from mpl_toolkits.mplot3d import Axes3D  # registers the 3d projection on older Matplotlib
import warnings
warnings.filterwarnings('ignore')

# ── Parameters ──────────────────────────────────────────────────────────────
mu1, mu2 = 0.10, 0.06    # expected returns
sig1, sig2 = 0.20, 0.12  # volatilities
rho = 0.30               # correlation
ell = -0.02              # loss threshold (sample mean below this = "crash")
T = 252                  # horizon (trading days)

# ── Portfolio statistics ─────────────────────────────────────────────────────
def port_stats(w):
    """Mean and variance of portfolio return as a function of weight w."""
    mu = w * mu1 + (1 - w) * mu2
    var = (w**2 * sig1**2
           + (1 - w)**2 * sig2**2
           + 2 * w * (1 - w) * rho * sig1 * sig2)
    return mu, var

# ── Large deviation rate function ────────────────────────────────────────────
def rate_function(w, threshold=ell):
    """
    Gaussian large deviation rate I(w, ell) = (mu - ell)^2 / (2*sigma^2).
    Returns 0 when threshold >= mu (no deviation to speak of).
    """
    mu, var = port_stats(w)
    if np.isscalar(w):
        if threshold >= mu:
            return 0.0
        return (mu - threshold)**2 / (2 * var)
    # vectorised branch
    result = np.zeros_like(w, dtype=float)
    valid = threshold < mu
    result[valid] = (mu[valid] - threshold)**2 / (2 * var[valid])
    return result

# ── Problem 1: maximise I(w, ell) over w ∈ [0,1] ────────────────────────────
res1 = minimize_scalar(lambda w: -rate_function(w),
                       bounds=(0.0, 1.0), method='bounded')
w_opt = res1.x
I_opt = rate_function(w_opt)
mu_opt, _ = port_stats(w_opt)

print("=" * 55)
print("Problem 1 — Maximise Large Deviation Rate")
print(f"  Optimal weight  w* = {w_opt:.4f}")
print(f"  Rate function  I*  = {I_opt:.6f}")
print(f"  Expected return    = {mu_opt:.4f}")
print(f"  Crash prob ≈ exp(-T·I*) = {np.exp(-T * I_opt):.6e}")
print("=" * 55)

# ── Problem 2: Mean-Rate Pareto frontier ─────────────────────────────────────
lambdas = np.linspace(0, 1, 200)
w_pareto = np.zeros(len(lambdas))
for i, lam in enumerate(lambdas):
    obj = lambda w: -(lam * port_stats(w)[0] + (1 - lam) * rate_function(w))
    r = minimize_scalar(obj, bounds=(0.0, 1.0), method='bounded')
    w_pareto[i] = r.x

mu_pareto = np.array([port_stats(w)[0] for w in w_pareto])
I_pareto = np.array([rate_function(w) for w in w_pareto])

# ── Grid for 3D surface ──────────────────────────────────────────────────────
w_grid = np.linspace(0, 1, 300)
ell_grid = np.linspace(-0.10, mu1 - 1e-4, 300)
W, ELL = np.meshgrid(w_grid, ell_grid)
MU, VAR = port_stats(W)
# Element-wise rate on grid
I_grid = np.where(ELL < MU, (MU - ELL)**2 / (2 * VAR), 0.0)

# ════════════════════════════════════════════════════════════════════════════
# Figure layout: 2 rows × 2 cols
# ════════════════════════════════════════════════════════════════════════════
fig = plt.figure(figsize=(18, 14))
fig.patch.set_facecolor('#0e0e0e')

# ── Panel 1: I(w) for fixed ell ──────────────────────────────────────────────
ax1 = fig.add_subplot(2, 2, 1)
ax1.set_facecolor('#1a1a2e')
w_arr = np.linspace(0, 1, 500)
I_arr = rate_function(w_arr)
ax1.plot(w_arr, I_arr, color='#00d4ff', lw=2.5, label=r'$I(w,\ell)$')
ax1.axvline(w_opt, color='#ff6b6b', lw=2, ls='--',
            label=rf'$w^*={w_opt:.3f}$')
ax1.scatter([w_opt], [I_opt], color='#ffd700', s=100, zorder=5)
ax1.set_xlabel('Portfolio weight $w$', color='white')
ax1.set_ylabel(r'Rate function $I(w,\ell)$', color='white')
ax1.set_title(rf'Rate Function vs Weight ($\ell={ell}$)', color='white', fontsize=12)
ax1.legend(facecolor='#333', labelcolor='white')
ax1.tick_params(colors='white')
for sp in ax1.spines.values():
    sp.set_color('#555')

# ── Panel 2: Crash probability exp(-T·I) ─────────────────────────────────────
ax2 = fig.add_subplot(2, 2, 2)
ax2.set_facecolor('#1a1a2e')
crash_prob = np.exp(-T * I_arr)
ax2.semilogy(w_arr, crash_prob, color='#ff9f43', lw=2.5,
             label=r'$e^{-T \cdot I(w,\ell)}$')
ax2.axvline(w_opt, color='#ff6b6b', lw=2, ls='--',
            label=rf'$w^*={w_opt:.3f}$')
ax2.set_xlabel('Portfolio weight $w$', color='white')
ax2.set_ylabel('Crash probability (log scale)', color='white')
ax2.set_title('Crash Probability vs Weight', color='white', fontsize=12)
ax2.legend(facecolor='#333', labelcolor='white')
ax2.tick_params(colors='white')
ax2.yaxis.set_tick_params(which='both', colors='white')
for sp in ax2.spines.values():
    sp.set_color('#555')

# ── Panel 3: Mean–Rate Pareto frontier ───────────────────────────────────────
ax3 = fig.add_subplot(2, 2, 3)
ax3.set_facecolor('#1a1a2e')
sc = ax3.scatter(mu_pareto, I_pareto, c=lambdas, cmap='plasma',
                 s=15, zorder=3)
ax3.scatter([mu_opt], [I_opt], color='#ffd700', s=120, zorder=5,
            label=r'Safety-optimal $w^*$')
cbar = plt.colorbar(sc, ax=ax3)
cbar.set_label(r'$\lambda$ (return weight)', color='white')
cbar.ax.yaxis.set_tick_params(color='white')
plt.setp(plt.getp(cbar.ax.axes, 'yticklabels'), color='white')
ax3.set_xlabel(r'Expected return $\mu(w)$', color='white')
ax3.set_ylabel(r'Rate function $I(w,\ell)$', color='white')
ax3.set_title('Mean – Rate Pareto Frontier', color='white', fontsize=12)
ax3.legend(facecolor='#333', labelcolor='white')
ax3.tick_params(colors='white')
for sp in ax3.spines.values():
    sp.set_color('#555')

# ── Panel 4: 3D surface I(w, ell) ────────────────────────────────────────────
ax4 = fig.add_subplot(2, 2, 4, projection='3d')
ax4.set_facecolor('#0e0e0e')
surf = ax4.plot_surface(W, ELL, I_grid,
                        cmap='viridis', alpha=0.88,
                        linewidth=0, antialiased=True)
# mark the fixed-ell slice used in Problems 1 and 2
I_opt_arr = rate_function(w_arr, threshold=ell)
ax4.plot(w_arr, np.full_like(w_arr, ell), I_opt_arr,
         color='#ff6b6b', lw=2.5, label=rf'$\ell={ell}$ slice')
ax4.scatter([w_opt], [ell], [I_opt],
            color='#ffd700', s=80, zorder=5)
fig.colorbar(surf, ax=ax4, shrink=0.5, label='I(w, ℓ)', pad=0.1)
ax4.set_xlabel('Weight $w$', color='white', labelpad=8)
ax4.set_ylabel('Threshold $\\ell$', color='white', labelpad=8)
ax4.set_zlabel('Rate $I$', color='white', labelpad=8)
ax4.set_title('3D Rate Function Surface', color='white', fontsize=12)
ax4.tick_params(colors='white')
ax4.xaxis.pane.fill = False
ax4.yaxis.pane.fill = False
ax4.zaxis.pane.fill = False

plt.suptitle('Large Deviation Optimization — Portfolio Safety Analysis',
             color='white', fontsize=15, fontweight='bold', y=1.01)
plt.tight_layout()
plt.savefig('large_deviation_portfolio.png', dpi=150,
            bbox_inches='tight', facecolor='#0e0e0e')
plt.show()
print("Figure saved.")
print("Figure saved.")

Execution Results

=======================================================
Problem 1 — Maximise Large Deviation Rate
  Optimal weight  w* = 0.3303
  Rate function  I*  = 0.310134
  Expected return    = 0.0732
  Crash prob ≈ exp(-T·I*) = 1.143447e-34
=======================================================

Figure saved.

Code Walkthrough

Asset and Portfolio Model

mu1, mu2 = 0.10, 0.06
sig1, sig2 = 0.20, 0.12
rho = 0.30
ell = -0.02
T = 252

Asset 1 offers higher return but higher volatility; asset 2 is the defensive position. The loss threshold $\ell = -0.02$ marks the event that the sample-mean log-return over the horizon falls below $-2\%$ — a severe drawdown. The horizon $T = 252$ is one trading year.

port_stats(w) returns the exact Gaussian mean and variance for any weight $w$ using the bilinear covariance formula. Broadcasting over NumPy arrays makes it vectorisation-ready at zero extra cost.
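The broadcasting claim is easy to verify. A minimal sketch reproducing port_stats with the article's parameters, called once with a scalar and once with an array:

```python
import numpy as np

mu1, mu2 = 0.10, 0.06
sig1, sig2 = 0.20, 0.12
rho = 0.30

def port_stats(w):
    """Portfolio mean and variance; works unchanged for scalars and arrays."""
    mu = w * mu1 + (1 - w) * mu2
    var = (w**2 * sig1**2 + (1 - w)**2 * sig2**2
           + 2 * w * (1 - w) * rho * sig1 * sig2)
    return mu, var

mu_half, var_half = port_stats(0.5)                       # scalar call
mu_vec, var_vec = port_stats(np.array([0.0, 0.5, 1.0]))   # vectorised call
print(mu_half, var_half)
```

At $w = 0.5$ the hand computation gives $\mu = 0.08$ and $\sigma^2 = 0.01 + 0.0036 + 0.0036 = 0.0172$, and the array call reproduces the scalar results element-wise.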

The Rate Function

$$I(w, \ell) = \frac{(\mu(w) - \ell)^2}{2\sigma^2(w)}$$

This is the negative log-probability per unit time of the sample mean falling to $\ell$. It is zero when $\ell \geq \mu(w)$ (the threshold is above the mean, so the “bad” event is typical rather than rare). The code guards against this case explicitly with np.where on the grid and a scalar branch in the function.
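For reference, the scalar/array split can be collapsed into a single branch-free expression with np.where; the rate_function_v below is a hypothetical variant, not the article's listing. This is safe here because the portfolio variance is strictly positive on $[0,1]$, so the discarded branch never divides by zero.

```python
import numpy as np

mu1, mu2, sig1, sig2, rho = 0.10, 0.06, 0.20, 0.12, 0.30

def rate_function_v(w, threshold=-0.02):
    """Branch-free variant (hypothetical): np.where covers scalars and arrays."""
    w = np.asarray(w, dtype=float)
    mu = w * mu1 + (1 - w) * mu2
    var = (w**2 * sig1**2 + (1 - w)**2 * sig2**2
           + 2 * w * (1 - w) * rho * sig1 * sig2)
    # Zero rate when the threshold sits at or above the mean (event not rare)
    return np.where(threshold < mu, (mu - threshold)**2 / (2 * var), 0.0)

print(rate_function_v(0.0), rate_function_v(0.5, threshold=0.2))
```

One caveat of this style: np.where evaluates both branches eagerly, so it only works when the "invalid" branch cannot raise (as here, where the denominator is always positive).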

Problem 1 — Rate Maximisation

minimize_scalar with method='bounded' runs SciPy's bounded scalar minimiser (golden-section search with successive parabolic interpolation) on $[0,1]$. We negate $I$ because SciPy only minimises. The result $w^*$ is the portfolio that makes a drawdown below $\ell$ exponentially least likely for a given horizon $T$.
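A cheap way to gain confidence in the bounded optimiser is a dense-grid cross-check. The self-contained sketch below inlines the rate function and compares the optimiser's $w^*$ against the argmax over 100,001 grid points; they should agree to within the optimiser's tolerance.

```python
import numpy as np
from scipy.optimize import minimize_scalar

mu1, mu2, sig1, sig2, rho, ell = 0.10, 0.06, 0.20, 0.12, 0.30, -0.02

def rate(w):
    mu = w * mu1 + (1 - w) * mu2
    var = (w**2 * sig1**2 + (1 - w)**2 * sig2**2
           + 2 * w * (1 - w) * rho * sig1 * sig2)
    return (mu - ell)**2 / (2 * var)

# Bounded 1-D optimisation (negated because SciPy minimises)
res = minimize_scalar(lambda w: -rate(w), bounds=(0.0, 1.0), method='bounded')
w_star = res.x

# Dense-grid cross-check: brute-force argmax over [0, 1]
w_grid = np.linspace(0.0, 1.0, 100_001)
w_grid_star = w_grid[np.argmax(rate(w_grid))]
print(w_star, w_grid_star)
```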

Problem 2 — Pareto Sweep

For 200 values of the trade-off weight $\lambda$ we solve:

$$\max_w \; \lambda \mu(w) + (1-\lambda) I(w,\ell)$$

Each call is a one-dimensional bounded minimisation — cheap enough to run in a tight loop without parallelisation. The result is the mean–rate efficient frontier: every portfolio on it is Pareto-optimal in the sense that you cannot improve both expected return and crash-rate simultaneously.
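The endpoints of the sweep behave as expected: $\lambda = 1$ ignores safety and goes all-in on the higher-return asset, while $\lambda = 0$ recovers the Problem 1 safety optimum. A self-contained sketch of a single sweep step (pareto_weight is a hypothetical helper wrapping the loop body):

```python
import numpy as np
from scipy.optimize import minimize_scalar

mu1, mu2, sig1, sig2, rho, ell = 0.10, 0.06, 0.20, 0.12, 0.30, -0.02

def port_mu(w):
    return w * mu1 + (1 - w) * mu2

def rate(w):
    var = (w**2 * sig1**2 + (1 - w)**2 * sig2**2
           + 2 * w * (1 - w) * rho * sig1 * sig2)
    return (port_mu(w) - ell)**2 / (2 * var)

def pareto_weight(lam):
    """One step of the sweep: optimal weight for trade-off parameter lam."""
    obj = lambda w: -(lam * port_mu(w) + (1 - lam) * rate(w))
    return minimize_scalar(obj, bounds=(0.0, 1.0), method='bounded').x

# lam=1: return only (all-in on asset 1); lam=0: safety only (Problem 1)
print(pareto_weight(1.0), pareto_weight(0.0))
```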


Visualisation Breakdown

Panel 1 — Rate Function vs Weight

The blue curve shows $I(w, \ell)$ as $w$ sweeps from 0 (all asset 2) to 1 (all asset 1). The function is unimodal: too little weight in asset 1 leaves $\mu(w)$ close to $\ell$, shrinking the numerator; too much weight bloats $\sigma^2(w)$, inflating the denominator. The gold dot marks the sweet spot $w^*$.

Panel 2 — Crash Probability (log scale)

The crash probability $e^{-T \cdot I(w,\ell)}$ plotted on a log scale makes the exponential sensitivity stark. Near $w = 0$ the crash probability is orders of magnitude larger than at $w^*$. This is the essence of large deviation theory: tiny differences in rate produce enormous differences in rare-event probability over long horizons.

Panel 3 — Mean–Rate Pareto Frontier

The scatter plot traces the full trade-off as $\lambda$ varies. Points coloured toward purple (low $\lambda$) sit high on the rate axis — maximum safety portfolios. Points coloured toward yellow (high $\lambda$) shift right toward higher expected return at the cost of a lower rate. The gold dot marks the pure safety optimum from Problem 1.

Panel 4 — 3D Rate Surface $I(w, \ell)$

The surface sweeps both $w \in [0,1]$ and the threshold $\ell \in [-0.10, \mu_1)$. The shape reveals two key features:

  • As $\ell \to \mu(w)$ from below, $I \to 0$: the “bad” event ceases to be rare.
  • As $\ell \to -\infty$, $I$ grows quadratically: extreme crashes become super-exponentially unlikely.

The red curve is the fixed-$\ell$ slice used in Problems 1 and 2, with the gold dot at the optimum. The viridis colormap makes the ridge of high safety visible across the full $(\ell, w)$ plane.
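Both limits can be checked numerically. A minimal sketch at a fixed weight, using the exact portfolio statistics $\mu = 0.08$, $\sigma^2 = 0.0172$ at $w = 0.5$ (computed from the article's parameters):

```python
import numpy as np

# Exact portfolio mean and variance at w = 0.5 for the article's parameters
mu, var = 0.08, 0.0172

def rate(ell):
    return np.where(ell < mu, (mu - ell)**2 / (2 * var), 0.0)

# Limit 1: as ell -> mu from below, the event stops being rare and I -> 0
near_mu = float(rate(mu - 1e-6))

# Limit 2: quadratic growth — doubling the distance mu - ell quadruples I
ratio = float(rate(mu - 0.2) / rate(mu - 0.1))
print(near_mu, ratio)
```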


Key Takeaways

The large deviation rate function $I(w, \ell)$ compresses all the probabilistic risk of a portfolio into a single, optimisation-friendly scalar. Maximising it is equivalent to minimising the exponential decay rate of crash probability — a much sharper criterion than variance minimisation (Markowitz) because it directly targets the tail. The Pareto frontier in Panel 3 shows that a pure safety portfolio and a pure return portfolio are generally different, and that the trade-off is smooth and navigable. For a practitioner, $w^*$ from Problem 1 is the starting point for any tail-risk-aware allocation.