Optimizing Authentication Score Thresholds

Balancing Convenience vs. Security

When designing a user authentication system, one of the most critical — and often underappreciated — decisions is where to set the decision threshold. Set it too high, and legitimate users get locked out. Set it too low, and attackers slip through. This is the classic convenience vs. security trade-off, and today we’ll solve it with a concrete example using Python.


The Problem Setup

Imagine a bank’s login system that computes a risk score (0–1) for each login attempt based on factors like device fingerprint, location, typing speed, and time of day. We need to find the optimal threshold $\theta$ that separates legitimate users from attackers.

We define:

  • False Acceptance Rate (FAR): the probability that an attacker is incorrectly accepted
    $$\text{FAR}(\theta) = P(\text{score} \geq \theta \mid \text{attacker})$$

  • False Rejection Rate (FRR): the probability that a legitimate user is incorrectly rejected
    $$\text{FRR}(\theta) = P(\text{score} < \theta \mid \text{legitimate})$$

  • Equal Error Rate (EER): the point where $\text{FAR} = \text{FRR}$

The total cost we want to minimize is:

$$C(\theta) = w_s \cdot \text{FAR}(\theta) + w_c \cdot \text{FRR}(\theta)$$

where $w_s$ is the weight for security (cost of accepting an attacker) and $w_c$ is the weight for convenience (cost of rejecting a legitimate user).


The Full Python Code

import numpy as np
import matplotlib.pyplot as plt

# ── 1. Reproducibility ──────────────────────────────────────────────────────
np.random.seed(42)

# ── 2. Simulate authentication score distributions ──────────────────────────
N = 10_000

# Legitimate users: high scores (mean=0.75, std=0.10)
legit_scores = np.clip(np.random.normal(0.75, 0.10, N), 0, 1)

# Attackers: lower scores (mean=0.40, std=0.12)
attack_scores = np.clip(np.random.normal(0.40, 0.12, N), 0, 1)

thresholds = np.linspace(0, 1, 500)

# ── 3. FAR / FRR as functions of threshold ──────────────────────────────────
FAR = np.array([np.mean(attack_scores >= t) for t in thresholds])
FRR = np.array([np.mean(legit_scores < t) for t in thresholds])

# ── 4. Equal Error Rate (EER) ────────────────────────────────────────────────
eer_idx = np.argmin(np.abs(FAR - FRR))
eer_value = (FAR[eer_idx] + FRR[eer_idx]) / 2
eer_theta = thresholds[eer_idx]

# ── 5. Cost-based optimal threshold ─────────────────────────────────────────
def total_cost(theta, w_s=0.7, w_c=0.3):
    far = np.mean(attack_scores >= theta)
    frr = np.mean(legit_scores < theta)
    return w_s * far + w_c * frr

costs_default = np.array([total_cost(t) for t in thresholds])
opt_idx_default = np.argmin(costs_default)
opt_theta_default = thresholds[opt_idx_default]

# ── 6. Figure 1 – Score distributions ────────────────────────────────────────
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
fig.suptitle("User Authentication Score Threshold Optimization", fontsize=15, fontweight='bold')

ax = axes[0]
ax.hist(legit_scores, bins=60, alpha=0.6, color='steelblue', label='Legitimate Users', density=True)
ax.hist(attack_scores, bins=60, alpha=0.6, color='tomato', label='Attackers', density=True)
ax.axvline(eer_theta, color='purple', lw=2, ls='--', label=f'EER θ = {eer_theta:.3f}')
ax.axvline(opt_theta_default, color='green', lw=2, ls='-', label=f'Cost-Opt θ = {opt_theta_default:.3f}')
ax.set_xlabel('Authentication Score', fontsize=12)
ax.set_ylabel('Density', fontsize=12)
ax.set_title('Score Distributions (Legitimate vs Attacker)', fontsize=12)
ax.legend()
ax.grid(alpha=0.3)

# ── 7. Figure 1 right – FAR / FRR / Cost curves ──────────────────────────────
ax2 = axes[1]
ax2.plot(thresholds, FAR, color='tomato', lw=2, label='FAR (Security Risk)')
ax2.plot(thresholds, FRR, color='steelblue', lw=2, label='FRR (User Friction)')
ax2.plot(thresholds, costs_default, color='darkorange', lw=2, label='Total Cost (ws=0.7, wc=0.3)', ls='-.')
ax2.axvline(eer_theta, color='purple', lw=1.5, ls='--', label=f'EER θ={eer_theta:.3f} ({eer_value:.3f})')
ax2.axvline(opt_theta_default, color='green', lw=1.5, ls='-', label=f'Opt θ={opt_theta_default:.3f}')
ax2.scatter([eer_theta], [eer_value], color='purple', zorder=5, s=80)
ax2.scatter([opt_theta_default], [costs_default[opt_idx_default]], color='green', zorder=5, s=80)
ax2.set_xlabel('Threshold θ', fontsize=12)
ax2.set_ylabel('Rate / Cost', fontsize=12)
ax2.set_title('FAR, FRR & Total Cost vs Threshold', fontsize=12)
ax2.legend(fontsize=9)
ax2.grid(alpha=0.3)

plt.tight_layout()
plt.savefig('auth_threshold_1.png', dpi=150, bbox_inches='tight')
plt.show()
print("Figure 1 saved.")

# ── 8. Figure 2 – ROC curve ───────────────────────────────────────────────────
TAR = 1 - FRR # True Acceptance Rate

fig2, ax3 = plt.subplots(figsize=(7, 6))
sc = ax3.scatter(FAR, TAR, c=thresholds, cmap='viridis', s=8, zorder=2)
plt.colorbar(sc, ax=ax3, label='Threshold θ')
ax3.plot([0, 1], [0, 1], 'k--', lw=1, alpha=0.5, label='Random Classifier')
ax3.scatter([FAR[eer_idx]], [TAR[eer_idx]],
            color='purple', s=120, zorder=5, label=f'EER θ={eer_theta:.3f}')
ax3.scatter([FAR[opt_idx_default]], [TAR[opt_idx_default]],
            color='green', s=120, zorder=5, label=f'Cost-Opt θ={opt_theta_default:.3f}')
ax3.set_xlabel('FAR (False Acceptance Rate)', fontsize=12)
ax3.set_ylabel('TAR (True Acceptance Rate)', fontsize=12)
ax3.set_title('ROC Curve for Authentication System', fontsize=13, fontweight='bold')
ax3.legend()
ax3.grid(alpha=0.3)
plt.tight_layout()
plt.savefig('auth_threshold_2.png', dpi=150, bbox_inches='tight')
plt.show()
print("Figure 2 saved.")

# ── 9. Figure 3 – 3D Cost Surface over (ws, wc, theta) ───────────────────────
ws_vals = np.linspace(0.1, 0.9, 60)
theta_vals = np.linspace(0.3, 0.8, 60)
WS, TH = np.meshgrid(ws_vals, theta_vals)
COST_3D = np.zeros_like(WS)

for i in range(WS.shape[0]):
    for j in range(WS.shape[1]):
        ws = WS[i, j]
        wc = 1 - ws
        th = TH[i, j]
        COST_3D[i, j] = ws * np.mean(attack_scores >= th) + wc * np.mean(legit_scores < th)

fig3 = plt.figure(figsize=(13, 6))

ax4 = fig3.add_subplot(121, projection='3d')
surf = ax4.plot_surface(WS, TH, COST_3D, cmap='plasma', alpha=0.85, linewidth=0)
fig3.colorbar(surf, ax=ax4, shrink=0.5, label='Total Cost')
ax4.set_xlabel('Security Weight wₛ', fontsize=10)
ax4.set_ylabel('Threshold θ', fontsize=10)
ax4.set_zlabel('Total Cost', fontsize=10)
ax4.set_title('3D Cost Surface\n(wₛ + wc = 1)', fontsize=11, fontweight='bold')
ax4.view_init(elev=30, azim=-60)

# Optimal ridge
opt_thetas_3d = []
for ws in ws_vals:
    wc = 1 - ws
    c = ws * np.array([np.mean(attack_scores >= t) for t in theta_vals]) \
        + wc * np.array([np.mean(legit_scores < t) for t in theta_vals])
    opt_thetas_3d.append(theta_vals[np.argmin(c)])

opt_costs_3d = [ws * np.mean(attack_scores >= ot) + (1 - ws) * np.mean(legit_scores < ot)
                for ws, ot in zip(ws_vals, opt_thetas_3d)]
ax4.plot(ws_vals, opt_thetas_3d, opt_costs_3d,
         color='cyan', lw=3, zorder=5, label='Optimal Ridge')
ax4.legend(fontsize=9)

# Right panel – optimal theta vs ws
ax5 = fig3.add_subplot(122)
ax5.plot(ws_vals, opt_thetas_3d, color='darkorange', lw=2.5)
ax5.axvline(0.7, color='green', ls='--', lw=1.5, label='ws=0.7 (default)')
ax5.set_xlabel('Security Weight wₛ', fontsize=12)
ax5.set_ylabel('Optimal Threshold θ*', fontsize=12)
ax5.set_title('Optimal Threshold vs Security Weight', fontsize=12, fontweight='bold')
ax5.legend()
ax5.grid(alpha=0.3)

plt.tight_layout()
plt.savefig('auth_threshold_3.png', dpi=150, bbox_inches='tight')
plt.show()
print("Figure 3 saved.")

# ── 10. Figure 4 – Scenario comparison (High-Security vs Balanced vs User-Friendly) ──
scenarios = {
    'High-Security\n(ws=0.9, wc=0.1)': (0.9, 0.1),
    'Balanced\n(ws=0.7, wc=0.3)': (0.7, 0.3),
    'User-Friendly\n(ws=0.3, wc=0.7)': (0.3, 0.7),
}

fig4, axes4 = plt.subplots(1, 3, figsize=(15, 5), sharey=False)
fig4.suptitle('Threshold Policy Comparison Across Scenarios', fontsize=14, fontweight='bold')

colors_sc = ['tomato', 'darkorange', 'steelblue']

for ax_s, (label, (ws, wc)), col in zip(axes4, scenarios.items(), colors_sc):
    c = ws * FAR + wc * FRR
    idx = np.argmin(c)
    opt_t = thresholds[idx]
    far_opt = FAR[idx]
    frr_opt = FRR[idx]

    ax_s.plot(thresholds, FAR, color='tomato', lw=1.8, label='FAR')
    ax_s.plot(thresholds, FRR, color='steelblue', lw=1.8, label='FRR')
    ax_s.plot(thresholds, c, color=col, lw=2, ls='-.', label='Cost')
    ax_s.axvline(opt_t, color='black', lw=2, ls='--',
                 label=f'θ*={opt_t:.3f}\nFAR={far_opt:.3f}\nFRR={frr_opt:.3f}')
    ax_s.fill_betweenx([0, 1], opt_t - 0.005, opt_t + 0.005, color='black', alpha=0.15)
    ax_s.set_title(label, fontsize=11)
    ax_s.set_xlabel('Threshold θ', fontsize=10)
    ax_s.set_ylabel('Rate', fontsize=10)
    ax_s.legend(fontsize=8)
    ax_s.set_ylim(0, 1)
    ax_s.grid(alpha=0.3)

plt.tight_layout()
plt.savefig('auth_threshold_4.png', dpi=150, bbox_inches='tight')
plt.show()
print("Figure 4 saved.")

# ── 11. Summary table ─────────────────────────────────────────────────────────
print("\n" + "=" * 58)
print(f"{'Scenario':<28} {'θ*':>6} {'FAR':>7} {'FRR':>7}")
print("=" * 58)
for label, (ws, wc) in scenarios.items():
    c = ws * FAR + wc * FRR
    idx = np.argmin(c)
    lbl = label.replace('\n', ' ')
    print(f"{lbl:<28} {thresholds[idx]:>6.3f} {FAR[idx]:>7.4f} {FRR[idx]:>7.4f}")
print(f"\nEER θ = {eer_theta:.3f} | EER value = {eer_value:.4f}")
print("=" * 58)

Code Walkthrough

Step 1 — Simulating Score Distributions

legit_scores  = np.clip(np.random.normal(0.75, 0.10, N), 0, 1)
attack_scores = np.clip(np.random.normal(0.40, 0.12, N), 0, 1)

We model authentication scores as Gaussian distributions:

  • Legitimate users cluster around $\mu = 0.75$ — they match expected behavior patterns
  • Attackers cluster around $\mu = 0.40$ — their behavior is anomalous

np.clip ensures all scores stay in $[0, 1]$. With 10,000 samples per class, the distributions are smooth and statistically stable.
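Because both populations are (clipped) Gaussians, the empirical rates can be sanity-checked against the analytic normal CDFs. The sketch below re-creates the scores so it runs standalone; it ignores the clipping at 0 and 1, which is a good approximation here since both means sit several standard deviations inside the interval:

```python
import numpy as np
from scipy.stats import norm

np.random.seed(42)
N = 10_000
legit_scores = np.clip(np.random.normal(0.75, 0.10, N), 0, 1)
attack_scores = np.clip(np.random.normal(0.40, 0.12, N), 0, 1)

theta = 0.6
# Empirical rates from the simulated samples
far_emp = np.mean(attack_scores >= theta)
frr_emp = np.mean(legit_scores < theta)

# Analytic rates from the unclipped Gaussians
far_ana = norm.sf(theta, loc=0.40, scale=0.12)   # P(score >= θ | attacker)
frr_ana = norm.cdf(theta, loc=0.75, scale=0.10)  # P(score < θ | legitimate)

print(f"FAR at θ=0.6: empirical {far_emp:.4f}, analytic {far_ana:.4f}")
print(f"FRR at θ=0.6: empirical {frr_emp:.4f}, analytic {frr_ana:.4f}")
```

With 10,000 samples the empirical and analytic values should agree to about two decimal places.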


Step 2 — Computing FAR and FRR

FAR = np.array([np.mean(attack_scores >= t) for t in thresholds])
FRR = np.array([np.mean(legit_scores < t) for t in thresholds])

For each of 500 candidate thresholds:

$$\text{FAR}(\theta) = \frac{|\{x \in \text{attackers} : x \geq \theta\}|}{N}$$

$$\text{FRR}(\theta) = \frac{|\{x \in \text{legit} : x < \theta\}|}{N}$$

As $\theta$ increases, FAR decreases (fewer attackers pass) but FRR increases (more legitimate users are blocked). This is the fundamental trade-off.
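The list comprehensions are O(T·N): each of the 500 thresholds rescans both score arrays. For larger datasets, a sorted-array approach with np.searchsorted computes the same curves in O((N + T) log N). This is an optional optimization, not part of the code above; the snippet re-creates the data so it runs standalone:

```python
import numpy as np

np.random.seed(42)
N = 10_000
legit_scores = np.clip(np.random.normal(0.75, 0.10, N), 0, 1)
attack_scores = np.clip(np.random.normal(0.40, 0.12, N), 0, 1)
thresholds = np.linspace(0, 1, 500)

# Baseline: one pass over the data per threshold
FAR_loop = np.array([np.mean(attack_scores >= t) for t in thresholds])
FRR_loop = np.array([np.mean(legit_scores < t) for t in thresholds])

# Vectorized: searchsorted(..., side='left') counts samples strictly below t
attack_sorted = np.sort(attack_scores)
legit_sorted = np.sort(legit_scores)
FAR_fast = 1 - np.searchsorted(attack_sorted, thresholds, side='left') / N
FRR_fast = np.searchsorted(legit_sorted, thresholds, side='left') / N

print(np.allclose(FAR_loop, FAR_fast), np.allclose(FRR_loop, FRR_fast))
```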


Step 3 — Finding the EER

eer_idx   = np.argmin(np.abs(FAR - FRR))
eer_value = (FAR[eer_idx] + FRR[eer_idx]) / 2

The Equal Error Rate is the threshold where both error types are equal:

$$\theta_{\text{EER}} = \arg\min_\theta |\text{FAR}(\theta) - \text{FRR}(\theta)|$$

EER is a standard single-number summary of system quality — lower is better. It tells you the best you can do when you treat both errors as equally costly.
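One caveat: on a grid of 500 thresholds the two curves rarely cross exactly, so argmin picks the nearest grid point and FAR and FRR differ slightly there (hence the averaging). For a sub-grid estimate, you can interpolate linearly across the sign change of FAR − FRR. A sketch of this refinement, re-creating the data so it runs standalone:

```python
import numpy as np

np.random.seed(42)
N = 10_000
legit_scores = np.clip(np.random.normal(0.75, 0.10, N), 0, 1)
attack_scores = np.clip(np.random.normal(0.40, 0.12, N), 0, 1)
thresholds = np.linspace(0, 1, 500)
FAR = np.array([np.mean(attack_scores >= t) for t in thresholds])
FRR = np.array([np.mean(legit_scores < t) for t in thresholds])

diff = FAR - FRR            # positive at low thresholds, negative at high ones
i = np.argmax(diff <= 0)    # first grid point past the crossing
t0, t1 = thresholds[i - 1], thresholds[i]
d0, d1 = diff[i - 1], diff[i]
eer_theta = t0 + (t1 - t0) * d0 / (d0 - d1)   # linear interpolation of the root
eer_value = np.interp(eer_theta, thresholds, (FAR + FRR) / 2)

print(f"Interpolated EER: θ = {eer_theta:.4f}, rate ≈ {eer_value:.4f}")
```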


Step 4 — Cost-Weighted Optimization

def total_cost(theta, w_s=0.7, w_c=0.3):
    far = np.mean(attack_scores >= theta)
    frr = np.mean(legit_scores < theta)
    return w_s * far + w_c * frr

In the real world, errors are not equally costly. For a banking app:

  • Accepting an attacker ($w_s = 0.7$) is far more damaging than annoying a user
  • Rejecting a legitimate user ($w_c = 0.3$) causes friction but not financial harm

The optimal threshold is:

$$\theta^* = \arg\min_\theta \left[ w_s \cdot \text{FAR}(\theta) + w_c \cdot \text{FRR}(\theta) \right]$$
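The grid search is perfectly adequate here, but the empirical cost is a step function, so scalar optimizers work better on a smooth surrogate: the same cost built from the analytic Gaussian CDFs (again ignoring the clipping). A sketch using scipy.optimize.minimize_scalar, with the distribution parameters taken from the simulation above:

```python
from scipy.stats import norm
from scipy.optimize import minimize_scalar

def analytic_cost(theta, w_s=0.7, w_c=0.3):
    far = norm.sf(theta, loc=0.40, scale=0.12)   # P(score >= θ | attacker)
    frr = norm.cdf(theta, loc=0.75, scale=0.10)  # P(score < θ | legitimate)
    return w_s * far + w_c * frr

res = minimize_scalar(analytic_cost, bounds=(0, 1), method='bounded')
print(f"Analytic θ* = {res.x:.4f}, cost = {res.fun:.4f}")
```

The analytic optimum should land close to the empirical grid-search value, since the clipping has almost no effect in this region.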


Step 5 — 3D Cost Surface

for i in range(WS.shape[0]):
    for j in range(WS.shape[1]):
        ws = WS[i, j]; wc = 1 - ws; th = TH[i, j]
        COST_3D[i, j] = ws * np.mean(attack_scores >= th) + wc * np.mean(legit_scores < th)

We sweep over a grid of $w_s \in [0.1, 0.9]$ and $\theta \in [0.3, 0.8]$ with $w_c = 1 - w_s$. This creates a cost landscape that reveals how the optimal threshold shifts as the organization’s security priorities change.
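A note on efficiency: the nested loop recomputes np.mean(attack_scores >= th) for every (i, j), even though FAR and FRR depend only on θ. Precomputing them once per θ and broadcasting over the weight axis is equivalent and much faster. This rewrite is a suggestion, not part of the original code; the data is re-created so it runs standalone:

```python
import numpy as np

np.random.seed(42)
N = 10_000
legit_scores = np.clip(np.random.normal(0.75, 0.10, N), 0, 1)
attack_scores = np.clip(np.random.normal(0.40, 0.12, N), 0, 1)

ws_vals = np.linspace(0.1, 0.9, 60)
theta_vals = np.linspace(0.3, 0.8, 60)

# FAR/FRR depend only on θ: compute each once, shape (60,)
far_t = np.array([np.mean(attack_scores >= t) for t in theta_vals])
frr_t = np.array([np.mean(legit_scores < t) for t in theta_vals])

# Broadcast to the (θ, w_s) grid: rows index θ, columns index w_s,
# matching the meshgrid(ws_vals, theta_vals) layout used above
COST = far_t[:, None] * ws_vals[None, :] + frr_t[:, None] * (1 - ws_vals)[None, :]
print(COST.shape)
```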


Console Output

==========================================================
Scenario                         θ*     FAR     FRR
==========================================================
High-Security (ws=0.9, wc=0.1)  0.661  0.0148  0.1888
Balanced (ws=0.7, wc=0.3)     0.607  0.0433  0.0768
User-Friendly (ws=0.3, wc=0.7)  0.557  0.0982  0.0277

EER θ = 0.591 | EER value = 0.0563
==========================================================

Graph Explanations

Figure 1 — Score Distributions & Error Curves

The left panel shows how the two populations overlap — this overlap region is where all errors occur. No threshold can perfectly separate them. The right panel shows FAR dropping and FRR rising as $\theta$ increases. The cost-optimal threshold (green) sits to the right of the EER (purple), reflecting the higher penalty assigned to security failures.


Figure 2 — ROC Curve

The ROC curve plots TAR against FAR as $\theta$ varies. A perfect classifier hugs the top-left corner. The color gradient (viridis) shows how the threshold moves along the curve: low thresholds (dark purple) give high TAR but also high FAR, while high thresholds (yellow) suppress FAR at the cost of TAR.
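A standard single-number summary of the whole curve is the area under it (AUC): 0.5 is a random classifier, 1.0 is perfect separation. The sketch below computes it with the trapezoidal rule; the points must be sorted by FAR first, since FAR decreases as θ increases. The data is re-created so the snippet runs standalone:

```python
import numpy as np

np.random.seed(42)
N = 10_000
legit_scores = np.clip(np.random.normal(0.75, 0.10, N), 0, 1)
attack_scores = np.clip(np.random.normal(0.40, 0.12, N), 0, 1)
thresholds = np.linspace(0, 1, 500)
FAR = np.array([np.mean(attack_scores >= t) for t in thresholds])
TAR = 1 - np.array([np.mean(legit_scores < t) for t in thresholds])

# Sort by FAR so the trapezoidal integration runs left to right
order = np.argsort(FAR)
f, t = FAR[order], TAR[order]
auc = np.sum(np.diff(f) * (t[1:] + t[:-1]) / 2)
print(f"AUC ≈ {auc:.4f}")
```

With these well-separated distributions the AUC comes out close to 1, consistent with the low EER reported above.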


Figure 3 — 3D Cost Surface and Optimal Ridge

This is the most informative visualization. The 3D surface shows total cost as a function of both $w_s$ and $\theta$. The cyan ridge line traces the optimal $\theta^*$ for each value of $w_s$. The right-hand 2D panel makes the relationship crystal clear: as you increase security weight, the optimal threshold rises, demanding higher scores to grant access.


Figure 4 — Three Policy Scenarios

Scenario                        θ*      FAR        FRR
High-Security ($w_s=0.9$)       ~0.66   very low   higher
Balanced ($w_s=0.7$)            ~0.61   low        moderate
User-Friendly ($w_s=0.3$)       ~0.56   higher     very low

Each panel shows the cost minimum shifting leftward as we care more about user convenience. This directly quantifies the policy trade-off that product and security teams argue about in every sprint review.


Key Takeaways

The EER gives you a baseline, but real deployment requires cost-aware optimization. The relationship between security weight and optimal threshold is monotonically increasing — there is no magic number that works for all contexts. A threshold suitable for a low-stakes note-taking app would be dangerously permissive for an online banking system.

The 3D cost surface makes this concrete: the landscape has a clear valley whose bottom shifts predictably with your organizational risk appetite. Once you instrument your system to estimate actual costs of each error type (chargebacks, support tickets, customer churn), plugging those weights into this framework gives you a principled, defensible threshold — not just a gut feeling.