Optimizing Authentication Score Thresholds

Balancing Convenience vs. Security

When designing a user authentication system, one of the most critical — and often underappreciated — decisions is where to set the decision threshold. Set it too high, and legitimate users get locked out. Set it too low, and attackers slip through. This is the classic convenience vs. security trade-off, and today we’ll solve it with a concrete example using Python.


The Problem Setup

Imagine a bank’s login system that computes a risk score (0–1) for each login attempt based on factors like device fingerprint, location, typing speed, and time of day. We need to find the optimal threshold $\theta$ that separates legitimate users from attackers.

We define:

  • False Acceptance Rate (FAR): the probability that an attacker is incorrectly accepted
    $$\text{FAR}(\theta) = P(\text{score} \geq \theta \mid \text{attacker})$$

  • False Rejection Rate (FRR): the probability that a legitimate user is incorrectly rejected
    $$\text{FRR}(\theta) = P(\text{score} < \theta \mid \text{legitimate})$$

  • Equal Error Rate (EER): the point where $\text{FAR} = \text{FRR}$

The total cost we want to minimize is:

$$C(\theta) = w_s \cdot \text{FAR}(\theta) + w_c \cdot \text{FRR}(\theta)$$

where $w_s$ is the weight for security (cost of accepting an attacker) and $w_c$ is the weight for convenience (cost of rejecting a legitimate user).


The Full Python Code

import numpy as np
import matplotlib.pyplot as plt

# ── 1. Reproducibility ──────────────────────────────────────────────────────
np.random.seed(42)

# ── 2. Simulate authentication score distributions ──────────────────────────
N = 10_000

# Legitimate users: high scores (mean=0.75, std=0.10)
legit_scores = np.clip(np.random.normal(0.75, 0.10, N), 0, 1)

# Attackers: lower scores (mean=0.40, std=0.12)
attack_scores = np.clip(np.random.normal(0.40, 0.12, N), 0, 1)

thresholds = np.linspace(0, 1, 500)

# ── 3. FAR / FRR as functions of threshold ──────────────────────────────────
FAR = np.array([np.mean(attack_scores >= t) for t in thresholds])
FRR = np.array([np.mean(legit_scores < t) for t in thresholds])

# ── 4. Equal Error Rate (EER) ────────────────────────────────────────────────
eer_idx = np.argmin(np.abs(FAR - FRR))
eer_value = (FAR[eer_idx] + FRR[eer_idx]) / 2
eer_theta = thresholds[eer_idx]

# ── 5. Cost-based optimal threshold ─────────────────────────────────────────
def total_cost(theta, w_s=0.7, w_c=0.3):
    far = np.mean(attack_scores >= theta)
    frr = np.mean(legit_scores < theta)
    return w_s * far + w_c * frr

costs_default = np.array([total_cost(t) for t in thresholds])
opt_idx_default = np.argmin(costs_default)
opt_theta_default = thresholds[opt_idx_default]

# ── 6. Figure 1 – Score distributions ────────────────────────────────────────
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
fig.suptitle("User Authentication Score Threshold Optimization", fontsize=15, fontweight='bold')

ax = axes[0]
ax.hist(legit_scores, bins=60, alpha=0.6, color='steelblue', label='Legitimate Users', density=True)
ax.hist(attack_scores, bins=60, alpha=0.6, color='tomato', label='Attackers', density=True)
ax.axvline(eer_theta, color='purple', lw=2, ls='--', label=f'EER θ = {eer_theta:.3f}')
ax.axvline(opt_theta_default, color='green', lw=2, ls='-', label=f'Cost-Opt θ = {opt_theta_default:.3f}')
ax.set_xlabel('Authentication Score', fontsize=12)
ax.set_ylabel('Density', fontsize=12)
ax.set_title('Score Distributions (Legitimate vs Attacker)', fontsize=12)
ax.legend()
ax.grid(alpha=0.3)

# ── 7. Figure 1 right – FAR / FRR / Cost curves ──────────────────────────────
ax2 = axes[1]
ax2.plot(thresholds, FAR, color='tomato', lw=2, label='FAR (Security Risk)')
ax2.plot(thresholds, FRR, color='steelblue', lw=2, label='FRR (User Friction)')
ax2.plot(thresholds, costs_default, color='darkorange', lw=2, label='Total Cost (ws=0.7, wc=0.3)', ls='-.')
ax2.axvline(eer_theta, color='purple', lw=1.5, ls='--', label=f'EER θ={eer_theta:.3f} ({eer_value:.3f})')
ax2.axvline(opt_theta_default, color='green', lw=1.5, ls='-', label=f'Opt θ={opt_theta_default:.3f}')
ax2.scatter([eer_theta], [eer_value], color='purple', zorder=5, s=80)
ax2.scatter([opt_theta_default], [costs_default[opt_idx_default]], color='green', zorder=5, s=80)
ax2.set_xlabel('Threshold θ', fontsize=12)
ax2.set_ylabel('Rate / Cost', fontsize=12)
ax2.set_title('FAR, FRR & Total Cost vs Threshold', fontsize=12)
ax2.legend(fontsize=9)
ax2.grid(alpha=0.3)

plt.tight_layout()
plt.savefig('auth_threshold_1.png', dpi=150, bbox_inches='tight')
plt.show()
print("Figure 1 saved.")

# ── 8. Figure 2 – ROC curve ───────────────────────────────────────────────────
TAR = 1 - FRR # True Acceptance Rate

fig2, ax3 = plt.subplots(figsize=(7, 6))
sc = ax3.scatter(FAR, TAR, c=thresholds, cmap='viridis', s=8, zorder=2)
plt.colorbar(sc, ax=ax3, label='Threshold θ')
ax3.plot([0, 1], [0, 1], 'k--', lw=1, alpha=0.5, label='Random Classifier')
ax3.scatter([FAR[eer_idx]], [TAR[eer_idx]],
            color='purple', s=120, zorder=5, label=f'EER θ={eer_theta:.3f}')
ax3.scatter([FAR[opt_idx_default]], [TAR[opt_idx_default]],
            color='green', s=120, zorder=5, label=f'Cost-Opt θ={opt_theta_default:.3f}')
ax3.set_xlabel('FAR (False Acceptance Rate)', fontsize=12)
ax3.set_ylabel('TAR (True Acceptance Rate)', fontsize=12)
ax3.set_title('ROC Curve for Authentication System', fontsize=13, fontweight='bold')
ax3.legend()
ax3.grid(alpha=0.3)
plt.tight_layout()
plt.savefig('auth_threshold_2.png', dpi=150, bbox_inches='tight')
plt.show()
print("Figure 2 saved.")

# ── 9. Figure 3 – 3D Cost Surface over (ws, wc, theta) ───────────────────────
ws_vals = np.linspace(0.1, 0.9, 60)
theta_vals = np.linspace(0.3, 0.8, 60)
WS, TH = np.meshgrid(ws_vals, theta_vals)
COST_3D = np.zeros_like(WS)

for i in range(WS.shape[0]):
    for j in range(WS.shape[1]):
        ws = WS[i, j]
        wc = 1 - ws
        th = TH[i, j]
        COST_3D[i, j] = ws * np.mean(attack_scores >= th) + wc * np.mean(legit_scores < th)

fig3 = plt.figure(figsize=(13, 6))

ax4 = fig3.add_subplot(121, projection='3d')
surf = ax4.plot_surface(WS, TH, COST_3D, cmap='plasma', alpha=0.85, linewidth=0)
fig3.colorbar(surf, ax=ax4, shrink=0.5, label='Total Cost')
ax4.set_xlabel('Security Weight wₛ', fontsize=10)
ax4.set_ylabel('Threshold θ', fontsize=10)
ax4.set_zlabel('Total Cost', fontsize=10)
ax4.set_title('3D Cost Surface\n(wₛ + wc = 1)', fontsize=11, fontweight='bold')
ax4.view_init(elev=30, azim=-60)

# Optimal ridge
opt_thetas_3d = []
for ws in ws_vals:
    wc = 1 - ws
    c = ws * np.array([np.mean(attack_scores >= t) for t in theta_vals]) \
        + wc * np.array([np.mean(legit_scores < t) for t in theta_vals])
    opt_thetas_3d.append(theta_vals[np.argmin(c)])

opt_costs_3d = [ws * np.mean(attack_scores >= ot) + (1 - ws) * np.mean(legit_scores < ot)
                for ws, ot in zip(ws_vals, opt_thetas_3d)]
ax4.plot(ws_vals, opt_thetas_3d, opt_costs_3d,
         color='cyan', lw=3, zorder=5, label='Optimal Ridge')
ax4.legend(fontsize=9)

# Right panel – optimal theta vs ws
ax5 = fig3.add_subplot(122)
ax5.plot(ws_vals, opt_thetas_3d, color='darkorange', lw=2.5)
ax5.axvline(0.7, color='green', ls='--', lw=1.5, label='ws=0.7 (default)')
ax5.set_xlabel('Security Weight wₛ', fontsize=12)
ax5.set_ylabel('Optimal Threshold θ*', fontsize=12)
ax5.set_title('Optimal Threshold vs Security Weight', fontsize=12, fontweight='bold')
ax5.legend()
ax5.grid(alpha=0.3)

plt.tight_layout()
plt.savefig('auth_threshold_3.png', dpi=150, bbox_inches='tight')
plt.show()
print("Figure 3 saved.")

# ── 10. Figure 4 – Scenario comparison (High-Security vs Balanced vs User-Friendly) ──
scenarios = {
    'High-Security\n(ws=0.9, wc=0.1)': (0.9, 0.1),
    'Balanced\n(ws=0.7, wc=0.3)': (0.7, 0.3),
    'User-Friendly\n(ws=0.3, wc=0.7)': (0.3, 0.7),
}

fig4, axes4 = plt.subplots(1, 3, figsize=(15, 5), sharey=False)
fig4.suptitle('Threshold Policy Comparison Across Scenarios', fontsize=14, fontweight='bold')

colors_sc = ['tomato', 'darkorange', 'steelblue']

for ax_s, (label, (ws, wc)), col in zip(axes4, scenarios.items(), colors_sc):
    c = ws * FAR + wc * FRR
    idx = np.argmin(c)
    opt_t = thresholds[idx]
    far_opt = FAR[idx]
    frr_opt = FRR[idx]

    ax_s.plot(thresholds, FAR, color='tomato', lw=1.8, label='FAR')
    ax_s.plot(thresholds, FRR, color='steelblue', lw=1.8, label='FRR')
    ax_s.plot(thresholds, c, color=col, lw=2, ls='-.', label='Cost')
    ax_s.axvline(opt_t, color='black', lw=2, ls='--',
                 label=f'θ*={opt_t:.3f}\nFAR={far_opt:.3f}\nFRR={frr_opt:.3f}')
    ax_s.fill_betweenx([0, 1], opt_t - 0.005, opt_t + 0.005, color='black', alpha=0.15)
    ax_s.set_title(label, fontsize=11)
    ax_s.set_xlabel('Threshold θ', fontsize=10)
    ax_s.set_ylabel('Rate', fontsize=10)
    ax_s.legend(fontsize=8)
    ax_s.set_ylim(0, 1)
    ax_s.grid(alpha=0.3)

plt.tight_layout()
plt.savefig('auth_threshold_4.png', dpi=150, bbox_inches='tight')
plt.show()
print("Figure 4 saved.")

# ── 11. Summary table ─────────────────────────────────────────────────────────
print("\n" + "=" * 58)
print(f"{'Scenario':<28} {'θ*':>6} {'FAR':>7} {'FRR':>7}")
print("=" * 58)
for label, (ws, wc) in scenarios.items():
    c = ws * FAR + wc * FRR
    idx = np.argmin(c)
    lbl = label.replace('\n', ' ')
    print(f"{lbl:<28} {thresholds[idx]:>6.3f} {FAR[idx]:>7.4f} {FRR[idx]:>7.4f}")
print(f"\nEER θ = {eer_theta:.3f} | EER value = {eer_value:.4f}")
print("=" * 58)

Code Walkthrough

Step 1 — Simulating Score Distributions

legit_scores  = np.clip(np.random.normal(0.75, 0.10, N), 0, 1)
attack_scores = np.clip(np.random.normal(0.40, 0.12, N), 0, 1)

We model authentication scores as Gaussian distributions:

  • Legitimate users cluster around $\mu = 0.75$ — they match expected behavior patterns
  • Attackers cluster around $\mu = 0.40$ — their behavior is anomalous

np.clip ensures all scores stay in $[0, 1]$. With 10,000 samples per class, the distributions are smooth and statistically stable.
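Because both populations are (clipped) Gaussians, the empirical rates can be sanity-checked against the analytic normal CDFs. The sketch below re-creates the scores so it runs standalone; it ignores the clipping at 0 and 1, which is a good approximation here since both means sit several standard deviations inside the interval:

```python
import numpy as np
from scipy.stats import norm

np.random.seed(42)
N = 10_000
legit_scores = np.clip(np.random.normal(0.75, 0.10, N), 0, 1)
attack_scores = np.clip(np.random.normal(0.40, 0.12, N), 0, 1)

theta = 0.6
# Empirical rates from the simulated samples
far_emp = np.mean(attack_scores >= theta)
frr_emp = np.mean(legit_scores < theta)

# Analytic rates from the unclipped Gaussians
far_ana = norm.sf(theta, loc=0.40, scale=0.12)   # P(score >= θ | attacker)
frr_ana = norm.cdf(theta, loc=0.75, scale=0.10)  # P(score < θ | legitimate)

print(f"FAR at θ=0.6: empirical {far_emp:.4f}, analytic {far_ana:.4f}")
print(f"FRR at θ=0.6: empirical {frr_emp:.4f}, analytic {frr_ana:.4f}")
```

With 10,000 samples the empirical and analytic values should agree to about two decimal places.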


Step 2 — Computing FAR and FRR

FAR = np.array([np.mean(attack_scores >= t) for t in thresholds])
FRR = np.array([np.mean(legit_scores < t) for t in thresholds])

For each of 500 candidate thresholds:

$$\text{FAR}(\theta) = \frac{|\{x \in \text{attackers} : x \geq \theta\}|}{N}$$

$$\text{FRR}(\theta) = \frac{|\{x \in \text{legit} : x < \theta\}|}{N}$$

As $\theta$ increases, FAR decreases (fewer attackers pass) but FRR increases (more legitimate users are blocked). This is the fundamental trade-off.
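The list comprehensions are O(T·N): each of the 500 thresholds rescans both score arrays. For larger datasets, a sorted-array approach with np.searchsorted computes the same curves in O((N + T) log N). This is an optional optimization, not part of the code above; the snippet re-creates the data so it runs standalone:

```python
import numpy as np

np.random.seed(42)
N = 10_000
legit_scores = np.clip(np.random.normal(0.75, 0.10, N), 0, 1)
attack_scores = np.clip(np.random.normal(0.40, 0.12, N), 0, 1)
thresholds = np.linspace(0, 1, 500)

# Baseline: one pass over the data per threshold
FAR_loop = np.array([np.mean(attack_scores >= t) for t in thresholds])
FRR_loop = np.array([np.mean(legit_scores < t) for t in thresholds])

# Vectorized: searchsorted(..., side='left') counts samples strictly below t
attack_sorted = np.sort(attack_scores)
legit_sorted = np.sort(legit_scores)
FAR_fast = 1 - np.searchsorted(attack_sorted, thresholds, side='left') / N
FRR_fast = np.searchsorted(legit_sorted, thresholds, side='left') / N

print(np.allclose(FAR_loop, FAR_fast), np.allclose(FRR_loop, FRR_fast))
```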


Step 3 — Finding the EER

eer_idx   = np.argmin(np.abs(FAR - FRR))
eer_value = (FAR[eer_idx] + FRR[eer_idx]) / 2

The Equal Error Rate is the threshold where both error types are equal:

$$\theta_{\text{EER}} = \arg\min_\theta |\text{FAR}(\theta) - \text{FRR}(\theta)|$$

EER is a standard single-number summary of system quality — lower is better. It tells you the best you can do when you treat both errors as equally costly.
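One caveat: on a grid of 500 thresholds the two curves rarely cross exactly, so argmin picks the nearest grid point and FAR and FRR differ slightly there (hence the averaging). For a sub-grid estimate, you can interpolate linearly across the sign change of FAR − FRR. A sketch of this refinement, re-creating the data so it runs standalone:

```python
import numpy as np

np.random.seed(42)
N = 10_000
legit_scores = np.clip(np.random.normal(0.75, 0.10, N), 0, 1)
attack_scores = np.clip(np.random.normal(0.40, 0.12, N), 0, 1)
thresholds = np.linspace(0, 1, 500)
FAR = np.array([np.mean(attack_scores >= t) for t in thresholds])
FRR = np.array([np.mean(legit_scores < t) for t in thresholds])

diff = FAR - FRR            # positive at low thresholds, negative at high ones
i = np.argmax(diff <= 0)    # first grid point past the crossing
t0, t1 = thresholds[i - 1], thresholds[i]
d0, d1 = diff[i - 1], diff[i]
eer_theta = t0 + (t1 - t0) * d0 / (d0 - d1)   # linear interpolation of the root
eer_value = np.interp(eer_theta, thresholds, (FAR + FRR) / 2)

print(f"Interpolated EER: θ = {eer_theta:.4f}, rate ≈ {eer_value:.4f}")
```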


Step 4 — Cost-Weighted Optimization

def total_cost(theta, w_s=0.7, w_c=0.3):
    far = np.mean(attack_scores >= theta)
    frr = np.mean(legit_scores < theta)
    return w_s * far + w_c * frr

In the real world, errors are not equally costly. For a banking app:

  • Accepting an attacker ($w_s = 0.7$) is far more damaging than annoying a user
  • Rejecting a legitimate user ($w_c = 0.3$) causes friction but not financial harm

The optimal threshold is:

$$\theta^* = \arg\min_\theta \left[ w_s \cdot \text{FAR}(\theta) + w_c \cdot \text{FRR}(\theta) \right]$$
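The grid search is perfectly adequate here, but the empirical cost is a step function, so scalar optimizers work better on a smooth surrogate: the same cost built from the analytic Gaussian CDFs (again ignoring the clipping). A sketch using scipy.optimize.minimize_scalar, with the distribution parameters taken from the simulation above:

```python
from scipy.stats import norm
from scipy.optimize import minimize_scalar

def analytic_cost(theta, w_s=0.7, w_c=0.3):
    far = norm.sf(theta, loc=0.40, scale=0.12)   # P(score >= θ | attacker)
    frr = norm.cdf(theta, loc=0.75, scale=0.10)  # P(score < θ | legitimate)
    return w_s * far + w_c * frr

res = minimize_scalar(analytic_cost, bounds=(0, 1), method='bounded')
print(f"Analytic θ* = {res.x:.4f}, cost = {res.fun:.4f}")
```

The analytic optimum should land close to the empirical grid-search value, since the clipping has almost no effect in this region.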


Step 5 — 3D Cost Surface

for i in range(WS.shape[0]):
    for j in range(WS.shape[1]):
        ws = WS[i, j]; wc = 1 - ws; th = TH[i, j]
        COST_3D[i, j] = ws * np.mean(attack_scores >= th) + wc * np.mean(legit_scores < th)

We sweep over a grid of $w_s \in [0.1, 0.9]$ and $\theta \in [0.3, 0.8]$ with $w_c = 1 - w_s$. This creates a cost landscape that reveals how the optimal threshold shifts as the organization’s security priorities change.
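A note on efficiency: the nested loop recomputes np.mean(attack_scores >= th) for every (i, j), even though FAR and FRR depend only on θ. Precomputing them once per θ and broadcasting over the weight axis is equivalent and much faster. This rewrite is a suggestion, not part of the original code; the data is re-created so it runs standalone:

```python
import numpy as np

np.random.seed(42)
N = 10_000
legit_scores = np.clip(np.random.normal(0.75, 0.10, N), 0, 1)
attack_scores = np.clip(np.random.normal(0.40, 0.12, N), 0, 1)

ws_vals = np.linspace(0.1, 0.9, 60)
theta_vals = np.linspace(0.3, 0.8, 60)

# FAR/FRR depend only on θ: compute each once, shape (60,)
far_t = np.array([np.mean(attack_scores >= t) for t in theta_vals])
frr_t = np.array([np.mean(legit_scores < t) for t in theta_vals])

# Broadcast to the (θ, w_s) grid: rows index θ, columns index w_s,
# matching the meshgrid(ws_vals, theta_vals) layout used above
COST = far_t[:, None] * ws_vals[None, :] + frr_t[:, None] * (1 - ws_vals)[None, :]
print(COST.shape)
```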


Console Output

==========================================================
Scenario                         θ*     FAR     FRR
==========================================================
High-Security (ws=0.9, wc=0.1)  0.661  0.0148  0.1888
Balanced (ws=0.7, wc=0.3)     0.607  0.0433  0.0768
User-Friendly (ws=0.3, wc=0.7)  0.557  0.0982  0.0277

EER θ = 0.591 | EER value = 0.0563
==========================================================

Graph Explanations

Figure 1 — Score Distributions & Error Curves

The left panel shows how the two populations overlap — this overlap region is where all errors occur. No threshold can perfectly separate them. The right panel shows FAR dropping and FRR rising as $\theta$ increases. The cost-optimal threshold (green) sits to the right of the EER (purple), reflecting the higher penalty assigned to security failures.


Figure 2 — ROC Curve

The ROC curve plots TAR against FAR as $\theta$ varies. A perfect classifier hugs the top-left corner. The color gradient (viridis) shows how the threshold moves along the curve: low thresholds (dark purple) give high TAR but also high FAR, while high thresholds (yellow) suppress FAR at the cost of TAR.
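A standard single-number summary of the whole curve is the area under it (AUC): 0.5 is a random classifier, 1.0 is perfect separation. The sketch below computes it with the trapezoidal rule; the points must be sorted by FAR first, since FAR decreases as θ increases. The data is re-created so the snippet runs standalone:

```python
import numpy as np

np.random.seed(42)
N = 10_000
legit_scores = np.clip(np.random.normal(0.75, 0.10, N), 0, 1)
attack_scores = np.clip(np.random.normal(0.40, 0.12, N), 0, 1)
thresholds = np.linspace(0, 1, 500)
FAR = np.array([np.mean(attack_scores >= t) for t in thresholds])
TAR = 1 - np.array([np.mean(legit_scores < t) for t in thresholds])

# Sort by FAR so the trapezoidal integration runs left to right
order = np.argsort(FAR)
f, t = FAR[order], TAR[order]
auc = np.sum(np.diff(f) * (t[1:] + t[:-1]) / 2)
print(f"AUC ≈ {auc:.4f}")
```

With these well-separated distributions the AUC comes out close to 1, consistent with the low EER reported above.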


Figure 3 — 3D Cost Surface and Optimal Ridge

This is the most informative visualization. The 3D surface shows total cost as a function of both $w_s$ and $\theta$. The cyan ridge line traces the optimal $\theta^*$ for each value of $w_s$. The right-hand 2D panel makes the relationship crystal clear: as you increase security weight, the optimal threshold rises, demanding higher scores to grant access.


Figure 4 — Three Policy Scenarios

Scenario                        θ*      FAR        FRR
High-Security ($w_s=0.9$)       ~0.66   very low   higher
Balanced ($w_s=0.7$)            ~0.61   low        moderate
User-Friendly ($w_s=0.3$)       ~0.56   higher     very low

Each panel shows the cost minimum shifting leftward as we care more about user convenience. This directly quantifies the policy trade-off that product and security teams argue about in every sprint review.


Key Takeaways

The EER gives you a baseline, but real deployment requires cost-aware optimization. The relationship between security weight and optimal threshold is monotonically increasing — there is no magic number that works for all contexts. A threshold suitable for a low-stakes note-taking app would be dangerously permissive for an online banking system.

The 3D cost surface makes this concrete: the landscape has a clear valley whose bottom shifts predictably with your organizational risk appetite. Once you instrument your system to estimate actual costs of each error type (chargebacks, support tickets, customer churn), plugging those weights into this framework gives you a principled, defensible threshold — not just a gut feeling.