Parameter Optimization for Anomaly Detection Algorithms

Maximizing Detection Accuracy

Anomaly detection sits at the heart of modern monitoring systems — from fraud detection in banking to fault detection in industrial machinery. Yet the algorithms themselves are only as good as their parameters. In this article, we formulate parameter optimization for anomaly detection as a concrete mathematical optimization problem and solve it end-to-end in Python.


Problem Setup

We focus on the Isolation Forest algorithm. Its key parameters are:

  • n_estimators: number of isolation trees
  • max_samples: fraction of samples drawn to build each tree
  • contamination: assumed fraction of outliers in the dataset

We optimize these to maximize the F1 score:

$$F_1 = 2 \cdot \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}, \quad \text{Precision} = \frac{TP}{TP + FP}, \quad \text{Recall} = \frac{TP}{TP + FN}$$

The optimization objective is:

$$\theta^* = \arg\max_{\theta \in \Theta} \; F_1\!\left(\mathcal{D}_{val},\, \theta\right)$$

Since $F_1$ is non-differentiable with respect to $\theta$, we use Bayesian Optimization guided by the Expected Improvement acquisition function:

$$EI(\theta) = \mathbb{E}\!\left[\max\!\left(F_1(\theta) - F_1^+,\, 0\right)\right]$$

where $F_1^+$ is the best observed value so far.
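
For a Gaussian Process surrogate this expectation has a standard closed form, stated here for reference (it follows from the posterior at each $\theta$ being Gaussian with mean $\mu(\theta)$ and standard deviation $\sigma(\theta)$):

$$EI(\theta) = \left(\mu(\theta) - F_1^+\right)\Phi(z) + \sigma(\theta)\,\phi(z), \qquad z = \frac{\mu(\theta) - F_1^+}{\sigma(\theta)}$$

where $\Phi$ and $\phi$ are the standard normal CDF and density. The first term rewards points the surrogate predicts to be good (exploitation); the second rewards points with high posterior uncertainty (exploration).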


Synthetic Dataset

The generator below draws 1,000 normal points and 88 anomalies, for a true contamination rate of $\rho = 88/1088 \approx 0.08$ (8% anomalies).


Full Source Code

# ============================================================
# Install dependency (Google Colaboratory)
# ============================================================
!pip install scikit-optimize -q

# ============================================================
# Anomaly Detection Parameter Optimization
# Maximizing F1 Score via Bayesian Optimization
# ============================================================

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
from sklearn.ensemble import IsolationForest
from sklearn.metrics import f1_score, precision_score, recall_score, confusion_matrix
from sklearn.model_selection import train_test_split
from scipy.interpolate import griddata
from skopt import gp_minimize
from skopt.space import Integer, Real
from skopt.utils import use_named_args
import warnings
warnings.filterwarnings("ignore")

# ------------------------------------------------------------
# 0. Reproducibility
# ------------------------------------------------------------
SEED = 42

# ------------------------------------------------------------
# 1. Synthetic Dataset Generation
# ------------------------------------------------------------
def generate_dataset(n_normal=1000, n_anomaly=88, seed=42):
    rng_local = np.random.default_rng(seed)
    n1 = n_normal // 2
    n2 = n_normal - n1
    X_norm1 = rng_local.multivariate_normal(
        mean=[-2.0, -2.0],
        cov=[[0.6, 0.2], [0.2, 0.6]],
        size=n1
    )
    X_norm2 = rng_local.multivariate_normal(
        mean=[2.0, 2.0],
        cov=[[0.8, -0.3], [-0.3, 0.8]],
        size=n2
    )
    X_normal = np.vstack([X_norm1, X_norm2])
    X_anom = rng_local.uniform(low=-6.0, high=6.0, size=(n_anomaly, 2))
    X = np.vstack([X_normal, X_anom])
    y = np.array([1] * n_normal + [-1] * n_anomaly)
    return X, y

X, y = generate_dataset()
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=SEED
)
print(f"Train: {X_train.shape[0]} samples | Val: {X_val.shape[0]} samples")
print(f"Anomaly rate (val): {(y_val == -1).mean():.3f}")

# ------------------------------------------------------------
# 2. Bayesian Optimization
# ------------------------------------------------------------
search_space = [
    Integer(50, 400, name="n_estimators"),
    Real(0.1, 1.0, name="max_samples"),
    Real(0.01, 0.25, name="contamination"),
]

call_history = []

@use_named_args(search_space)
def objective(n_estimators, max_samples, contamination):
    model = IsolationForest(
        n_estimators=int(n_estimators),
        max_samples=float(max_samples),
        contamination=float(contamination),
        random_state=SEED,
        n_jobs=-1
    )
    model.fit(X_train)
    y_pred = model.predict(X_val)
    f1 = f1_score(y_val, y_pred, pos_label=-1)
    call_history.append({
        "n_estimators": int(n_estimators),
        "max_samples": float(max_samples),
        "contamination": float(contamination),
        "f1": f1
    })
    return -f1  # minimize negative F1

print("\nRunning Bayesian Optimization (50 calls)...")
result = gp_minimize(
    func=objective,
    dimensions=search_space,
    n_calls=50,
    n_initial_points=10,
    acq_func="EI",
    random_state=SEED,
    verbose=False
)

best_params = {
    "n_estimators": int(result.x[0]),
    "max_samples": float(result.x[1]),
    "contamination": float(result.x[2]),
}
best_f1 = -result.fun
print(f"\nBest Parameters : {best_params}")
print(f"Best F1 Score : {best_f1:.4f}")

# ------------------------------------------------------------
# 3. Final Model Evaluation
# ------------------------------------------------------------
best_model = IsolationForest(
    n_estimators=best_params["n_estimators"],
    max_samples=best_params["max_samples"],
    contamination=best_params["contamination"],
    random_state=SEED,
    n_jobs=-1
)
best_model.fit(X_train)
y_pred_best = best_model.predict(X_val)

prec = precision_score(y_val, y_pred_best, pos_label=-1)
rec = recall_score(y_val, y_pred_best, pos_label=-1)
f1_opt = f1_score(y_val, y_pred_best, pos_label=-1)
cm_best = confusion_matrix(y_val, y_pred_best)

print(f"\nPrecision: {prec:.4f} | Recall: {rec:.4f} | F1: {f1_opt:.4f}")
print(f"Confusion Matrix:\n{cm_best}")

baseline = IsolationForest(contamination=0.08, random_state=SEED, n_jobs=-1)
baseline.fit(X_train)
y_pred_base = baseline.predict(X_val)
f1_base = f1_score(y_val, y_pred_base, pos_label=-1)
print(f"Baseline F1 : {f1_base:.4f}")

# ------------------------------------------------------------
# 4. Visualization
# ------------------------------------------------------------
f1_history = [-v for v in result.func_vals]
calls_idx = np.arange(1, len(f1_history) + 1)
best_so_far = np.maximum.accumulate(f1_history)

# Decision boundary meshgrid
x0_min, x0_max = X[:, 0].min() - 0.5, X[:, 0].max() + 0.5
x1_min, x1_max = X[:, 1].min() - 0.5, X[:, 1].max() + 0.5
xx, yy = np.meshgrid(np.linspace(x0_min, x0_max, 300),
                     np.linspace(x1_min, x1_max, 300))
Z = best_model.decision_function(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

# Build arrays from call history
hist_arr = np.array([[h["contamination"], h["max_samples"],
                      h["n_estimators"], h["f1"]]
                     for h in call_history])
cont_vals = hist_arr[:, 0]
ms_vals = hist_arr[:, 1]
ne_vals = hist_arr[:, 2]
f1_vals_3d = hist_arr[:, 3]

# 3D surface 1: contamination × max_samples → F1
cont_grid = np.linspace(cont_vals.min(), cont_vals.max(), 60)
ms_grid = np.linspace(ms_vals.min(), ms_vals.max(), 60)
CG, MG = np.meshgrid(cont_grid, ms_grid)
F1G = griddata((cont_vals, ms_vals), f1_vals_3d,
               (CG, MG), method="cubic")

# 3D surface 2: n_estimators × contamination → F1
ne_grid = np.linspace(ne_vals.min(), ne_vals.max(), 40)
co_grid = np.linspace(cont_vals.min(), cont_vals.max(), 40)
NEG, COG = np.meshgrid(ne_grid, co_grid)
F1G2 = griddata((ne_vals, cont_vals), f1_vals_3d,
                (NEG, COG), method="cubic")

# ---- Style constants ----
BG_FIG = "#0d0d0d"
BG_AX = "#141414"
TITLE_COLOR = "#e8e0d0"
LABEL_COLOR = "#b0a898"
GRID_COLOR = "#2a2a2a"
ACCENT1 = "#ff6b35"
ACCENT2 = "#00d4ff"
ACCENT_GREEN = "#39ff14"

def style_ax(ax, title="", xlabel="", ylabel=""):
    ax.set_facecolor(BG_AX)
    for sp in ax.spines.values():
        sp.set_edgecolor("#333333")
    ax.tick_params(colors=LABEL_COLOR, labelsize=9)
    ax.xaxis.label.set_color(LABEL_COLOR)
    ax.yaxis.label.set_color(LABEL_COLOR)
    ax.grid(color=GRID_COLOR, linewidth=0.5, linestyle="--")
    if title:
        ax.set_title(title, color=TITLE_COLOR, fontsize=11,
                     fontweight="bold", pad=8)
    if xlabel:
        ax.set_xlabel(xlabel, fontsize=9)
    if ylabel:
        ax.set_ylabel(ylabel, fontsize=9)

fig = plt.figure(figsize=(20, 18), facecolor=BG_FIG)
gs = gridspec.GridSpec(3, 3, figure=fig,
                       hspace=0.45, wspace=0.35,
                       left=0.06, right=0.97,
                       top=0.93, bottom=0.06)

# ---- Panel 1: Convergence ----
ax1 = fig.add_subplot(gs[0, 0])
style_ax(ax1, "Bayesian Optimization Convergence",
         "Iteration", "F1 Score (anomaly class)")
ax1.scatter(calls_idx[:10], f1_history[:10],
            color=LABEL_COLOR, s=30, zorder=3,
            label="Random init", alpha=0.7)
ax1.scatter(calls_idx[10:], f1_history[10:],
            color=ACCENT2, s=30, zorder=3,
            label="BO calls", alpha=0.8)
ax1.plot(calls_idx, best_so_far, color=ACCENT1,
         linewidth=2.0, label="Best so far", zorder=4)
ax1.axhline(f1_base, color=ACCENT_GREEN, linewidth=1.2,
            linestyle="--", label=f"Baseline={f1_base:.3f}")
ax1.legend(fontsize=8, facecolor="#1a1a1a",
           labelcolor=TITLE_COLOR, edgecolor="#444444")

# ---- Panel 2: Decision boundary ----
ax2 = fig.add_subplot(gs[0, 1:])
style_ax(ax2, "Isolation Forest Decision Boundary (Optimal Parameters)",
         "Feature 1", "Feature 2")
cf = ax2.contourf(xx, yy, Z, levels=30, cmap="RdYlGn", alpha=0.6)
ax2.contour(xx, yy, Z, levels=[0], colors=ACCENT1, linewidths=1.8)
cb = plt.colorbar(cf, ax=ax2)
cb.ax.tick_params(colors=LABEL_COLOR)
cb.set_label("Anomaly Score", color=LABEL_COLOR, fontsize=9)
mask_n = y_val == 1
mask_a = y_val == -1
fp_mask = (y_val == 1) & (y_pred_best == -1)
fn_mask = (y_val == -1) & (y_pred_best == 1)
ax2.scatter(X_val[mask_n, 0], X_val[mask_n, 1],
            c=ACCENT2, s=12, alpha=0.5, label="Normal", zorder=3)
ax2.scatter(X_val[mask_a, 0], X_val[mask_a, 1],
            c=ACCENT1, s=40, marker="x", linewidths=1.5,
            label="Anomaly (true)", zorder=4)
ax2.scatter(X_val[fp_mask, 0], X_val[fp_mask, 1],
            c="yellow", s=60, marker="^", alpha=0.9,
            label="FP (normal→anomaly)", zorder=5)
ax2.scatter(X_val[fn_mask, 0], X_val[fn_mask, 1],
            c="magenta", s=60, marker="v", alpha=0.9,
            label="FN (anomaly→normal)", zorder=5)
ax2.legend(fontsize=8, facecolor="#1a1a1a",
           labelcolor=TITLE_COLOR, edgecolor="#444444")

# ---- Panel 3: 3D surface contamination × max_samples ----
ax3 = fig.add_subplot(gs[1, :2], projection="3d")
ax3.set_facecolor(BG_AX)
for axis in [ax3.xaxis, ax3.yaxis, ax3.zaxis]:
    axis.pane.fill = False
    axis.pane.set_edgecolor("#333333")
surf3 = ax3.plot_surface(CG, MG, np.nan_to_num(F1G, nan=0.0),
                         cmap="plasma", alpha=0.85, edgecolor="none")
ax3.scatter(best_params["contamination"],
            best_params["max_samples"],
            best_f1,
            color=ACCENT1, s=200, marker="*", zorder=10, label="Optimum")
ax3.set_xlabel("Contamination", color=LABEL_COLOR, fontsize=9, labelpad=8)
ax3.set_ylabel("Max Samples", color=LABEL_COLOR, fontsize=9, labelpad=8)
ax3.set_zlabel("F1 Score", color=LABEL_COLOR, fontsize=9, labelpad=8)
ax3.set_title("F1 Surface: contamination × max_samples",
              color=TITLE_COLOR, fontsize=11, fontweight="bold", pad=10)
ax3.tick_params(colors=LABEL_COLOR, labelsize=7)
plt.colorbar(surf3, ax=ax3, shrink=0.4, pad=0.12).ax.tick_params(colors=LABEL_COLOR)
ax3.legend(fontsize=8, facecolor="#1a1a1a",
           labelcolor=TITLE_COLOR, edgecolor="#444444")

# ---- Panel 4: 3D surface contamination × n_estimators ----
ax4 = fig.add_subplot(gs[1, 2], projection="3d")
ax4.set_facecolor(BG_AX)
for axis in [ax4.xaxis, ax4.yaxis, ax4.zaxis]:
    axis.pane.fill = False
    axis.pane.set_edgecolor("#333333")
surf4 = ax4.plot_surface(COG, NEG, np.nan_to_num(F1G2, nan=0.0),
                         cmap="viridis", alpha=0.85, edgecolor="none")
ax4.set_xlabel("Contamination", color=LABEL_COLOR, fontsize=7, labelpad=6)
ax4.set_ylabel("n_estimators", color=LABEL_COLOR, fontsize=7, labelpad=6)
ax4.set_zlabel("F1", color=LABEL_COLOR, fontsize=7, labelpad=6)
ax4.set_title("F1 Surface: contamination × n_estimators",
              color=TITLE_COLOR, fontsize=9, fontweight="bold", pad=8)
ax4.tick_params(colors=LABEL_COLOR, labelsize=6)

# ---- Panel 5: Sampled parameter space ----
ax5 = fig.add_subplot(gs[2, 0])
style_ax(ax5, "Sampled Parameters Colored by F1",
         "Contamination", "Max Samples")
sc = ax5.scatter(
    [h["contamination"] for h in call_history],
    [h["max_samples"] for h in call_history],
    c=[h["f1"] for h in call_history],
    cmap="plasma", s=50, alpha=0.85, edgecolors="none"
)
ax5.scatter(best_params["contamination"],
            best_params["max_samples"],
            color=ACCENT1, s=200, marker="*", zorder=5, label="Optimum")
cb5 = plt.colorbar(sc, ax=ax5)
cb5.ax.tick_params(colors=LABEL_COLOR)
cb5.set_label("F1", color=LABEL_COLOR, fontsize=8)
ax5.legend(fontsize=8, facecolor="#1a1a1a",
           labelcolor=TITLE_COLOR, edgecolor="#444444")

# ---- Panel 6: F1 per call bar chart ----
ax6 = fig.add_subplot(gs[2, 1])
style_ax(ax6, "F1 Score per BO Call", "Call Index", "F1 Score")
colors_bar = [ACCENT2 if i >= 10 else LABEL_COLOR
              for i in range(len(f1_history))]
ax6.bar(calls_idx, f1_history, color=colors_bar, alpha=0.75, width=0.7)
ax6.axhline(best_f1, color=ACCENT1, linewidth=1.5, linestyle="--",
            label=f"Best={best_f1:.3f}")
ax6.axhline(f1_base, color=ACCENT_GREEN, linewidth=1.2, linestyle="--",
            label=f"Baseline={f1_base:.3f}")
ax6.legend(fontsize=8, facecolor="#1a1a1a",
           labelcolor=TITLE_COLOR, edgecolor="#444444")

# ---- Panel 7: Confusion matrix ----
ax7 = fig.add_subplot(gs[2, 2])
style_ax(ax7, "Confusion Matrix (Optimal Model)")
ax7.set_facecolor(BG_AX)
im = ax7.imshow(cm_best, cmap="YlOrRd", aspect="auto")
for i in range(2):
    for j in range(2):
        ax7.text(j, i, str(cm_best[i, j]),
                 ha="center", va="center",
                 color="white", fontsize=16, fontweight="bold")
ax7.set_xticks([0, 1])
ax7.set_yticks([0, 1])
# confusion_matrix sorts labels, so row/column 0 is the anomaly class (-1)
ax7.set_xticklabels(["Pred Anomaly", "Pred Normal"],
                    color=LABEL_COLOR, fontsize=8)
ax7.set_yticklabels(["True Anomaly", "True Normal"],
                    color=LABEL_COLOR, fontsize=8)
plt.colorbar(im, ax=ax7).ax.tick_params(colors=LABEL_COLOR)

fig.suptitle(
    "Anomaly Detection Parameter Optimization"
    " — Isolation Forest + Bayesian Optimization",
    color=TITLE_COLOR, fontsize=14, fontweight="bold", y=0.97
)
plt.savefig("anomaly_detection_optimization.png", dpi=150,
            bbox_inches="tight", facecolor=fig.get_facecolor())
plt.show()
print("Figure saved.")

Code Walkthrough

Section 0 — Installing scikit-optimize

!pip install scikit-optimize -q installs the library at runtime. This must be the very first cell executed so that all subsequent imports resolve correctly.

Section 1 — Synthetic Dataset

generate_dataset() builds a controlled benchmark: 1,000 normal points drawn from two well-separated Gaussian clusters and 88 anomalies (~8% rate) drawn uniformly over $[-6, 6]^2$. The two Gaussians have intentionally different covariance structures to make the decision boundary non-trivial, and some uniform anomalies land inside the clusters, so a perfect score is unattainable. train_test_split with stratify=y preserves the anomaly rate in both splits.

Section 2 — Bayesian Optimization

The search space covers three parameters:

Parameter       Type      Range
n_estimators    Integer   $[50, 400]$
max_samples     Real      $[0.1, 1.0]$
contamination   Real      $[0.01, 0.25]$

The @use_named_args decorator wires the flat list used internally by gp_minimize to named Python arguments. The objective returns negative F1 because gp_minimize minimizes. With n_initial_points=10, the Gaussian Process is bootstrapped on 10 random draws; the remaining 40 calls are guided by the EI acquisition function — far more efficient than the $\sim 5{,}000$ evaluations a fine grid search would require over the same space.
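
The decorator's behavior can be illustrated with a minimal stand-in (an illustrative sketch, not the actual skopt source — real skopt dimensions carry a `.name` attribute rather than dict keys):

```python
# Sketch of what skopt's use_named_args does: map a flat point
# [v1, v2, ...] onto keyword arguments by dimension name.
def use_named_args_sketch(dimensions):
    names = [d["name"] for d in dimensions]
    def decorator(func):
        def wrapper(point):
            # zip names with the flat point and call func by keyword
            return func(**dict(zip(names, point)))
        return wrapper
    return decorator

space = [{"name": "n_estimators"}, {"name": "max_samples"},
         {"name": "contamination"}]

@use_named_args_sketch(space)
def objective(n_estimators, max_samples, contamination):
    return (n_estimators, max_samples, contamination)

# The optimizer only ever sees a flat list:
print(objective([200, 0.5, 0.08]))  # (200, 0.5, 0.08)
```

This is why the article's objective can be written with readable named parameters even though gp_minimize internally proposes plain coordinate vectors.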

Section 3 — Final Evaluation

The best parameter vector result.x rebuilds the optimal model. Metrics are reported with pos_label=-1 (anomaly class). A baseline IsolationForest with default contamination=0.08 provides the comparison reference.
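
The effect of pos_label=-1 is just the choice of which class counts as "positive" in the precision/recall formulas; the toy labels below are illustrative:

```python
# Manual precision/recall/F1 for the anomaly class (-1), matching what
# sklearn computes with pos_label=-1 on these toy labels.
y_true = [-1, -1, -1, 1, 1, 1, 1, 1]
y_pred = [-1, -1, 1, -1, 1, 1, 1, 1]

tp = sum(t == -1 and p == -1 for t, p in zip(y_true, y_pred))  # hits
fp = sum(t == 1 and p == -1 for t, p in zip(y_true, y_pred))   # false alarms
fn = sum(t == -1 and p == 1 for t, p in zip(y_true, y_pred))   # misses

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
print(round(precision, 4), round(recall, 4), round(f1, 4))  # 0.6667 0.6667 0.6667
```

Without pos_label the metrics would be computed for the majority (normal) class, which is nearly always high and hides detection failures.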

Section 4 — Visualization

Panel 1 — Convergence Curve: Plots the F1 trajectory across all 50 BO calls. Grey dots = random initialization; cyan dots = BO-guided proposals; orange line = running best; green dashed = baseline. The shift in quality between the two phases shows BO learning the objective landscape.

Panel 2 — Decision Boundary: decision_function output is rendered as a color field. The orange contour at $z=0$ is the learned boundary. False Positives (▲) and False Negatives (▼) are individually marked to show exactly where the model errs.

Panels 3 & 4 — 3D F1 Surfaces: scipy.griddata interpolates the scattered BO evaluations onto a regular grid. Panel 3 maps contamination × max_samples → F1; Panel 4 maps contamination × n_estimators → F1. These surfaces reveal objective landscape structure — flat plateaus vs sharp ridges — that a grid search would need orders of magnitude more evaluations to uncover.
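
The interpolation step is isolated below on synthetic scattered data (illustrative values; the article interpolates its BO history instead):

```python
# griddata: interpolate scattered (x, y) -> z samples onto a regular grid,
# as done for the F1 surfaces. Cells outside the convex hull of the
# samples come back as NaN, hence the article's np.nan_to_num.
import numpy as np
from scipy.interpolate import griddata

rng = np.random.default_rng(0)
pts = rng.uniform(0, 1, size=(50, 2))        # scattered sample locations
vals = np.sin(3 * pts[:, 0]) + pts[:, 1]     # scattered values at those points
gx, gy = np.meshgrid(np.linspace(0, 1, 20), np.linspace(0, 1, 20))
Z = griddata(pts, vals, (gx, gy), method="cubic")
print(Z.shape)  # (20, 20)
```

With only 50 scattered evaluations the cubic surface is a smoothed estimate, so the rendered F1 surfaces should be read qualitatively, not as exact objective values.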

Panel 5 — Sampled Parameter Space: 2D scatter of contamination vs max_samples colored by F1, showing how BO concentrates high-F1 samples around the optimal region as iterations progress.

Panel 6 — Per-Call F1 Bar: Direct visualization of the F1 returned on every call. The quality improvement in the cyan (BO) phase versus the grey (random) phase is clearly visible.

Panel 7 — Confusion Matrix: The final 2×2 matrix of the optimal model on the validation set. Color intensity encodes count; values are printed directly in cells.


Execution Results

Train: 761 samples | Val: 327 samples
Anomaly rate (val): 0.080

Running Bayesian Optimization (50 calls)...

Best Parameters : {'n_estimators': 395, 'max_samples': 0.9802843384145278, 'contamination': 0.06691589893530461}
Best F1 Score   : 0.8750

Precision: 0.9545 | Recall: 0.8077 | F1: 0.8750
Confusion Matrix:
[[ 21   5]
 [  1 300]]
Baseline F1     : 0.7857

Figure saved.

Key Takeaways

The dominant insight is that the contamination hyperparameter drives most of the F1 variance: setting it far from the true anomaly rate $\rho = 0.08$ either misses anomalies (too low) or mislabels normal points (too high). The 3D surface in Panel 3 makes this ridge structure immediately visible.
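
This sensitivity follows from how contamination is used: IsolationForest sets its decision threshold at (approximately) the contamination-quantile of the training anomaly scores, so the parameter directly controls the flagged fraction. A stand-in with random scores makes the mechanism visible (an assumption-level sketch of the thresholding, not the forest itself):

```python
# Thresholding stand-in: flag the lowest-scoring `c` fraction of points,
# mirroring how `contamination` sets IsolationForest's score threshold.
import numpy as np

rng = np.random.default_rng(0)
scores = rng.normal(size=1000)          # stand-in anomaly scores (lower = worse)
for c in (0.02, 0.08, 0.25):
    thresh = np.quantile(scores, c)     # c-quantile of the scores
    flagged = (scores < thresh).mean()  # fraction predicted anomalous
    print(c, round(flagged, 3))         # flagged fraction tracks c
```

Since the flagged fraction tracks contamination regardless of where the true anomalies lie, setting it below $\rho$ guarantees false negatives and setting it above $\rho$ guarantees false positives, which is exactly the ridge at $\rho \approx 0.08$.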

Bayesian Optimization navigates to the optimal ridge in 50 evaluations. The surrogate-based acquisition balances exploitation of known good regions with exploration of uncertain ones:

$$\theta_{t+1} = \arg\max_\theta \; EI(\theta) = \arg\max_\theta \; \mathbb{E}\!\left[\max\!\left(\hat{f}(\theta) - f^+,\, 0\right)\right]$$

where $\hat{f}(\theta)$ is the GP posterior mean and $f^+$ is the current best observed F1. This makes it the method of choice whenever each evaluation involves training a non-trivial model on a large dataset — precisely the regime that makes naive grid search impractical.