Maximizing Detection Accuracy
Anomaly detection sits at the heart of modern monitoring systems — from fraud detection in banking to fault detection in industrial machinery. Yet the algorithms themselves are only as good as their parameters. In this article, we formulate parameter optimization for anomaly detection as a concrete mathematical optimization problem and solve it end-to-end in Python.
Problem Setup
We focus on the Isolation Forest algorithm. Its key parameters are:
- `n_estimators`: number of isolation trees
- `max_samples`: fraction of samples drawn for each tree
- `contamination`: assumed fraction of outliers in the dataset
We optimize these to maximize the F1 score:
$$F_1 = 2 \cdot \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}, \quad \text{Precision} = \frac{TP}{TP + FP}, \quad \text{Recall} = \frac{TP}{TP + FN}$$
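Plugging in the confusion-matrix counts obtained at the end of this article (21 TP, 1 FP, 5 FN) makes the metric concrete:
$$\text{Precision} = \frac{21}{21+1} \approx 0.9545, \quad \text{Recall} = \frac{21}{21+5} \approx 0.8077, \quad F_1 = 2 \cdot \frac{0.9545 \times 0.8077}{0.9545 + 0.8077} = 0.8750$$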
The optimization objective is:
$$\theta^* = \arg\max_{\theta \in \Theta} \; F_1\!\left(\mathcal{D}_{\text{val}},\, \theta\right)$$
Since $F_1$ is non-differentiable with respect to $\theta$, we use Bayesian Optimization guided by the Expected Improvement acquisition function:
$$EI(\theta) = \mathbb{E}\!\left[\max\!\left(F_1(\theta) - F_1^+,\, 0\right)\right]$$
where $F_1^+$ is the best observed value so far.
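gp_minimize evaluates this acquisition internally, so the article's pipeline never computes it by hand. Purely to make the formula concrete, here is a sketch of the standard closed form of EI under a Gaussian posterior (mean `mu`, standard deviation `sigma` at a candidate $\theta$):

```python
# Sketch: closed-form EI for maximization under a Gaussian posterior.
# Not part of the article's pipeline; gp_minimize does this internally.
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, f_best):
    """EI(theta) given the GP posterior mean/std and the best observed value."""
    sigma = np.maximum(sigma, 1e-12)            # guard against zero variance
    z = (mu - f_best) / sigma
    # (mu - f_best) * P(improvement) + exploration bonus from the tail mass
    return (mu - f_best) * norm.cdf(z) + sigma * norm.pdf(z)
```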
Synthetic Dataset

True contamination rate: $\rho = 0.08$ (8% anomalies).
Full Source Code
The listing is broken down section by section in the walkthrough that follows.
Code Walkthrough
Section 0 — Installing scikit-optimize
!pip install scikit-optimize -q installs the library at runtime. This must be the very first cell executed so that all subsequent imports resolve correctly.
Section 1 — Synthetic Dataset
generate_dataset() builds a controlled benchmark: 1,000 normal points from two overlapping Gaussians and 88 anomalies (~8% rate) drawn uniformly over $[-6, 6]^2$. The two Gaussians have intentionally different covariance structures to make the decision boundary non-trivial. train_test_split with stratify=y preserves the anomaly rate in both splits.
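A minimal sketch of this section, hedged where the walkthrough is silent: the two means and covariance matrices below are illustrative assumptions (the text fixes only the counts, the overlap, and the uniform range), while `test_size=0.3` is consistent with the 761/327 split reported in the Execution Results:

```python
# Sketch of generate_dataset(): 1,000 normal points from two overlapping
# Gaussians plus 88 uniform anomalies (~8%). Means/covariances are assumed.
import numpy as np
from sklearn.model_selection import train_test_split

def generate_dataset(seed=42):
    rng = np.random.default_rng(seed)
    g1 = rng.multivariate_normal([0, 0], [[1.0, 0.6], [0.6, 1.0]], 500)
    g2 = rng.multivariate_normal([2, 2], [[1.5, -0.4], [-0.4, 0.5]], 500)
    anomalies = rng.uniform(-6, 6, size=(88, 2))     # uniform over [-6, 6]^2
    X = np.vstack([g1, g2, anomalies])
    y = np.hstack([np.ones(1000), -np.ones(88)])     # sklearn convention: -1 = anomaly
    return X, y

X, y = generate_dataset()
# stratify=y preserves the ~8% anomaly rate in both splits.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)
```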
Section 2 — Bayesian Optimization
The search space covers three parameters:
| Parameter | Type | Range |
|---|---|---|
| `n_estimators` | Integer | $[50, 400]$ |
| `max_samples` | Real | $[0.1, 1.0]$ |
| `contamination` | Real | $[0.01, 0.25]$ |
The @use_named_args decorator wires the flat list used internally by gp_minimize to named Python arguments. The objective returns negative F1 because gp_minimize minimizes. With n_initial_points=10, the Gaussian Process is bootstrapped on 10 random draws; the remaining 40 calls are guided by the EI acquisition function — far more efficient than the $\sim 5{,}000$ evaluations a fine grid search would require over the same space.
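A sketch of this section, continuing from the dataset snippet above. The skopt names (Integer, Real, use_named_args, gp_minimize) are the library's real API; random_state=42 is an assumption:

```python
# Sketch of the Bayesian Optimization setup described above (skopt API).
from skopt import gp_minimize
from skopt.space import Integer, Real
from skopt.utils import use_named_args
from sklearn.ensemble import IsolationForest
from sklearn.metrics import f1_score

space = [
    Integer(50, 400, name="n_estimators"),
    Real(0.1, 1.0, name="max_samples"),
    Real(0.01, 0.25, name="contamination"),
]

@use_named_args(space)
def objective(n_estimators, max_samples, contamination):
    model = IsolationForest(
        n_estimators=n_estimators,
        max_samples=max_samples,
        contamination=contamination,
        random_state=42,
    ).fit(X_train)
    y_pred = model.predict(X_val)                  # -1 = anomaly, 1 = normal
    return -f1_score(y_val, y_pred, pos_label=-1)  # negate: gp_minimize minimizes

result = gp_minimize(
    objective, space,
    n_calls=50,             # total evaluations
    n_initial_points=10,    # random bootstrap before the surrogate takes over
    acq_func="EI",          # Expected Improvement
    random_state=42,
)
```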
Section 3 — Final Evaluation
The best parameter vector result.x is unpacked to rebuild the optimal model. Metrics are reported with pos_label=-1 (the anomaly class). A baseline IsolationForest with otherwise-default parameters and contamination=0.08 (the true rate) provides the comparison reference.
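Continuing the sketch, the unpacking order of result.x follows the search-space definition above:

```python
# Sketch of the final evaluation and baseline comparison.
from sklearn.ensemble import IsolationForest
from sklearn.metrics import (confusion_matrix, f1_score,
                             precision_score, recall_score)

n_estimators, max_samples, contamination = result.x
best_model = IsolationForest(
    n_estimators=n_estimators, max_samples=max_samples,
    contamination=contamination, random_state=42,
).fit(X_train)
y_pred = best_model.predict(X_val)

print("Precision:", precision_score(y_val, y_pred, pos_label=-1))
print("Recall   :", recall_score(y_val, y_pred, pos_label=-1))
print("F1       :", f1_score(y_val, y_pred, pos_label=-1))
print(confusion_matrix(y_val, y_pred))           # rows = true (-1, 1)

# Baseline: default forest with contamination pinned to the true rate.
baseline = IsolationForest(contamination=0.08, random_state=42).fit(X_train)
print("Baseline F1:", f1_score(y_val, baseline.predict(X_val), pos_label=-1))
```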
Section 4 — Visualization
Panel 1 — Convergence Curve: Plots the F1 trajectory across all 50 BO calls. Grey dots = random initialization; cyan dots = BO-guided proposals; orange line = running best; green dashed = baseline. The shift in quality between the two phases shows BO learning the objective landscape.
Panel 2 — Decision Boundary: decision_function output is rendered as a color field. The orange contour at $z=0$ is the learned boundary. False Positives (▲) and False Negatives (▼) are individually marked to show exactly where the model errs.
Panels 3 & 4 — 3D F1 Surfaces: scipy.interpolate.griddata interpolates the scattered BO evaluations onto a regular grid. Panel 3 maps contamination × max_samples → F1; Panel 4 maps contamination × n_estimators → F1. These surfaces reveal objective landscape structure — flat plateaus vs sharp ridges — that a grid search would need orders of magnitude more evaluations to uncover.
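For concreteness, a sketch of the Panel 3 interpolation; result.x_iters and result.func_vals are gp_minimize's record of evaluated points and (negated) scores, and the 60×60 grid resolution is an arbitrary choice:

```python
# Sketch: interpolate scattered BO evaluations onto a regular grid.
import numpy as np
from scipy.interpolate import griddata

pts = np.array(result.x_iters)         # (50, 3): n_estimators, max_samples, contamination
f1_vals = -np.array(result.func_vals)  # undo the sign flip in the objective
max_samp, contam = pts[:, 1], pts[:, 2]

ci = np.linspace(contam.min(), contam.max(), 60)
mi = np.linspace(max_samp.min(), max_samp.max(), 60)
CI, MI = np.meshgrid(ci, mi)
# Panel 3: contamination x max_samples -> F1 (render CI, MI, F1_grid as a surface)
F1_grid = griddata((contam, max_samp), f1_vals, (CI, MI), method="cubic")
```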
Panel 5 — Sampled Parameter Space: 2D scatter of contamination vs max_samples colored by F1, showing how BO concentrates high-F1 samples around the optimal region as iterations progress.
Panel 6 — Per-Call F1 Bar: Direct visualization of the F1 returned on every call. The quality improvement in the cyan (BO) phase versus the grey (random) phase is clearly visible.
Panel 7 — Confusion Matrix: The final 2×2 matrix of the optimal model on the validation set. Color intensity encodes count; values are printed directly in cells.
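A minimal matplotlib sketch of such a panel; the colormap and tick labels are assumptions:

```python
# Sketch: 2x2 confusion matrix with counts printed in the cells.
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix

cm = confusion_matrix(y_val, y_pred)    # rows = true (-1, 1)
fig, ax = plt.subplots(figsize=(4, 4))
ax.imshow(cm, cmap="Blues")             # color intensity encodes count
for i in range(2):
    for j in range(2):
        ax.text(j, i, str(cm[i, j]), ha="center", va="center")
ax.set_xticks([0, 1])
ax.set_xticklabels(["anomaly", "normal"])
ax.set_yticks([0, 1])
ax.set_yticklabels(["anomaly", "normal"])
ax.set_xlabel("Predicted")
ax.set_ylabel("True")
plt.show()
```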
Execution Results
```
Train: 761 samples | Val: 327 samples
Anomaly rate (val): 0.080
Running Bayesian Optimization (50 calls)...
Best Parameters : {'n_estimators': 395, 'max_samples': 0.9802843384145278, 'contamination': 0.06691589893530461}
Best F1 Score : 0.8750
Precision: 0.9545 | Recall: 0.8077 | F1: 0.8750
Confusion Matrix:
[[ 21   5]
 [  1 300]]
Baseline F1 : 0.7857

Figure saved.
```
Key Takeaways
The dominant insight is that the contamination hyperparameter drives most of the F1 variance: setting it far from the true anomaly rate $\rho = 0.08$ either misses anomalies (too low) or mislabels normal points (too high). The 3D surface in Panel 3 makes this ridge structure immediately visible.
Bayesian Optimization navigates to the optimal ridge in 50 evaluations. The surrogate-based acquisition balances exploitation of known good regions with exploration of uncertain ones:
$$\theta_{t+1} = \arg\max_{\theta} \; EI(\theta) = \arg\max_{\theta} \; \mathbb{E}\!\left[\max\!\left(\hat{f}(\theta) - f^+,\, 0\right)\right]$$
where $\hat{f}(\theta)$ is the GP posterior prediction of the objective at $\theta$ (the expectation is taken over this posterior) and $f^+$ is the current best observed F1. This makes it the method of choice whenever each evaluation involves training a non-trivial model on a large dataset — precisely the regime that makes naive grid search impractical.