πŸ—‚οΈ Job Scheduling Problem (Task Assignment) β€” Solved with Python

What Is the Job Scheduling Problem?

The Job Scheduling (Task Assignment) problem is a classic problem in combinatorial optimization. The question is simple to state: given a set of jobs and a set of machines, how do we assign jobs to machines so that the time at which the last job finishes (the makespan) is minimized?

This is essentially asking: how do we pack tasks into workers as efficiently as possible?


Problem Formulation

Let:

$$
m = \text{number of machines (workers)}, \quad n = \text{number of jobs}
$$

$$
p_{ij} = \text{processing time of job } j \text{ on machine } i
$$

The goal is to find an assignment

$$
x_{ij} \in \{0, 1\}, \quad \text{where } x_{ij} = 1 \text{ if job } j \text{ is assigned to machine } i
$$

subject to:

$$
\sum_{i=1}^{m} x_{ij} = 1 \quad \forall j \in \{1, \ldots, n\}
\quad \text{(each job is assigned to exactly one machine)}
$$

The objective is to minimize the makespan (the time until all jobs are finished):

$$
\text{minimize} \quad C_{\max} = \max_{i} \sum_{j=1}^{n} p_{ij} \cdot x_{ij}
$$
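To make the objective concrete, here is a minimal sketch that evaluates $C_{\max}$ for one fixed assignment on a made-up 2-machine, 3-job instance (the numbers are illustrative only, not the example solved below):

```python
# Made-up toy instance: p[i][j] = processing time of job j on machine i
p = [[2, 4, 6],
     [3, 3, 5]]

# assignment[j] = machine chosen for job j
assignment = [0, 1, 1]

# Load of machine i = sum of times of the jobs placed on it
loads = [sum(p[i][j] for j, m in enumerate(assignment) if m == i)
         for i in range(len(p))]
makespan = max(loads)   # C_max = the heaviest machine's load
print(loads, makespan)  # [2, 8] 8
```

Moving job 2 to machine 0 instead gives loads [8, 3] and the same makespan of 8, while assignment [0, 0, 1] gives [6, 5] and a makespan of 6; searching over all $m^n$ such assignments is exactly the optimization problem.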


Concrete Example

We have 4 machines and 8 jobs. Each job has a different processing time on each machine (due to hardware differences, skill sets, etc.):

Job     M1   M2   M3   M4
J1       6    8    5    7
J2       4    3    7    5
J3       9    5    4    6
J4       3    7    8    4
J5       7    4    6    9
J6       5    6    3    8
J7       8    9    5    3
J8       4    5    9    6
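Note that the obvious shortcut of sending every job to its individually fastest machine already fails on this table, because it ignores congestion; a quick check using the table's numbers (machines indexed 0–3):

```python
# Processing times from the table above: proc_time[i][j] = time of job j on machine i
proc_time = [
    [6, 4, 9, 3, 7, 5, 8, 4],  # M1
    [8, 3, 5, 7, 4, 6, 9, 5],  # M2
    [5, 7, 4, 8, 6, 3, 5, 9],  # M3
    [7, 5, 6, 4, 9, 8, 3, 6],  # M4
]

# Greedy per-job rule: each job picks the machine where it runs fastest
assignment = [min(range(4), key=lambda i: proc_time[i][j]) for j in range(8)]
loads = [sum(proc_time[i][j] for j in range(8) if assignment[j] == i)
         for i in range(4)]
print(loads, max(loads))  # [7, 7, 12, 3] 12
```

Three jobs pile onto Machine 3, giving a makespan of 12, well above the optimum of 9 found below.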

We’ll solve this using two approaches:

  1. Exact solution via Integer Linear Programming (ILP) with PuLP
  2. Metaheuristic via Simulated Annealing for larger/faster instances

Full Source Code

# ============================================================
# Job Scheduling (Task Assignment) Problem
# Google Colaboratory — Full Solution
# ============================================================

# --- Install dependencies ---
import subprocess
subprocess.run(["pip", "install", "pulp", "matplotlib", "numpy", "scipy"], check=True)

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
from mpl_toolkits.mplot3d import Axes3D  # registers the "3d" projection on older Matplotlib
import pulp
import random
import math
import time

# ============================================================
# 1. PROBLEM DEFINITION
# ============================================================
random.seed(42)
np.random.seed(42)

N_MACHINES = 4
N_JOBS = 8

# Processing time matrix [machine][job]
proc_time = np.array([
    [6, 4, 9, 3, 7, 5, 8, 4],  # Machine 1
    [8, 3, 5, 7, 4, 6, 9, 5],  # Machine 2
    [5, 7, 4, 8, 6, 3, 5, 9],  # Machine 3
    [7, 5, 6, 4, 9, 8, 3, 6],  # Machine 4
])

machine_names = [f"Machine {i+1}" for i in range(N_MACHINES)]
job_names = [f"Job {j+1}" for j in range(N_JOBS)]

print("=" * 55)
print(" Processing Time Matrix (rows=machines, cols=jobs)")
print("=" * 55)
header = " " * 10 + " ".join(f"{j:>6}" for j in job_names)
print(header)
for i, row in enumerate(proc_time):
    vals = " ".join(f"{v:>6}" for v in row)
    print(f"{machine_names[i]:>10} {vals}")

# ============================================================
# 2. EXACT SOLUTION — Integer Linear Programming (PuLP)
# ============================================================
print("\n" + "=" * 55)
print(" [Method 1] ILP Exact Solution (PuLP)")
print("=" * 55)

prob = pulp.LpProblem("JobScheduling_Makespan", pulp.LpMinimize)

# Decision variables: x[i][j] = 1 if job j is assigned to machine i
x = [[pulp.LpVariable(f"x_{i}_{j}", cat="Binary")
      for j in range(N_JOBS)]
     for i in range(N_MACHINES)]

# Makespan variable
C_max = pulp.LpVariable("C_max", lowBound=0)

# Objective: minimize makespan
prob += C_max

# Constraint 1: each job is assigned to exactly one machine
for j in range(N_JOBS):
    prob += pulp.lpSum(x[i][j] for i in range(N_MACHINES)) == 1

# Constraint 2: makespan >= load of every machine
for i in range(N_MACHINES):
    prob += C_max >= pulp.lpSum(proc_time[i][j] * x[i][j] for j in range(N_JOBS))

# Solve
start_ilp = time.time()
prob.solve(pulp.PULP_CBC_CMD(msg=0))
time_ilp = time.time() - start_ilp

# Extract results
ilp_assignment = [-1] * N_JOBS  # ilp_assignment[j] = machine index
for j in range(N_JOBS):
    for i in range(N_MACHINES):
        if pulp.value(x[i][j]) > 0.5:
            ilp_assignment[j] = i

ilp_loads = [sum(proc_time[i][j] for j in range(N_JOBS) if ilp_assignment[j] == i)
             for i in range(N_MACHINES)]
ilp_makespan = max(ilp_loads)

print(f" Status      : {pulp.LpStatus[prob.status]}")
print(f" Makespan    : {ilp_makespan}")
print(f" Solve Time  : {time_ilp:.4f} sec")
print(f" Assignment  : {[ilp_assignment[j]+1 for j in range(N_JOBS)]} (Machine No.)")
print(f" Machine Loads: {ilp_loads}")

# ============================================================
# 3. METAHEURISTIC — Simulated Annealing (Faster for Large N)
# ============================================================
print("\n" + "=" * 55)
print(" [Method 2] Simulated Annealing")
print("=" * 55)

def calc_makespan(assignment, pt):
    loads = np.zeros(pt.shape[0])  # one load accumulator per machine
    for j, m in enumerate(assignment):
        loads[m] += pt[m][j]
    return np.max(loads), loads

def simulated_annealing(pt, n_machines, n_jobs,
                        T_init=200.0, T_min=0.01,
                        alpha=0.995, n_iter=5000):
    # Initial random assignment
    current = [random.randint(0, n_machines - 1) for _ in range(n_jobs)]
    current_ms, current_loads = calc_makespan(current, pt)
    best = current[:]
    best_ms = current_ms
    history = [current_ms]

    T = T_init
    while T > T_min:
        for _ in range(n_iter):
            # Neighbour: move a random job to a random *different* machine
            # (draw from n_machines - 1 options, then shift past the current one)
            j = random.randint(0, n_jobs - 1)
            new_m = random.randint(0, n_machines - 2)
            if new_m >= current[j]:
                new_m += 1
            neighbor = current[:]
            neighbor[j] = new_m
            neighbor_ms, _ = calc_makespan(neighbor, pt)

            delta = neighbor_ms - current_ms
            if delta < 0 or random.random() < math.exp(-delta / T):
                current = neighbor
                current_ms = neighbor_ms

            if current_ms < best_ms:
                best = current[:]
                best_ms = current_ms

        history.append(best_ms)
        T *= alpha

    return best, best_ms, history

start_sa = time.time()
sa_assignment, sa_makespan, sa_history = simulated_annealing(
    proc_time, N_MACHINES, N_JOBS)
time_sa = time.time() - start_sa

sa_loads = [sum(proc_time[i][j] for j in range(N_JOBS) if sa_assignment[j] == i)
            for i in range(N_MACHINES)]

print(f" Makespan    : {sa_makespan}")
print(f" Solve Time  : {time_sa:.4f} sec")
print(f" Assignment  : {[sa_assignment[j]+1 for j in range(N_JOBS)]} (Machine No.)")
print(f" Machine Loads: {sa_loads}")

# ============================================================
# 4. SCALABILITY TEST — ILP vs SA on larger instances
# ============================================================
print("\n" + "=" * 55)
print(" [Scalability Test] Varying number of jobs")
print("=" * 55)

job_counts = [4, 6, 8, 10, 12, 14, 16]
times_ilp = []
times_sa = []
makespans_ilp = []
makespans_sa = []

for nj in job_counts:
    pt_test = np.random.randint(1, 15, size=(N_MACHINES, nj))

    # ILP
    p2 = pulp.LpProblem("JS", pulp.LpMinimize)
    xv = [[pulp.LpVariable(f"x_{i}_{j}", cat="Binary")
           for j in range(nj)] for i in range(N_MACHINES)]
    cm = pulp.LpVariable("C_max", lowBound=0)
    p2 += cm
    for j in range(nj):
        p2 += pulp.lpSum(xv[i][j] for i in range(N_MACHINES)) == 1
    for i in range(N_MACHINES):
        p2 += cm >= pulp.lpSum(pt_test[i][j] * xv[i][j] for j in range(nj))
    t0 = time.time()
    p2.solve(pulp.PULP_CBC_CMD(msg=0))
    times_ilp.append(time.time() - t0)
    makespans_ilp.append(pulp.value(cm))

    # SA
    t0 = time.time()
    _, ms_sa, _ = simulated_annealing(pt_test, N_MACHINES, nj,
                                      T_init=150, n_iter=2000)
    times_sa.append(time.time() - t0)
    makespans_sa.append(ms_sa)

    print(f" Jobs={nj:2d} | ILP makespan={makespans_ilp[-1]:.0f}  "
          f"SA makespan={makespans_sa[-1]:.0f} | "
          f"ILP t={times_ilp[-1]:.3f}s  SA t={times_sa[-1]:.3f}s")

# ============================================================
# 5. VISUALIZATION
# ============================================================

colors_jobs = plt.cm.Set3(np.linspace(0, 1, N_JOBS))

fig = plt.figure(figsize=(20, 22))
fig.suptitle("Job Scheduling (Task Assignment) — Full Analysis",
             fontsize=16, fontweight="bold", y=0.98)

# ── Plot 1: ILP Gantt Chart ───────────────────────────────
ax1 = fig.add_subplot(3, 2, 1)
ax1.set_title("(1) ILP Optimal Assignment — Gantt Chart", fontweight="bold")
start_times = [0] * N_MACHINES
for j in range(N_JOBS):
    m = ilp_assignment[j]
    ax1.barh(m, proc_time[m][j], left=start_times[m],
             color=colors_jobs[j], edgecolor="white", linewidth=1.5, height=0.6)
    ax1.text(start_times[m] + proc_time[m][j] / 2, m,
             f"J{j+1}", ha="center", va="center", fontsize=9, fontweight="bold")
    start_times[m] += proc_time[m][j]
ax1.axvline(ilp_makespan, color="red", linestyle="--", linewidth=2, label=f"Makespan={ilp_makespan}")
ax1.set_yticks(range(N_MACHINES))
ax1.set_yticklabels(machine_names)
ax1.set_xlabel("Time")
ax1.legend()
ax1.grid(axis="x", alpha=0.3)

# ── Plot 2: SA Gantt Chart ────────────────────────────────
ax2 = fig.add_subplot(3, 2, 2)
ax2.set_title("(2) SA Assignment — Gantt Chart", fontweight="bold")
start_times2 = [0] * N_MACHINES
for j in range(N_JOBS):
    m = sa_assignment[j]
    ax2.barh(m, proc_time[m][j], left=start_times2[m],
             color=colors_jobs[j], edgecolor="white", linewidth=1.5, height=0.6)
    ax2.text(start_times2[m] + proc_time[m][j] / 2, m,
             f"J{j+1}", ha="center", va="center", fontsize=9, fontweight="bold")
    start_times2[m] += proc_time[m][j]
ax2.axvline(sa_makespan, color="red", linestyle="--", linewidth=2, label=f"Makespan={sa_makespan}")
ax2.set_yticks(range(N_MACHINES))
ax2.set_yticklabels(machine_names)
ax2.set_xlabel("Time")
ax2.legend()
ax2.grid(axis="x", alpha=0.3)

# ── Plot 3: Machine Load Comparison (Grouped Bar) ─────────
ax3 = fig.add_subplot(3, 2, 3)
ax3.set_title("(3) Machine Load Comparison: ILP vs SA", fontweight="bold")
x_pos = np.arange(N_MACHINES)
width = 0.35
bars1 = ax3.bar(x_pos - width/2, ilp_loads, width, label=f"ILP (makespan={ilp_makespan})",
                color="#4C72B0", edgecolor="white")
bars2 = ax3.bar(x_pos + width/2, sa_loads, width, label=f"SA (makespan={sa_makespan})",
                color="#DD8452", edgecolor="white")
ax3.axhline(ilp_makespan, color="#4C72B0", linestyle="--", alpha=0.7, linewidth=1.5)
ax3.axhline(sa_makespan, color="#DD8452", linestyle="--", alpha=0.7, linewidth=1.5)
for bar in bars1:
    ax3.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.3,
             str(int(bar.get_height())), ha="center", va="bottom", fontsize=9)
for bar in bars2:
    ax3.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.3,
             str(int(bar.get_height())), ha="center", va="bottom", fontsize=9)
ax3.set_xticks(x_pos)
ax3.set_xticklabels(machine_names)
ax3.set_ylabel("Total Load (time units)")
ax3.legend()
ax3.grid(axis="y", alpha=0.3)

# ── Plot 4: SA Convergence History ───────────────────────
ax4 = fig.add_subplot(3, 2, 4)
ax4.set_title("(4) Simulated Annealing Convergence", fontweight="bold")
ax4.plot(sa_history, color="#2ca02c", linewidth=1.5, alpha=0.8)
ax4.axhline(ilp_makespan, color="red", linestyle="--", linewidth=1.5, label=f"ILP optimal={ilp_makespan}")
ax4.fill_between(range(len(sa_history)), sa_history, ilp_makespan,
                 where=[h > ilp_makespan for h in sa_history],
                 alpha=0.15, color="orange", label="Gap to optimal")
ax4.set_xlabel("Temperature step")
ax4.set_ylabel("Best Makespan Found")
ax4.legend()
ax4.grid(alpha=0.3)

# ── Plot 5: Scalability — Solve Time ─────────────────────
ax5 = fig.add_subplot(3, 2, 5)
ax5.set_title("(5) Solve Time vs. Number of Jobs (Scalability)", fontweight="bold")
ax5.plot(job_counts, times_ilp, "o-", color="#4C72B0", label="ILP (PuLP)", linewidth=2, markersize=7)
ax5.plot(job_counts, times_sa, "s-", color="#DD8452", label="SA", linewidth=2, markersize=7)
ax5.set_xlabel("Number of Jobs")
ax5.set_ylabel("Solve Time (seconds)")
ax5.legend()
ax5.grid(alpha=0.3)

# ── Plot 6: 3D — Processing Time Surface ─────────────────
ax6 = fig.add_subplot(3, 2, 6, projection="3d")
ax6.set_title("(6) 3D Processing Time Surface\n(Machine × Job)", fontweight="bold")
_m = np.arange(N_MACHINES)
_j = np.arange(N_JOBS)
M_grid, J_grid = np.meshgrid(_m, _j, indexing="ij")
surf = ax6.plot_surface(M_grid, J_grid, proc_time.astype(float),
                        cmap="viridis", edgecolor="none", alpha=0.85)
# Mark ILP assignment with red scatter
for j in range(N_JOBS):
    m_opt = ilp_assignment[j]
    ax6.scatter(m_opt, j, proc_time[m_opt][j] + 0.3,
                color="red", s=60, zorder=5)
fig.colorbar(surf, ax=ax6, shrink=0.5, label="Processing Time")
ax6.set_xlabel("Machine")
ax6.set_ylabel("Job")
ax6.set_zlabel("Proc. Time")
ax6.set_xticks(range(N_MACHINES))
ax6.set_xticklabels([f"M{i+1}" for i in range(N_MACHINES)], fontsize=8)
ax6.set_yticks(range(N_JOBS))
ax6.set_yticklabels([f"J{j+1}" for j in range(N_JOBS)], fontsize=8)

plt.tight_layout(rect=[0, 0, 1, 0.97])
plt.savefig("job_scheduling_results.png", dpi=150, bbox_inches="tight")
plt.show()
print("\n Figure saved: job_scheduling_results.png")

Code Walkthrough

Section 1 — Problem Definition

The processing time matrix proc_time is a 4 × 8 NumPy array where proc_time[i][j] is how long machine i takes on job j. This asymmetry is what makes the problem interesting: if all machines were identical, a simple greedy rule (always hand the longest remaining job to the least-loaded machine) would already come within a small constant factor of optimal.
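For reference, here is a minimal sketch of that greedy rule for the identical-machine case (the function name and the five job lengths are made up for illustration; Graham's classic analysis bounds LPT at 4/3 of the optimal makespan):

```python
import heapq

def lpt_schedule(times, n_machines):
    # Longest-Processing-Time greedy for identical machines:
    # take jobs in decreasing length, always placing on the least-loaded machine.
    heap = [(0, m) for m in range(n_machines)]      # (current load, machine id)
    assignment = {}
    for j in sorted(range(len(times)), key=lambda j: -times[j]):
        load, m = heapq.heappop(heap)               # least-loaded machine
        assignment[j] = m
        heapq.heappush(heap, (load + times[j], m))
    return assignment, max(load for load, _ in heap)

# Made-up example: 5 jobs on 2 identical machines
assignment, makespan = lpt_schedule([7, 5, 4, 3, 1], 2)
print(assignment, makespan)  # makespan 10 (total work 20 split evenly)
```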

Section 2 — Exact ILP Solution (PuLP)

We model the problem exactly as the math above:

  • x[i][j] is a binary variable — 1 if job j goes to machine i, else 0
  • C_max is a continuous variable representing the makespan we’re minimizing
  • Two constraint sets enforce: (a) every job is assigned exactly once, and (b) C_max is at least as large as every machine’s total load

The solver (PULP_CBC_CMD) finds the globally optimal assignment. For small instances like ours (8 jobs × 4 machines = 32 binary variables), this runs in a fraction of a second.
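In fact, at this size the ILP answer can be cross-checked by brute force: with only $4^8 = 65{,}536$ possible assignments, exhaustive enumeration with the standard library finishes in well under a second (a sanity check, not an approach that scales):

```python
from itertools import product

# Same processing-time matrix as above
proc_time = [
    [6, 4, 9, 3, 7, 5, 8, 4],
    [8, 3, 5, 7, 4, 6, 9, 5],
    [5, 7, 4, 8, 6, 3, 5, 9],
    [7, 5, 6, 4, 9, 8, 3, 6],
]

# Enumerate every assignment (a[j] = machine of job j) and keep the best makespan
best = min(
    max(sum(proc_time[i][j] for j in range(8) if a[j] == i) for i in range(4))
    for a in product(range(4), repeat=8)
)
print(best)  # agrees with the ILP optimum of 9
```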

Section 3 — Simulated Annealing

For larger instances, ILP becomes exponentially slow. Simulated Annealing (SA) is a metaheuristic that:

  1. Starts with a random assignment
  2. At each step, moves one job to a different machine (the “neighbour”)
  3. Accepts worsening moves with probability $e^{-\Delta / T}$, where $T$ is the current “temperature”
  4. Cools down by multiplying $T$ by alpha = 0.995 each outer loop

The acceptance criterion comes from statistical mechanics:

$$
P(\text{accept worse solution}) = e^{-\Delta C_{\max} / T}
$$

This allows the algorithm to escape local optima early in the search and exploit good regions as it cools.
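Plugging the script's own temperature schedule into this formula shows the mechanism at work: a move that worsens the makespan by $\Delta = 2$ goes from almost always accepted at the initial temperature to essentially never accepted near the stopping temperature:

```python
import math

delta = 2.0  # makespan got worse by 2 time units
for T in (200.0, 1.0, 0.01):  # T_init, mid-schedule, T_min from the script
    p = math.exp(-delta / T)
    print(f"T={T:6.2f}  P(accept worse) = {p:.3g}")
```

At T = 200 the probability is about 0.99, at T = 1 about 0.135, and at T = 0.01 it is vanishingly small.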

Section 4 — Scalability Test

We run both methods on random instances with 4 to 16 jobs and measure solve time against solution quality. This shows how the two costs scale: ILP’s worst-case runtime grows exponentially with job count, while SA’s runtime is set almost entirely by its fixed iteration budget.

Section 5 — Visualizations

Six plots are generated; each is explained in the next section.


Visualization Explanations

Plot (1) & (2) — Gantt Charts
Each colored bar represents a job assigned to a machine. The horizontal length is the processing time. The red dashed line marks the makespan — the moment all work is done. An ideal schedule pushes this line as far left as possible.

Plot (3) — Machine Load Comparison
Side-by-side grouped bars show the total workload on each machine under ILP vs. SA. The dashed horizontal lines mark each method’s makespan. A well-balanced schedule has bars of roughly equal height.

Plot (4) — SA Convergence
The green line traces the best makespan found as SA cools. The orange-shaded region shows the gap to the ILP optimum. Notice that the gap narrows and eventually closes — SA reaches the same makespan as ILP on this small instance (though via a different assignment).

Plot (5) — Scalability
This reveals the core trade-off. ILP’s worst-case solve time grows exponentially with job count, while SA’s runtime is set almost entirely by its iteration budget. At the sizes tested here CBC is still very fast, but for production systems with hundreds of jobs and richer constraints, SA (or similar metaheuristics) becomes the practical choice.

Plot (6) — 3D Processing Time Surface
The surface visualizes the full proc_time matrix as a 3D landscape. The red dots mark the ILP-optimal assignment — one per job, showing which machine was chosen. Notice how the dots tend to land in valleys (short processing times), confirming the solver is exploiting the structure of the matrix.


Execution Results

=======================================================
  Processing Time Matrix (rows=machines, cols=jobs)
=======================================================
          Job 1   Job 2   Job 3   Job 4   Job 5   Job 6   Job 7   Job 8
 Machine 1       6       4       9       3       7       5       8       4
 Machine 2       8       3       5       7       4       6       9       5
 Machine 3       5       7       4       8       6       3       5       9
 Machine 4       7       5       6       4       9       8       3       6

=======================================================
  [Method 1] ILP Exact Solution (PuLP)
=======================================================
  Status      : Optimal
  Makespan    : 9
  Solve Time  : 0.1342 sec
  Assignment  : [1, 2, 3, 1, 2, 3, 4, 4]  (Machine No.)
  Machine Loads: [np.int64(9), np.int64(7), np.int64(7), np.int64(9)]

=======================================================
  [Method 2] Simulated Annealing
=======================================================
  Makespan    : 9.0
  Solve Time  : 138.5160 sec
  Assignment  : [3, 2, 4, 1, 2, 3, 4, 1]  (Machine No.)
  Machine Loads: [np.int64(7), np.int64(7), np.int64(8), np.int64(9)]

=======================================================
  [Scalability Test] Varying number of jobs
=======================================================
  Jobs= 4  | ILP makespan=7  SA makespan=7  | ILP t=0.013s  SA t=38.049s
  Jobs= 6  | ILP makespan=7  SA makespan=7  | ILP t=0.015s  SA t=42.428s
  Jobs= 8  | ILP makespan=9  SA makespan=9  | ILP t=0.014s  SA t=48.436s
  Jobs=10  | ILP makespan=10  SA makespan=10  | ILP t=0.014s  SA t=53.329s
  Jobs=12  | ILP makespan=13  SA makespan=13  | ILP t=0.020s  SA t=59.293s
  Jobs=14  | ILP makespan=17  SA makespan=17  | ILP t=0.020s  SA t=64.121s
  Jobs=16  | ILP makespan=12  SA makespan=12  | ILP t=0.023s  SA t=68.924s

Figure saved: job_scheduling_results.png

Key Takeaways

The makespan minimization problem is NP-hard in general, but for moderate sizes ILP provides a provably optimal answer in seconds. For large-scale scheduling (think cloud job orchestration, factory floors, logistics routing), metaheuristics like Simulated Annealing, Genetic Algorithms, or modern solvers like Google OR-Tools offer practical and near-optimal solutions. The 3D surface plot intuitively shows why the problem is hard — the optimal picks are spread across a rugged landscape with no single dominant row or column.