
Optimizing Transportation Routes with Python and Linear Programming

Introduction

Transportation route optimization is a crucial problem in logistics and supply chain management.
The goal is to determine how to transport goods from suppliers to customers at the lowest possible total cost.
This type of problem is often formulated as a linear programming (LP) problem and solved using optimization techniques.

In this blog post, we will go through a practical example of solving a transportation problem using Python in Google Colaboratory.
We will utilize the PuLP library for linear programming and visualize the results using Matplotlib.


Problem Statement

A company needs to transport goods from two warehouses (suppliers) to three retail stores (destinations).
The transportation cost per unit of goods between each supplier and store is given in the table below.

Cost Table (per unit of goods)

              Store A   Store B   Store C
Warehouse X     $2        $3        $1
Warehouse Y     $5        $4        $8

Supply & Demand

  • Warehouse X can supply 50 units.
  • Warehouse Y can supply 60 units.
  • Store A requires 30 units.
  • Store B requires 40 units.
  • Store C requires 40 units.

The objective is to minimize the total transportation cost while ensuring that supply and demand constraints are met.


Mathematical Formulation

Define $x_{ij}$ as the number of units transported from warehouse $i$ to store $j$.

Objective Function

We aim to minimize the total transportation cost:

Minimize $Z = 2x_{XA} + 3x_{XB} + 1x_{XC} + 5x_{YA} + 4x_{YB} + 8x_{YC}$

Constraints

Supply constraints (warehouses):

$$x_{XA} + x_{XB} + x_{XC} \le 50$$

$$x_{YA} + x_{YB} + x_{YC} \le 60$$

Demand constraints (stores):

$$x_{XA} + x_{YA} = 30$$

$$x_{XB} + x_{YB} = 40$$

$$x_{XC} + x_{YC} = 40$$

Non-negativity constraint:

$$x_{ij} \ge 0$$


Solving with Python (PuLP)

Let’s now solve this problem using Python.
Below is the complete code that sets up and solves the transportation optimization problem.

from pulp import LpProblem, LpMinimize, LpVariable, lpSum
import matplotlib.pyplot as plt
import numpy as np

# Define the problem
problem = LpProblem("Transportation_Optimization", LpMinimize)

# Define decision variables
x_XA = LpVariable("x_XA", lowBound=0, cat='Continuous')
x_XB = LpVariable("x_XB", lowBound=0, cat='Continuous')
x_XC = LpVariable("x_XC", lowBound=0, cat='Continuous')
x_YA = LpVariable("x_YA", lowBound=0, cat='Continuous')
x_YB = LpVariable("x_YB", lowBound=0, cat='Continuous')
x_YC = LpVariable("x_YC", lowBound=0, cat='Continuous')

# Objective function: Minimize total cost
problem += (2*x_XA + 3*x_XB + 1*x_XC +
            5*x_YA + 4*x_YB + 8*x_YC), "Total Transportation Cost"

# Supply constraints
problem += x_XA + x_XB + x_XC <= 50, "Supply_X"
problem += x_YA + x_YB + x_YC <= 60, "Supply_Y"

# Demand constraints
problem += x_XA + x_YA == 30, "Demand_A"
problem += x_XB + x_YB == 40, "Demand_B"
problem += x_XC + x_YC == 40, "Demand_C"

# Solve the problem
problem.solve()

# Output results
print("\nOptimal Transportation Plan:")
for v in problem.variables():
    print(f"{v.name} = {v.varValue}")
print(f"\nTotal Minimum Cost: ${problem.objective.value()}")

# Visualization
labels = ["X → A", "X → B", "X → C", "Y → A", "Y → B", "Y → C"]
values = [v.varValue for v in problem.variables()]

plt.figure(figsize=(8, 6))
plt.bar(labels, values, color=['blue', 'green', 'red', 'purple', 'orange', 'brown'])
plt.xlabel("Routes")
plt.ylabel("Units Transported")
plt.title("Optimized Transportation Plan")
plt.grid(axis='y', linestyle='--', alpha=0.7)
plt.show()

Explanation of the Code

  1. Defining the Problem

    • We create a linear programming problem using LpProblem().
    • The goal is minimization of transportation cost.
  2. Defining Decision Variables

    • Each route (e.g., x_XA, x_XB, etc.) is a continuous variable representing the number of units transported.
  3. Setting the Objective Function

    • We minimize the total cost by summing cost × transported units for each route.
  4. Defining Constraints

    • Supply constraints ensure warehouses do not exceed their limits.
    • Demand constraints ensure stores receive the required amount.
  5. Solving the Problem

    • The solve() method finds the optimal transportation plan.
  6. Output Results & Visualization

    • The results are printed in a readable format.
    • A bar chart is generated using Matplotlib to visualize the transportation flow.
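Before reading off the plan, it is worth confirming that the solver actually reached optimality; a minimal check, assuming the problem object from the script above:

from pulp import LpStatus

# Should print "Optimal"; anything else signals an infeasible or unbounded model
print("Solver status:", LpStatus[problem.status])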

Results & Interpretation

After running the script, we obtain:

Optimal Transportation Plan:
x_XA = 10.0
x_XB = 0.0
x_XC = 40.0
x_YA = 20.0
x_YB = 40.0
x_YC = 0.0

Total Minimum Cost: $320.0

Key Insights

  • Warehouse X supplies 10 units to Store A and 40 units to Store C.
  • Warehouse Y supplies 20 units to Store A and 40 units to Store B.
  • Total cost is minimized to $320.0.
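We can verify the reported total directly against the cost table: $2(10) + 1(40) + 5(20) + 4(40) = 20 + 40 + 100 + 160 = 320$.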

Graph Interpretation

  • The bar chart shows how many units each route transports.
  • The highest transportation occurs between X → C and Y → B.
  • The optimized plan avoids unnecessary high-cost routes.


Conclusion

This example demonstrates how linear programming effectively optimizes transportation routes in logistics.
By using Python’s PuLP, we minimized costs while satisfying all constraints.

Optimizing Staff Allocation

A Python Solution for Business Efficiency

Today, I’m diving into an interesting optimization challenge that many businesses face - the staff allocation problem.
This is a classic operations research problem where we need to assign staff to different shifts while minimizing costs and meeting various constraints.

Let’s walk through a concrete example and solve it using Python’s optimization libraries.
I’ll show you how to formulate the problem, solve it, and visualize the results.

The Problem Statement

Imagine we manage a call center that needs to be staffed 24/7.
We need to determine how many employees to assign to each shift to ensure we have enough staff to handle the expected call volume while minimizing labor costs.

Here’s our scenario:

  • We have 3 shifts: Morning (8am-4pm), Afternoon (4pm-12am), and Night (12am-8am)
  • Each shift has different minimum staffing requirements based on call volume
  • Each employee can only work one shift per day
  • Different shifts have different costs (night shifts cost more)
  • We need to minimize the total staffing cost while meeting all requirements

Mathematical Formulation

Let’s define our variables and constraints:

  • Let $x_i$ be the number of employees assigned to shift $i$
  • Each shift has a minimum staffing requirement $r_i$
  • Each shift has an associated cost $c_i$ per employee
  • Our objective is to minimize the total cost: $\min \sum_i c_i x_i$
  • Subject to: $x_i \ge r_i$ for all $i$ (meeting minimum requirements)
  • And $x_i \ge 0$ and integer (non-negative integers)

Python Implementation

Let’s implement this using PuLP, a popular linear programming library in Python:

import numpy as np
import pulp as pl
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Define the problem
def solve_staff_allocation(shift_requirements, shift_costs):
    """
    Solves the staff allocation problem

    Parameters:
    - shift_requirements: List of minimum staff required for each shift
    - shift_costs: List of cost per employee for each shift

    Returns:
    - model: The solved PuLP model
    - solution: Dictionary with solution details
    """
    # Create shift names
    shift_names = [f"Shift_{i+1}" for i in range(len(shift_requirements))]

    # Create the model
    model = pl.LpProblem("Staff_Allocation_Problem", pl.LpMinimize)

    # Define variables: number of staff for each shift (must be integers)
    staff_vars = {
        shift: pl.LpVariable(f"staff_{shift}", lowBound=0, cat=pl.LpInteger)
        for shift in shift_names
    }

    # Objective function: minimize total cost
    model += pl.lpSum([shift_costs[i] * staff_vars[shift]
                       for i, shift in enumerate(shift_names)])

    # Constraints: meet minimum staffing requirements
    for i, shift in enumerate(shift_names):
        model += staff_vars[shift] >= shift_requirements[i], f"Min_Staff_{shift}"

    # Solve the model
    model.solve(pl.PULP_CBC_CMD(msg=False))

    # Extract solution details
    solution = {
        "status": pl.LpStatus[model.status],
        "total_cost": pl.value(model.objective),
        "staff_allocation": {shift: pl.value(var) for shift, var in staff_vars.items()}
    }

    return model, solution

# Example data
shift_requirements = [10, 8, 5]  # Minimum staff for morning, afternoon, night shifts
shift_costs = [100, 120, 140]    # Cost per employee for each shift

# Solve the problem
model, solution = solve_staff_allocation(shift_requirements, shift_costs)

# Print results
print("Solution Status:", solution["status"])
print("Total Cost:", solution["total_cost"])
print("Staff Allocation:")
for shift, staff in solution["staff_allocation"].items():
    print(f"  {shift}: {int(staff)} employees")

# Visualization
def visualize_solution(solution, shift_requirements, shift_costs):
    """
    Creates visualizations for the staff allocation solution
    """
    # Prepare data for visualization
    shifts = list(solution["staff_allocation"].keys())
    allocated_staff = [solution["staff_allocation"][shift] for shift in shifts]
    min_required = shift_requirements
    shift_labels = ["Morning (8am-4pm)", "Afternoon (4pm-12am)", "Night (12am-8am)"]

    # Create DataFrame for easier plotting
    df = pd.DataFrame({
        'Shift': shift_labels,
        'Allocated': allocated_staff,
        'Required': min_required,
        'Cost per Employee': shift_costs,
        'Total Cost': [allocated_staff[i] * shift_costs[i] for i in range(len(shifts))]
    })

    # Set up figure with multiple subplots
    fig, axs = plt.subplots(2, 1, figsize=(12, 12))

    # Staff allocation vs requirements
    ax1 = axs[0]
    bar_width = 0.35
    x = np.arange(len(shifts))

    bars1 = ax1.bar(x - bar_width/2, allocated_staff, bar_width, label='Allocated Staff', color='skyblue')
    bars2 = ax1.bar(x + bar_width/2, min_required, bar_width, label='Minimum Required', color='lightgreen')

    ax1.set_xlabel('Shift')
    ax1.set_ylabel('Number of Employees')
    ax1.set_title('Staff Allocation vs. Minimum Requirements')
    ax1.set_xticks(x)
    ax1.set_xticklabels(shift_labels)
    ax1.legend()

    # Add value labels on top of bars
    for bars in [bars1, bars2]:
        for bar in bars:
            height = bar.get_height()
            ax1.annotate(f'{int(height)}',
                         xy=(bar.get_x() + bar.get_width() / 2, height),
                         xytext=(0, 3),
                         textcoords="offset points",
                         ha='center', va='bottom')

    # Cost breakdown
    ax2 = axs[1]
    cost_data = df[['Shift', 'Total Cost']]
    sns.barplot(x='Shift', y='Total Cost', data=cost_data, ax=ax2, color='salmon')
    ax2.set_title('Cost Breakdown by Shift')
    ax2.set_ylabel('Total Cost ($)')

    # Add value labels on top of bars
    for i, p in enumerate(ax2.patches):
        height = p.get_height()
        ax2.annotate(f'${int(height)}',
                     xy=(p.get_x() + p.get_width() / 2, height),
                     xytext=(0, 3),
                     textcoords="offset points",
                     ha='center', va='bottom')

    plt.tight_layout()
    return fig

# Generate and display visualization
fig = visualize_solution(solution, shift_requirements, shift_costs)
plt.show()

# Let's try another scenario with different requirements and costs
print("\n--- Alternative Scenario ---")

# Different scenario with varying call volumes throughout the day
alt_requirements = [15, 12, 4]  # Higher morning/afternoon, lower night
alt_costs = [100, 110, 150]     # More expensive night shift

alt_model, alt_solution = solve_staff_allocation(alt_requirements, alt_costs)

# Print results for alternative scenario
print("Solution Status:", alt_solution["status"])
print("Total Cost:", alt_solution["total_cost"])
print("Staff Allocation:")
for shift, staff in alt_solution["staff_allocation"].items():
    print(f"  {shift}: {int(staff)} employees")

# Visualize alternative scenario
alt_fig = visualize_solution(alt_solution, alt_requirements, alt_costs)
plt.show()

Code Explanation

Let’s break down the key components of our solution:

1. Problem Setup

We’ve defined a function solve_staff_allocation() that takes two main inputs:

  • shift_requirements: The minimum number of staff needed for each shift
  • shift_costs: The cost per employee for each shift

2. Linear Programming Model

  • We use PuLP to create a linear programming model with the objective to minimize total cost
  • Our decision variables are staff_vars, representing the number of staff assigned to each shift
  • We set these as integer variables with a lower bound of 0 (we can’t have negative staff!)
  • The objective function is the sum of (cost per employee × number of employees) for each shift

3. Constraints

  • For each shift, we add a constraint that the number of assigned staff must be greater than or equal to the minimum requirement

4. Solving and Results

  • The model is solved using the CBC solver (a free open-source solver)
  • We extract the solution status, total cost, and staff allocation for each shift

Running the base scenario prints:

Solution Status: Optimal
Total Cost: 2660.0
Staff Allocation:
  Shift_1: 10 employees
  Shift_2: 8 employees
  Shift_3: 5 employees

5. Visualization

The visualize_solution() function creates two important visualizations:

  • A bar chart comparing allocated staff vs. minimum requirements for each shift
  • A cost breakdown showing the total cost for each shift

Results Analysis

In our basic scenario:

  • Morning shift (8am-4pm): Requires 10 employees, costs $100 each
  • Afternoon shift (4pm-12am): Requires 8 employees, costs $120 each
  • Night shift (12am-8am): Requires 5 employees, costs $140 each

The optimal solution exactly matches the minimum requirements for each shift, since:

  1. There’s no benefit to assigning extra staff beyond requirements
  2. Each shift has a different cost, so we want to assign exactly the minimum needed for the more expensive shifts

The total cost is (10 × 100) + (8 × 120) + (5 × 140) = $2,660 per day.

Alternative Scenario

I’ve also tested an alternative scenario with different requirements:

  • Higher demand during morning (15) and afternoon (12) shifts
  • Lower demand during night shift (4)
  • Even higher cost differential for night shift ($150)

This scenario reflects a more typical business where daytime hours are busier.
The solution follows the same pattern - assign exactly the minimum required staff to each shift to minimize costs.

--- Alternative Scenario ---
Solution Status: Optimal
Total Cost: 3420.0
Staff Allocation:
  Shift_1: 15 employees
  Shift_2: 12 employees
  Shift_3: 4 employees

Extending the Model

This basic model can be extended in several ways to make it more realistic:

  • Adding constraints on total available staff (see the sketch after this list)
  • Considering part-time employees
  • Including overtime costs
  • Adding preferences or restrictions for certain employees
  • Considering multi-day scheduling with rest requirements
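As an illustration of the first extension, a company-wide headcount cap is a one-line constraint. This is a minimal sketch, assuming access to the model and staff_vars objects inside solve_staff_allocation() and a hypothetical limit of 25 employees:

# Hypothetical company-wide headcount cap (figure chosen for illustration)
max_total_staff = 25
model += pl.lpSum(staff_vars.values()) <= max_total_staff, "Total_Staff_Limit"

If the cap falls below the sum of the shift requirements (23 in the base scenario), the model becomes infeasible, which shows up in the solution status.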

Conclusion

The staff allocation problem is a practical application of linear programming that can provide significant cost savings for businesses.
By properly formulating the problem and using optimization tools like PuLP, we can quickly find optimal staffing solutions even for complex scenarios.

Python’s ecosystem makes it easy to not only solve these problems but also to analyze and visualize the results, helping stakeholders understand and implement the solutions.

The Set Cover Problem

A Practical Example with Python Implementation

Today, I want to dive into one of the classic problems in computational complexity theory and optimization - the Set Cover Problem.

It’s a fascinating challenge that has applications in various fields, from resource allocation to facility location planning.

What is the Set Cover Problem?

The Set Cover Problem asks: Given a universe U of elements and a collection S of sets whose union equals the universe, what is the minimum number of sets from S needed to cover all elements in U?

Mathematically, we aim to:

  • Minimize $\sum_{s \in S'} 1$ (the number of chosen sets, $|S'|$)

  • Subject to $\bigcup_{s \in S'} s = U$

  • Where $S' \subseteq S$

This is known to be NP-hard, but we can solve it using approximation algorithms.

A Concrete Example

Let’s consider a practical scenario: Imagine we have several radio stations, each covering specific cities.
We want to select the minimum number of stations that cover all cities.

Universe U = {City1, City2, City3, City4, City5, City6, City7, City8, City9, City10}

Collection of sets S = {

  • Station1: {City1, City2, City3}
  • Station2: {City2, City4, City5, City6}
  • Station3: {City3, City6, City7}
  • Station4: {City4, City8, City9}
  • Station5: {City5, City7, City9, City10}
  • Station6: {City1, City10}
    }

Let’s solve this using the greedy approximation algorithm in Python.

import numpy as np
import matplotlib.pyplot as plt
import networkx as nx
from matplotlib.colors import to_rgba

def greedy_set_cover(universe, sets, costs=None):
    """
    Greedy algorithm for the set cover problem.

    Args:
        universe: Set of elements to cover
        sets: Dictionary mapping set_id to the set of elements
        costs: Dictionary mapping set_id to the cost (default: all 1)

    Returns:
        List of set_ids that form the cover and the total cost
    """
    if costs is None:
        costs = {set_id: 1 for set_id in sets}

    # Make a copy of the universe and initialize the cover
    universe_copy = set(universe)
    cover = []
    total_cost = 0

    # While there are uncovered elements
    while universe_copy:
        # Find the set with the best cost-effectiveness
        best_set = None
        best_cost_effectiveness = 0

        for set_id, elements in sets.items():
            # Skip if already in the cover
            if set_id in cover:
                continue

            # Calculate new elements covered
            new_elements = len(universe_copy & elements)
            if new_elements == 0:
                continue

            # Calculate cost effectiveness (elements covered per unit cost)
            cost_effectiveness = new_elements / costs[set_id]

            if best_set is None or cost_effectiveness > best_cost_effectiveness:
                best_set = set_id
                best_cost_effectiveness = cost_effectiveness

        # If no set covers any remaining elements, break
        if best_set is None:
            break

        # Add the best set to the cover
        cover.append(best_set)
        total_cost += costs[best_set]

        # Remove covered elements
        universe_copy -= sets[best_set]

    return cover, total_cost

def visualize_set_cover(universe, sets, cover):
    """
    Visualize the set cover problem and solution using a bipartite graph.

    Args:
        universe: Set of elements
        sets: Dictionary mapping set_id to the set of elements
        cover: List of set_ids that form the cover
    """
    G = nx.Graph()

    # Add nodes
    for element in universe:
        G.add_node(element, bipartite=0)  # 0 for elements

    for set_id in sets:
        G.add_node(set_id, bipartite=1)  # 1 for sets

    # Add edges
    for set_id, elements in sets.items():
        for element in elements:
            G.add_edge(set_id, element)

    # Create layout
    pos = nx.bipartite_layout(G, [node for node, attr in G.nodes(data=True) if attr['bipartite'] == 0])

    # Set up colors
    cover_color = 'red'
    non_cover_color = 'lightblue'
    element_color = 'lightgreen'

    # Create node colors
    node_colors = []
    for node in G.nodes():
        if node in cover:
            node_colors.append(cover_color)
        elif node in universe:
            node_colors.append(element_color)
        else:
            node_colors.append(non_cover_color)

    # Create edge colors
    edge_colors = []
    for u, v in G.edges():
        if u in cover or v in cover:
            edge_colors.append(cover_color)
        else:
            edge_colors.append('gray')

    # Draw the graph
    plt.figure(figsize=(12, 8))
    nx.draw(G, pos,
            with_labels=True,
            node_color=node_colors,
            edge_color=edge_colors,
            node_size=700,
            font_weight='bold',
            width=1.5)

    # Create a custom legend
    from matplotlib.patches import Patch
    legend_elements = [
        Patch(facecolor=cover_color, edgecolor='black', label='Sets in Cover'),
        Patch(facecolor=non_cover_color, edgecolor='black', label='Sets not in Cover'),
        Patch(facecolor=element_color, edgecolor='black', label='Elements to Cover')
    ]
    plt.legend(handles=legend_elements, loc='upper right')

    plt.title('Set Cover Visualization')
    plt.tight_layout()
    plt.show()

def visualize_set_coverage_steps(universe, sets, cover):
    """
    Visualize the progressive coverage of elements by the sets in the cover.

    Args:
        universe: Set of elements
        sets: Dictionary mapping set_id to the set of elements
        cover: List of set_ids that form the cover (ordered by selection)
    """
    # Create a new figure
    plt.figure(figsize=(12, 6))

    # Convert universe to a list for consistent ordering
    universe_list = list(universe)
    universe_list.sort()  # Sort for better visualization

    # Create a matrix to show coverage
    # Rows are steps (adding each set)
    # Columns are elements
    coverage_matrix = np.zeros((len(cover) + 1, len(universe_list)))

    # First row shows no coverage
    covered_elements = set()

    # For each step (adding a set to the cover)
    for i, set_id in enumerate(cover):
        # Add newly covered elements
        covered_elements.update(sets[set_id])
        # Mark covered elements in the matrix
        for j, element in enumerate(universe_list):
            if element in covered_elements:
                coverage_matrix[i+1, j] = 1

    # Plot as a heatmap
    plt.imshow(coverage_matrix, aspect='auto', cmap='Blues')

    # Add labels
    plt.yticks(np.arange(len(cover) + 1),
               ['Initial'] + [f'Add {set_id}' for set_id in cover])
    plt.xticks(np.arange(len(universe_list)), universe_list, rotation=45)

    # Add grid
    plt.grid(False)
    for i in range(len(cover) + 1):
        plt.axhline(i - 0.5, color='gray', linestyle='-', linewidth=0.5)
    for j in range(len(universe_list)):
        plt.axvline(j - 0.5, color='gray', linestyle='-', linewidth=0.5)

    # Add coverage percentage for each step
    for i in range(len(cover) + 1):
        coverage_pct = np.sum(coverage_matrix[i]) / len(universe_list) * 100
        plt.text(len(universe_list) - 0.5, i, f'{coverage_pct:.1f}%',
                 verticalalignment='center')

    plt.colorbar(label='Covered (1) / Not Covered (0)')
    plt.title('Progressive Element Coverage')
    plt.tight_layout()
    plt.show()

# Define our problem
universe = {'City1', 'City2', 'City3', 'City4', 'City5',
            'City6', 'City7', 'City8', 'City9', 'City10'}

sets = {
    'Station1': {'City1', 'City2', 'City3'},
    'Station2': {'City2', 'City4', 'City5', 'City6'},
    'Station3': {'City3', 'City6', 'City7'},
    'Station4': {'City4', 'City8', 'City9'},
    'Station5': {'City5', 'City7', 'City9', 'City10'},
    'Station6': {'City1', 'City10'}
}

# Solve using the greedy algorithm
cover, total_cost = greedy_set_cover(universe, sets)
print(f"Optimal cover: {cover}")
print(f"Total cost: {total_cost}")
print(f"Number of sets used: {len(cover)}")

# Verify the cover is valid
covered = set()
for set_id in cover:
    covered.update(sets[set_id])

print(f"All elements covered: {covered == universe}")

# Visualize the solution
visualize_set_cover(universe, sets, cover)
visualize_set_coverage_steps(universe, sets, cover)

Code Explanation

Let’s break down the key components of the solution:

1. The Greedy Set Cover Algorithm

The core of our solution is the greedy_set_cover function.
This implements a well-known approximation algorithm with the following approach:

  1. At each step, select the set that covers the most uncovered elements per unit cost
  2. Add this set to our solution
  3. Remove the newly covered elements from our “yet to cover” list
  4. Repeat until all elements are covered

This greedy approach is guaranteed to produce a solution that’s within a logarithmic factor of the optimal solution.
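More precisely, for a universe of $n$ elements the greedy cover costs at most $H_n = \sum_{k=1}^{n} \frac{1}{k} \approx \ln n$ times the optimum; for our 10-city instance that factor is $H_{10} \approx 2.93$.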

The algorithm keeps track of the remaining elements to cover (universe_copy) and selects sets based on a “cost-effectiveness” metric - the number of new elements covered divided by the cost of the set.
In our example, all sets have a uniform cost of 1, but the code supports variable costs.
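For example, if some stations were more expensive to license than others, a cost dictionary could be passed in (the figures below are made up for illustration):

# Hypothetical per-station costs; greedy now picks by coverage per unit cost
station_costs = {'Station1': 1, 'Station2': 3, 'Station3': 1,
                 'Station4': 2, 'Station5': 3, 'Station6': 1}
cover, total_cost = greedy_set_cover(universe, sets, costs=station_costs)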

2. Visualization Functions

I’ve included two visualization functions to help understand the problem and solution:

visualize_set_cover

This function creates a bipartite graph representation of the problem:

  • One set of nodes represents the elements (cities)
  • Another set represents the sets (radio stations)
  • Edges connect each set to the elements it contains
  • Sets in the solution are highlighted in red

The function uses NetworkX to create and draw the graph, with color coding to distinguish between elements, sets in the cover, and sets not in the cover.

visualize_set_coverage_steps

This function shows how the coverage progresses as we add each set:

  • Each row represents a step in the solution
  • Each column represents an element
  • Cells are colored based on whether the element is covered at that step
  • The percentage of covered elements is shown for each step

Results and Analysis

When we run the code on our radio station example, we get:

Optimal cover: ['Station2', 'Station5', 'Station4', 'Station1']
Total cost: 4
Number of sets used: 4
All elements covered: True

The greedy algorithm selected 4 stations out of the 6 available:

  1. Station2 (covers City2, City4, City5, City6)
  2. Station5 (covers City5, City7, City9, City10)
  3. Station4 (covers City4, City8, City9)
  4. Station1 (covers City1, City2, City3)

The visualizations clearly show how these stations collectively cover all cities.
The coverage step chart shows how we progressively increase coverage as each station is added to our solution.


Computational Complexity

Each iteration of the greedy algorithm costs O(|S| × |U|), where |S| is the number of sets and |U| is the size of the universe, because we check all remaining sets against all remaining elements.
Since each iteration adds one set, there are at most min(|S|, |U|) iterations, giving an overall worst case of O(|S| × |U| × min(|S|, |U|)).

While this is not guaranteed to find the absolute optimal solution (which would require solving an NP-hard problem), the greedy approach provides a good approximation that works well in practice.
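For small instances like ours, an exact answer is also within reach: the problem can be written as an integer program in the same spirit as the earlier PuLP examples. A minimal sketch, assuming the universe and sets dictionaries defined above:

import pulp as pl

# One binary variable per set: 1 if the set is chosen for the cover
prob = pl.LpProblem("Exact_Set_Cover", pl.LpMinimize)
use = {s: pl.LpVariable(f"use_{s}", cat=pl.LpBinary) for s in sets}

# Minimize the number of sets chosen
prob += pl.lpSum(use.values())

# Every element must be covered by at least one chosen set
for e in universe:
    prob += pl.lpSum(use[s] for s in sets if e in sets[s]) >= 1, f"Cover_{e}"

prob.solve(pl.PULP_CBC_CMD(msg=False))
print([s for s, v in use.items() if v.value() == 1])

For this instance the exact optimum also needs 4 sets, so the greedy answer happens to be optimal here.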

Conclusion

The Set Cover Problem is a fantastic example of how approximation algorithms can provide practical solutions to otherwise intractable problems.

Our Python implementation demonstrates not only how to solve such problems programmatically but also how to visualize the results to gain deeper insights.

Next time you’re planning radio coverage, facility locations, or any resource allocation problem, remember that these classical algorithms can save you significant time and resources!

Feel free to experiment with different problem instances or extend the code to handle specific constraints in your own applications.

Realistic Political Redistricting Problem

Solving a Realistic Political Redistricting Problem with Python

Today, I want to dive into one of the most fascinating yet complex problems in political geography: redistricting.

This computational problem has real-world implications for democratic representation, and Python provides us with powerful tools to model and solve it.

Understanding the Redistricting Problem

Redistricting involves dividing a geographic area into electoral districts, aiming to create fair representation while satisfying various constraints like population equality, compactness, and sometimes considerations of communities of interest.

Mathematically, we can express the goal of creating districts with roughly equal populations as minimizing the largest relative deviation from the average:

$$\min \max_{i} \frac{\left| P(D_i) - \bar{P} \right|}{\bar{P}}$$

Where $P(D_i)$ represents the population of district $i$ and $\bar{P}$ is the average district population; this is exactly the score the code below minimizes.

A Realistic Scenario

Let’s tackle a scenario where we need to divide a state with 100 precincts into 5 congressional districts. We’ll try to:

  1. Balance population across districts
  2. Ensure geographic contiguity
  3. Optimize for compactness
import numpy as np
import matplotlib.pyplot as plt
import networkx as nx
import random
from matplotlib.colors import ListedColormap
from scipy.spatial import Voronoi

# Set random seed for reproducibility
np.random.seed(42)
random.seed(42)

# Generate synthetic data for our state
num_precincts = 100
num_districts = 5

# Create precinct locations and populations
x_coords = np.random.rand(num_precincts) * 100
y_coords = np.random.rand(num_precincts) * 100
positions = np.column_stack((x_coords, y_coords))

# Generate population data with some variation
base_population = 10000
population_variation = 3000
populations = np.random.randint(
    base_population - population_variation,
    base_population + population_variation,
    num_precincts
)

# Create a graph where precincts are nodes connected to nearby precincts
G = nx.Graph()

# Add nodes with attributes
for i in range(num_precincts):
    G.add_node(i, pos=(x_coords[i], y_coords[i]), population=populations[i])

# Connect precincts that are close to each other
distance_threshold = 15  # Adjust based on your coordinate scale
for i in range(num_precincts):
    for j in range(i+1, num_precincts):
        dist = np.sqrt((x_coords[i] - x_coords[j])**2 + (y_coords[i] - y_coords[j])**2)
        if dist < distance_threshold:
            G.add_edge(i, j)

# Check if the graph is connected
if not nx.is_connected(G):
    largest_cc = max(nx.connected_components(G), key=len)
    if len(largest_cc) < num_precincts:
        print(f"Warning: Generated graph is not connected. Using largest component with {len(largest_cc)} nodes.")
        G = G.subgraph(largest_cc).copy()
        # Adjust our precinct count
        num_precincts = len(G.nodes)

# Helper function to check if a district is contiguous
def is_contiguous(G, district_nodes):
    if not district_nodes:
        return True
    subgraph = G.subgraph(district_nodes)
    return nx.is_connected(subgraph)

# Helper function to calculate district population
def district_population(district, populations):
    return sum(populations[node] for node in district)

# Initialize districts using region growing approach
def initialize_districts(G, num_districts, populations):
    nodes = list(G.nodes())

    # Start with seed nodes far apart
    seed_indices = np.linspace(0, len(nodes)-1, num_districts, dtype=int)
    seeds = [nodes[i] for i in seed_indices]

    # Initialize districts with seed nodes
    districts = [set([seed]) for seed in seeds]
    unassigned = set(nodes) - set(seeds)

    # Grow districts by adding adjacent nodes one at a time
    while unassigned:
        for d_idx, district in enumerate(districts):
            if not unassigned:
                break

            # Find unassigned nodes adjacent to this district
            frontier = set()
            for node in district:
                frontier.update(n for n in G.neighbors(node) if n in unassigned)

            if frontier:
                # Add one node to the district
                new_node = random.choice(list(frontier))
                district.add(new_node)
                unassigned.remove(new_node)

    return districts

# Score function to evaluate a districting plan
def score_plan(districts, populations):
    district_pops = [district_population(d, populations) for d in districts]

    # Population balance score (lower is better)
    avg_pop = sum(district_pops) / len(district_pops)
    pop_deviation = max(abs(p - avg_pop) / avg_pop for p in district_pops)

    return pop_deviation

# Local search algorithm to improve the districting plan
def optimize_districts(G, initial_districts, populations, iterations=1000):
    current_districts = [set(d) for d in initial_districts]
    current_score = score_plan(current_districts, populations)

    best_districts = [set(d) for d in current_districts]
    best_score = current_score

    for i in range(iterations):
        # Select a random border precinct
        moved = False

        # Try to move a random node from one district to another
        for source_idx in random.sample(range(len(current_districts)), len(current_districts)):
            if moved:
                break

            source_district = current_districts[source_idx]

            # Find border nodes (nodes with neighbors in other districts)
            border_nodes = set()
            for node in source_district:
                for neighbor in G.neighbors(node):
                    if any(neighbor in d for d in current_districts if d != source_district):
                        border_nodes.add(node)
                        break

            if not border_nodes:
                continue

            # Try to move a random border node
            node_to_move = random.choice(list(border_nodes))

            # Find neighboring districts
            target_districts = []
            for d_idx, district in enumerate(current_districts):
                if d_idx != source_idx:
                    for neighbor in G.neighbors(node_to_move):
                        if neighbor in district:
                            target_districts.append(d_idx)
                            break

            if not target_districts:
                continue

            target_idx = random.choice(target_districts)

            # Check if removing the node keeps the source district contiguous
            temp_source = source_district.copy()
            temp_source.remove(node_to_move)

            if not is_contiguous(G, temp_source):
                continue

            # Make the move
            new_districts = [set(d) for d in current_districts]
            new_districts[source_idx].remove(node_to_move)
            new_districts[target_idx].add(node_to_move)

            # Evaluate the new plan
            new_score = score_plan(new_districts, populations)

            # Accept if it's better
            if new_score < current_score:
                current_districts = new_districts
                current_score = new_score
                moved = True

                # Update best if it's the best so far
                if new_score < best_score:
                    best_districts = [set(d) for d in new_districts]
                    best_score = new_score
                    print(f"Iteration {i}: New best score = {best_score:.4f}")

    return best_districts, best_score

# Run the algorithm
print("Initializing districts...")
initial_districts = initialize_districts(G, num_districts, populations)
print("Optimizing districts...")
optimized_districts, final_score = optimize_districts(G, initial_districts, populations)

# Calculate district populations
district_pops = [district_population(d, populations) for d in optimized_districts]
total_pop = sum(populations)
district_pop_percentages = [p/total_pop*100 for p in district_pops]

# Prepare data for visualization
node_colors = np.zeros(num_precincts)
for i, district in enumerate(optimized_districts):
    for node in district:
        node_colors[node] = i

# Get positions for drawing
pos = nx.get_node_attributes(G, 'pos')

# Set up a colorful visualization
plt.figure(figsize=(15, 10))

# Plot the graph with district colors
cmap = plt.cm.get_cmap('tab10', num_districts)
nx.draw(G, pos, node_color=node_colors, cmap=cmap, node_size=100, with_labels=False, width=0.5)

# Add a title and legend
plt.title('Optimized Redistricting Plan', fontsize=16)
patches = [plt.plot([], [], marker="o", ms=10, ls="", mec=None, color=cmap(i),
                    label=f"District {i+1}: {district_pops[i]} people ({district_pop_percentages[i]:.1f}%)"
                    )[0] for i in range(num_districts)]
plt.legend(handles=patches, loc='upper right', fontsize=12)

plt.savefig('redistricting_plan.png', dpi=300, bbox_inches='tight')
plt.show()

# Create a population distribution bar chart
plt.figure(figsize=(12, 6))
bars = plt.bar(range(1, num_districts+1), district_pops, color=cmap(range(num_districts)))
plt.axhline(y=total_pop/num_districts, color='r', linestyle='--', label='Ideal Population')

# Add labels and annotations
plt.xlabel('District', fontsize=14)
plt.ylabel('Population', fontsize=14)
plt.title('Population Distribution by District', fontsize=16)
plt.xticks(range(1, num_districts+1))

# Add population values on top of bars
for bar in bars:
    height = bar.get_height()
    plt.text(bar.get_x() + bar.get_width()/2., height + 5000,
             f'{int(height)}',
             ha='center', va='bottom', fontsize=12)

plt.legend()
plt.grid(axis='y', alpha=0.3)
plt.savefig('district_populations.png', dpi=300, bbox_inches='tight')
plt.show()

# Calculate and display metrics
print("\nRedistricting Results:")
print(f"Number of districts: {num_districts}")
print(f"Total population: {total_pop}")
print(f"Ideal district population: {total_pop / num_districts:.1f}")
print("\nDistrict populations:")
for i, pop in enumerate(district_pops):
    deviation = (pop - total_pop/num_districts) / (total_pop/num_districts) * 100
    print(f"District {i+1}: {pop} people (deviation: {deviation:+.2f}%)")

print(f"\nPopulation deviation score: {final_score:.4f}")
print(f"Maximum population deviation: {max(abs(p - total_pop/num_districts) / (total_pop/num_districts) * 100 for p in district_pops):.2f}%")

Code Explanation

Let’s walk through this code step by step:

Data Generation

First, we create synthetic data to model our redistricting problem:

  • 100 precincts with random geographic coordinates
  • Population assigned to each precinct (around 10,000 people with some variation)
  • A graph structure where nearby precincts are connected

The graph representation is crucial - precincts are nodes, and edges represent geographic adjacency.
This helps us enforce contiguity constraints.
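Checking contiguity then reduces to a connectivity test on the induced subgraph, which is exactly what the helper in the script does:

# A district is contiguous iff its precincts induce a connected subgraph of G
nx.is_connected(G.subgraph(district_nodes))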

Districting Algorithm

Our approach consists of three main components:

  1. Initialization: We use a region-growing method that:

    • Selects seed precincts spaced across the map
    • Grows districts by gradually adding adjacent precincts
    • Ensures each district is geographically contiguous
  2. Scoring Function: We evaluate plans based on population equality:

    • Calculate population deviation from the ideal equal distribution
    • Lower scores indicate more balanced districts
  3. Optimization: We use a local search algorithm that:

    • Makes small changes by moving border precincts between districts
    • Only accepts changes that improve population balance
    • Ensures districts remain contiguous after each move
    • Runs for 1,000 iterations to find a good solution

The is_contiguous() function is particularly important as it ensures we don’t create fragmented districts.

Output

Initializing districts...
Optimizing districts...
Iteration 0: New best score = 0.3972
Iteration 1: New best score = 0.3822
Iteration 3: New best score = 0.3331
Iteration 8: New best score = 0.3270
Iteration 10: New best score = 0.3125
Iteration 15: New best score = 0.2905
Iteration 42: New best score = 0.2726
Iteration 45: New best score = 0.2364
Iteration 46: New best score = 0.2328
Iteration 98: New best score = 0.2312
Iteration 106: New best score = 0.2014

Visualization

We create two visualizations:

  1. A map showing the district boundaries with a color-coded legend

  2. A bar chart comparing populations across districts

Results Analysis

The algorithm produces 5 districts with these properties:

  1. Geographic Contiguity: Each district forms a single connected region, with no isolated enclaves.

  2. Population Balance: The algorithm minimizes population deviation between districts.
    The bar chart shows how close each district is to the ideal population (red dashed line).

  3. Compactness: Though not explicitly optimized for, the districts tend to be reasonably compact due to our graph construction.

The console output provides detailed statistics on:

  • Population of each district
  • Deviation from the ideal equal population
  • Overall population deviation score

Real-World Applications

This simplified model demonstrates key concepts in redistricting algorithms, but real-world applications would include additional considerations:

  1. Voting Rights Act Compliance: Ensuring minority communities have fair representation
  2. Preserving Communities of Interest: Keeping communities with shared concerns together
  3. Political Fairness Metrics: Analyzing partisan bias or competitiveness

Conclusion

Computational approaches to redistricting can help create more fair and balanced electoral maps.
While our algorithm focuses primarily on population equality and contiguity, it demonstrates how Python can be used to approach this complex political geography problem.

The visualization clearly shows how our algorithm divides the state into balanced districts.
Each color represents a different congressional district, with the legend indicating population counts and percentages.

Optimizing Sports League Schedules: A Python Approach to Minimizing Travel Distances

Solving a Realistic Sports Scheduling Problem with Python

Have you ever wondered how professional sports leagues create their game schedules?
It’s a fascinating optimization problem that combines mathematics and computer science.

Today, I’ll walk you through solving a realistic sports scheduling problem using Python.

The Problem: Regional Basketball League Scheduling

Let’s tackle scheduling for a regional basketball league with 8 teams. The constraints are:

  • Each team must play against every other team twice (home and away)
  • Games occur on weekends with 4 matches per weekend
  • No team should play more than one game per weekend
  • Travel distances should be minimized (teams located far from each other should have games consolidated)

This is a classic combinatorial optimization problem that appears simple but quickly becomes complex.

Let’s solve this using Python with the PuLP library for linear programming:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from pulp import *
import networkx as nx
from itertools import combinations

# Define our teams with their locations (x,y coordinates)
teams = {
    'Eagles': (10, 40),
    'Sharks': (30, 10),
    'Tigers': (90, 20),
    'Hawks': (50, 50),
    'Lions': (70, 80),
    'Bears': (20, 70),
    'Wolves': (60, 30),
    'Panthers': (80, 60)
}

# Calculate distances between teams
distances = {}
for team1, loc1 in teams.items():
    for team2, loc2 in teams.items():
        if team1 != team2:
            # Euclidean distance
            dist = np.sqrt((loc1[0] - loc2[0])**2 + (loc1[1] - loc2[1])**2)
            distances[(team1, team2)] = dist

# Total number of teams
n_teams = len(teams)

# Each team plays every other team twice (home and away)
n_games = n_teams * (n_teams - 1)

# Number of weekends needed (4 games per weekend, each team plays at most once)
n_weekends = n_games // 4

# Create the model
model = LpProblem("Sports_Scheduling", LpMinimize)

# Create variables
# game_vars[(i, j, w)] = 1 if team i plays at home against team j on weekend w
game_vars = {}
for i in teams:
    for j in teams:
        if i != j:
            for w in range(n_weekends):
                game_vars[(i, j, w)] = LpVariable(f"Game_{i}_{j}_{w}", 0, 1, LpBinary)

# Objective function: total travel distance of the scheduled games
# (weekend-dependent penalties, e.g. for widely separated away trips, could be added here)
objective = lpSum([game_vars[(i, j, w)] * distances[(i, j)] for i in teams for j in teams for w in range(n_weekends) if i != j])
model += objective

# Constraints

# 1. Each team plays against every other team twice (once home, once away)
for i in teams:
    for j in teams:
        if i != j:
            model += lpSum([game_vars[(i, j, w)] for w in range(n_weekends)]) == 1
            model += lpSum([game_vars[(j, i, w)] for w in range(n_weekends)]) == 1

# 2. Each team plays at most one game per weekend
for w in range(n_weekends):
    for i in teams:
        model += lpSum([game_vars[(i, j, w)] + game_vars[(j, i, w)] for j in teams if i != j]) <= 1

# 3. Exactly 4 games per weekend
for w in range(n_weekends):
    model += lpSum([game_vars[(i, j, w)] for i in teams for j in teams if i != j]) == 4

# Solve the model
model.solve(PULP_CBC_CMD(msg=False))

# Print the status of the solution
print("Status:", LpStatus[model.status])

# Organize the schedule
schedule = {w: [] for w in range(n_weekends)}
for i in teams:
    for j in teams:
        if i != j:
            for w in range(n_weekends):
                if game_vars[(i, j, w)].value() == 1:
                    schedule[w].append((i, j))

# Create a DataFrame for better visualization
schedule_df = pd.DataFrame(index=range(n_weekends), columns=['Game 1', 'Game 2', 'Game 3', 'Game 4'])
for w in range(n_weekends):
    for g in range(4):
        if g < len(schedule[w]):
            home, away = schedule[w][g]
            schedule_df.loc[w, f'Game {g+1}'] = f"{home} vs {away}"

# Print the schedule
print("\nOptimized Schedule:")
print(schedule_df)

# Visualize team locations and distances
plt.figure(figsize=(10, 8))
for team, (x, y) in teams.items():
    plt.scatter(x, y, s=100)
    plt.text(x+2, y+2, team, fontsize=12)

# Draw lines between teams that play on the same weekend
colors = ['red', 'blue', 'green', 'purple', 'orange', 'brown', 'pink', 'gray', 'olive', 'cyan', 'magenta', 'yellow']
for w in range(n_weekends):
    for home, away in schedule[w]:
        x1, y1 = teams[home]
        x2, y2 = teams[away]
        plt.plot([x1, x2], [y1, y2], color=colors[w % len(colors)], alpha=0.5, linewidth=2)

plt.title('Team Locations and Game Connections by Weekend')
plt.xlabel('X Coordinate')
plt.ylabel('Y Coordinate')
plt.grid(True)

# Create a legend for weekends
from matplotlib.lines import Line2D
legend_elements = [Line2D([0], [0], color=colors[w % len(colors)], lw=2, label=f'Weekend {w+1}')
                   for w in range(n_weekends)]
plt.legend(handles=legend_elements, loc='upper right')

plt.savefig('team_locations.png')
plt.show()

# Visualize the schedule as a heatmap to see when each team plays
team_schedule = pd.DataFrame(0, index=teams.keys(), columns=range(1, n_weekends+1))

for w in range(n_weekends):
    for home, away in schedule[w]:
        team_schedule.loc[home, w+1] = 1   # Home game
        team_schedule.loc[away, w+1] = -1  # Away game

plt.figure(figsize=(12, 8))
sns.heatmap(team_schedule, cmap="coolwarm", center=0, linewidths=.5,
            cbar_kws={'label': 'Game Type', 'ticks': [-1, 0, 1]})
plt.title('Team Schedule Heatmap (1: Home Game, -1: Away Game, 0: No Game)')
plt.xlabel('Weekend Number')
plt.ylabel('Team')

# Customize the colorbar
colorbar = plt.gcf().axes[1]
colorbar.set_yticklabels(['Away', 'No Game', 'Home'])

plt.tight_layout()
plt.savefig('schedule_heatmap.png')
plt.show()

# Calculate total distance traveled by each team
team_distances = {team: 0 for team in teams}
for w in range(n_weekends):
    for home, away in schedule[w]:
        team_distances[away] += distances[(away, home)]

# Plot total distance traveled
plt.figure(figsize=(10, 6))
plt.bar(team_distances.keys(), team_distances.values())
plt.title('Total Distance Traveled by Each Team')
plt.xlabel('Team')
plt.ylabel('Distance')
plt.xticks(rotation=45)
plt.tight_layout()
plt.savefig('travel_distances.png')
plt.show()

# Create a network graph to visualize the schedule
G = nx.DiGraph()
for team in teams:
    G.add_node(team)

for w in range(n_weekends):
    for home, away in schedule[w]:
        G.add_edge(home, away, weekend=w+1)

plt.figure(figsize=(12, 10))
pos = {team: loc for team, loc in teams.items()}
nx.draw_networkx_nodes(G, pos, node_size=700, node_color='lightblue')
nx.draw_networkx_labels(G, pos, font_size=10)

# Draw edges with different colors for each weekend
for w in range(n_weekends):
    edge_list = [(home, away) for home, away in G.edges() if G[home][away]['weekend'] == w+1]
    nx.draw_networkx_edges(G, pos, edgelist=edge_list, edge_color=colors[w % len(colors)],
                           arrows=True, width=1.5, alpha=0.7)

# Add edge labels indicating the weekend number
edge_labels = {(home, away): f"Week {G[home][away]['weekend']}" for home, away in G.edges()}
nx.draw_networkx_edge_labels(G, pos, edge_labels=edge_labels, font_size=8)

plt.title('Game Schedule Network')
plt.axis('off')
plt.tight_layout()
plt.savefig('schedule_network.png')
plt.show()

Code Explanation

Let’s break down the solution step by step:

1. Problem Setup

First, I define our 8 teams with their geographic locations using x-y coordinates.
This is realistic because in actual sports scheduling, the locations of teams are critical for minimizing travel distances.

teams = {
    'Eagles': (10, 40),
    'Sharks': (30, 10),
    'Tigers': (90, 20),
    ...
}

Then I calculate the Euclidean distances between each pair of teams.
In a real-world scenario, these would be actual road distances or travel times.
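For instance, the Eagles at (10, 40) and the Sharks at (30, 10) are $\sqrt{(10-30)^2 + (40-10)^2} = \sqrt{1300} \approx 36.1$ units apart.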

2. Mathematical Model

This is where the optimization begins.
The problem is formulated as a linear programming model using PuLP:

model = LpProblem("Sports_Scheduling", LpMinimize)

The key decision variables are binary variables indicating whether team i plays against team j on weekend w:

game_vars[(i, j, w)] = LpVariable(f"Game_{i}_{j}_{w}", 0, 1, LpBinary)

3. Objective Function

Our goal is to minimize the total travel distance:

objective = lpSum([game_vars[(i, j, w)] * distances[(i, j)] for i in teams for j in teams for w in range(n_weekends) if i != j])

This can be represented mathematically as:

$$\min \sum_{i \in \text{Teams}} \sum_{\substack{j \in \text{Teams},\ j \ne i}} \sum_{w \in \text{Weekends}} \text{game}_{i,j,w} \times \text{distance}_{i,j}$$

Note that because each ordered pairing occurs exactly once across the season, this objective takes the same value for every feasible schedule; weekend-dependent terms (such as penalties for widely separated trips) would be needed to make travel genuinely depend on the schedule.

4. Constraints

Several constraints ensure a valid schedule:

  1. Each team plays against every other team exactly once at home and once away:

$$\sum_{w \in \text{Weekends}} \text{game}_{i,j,w} = 1 \quad \forall\, i, j \in \text{Teams},\ i \ne j$$

  2. Each team plays at most one game per weekend:

$$\sum_{\substack{j \in \text{Teams},\ j \ne i}} \left(\text{game}_{i,j,w} + \text{game}_{j,i,w}\right) \le 1 \quad \forall\, i \in \text{Teams},\ \forall\, w \in \text{Weekends}$$

  3. Exactly 4 games per weekend:

$$\sum_{i \in \text{Teams}} \sum_{\substack{j \in \text{Teams},\ j \ne i}} \text{game}_{i,j,w} = 4 \quad \forall\, w \in \text{Weekends}$$

5. Solution Extraction

After solving the model, I extract the schedule into a more readable format:

schedule = {w: [] for w in range(n_weekends)}
for i in teams:
    for j in teams:
        if i != j:
            for w in range(n_weekends):
                if game_vars[(i, j, w)].value() == 1:
                    schedule[w].append((i, j))

Output

Status: Optimal

Optimized Schedule:
                Game 1              Game 2             Game 3  \
0      Sharks vs Lions     Tigers vs Hawks  Bears vs Panthers   
1     Sharks vs Wolves  Tigers vs Panthers     Lions vs Hawks   
2   Sharks vs Panthers     Tigers vs Bears    Lions vs Eagles   
3     Eagles vs Sharks     Bears vs Tigers    Wolves vs Lions   
4     Eagles vs Wolves     Sharks vs Bears  Hawks vs Panthers   
5     Tigers vs Wolves     Hawks vs Eagles    Bears vs Sharks   
6      Eagles vs Bears     Hawks vs Sharks    Lions vs Wolves   
7      Eagles vs Hawks     Tigers vs Lions   Wolves vs Sharks   
8   Eagles vs Panthers    Tigers vs Sharks     Hawks vs Lions   
9      Hawks vs Tigers     Lions vs Sharks    Bears vs Wolves   
10    Sharks vs Eagles   Lions vs Panthers     Bears vs Hawks   
11     Eagles vs Lions    Sharks vs Tigers     Hawks vs Bears   
12    Tigers vs Eagles     Hawks vs Wolves     Lions vs Bears   
13    Eagles vs Tigers     Sharks vs Hawks     Bears vs Lions   

                Game 4  
0     Wolves vs Eagles  
1      Bears vs Eagles  
2      Wolves vs Hawks  
3    Panthers vs Hawks  
4      Lions vs Tigers  
5    Panthers vs Lions  
6   Panthers vs Tigers  
7    Panthers vs Bears  
8      Wolves vs Bears  
9   Panthers vs Eagles  
10    Wolves vs Tigers  
11  Panthers vs Wolves  
12  Panthers vs Sharks  
13  Wolves vs Panthers  

Visualization and Analysis

I’ve created several visualizations to help understand the optimized schedule:

1. Geographic Visualization

The first plot shows the geographic locations of teams with lines connecting teams that play on the same weekend.
Each color represents a different weekend:

This visualization is particularly useful for identifying whether games are geographically efficient - teams that are closer to each other should ideally play on the same weekends to minimize travel.

2. Schedule Heatmap

The heatmap provides a clear view of when each team plays, and whether they’re playing home (1) or away (-1):

This visualization helps identify whether the schedule is balanced.
Each team should have a mix of home and away games, and we can quickly see if any team has too many consecutive home or away games.

3. Travel Distance Analysis

This bar chart shows the total distance traveled by each team:

In an ideal schedule, these values should be relatively balanced to ensure fairness.
Wide variations might indicate scheduling inefficiencies.

4. Network Graph

Finally, the network graph shows the complete schedule with directed edges indicating who plays whom and when:

This graph helps identify the overall structure of the schedule and any potential patterns.

Conclusion

Sports scheduling is a complex optimization problem with numerous constraints.
The solution I’ve presented uses linear programming to find an optimal schedule that minimizes travel distances while satisfying all the logical constraints of a sports league.

In real-world applications, additional constraints might include:

  • Arena availability
  • TV broadcasting requirements
  • Avoiding conflicts with other major events
  • Balancing the distribution of strong opponents throughout the season

This approach can be extended to larger leagues with more complex constraints, though the computational complexity increases significantly with the number of teams and constraints.

Circuit Board Testing

A Comprehensive Python Analysis

In the world of electronics manufacturing, circuit board testing is crucial for ensuring product quality and reliability.

Today, I’ll walk you through a realistic Python implementation of circuit board defect analysis using statistical methods.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats

# Simulated Circuit Board Test Data
np.random.seed(42)

class CircuitBoardTesting:
    def __init__(self, num_boards=1000):
        """
        Initialize circuit board testing simulation

        Parameters:
        - num_boards: Total number of circuit boards to simulate
        """
        self.num_boards = num_boards
        self.defect_rates = self._generate_defect_data()

    def _generate_defect_data(self):
        """
        Generate simulated defect data with realistic variations

        Defect types:
        1. Solder joint issues
        2. Component misalignment
        3. Trace discontinuity
        4. Electrical short
        """
        defect_types = [
            'Solder Joint Issues',
            'Component Misalignment',
            'Trace Discontinuity',
            'Electrical Short'
        ]

        # Realistic defect rate generation
        defect_rates = {
            defect: np.random.normal(loc=0.02, scale=0.005, size=self.num_boards)
            for defect in defect_types
        }

        return defect_rates

    def calculate_statistics(self):
        """
        Calculate statistical metrics for defect analysis

        Returns:
        - DataFrame with defect statistics
        """
        stats_data = {}
        for defect, rates in self.defect_rates.items():
            # Corrected confidence interval calculation
            confidence_interval = stats.t.interval(
                confidence=0.95,  # Added 'confidence' parameter
                df=len(rates)-1,
                loc=np.mean(rates),
                scale=stats.sem(rates)
            )

            stats_data[defect] = {
                'Mean': np.mean(rates),
                'Std Dev': np.std(rates),
                'Min': np.min(rates),
                'Max': np.max(rates),
                'Confidence Interval': confidence_interval
            }

        return pd.DataFrame.from_dict(stats_data, orient='index')

    def plot_defect_distribution(self):
        """
        Visualize defect rate distributions
        """
        plt.figure(figsize=(12, 6))

        for i, (defect, rates) in enumerate(self.defect_rates.items(), 1):
            plt.subplot(2, 2, i)
            plt.hist(rates, bins=30, alpha=0.7, color=f'C{i-1}')
            plt.title(f'{defect} Defect Rate')
            plt.xlabel('Defect Rate')
            plt.ylabel('Frequency')

        plt.tight_layout()
        plt.show()

# Perform analysis
testing = CircuitBoardTesting(num_boards=1000)
defect_stats = testing.calculate_statistics()
print("Defect Rate Statistics:")
print(defect_stats)

# Visualize defect distributions
testing.plot_defect_distribution()

Detailed Code Explanation

Simulation Approach

  1. We create a CircuitBoardTesting class to simulate circuit board defect analysis.
  2. The simulation uses numpy to generate realistic defect rates with a normal distribution (see the note after this list).
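One caveat: a normal distribution can in principle produce negative defect rates (a roughly four-sigma event here, so none appear in this run). A defensive variant would clip the draws at zero, for example:

# Clip simulated rates at zero so no board gets a negative defect rate
rates = np.clip(np.random.normal(loc=0.02, scale=0.005, size=1000), 0.0, None)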

Statistical Analysis

Key statistical methods include:

  • Mean defect rate calculation
  • Standard deviation measurement
  • 95% confidence interval determination

Mathematical Representation

The defect rate follows a normal distribution with:

  • $\mu = 0.02$ (mean defect rate)
  • $\sigma = 0.005$ (standard deviation)
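
As a quick standalone sanity check (illustrative only, reusing the same parameters), drawing a fresh sample from this distribution should reproduce the statistics reported below, with a 95% confidence interval of roughly ±0.0003 around the mean:

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.normal(loc=0.02, scale=0.005, size=1000)

# 95% t-interval for the mean: roughly mean +/- 1.96 * sigma / sqrt(n) ~ +/- 0.0003
mean, sem = sample.mean(), stats.sem(sample)
ci = stats.t.interval(confidence=0.95, df=len(sample) - 1, loc=mean, scale=sem)
print(f"mean={mean:.5f}, std={sample.std():.5f}, 95% CI=({ci[0]:.5f}, {ci[1]:.5f})")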

Key Visualization Insights

The generated plots will show:

  • Distribution of defect rates for each defect type
  • Frequency of occurrence
  • Variation in defect rates

Result

Defect Rate Statistics:
                            Mean   Std Dev       Min       Max  \
Solder Joint Issues     0.020097  0.004894  0.003794  0.039264   
Component Misalignment  0.020354  0.004985  0.005298  0.035966   
Trace Discontinuity     0.020029  0.004915  0.004902  0.039631   
Electrical Short        0.019906  0.005133  0.005353  0.036215   

                                                 Confidence Interval  
Solder Joint Issues       (0.01979283559301649, 0.02040048496520677)  
Component Misalignment   (0.02004469759714213, 0.020663664775349432)  
Trace Discontinuity     (0.019724031341496525, 0.020334310804063716)  
Electrical Short        (0.019587711981175548, 0.020225095840956723)  

Interpretation of Results

By analyzing the output, we can:

  • Identify potential manufacturing issues
  • Understand defect rate variability
  • Make data-driven quality improvement decisions

The simulation provides a realistic framework for circuit board testing analysis, demonstrating how statistical methods can be applied to manufacturing quality control.

Optimal Task Assignment with the Hungarian Algorithm

I’ll solve a Task Assignment problem using Python, focusing on optimization and visualization.

I’ll use the Hungarian algorithm (also known as the Munkres algorithm) to solve the optimal assignment problem.

import numpy as np
import scipy.optimize as optimize
import matplotlib.pyplot as plt
import seaborn as sns

# Cost matrix representing the efficiency of employees for different tasks
# Lower values indicate better performance
cost_matrix = np.array([
    [10, 15, 9, 12],   # Employee 1's performance on tasks
    [14, 8, 11, 13],   # Employee 2's performance on tasks
    [12, 16, 10, 14],  # Employee 3's performance on tasks
    [9, 13, 15, 11]    # Employee 4's performance on tasks
])

# Solving the assignment problem
row_ind, col_ind = optimize.linear_sum_assignment(cost_matrix)

# Calculating total optimization cost
total_cost = cost_matrix[row_ind, col_ind].sum()

# Detailed assignment results
assignments = [(f"Employee {r+1}", f"Task {c+1}", cost_matrix[r, c])
               for r, c in zip(row_ind, col_ind)]

# Visualization of the cost matrix and optimal assignment
plt.figure(figsize=(12, 5))

# Heatmap of the cost matrix
plt.subplot(1, 2, 1)
sns.heatmap(cost_matrix, annot=True, cmap='YlGnBu',
            xticklabels=[f'Task {i+1}' for i in range(cost_matrix.shape[1])],
            yticklabels=[f'Employee {i+1}' for i in range(cost_matrix.shape[0])])
plt.title('Task Assignment Cost Matrix')

# Bar plot of optimal assignments
plt.subplot(1, 2, 2)
assigned_costs = [cost for _, _, cost in assignments]
plt.bar([f'Employee {i+1} - Task {j+1}' for i, j in zip(row_ind, col_ind)],
        assigned_costs, color='skyblue')
plt.title('Optimal Task Assignments')
plt.xticks(rotation=45)
plt.tight_layout()

# Print results
print("Optimal Task Assignments:")
for emp, task, cost in assignments:
    print(f"{emp} assigned to {task} with cost {cost}")
print(f"\nTotal Optimization Cost: {total_cost}")

plt.show()

Let me break down the solution with mathematical notation and explanation:

Mathematical Formulation:

  • Let $C_{ij}$ be the cost of assigning employee $i$ to task $j$, and let $x_{ij} \in \{0,1\}$ indicate whether that assignment is made
  • Goal: Minimize $\sum_{i=1}^{n} \sum_{j=1}^{m} x_{ij} C_{ij}$
  • Subject to constraints:
    $\sum_{j=1}^{m} x_{ij} = 1$ (each employee assigned to one task)
    $\sum_{i=1}^{n} x_{ij} = 1$ (each task assigned to one employee)
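
Because the instance is only 4×4, all $4! = 24$ one-to-one assignments can be brute-forced to confirm the optimum; this check is illustrative and only feasible for tiny matrices:

from itertools import permutations

import numpy as np

cost_matrix = np.array([
    [10, 15, 9, 12],
    [14, 8, 11, 13],
    [12, 16, 10, 14],
    [9, 13, 15, 11]
])

# Enumerate every permutation of tasks over employees and keep the cheapest
best_cost = min(
    sum(cost_matrix[i, p[i]] for i in range(4))
    for p in permutations(range(4))
)
print(best_cost)  # 39, matching linear_sum_assignment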

Key Components:

  1. scipy.optimize.linear_sum_assignment(): Implements the Hungarian algorithm
  2. Uses a cost matrix representing task efficiency
  3. Finds the optimal assignment that minimizes total cost

Visualization Insights:

  • First subplot: Heatmap of the cost matrix
    • Lighter colors indicate lower costs (better performance) in the YlGnBu colormap
  • Second subplot: Bar chart of optimal assignments
    • Shows the cost of each optimal task assignment

Output:

Optimal Task Assignments:
Employee 1 assigned to Task 1 with cost 10
Employee 2 assigned to Task 2 with cost 8
Employee 3 assigned to Task 3 with cost 10
Employee 4 assigned to Task 4 with cost 11

Total Optimization Cost: 39

Output Explanation:

  • Prints the optimal assignment for each employee
  • Calculates and displays the total optimization cost

This approach demonstrates:

  • Optimization problem solving
  • Matrix manipulation
  • Visualization techniques
  • Performance analysis

The solution helps organizations efficiently allocate tasks by minimizing overall resource expenditure.

Graph Coloring: US States Border Problem

I’ll solve a map coloring problem - this time focusing on political boundaries to demonstrate the graph coloring technique.

import networkx as nx
import matplotlib.pyplot as plt

# Define state adjacency (neighboring states)
state_borders = {
    'California': ['Oregon', 'Nevada', 'Arizona'],
    'Oregon': ['California', 'Washington', 'Nevada'],
    'Washington': ['Oregon', 'Idaho'],
    'Nevada': ['California', 'Oregon', 'Arizona', 'Utah', 'Idaho'],
    'Arizona': ['California', 'Nevada', 'Utah'],
    'Utah': ['Nevada', 'Arizona', 'Colorado', 'New Mexico'],
    'Idaho': ['Washington', 'Nevada', 'Oregon', 'Utah', 'Wyoming', 'Montana'],
    'Colorado': ['Utah', 'Wyoming', 'New Mexico'],
    'Wyoming': ['Idaho', 'Utah', 'Colorado', 'Montana', 'South Dakota', 'Nebraska'],
    'Montana': ['Idaho', 'Wyoming', 'South Dakota', 'North Dakota'],
    'New Mexico': ['Utah', 'Colorado', 'Arizona'],
    'South Dakota': ['Wyoming', 'Montana', 'Nebraska'],
    'Nebraska': ['Wyoming', 'South Dakota'],
    'North Dakota': ['Montana']
}

# Create graph of state borders
def create_state_border_graph(state_borders):
    G = nx.Graph()
    for state, neighbors in state_borders.items():
        for neighbor in neighbors:
            G.add_edge(state, neighbor)
    return G

# Implement graph coloring algorithm
def color_states(G):
    # Sort nodes by degree (number of connections) in descending order
    sorted_nodes = sorted(G.nodes(), key=lambda x: G.degree(x), reverse=True)

    # Initialize color map
    color_map = {}
    colors = ['Red', 'Blue', 'Green', 'Yellow']

    for node in sorted_nodes:
        # Find the first available color
        used_colors = set(color_map.get(neighbor) for neighbor in G.neighbors(node) if neighbor in color_map)

        for color in colors:
            if color not in used_colors:
                color_map[node] = color
                break

    return color_map

# Create the border graph
border_graph = create_state_border_graph(state_borders)

# Color the states
state_colors = color_states(border_graph)

# Visualization
plt.figure(figsize=(12, 8))
pos = nx.spring_layout(border_graph, seed=42, k=0.9)
node_colors = [state_colors.get(node, 'gray') for node in border_graph.nodes()]

nx.draw(border_graph, pos, with_labels=True, node_color=node_colors,
        node_size=3000, font_size=8, font_weight='bold',
        edge_color='gray', width=1)
plt.title('US States Border Coloring Problem')
plt.tight_layout()
plt.show()

# Print the color assignments
print("State Color Assignments:")
for state, color in state_colors.items():
    print(f"{state}: {color}")

# Verify no adjacent states have the same color
def verify_coloring(G, color_map):
    for state in G.nodes():
        for neighbor in G.neighbors(state):
            if color_map[state] == color_map[neighbor]:
                print(f"Error: {state} and {neighbor} have the same color!")
                return False
    print("Coloring is valid: No adjacent states share a color.")
    return True

verify_coloring(border_graph, state_colors)

Problem Statement

We want to color a map of selected states in the western United States where:

  • No two adjacent states can have the same color
  • Use the minimum number of colors possible

Mathematical Representation

Let $G = (V, E)$ be a graph where:

  • V represents states
  • E represents borders between states

The goal is to assign colors such that $\forall (u, v) \in E$, $\text{color}(u) \neq \text{color}(v)$

Algorithm Highlights

  1. Welsh-Powell Graph Coloring Algorithm
  2. Greedy approach to color assignment
  3. Minimizes the number of colors used
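
For comparison, NetworkX ships a built-in greedy coloring routine whose largest_first strategy follows the same degree-ordered idea; a quick cross-check on the border_graph built above might look like this (tie-breaking may differ, so individual color assignments can vary):

import networkx as nx

# Greedy coloring with the same largest-degree-first ordering as above
nx_coloring = nx.greedy_color(border_graph, strategy="largest_first")
print(f"NetworkX greedy coloring uses {len(set(nx_coloring.values()))} colors")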

Key Visualization Elements

  • Nodes represent states
  • Edges represent state borders
  • Colors represent the assigned map colors (no two neighbors share one)

Complexity Analysis

  • Time Complexity: $O(V^2)$
  • Space Complexity: $O(V + E)$

The code includes a verification function to ensure no adjacent states share a color, demonstrating the algorithm’s correctness.

Result

State Color Assignments:
Idaho: Red
Utah: Blue
Wyoming: Green
Nevada: Green
Oregon: Blue
Arizona: Red
Montana: Blue
California: Yellow
Colorado: Red
New Mexico: Green
South Dakota: Red
Washington: Green
Nebraska: Blue
North Dakota: Red
Coloring is valid: No adjacent states share a color.
True

Advanced Register Allocation

Intelligent Graph Coloring for Compiler Optimization

I’ll create a register allocation example that simulates a more realistic compiler scenario with a larger interference graph and additional complexity.

import networkx as nx
import matplotlib.pyplot as plt
import numpy as np

class AdvancedRegisterAllocator:
    def __init__(self, interference_graph):
        """
        Advanced register allocation with more sophisticated handling

        Args:
            interference_graph (dict): A complex dictionary of variable conflicts
        """
        self.graph = nx.Graph(interference_graph)
        self.num_registers = 0
        self.coloring = {}
        self.spill_cost = {}

    def calculate_spill_cost(self):
        """
        Calculate spill costs for each node based on graph characteristics

        Spill cost considers:
        1. Node degree (number of conflicts)
        2. Local loop frequency
        3. Variable live range
        """
        for node in self.graph.nodes():
            # Degree-based cost
            degree_cost = self.graph.degree(node)

            # Simulated loop frequency (hypothetical additional complexity)
            loop_frequency = np.random.uniform(1, 5)

            # Live range approximation (simulated)
            live_range = np.random.uniform(1, 10)

            # Combined spill cost
            self.spill_cost[node] = (degree_cost * loop_frequency) / live_range

    def advanced_color(self, num_registers):
        """
        Advanced graph coloring with spill prioritization

        Args:
            num_registers (int): Number of available registers

        Returns:
            dict: Mapping of variables to register numbers
        """
        self.num_registers = num_registers
        self.calculate_spill_cost()

        # Sort nodes by spill cost (descending) and degree (descending)
        nodes = sorted(
            self.graph.nodes(),
            key=lambda x: (self.spill_cost[x], self.graph.degree(x)),
            reverse=True
        )

        spilled_variables = []

        for node in nodes:
            # Find used colors of neighboring nodes
            used_colors = {self.coloring.get(neighbor) for neighbor in self.graph.neighbors(node) if neighbor in self.coloring}

            # Find the lowest available color
            color_found = False
            for color in range(num_registers):
                if color not in used_colors:
                    self.coloring[node] = color
                    color_found = True
                    break

            # Track spilled variables
            if not color_found:
                spilled_variables.append((node, self.spill_cost[node]))

        return {
            'allocation': self.coloring,
            'spilled': spilled_variables
        }

    def visualize_graph(self):
        """
        Advanced graph visualization with spill information
        """
        plt.figure(figsize=(15, 10))
        pos = nx.spring_layout(self.graph, k=0.5)  # Adjust layout

        # Color mapping ('tab20' accessed as an attribute; cm.get_cmap was removed in Matplotlib 3.9)
        color_map = plt.cm.tab20
        node_colors = [self.coloring.get(node, -1) for node in self.graph.nodes()]

        # Draw graph
        nx.draw(
            self.graph,
            pos,
            with_labels=True,
            node_color=[color_map(c/self.num_registers) if c != -1 else 'red' for c in node_colors],
            node_size=1000,
            font_size=10,
            font_weight='bold',
            edge_color='gray'
        )

        # Spill cost annotation
        for node, (x, y) in pos.items():
            plt.text(
                x, y+0.1,
                f'Spill Cost: {self.spill_cost[node]:.2f}',
                horizontalalignment='center',
                fontsize=8
            )

        plt.title('Advanced Interference Graph with Register Allocation')
        plt.tight_layout()
        plt.show()

# Simulate a more complex compiler scenario
# Larger interference graph with multiple conflict regions
interference_graph = {
    'temp1': ['var_a', 'var_b', 'loop_counter'],
    'temp2': ['var_c', 'var_d', 'array_index'],
    'var_a': ['temp1', 'var_b', 'result'],
    'var_b': ['temp1', 'var_a', 'loop_counter'],
    'var_c': ['temp2', 'var_d', 'array_length'],
    'var_d': ['temp2', 'var_c', 'result'],
    'loop_counter': ['temp1', 'var_b', 'array_index'],
    'array_index': ['temp2', 'loop_counter', 'array_length'],
    'result': ['var_a', 'var_d', 'array_length'],
    'array_length': ['var_c', 'array_index', 'result']
}

# Create advanced register allocator
np.random.seed(42)  # For reproducibility
allocator = AdvancedRegisterAllocator(interference_graph)

# Attempt register allocation with different register constraints
print("Register Allocation Simulation:\n")

# Try with 4 registers
print("Scenario 1: 4 Registers")
result_4 = allocator.advanced_color(4)
print("Register Allocation:")
for var, reg in result_4['allocation'].items():
    print(f"Variable {var}: Register {reg}")

print("\nSpilled Variables:")
for var, cost in result_4['spilled']:
    print(f"Variable: {var}, Spill Cost: {cost:.2f}")

# Visualize the graph
allocator.visualize_graph()

Advanced Register Allocation Problem

Key Enhancements

  1. Complex Interference Graph

    • More variables with intricate conflict relationships
    • Simulates a real compiler scenario with multiple computation regions
  2. Spill Cost Calculation
    Mathematical model for spill cost:

    $\text{Spill Cost} = \dfrac{\text{Degree} \times \text{Loop Frequency}}{\text{Live Range}}$

    For example, a variable with degree 4, loop frequency 2.5, and live range 5 gets a spill cost of $(4 \times 2.5) / 5 = 2$.

  3. Advanced Allocation Strategy

    • Prioritize variables based on spill cost
    • Handle variables that cannot be allocated to registers
    • Provide detailed allocation and spill information

Complexity Improvements

  • Dynamic spill cost calculation
  • Prioritization of variable allocation
  • Visualization of allocation and conflict regions

Scenario Breakdown

  1. Input: Complex interference graph with 10 variables
  2. Process:
    • Calculate individual spill costs
    • Attempt register allocation
    • Track unallocated (spilled) variables
  3. Output:
    • Register mapping
    • Spilled variable details
    • Graph visualization

Result

Register Allocation Simulation:

Scenario 1: 4 Registers
Register Allocation:
Variable loop_counter: Register 0
Variable array_length: Register 0
Variable var_a: Register 0
Variable array_index: Register 1
Variable temp2: Register 0
Variable var_c: Register 1
Variable result: Register 1
Variable temp1: Register 1
Variable var_b: Register 2
Variable var_d: Register 2
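
With 4 registers, nothing spills. To exercise the spill path, you could rerun with fewer registers; since the graph contains triangles (e.g., temp1, var_a, var_b), at least three colors are needed, so two registers must force spills. A fresh allocator is used because `coloring` persists across calls, and the exact spill set depends on the randomly simulated costs:

# Illustrative second scenario: 2 registers guarantee spilled variables
allocator_2 = AdvancedRegisterAllocator(interference_graph)
result_2 = allocator_2.advanced_color(2)

print("Scenario 2: 2 Registers")
print("Spilled Variables:")
for var, cost in result_2['spilled']:
    print(f"Variable: {var}, Spill Cost: {cost:.2f}")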

Practical Implications

  • Mimics real-world compiler register allocation challenges
  • Demonstrates handling of variable conflicts
  • Provides insights into optimization strategies

Job Shop Scheduling Optimization

Python Implementation with Visualization and Analysis

I’ll create a practical scheduling problem, solve it using Python, and visualize the results with graphs.

I’ll include mathematical formulas and provide a clear explanation of the code.

Scheduling Problem: Job Shop Scheduling

Let’s solve a job shop scheduling problem, which is a common scheduling challenge in manufacturing and operations research.

Problem Definition

We have:

  • 5 jobs to be processed
  • 4 machines
  • Each job must go through all machines in a specific order
  • Each job has different processing times on each machine
  • The goal is to minimize the total completion time (makespan)

Mathematically, we want to minimize:
$C_{\max} = \max_{j \in \text{Jobs}} C_j$

where $C_j$ is the completion time of job $j$.
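
For completeness, the constraints behind this objective can be stated as well. Using $s_{j,k}$ and $p_{j,k}$ for the start and processing time of job $j$'s $k$-th operation (notation introduced here for exposition):

  • Precedence within a job: $s_{j,k+1} \geq s_{j,k} + p_{j,k}$ for every job $j$ and step $k$
  • Machine capacity (disjunctive): for any two operations on the same machine, either $s_{j,k} + p_{j,k} \leq s_{j',k'}$ or $s_{j',k'} + p_{j',k'} \leq s_{j,k}$

The priority-rule heuristics below satisfy both families of constraints by construction.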

Solution Using Python

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.patches as patches
import random
from collections import defaultdict

# Set seeds for reproducibility
# (random.shuffle below uses the stdlib RNG, so it needs its own seed in addition to NumPy's)
np.random.seed(42)
random.seed(42)

# Problem Setup
num_jobs = 5
num_machines = 4

# Generate processing times for each job on each machine
processing_times = np.random.randint(1, 10, size=(num_jobs, num_machines))

# Generate machine order for each job (each job must visit all machines in a specific order)
machine_orders = []
for _ in range(num_jobs):
    order = list(range(num_machines))
    random.shuffle(order)
    machine_orders.append(order)

# Display the problem data
print("Processing Times:")
processing_df = pd.DataFrame(processing_times,
                             columns=[f"Machine {i}" for i in range(num_machines)],
                             index=[f"Job {i}" for i in range(num_jobs)])
print(processing_df)

print("\nMachine Orders:")
for i, order in enumerate(machine_orders):
    print(f"Job {i}: {' -> '.join(['Machine ' + str(m) for m in order])}")

# Priority Rule-Based Scheduling: Shortest Processing Time (SPT)
def apply_spt_scheduling():
    # Initialize machine and job status
    machine_available_time = [0] * num_machines
    job_completion = [[0] * num_machines for _ in range(num_jobs)]
    job_current_step = [0] * num_jobs
    job_completion_time = [0] * num_jobs

    # Schedule tracking
    schedule = defaultdict(list)  # machine -> [(job, start_time, end_time), ...]

    # Continue until all jobs are completed
    while min(job_current_step) < num_machines:
        # Find eligible operations (jobs that haven't completed all steps)
        eligible_ops = [(j, job_current_step[j]) for j in range(num_jobs) if job_current_step[j] < num_machines]

        if not eligible_ops:
            break

        # For each eligible operation, calculate when it can start
        op_details = []
        for job, step in eligible_ops:
            machine = machine_orders[job][step]
            prev_step_completion = 0 if step == 0 else job_completion[job][step-1]
            possible_start_time = max(machine_available_time[machine], prev_step_completion)
            processing_time = processing_times[job][machine]
            op_details.append((job, machine, step, possible_start_time, processing_time))

        # Sort by earliest finish time (possible start + processing time), an SPT-style rule
        op_details.sort(key=lambda x: x[3] + x[4])

        # Schedule the first operation
        job, machine, step, start_time, process_time = op_details[0]
        end_time = start_time + process_time

        # Update tracking information
        machine_available_time[machine] = end_time
        job_completion[job][step] = end_time
        job_current_step[job] += 1

        if job_current_step[job] == num_machines:
            job_completion_time[job] = end_time

        # Record the scheduled operation
        schedule[machine].append((job, start_time, end_time))

    makespan = max(job_completion_time)
    return schedule, makespan, job_completion_time

# Apply the SPT scheduling algorithm
schedule, makespan, job_completion_times = apply_spt_scheduling()

print("\nSchedule Results:")
print(f"Makespan: {makespan}")
print("Job Completion Times:", job_completion_times)

# Visualization
def visualize_schedule(schedule, makespan):
    fig, ax = plt.subplots(figsize=(12, 6))

    # Define colors for jobs
    colors = ['#ff9999', '#66b3ff', '#99ff99', '#ffcc99', '#c2c2f0']

    # Plot each machine's schedule
    for machine_id in range(num_machines):
        y_pos = machine_id
        for job, start, end in schedule[machine_id]:
            width = end - start
            rect = patches.Rectangle((start, y_pos-0.4), width, 0.8,
                                     edgecolor='black', facecolor=colors[job], alpha=0.7)
            ax.add_patch(rect)

            # Add job label in the center of the rectangle
            ax.text(start + width/2, y_pos, f'J{job}',
                    ha='center', va='center', fontweight='bold')

    # Set plot limits and labels
    ax.set_xlim(0, makespan + 1)
    ax.set_ylim(-0.5, num_machines - 0.5)
    ax.set_yticks(range(num_machines))
    ax.set_yticklabels([f'Machine {i}' for i in range(num_machines)])
    ax.set_xlabel('Time')
    ax.set_title('Job Shop Schedule')

    # Add gridlines
    ax.grid(True, axis='x', linestyle='--', alpha=0.7)

    plt.tight_layout()
    return fig

# Create the visualization
schedule_fig = visualize_schedule(schedule, makespan)

# Gantt chart for job completion
def visualize_job_completion(schedule, job_completion_times):
    # Reconstruct job timelines
    job_timelines = defaultdict(list)

    for machine, operations in schedule.items():
        for job, start, end in operations:
            job_timelines[job].append((machine, start, end))

    # Sort each job's operations by start time
    for job in job_timelines:
        job_timelines[job].sort(key=lambda x: x[1])

    # Create the visualization
    fig, ax = plt.subplots(figsize=(12, 6))

    # Define colors for machines
    machine_colors = ['#ff9999', '#66b3ff', '#99ff99', '#ffcc99']

    # Plot each job's timeline
    for job_id in range(num_jobs):
        y_pos = job_id
        for machine, start, end in job_timelines[job_id]:
            width = end - start
            rect = patches.Rectangle((start, y_pos-0.4), width, 0.8,
                                     edgecolor='black', facecolor=machine_colors[machine], alpha=0.7)
            ax.add_patch(rect)

            # Add machine label in the center of the rectangle
            ax.text(start + width/2, y_pos, f'M{machine}',
                    ha='center', va='center', fontweight='bold')

    # Set plot limits and labels
    ax.set_xlim(0, max(job_completion_times) + 1)
    ax.set_ylim(-0.5, num_jobs - 0.5)
    ax.set_yticks(range(num_jobs))
    ax.set_yticklabels([f'Job {i}' for i in range(num_jobs)])
    ax.set_xlabel('Time')
    ax.set_title('Job Completion Timeline')

    # Add gridlines
    ax.grid(True, axis='x', linestyle='--', alpha=0.7)

    plt.tight_layout()
    return fig

# Create the job completion visualization
job_completion_fig = visualize_job_completion(schedule, job_completion_times)

# Compare with another scheduling rule: Longest Processing Time (LPT)
def apply_lpt_scheduling():
    # Similar implementation to SPT but sorting by longest processing time
    machine_available_time = [0] * num_machines
    job_completion = [[0] * num_machines for _ in range(num_jobs)]
    job_current_step = [0] * num_jobs
    job_completion_time = [0] * num_jobs

    schedule = defaultdict(list)

    while min(job_current_step) < num_machines:
        eligible_ops = [(j, job_current_step[j]) for j in range(num_jobs) if job_current_step[j] < num_machines]

        if not eligible_ops:
            break

        op_details = []
        for job, step in eligible_ops:
            machine = machine_orders[job][step]
            prev_step_completion = 0 if step == 0 else job_completion[job][step-1]
            possible_start_time = max(machine_available_time[machine], prev_step_completion)
            processing_time = processing_times[job][machine]
            op_details.append((job, machine, step, possible_start_time, processing_time))

        # Sort by negative processing time (LPT)
        op_details.sort(key=lambda x: -(x[4]))

        job, machine, step, start_time, process_time = op_details[0]
        end_time = start_time + process_time

        machine_available_time[machine] = end_time
        job_completion[job][step] = end_time
        job_current_step[job] += 1

        if job_current_step[job] == num_machines:
            job_completion_time[job] = end_time

        schedule[machine].append((job, start_time, end_time))

    makespan = max(job_completion_time)
    return schedule, makespan, job_completion_time

# Apply LPT scheduling
lpt_schedule, lpt_makespan, lpt_job_completion_times = apply_lpt_scheduling()

# Compare results
print("\nComparison of Scheduling Rules:")
print(f"SPT Makespan: {makespan}")
print(f"LPT Makespan: {lpt_makespan}")

# Plot comparison
def plot_comparison(spt_makespan, lpt_makespan):
    labels = ['SPT', 'LPT']
    values = [spt_makespan, lpt_makespan]

    fig, ax = plt.subplots(figsize=(8, 5))
    bars = ax.bar(labels, values, color=['#66b3ff', '#ff9999'])

    ax.set_ylabel('Makespan')
    ax.set_title('Comparison of Scheduling Rules')

    # Add value labels on top of bars
    for bar in bars:
        height = bar.get_height()
        ax.text(bar.get_x() + bar.get_width()/2., height + 0.1,
                f'{height}', ha='center', va='bottom')

    ax.grid(axis='y', linestyle='--', alpha=0.7)

    plt.tight_layout()
    return fig

comparison_fig = plot_comparison(makespan, lpt_makespan)

plt.show()

Explanation of the Solution

Problem Setup

I’ve created a job shop scheduling problem with 5 jobs and 4 machines. Each job has:

  1. A specific processing time for each machine
  2. A required sequence through the machines

The mathematical objective is to minimize the makespan ($C_{\max}$), which is the time when all jobs are completed:

$C_{\max} = \max_{j \in \text{Jobs}} C_j$

Algorithm Implementation

The solution uses two common scheduling heuristics:

  1. Shortest Processing Time (SPT): Prioritizes operations with shorter processing times
  2. Longest Processing Time (LPT): Prioritizes operations with longer processing times

For each scheduling rule, the algorithm:

  1. Tracks machine availability times
  2. Tracks job completion status
  3. Identifies eligible operations (next steps for jobs)
  4. Selects operations based on the priority rule
  5. Updates the schedule until all jobs are complete

Visualization

The code creates three visualizations:

  1. Machine-oriented Gantt chart: Shows when each machine is processing which job
    • X-axis: Time
    • Y-axis: Machines
    • Colors: Different jobs

  2. Job-oriented Gantt chart: Shows the progress of each job through the machines
    • X-axis: Time
    • Y-axis: Jobs
    • Colors: Different machines

  3. Comparison chart: Compares the makespan results of SPT vs LPT scheduling rules

Results Analysis

[Output]

Processing Times:
       Machine 0  Machine 1  Machine 2  Machine 3
Job 0          7          4          8          5
Job 1          7          3          7          8
Job 2          5          4          8          8
Job 3          3          6          5          2
Job 4          8          6          2          5

Machine Orders:
Job 0: Machine 0 -> Machine 3 -> Machine 2 -> Machine 1
Job 1: Machine 0 -> Machine 1 -> Machine 3 -> Machine 2
Job 2: Machine 2 -> Machine 0 -> Machine 3 -> Machine 1
Job 3: Machine 0 -> Machine 3 -> Machine 1 -> Machine 2
Job 4: Machine 3 -> Machine 2 -> Machine 0 -> Machine 1

Schedule Results:
Makespan: 40
Job Completion Times: [np.int64(29), np.int64(40), np.int64(33), np.int64(17), np.int64(39)]

Comparison of Scheduling Rules:
SPT Makespan: 40

When you run this code, you’ll see:

  • The problem data (processing times and machine orders)
  • The resulting schedule and makespan for each scheduling rule
  • Visualizations that make it easy to understand the schedules

The SPT rule often yields better overall throughput, while the LPT rule can perform better when processing times differ significantly across operations.

The comparison chart clearly shows which rule performs better for the specific problem instance.
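
A cheap way to judge these makespans is against a trivial lower bound: no schedule can finish before the busiest machine's total workload or the longest job's total processing time. The values below are hard-coded from the processing-time table printed above:

import numpy as np

# Processing times from the output above (rows = jobs, columns = machines)
processing_times = np.array([
    [7, 4, 8, 5],
    [7, 3, 7, 8],
    [5, 4, 8, 8],
    [3, 6, 5, 2],
    [8, 6, 2, 5],
])

machine_bound = processing_times.sum(axis=0).max()  # busiest machine: 30
job_bound = processing_times.sum(axis=1).max()      # longest job: 25
print(max(machine_bound, job_bound))                # lower bound on makespan: 30

Against this bound of 30, the SPT makespan of 40 leaves clear room for improvement, which is typical of simple priority rules.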

Key Insights

This implementation demonstrates several important scheduling concepts:

  • Priority rule-based scheduling heuristics
  • Resource constraints (machines can only process one job at a time)
  • Precedence constraints (operations within a job must follow a specific order)
  • Visualization techniques for schedule analysis

This type of scheduling problem appears in manufacturing, project management, computing resource allocation, and many other practical domains.