
The Set Cover Problem

A Practical Example with Python Implementation

Today, I want to dive into one of the classic problems in computational complexity theory and optimization - the Set Cover Problem.

It’s a fascinating challenge that has applications in various fields, from resource allocation to facility location planning.

What is the Set Cover Problem?

The Set Cover Problem asks: Given a universe U of elements and a collection S of sets whose union equals the universe, what is the minimum number of sets from S needed to cover all elements in U?

Mathematically, we aim to:

  • Minimize $\sum_{s \in S'} 1$

  • Subject to $\bigcup_{s \in S'} s = U$

  • Where $S' \subseteq S$

This is known to be NP-hard, but we can solve it using approximation algorithms.

A Concrete Example

Let’s consider a practical scenario: Imagine we have several radio stations, each covering specific cities.
We want to select the minimum number of stations that cover all cities.

Universe U = {City1, City2, City3, City4, City5, City6, City7, City8, City9, City10}

Collection of sets S = {

  • Station1: {City1, City2, City3}
  • Station2: {City2, City4, City5, City6}
  • Station3: {City3, City6, City7}
  • Station4: {City4, City8, City9}
  • Station5: {City5, City7, City9, City10}
  • Station6: {City1, City10}
    }

Let’s solve this using the greedy approximation algorithm in Python.

import numpy as np
import matplotlib.pyplot as plt
import networkx as nx
from matplotlib.colors import to_rgba

def greedy_set_cover(universe, sets, costs=None):
"""
Greedy algorithm for the set cover problem.

Args:
universe: Set of elements to cover
sets: Dictionary mapping set_id to the set of elements
costs: Dictionary mapping set_id to the cost (default: all 1)

Returns:
List of set_ids that form the cover and the total cost
"""
if costs is None:
costs = {set_id: 1 for set_id in sets}

# Make a copy of the universe and initialize the cover
universe_copy = set(universe)
cover = []
total_cost = 0

# While there are uncovered elements
while universe_copy:
# Find the set with the best cost-effectiveness
best_set = None
best_cost_effectiveness = 0

for set_id, elements in sets.items():
# Skip if already in the cover
if set_id in cover:
continue

# Calculate new elements covered
new_elements = len(universe_copy & elements)
if new_elements == 0:
continue

# Calculate cost effectiveness (elements covered per unit cost)
cost_effectiveness = new_elements / costs[set_id]

if best_set is None or cost_effectiveness > best_cost_effectiveness:
best_set = set_id
best_cost_effectiveness = cost_effectiveness

# If no set covers any remaining elements, break
if best_set is None:
break

# Add the best set to the cover
cover.append(best_set)
total_cost += costs[best_set]

# Remove covered elements
universe_copy -= sets[best_set]

return cover, total_cost

def visualize_set_cover(universe, sets, cover):
"""
Visualize the set cover problem and solution using a bipartite graph.

Args:
universe: Set of elements
sets: Dictionary mapping set_id to the set of elements
cover: List of set_ids that form the cover
"""
G = nx.Graph()

# Add nodes
for element in universe:
G.add_node(element, bipartite=0) # 0 for elements

for set_id in sets:
G.add_node(set_id, bipartite=1) # 1 for sets

# Add edges
for set_id, elements in sets.items():
for element in elements:
G.add_edge(set_id, element)

# Create layout
pos = nx.bipartite_layout(G, [node for node, attr in G.nodes(data=True) if attr['bipartite'] == 0])

# Set up colors
cover_color = 'red'
non_cover_color = 'lightblue'
element_color = 'lightgreen'

# Create node colors
node_colors = []
for node in G.nodes():
if node in cover:
node_colors.append(cover_color)
elif node in universe:
node_colors.append(element_color)
else:
node_colors.append(non_cover_color)

# Create edge colors
edge_colors = []
for u, v in G.edges():
if u in cover or v in cover:
edge_colors.append(cover_color)
else:
edge_colors.append('gray')

# Draw the graph
plt.figure(figsize=(12, 8))
nx.draw(G, pos,
with_labels=True,
node_color=node_colors,
edge_color=edge_colors,
node_size=700,
font_weight='bold',
width=1.5)

# Create a custom legend
from matplotlib.patches import Patch
legend_elements = [
Patch(facecolor=cover_color, edgecolor='black', label='Sets in Cover'),
Patch(facecolor=non_cover_color, edgecolor='black', label='Sets not in Cover'),
Patch(facecolor=element_color, edgecolor='black', label='Elements to Cover')
]
plt.legend(handles=legend_elements, loc='upper right')

plt.title('Set Cover Visualization')
plt.tight_layout()
plt.show()

def visualize_set_coverage_steps(universe, sets, cover):
"""
Visualize the progressive coverage of elements by the sets in the cover.

Args:
universe: Set of elements
sets: Dictionary mapping set_id to the set of elements
cover: List of set_ids that form the cover (ordered by selection)
"""
# Create a new figure
plt.figure(figsize=(12, 6))

# Convert universe to a list for consistent ordering
universe_list = list(universe)
universe_list.sort() # Sort for better visualization

# Create a matrix to show coverage
# Rows are steps (adding each set)
# Columns are elements
coverage_matrix = np.zeros((len(cover) + 1, len(universe_list)))

# First row shows no coverage
covered_elements = set()

# For each step (adding a set to the cover)
for i, set_id in enumerate(cover):
# Add newly covered elements
covered_elements.update(sets[set_id])
# Mark covered elements in the matrix
for j, element in enumerate(universe_list):
if element in covered_elements:
coverage_matrix[i+1, j] = 1

# Plot as a heatmap
plt.imshow(coverage_matrix, aspect='auto', cmap='Blues')

# Add labels
plt.yticks(np.arange(len(cover) + 1),
['Initial'] + [f'Add {set_id}' for set_id in cover])
plt.xticks(np.arange(len(universe_list)), universe_list, rotation=45)

# Add grid
plt.grid(False)
for i in range(len(cover) + 1):
plt.axhline(i - 0.5, color='gray', linestyle='-', linewidth=0.5)
for j in range(len(universe_list)):
plt.axvline(j - 0.5, color='gray', linestyle='-', linewidth=0.5)

# Add coverage percentage for each step
for i in range(len(cover) + 1):
coverage_pct = np.sum(coverage_matrix[i]) / len(universe_list) * 100
plt.text(len(universe_list) - 0.5, i, f'{coverage_pct:.1f}%',
verticalalignment='center')

plt.colorbar(label='Covered (1) / Not Covered (0)')
plt.title('Progressive Element Coverage')
plt.tight_layout()
plt.show()

# Define our problem
universe = {'City1', 'City2', 'City3', 'City4', 'City5',
'City6', 'City7', 'City8', 'City9', 'City10'}

sets = {
'Station1': {'City1', 'City2', 'City3'},
'Station2': {'City2', 'City4', 'City5', 'City6'},
'Station3': {'City3', 'City6', 'City7'},
'Station4': {'City4', 'City8', 'City9'},
'Station5': {'City5', 'City7', 'City9', 'City10'},
'Station6': {'City1', 'City10'}
}

# Solve using the greedy algorithm
cover, total_cost = greedy_set_cover(universe, sets)
print(f"Optimal cover: {cover}")
print(f"Total cost: {total_cost}")
print(f"Number of sets used: {len(cover)}")

# Verify the cover is valid
covered = set()
for set_id in cover:
covered.update(sets[set_id])

print(f"All elements covered: {covered == universe}")

# Visualize the solution
visualize_set_cover(universe, sets, cover)
visualize_set_coverage_steps(universe, sets, cover)

Code Explanation

Let’s break down the key components of the solution:

1. The Greedy Set Cover Algorithm

The core of our solution is the greedy_set_cover function.
This implements a well-known approximation algorithm with the following approach:

  1. At each step, select the set that covers the most uncovered elements per unit cost
  2. Add this set to our solution
  3. Remove the newly covered elements from our “yet to cover” list
  4. Repeat until all elements are covered

This greedy approach is guaranteed to produce a solution that’s within a logarithmic factor of the optimal solution.
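
More precisely, if $d$ is the size of the largest set in $S$, the greedy cover $C_{\text{greedy}}$ satisfies the classical bound

$$|C_{\text{greedy}}| \le H(d) \cdot |C_{\text{opt}}|, \qquad H(d) = \sum_{k=1}^{d} \frac{1}{k} \le \ln d + 1$$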

The algorithm keeps track of the remaining elements to cover (universe_copy) and selects sets based on a “cost-effectiveness” metric - the number of new elements covered divided by the cost of the set.
In our example, all sets have a uniform cost of 1, but the code supports variable costs.
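
As a quick sketch of the weighted case (the station costs below are hypothetical and not part of the original example), the same function can be called with a cost dictionary:

# Hypothetical per-station costs (e.g., licensing fees) - for illustration only
costs = {'Station1': 3, 'Station2': 2, 'Station3': 1,
         'Station4': 2, 'Station5': 4, 'Station6': 1}

# The greedy choice now maximizes (newly covered cities) / cost
weighted_cover, weighted_cost = greedy_set_cover(universe, sets, costs)
print(weighted_cover, weighted_cost)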

2. Visualization Functions

I’ve included two visualization functions to help understand the problem and solution:

visualize_set_cover

This function creates a bipartite graph representation of the problem:

  • One set of nodes represents the elements (cities)
  • Another set represents the sets (radio stations)
  • Edges connect each set to the elements it contains
  • Sets in the solution are highlighted in red

The function uses NetworkX to create and draw the graph, with color coding to distinguish between elements, sets in the cover, and sets not in the cover.

visualize_set_coverage_steps

This function shows how the coverage progresses as we add each set:

  • Each row represents a step in the solution
  • Each column represents an element
  • Cells are colored based on whether the element is covered at that step
  • The percentage of covered elements is shown for each step

Results and Analysis

When we run the code on our radio station example, we get:

Optimal cover: ['Station2', 'Station5', 'Station4', 'Station1']
Total cost: 4
Number of sets used: 4
All elements covered: True

The greedy algorithm selected 4 stations out of the 6 available:

  1. Station2 (covers City2, City4, City5, City6)
  2. Station5 (covers City5, City7, City9, City10)
  3. Station4 (covers City4, City8, City9)
  4. Station1 (covers City1, City2, City3)

The visualizations clearly show how these stations collectively cover all cities.
The coverage step chart shows how we progressively increase coverage as each station is added to our solution.


Computational Complexity

Each iteration of the greedy algorithm takes O(|S| × |U|) time, where |S| is the number of sets and |U| is the size of the universe, because we compare every candidate set against the remaining uncovered elements.
Since each iteration adds one set and covers at least one new element, there are at most min(|S|, |U|) iterations, giving an overall running time of O(|S| × |U| × min(|S|, |U|)).

While this is not guaranteed to find the absolute optimal solution (which would require solving an NP-hard problem), the greedy approach provides a good approximation that works well in practice.

Conclusion

The Set Cover Problem is a fantastic example of how approximation algorithms can provide practical solutions to otherwise intractable problems.

Our Python implementation demonstrates not only how to solve such problems programmatically but also how to visualize the results to gain deeper insights.

Next time you’re planning radio coverage, facility locations, or any resource allocation problem, remember that these classical algorithms can save you significant time and resources!

Feel free to experiment with different problem instances or extend the code to handle specific constraints in your own applications.

Realistic Political Redistricting Problem

Solving a Realistic Political Redistricting Problem with Python

Today, I want to dive into one of the most fascinating yet complex problems in political geography: redistricting.

This computational problem has real-world implications for democratic representation, and Python provides us with powerful tools to model and solve it.

Understanding the Redistricting Problem

Redistricting involves dividing a geographic area into electoral districts, aiming to create fair representation while satisfying various constraints like population equality, compactness, and sometimes considerations of communities of interest.

Mathematically, we can express the goal of creating districts with roughly equal populations as minimizing the worst relative deviation from the ideal district size:

$$\min \; \max_{i} \frac{\left| P(D_i) - \bar{P} \right|}{\bar{P}}, \qquad \bar{P} = \frac{1}{k} \sum_{i=1}^{k} P(D_i)$$

Where $P(D_i)$ represents the population of district $i$ and $k$ is the number of districts.

A Realistic Scenario

Let’s tackle a scenario where we need to divide a state with 100 precincts into 5 congressional districts. We’ll try to:

  1. Balance population across districts
  2. Ensure geographic contiguity
  3. Optimize for compactness
import numpy as np
import matplotlib.pyplot as plt
import networkx as nx
import random
from matplotlib.colors import ListedColormap
from scipy.spatial import Voronoi

# Set random seed for reproducibility
np.random.seed(42)
random.seed(42)

# Generate synthetic data for our state
num_precincts = 100
num_districts = 5

# Create precinct locations and populations
x_coords = np.random.rand(num_precincts) * 100
y_coords = np.random.rand(num_precincts) * 100
positions = np.column_stack((x_coords, y_coords))

# Generate population data with some variation
base_population = 10000
population_variation = 3000
populations = np.random.randint(
base_population - population_variation,
base_population + population_variation,
num_precincts
)

# Create a graph where precincts are nodes connected to nearby precincts
G = nx.Graph()

# Add nodes with attributes
for i in range(num_precincts):
G.add_node(i, pos=(x_coords[i], y_coords[i]), population=populations[i])

# Connect precincts that are close to each other
distance_threshold = 15 # Adjust based on your coordinate scale
for i in range(num_precincts):
for j in range(i+1, num_precincts):
dist = np.sqrt((x_coords[i] - x_coords[j])**2 + (y_coords[i] - y_coords[j])**2)
if dist < distance_threshold:
G.add_edge(i, j)

# Check if the graph is connected
if not nx.is_connected(G):
largest_cc = max(nx.connected_components(G), key=len)
if len(largest_cc) < num_precincts:
print(f"Warning: Generated graph is not connected. Using largest component with {len(largest_cc)} nodes.")
G = G.subgraph(largest_cc).copy()
# Adjust our precinct count
num_precincts = len(G.nodes)

# Helper function to check if a district is contiguous
def is_contiguous(G, district_nodes):
if not district_nodes:
return True
subgraph = G.subgraph(district_nodes)
return nx.is_connected(subgraph)

# Helper function to calculate district population
def district_population(district, populations):
return sum(populations[node] for node in district)

# Initialize districts using region growing approach
def initialize_districts(G, num_districts, populations):
nodes = list(G.nodes())

# Start with seed nodes far apart
seed_indices = np.linspace(0, len(nodes)-1, num_districts, dtype=int)
seeds = [nodes[i] for i in seed_indices]

# Initialize districts with seed nodes
districts = [set([seed]) for seed in seeds]
unassigned = set(nodes) - set(seeds)

# Grow districts by adding adjacent nodes one at a time
while unassigned:
for d_idx, district in enumerate(districts):
if not unassigned:
break

# Find unassigned nodes adjacent to this district
frontier = set()
for node in district:
frontier.update(n for n in G.neighbors(node) if n in unassigned)

if frontier:
# Add one node to the district
new_node = random.choice(list(frontier))
district.add(new_node)
unassigned.remove(new_node)

return districts

# Score function to evaluate a districting plan
def score_plan(districts, populations):
district_pops = [district_population(d, populations) for d in districts]

# Population balance score (lower is better)
avg_pop = sum(district_pops) / len(district_pops)
pop_deviation = max(abs(p - avg_pop) / avg_pop for p in district_pops)

return pop_deviation

# Local search algorithm to improve the districting plan
def optimize_districts(G, initial_districts, populations, iterations=1000):
current_districts = [set(d) for d in initial_districts]
current_score = score_plan(current_districts, populations)

best_districts = [set(d) for d in current_districts]
best_score = current_score

for i in range(iterations):
# Select a random border precinct
moved = False

# Try to move a random node from one district to another
for source_idx in random.sample(range(len(current_districts)), len(current_districts)):
if moved:
break

source_district = current_districts[source_idx]

# Find border nodes (nodes with neighbors in other districts)
border_nodes = set()
for node in source_district:
for neighbor in G.neighbors(node):
if any(neighbor in d for d in current_districts if d != source_district):
border_nodes.add(node)
break

if not border_nodes:
continue

# Try to move a random border node
node_to_move = random.choice(list(border_nodes))

# Find neighboring districts
target_districts = []
for d_idx, district in enumerate(current_districts):
if d_idx != source_idx:
for neighbor in G.neighbors(node_to_move):
if neighbor in district:
target_districts.append(d_idx)
break

if not target_districts:
continue

target_idx = random.choice(target_districts)

# Check if removing the node keeps the source district contiguous
temp_source = source_district.copy()
temp_source.remove(node_to_move)

if not is_contiguous(G, temp_source):
continue

# Make the move
new_districts = [set(d) for d in current_districts]
new_districts[source_idx].remove(node_to_move)
new_districts[target_idx].add(node_to_move)

# Evaluate the new plan
new_score = score_plan(new_districts, populations)

# Accept if it's better
if new_score < current_score:
current_districts = new_districts
current_score = new_score
moved = True

# Update best if it's the best so far
if new_score < best_score:
best_districts = [set(d) for d in new_districts]
best_score = new_score
print(f"Iteration {i}: New best score = {best_score:.4f}")

return best_districts, best_score

# Run the algorithm
print("Initializing districts...")
initial_districts = initialize_districts(G, num_districts, populations)
print("Optimizing districts...")
optimized_districts, final_score = optimize_districts(G, initial_districts, populations)

# Calculate district populations
district_pops = [district_population(d, populations) for d in optimized_districts]
total_pop = sum(populations)
district_pop_percentages = [p/total_pop*100 for p in district_pops]

# Prepare data for visualization
node_colors = np.zeros(num_precincts)
for i, district in enumerate(optimized_districts):
for node in district:
node_colors[node] = i

# Get positions for drawing
pos = nx.get_node_attributes(G, 'pos')

# Set up a colorful visualization
plt.figure(figsize=(15, 10))

# Plot the graph with district colors
cmap = plt.cm.get_cmap('tab10', num_districts)
nx.draw(G, pos, node_color=node_colors, cmap=cmap, node_size=100, with_labels=False, width=0.5)

# Add a title and legend
plt.title('Optimized Redistricting Plan', fontsize=16)
patches = [plt.plot([],[], marker="o", ms=10, ls="", mec=None, color=cmap(i),
label=f"District {i+1}: {district_pops[i]} people ({district_pop_percentages[i]:.1f}%)"
)[0] for i in range(num_districts)]
plt.legend(handles=patches, loc='upper right', fontsize=12)

plt.savefig('redistricting_plan.png', dpi=300, bbox_inches='tight')
plt.show()

# Create a population distribution bar chart
plt.figure(figsize=(12, 6))
bars = plt.bar(range(1, num_districts+1), district_pops, color=cmap(range(num_districts)))
plt.axhline(y=total_pop/num_districts, color='r', linestyle='--', label='Ideal Population')

# Add labels and annotations
plt.xlabel('District', fontsize=14)
plt.ylabel('Population', fontsize=14)
plt.title('Population Distribution by District', fontsize=16)
plt.xticks(range(1, num_districts+1))

# Add population values on top of bars
for bar in bars:
height = bar.get_height()
plt.text(bar.get_x() + bar.get_width()/2., height + 5000,
f'{int(height)}',
ha='center', va='bottom', fontsize=12)

plt.legend()
plt.grid(axis='y', alpha=0.3)
plt.savefig('district_populations.png', dpi=300, bbox_inches='tight')
plt.show()

# Calculate and display metrics
print("\nRedistricting Results:")
print(f"Number of districts: {num_districts}")
print(f"Total population: {total_pop}")
print(f"Ideal district population: {total_pop / num_districts:.1f}")
print("\nDistrict populations:")
for i, pop in enumerate(district_pops):
deviation = (pop - total_pop/num_districts) / (total_pop/num_districts) * 100
print(f"District {i+1}: {pop} people (deviation: {deviation:+.2f}%)")

print(f"\nPopulation deviation score: {final_score:.4f}")
print(f"Maximum population deviation: {max(abs(p - total_pop/num_districts) / (total_pop/num_districts) * 100 for p in district_pops):.2f}%")

Code Explanation

Let’s walk through this code step by step:

Data Generation

First, we create synthetic data to model our redistricting problem:

  • 100 precincts with random geographic coordinates
  • Population assigned to each precinct (around 10,000 people with some variation)
  • A graph structure where nearby precincts are connected

The graph representation is crucial - precincts are nodes, and edges represent geographic adjacency.
This helps us enforce contiguity constraints.

Districting Algorithm

Our approach consists of three main components:

  1. Initialization: We use a region-growing method that:

    • Selects seed precincts spaced across the map
    • Grows districts by gradually adding adjacent precincts
    • Ensures each district is geographically contiguous
  2. Scoring Function: We evaluate plans based on population equality:

    • Calculate population deviation from the ideal equal distribution
    • Lower scores indicate more balanced districts
  3. Optimization: We use a local search algorithm that:

    • Makes small changes by moving border precincts between districts
    • Only accepts changes that improve population balance
    • Ensures districts remain contiguous after each move
    • Runs for 1,000 iterations to find a good solution

The is_contiguous() function is particularly important as it ensures we don’t create fragmented districts.
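
As a tiny illustration of that check (a hypothetical four-precinct path graph, separate from the script's data):

import networkx as nx

H = nx.path_graph(4)  # precincts 0-1-2-3 connected in a line
print(nx.is_connected(H.subgraph({0, 1, 2})))  # True  -> contiguous district
print(nx.is_connected(H.subgraph({0, 2, 3})))  # False -> fragmented district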

Output

Initializing districts...
Optimizing districts...
Iteration 0: New best score = 0.3972
Iteration 1: New best score = 0.3822
Iteration 3: New best score = 0.3331
Iteration 8: New best score = 0.3270
Iteration 10: New best score = 0.3125
Iteration 15: New best score = 0.2905
Iteration 42: New best score = 0.2726
Iteration 45: New best score = 0.2364
Iteration 46: New best score = 0.2328
Iteration 98: New best score = 0.2312
Iteration 106: New best score = 0.2014

Visualization

We create two visualizations:

  1. A map showing the district boundaries with a color-coded legend

  2. A bar chart comparing populations across districts

Results Analysis

The algorithm produces 5 districts with these properties:

  1. Geographic Contiguity: Each district forms a single connected region, with no isolated enclaves.

  2. Population Balance: The algorithm minimizes population deviation between districts.
    The bar chart shows how close each district is to the ideal population (red dashed line).

  3. Compactness: Though not explicitly optimized for, the districts tend to be reasonably compact due to our graph construction.

The console output provides detailed statistics on:

  • Population of each district
  • Deviation from the ideal equal population
  • Overall population deviation score

Real-World Applications

This simplified model demonstrates key concepts in redistricting algorithms, but real-world applications would include additional considerations:

  1. Voting Rights Act Compliance: Ensuring minority communities have fair representation
  2. Preserving Communities of Interest: Keeping communities with shared concerns together
  3. Political Fairness Metrics: Analyzing partisan bias or competitiveness

Conclusion

Computational approaches to redistricting can help create more fair and balanced electoral maps.
While our algorithm focuses primarily on population equality and contiguity, it demonstrates how Python can be used to approach this complex political geography problem.

The visualization clearly shows how our algorithm divides the state into balanced districts.
Each color represents a different congressional district, with the legend indicating population counts and percentages.

Optimizing Sports League Schedules: A Python Approach to Minimizing Travel Distances

Solving a Realistic Sports Scheduling Problem with Python

Have you ever wondered how professional sports leagues create their game schedules?
It’s a fascinating optimization problem that combines mathematics and computer science.

Today, I’ll walk you through solving a realistic sports scheduling problem using Python.

The Problem: Regional Basketball League Scheduling

Let’s tackle scheduling for a regional basketball league with 8 teams. The constraints are:

  • Each team must play against every other team twice (home and away)
  • Games occur on weekends with 4 matches per weekend
  • No team should play more than one game per weekend
  • Travel distances should be minimized (teams located far from each other should have games consolidated)

This is a classic combinatorial optimization problem that appears simple but quickly becomes complex.

Let’s solve this using Python with the PuLP library for linear programming:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from pulp import *
import networkx as nx
from itertools import combinations

# Define our teams with their locations (x,y coordinates)
teams = {
'Eagles': (10, 40),
'Sharks': (30, 10),
'Tigers': (90, 20),
'Hawks': (50, 50),
'Lions': (70, 80),
'Bears': (20, 70),
'Wolves': (60, 30),
'Panthers': (80, 60)
}

# Calculate distances between teams
distances = {}
for team1, loc1 in teams.items():
for team2, loc2 in teams.items():
if team1 != team2:
# Euclidean distance
dist = np.sqrt((loc1[0] - loc2[0])**2 + (loc1[1] - loc2[1])**2)
distances[(team1, team2)] = dist

# Total number of teams
n_teams = len(teams)

# Each team plays every other team twice (home and away)
n_games = n_teams * (n_teams - 1)

# Number of weekends needed (4 games per weekend, each team plays at most once)
n_weekends = n_games // 4

# Create the model
model = LpProblem("Sports_Scheduling", LpMinimize)

# Create variables
# game_vars[(i, j, w)] = 1 if team i plays against team j on weekend w
game_vars = {}
for i in teams:
for j in teams:
if i != j:
for w in range(n_weekends):
game_vars[(i, j, w)] = LpVariable(f"Game_{i}_{j}_{w}", 0, 1, LpBinary)

# Objective function: Minimize total travel distance
# Every scheduled game is weighted by the travel distance between the two teams (no extra penalty terms are added here)
objective = lpSum([game_vars[(i, j, w)] * distances[(i, j)] for i in teams for j in teams for w in range(n_weekends) if i != j])
model += objective

# Constraints

# 1. Each team plays against every other team twice (once home, once away)
for i in teams:
for j in teams:
if i != j:
model += lpSum([game_vars[(i, j, w)] for w in range(n_weekends)]) == 1
model += lpSum([game_vars[(j, i, w)] for w in range(n_weekends)]) == 1

# 2. Each team plays at most one game per weekend
for w in range(n_weekends):
for i in teams:
model += lpSum([game_vars[(i, j, w)] + game_vars[(j, i, w)] for j in teams if i != j]) <= 1

# 3. Exactly 4 games per weekend
for w in range(n_weekends):
model += lpSum([game_vars[(i, j, w)] for i in teams for j in teams if i != j]) == 4

# Solve the model
model.solve(PULP_CBC_CMD(msg=False))

# Print the status of the solution
print("Status:", LpStatus[model.status])

# Organize the schedule
schedule = {w: [] for w in range(n_weekends)}
for i in teams:
for j in teams:
if i != j:
for w in range(n_weekends):
if game_vars[(i, j, w)].value() == 1:
schedule[w].append((i, j))

# Create a DataFrame for better visualization
schedule_df = pd.DataFrame(index=range(n_weekends), columns=['Game 1', 'Game 2', 'Game 3', 'Game 4'])
for w in range(n_weekends):
for g in range(4):
if g < len(schedule[w]):
home, away = schedule[w][g]
schedule_df.loc[w, f'Game {g+1}'] = f"{home} vs {away}"

# Print the schedule
print("\nOptimized Schedule:")
print(schedule_df)

# Visualize team locations and distances
plt.figure(figsize=(10, 8))
for team, (x, y) in teams.items():
plt.scatter(x, y, s=100)
plt.text(x+2, y+2, team, fontsize=12)

# Draw lines between teams that play on the same weekend
colors = ['red', 'blue', 'green', 'purple', 'orange', 'brown', 'pink', 'gray', 'olive', 'cyan', 'magenta', 'yellow']
for w in range(n_weekends):
for home, away in schedule[w]:
x1, y1 = teams[home]
x2, y2 = teams[away]
plt.plot([x1, x2], [y1, y2], color=colors[w % len(colors)], alpha=0.5, linewidth=2)

plt.title('Team Locations and Game Connections by Weekend')
plt.xlabel('X Coordinate')
plt.ylabel('Y Coordinate')
plt.grid(True)

# Create a legend for weekends
from matplotlib.lines import Line2D
legend_elements = [Line2D([0], [0], color=colors[w % len(colors)], lw=2, label=f'Weekend {w+1}')
for w in range(n_weekends)]
plt.legend(handles=legend_elements, loc='upper right')

plt.savefig('team_locations.png')
plt.show()

# Visualize the schedule as a heatmap to see when each team plays
team_schedule = pd.DataFrame(0, index=teams.keys(), columns=range(1, n_weekends+1))

for w in range(n_weekends):
for home, away in schedule[w]:
team_schedule.loc[home, w+1] = 1 # Home game
team_schedule.loc[away, w+1] = -1 # Away game

plt.figure(figsize=(12, 8))
sns.heatmap(team_schedule, cmap="coolwarm", center=0, linewidths=.5,
cbar_kws={'label': 'Game Type', 'ticks':[-1, 0, 1]})
plt.title('Team Schedule Heatmap (1: Home Game, -1: Away Game, 0: No Game)')
plt.xlabel('Weekend Number')
plt.ylabel('Team')

# Customize the colorbar
colorbar = plt.gcf().axes[1]
colorbar.set_yticklabels(['Away', 'No Game', 'Home'])

plt.tight_layout()
plt.savefig('schedule_heatmap.png')
plt.show()

# Calculate total distance traveled by each team
team_distances = {team: 0 for team in teams}
for w in range(n_weekends):
for home, away in schedule[w]:
team_distances[away] += distances[(away, home)]

# Plot total distance traveled
plt.figure(figsize=(10, 6))
plt.bar(team_distances.keys(), team_distances.values())
plt.title('Total Distance Traveled by Each Team')
plt.xlabel('Team')
plt.ylabel('Distance')
plt.xticks(rotation=45)
plt.tight_layout()
plt.savefig('travel_distances.png')
plt.show()

# Create a network graph to visualize the schedule
G = nx.DiGraph()
for team in teams:
G.add_node(team)

for w in range(n_weekends):
for home, away in schedule[w]:
G.add_edge(home, away, weekend=w+1)

plt.figure(figsize=(12, 10))
pos = {team: loc for team, loc in teams.items()}
nx.draw_networkx_nodes(G, pos, node_size=700, node_color='lightblue')
nx.draw_networkx_labels(G, pos, font_size=10)

# Draw edges with different colors for each weekend
for w in range(n_weekends):
edge_list = [(home, away) for home, away in G.edges() if G[home][away]['weekend'] == w+1]
nx.draw_networkx_edges(G, pos, edgelist=edge_list, edge_color=colors[w % len(colors)],
arrows=True, width=1.5, alpha=0.7)

# Add edge labels indicating the weekend number
edge_labels = {(home, away): f"Week {G[home][away]['weekend']}" for home, away in G.edges()}
nx.draw_networkx_edge_labels(G, pos, edge_labels=edge_labels, font_size=8)

plt.title('Game Schedule Network')
plt.axis('off')
plt.tight_layout()
plt.savefig('schedule_network.png')
plt.show()

Code Explanation

Let’s break down the solution step by step:

1. Problem Setup

First, I define our 8 teams with their geographic locations using x-y coordinates.
This is realistic because in actual sports scheduling, the locations of teams are critical for minimizing travel distances.

teams = {
    'Eagles': (10, 40),
    'Sharks': (30, 10),
    'Tigers': (90, 20),
    ...
}

Then I calculate the Euclidean distances between each pair of teams.
In a real-world scenario, these would be actual road distances or travel times.
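
As a minimal alternative sketch (assuming the same teams dictionary defined above), the full pairwise distance matrix can also be computed in one call with SciPy:

import numpy as np
from scipy.spatial.distance import cdist

names = list(teams.keys())
coords = np.array([teams[t] for t in names])
dist_matrix = cdist(coords, coords)  # pairwise Euclidean distances

# Same dictionary form used by the model
distances = {(a, b): dist_matrix[i, j]
             for i, a in enumerate(names)
             for j, b in enumerate(names) if a != b}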

2. Mathematical Model

This is where the optimization begins.
The problem is formulated as a linear programming model using PuLP:

model = LpProblem("Sports_Scheduling", LpMinimize)

The key decision variables are binary variables indicating whether team i plays against team j on weekend w:

game_vars[(i, j, w)] = LpVariable(f"Game_{i}_{j}_{w}", 0, 1, LpBinary)

3. Objective Function

Our goal is to minimize the total travel distance:

objective = lpSum([game_vars[(i, j, w)] * distances[(i, j)] for i in teams for j in teams for w in range(n_weekends) if i != j])

This can be represented mathematically as:

$$\min \sum_{i \in \text{Teams}} \; \sum_{\substack{j \in \text{Teams} \\ j \neq i}} \; \sum_{w \in \text{Weekends}} \text{game}_{i,j,w} \times \text{distance}_{i,j}$$

Note that because constraint 1 below forces every ordered pair of teams to be scheduled exactly once, this sum takes the same value for every feasible schedule; in this formulation the solver mainly searches for a feasible calendar, and a model that genuinely reduces travel would need extra terms, for example penalties for long consecutive road trips.

4. Constraints

Several constraints ensure a valid schedule:

  1. Each team plays against every other team exactly once at home and once away:

$$\sum_{w \in \text{Weekends}} \text{game}_{i,j,w} = 1 \quad \forall\, i, j \in \text{Teams},\ i \neq j$$

  2. Each team plays at most one game per weekend:

$$\sum_{\substack{j \in \text{Teams} \\ j \neq i}} \left( \text{game}_{i,j,w} + \text{game}_{j,i,w} \right) \leq 1 \quad \forall\, i \in \text{Teams},\ \forall\, w \in \text{Weekends}$$

  3. Exactly 4 games per weekend:

$$\sum_{i \in \text{Teams}} \sum_{\substack{j \in \text{Teams} \\ j \neq i}} \text{game}_{i,j,w} = 4 \quad \forall\, w \in \text{Weekends}$$

5. Solution Extraction

After solving the model, I extract the schedule into a more readable format:

schedule = {w: [] for w in range(n_weekends)}
for i in teams:
    for j in teams:
        if i != j:
            for w in range(n_weekends):
                if game_vars[(i, j, w)].value() == 1:
                    schedule[w].append((i, j))

Output

Status: Optimal

Optimized Schedule:
                Game 1              Game 2             Game 3  \
0      Sharks vs Lions     Tigers vs Hawks  Bears vs Panthers   
1     Sharks vs Wolves  Tigers vs Panthers     Lions vs Hawks   
2   Sharks vs Panthers     Tigers vs Bears    Lions vs Eagles   
3     Eagles vs Sharks     Bears vs Tigers    Wolves vs Lions   
4     Eagles vs Wolves     Sharks vs Bears  Hawks vs Panthers   
5     Tigers vs Wolves     Hawks vs Eagles    Bears vs Sharks   
6      Eagles vs Bears     Hawks vs Sharks    Lions vs Wolves   
7      Eagles vs Hawks     Tigers vs Lions   Wolves vs Sharks   
8   Eagles vs Panthers    Tigers vs Sharks     Hawks vs Lions   
9      Hawks vs Tigers     Lions vs Sharks    Bears vs Wolves   
10    Sharks vs Eagles   Lions vs Panthers     Bears vs Hawks   
11     Eagles vs Lions    Sharks vs Tigers     Hawks vs Bears   
12    Tigers vs Eagles     Hawks vs Wolves     Lions vs Bears   
13    Eagles vs Tigers     Sharks vs Hawks     Bears vs Lions   

                Game 4  
0     Wolves vs Eagles  
1      Bears vs Eagles  
2      Wolves vs Hawks  
3    Panthers vs Hawks  
4      Lions vs Tigers  
5    Panthers vs Lions  
6   Panthers vs Tigers  
7    Panthers vs Bears  
8      Wolves vs Bears  
9   Panthers vs Eagles  
10    Wolves vs Tigers  
11  Panthers vs Wolves  
12  Panthers vs Sharks  
13  Wolves vs Panthers  

Visualization and Analysis

I’ve created several visualizations to help understand the optimized schedule:

1. Geographic Visualization

The first plot shows the geographic locations of teams with lines connecting teams that play on the same weekend.
Each color represents a different weekend:

This visualization is particularly useful for identifying whether games are geographically efficient - teams that are closer to each other should ideally play on the same weekends to minimize travel.

2. Schedule Heatmap

The heatmap provides a clear view of when each team plays, and whether they’re playing home (1) or away (-1):

This visualization helps identify whether the schedule is balanced.
Each team should have a mix of home and away games, and we can quickly see if any team has too many consecutive home or away games.

3. Travel Distance Analysis

This bar chart shows the total distance traveled by each team:

In an ideal schedule, these values should be relatively balanced to ensure fairness.
Wide variations might indicate scheduling inefficiencies.
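
One simple way to quantify that balance (a sketch, reusing the team_distances dictionary computed in the script) is to look at the spread of the per-team totals:

travel = np.array(list(team_distances.values()))
print(f"Mean travel distance: {travel.mean():.1f}")
print(f"Std deviation:        {travel.std():.1f}")
print(f"Max/min ratio:        {travel.max() / travel.min():.2f}")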

4. Network Graph

Finally, the network graph shows the complete schedule with directed edges indicating who plays whom and when:

This graph helps identify the overall structure of the schedule and any potential patterns.

Conclusion

Sports scheduling is a complex optimization problem with numerous constraints.
The solution I’ve presented uses linear programming to find an optimal schedule that minimizes travel distances while satisfying all the logical constraints of a sports league.

In real-world applications, additional constraints might include:

  • Arena availability
  • TV broadcasting requirements
  • Avoiding conflicts with other major events
  • Balancing the distribution of strong opponents throughout the season

This approach can be extended to larger leagues with more complex constraints, though the computational complexity increases significantly with the number of teams and constraints.

Circuit Board Testing

A Comprehensive Python Analysis

In the world of electronics manufacturing, circuit board testing is crucial for ensuring product quality and reliability.

Today, I’ll walk you through a realistic Python implementation of circuit board defect analysis using statistical methods.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats

# Simulated Circuit Board Test Data
np.random.seed(42)

class CircuitBoardTesting:
    def __init__(self, num_boards=1000):
        """
        Initialize circuit board testing simulation

        Parameters:
        - num_boards: Total number of circuit boards to simulate
        """
        self.num_boards = num_boards
        self.defect_rates = self._generate_defect_data()

    def _generate_defect_data(self):
        """
        Generate simulated defect data with realistic variations

        Defect types:
        1. Solder joint issues
        2. Component misalignment
        3. Trace discontinuity
        4. Electrical short
        """
        defect_types = [
            'Solder Joint Issues',
            'Component Misalignment',
            'Trace Discontinuity',
            'Electrical Short'
        ]

        # Realistic defect rate generation
        defect_rates = {
            defect: np.random.normal(loc=0.02, scale=0.005, size=self.num_boards)
            for defect in defect_types
        }

        return defect_rates

    def calculate_statistics(self):
        """
        Calculate statistical metrics for defect analysis

        Returns:
        - DataFrame with defect statistics
        """
        stats_data = {}
        for defect, rates in self.defect_rates.items():
            # Corrected confidence interval calculation
            confidence_interval = stats.t.interval(
                confidence=0.95,  # Added 'confidence' parameter
                df=len(rates)-1,
                loc=np.mean(rates),
                scale=stats.sem(rates)
            )

            stats_data[defect] = {
                'Mean': np.mean(rates),
                'Std Dev': np.std(rates),
                'Min': np.min(rates),
                'Max': np.max(rates),
                'Confidence Interval': confidence_interval
            }

        return pd.DataFrame.from_dict(stats_data, orient='index')

    def plot_defect_distribution(self):
        """
        Visualize defect rate distributions
        """
        plt.figure(figsize=(12, 6))

        for i, (defect, rates) in enumerate(self.defect_rates.items(), 1):
            plt.subplot(2, 2, i)
            plt.hist(rates, bins=30, alpha=0.7, color=f'C{i-1}')
            plt.title(f'{defect} Defect Rate')
            plt.xlabel('Defect Rate')
            plt.ylabel('Frequency')

        plt.tight_layout()
        plt.show()

# Perform analysis
testing = CircuitBoardTesting(num_boards=1000)
defect_stats = testing.calculate_statistics()
print("Defect Rate Statistics:")
print(defect_stats)

# Visualize defect distributions
testing.plot_defect_distribution()

Detailed Code Explanation

Simulation Approach

  1. We create a CircuitBoardTesting class to simulate circuit board defect analysis.
  2. The simulation uses numpy to generate realistic defect rates with normal distribution.

Statistical Analysis

Key statistical methods include:

  • Mean defect rate calculation
  • Standard deviation measurement
  • 95% confidence interval determination
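
The 95% interval returned by stats.t.interval corresponds to the usual t-based formula

$$\bar{x} \;\pm\; t_{0.975,\,n-1} \cdot \frac{s}{\sqrt{n}}$$

where $\bar{x}$ is the sample mean, $s$ the sample standard deviation, and $n$ the number of simulated boards.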

Mathematical Representation

The defect rate follows a normal distribution with:

  • μ=0.02 (mean defect rate)
  • σ=0.005 (standard deviation)
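
In other words, each simulated defect rate is drawn as

$$x \sim \mathcal{N}\!\left(\mu = 0.02,\ \sigma^2 = 0.005^2\right)$$

which matches the np.random.normal(loc=0.02, scale=0.005, ...) call in _generate_defect_data above.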

Key Visualization Insights

The generated plots will show:

  • Distribution of defect rates for each defect type
  • Frequency of occurrence
  • Variation in defect rates

Result

Defect Rate Statistics:
                            Mean   Std Dev       Min       Max  \
Solder Joint Issues     0.020097  0.004894  0.003794  0.039264   
Component Misalignment  0.020354  0.004985  0.005298  0.035966   
Trace Discontinuity     0.020029  0.004915  0.004902  0.039631   
Electrical Short        0.019906  0.005133  0.005353  0.036215   

                                                 Confidence Interval  
Solder Joint Issues       (0.01979283559301649, 0.02040048496520677)  
Component Misalignment   (0.02004469759714213, 0.020663664775349432)  
Trace Discontinuity     (0.019724031341496525, 0.020334310804063716)  
Electrical Short        (0.019587711981175548, 0.020225095840956723)  

Interpretation of Results

By analyzing the output, we can:

  • Identify potential manufacturing issues
  • Understand defect rate variability
  • Make data-driven quality improvement decisions

The simulation provides a realistic framework for circuit board testing analysis, demonstrating how statistical methods can be applied to manufacturing quality control.

Optimal Task Assignment and the Hungarian Algorithm

I’ll solve a Task Assignment problem using Python, focusing on optimization and visualization.

I’ll use the Hungarian algorithm (also known as the Munkres algorithm) to solve the optimal assignment problem.

import numpy as np
import scipy.optimize as optimize
import matplotlib.pyplot as plt
import seaborn as sns

# Cost matrix representing the efficiency of employees for different tasks
# Lower values indicate better performance
cost_matrix = np.array([
    [10, 15, 9, 12],   # Employee 1's performance on tasks
    [14, 8, 11, 13],   # Employee 2's performance on tasks
    [12, 16, 10, 14],  # Employee 3's performance on tasks
    [9, 13, 15, 11]    # Employee 4's performance on tasks
])

# Solving the assignment problem
row_ind, col_ind = optimize.linear_sum_assignment(cost_matrix)

# Calculating total optimization cost
total_cost = cost_matrix[row_ind, col_ind].sum()

# Detailed assignment results
assignments = [(f"Employee {r+1}", f"Task {c+1}", cost_matrix[r, c])
               for r, c in zip(row_ind, col_ind)]

# Visualization of the cost matrix and optimal assignment
plt.figure(figsize=(12, 5))

# Heatmap of the cost matrix
plt.subplot(1, 2, 1)
sns.heatmap(cost_matrix, annot=True, cmap='YlGnBu',
            xticklabels=[f'Task {i+1}' for i in range(cost_matrix.shape[1])],
            yticklabels=[f'Employee {i+1}' for i in range(cost_matrix.shape[0])])
plt.title('Task Assignment Cost Matrix')

# Bar plot of optimal assignments
plt.subplot(1, 2, 2)
assigned_costs = [cost for _, _, cost in assignments]
plt.bar([f'Employee {i+1} - Task {j+1}' for i, j in zip(row_ind, col_ind)],
        assigned_costs, color='skyblue')
plt.title('Optimal Task Assignments')
plt.xticks(rotation=45)
plt.tight_layout()

# Print results
print("Optimal Task Assignments:")
for emp, task, cost in assignments:
    print(f"{emp} assigned to {task} with cost {cost}")
print(f"\nTotal Optimization Cost: {total_cost}")

plt.show()

Let me break down the solution with mathematical notation and explanation:

Mathematical Formulation:

  • Let $C_{ij}$ represent the cost of assigning employee $i$ to task $j$, and let $x_{ij} \in \{0, 1\}$ indicate whether that assignment is made
  • Goal: Minimize $\sum_{i=1}^{n} \sum_{j=1}^{m} x_{ij} C_{ij}$
  • Subject to constraints:
    $\sum_{j=1}^{m} x_{ij} = 1$ (each employee assigned to exactly one task)
    $\sum_{i=1}^{n} x_{ij} = 1$ (each task assigned to exactly one employee)

Key Components:

  1. scipy.optimize.linear_sum_assignment(): Implements the Hungarian algorithm
  2. Uses a cost matrix representing task efficiency
  3. Finds the optimal assignment that minimizes total cost
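
If the matrix held benefits (higher is better) rather than costs, the same routine can maximize directly; a one-line sketch, assuming SciPy 1.4+ where the maximize flag is available:

row_ind, col_ind = optimize.linear_sum_assignment(cost_matrix, maximize=True)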

Visualization Insights:

  • First subplot: Heatmap of the cost matrix
    • Lighter (yellow) cells indicate lower costs (better performance); darker blue cells indicate higher costs
  • Second subplot: Bar chart of optimal assignments
    • Shows the cost of each optimal task assignment

Output:

Optimal Task Assignments:
Employee 1 assigned to Task 1 with cost 10
Employee 2 assigned to Task 2 with cost 8
Employee 3 assigned to Task 3 with cost 10
Employee 4 assigned to Task 4 with cost 11

Total Optimization Cost: 39

Output Explanation:

  • Prints the optimal assignment for each employee
  • Calculates and displays the total optimization cost

This approach demonstrates:

  • Optimization problem solving
  • Matrix manipulation
  • Visualization techniques
  • Performance analysis

The solution helps organizations efficiently allocate tasks by minimizing overall resource expenditure.

Graph Coloring: US States Border Problem

I’ll solve a map coloring problem - this time focusing on political boundaries to demonstrate the graph coloring technique.

import networkx as nx
import matplotlib.pyplot as plt

# Define state adjacency (neighboring states)
state_borders = {
    'California': ['Oregon', 'Nevada', 'Arizona'],
    'Oregon': ['California', 'Washington', 'Nevada'],
    'Washington': ['Oregon', 'Idaho'],
    'Nevada': ['California', 'Oregon', 'Arizona', 'Utah', 'Idaho'],
    'Arizona': ['California', 'Nevada', 'Utah'],
    'Utah': ['Nevada', 'Arizona', 'Colorado', 'New Mexico'],
    'Idaho': ['Washington', 'Nevada', 'Oregon', 'Utah', 'Wyoming', 'Montana'],
    'Colorado': ['Utah', 'Wyoming', 'New Mexico'],
    'Wyoming': ['Idaho', 'Utah', 'Colorado', 'Montana', 'South Dakota', 'Nebraska'],
    'Montana': ['Idaho', 'Wyoming', 'South Dakota', 'North Dakota'],
    'New Mexico': ['Utah', 'Colorado', 'Arizona'],
    'South Dakota': ['Wyoming', 'Montana', 'Nebraska'],
    'Nebraska': ['Wyoming', 'South Dakota'],
    'North Dakota': ['Montana']
}

# Create graph of state borders
def create_state_border_graph(state_borders):
    G = nx.Graph()
    for state, neighbors in state_borders.items():
        for neighbor in neighbors:
            G.add_edge(state, neighbor)
    return G

# Implement graph coloring algorithm
def color_states(G):
    # Sort nodes by degree (number of connections) in descending order
    sorted_nodes = sorted(G.nodes(), key=lambda x: G.degree(x), reverse=True)

    # Initialize color map
    color_map = {}
    colors = ['Red', 'Blue', 'Green', 'Yellow']

    for node in sorted_nodes:
        # Find the first available color
        used_colors = set(color_map.get(neighbor) for neighbor in G.neighbors(node) if neighbor in color_map)

        for color in colors:
            if color not in used_colors:
                color_map[node] = color
                break

    return color_map

# Create the border graph
border_graph = create_state_border_graph(state_borders)

# Color the states
state_colors = color_states(border_graph)

# Visualization
plt.figure(figsize=(12, 8))
pos = nx.spring_layout(border_graph, seed=42, k=0.9)
node_colors = [state_colors.get(node, 'gray') for node in border_graph.nodes()]

nx.draw(border_graph, pos, with_labels=True, node_color=node_colors,
        node_size=3000, font_size=8, font_weight='bold',
        edge_color='gray', width=1)
plt.title('US States Border Coloring Problem')
plt.tight_layout()
plt.show()

# Print the color assignments
print("State Color Assignments:")
for state, color in state_colors.items():
    print(f"{state}: {color}")

# Verify no adjacent states have the same color
def verify_coloring(G, color_map):
    for state in G.nodes():
        for neighbor in G.neighbors(state):
            if color_map[state] == color_map[neighbor]:
                print(f"Error: {state} and {neighbor} have the same color!")
                return False
    print("Coloring is valid: No adjacent states share a color.")
    return True

verify_coloring(border_graph, state_colors)

Problem Statement

We want to color a map of selected Western United States where:

  • No two adjacent states can have the same color
  • Use the minimum number of colors possible

Mathematical Representation

Let G=(V,E) be a graph where:

  • V represents states
  • E represents borders between states

The goal is to assign colors such that $\forall (u, v) \in E$: $\text{color}(u) \neq \text{color}(v)$.

Algorithm Highlights

  1. Welsh-Powell Graph Coloring Algorithm
  2. Greedy approach to color assignment
  3. Minimizes the number of colors used
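
For comparison, NetworkX ships an equivalent greedy coloring; a quick cross-check sketch against the hand-rolled color_states (it reuses the border_graph built in the script, and the built-in largest_first strategy mirrors the Welsh-Powell ordering):

nx_coloring = nx.coloring.greedy_color(border_graph, strategy='largest_first')
print(max(nx_coloring.values()) + 1, "colors used")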

Key Visualization Elements

  • Nodes represent states
  • Edges represent state borders
  • Node colors represent the assigned map colors (no two adjacent states share one)

Complexity Analysis

  • Time Complexity: $O(V^2)$
  • Space Complexity: $O(V + E)$

The code includes a verification function to ensure no adjacent states share a color, demonstrating the algorithm’s correctness.

Result

State Color Assignments:
Idaho: Red
Utah: Blue
Wyoming: Green
Nevada: Green
Oregon: Blue
Arizona: Red
Montana: Blue
California: Yellow
Colorado: Red
New Mexico: Green
South Dakota: Red
Washington: Green
Nebraska: Blue
North Dakota: Red
Coloring is valid: No adjacent states share a color.
True

Advanced Register Allocation

Intelligent Graph Coloring for Compiler Optimization

I’ll create a register allocation example that simulates a more realistic compiler scenario with a larger interference graph and additional complexity.

import networkx as nx
import matplotlib.pyplot as plt
import numpy as np

class AdvancedRegisterAllocator:
def __init__(self, interference_graph):
"""
Advanced register allocation with more sophisticated handling

Args:
interference_graph (dict): A complex dictionary of variable conflicts
"""
self.graph = nx.Graph(interference_graph)
self.num_registers = 0
self.coloring = {}
self.spill_cost = {}

def calculate_spill_cost(self):
"""
Calculate spill costs for each node based on graph characteristics

Spill cost considers:
1. Node degree (number of conflicts)
2. Local loop frequency
3. Variable live range
"""
for node in self.graph.nodes():
# Degree-based cost
degree_cost = self.graph.degree(node)

# Simulated loop frequency (hypothetical additional complexity)
loop_frequency = np.random.uniform(1, 5)

# Live range approximation (simulated)
live_range = np.random.uniform(1, 10)

# Combined spill cost
self.spill_cost[node] = (degree_cost * loop_frequency) / live_range

def advanced_color(self, num_registers):
"""
Advanced graph coloring with spill prioritization

Args:
num_registers (int): Number of available registers

Returns:
dict: Mapping of variables to register numbers
"""
self.num_registers = num_registers
self.calculate_spill_cost()

# Sort nodes by spill cost (descending) and degree (descending)
nodes = sorted(
self.graph.nodes(),
key=lambda x: (self.spill_cost[x], self.graph.degree(x)),
reverse=True
)

spilled_variables = []

for node in nodes:
# Find used colors of neighboring nodes
used_colors = {self.coloring.get(neighbor) for neighbor in self.graph.neighbors(node) if neighbor in self.coloring}

# Find the lowest available color
color_found = False
for color in range(num_registers):
if color not in used_colors:
self.coloring[node] = color
color_found = True
break

# Track spilled variables
if not color_found:
spilled_variables.append((node, self.spill_cost[node]))

return {
'allocation': self.coloring,
'spilled': spilled_variables
}

def visualize_graph(self):
"""
Advanced graph visualization with spill information
"""
plt.figure(figsize=(15, 10))
pos = nx.spring_layout(self.graph, k=0.5) # Adjust layout

# Color mapping
color_map = plt.cm.get_cmap('tab20')
node_colors = [self.coloring.get(node, -1) for node in self.graph.nodes()]

# Draw graph
nx.draw(
self.graph,
pos,
with_labels=True,
node_color=[color_map(c/self.num_registers) if c != -1 else 'red' for c in node_colors],
node_size=1000,
font_size=10,
font_weight='bold',
edge_color='gray'
)

# Spill cost annotation
for node, (x, y) in pos.items():
plt.text(
x, y+0.1,
f'Spill Cost: {self.spill_cost[node]:.2f}',
horizontalalignment='center',
fontsize=8
)

plt.title('Advanced Interference Graph with Register Allocation')
plt.tight_layout()
plt.show()

# Simulate a more complex compiler scenario
# Larger interference graph with multiple conflict regions
interference_graph = {
'temp1': ['var_a', 'var_b', 'loop_counter'],
'temp2': ['var_c', 'var_d', 'array_index'],
'var_a': ['temp1', 'var_b', 'result'],
'var_b': ['temp1', 'var_a', 'loop_counter'],
'var_c': ['temp2', 'var_d', 'array_length'],
'var_d': ['temp2', 'var_c', 'result'],
'loop_counter': ['temp1', 'var_b', 'array_index'],
'array_index': ['temp2', 'loop_counter', 'array_length'],
'result': ['var_a', 'var_d', 'array_length'],
'array_length': ['var_c', 'array_index', 'result']
}

# Create advanced register allocator
np.random.seed(42) # For reproducibility
allocator = AdvancedRegisterAllocator(interference_graph)

# Attempt register allocation with different register constraints
print("Register Allocation Simulation:\n")

# Try with 4 registers
print("Scenario 1: 4 Registers")
result_4 = allocator.advanced_color(4)
print("Register Allocation:")
for var, reg in result_4['allocation'].items():
print(f"Variable {var}: Register {reg}")

print("\nSpilled Variables:")
for var, cost in result_4['spilled']:
print(f"Variable: {var}, Spill Cost: {cost:.2f}")

# Visualize the graph
allocator.visualize_graph()

Advanced Register Allocation Problem

Key Enhancements

  1. Complex Interference Graph

    • More variables with intricate conflict relationships
    • Simulates a real compiler scenario with multiple computation regions
  2. Spill Cost Calculation
    Mathematical model for spill cost:

    $$\text{Spill Cost} = \frac{\text{Degree} \times \text{Loop Frequency}}{\text{Live Range}}$$

  3. Advanced Allocation Strategy

    • Prioritize variables based on spill cost
    • Handle variables that cannot be allocated to registers
    • Provide detailed allocation and spill information

Complexity Improvements

  • Dynamic spill cost calculation
  • Prioritization of variable allocation
  • Visualization of allocation and conflict regions

Scenario Breakdown

  1. Input: Complex interference graph with 10 variables
  2. Process:
    • Calculate individual spill costs
    • Attempt register allocation
    • Track unallocated (spilled) variables
  3. Output:
    • Register mapping
    • Spilled variable details
    • Graph visualization

Result

Register Allocation Simulation:

Scenario 1: 4 Registers
Register Allocation:
Variable loop_counter: Register 0
Variable array_length: Register 0
Variable var_a: Register 0
Variable array_index: Register 1
Variable temp2: Register 0
Variable var_c: Register 1
Variable result: Register 1
Variable temp1: Register 1
Variable var_b: Register 2
Variable var_d: Register 2
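
With four registers every variable fits, so nothing is spilled. To actually see spilling, the same graph can be re-run with a tighter budget (a sketch; a fresh allocator is created because advanced_color does not reset self.coloring between calls). Mutually conflicting triples such as temp1/var_a/var_b cannot be 2-colored, so at least one variable must spill:

np.random.seed(42)  # keep the simulated spill costs reproducible
tight_allocator = AdvancedRegisterAllocator(interference_graph)
result_2 = tight_allocator.advanced_color(2)  # only two registers available
print("Spilled:", [var for var, cost in result_2['spilled']])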

Practical Implications

  • Mimics real-world compiler register allocation challenges
  • Demonstrates handling of variable conflicts
  • Provides insights into optimization strategies

Job Shop Scheduling Optimization

Python Implementation with Visualization and Analysis

I’ll create a practical scheduling problem, solve it using Python, and visualize the results with graphs.

I’ll include mathematical formulas and provide a clear explanation of the code.

Scheduling Problem: Job Shop Scheduling

Let’s solve a job shop scheduling problem, which is a common scheduling challenge in manufacturing and operations research.

Problem Definition

We have:

  • 5 jobs to be processed
  • 4 machines
  • Each job must go through all machines in a specific order
  • Each job has different processing times on each machine
  • The goal is to minimize the total completion time (makespan)

Mathematically, we want to minimize:
$$C_{\max} = \max_{j \in \text{Jobs}} C_j$$

Where $C_j$ is the completion time of job $j$.
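
As a tiny illustration of the objective, suppose the five jobs finish at the (hypothetical) times below; the makespan is simply the latest completion time.

# Hypothetical completion times C_j, for illustration only
completion_times = [29, 40, 33, 17, 39]
makespan = max(completion_times)
print(makespan)  # 40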

Solution Using Python

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.patches as patches
import random
from collections import defaultdict

# Set seed for reproducibility
np.random.seed(42)

# Problem Setup
num_jobs = 5
num_machines = 4

# Generate processing times for each job on each machine
processing_times = np.random.randint(1, 10, size=(num_jobs, num_machines))

# Generate machine order for each job (each job must visit all machines in a specific order)
machine_orders = []
for _ in range(num_jobs):
order = list(range(num_machines))
random.shuffle(order)
machine_orders.append(order)

# Display the problem data
print("Processing Times:")
processing_df = pd.DataFrame(processing_times,
columns=[f"Machine {i}" for i in range(num_machines)],
index=[f"Job {i}" for i in range(num_jobs)])
print(processing_df)

print("\nMachine Orders:")
for i, order in enumerate(machine_orders):
print(f"Job {i}: {' -> '.join(['Machine ' + str(m) for m in order])}")

# Priority Rule-Based Scheduling: Shortest Processing Time (SPT)
def apply_spt_scheduling():
# Initialize machine and job status
machine_available_time = [0] * num_machines
job_completion = [[0] * num_machines for _ in range(num_jobs)]
job_current_step = [0] * num_jobs
job_completion_time = [0] * num_jobs

# Schedule tracking
schedule = defaultdict(list) # machine -> [(job, start_time, end_time), ...]

# Continue until all jobs are completed
while min(job_current_step) < num_machines:
# Find eligible operations (jobs that haven't completed all steps)
eligible_ops = [(j, job_current_step[j]) for j in range(num_jobs) if job_current_step[j] < num_machines]

if not eligible_ops:
break

# For each eligible operation, calculate when it can start
op_details = []
for job, step in eligible_ops:
machine = machine_orders[job][step]
prev_step_completion = 0 if step == 0 else job_completion[job][step-1]
possible_start_time = max(machine_available_time[machine], prev_step_completion)
processing_time = processing_times[job][machine]
op_details.append((job, machine, step, possible_start_time, processing_time))

# Sort by possible start time + processing time (SPT)
op_details.sort(key=lambda x: x[3] + x[4])

# Schedule the first operation
job, machine, step, start_time, process_time = op_details[0]
end_time = start_time + process_time

# Update tracking information
machine_available_time[machine] = end_time
job_completion[job][step] = end_time
job_current_step[job] += 1

if job_current_step[job] == num_machines:
job_completion_time[job] = end_time

# Record the scheduled operation
schedule[machine].append((job, start_time, end_time))

makespan = max(job_completion_time)
return schedule, makespan, job_completion_time

# Apply the SPT scheduling algorithm
schedule, makespan, job_completion_times = apply_spt_scheduling()

print("\nSchedule Results:")
print(f"Makespan: {makespan}")
print("Job Completion Times:", job_completion_times)

# Visualization
def visualize_schedule(schedule, makespan):
fig, ax = plt.subplots(figsize=(12, 6))

# Define colors for jobs
colors = ['#ff9999', '#66b3ff', '#99ff99', '#ffcc99', '#c2c2f0']

# Plot each machine's schedule
for machine_id in range(num_machines):
y_pos = machine_id
for job, start, end in schedule[machine_id]:
width = end - start
rect = patches.Rectangle((start, y_pos-0.4), width, 0.8,
edgecolor='black', facecolor=colors[job], alpha=0.7)
ax.add_patch(rect)

# Add job label in the center of the rectangle
ax.text(start + width/2, y_pos, f'J{job}',
ha='center', va='center', fontweight='bold')

# Set plot limits and labels
ax.set_xlim(0, makespan + 1)
ax.set_ylim(-0.5, num_machines - 0.5)
ax.set_yticks(range(num_machines))
ax.set_yticklabels([f'Machine {i}' for i in range(num_machines)])
ax.set_xlabel('Time')
ax.set_title('Job Shop Schedule')

# Add gridlines
ax.grid(True, axis='x', linestyle='--', alpha=0.7)

plt.tight_layout()
return fig

# Create the visualization
schedule_fig = visualize_schedule(schedule, makespan)

# Gantt chart for job completion
def visualize_job_completion(schedule, job_completion_times):
# Reconstruct job timelines
job_timelines = defaultdict(list)

for machine, operations in schedule.items():
for job, start, end in operations:
job_timelines[job].append((machine, start, end))

# Sort each job's operations by start time
for job in job_timelines:
job_timelines[job].sort(key=lambda x: x[1])

# Create the visualization
fig, ax = plt.subplots(figsize=(12, 6))

# Define colors for machines
machine_colors = ['#ff9999', '#66b3ff', '#99ff99', '#ffcc99']

# Plot each job's timeline
for job_id in range(num_jobs):
y_pos = job_id
for machine, start, end in job_timelines[job_id]:
width = end - start
rect = patches.Rectangle((start, y_pos-0.4), width, 0.8,
edgecolor='black', facecolor=machine_colors[machine], alpha=0.7)
ax.add_patch(rect)

# Add machine label in the center of the rectangle
ax.text(start + width/2, y_pos, f'M{machine}',
ha='center', va='center', fontweight='bold')

# Set plot limits and labels
ax.set_xlim(0, max(job_completion_times) + 1)
ax.set_ylim(-0.5, num_jobs - 0.5)
ax.set_yticks(range(num_jobs))
ax.set_yticklabels([f'Job {i}' for i in range(num_jobs)])
ax.set_xlabel('Time')
ax.set_title('Job Completion Timeline')

# Add gridlines
ax.grid(True, axis='x', linestyle='--', alpha=0.7)

plt.tight_layout()
return fig

# Create the job completion visualization
job_completion_fig = visualize_job_completion(schedule, job_completion_times)

# Compare with another scheduling rule: Longest Processing Time (LPT)
def apply_lpt_scheduling():
# Similar implementation to SPT but sorting by longest processing time
machine_available_time = [0] * num_machines
job_completion = [[0] * num_machines for _ in range(num_jobs)]
job_current_step = [0] * num_jobs
job_completion_time = [0] * num_jobs

schedule = defaultdict(list)

while min(job_current_step) < num_machines:
eligible_ops = [(j, job_current_step[j]) for j in range(num_jobs) if job_current_step[j] < num_machines]

if not eligible_ops:
break

op_details = []
for job, step in eligible_ops:
machine = machine_orders[job][step]
prev_step_completion = 0 if step == 0 else job_completion[job][step-1]
possible_start_time = max(machine_available_time[machine], prev_step_completion)
processing_time = processing_times[job][machine]
op_details.append((job, machine, step, possible_start_time, processing_time))

# Sort by negative processing time (LPT)
op_details.sort(key=lambda x: -(x[4]))

job, machine, step, start_time, process_time = op_details[0]
end_time = start_time + process_time

machine_available_time[machine] = end_time
job_completion[job][step] = end_time
job_current_step[job] += 1

if job_current_step[job] == num_machines:
job_completion_time[job] = end_time

schedule[machine].append((job, start_time, end_time))

makespan = max(job_completion_time)
return schedule, makespan, job_completion_time

# Apply LPT scheduling
lpt_schedule, lpt_makespan, lpt_job_completion_times = apply_lpt_scheduling()

# Compare results
print("\nComparison of Scheduling Rules:")
print(f"SPT Makespan: {makespan}")
print(f"LPT Makespan: {lpt_makespan}")

# Plot comparison
def plot_comparison(spt_makespan, lpt_makespan):
labels = ['SPT', 'LPT']
values = [spt_makespan, lpt_makespan]

fig, ax = plt.subplots(figsize=(8, 5))
bars = ax.bar(labels, values, color=['#66b3ff', '#ff9999'])

ax.set_ylabel('Makespan')
ax.set_title('Comparison of Scheduling Rules')

# Add value labels on top of bars
for bar in bars:
height = bar.get_height()
ax.text(bar.get_x() + bar.get_width()/2., height + 0.1,
f'{height}', ha='center', va='bottom')

ax.grid(axis='y', linestyle='--', alpha=0.7)

plt.tight_layout()
return fig

comparison_fig = plot_comparison(makespan, lpt_makespan)

plt.show()

Explanation of the Solution

Problem Setup

I’ve created a job shop scheduling problem with 5 jobs and 4 machines. Each job has:

  1. A specific processing time for each machine
  2. A required sequence through the machines

The mathematical objective is to minimize the makespan ($C_{\max}$), which is the time when all jobs are completed:

$$C_{\max} = \max_{j \in \text{Jobs}} C_j$$

Algorithm Implementation

The solution uses two common scheduling heuristics:

  1. Shortest Processing Time (SPT): Prioritizes operations with shorter processing times
  2. Longest Processing Time (LPT): Prioritizes operations with longer processing times

For each scheduling rule, the algorithm:

  1. Tracks machine availability times
  2. Tracks job completion status
  3. Identifies eligible operations (next steps for jobs)
  4. Selects operations based on the priority rule
  5. Updates the schedule until all jobs are complete
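
Both rules share the same dispatching loop; they differ only in how the candidate operations are ranked. Here is a minimal sketch with made-up (job, machine, step, possible_start, processing_time) tuples, purely to illustrate the two ranking keys:

# Made-up candidate operations: (job, machine, step, possible_start, processing_time)
op_details = [(0, 1, 0, 0, 7), (1, 2, 0, 3, 2), (2, 0, 0, 1, 5)]

# SPT-style rule used above: pick the operation that would finish earliest
spt_choice = min(op_details, key=lambda op: op[3] + op[4])

# LPT rule: pick the operation with the longest processing time
lpt_choice = max(op_details, key=lambda op: op[4])

print("SPT picks job", spt_choice[0], "| LPT picks job", lpt_choice[0])  # SPT -> job 1, LPT -> job 0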

Visualization

The code creates three visualizations:

  1. Machine-oriented Gantt chart: Shows when each machine is processing which job
    • X-axis: Time
    • Y-axis: Machines
    • Colors: Different jobs

  2. Job-oriented Gantt chart: Shows the progress of each job through the machines
    • X-axis: Time
    • Y-axis: Jobs
    • Colors: Different machines

  3. Comparison chart: Compares the makespan results of SPT vs LPT scheduling rules

Results Analysis

[Output]

Processing Times:
       Machine 0  Machine 1  Machine 2  Machine 3
Job 0          7          4          8          5
Job 1          7          3          7          8
Job 2          5          4          8          8
Job 3          3          6          5          2
Job 4          8          6          2          5

Machine Orders:
Job 0: Machine 0 -> Machine 3 -> Machine 2 -> Machine 1
Job 1: Machine 0 -> Machine 1 -> Machine 3 -> Machine 2
Job 2: Machine 2 -> Machine 0 -> Machine 3 -> Machine 1
Job 3: Machine 0 -> Machine 3 -> Machine 1 -> Machine 2
Job 4: Machine 3 -> Machine 2 -> Machine 0 -> Machine 1

Schedule Results:
Makespan: 40
Job Completion Times: [np.int64(29), np.int64(40), np.int64(33), np.int64(17), np.int64(39)]

Comparison of Scheduling Rules:
SPT Makespan: 40

When you run this code, you’ll see:

  • The problem data (processing times and machine orders)
  • The resulting schedule and makespan for each scheduling rule
  • Visualizations that make it easy to understand the schedules

The SPT rule often yields better overall throughput, but the LPT rule can perform better when processing times differ widely between operations.

The comparison chart clearly shows which rule performs better for the specific problem instance.

Key Insights

This implementation demonstrates several important scheduling concepts:

  • Priority rule-based scheduling heuristics
  • Resource constraints (machines can only process one job at a time)
  • Precedence constraints (operations within a job must follow a specific order)
  • Visualization techniques for schedule analysis

This type of scheduling problem appears in manufacturing, project management, computing resource allocation, and many other practical domains.

Graph Coloring Optimization

ILP vs. Heuristic Approaches

I’ll create a different discrete optimization example.
Let’s explore the Graph Coloring Problem, which has applications in scheduling, register allocation, and resource assignment.

Graph Coloring Problem

The Graph Coloring Problem involves assigning colors to vertices of a graph such that no adjacent vertices share the same color, while minimizing the total number of colors used.

Mathematical Formulation

Let’s define:

  • $G=(V,E)$: An undirected graph with vertices $V$ and edges $E$
  • $C$: Set of possible colors
  • $x_{iv}$: Binary variable (1 if vertex $i$ is assigned color $v$, 0 otherwise)
  • $y_v$: Binary variable (1 if color $v$ is used, 0 otherwise)

The objective function is:

$$\min \sum_{v \in C} y_v$$

Subject to:

  • Each vertex must be assigned exactly one color: $\sum_{v \in C} x_{iv} = 1 \quad \forall i \in V$
  • Adjacent vertices must have different colors: $x_{iv} + x_{jv} \le 1 \quad \forall (i,j) \in E,\ \forall v \in C$
  • If color $v$ is assigned to any vertex, then $y_v = 1$: $x_{iv} \le y_v \quad \forall i \in V,\ \forall v \in C$
  • All variables are binary: $x_{iv}, y_v \in \{0, 1\}$
import numpy as np
import matplotlib.pyplot as plt
import networkx as nx
import pulp as pl
from itertools import combinations
import matplotlib.cm as cm
import matplotlib.colors as mcolors

# Set random seed for reproducibility
np.random.seed(42)

# Parameters
num_vertices = 20
edge_probability = 0.2 # Probability of an edge between any two vertices

# Generate a random graph
G = nx.gnp_random_graph(num_vertices, edge_probability)

# We can also add some additional edges to make it more challenging
# Add edges between some specific nodes to create more constraints
additional_edges = [(0, 5), (1, 6), (2, 7), (3, 8), (4, 9)]
for u, v in additional_edges:
if u < num_vertices and v < num_vertices and not G.has_edge(u, v):
G.add_edge(u, v)

# Print basic graph statistics
print(f"Graph with {G.number_of_nodes()} vertices and {G.number_of_edges()} edges")
print(f"Average degree: {2 * G.number_of_edges() / G.number_of_nodes():.2f}")

# Upper bound on the number of colors needed (worst case)
max_colors = min(num_vertices, max(dict(G.degree()).values()) + 1)
print(f"Maximum number of colors needed (worst case): {max_colors}")

# Create the optimization model
model = pl.LpProblem("Graph_Coloring", pl.LpMinimize)

# Set of colors (using indices)
colors = range(max_colors)

# Decision variables
# x[i, v] = 1 if vertex i is assigned color v, 0 otherwise
x = {(i, v): pl.LpVariable(f"x_{i}_{v}", cat='Binary')
for i in G.nodes() for v in colors}

# y[v] = 1 if color v is used, 0 otherwise
y = {v: pl.LpVariable(f"y_{v}", cat='Binary') for v in colors}

# Objective function: minimize the number of colors used
model += pl.lpSum(y[v] for v in colors)

# Constraint: each vertex must be assigned exactly one color
for i in G.nodes():
model += pl.lpSum(x[(i, v)] for v in colors) == 1

# Constraint: adjacent vertices must have different colors
for (i, j) in G.edges():
for v in colors:
model += x[(i, v)] + x[(j, v)] <= 1

# Constraint: if a vertex is assigned a color, that color is used
for i in G.nodes():
for v in colors:
model += x[(i, v)] <= y[v]

# Constraint: ensure colors are used in order (symmetry breaking)
for v in range(1, max_colors):
model += y[v] <= y[v-1]

# Solve the model
model.solve(pl.PULP_CBC_CMD(msg=False))

# Extract results
vertex_colors = {}
colors_used = set()

for i in G.nodes():
for v in colors:
if pl.value(x[(i, v)]) > 0.5:
vertex_colors[i] = v
colors_used.add(v)

num_colors_used = len(colors_used)

print(f"Optimal solution status: {pl.LpStatus[model.status]}")
print(f"Minimum number of colors needed: {num_colors_used}")
print("Vertex coloring:")
for i, color in sorted(vertex_colors.items()):
print(f"Vertex {i}: Color {color}")

# Compare with greedy coloring
greedy_coloring = nx.greedy_color(G)
num_greedy_colors = max(greedy_coloring.values()) + 1
print(f"Number of colors with greedy algorithm: {num_greedy_colors}")

# Heuristic: DSatur algorithm
dsatur_coloring = nx.coloring.greedy_color(G, strategy="saturation_largest_first")
num_dsatur_colors = max(dsatur_coloring.values()) + 1
print(f"Number of colors with DSatur algorithm: {num_dsatur_colors}")

# Visualization of the colored graph
plt.figure(figsize=(12, 10))

# Choose a color map for visualization
color_names = list(mcolors.TABLEAU_COLORS.keys())
selected_colors = color_names[:max(num_colors_used, num_greedy_colors, num_dsatur_colors)]

# Create position layout for nodes
pos = nx.spring_layout(G, seed=42)

# Create a subplot for the optimal solution
plt.subplot(2, 2, 1)
node_colors_optimal = [vertex_colors[i] for i in G.nodes()]
color_map_optimal = [selected_colors[c] for c in node_colors_optimal]

nx.draw(G, pos, with_labels=True, node_color=color_map_optimal,
node_size=500, font_color="white", font_weight="bold",
edge_color="gray", width=2, alpha=0.9)

plt.title(f"Optimal Coloring: {num_colors_used} Colors")

# Create a subplot for the greedy solution
plt.subplot(2, 2, 2)
node_colors_greedy = [greedy_coloring[i] for i in G.nodes()]
color_map_greedy = [selected_colors[c] for c in node_colors_greedy]

nx.draw(G, pos, with_labels=True, node_color=color_map_greedy,
node_size=500, font_color="white", font_weight="bold",
edge_color="gray", width=2, alpha=0.9)

plt.title(f"Greedy Coloring: {num_greedy_colors} Colors")

# Create a subplot for the DSatur solution
plt.subplot(2, 2, 3)
node_colors_dsatur = [dsatur_coloring[i] for i in G.nodes()]
color_map_dsatur = [selected_colors[c] for c in node_colors_dsatur]

nx.draw(G, pos, with_labels=True, node_color=color_map_dsatur,
node_size=500, font_color="white", font_weight="bold",
edge_color="gray", width=2, alpha=0.9)

plt.title(f"DSatur Coloring: {num_dsatur_colors} Colors")

# Generate a legend for the colors
plt.subplot(2, 2, 4)
plt.axis('off')
for i in range(max(num_colors_used, num_greedy_colors, num_dsatur_colors)):
plt.plot([0], [0], 'o', color=selected_colors[i], label=f'Color {i}')
plt.legend(loc='center', fontsize=12)
plt.title("Color Legend")

plt.tight_layout()

# Algorithm performance comparison
plt.figure(figsize=(10, 6))
methods = ['Optimal', 'Greedy', 'DSatur']
colors_count = [num_colors_used, num_greedy_colors, num_dsatur_colors]

bars = plt.bar(methods, colors_count, color=['green', 'blue', 'orange'])
plt.ylabel('Number of Colors Used')
plt.title('Comparison of Graph Coloring Algorithms')
plt.grid(axis='y', linestyle='--', alpha=0.7)

# Add value labels on top of bars
for bar in bars:
height = bar.get_height()
plt.text(bar.get_x() + bar.get_width()/2., height + 0.05,
f'{height}', ha='center', va='bottom')

# Graph statistics visualization
plt.figure(figsize=(12, 6))

# Calculate degree distribution
degree_sequence = sorted([d for n, d in G.degree()], reverse=True)
degree_count = {}
for d in degree_sequence:
if d in degree_count:
degree_count[d] += 1
else:
degree_count[d] = 1

plt.subplot(1, 2, 1)
plt.bar(degree_count.keys(), degree_count.values(), color='skyblue')
plt.title('Degree Distribution')
plt.xlabel('Degree')
plt.ylabel('Count')
plt.grid(axis='y', linestyle='--', alpha=0.7)

# Create conflict matrix visualization
plt.subplot(1, 2, 2)
adj_matrix = nx.to_numpy_array(G)
plt.imshow(adj_matrix, cmap='Blues', interpolation='none')
plt.colorbar(label='Connection')
plt.title('Adjacency Matrix')
plt.xlabel('Vertex')
plt.ylabel('Vertex')

plt.tight_layout()

# Chromatic number analysis based on graph density
plt.figure(figsize=(10, 6))

# Calculate graph density
density = nx.density(G)
print(f"Graph density: {density:.3f}")

# Theoretical bounds
vertices = G.number_of_nodes()
edges = G.number_of_edges()
max_degree = max(dict(G.degree()).values())
brooks_bound = max(2, min(vertices, max_degree))

# Generate points for the theoretical bounds
densities = np.linspace(0, 1, 100)
clique_bounds = [max(2, int(vertices * d) + 1) for d in densities] # Rough estimate based on clique number

plt.plot(densities, clique_bounds, 'r--', label='Theoretical upper bound')
plt.scatter([density], [num_colors_used], color='green', s=100, label='Optimal solution')
plt.scatter([density], [num_greedy_colors], color='blue', s=100, label='Greedy algorithm')
plt.scatter([density], [num_dsatur_colors], color='orange', s=100, label='DSatur algorithm')
plt.scatter([density], [brooks_bound], color='purple', s=100, label="Brooks' bound")

plt.axvline(x=density, color='gray', linestyle='--', alpha=0.5)
plt.text(density+0.02, 2, f'Current density: {density:.3f}', rotation=90, alpha=0.7)

plt.xlabel('Graph Density')
plt.ylabel('Chromatic Number (Colors Required)')
plt.title('Relationship Between Graph Density and Chromatic Number')
plt.legend()
plt.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

Code Explanation

Let me walk through the key components of this solution:

  1. Problem Setup:

    • We generate a random graph with a specified number of vertices and edge probability
    • The goal is to color the vertices such that no adjacent vertices share the same color, while minimizing the total number of colors used
  2. Mathematical Model:

    • We use PuLP to formulate a binary integer programming model
    • Decision variables represent the assignment of colors to vertices and whether each color is used
    • The objective function minimizes the total number of colors used
    • Constraints ensure:
      • Each vertex gets exactly one color
      • Adjacent vertices have different colors
      • Colors are used in sequence (to break symmetry)
  3. Solution Approaches:

    • Optimal method: Integer Linear Programming using PuLP
    • Greedy method: A simple sequential coloring heuristic
    • DSatur algorithm: A more sophisticated heuristic that prioritizes vertices with the highest saturation degree (a minimal sketch follows this list)
  4. Visualizations:

    • Colored graph visualizations showing the optimal and heuristic solutions
    • A comparison chart of the number of colors used by different algorithms
    • Degree distribution and adjacency matrix to understand the graph structure
    • An analysis of the relationship between graph density and chromatic number
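
The DSatur idea can be sketched in a few lines. This is a minimal, unoptimized illustration of the saturation-based selection rule; the main code relies on NetworkX's built-in nx.coloring.greedy_color(G, strategy="saturation_largest_first") instead.

import networkx as nx

def dsatur_sketch(G):
    """Toy DSatur: repeatedly color the uncolored vertex whose neighbors already
    use the most distinct colors (highest saturation), breaking ties by degree."""
    coloring = {}
    while len(coloring) < G.number_of_nodes():
        def saturation(v):
            # number of distinct colors among already-colored neighbors
            return len({coloring[u] for u in G[v] if u in coloring})
        uncolored = [v for v in G.nodes() if v not in coloring]
        v = max(uncolored, key=lambda v: (saturation(v), G.degree(v)))
        neighbor_colors = {coloring[u] for u in G[v] if u in coloring}
        color = 0
        while color in neighbor_colors:   # smallest color not used by a neighbor
            color += 1
        coloring[v] = color
    return coloring

# Sanity check on a small odd cycle, which needs exactly 3 colors
demo = nx.cycle_graph(5)
print(max(dsatur_sketch(demo).values()) + 1)  # 3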

Results

Graph with 20 vertices and 42 edges
Average degree: 4.20
Maximum number of colors needed (worst case): 9
Optimal solution status: Optimal
Minimum number of colors needed: 4
Vertex coloring:
Vertex 0: Color 0
Vertex 1: Color 2
Vertex 2: Color 1
Vertex 3: Color 2
Vertex 4: Color 1
Vertex 5: Color 1
Vertex 6: Color 3
Vertex 7: Color 0
Vertex 8: Color 0
Vertex 9: Color 3
Vertex 10: Color 2
Vertex 11: Color 1
Vertex 12: Color 0
Vertex 13: Color 2
Vertex 14: Color 0
Vertex 15: Color 3
Vertex 16: Color 1
Vertex 17: Color 1
Vertex 18: Color 2
Vertex 19: Color 1
Number of colors with greedy algorithm: 4
Number of colors with DSatur algorithm: 4
Graph density: 0.221




Results Interpretation

The solution provides several key insights:

  1. Optimal Coloring: The integer programming approach finds the minimum number of colors needed (chromatic number of the graph).

  2. Heuristic Performance: We can see how well the greedy and DSatur heuristics perform compared to the optimal solution.
    For many graphs, DSatur performs very well with much less computational effort.

  3. Graph Structure Impact: The degree distribution, density, and connectivity patterns of the graph directly affect how many colors are needed.

  4. Theoretical Bounds: We compare the results with Brooks’ bound (the chromatic number is at most the maximum degree for connected graphs that are neither complete nor odd cycles; the simple greedy bound is maximum degree + 1) and density-based estimates.
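
As a quick check on item 4, these bounds can be computed directly with NetworkX. A small sketch, assuming the graph G from the earlier code cell is in scope:

import networkx as nx

# Rough bounds on the chromatic number (assumes G from the code above)
max_degree = max(d for _, d in G.degree())
greedy_upper_bound = max_degree + 1                           # chi(G) <= max degree + 1
clique_lower_bound = max(len(c) for c in nx.find_cliques(G))  # chi(G) >= size of any clique

print(f"Lower bound from the largest maximal clique: {clique_lower_bound}")
print(f"Greedy upper bound (max degree + 1): {greedy_upper_bound}")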

Real-World Applications

The Graph Coloring Problem has numerous practical applications:

  1. Scheduling: Assigning time slots to events, classes, or exams to avoid conflicts.

  2. Frequency Assignment: Allocating radio frequencies to transmitters to avoid interference.

  3. Register Allocation: Assigning variables to CPU registers in compiler optimization.

  4. Map Coloring: Coloring geographical maps such that no adjacent regions share the same color.

  5. Task Assignment: Allocating incompatible tasks to different time slots or resources.

  6. Circuit Board Testing: Minimizing the number of tests needed for a printed circuit board.

  7. Sports Scheduling: Creating tournament schedules where teams must play against each other.

This problem demonstrates how discrete optimization can help solve complex allocation problems where resources must be assigned while respecting conflict constraints.

The comparisons between exact and heuristic methods illustrate the trade-offs between solution quality and computational efficiency that are typical in many real-world optimization scenarios.

Quantum Harmonic Oscillator

Numerical Simulation and Visualization with Python

I’ll solve a practical quantum mechanics problem using Python in Google Colab, including clear explanations and visualization.

import numpy as np
import matplotlib.pyplot as plt
from scipy import sparse
import scipy.sparse.linalg  # make sparse.linalg available for eigs below
from scipy.linalg import expm
from scipy.special import hermite, factorial

# Set plotting parameters for better visualization
plt.rcParams['figure.figsize'] = [12, 8]
plt.rcParams['font.size'] = 14

# Physical constants
hbar = 1.0 # Reduced Planck constant (using natural units)
m = 1.0 # Mass of the particle
omega = 1.0 # Angular frequency of the oscillator

# Simulation parameters
n_points = 1000 # Number of position points for discretization
x_range = 10.0 # Position range for simulation
dx = 2.0 * x_range / n_points # Position step size
x = np.linspace(-x_range, x_range, n_points) # Position array

# Create potential energy function (harmonic oscillator: V(x) = 0.5 * m * omega^2 * x^2)
V = 0.5 * m * omega**2 * x**2

# Create the finite-difference operator for -d^2/dx^2 (kinetic energy part)
diag = np.ones(n_points) * (2.0 / dx**2)
off_diag = np.ones(n_points-1) * (-1.0 / dx**2)
D2 = sparse.diags([off_diag, diag, off_diag], [-1, 0, 1])

# Create Hamiltonian operator: H = -hbar^2/(2m) * d^2/dx^2 + V(x)
# D2 already represents -d^2/dx^2, so the kinetic term enters with a plus sign
H = sparse.diags(V) + (hbar**2 / (2.0 * m)) * D2

# Find eigenvalues and eigenvectors of the Hamiltonian
eigenvalues, eigenvectors = sparse.linalg.eigs(H, k=10, which='SM') # Find lowest 10 energy states
idx = eigenvalues.argsort() # Sort by energy
eigenvalues = eigenvalues[idx].real
eigenvectors = eigenvectors[:, idx]

# Normalize eigenvectors
for i in range(len(eigenvalues)):
eigenvectors[:, i] = eigenvectors[:, i] / np.sqrt(np.sum(np.abs(eigenvectors[:, i])**2) * dx)

# Analytical solutions for comparison
def analytical_wavefunction(n, x):
"""
Calculate the analytical wavefunction for quantum harmonic oscillator
for quantum number n at position x
"""
# Constants for the harmonic oscillator
alpha = m * omega / hbar

# Normalization constant
normalization = (alpha / np.pi)**0.25 * (1.0 / np.sqrt(2**n * factorial(n)))

# Hermite polynomial
herm = hermite(n)(np.sqrt(alpha) * x)

# Full wavefunction
psi = normalization * herm * np.exp(-alpha * x**2 / 2.0)

return psi

# Calculate analytical energy levels
def analytical_energy(n):
"""Calculate analytical energy for quantum harmonic oscillator with quantum number n"""
return hbar * omega * (n + 0.5)

# Compute analytical solutions
analytical_energies = [analytical_energy(n) for n in range(10)]
analytical_wavefunctions = [analytical_wavefunction(n, x) for n in range(10)]

# Plot and compare numerical and analytical solutions
fig, axs = plt.subplots(5, 2, figsize=(15, 20))
axs = axs.flatten()

for i in range(10):
axs[i].plot(x, eigenvectors[:, i].real, 'b-', label=f'Numerical ψ{i}')
axs[i].plot(x, analytical_wavefunctions[i], 'r--', label=f'Analytical ψ{i}')
axs[i].axhline(y=0, color='k', linestyle='-', alpha=0.3)
axs[i].set_title(f'Energy Level {i}: E_numerical = {eigenvalues[i]:.5f}, E_analytical = {analytical_energies[i]:.5f}')
axs[i].set_xlabel('Position (x)')
axs[i].set_ylabel('Wave Function (ψ)')
axs[i].legend()
axs[i].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Time evolution of a wavepacket
# Create a Gaussian wavepacket as initial state (superposition of ground and first excited states)
psi_0 = 0.7 * eigenvectors[:, 0] + 0.7 * eigenvectors[:, 1]
psi_0 = psi_0 / np.sqrt(np.sum(np.abs(psi_0)**2) * dx) # Normalize

# Time parameters
t_total = 10.0 # Total simulation time
n_steps = 100 # Number of time steps
dt = t_total / n_steps # Time step

# Store wavefunction at different times
psi_t = np.zeros((n_steps, n_points), dtype=complex)
psi_t[0] = psi_0

# Time evolution operator
def time_evolution(psi, H, dt):
"""Time evolution using the operator exp(-i*H*dt/hbar)"""
# We use the fact that eigenvectors of H provide a basis for expansion
coeffs = np.zeros(10, dtype=complex)
for i in range(10):
coeffs[i] = np.sum(np.conj(eigenvectors[:, i]) * psi) * dx

# Evolve each component and recombine
psi_evolved = np.zeros_like(psi, dtype=complex)
for i in range(10):
psi_evolved += coeffs[i] * np.exp(-1j * eigenvalues[i] * dt / hbar) * eigenvectors[:, i]

return psi_evolved

# Perform time evolution
for t in range(1, n_steps):
psi_t[t] = time_evolution(psi_t[t-1], H, dt)

# Probability densities for animation
prob_densities = np.abs(psi_t)**2

# Plot time evolution
fig, ax = plt.subplots(figsize=(12, 8))

# Plot potential energy curve (scaled for visibility)
V_scaled = V / np.max(V) * np.max(prob_densities) * 0.8
ax.plot(x, V_scaled, 'k-', alpha=0.3, label='Potential (scaled)')

# Select specific timesteps to plot
time_points = [0, 25, 50, 75, 99]
colors = ['b', 'g', 'r', 'c', 'm']

for i, t in enumerate(time_points):
ax.plot(x, prob_densities[t], colors[i], label=f't = {t*dt:.2f}')

ax.set_xlabel('Position (x)')
ax.set_ylabel('Probability Density |ψ(x,t)|²')
ax.set_title('Time Evolution of Quantum Harmonic Oscillator Wavepacket')
ax.legend()
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Create an energy level diagram
n_levels = 10
fig, ax = plt.subplots(figsize=(8, 10))

# Plot energy levels
for n in range(n_levels):
E = analytical_energy(n)
ax.plot([-0.5, 0.5], [E, E], 'b-', linewidth=2)
ax.text(0.6, E, f'n = {n}, E = {E:.2f}ħω', va='center')

# Add potential well visualization
x_pot = np.linspace(-0.4, 0.4, 100)
V_pot = 0.5 * m * omega**2 * x_pot**2
ax.plot(x_pot, V_pot, 'r--', label='V(x) = ½mω²x²')

ax.set_xlim(-1, 1.5)
ax.set_ylim(-0.5, analytical_energy(n_levels-1) + 1)
ax.set_ylabel('Energy (in units of ħω)')
ax.set_title('Energy Levels of Quantum Harmonic Oscillator')
ax.set_xticks([])
ax.legend(loc='upper right')
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

Quantum Harmonic Oscillator Simulation

In this example, I’ll solve the quantum harmonic oscillator problem, which is one of the most important systems in quantum mechanics.

It models a particle moving in a parabolic potential well, like a mass on a spring at the quantum scale.

Theoretical Background

The quantum harmonic oscillator is described by the Hamiltonian:

$$\hat{H} = -\frac{\hbar^2}{2m}\frac{d^2}{dx^2} + \frac{1}{2} m \omega^2 x^2$$

Where:

  • $\hbar$ is the reduced Planck constant
  • m is the mass of the particle
  • ω is the angular frequency of the oscillator
  • x is the position

The energy eigenvalues are given by:

$$E_n = \hbar\omega\left(n + \frac{1}{2}\right)$$

And the wavefunctions are:

$$\psi_n(x) = \left(\frac{\alpha}{\pi}\right)^{1/4} \frac{1}{\sqrt{2^n\, n!}}\, H_n\!\left(\sqrt{\alpha}\, x\right) e^{-\alpha x^2/2}$$

Where:

  • $\alpha = \frac{m\omega}{\hbar}$
  • $H_n$ are the Hermite polynomials
  • $n$ is the quantum number $(0, 1, 2, \dots)$

Code Explanation

The code solves this problem using two methods:

  1. Numerical solution: Using a discretized position grid and sparse matrix techniques
  2. Analytical solution: Calculating the exact expressions for comparison

Key Components:

  1. Set up the position grid and potential:

    • Create a discretized position space
    • Define the harmonic potential $V(x) = \frac{1}{2} m \omega^2 x^2$
  2. Create the Hamiltonian matrix:

    • The kinetic energy term uses a finite difference approximation for the second derivative
    • The potential energy is added as a diagonal matrix
  3. Find eigenvalues and eigenvectors:

    • Compute the lowest 10 energy states
    • Sort them by energy
  4. Compare with analytical solutions:

    • Calculate the exact energy levels $E_n = \hbar\omega\left(n + \frac{1}{2}\right)$
    • Calculate the exact wavefunctions using Hermite polynomials
  5. Simulate time evolution:

    • Create a wavepacket as a superposition of ground and first excited states
    • Evolve it through time and visualize the probability density
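
For item 2, the kinetic term uses the standard three-point finite-difference stencil; written out, the discretized Hamiltonian acting on the grid values $\psi_i$ is:

$$\left.\frac{d^2\psi}{dx^2}\right|_{x_i} \approx \frac{\psi_{i+1} - 2\psi_i + \psi_{i-1}}{\Delta x^2}, \qquad (\hat{H}\psi)_i \approx -\frac{\hbar^2}{2m}\,\frac{\psi_{i+1} - 2\psi_i + \psi_{i-1}}{\Delta x^2} + V(x_i)\,\psi_i$$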

Results and Visualization

The code produces three main visualizations:

  1. Wavefunction Comparison:
    The first plot shows the first 10 energy eigenstates, comparing the numerical solutions (blue solid lines) with the analytical solutions (red dashed lines).
    The close match verifies our numerical method is working correctly.

  2. Time Evolution:
    The second plot shows the time evolution of a wavepacket (initially a superposition of ground and first excited states).
    The probability density is shown at five different times, demonstrating the oscillatory behavior of the wavepacket.

  3. Energy Level Diagram:
    The final plot shows the quantized energy levels of the harmonic oscillator.
    Notice how they are equally spaced by $\hbar\omega$, which is a key characteristic of the quantum harmonic oscillator.

Physical Interpretation

  1. Quantization of Energy: Unlike a classical oscillator that can have any energy, the quantum harmonic oscillator can only have discrete energy values.

  2. Zero-Point Energy: Even in the ground state ($n=0$), the energy is $E_0 = \frac{1}{2}\hbar\omega$, not zero. This represents the quantum mechanical uncertainty principle at work.

  3. Probability Distribution: The wavefunction squared gives the probability density of finding the particle at a particular position.
    For the ground state, this is concentrated around the center, but for higher energy states, there are nodes where the probability is zero.

  4. Time Evolution: The time-dependent wavepacket oscillates back and forth, similar to a classical oscillator, but maintains its quantum nature with the probability spread according to the uncertainty principle.
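
As a quick numerical check of points 2 and 3, the analytical ground state can be evaluated on a grid: its probability density integrates to 1, and its energy is the zero-point value. A standalone sketch in the same natural units ($\hbar = m = \omega = 1$) as the main code:

import numpy as np
from scipy.special import hermite, factorial

hbar = m = omega = 1.0                 # natural units, as in the main code
alpha = m * omega / hbar
x = np.linspace(-10, 10, 1000)
dx = x[1] - x[0]

n = 0                                  # ground state
norm = (alpha / np.pi) ** 0.25 / np.sqrt(2**n * factorial(n))
psi0 = norm * hermite(n)(np.sqrt(alpha) * x) * np.exp(-alpha * x**2 / 2)

print(np.sum(np.abs(psi0)**2) * dx)    # ~1.0: total probability
print(hbar * omega * (n + 0.5))        # 0.5: zero-point energy E_0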

This simulation provides a clear visualization of these fundamental quantum mechanical concepts using a system that has direct analogies to classical mechanics.