Optimizing Sports League Schedules: A Python Approach to Minimizing Travel Distances

Solving a Realistic Sports Scheduling Problem with Python

Have you ever wondered how professional sports leagues create their game schedules?
It’s a fascinating optimization problem that combines mathematics and computer science.

Today, I’ll walk you through solving a realistic sports scheduling problem using $Python$.

The Problem: Regional Basketball League Scheduling

Let’s tackle scheduling for a regional basketball league with $8$ teams. The constraints are:

  • Each team must play against every other team twice (home and away)
  • Games occur on weekends with $4$ matches per weekend
  • No team should play more than one game per weekend
  • Total travel distance should be minimized (trips between far-apart teams should be scheduled as efficiently as possible)

This is a classic combinatorial optimization problem that appears simple but quickly becomes complex.

Let’s solve this using $Python$ with the $PuLP$ library for linear programming:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from pulp import *
import networkx as nx
from itertools import combinations

# Define our teams with their locations (x,y coordinates)
teams = {
    'Eagles': (10, 40),
    'Sharks': (30, 10),
    'Tigers': (90, 20),
    'Hawks': (50, 50),
    'Lions': (70, 80),
    'Bears': (20, 70),
    'Wolves': (60, 30),
    'Panthers': (80, 60)
}

# Calculate distances between teams
distances = {}
for team1, loc1 in teams.items():
    for team2, loc2 in teams.items():
        if team1 != team2:
            # Euclidean distance
            dist = np.sqrt((loc1[0] - loc2[0])**2 + (loc1[1] - loc2[1])**2)
            distances[(team1, team2)] = dist

# Total number of teams
n_teams = len(teams)

# Each team plays every other team twice (home and away)
n_games = n_teams * (n_teams - 1)

# Number of weekends needed (4 games per weekend, each team plays at most once)
n_weekends = n_games // 4

# Create the model
model = LpProblem("Sports_Scheduling", LpMinimize)

# Create variables
# game_vars[(i, j, w)] = 1 if team i plays against team j on weekend w
game_vars = {}
for i in teams:
    for j in teams:
        if i != j:
            for w in range(n_weekends):
                game_vars[(i, j, w)] = LpVariable(f"Game_{i}_{j}_{w}", 0, 1, LpBinary)

# Objective function: total travel distance across all scheduled games
objective = lpSum([game_vars[(i, j, w)] * distances[(i, j)]
                   for i in teams for j in teams
                   for w in range(n_weekends) if i != j])
model += objective

# Constraints

# 1. Each team plays against every other team twice (once home, once away)
for i in teams:
    for j in teams:
        if i != j:
            model += lpSum([game_vars[(i, j, w)] for w in range(n_weekends)]) == 1
            model += lpSum([game_vars[(j, i, w)] for w in range(n_weekends)]) == 1

# 2. Each team plays at most one game per weekend
for w in range(n_weekends):
    for i in teams:
        model += lpSum([game_vars[(i, j, w)] + game_vars[(j, i, w)] for j in teams if i != j]) <= 1

# 3. Exactly 4 games per weekend
for w in range(n_weekends):
    model += lpSum([game_vars[(i, j, w)] for i in teams for j in teams if i != j]) == 4

# Solve the model
model.solve(PULP_CBC_CMD(msg=False))

# Print the status of the solution
print("Status:", LpStatus[model.status])

# Organize the schedule
schedule = {w: [] for w in range(n_weekends)}
for i in teams:
    for j in teams:
        if i != j:
            for w in range(n_weekends):
                if game_vars[(i, j, w)].value() == 1:
                    schedule[w].append((i, j))

# Create a DataFrame for better visualization
schedule_df = pd.DataFrame(index=range(n_weekends), columns=['Game 1', 'Game 2', 'Game 3', 'Game 4'])
for w in range(n_weekends):
    for g in range(4):
        if g < len(schedule[w]):
            home, away = schedule[w][g]
            schedule_df.loc[w, f'Game {g+1}'] = f"{home} vs {away}"

# Print the schedule
print("\nOptimized Schedule:")
print(schedule_df)

# Visualize team locations and distances
plt.figure(figsize=(10, 8))
for team, (x, y) in teams.items():
    plt.scatter(x, y, s=100)
    plt.text(x+2, y+2, team, fontsize=12)

# Draw lines between teams that play on the same weekend
colors = ['red', 'blue', 'green', 'purple', 'orange', 'brown', 'pink', 'gray', 'olive', 'cyan', 'magenta', 'yellow']
for w in range(n_weekends):
    for home, away in schedule[w]:
        x1, y1 = teams[home]
        x2, y2 = teams[away]
        plt.plot([x1, x2], [y1, y2], color=colors[w % len(colors)], alpha=0.5, linewidth=2)

plt.title('Team Locations and Game Connections by Weekend')
plt.xlabel('X Coordinate')
plt.ylabel('Y Coordinate')
plt.grid(True)

# Create a legend for weekends
from matplotlib.lines import Line2D
legend_elements = [Line2D([0], [0], color=colors[w % len(colors)], lw=2, label=f'Weekend {w+1}')
                   for w in range(n_weekends)]
plt.legend(handles=legend_elements, loc='upper right')

plt.savefig('team_locations.png')
plt.show()

# Visualize the schedule as a heatmap to see when each team plays
team_schedule = pd.DataFrame(0, index=teams.keys(), columns=range(1, n_weekends+1))

for w in range(n_weekends):
    for home, away in schedule[w]:
        team_schedule.loc[home, w+1] = 1   # Home game
        team_schedule.loc[away, w+1] = -1  # Away game

plt.figure(figsize=(12, 8))
sns.heatmap(team_schedule, cmap="coolwarm", center=0, linewidths=.5,
            cbar_kws={'label': 'Game Type', 'ticks': [-1, 0, 1]})
plt.title('Team Schedule Heatmap (1: Home Game, -1: Away Game, 0: No Game)')
plt.xlabel('Weekend Number')
plt.ylabel('Team')

# Customize the colorbar
colorbar = plt.gcf().axes[1]
colorbar.set_yticklabels(['Away', 'No Game', 'Home'])

plt.tight_layout()
plt.savefig('schedule_heatmap.png')
plt.show()

# Calculate total distance traveled by each team
team_distances = {team: 0 for team in teams}
for w in range(n_weekends):
    for home, away in schedule[w]:
        team_distances[away] += distances[(away, home)]

# Plot total distance traveled
plt.figure(figsize=(10, 6))
plt.bar(team_distances.keys(), team_distances.values())
plt.title('Total Distance Traveled by Each Team')
plt.xlabel('Team')
plt.ylabel('Distance')
plt.xticks(rotation=45)
plt.tight_layout()
plt.savefig('travel_distances.png')
plt.show()

# Create a network graph to visualize the schedule
G = nx.DiGraph()
for team in teams:
    G.add_node(team)

for w in range(n_weekends):
    for home, away in schedule[w]:
        G.add_edge(home, away, weekend=w+1)

plt.figure(figsize=(12, 10))
pos = {team: loc for team, loc in teams.items()}
nx.draw_networkx_nodes(G, pos, node_size=700, node_color='lightblue')
nx.draw_networkx_labels(G, pos, font_size=10)

# Draw edges with different colors for each weekend
for w in range(n_weekends):
    edge_list = [(home, away) for home, away in G.edges() if G[home][away]['weekend'] == w+1]
    nx.draw_networkx_edges(G, pos, edgelist=edge_list, edge_color=colors[w % len(colors)],
                           arrows=True, width=1.5, alpha=0.7)

# Add edge labels indicating the weekend number
edge_labels = {(home, away): f"Week {G[home][away]['weekend']}" for home, away in G.edges()}
nx.draw_networkx_edge_labels(G, pos, edge_labels=edge_labels, font_size=8)

plt.title('Game Schedule Network')
plt.axis('off')
plt.tight_layout()
plt.savefig('schedule_network.png')
plt.show()

Code Explanation

Let’s break down the solution step by step:

1. Problem Setup

First, I define our $8$ teams with their geographic locations using x-y coordinates.
This is realistic because in actual sports scheduling, the locations of teams are critical for minimizing travel distances.

teams = {
    'Eagles': (10, 40),
    'Sharks': (30, 10),
    'Tigers': (90, 20),
    ...
}

Then I calculate the Euclidean distances between each pair of teams.
In a real-world scenario, these would be actual road distances or travel times.
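For instance, if team locations were given as latitude/longitude pairs rather than plane coordinates, a great-circle (haversine) distance could stand in for the Euclidean one. A minimal sketch; the coordinates below are hypothetical, not the league's:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2)**2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2)**2
    return 2 * r * math.asin(math.sqrt(a))

# Hypothetical arena coordinates (lat, lon)
print(round(haversine_km(40.0, -105.0, 41.0, -104.0), 1))
```

Dropping this function in place of the Euclidean formula only changes how the `distances` dictionary is filled; the optimization model itself is unaffected.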

2. Mathematical Model

This is where the optimization begins.
The problem is formulated as a linear programming model using PuLP:

model = LpProblem("Sports_Scheduling", LpMinimize)

The key decision variables are binary variables indicating whether team i plays against team j on weekend w:

game_vars[(i, j, w)] = LpVariable(f"Game_{i}_{j}_{w}", 0, 1, LpBinary)

3. Objective Function

Our goal is to minimize the total travel distance:

objective = lpSum([game_vars[(i, j, w)] * distances[(i, j)] for i in teams for j in teams for w in range(n_weekends) if i != j])

This can be represented mathematically as:

$$\min \sum_{i \in Teams} \sum_{j \in Teams, j \neq i} \sum_{w \in Weekends} game_{i,j,w} \times distance_{i,j}$$

4. Constraints

Several constraints ensure a valid schedule:

  1. Each team plays against every other team exactly once at home and once away:

$$\sum_{w \in Weekends} game_{i,j,w} = 1 \quad \forall i,j \in Teams, i \neq j$$

  2. Each team plays at most one game per weekend:

$$\sum_{j \in Teams, j \neq i} (game_{i,j,w} + game_{j,i,w}) \leq 1 \quad \forall i \in Teams, \forall w \in Weekends$$

  3. Exactly 4 games per weekend:

$$\sum_{i \in Teams} \sum_{j \in Teams, j \neq i} game_{i,j,w} = 4 \quad \forall w \in Weekends$$
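Independently of the solver, these three constraints are easy to verify mechanically on any candidate schedule. A sketch of such a checker, demonstrated on a hand-built double round-robin for four hypothetical teams:

```python
def check_schedule(schedule, teams, games_per_weekend):
    """Validate a {weekend: [(home, away), ...]} schedule against the three constraints."""
    seen = set()
    for games in schedule.values():
        if len(games) != games_per_weekend:    # fixed number of games per weekend
            return False
        playing = [t for game in games for t in game]
        if len(playing) != len(set(playing)):  # no team plays twice in one weekend
            return False
        seen.update(games)
    # every ordered (home, away) pair must occur exactly once
    expected = {(i, j) for i in teams for j in teams if i != j}
    return seen == expected

# A hand-built double round-robin for four hypothetical teams, two games per weekend
toy = {
    0: [('A', 'D'), ('B', 'C')],
    1: [('A', 'C'), ('D', 'B')],
    2: [('A', 'B'), ('C', 'D')],
    3: [('D', 'A'), ('C', 'B')],
    4: [('C', 'A'), ('B', 'D')],
    5: [('B', 'A'), ('D', 'C')],
}
print(check_schedule(toy, ['A', 'B', 'C', 'D'], 2))  # True
```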

5. Solution Extraction

After solving the model, I extract the schedule into a more readable format:

schedule = {w: [] for w in range(n_weekends)}
for i in teams:
    for j in teams:
        if i != j:
            for w in range(n_weekends):
                if game_vars[(i, j, w)].value() == 1:
                    schedule[w].append((i, j))

Output

Status: Optimal

Optimized Schedule:
                Game 1              Game 2             Game 3  \
0      Sharks vs Lions     Tigers vs Hawks  Bears vs Panthers   
1     Sharks vs Wolves  Tigers vs Panthers     Lions vs Hawks   
2   Sharks vs Panthers     Tigers vs Bears    Lions vs Eagles   
3     Eagles vs Sharks     Bears vs Tigers    Wolves vs Lions   
4     Eagles vs Wolves     Sharks vs Bears  Hawks vs Panthers   
5     Tigers vs Wolves     Hawks vs Eagles    Bears vs Sharks   
6      Eagles vs Bears     Hawks vs Sharks    Lions vs Wolves   
7      Eagles vs Hawks     Tigers vs Lions   Wolves vs Sharks   
8   Eagles vs Panthers    Tigers vs Sharks     Hawks vs Lions   
9      Hawks vs Tigers     Lions vs Sharks    Bears vs Wolves   
10    Sharks vs Eagles   Lions vs Panthers     Bears vs Hawks   
11     Eagles vs Lions    Sharks vs Tigers     Hawks vs Bears   
12    Tigers vs Eagles     Hawks vs Wolves     Lions vs Bears   
13    Eagles vs Tigers     Sharks vs Hawks     Bears vs Lions   

                Game 4  
0     Wolves vs Eagles  
1      Bears vs Eagles  
2      Wolves vs Hawks  
3    Panthers vs Hawks  
4      Lions vs Tigers  
5    Panthers vs Lions  
6   Panthers vs Tigers  
7    Panthers vs Bears  
8      Wolves vs Bears  
9   Panthers vs Eagles  
10    Wolves vs Tigers  
11  Panthers vs Wolves  
12  Panthers vs Sharks  
13  Wolves vs Panthers  

Visualization and Analysis

I’ve created several visualizations to help understand the optimized schedule:

1. Geographic Visualization

The first plot shows the geographic locations of teams with lines connecting teams that play on the same weekend.
Each color represents a different weekend:

This visualization is particularly useful for identifying whether games are geographically efficient: teams that are closer to each other should ideally play on the same weekends to minimize travel.

2. Schedule Heatmap

The heatmap provides a clear view of when each team plays, and whether they’re playing home (1) or away (-1):

This visualization helps identify whether the schedule is balanced.
Each team should have a mix of home and away games, and we can quickly see if any team has too many consecutive home or away games.

3. Travel Distance Analysis

This bar chart shows the total distance traveled by each team:

In an ideal schedule, these values should be relatively balanced to ensure fairness.
Wide variations might indicate scheduling inefficiencies.
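One simple way to quantify that balance is the coefficient of variation of the per-team totals (the totals below are made up for illustration):

```python
import numpy as np

def travel_balance(team_distances):
    """Coefficient of variation of per-team travel totals (lower = more balanced)."""
    values = np.array(list(team_distances.values()), dtype=float)
    return values.std() / values.mean()

# Hypothetical per-team travel totals
print(round(travel_balance({'Eagles': 300.0, 'Sharks': 320.0, 'Tigers': 310.0}), 3))
```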

4. Network Graph

Finally, the network graph shows the complete schedule with directed edges indicating who plays whom and when:

This graph helps identify the overall structure of the schedule and any potential patterns.

Conclusion

Sports scheduling is a complex optimization problem with numerous constraints.
The solution I’ve presented uses linear programming to find an optimal schedule that minimizes travel distances while satisfying all the logical constraints of a sports league.

In real-world applications, additional constraints might include:

  • Arena availability
  • TV broadcasting requirements
  • Avoiding conflicts with other major events
  • Balancing the distribution of strong opponents throughout the season

This approach can be extended to larger leagues with more complex constraints, though the computational complexity increases significantly with the number of teams and constraints.

Circuit Board Testing

A Comprehensive Python Analysis

In the world of electronics manufacturing, circuit board testing is crucial for ensuring product quality and reliability.

Today, I’ll walk you through a realistic $Python$ implementation of circuit board defect analysis using statistical methods.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats

# Simulated Circuit Board Test Data
np.random.seed(42)

class CircuitBoardTesting:
    def __init__(self, num_boards=1000):
        """
        Initialize circuit board testing simulation

        Parameters:
        - num_boards: Total number of circuit boards to simulate
        """
        self.num_boards = num_boards
        self.defect_rates = self._generate_defect_data()

    def _generate_defect_data(self):
        """
        Generate simulated defect data with realistic variations

        Defect types:
        1. Solder joint issues
        2. Component misalignment
        3. Trace discontinuity
        4. Electrical short
        """
        defect_types = [
            'Solder Joint Issues',
            'Component Misalignment',
            'Trace Discontinuity',
            'Electrical Short'
        ]

        # Realistic defect rate generation
        defect_rates = {
            defect: np.random.normal(loc=0.02, scale=0.005, size=self.num_boards)
            for defect in defect_types
        }

        return defect_rates

    def calculate_statistics(self):
        """
        Calculate statistical metrics for defect analysis

        Returns:
        - DataFrame with defect statistics
        """
        stats_data = {}
        for defect, rates in self.defect_rates.items():
            # 95% confidence interval (scipy >= 1.9 names this parameter
            # 'confidence'; older versions call it 'alpha')
            confidence_interval = stats.t.interval(
                confidence=0.95,
                df=len(rates) - 1,
                loc=np.mean(rates),
                scale=stats.sem(rates)
            )

            stats_data[defect] = {
                'Mean': np.mean(rates),
                'Std Dev': np.std(rates),
                'Min': np.min(rates),
                'Max': np.max(rates),
                'Confidence Interval': confidence_interval
            }

        return pd.DataFrame.from_dict(stats_data, orient='index')

    def plot_defect_distribution(self):
        """
        Visualize defect rate distributions
        """
        plt.figure(figsize=(12, 6))

        for i, (defect, rates) in enumerate(self.defect_rates.items(), 1):
            plt.subplot(2, 2, i)
            plt.hist(rates, bins=30, alpha=0.7, color=f'C{i-1}')
            plt.title(f'{defect} Defect Rate')
            plt.xlabel('Defect Rate')
            plt.ylabel('Frequency')

        plt.tight_layout()
        plt.show()

# Perform analysis
testing = CircuitBoardTesting(num_boards=1000)
defect_stats = testing.calculate_statistics()
print("Defect Rate Statistics:")
print(defect_stats)

# Visualize defect distributions
testing.plot_defect_distribution()

Detailed Code Explanation

Simulation Approach

  1. We create a CircuitBoardTesting class to simulate circuit board defect analysis.
  2. The simulation uses numpy to generate realistic defect rates with normal distribution.

Statistical Analysis

Key statistical methods include:

  • Mean defect rate calculation
  • Standard deviation measurement
  • 95% confidence interval determination
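The 95% interval that `stats.t.interval` returns is the textbook $\bar{x} \pm t_{0.975,\,n-1} \cdot s/\sqrt{n}$; a quick cross-check on a small made-up sample:

```python
import numpy as np
from scipy import stats

# A small made-up sample of defect rates
rates = np.array([0.018, 0.022, 0.020, 0.019, 0.021])

mean, sem = rates.mean(), stats.sem(rates)
t_crit = stats.t.ppf(0.975, df=len(rates) - 1)  # two-sided 95% critical value
manual = (mean - t_crit * sem, mean + t_crit * sem)

# Passing the confidence level positionally works across scipy versions
library = stats.t.interval(0.95, df=len(rates) - 1, loc=mean, scale=sem)
print(np.allclose(manual, library))  # True
```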

Mathematical Representation

The defect rate follows a normal distribution with:

  • $\mu = 0.02$ (mean defect rate)
  • $\sigma = 0.005$ (standard deviation)
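One caveat: a normal distribution with these parameters can occasionally produce negative "defect rates", since $0$ is only $4\sigma$ below the mean. If that matters for downstream analysis, clipping is a simple guard. A sketch, not part of the simulation above:

```python
import numpy as np

rng = np.random.default_rng(42)
raw = rng.normal(loc=0.02, scale=0.005, size=10_000)

# Guard against physically impossible negative defect rates
clipped = np.clip(raw, 0.0, None)
print(clipped.min() >= 0.0)  # True
```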

Key Visualization Insights

The generated plots will show:

  • Distribution of defect rates for each defect type
  • Frequency of occurrence
  • Variation in defect rates

Result

Defect Rate Statistics:
                            Mean   Std Dev       Min       Max  \
Solder Joint Issues     0.020097  0.004894  0.003794  0.039264   
Component Misalignment  0.020354  0.004985  0.005298  0.035966   
Trace Discontinuity     0.020029  0.004915  0.004902  0.039631   
Electrical Short        0.019906  0.005133  0.005353  0.036215   

                                                 Confidence Interval  
Solder Joint Issues       (0.01979283559301649, 0.02040048496520677)  
Component Misalignment   (0.02004469759714213, 0.020663664775349432)  
Trace Discontinuity     (0.019724031341496525, 0.020334310804063716)  
Electrical Short        (0.019587711981175548, 0.020225095840956723)  

Interpretation of Results

By analyzing the output, we can:

  • Identify potential manufacturing issues
  • Understand defect rate variability
  • Make data-driven quality improvement decisions

The simulation provides a realistic framework for circuit board testing analysis, demonstrating how statistical methods can be applied to manufacturing quality control.

Optimal Task Assignment Optimization and the Hungarian Algorithm

I’ll solve a Task Assignment problem using $Python$, focusing on optimization and visualization.

I’ll use the Hungarian algorithm (also known as the Munkres algorithm) to solve the optimal assignment problem.

import numpy as np
import scipy.optimize as optimize
import matplotlib.pyplot as plt
import seaborn as sns

# Cost matrix representing the efficiency of employees for different tasks
# Lower values indicate better performance
cost_matrix = np.array([
    [10, 15, 9, 12],   # Employee 1's performance on tasks
    [14, 8, 11, 13],   # Employee 2's performance on tasks
    [12, 16, 10, 14],  # Employee 3's performance on tasks
    [9, 13, 15, 11]    # Employee 4's performance on tasks
])

# Solving the assignment problem
row_ind, col_ind = optimize.linear_sum_assignment(cost_matrix)

# Calculating total optimization cost
total_cost = cost_matrix[row_ind, col_ind].sum()

# Detailed assignment results
assignments = [(f"Employee {r+1}", f"Task {c+1}", cost_matrix[r, c])
               for r, c in zip(row_ind, col_ind)]

# Visualization of the cost matrix and optimal assignment
plt.figure(figsize=(12, 5))

# Heatmap of the cost matrix
plt.subplot(1, 2, 1)
sns.heatmap(cost_matrix, annot=True, cmap='YlGnBu',
            xticklabels=[f'Task {i+1}' for i in range(cost_matrix.shape[1])],
            yticklabels=[f'Employee {i+1}' for i in range(cost_matrix.shape[0])])
plt.title('Task Assignment Cost Matrix')

# Bar plot of optimal assignments
plt.subplot(1, 2, 2)
assigned_costs = [cost for _, _, cost in assignments]
plt.bar([f'Employee {i+1} - Task {j+1}' for i, j in zip(row_ind, col_ind)],
        assigned_costs, color='skyblue')
plt.title('Optimal Task Assignments')
plt.xticks(rotation=45)
plt.tight_layout()

# Print results
print("Optimal Task Assignments:")
for emp, task, cost in assignments:
    print(f"{emp} assigned to {task} with cost {cost}")
print(f"\nTotal Optimization Cost: {total_cost}")

plt.show()

Let me break down the solution with mathematical notation and explanation:

Mathematical Formulation:

  • Let $C_{ij}$ represent the cost matrix where $i$ is the employee and $j$ is the task
  • Goal: Minimize $\sum_{i=1}^{n} \sum_{j=1}^{m} x_{ij} \cdot C_{ij}$
  • Subject to constraints:
    $\sum_{j=1}^{m} x_{ij} = 1$ (each employee assigned to one task)
    $\sum_{i=1}^{n} x_{ij} = 1$ (each task assigned to one employee)
    $x_{ij} \in \{0, 1\}$ (binary assignment variables)

Key Components:

  1. scipy.optimize.linear_sum_assignment(): Implements the Hungarian algorithm
  2. Uses a cost matrix representing task efficiency
  3. Finds the optimal assignment that minimizes total cost
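Because the matrix is only $4 \times 4$, the result of `linear_sum_assignment` can be cross-checked by brute force over all $4! = 24$ permutations:

```python
import numpy as np
from itertools import permutations
from scipy.optimize import linear_sum_assignment

cost_matrix = np.array([
    [10, 15, 9, 12],
    [14, 8, 11, 13],
    [12, 16, 10, 14],
    [9, 13, 15, 11],
])

# Hungarian algorithm result
row_ind, col_ind = linear_sum_assignment(cost_matrix)
hungarian_cost = cost_matrix[row_ind, col_ind].sum()

# Exhaustive check: try every one-to-one assignment of employees to tasks
brute_cost = min(
    sum(cost_matrix[i, p[i]] for i in range(4))
    for p in permutations(range(4))
)
print(hungarian_cost, brute_cost)  # 39 39
```

Brute force is only feasible for tiny instances ($n!$ growth); the Hungarian algorithm solves the same problem in $O(n^3)$.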

Visualization Insights:

  • First subplot: Heatmap of the cost matrix
    • Lighter colors indicate lower costs (better performance)
  • Second subplot: Bar chart of optimal assignments
    • Shows the cost of each optimal task assignment

Output:

Optimal Task Assignments:
Employee 1 assigned to Task 1 with cost 10
Employee 2 assigned to Task 2 with cost 8
Employee 3 assigned to Task 3 with cost 10
Employee 4 assigned to Task 4 with cost 11

Total Optimization Cost: 39

Output Explanation:

  • Prints the optimal assignment for each employee
  • Calculates and displays the total optimization cost

This approach demonstrates:

  • Optimization problem solving
  • Matrix manipulation
  • Visualization techniques
  • Performance analysis

The solution helps organizations efficiently allocate tasks by minimizing overall resource expenditure.

Graph Coloring: US States Border Problem

I’ll solve a map coloring problem - this time focusing on political boundaries to demonstrate the graph coloring technique.

import networkx as nx
import matplotlib.pyplot as plt

# Define state adjacency (neighboring states)
state_borders = {
    'California': ['Oregon', 'Nevada', 'Arizona'],
    'Oregon': ['California', 'Washington', 'Nevada'],
    'Washington': ['Oregon', 'Idaho'],
    'Nevada': ['California', 'Oregon', 'Arizona', 'Utah', 'Idaho'],
    'Arizona': ['California', 'Nevada', 'Utah'],
    'Utah': ['Nevada', 'Arizona', 'Colorado', 'New Mexico'],
    'Idaho': ['Washington', 'Nevada', 'Oregon', 'Utah', 'Wyoming', 'Montana'],
    'Colorado': ['Utah', 'Wyoming', 'New Mexico'],
    'Wyoming': ['Idaho', 'Utah', 'Colorado', 'Montana', 'South Dakota', 'Nebraska'],
    'Montana': ['Idaho', 'Wyoming', 'South Dakota', 'North Dakota'],
    'New Mexico': ['Utah', 'Colorado', 'Arizona'],
    'South Dakota': ['Wyoming', 'Montana', 'Nebraska'],
    'Nebraska': ['Wyoming', 'South Dakota'],
    'North Dakota': ['Montana']
}

# Create graph of state borders
def create_state_border_graph(state_borders):
    G = nx.Graph()
    for state, neighbors in state_borders.items():
        for neighbor in neighbors:
            G.add_edge(state, neighbor)
    return G

# Implement graph coloring algorithm
def color_states(G):
    # Sort nodes by degree (number of connections) in descending order
    sorted_nodes = sorted(G.nodes(), key=lambda x: G.degree(x), reverse=True)

    # Initialize color map
    color_map = {}
    colors = ['Red', 'Blue', 'Green', 'Yellow']

    for node in sorted_nodes:
        # Find the colors already used by neighbors
        used_colors = set(color_map.get(neighbor) for neighbor in G.neighbors(node) if neighbor in color_map)

        # Assign the first available color
        for color in colors:
            if color not in used_colors:
                color_map[node] = color
                break

    return color_map

# Create the border graph
border_graph = create_state_border_graph(state_borders)

# Color the states
state_colors = color_states(border_graph)

# Visualization
plt.figure(figsize=(12, 8))
pos = nx.spring_layout(border_graph, seed=42, k=0.9)
node_colors = [state_colors.get(node, 'gray') for node in border_graph.nodes()]

nx.draw(border_graph, pos, with_labels=True, node_color=node_colors,
        node_size=3000, font_size=8, font_weight='bold',
        edge_color='gray', width=1)
plt.title('US States Border Coloring Problem')
plt.tight_layout()
plt.show()

# Print the color assignments
print("State Color Assignments:")
for state, color in state_colors.items():
    print(f"{state}: {color}")

# Verify no adjacent states have the same color
def verify_coloring(G, color_map):
    for state in G.nodes():
        for neighbor in G.neighbors(state):
            if color_map[state] == color_map[neighbor]:
                print(f"Error: {state} and {neighbor} have the same color!")
                return False
    print("Coloring is valid: No adjacent states share a color.")
    return True

verify_coloring(border_graph, state_colors)

Problem Statement

We want to color a map of selected Western U.S. states where:

  • No two adjacent states can have the same color
  • Use the minimum number of colors possible

Mathematical Representation

Let $G = (V, E)$ be a graph where:

  • $V$ represents states
  • $E$ represents borders between states

The goal is to assign colors such that $\forall (u,v) \in E$, $color(u) \neq color(v)$

Algorithm Highlights

  1. Welsh-Powell Graph Coloring Algorithm
  2. Greedy approach to color assignment
  3. Minimizes the number of colors used
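As a cross-check, networkx ships a built-in greedy coloring with the same largest-degree-first strategy; a sketch on a tiny triangle-plus-pendant graph:

```python
import networkx as nx

# A triangle (A, B, C) with a pendant node D attached to C
G = nx.Graph([('A', 'B'), ('B', 'C'), ('A', 'C'), ('C', 'D')])

# Same largest-degree-first greedy strategy as the hand-rolled version
coloring = nx.greedy_color(G, strategy='largest_first')

# The triangle forces three colors; the pendant node reuses one of them
num_colors = len(set(coloring.values()))
print(num_colors)  # 3
```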

Key Visualization Elements

  • Nodes represent states
  • Edges represent state borders
  • Colors represent the assigned map colors; no two adjacent states share one

Complexity Analysis

  • Time Complexity: $O(V^2)$
  • Space Complexity: $O(V + E)$

The code includes a verification function to ensure no adjacent states share a color, demonstrating the algorithm’s correctness.

Result

State Color Assignments:
Idaho: Red
Utah: Blue
Wyoming: Green
Nevada: Green
Oregon: Blue
Arizona: Red
Montana: Blue
California: Yellow
Colorado: Red
New Mexico: Green
South Dakota: Red
Washington: Green
Nebraska: Blue
North Dakota: Red
Coloring is valid: No adjacent states share a color.
True

Advanced Register Allocation

Intelligent Graph Coloring for Compiler Optimization

I’ll create a register allocation example that simulates a more realistic compiler scenario with a larger interference graph and additional complexity.

import networkx as nx
import matplotlib.pyplot as plt
import numpy as np

class AdvancedRegisterAllocator:
    def __init__(self, interference_graph):
        """
        Advanced register allocation with more sophisticated handling

        Args:
            interference_graph (dict): A complex dictionary of variable conflicts
        """
        self.graph = nx.Graph(interference_graph)
        self.num_registers = 0
        self.coloring = {}
        self.spill_cost = {}

    def calculate_spill_cost(self):
        """
        Calculate spill costs for each node based on graph characteristics

        Spill cost considers:
        1. Node degree (number of conflicts)
        2. Local loop frequency
        3. Variable live range
        """
        for node in self.graph.nodes():
            # Degree-based cost
            degree_cost = self.graph.degree(node)

            # Simulated loop frequency (hypothetical additional complexity)
            loop_frequency = np.random.uniform(1, 5)

            # Live range approximation (simulated)
            live_range = np.random.uniform(1, 10)

            # Combined spill cost
            self.spill_cost[node] = (degree_cost * loop_frequency) / live_range

    def advanced_color(self, num_registers):
        """
        Advanced graph coloring with spill prioritization

        Args:
            num_registers (int): Number of available registers

        Returns:
            dict: Mapping of variables to register numbers
        """
        self.num_registers = num_registers
        self.calculate_spill_cost()

        # Sort nodes by spill cost (descending) and degree (descending)
        nodes = sorted(
            self.graph.nodes(),
            key=lambda x: (self.spill_cost[x], self.graph.degree(x)),
            reverse=True
        )

        spilled_variables = []

        for node in nodes:
            # Find used colors of neighboring nodes
            used_colors = {self.coloring.get(neighbor) for neighbor in self.graph.neighbors(node) if neighbor in self.coloring}

            # Find the lowest available color
            color_found = False
            for color in range(num_registers):
                if color not in used_colors:
                    self.coloring[node] = color
                    color_found = True
                    break

            # Track spilled variables
            if not color_found:
                spilled_variables.append((node, self.spill_cost[node]))

        return {
            'allocation': self.coloring,
            'spilled': spilled_variables
        }

    def visualize_graph(self):
        """
        Advanced graph visualization with spill information
        """
        plt.figure(figsize=(15, 10))
        pos = nx.spring_layout(self.graph, k=0.5)  # Adjust layout

        # Color mapping ('plt.cm.get_cmap' was removed in Matplotlib 3.9)
        color_map = plt.get_cmap('tab20')
        node_colors = [self.coloring.get(node, -1) for node in self.graph.nodes()]

        # Draw graph
        nx.draw(
            self.graph,
            pos,
            with_labels=True,
            node_color=[color_map(c / self.num_registers) if c != -1 else 'red' for c in node_colors],
            node_size=1000,
            font_size=10,
            font_weight='bold',
            edge_color='gray'
        )

        # Spill cost annotation
        for node, (x, y) in pos.items():
            plt.text(
                x, y + 0.1,
                f'Spill Cost: {self.spill_cost[node]:.2f}',
                horizontalalignment='center',
                fontsize=8
            )

        plt.title('Advanced Interference Graph with Register Allocation')
        plt.tight_layout()
        plt.show()

# Simulate a more complex compiler scenario
# Larger interference graph with multiple conflict regions
interference_graph = {
    'temp1': ['var_a', 'var_b', 'loop_counter'],
    'temp2': ['var_c', 'var_d', 'array_index'],
    'var_a': ['temp1', 'var_b', 'result'],
    'var_b': ['temp1', 'var_a', 'loop_counter'],
    'var_c': ['temp2', 'var_d', 'array_length'],
    'var_d': ['temp2', 'var_c', 'result'],
    'loop_counter': ['temp1', 'var_b', 'array_index'],
    'array_index': ['temp2', 'loop_counter', 'array_length'],
    'result': ['var_a', 'var_d', 'array_length'],
    'array_length': ['var_c', 'array_index', 'result']
}

# Create advanced register allocator
np.random.seed(42) # For reproducibility
allocator = AdvancedRegisterAllocator(interference_graph)

# Attempt register allocation with different register constraints
print("Register Allocation Simulation:\n")

# Try with 4 registers
print("Scenario 1: 4 Registers")
result_4 = allocator.advanced_color(4)
print("Register Allocation:")
for var, reg in result_4['allocation'].items():
    print(f"Variable {var}: Register {reg}")

print("\nSpilled Variables:")
for var, cost in result_4['spilled']:
    print(f"Variable: {var}, Spill Cost: {cost:.2f}")

# Visualize the graph
allocator.visualize_graph()

Advanced Register Allocation Problem

Key Enhancements

  1. Complex Interference Graph

    • More variables with intricate conflict relationships
    • Simulates a real compiler scenario with multiple computation regions
  2. Spill Cost Calculation
    Mathematical model for spill cost:

    $\text{Spill Cost} = \frac{\text{Degree} \cdot \text{Loop Frequency}}{\text{Live Range}}$

  3. Advanced Allocation Strategy

    • Prioritize variables based on spill cost
    • Handle variables that cannot be allocated to registers
    • Provide detailed allocation and spill information
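Plugging sample numbers into the spill-cost formula makes the prioritization concrete (the figures below are illustrative, not taken from the run above):

```python
def spill_cost(degree, loop_frequency, live_range):
    """Spill cost = (degree * loop frequency) / live range."""
    return (degree * loop_frequency) / live_range

# A high-degree variable inside a hot loop is expensive to spill...
hot = spill_cost(degree=5, loop_frequency=4.0, live_range=2.0)   # 10.0
# ...while a long-lived, rarely used one is cheap
cold = spill_cost(degree=2, loop_frequency=1.0, live_range=8.0)  # 0.25
print(hot, cold)
```

The allocator therefore keeps `hot`-style variables in registers and prefers to spill `cold`-style ones when registers run out.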

Complexity Improvements

  • Dynamic spill cost calculation
  • Prioritization of variable allocation
  • Visualization of allocation and conflict regions

Scenario Breakdown

  1. Input: Complex interference graph with $10$ variables
  2. Process:
    • Calculate individual spill costs
    • Attempt register allocation
    • Track unallocated (spilled) variables
  3. Output:
    • Register mapping
    • Spilled variable details
    • Graph visualization

Result

Register Allocation Simulation:

Scenario 1: 4 Registers
Register Allocation:
Variable loop_counter: Register 0
Variable array_length: Register 0
Variable var_a: Register 0
Variable array_index: Register 1
Variable temp2: Register 0
Variable var_c: Register 1
Variable result: Register 1
Variable temp1: Register 1
Variable var_b: Register 2
Variable var_d: Register 2

Practical Implications

  • Mimics real-world compiler register allocation challenges
  • Demonstrates handling of variable conflicts
  • Provides insights into optimization strategies

Job Shop Scheduling Optimization

Python Implementation with Visualization and Analysis

I’ll create a practical scheduling problem, solve it using $Python$, and visualize the results with graphs.

I’ll include mathematical formulas and provide a clear explanation of the code.

Scheduling Problem: Job Shop Scheduling

Let’s solve a job shop scheduling problem, which is a common scheduling challenge in manufacturing and operations research.

Problem Definition

We have:

  • 5 jobs to be processed
  • 4 machines
  • Each job must go through all machines in a specific order
  • Each job has different processing times on each machine
  • The goal is to minimize the total completion time (makespan)

Mathematically, we want to minimize:
$$ C_{max} = \max_{j \in \text{Jobs}} C_j $$

Where $C_j$ is the completion time of job $j$.
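In code, the makespan is simply the largest completion time over all jobs; as a minimal sketch:

```python
def compute_makespan(completion_times):
    """C_max = max over jobs j of C_j."""
    return max(completion_times)

# Example with five job completion times:
print(compute_makespan([29, 40, 33, 17, 39]))  # 40
```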

Solution Using Python

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.patches as patches
import random
from collections import defaultdict

# Set seeds for reproducibility (numpy for processing times, stdlib random for shuffles)
np.random.seed(42)
random.seed(42)

# Problem Setup
num_jobs = 5
num_machines = 4

# Generate processing times for each job on each machine
processing_times = np.random.randint(1, 10, size=(num_jobs, num_machines))

# Generate machine order for each job (each job must visit all machines in a specific order)
machine_orders = []
for _ in range(num_jobs):
    order = list(range(num_machines))
    random.shuffle(order)
    machine_orders.append(order)

# Display the problem data
print("Processing Times:")
processing_df = pd.DataFrame(processing_times,
                             columns=[f"Machine {i}" for i in range(num_machines)],
                             index=[f"Job {i}" for i in range(num_jobs)])
print(processing_df)

print("\nMachine Orders:")
for i, order in enumerate(machine_orders):
    print(f"Job {i}: {' -> '.join(['Machine ' + str(m) for m in order])}")

# Priority Rule-Based Scheduling: Shortest Processing Time (SPT)
def apply_spt_scheduling():
    # Initialize machine and job status
    machine_available_time = [0] * num_machines
    job_completion = [[0] * num_machines for _ in range(num_jobs)]
    job_current_step = [0] * num_jobs
    job_completion_time = [0] * num_jobs

    # Schedule tracking
    schedule = defaultdict(list)  # machine -> [(job, start_time, end_time), ...]

    # Continue until all jobs are completed
    while min(job_current_step) < num_machines:
        # Find eligible operations (jobs that haven't completed all steps)
        eligible_ops = [(j, job_current_step[j]) for j in range(num_jobs)
                        if job_current_step[j] < num_machines]

        if not eligible_ops:
            break

        # For each eligible operation, calculate when it can start
        op_details = []
        for job, step in eligible_ops:
            machine = machine_orders[job][step]
            prev_step_completion = 0 if step == 0 else job_completion[job][step-1]
            possible_start_time = max(machine_available_time[machine], prev_step_completion)
            processing_time = processing_times[job][machine]
            op_details.append((job, machine, step, possible_start_time, processing_time))

        # Sort by possible start time + processing time (earliest completion, SPT-style)
        op_details.sort(key=lambda x: x[3] + x[4])

        # Schedule the first operation
        job, machine, step, start_time, process_time = op_details[0]
        end_time = start_time + process_time

        # Update tracking information
        machine_available_time[machine] = end_time
        job_completion[job][step] = end_time
        job_current_step[job] += 1

        if job_current_step[job] == num_machines:
            job_completion_time[job] = end_time

        # Record the scheduled operation
        schedule[machine].append((job, start_time, end_time))

    makespan = max(job_completion_time)
    return schedule, makespan, job_completion_time

# Apply the SPT scheduling algorithm
schedule, makespan, job_completion_times = apply_spt_scheduling()

print("\nSchedule Results:")
print(f"Makespan: {makespan}")
print("Job Completion Times:", job_completion_times)

# Visualization
def visualize_schedule(schedule, makespan):
    fig, ax = plt.subplots(figsize=(12, 6))

    # Define colors for jobs
    colors = ['#ff9999', '#66b3ff', '#99ff99', '#ffcc99', '#c2c2f0']

    # Plot each machine's schedule
    for machine_id in range(num_machines):
        y_pos = machine_id
        for job, start, end in schedule[machine_id]:
            width = end - start
            rect = patches.Rectangle((start, y_pos-0.4), width, 0.8,
                                     edgecolor='black', facecolor=colors[job], alpha=0.7)
            ax.add_patch(rect)

            # Add job label in the center of the rectangle
            ax.text(start + width/2, y_pos, f'J{job}',
                    ha='center', va='center', fontweight='bold')

    # Set plot limits and labels
    ax.set_xlim(0, makespan + 1)
    ax.set_ylim(-0.5, num_machines - 0.5)
    ax.set_yticks(range(num_machines))
    ax.set_yticklabels([f'Machine {i}' for i in range(num_machines)])
    ax.set_xlabel('Time')
    ax.set_title('Job Shop Schedule')

    # Add gridlines
    ax.grid(True, axis='x', linestyle='--', alpha=0.7)

    plt.tight_layout()
    return fig

# Create the visualization
schedule_fig = visualize_schedule(schedule, makespan)

# Gantt chart for job completion
def visualize_job_completion(schedule, job_completion_times):
    # Reconstruct job timelines
    job_timelines = defaultdict(list)

    for machine, operations in schedule.items():
        for job, start, end in operations:
            job_timelines[job].append((machine, start, end))

    # Sort each job's operations by start time
    for job in job_timelines:
        job_timelines[job].sort(key=lambda x: x[1])

    # Create the visualization
    fig, ax = plt.subplots(figsize=(12, 6))

    # Define colors for machines
    machine_colors = ['#ff9999', '#66b3ff', '#99ff99', '#ffcc99']

    # Plot each job's timeline
    for job_id in range(num_jobs):
        y_pos = job_id
        for machine, start, end in job_timelines[job_id]:
            width = end - start
            rect = patches.Rectangle((start, y_pos-0.4), width, 0.8,
                                     edgecolor='black', facecolor=machine_colors[machine], alpha=0.7)
            ax.add_patch(rect)

            # Add machine label in the center of the rectangle
            ax.text(start + width/2, y_pos, f'M{machine}',
                    ha='center', va='center', fontweight='bold')

    # Set plot limits and labels
    ax.set_xlim(0, max(job_completion_times) + 1)
    ax.set_ylim(-0.5, num_jobs - 0.5)
    ax.set_yticks(range(num_jobs))
    ax.set_yticklabels([f'Job {i}' for i in range(num_jobs)])
    ax.set_xlabel('Time')
    ax.set_title('Job Completion Timeline')

    # Add gridlines
    ax.grid(True, axis='x', linestyle='--', alpha=0.7)

    plt.tight_layout()
    return fig

# Create the job completion visualization
job_completion_fig = visualize_job_completion(schedule, job_completion_times)

# Compare with another scheduling rule: Longest Processing Time (LPT)
def apply_lpt_scheduling():
    # Similar implementation to SPT but sorting by longest processing time
    machine_available_time = [0] * num_machines
    job_completion = [[0] * num_machines for _ in range(num_jobs)]
    job_current_step = [0] * num_jobs
    job_completion_time = [0] * num_jobs

    schedule = defaultdict(list)

    while min(job_current_step) < num_machines:
        eligible_ops = [(j, job_current_step[j]) for j in range(num_jobs)
                        if job_current_step[j] < num_machines]

        if not eligible_ops:
            break

        op_details = []
        for job, step in eligible_ops:
            machine = machine_orders[job][step]
            prev_step_completion = 0 if step == 0 else job_completion[job][step-1]
            possible_start_time = max(machine_available_time[machine], prev_step_completion)
            processing_time = processing_times[job][machine]
            op_details.append((job, machine, step, possible_start_time, processing_time))

        # Sort by negative processing time (LPT)
        op_details.sort(key=lambda x: -(x[4]))

        job, machine, step, start_time, process_time = op_details[0]
        end_time = start_time + process_time

        machine_available_time[machine] = end_time
        job_completion[job][step] = end_time
        job_current_step[job] += 1

        if job_current_step[job] == num_machines:
            job_completion_time[job] = end_time

        schedule[machine].append((job, start_time, end_time))

    makespan = max(job_completion_time)
    return schedule, makespan, job_completion_time

# Apply LPT scheduling
lpt_schedule, lpt_makespan, lpt_job_completion_times = apply_lpt_scheduling()

# Compare results
print("\nComparison of Scheduling Rules:")
print(f"SPT Makespan: {makespan}")
print(f"LPT Makespan: {lpt_makespan}")

# Plot comparison
def plot_comparison(spt_makespan, lpt_makespan):
    labels = ['SPT', 'LPT']
    values = [spt_makespan, lpt_makespan]

    fig, ax = plt.subplots(figsize=(8, 5))
    bars = ax.bar(labels, values, color=['#66b3ff', '#ff9999'])

    ax.set_ylabel('Makespan')
    ax.set_title('Comparison of Scheduling Rules')

    # Add value labels on top of bars
    for bar in bars:
        height = bar.get_height()
        ax.text(bar.get_x() + bar.get_width()/2., height + 0.1,
                f'{height}', ha='center', va='bottom')

    ax.grid(axis='y', linestyle='--', alpha=0.7)

    plt.tight_layout()
    return fig

comparison_fig = plot_comparison(makespan, lpt_makespan)

plt.show()

Explanation of the Solution

Problem Setup

I’ve created a job shop scheduling problem with 5 jobs and 4 machines. Each job has:

  1. A specific processing time for each machine
  2. A required sequence through the machines

The mathematical objective is to minimize the makespan ($C_{max}$), which is the time when all jobs are completed:

$$ C_{max} = \max_{j \in \text{Jobs}} C_j $$

Algorithm Implementation

The solution uses two common scheduling heuristics:

  1. Shortest Processing Time (SPT): Prioritizes operations with shorter processing times
  2. Longest Processing Time (LPT): Prioritizes operations with longer processing times

For each scheduling rule, the algorithm:

  1. Tracks machine availability times
  2. Tracks job completion status
  3. Identifies eligible operations (next steps for jobs)
  4. Selects operations based on the priority rule
  5. Updates the schedule until all jobs are complete

Visualization

The code creates three visualizations:

  1. Machine-oriented Gantt chart: Shows when each machine is processing which job
    • X-axis: Time
    • Y-axis: Machines
    • Colors: Different jobs

  2. Job-oriented Gantt chart: Shows the progress of each job through the machines
    • X-axis: Time
    • Y-axis: Jobs
    • Colors: Different machines

  3. Comparison chart: Compares the makespan results of SPT vs LPT scheduling rules

Results Analysis

[Output]

Processing Times:
       Machine 0  Machine 1  Machine 2  Machine 3
Job 0          7          4          8          5
Job 1          7          3          7          8
Job 2          5          4          8          8
Job 3          3          6          5          2
Job 4          8          6          2          5

Machine Orders:
Job 0: Machine 0 -> Machine 3 -> Machine 2 -> Machine 1
Job 1: Machine 0 -> Machine 1 -> Machine 3 -> Machine 2
Job 2: Machine 2 -> Machine 0 -> Machine 3 -> Machine 1
Job 3: Machine 0 -> Machine 3 -> Machine 1 -> Machine 2
Job 4: Machine 3 -> Machine 2 -> Machine 0 -> Machine 1

Schedule Results:
Makespan: 40
Job Completion Times: [29, 40, 33, 17, 39]

Comparison of Scheduling Rules:
SPT Makespan: 40

When you run this code, you’ll see:

  • The problem data (processing times and machine orders)
  • The resulting schedule and makespan for each scheduling rule
  • Visualizations that make it easy to understand the schedules

The SPT rule often achieves better overall throughput, but the LPT rule can outperform it when processing times differ widely across operations.

The comparison chart clearly shows which rule performs better for the specific problem instance.

Key Insights

This implementation demonstrates several important scheduling concepts:

  • Priority rule-based scheduling heuristics
  • Resource constraints (machines can only process one job at a time)
  • Precedence constraints (operations within a job must follow a specific order)
  • Visualization techniques for schedule analysis

This type of scheduling problem appears in manufacturing, project management, computing resource allocation, and many other practical domains.

Graph Coloring Optimization

ILP vs. Heuristic Approaches

I’ll create a different discrete optimization example.
Let’s explore the Graph Coloring Problem, which has applications in scheduling, register allocation, and resource assignment.

Graph Coloring Problem

The Graph Coloring Problem involves assigning colors to vertices of a graph such that no adjacent vertices share the same color, while minimizing the total number of colors used.

Mathematical Formulation

Let’s define:

  • $G = (V, E)$: An undirected graph with vertices $V$ and edges $E$
  • $C$: Set of possible colors
  • $x_{iv}$: Binary variable (1 if vertex $i$ is assigned color $v$, 0 otherwise)
  • $y_v$: Binary variable (1 if color $v$ is used, 0 otherwise)

The objective function is:
$\min \sum_{v \in C} y_v$

Subject to:

  • Each vertex must be assigned exactly one color: $\sum_{v \in C} x_{iv} = 1, \forall i \in V$
  • Adjacent vertices must have different colors: $x_{iv} + x_{jv} \leq 1, \forall (i,j) \in E, \forall v \in C$
  • If color $v$ is assigned to any vertex, then $y_v = 1$: $x_{iv} \leq y_v, \forall i \in V, \forall v \in C$
  • All variables are binary: $x_{iv}, y_v \in \{0, 1\}$
import numpy as np
import matplotlib.pyplot as plt
import networkx as nx
import pulp as pl
from itertools import combinations
import matplotlib.cm as cm
import matplotlib.colors as mcolors

# Set random seed for reproducibility
np.random.seed(42)

# Parameters
num_vertices = 20
edge_probability = 0.2  # Probability of an edge between any two vertices

# Generate a random graph (seed the generator itself so the graph is reproducible)
G = nx.gnp_random_graph(num_vertices, edge_probability, seed=42)

# We can also add some additional edges to make it more challenging
# Add edges between some specific nodes to create more constraints
additional_edges = [(0, 5), (1, 6), (2, 7), (3, 8), (4, 9)]
for u, v in additional_edges:
    if u < num_vertices and v < num_vertices and not G.has_edge(u, v):
        G.add_edge(u, v)

# Print basic graph statistics
print(f"Graph with {G.number_of_nodes()} vertices and {G.number_of_edges()} edges")
print(f"Average degree: {2 * G.number_of_edges() / G.number_of_nodes():.2f}")

# Upper bound on the number of colors needed (worst case)
max_colors = min(num_vertices, max(dict(G.degree()).values()) + 1)
print(f"Maximum number of colors needed (worst case): {max_colors}")

# Create the optimization model
model = pl.LpProblem("Graph_Coloring", pl.LpMinimize)

# Set of colors (using indices)
colors = range(max_colors)

# Decision variables
# x[i, v] = 1 if vertex i is assigned color v, 0 otherwise
x = {(i, v): pl.LpVariable(f"x_{i}_{v}", cat='Binary')
     for i in G.nodes() for v in colors}

# y[v] = 1 if color v is used, 0 otherwise
y = {v: pl.LpVariable(f"y_{v}", cat='Binary') for v in colors}

# Objective function: minimize the number of colors used
model += pl.lpSum(y[v] for v in colors)

# Constraint: each vertex must be assigned exactly one color
for i in G.nodes():
    model += pl.lpSum(x[(i, v)] for v in colors) == 1

# Constraint: adjacent vertices must have different colors
for (i, j) in G.edges():
    for v in colors:
        model += x[(i, v)] + x[(j, v)] <= 1

# Constraint: if a vertex is assigned a color, that color is used
for i in G.nodes():
    for v in colors:
        model += x[(i, v)] <= y[v]

# Constraint: ensure colors are used in order (symmetry breaking)
for v in range(1, max_colors):
    model += y[v] <= y[v-1]

# Solve the model
model.solve(pl.PULP_CBC_CMD(msg=False))

# Extract results
vertex_colors = {}
colors_used = set()

for i in G.nodes():
    for v in colors:
        if pl.value(x[(i, v)]) > 0.5:
            vertex_colors[i] = v
            colors_used.add(v)

num_colors_used = len(colors_used)

print(f"Optimal solution status: {pl.LpStatus[model.status]}")
print(f"Minimum number of colors needed: {num_colors_used}")
print("Vertex coloring:")
for i, color in sorted(vertex_colors.items()):
    print(f"Vertex {i}: Color {color}")

# Compare with greedy coloring
greedy_coloring = nx.greedy_color(G)
num_greedy_colors = max(greedy_coloring.values()) + 1
print(f"Number of colors with greedy algorithm: {num_greedy_colors}")

# Heuristic: DSatur algorithm
dsatur_coloring = nx.coloring.greedy_color(G, strategy="saturation_largest_first")
num_dsatur_colors = max(dsatur_coloring.values()) + 1
print(f"Number of colors with DSatur algorithm: {num_dsatur_colors}")

# Visualization of the colored graph
plt.figure(figsize=(12, 10))

# Choose a color map for visualization
color_names = list(mcolors.TABLEAU_COLORS.keys())
selected_colors = color_names[:max(num_colors_used, num_greedy_colors, num_dsatur_colors)]

# Create position layout for nodes
pos = nx.spring_layout(G, seed=42)

# Create a subplot for the optimal solution
plt.subplot(2, 2, 1)
node_colors_optimal = [vertex_colors[i] for i in G.nodes()]
color_map_optimal = [selected_colors[c] for c in node_colors_optimal]

nx.draw(G, pos, with_labels=True, node_color=color_map_optimal,
        node_size=500, font_color="white", font_weight="bold",
        edge_color="gray", width=2, alpha=0.9)

plt.title(f"Optimal Coloring: {num_colors_used} Colors")

# Create a subplot for the greedy solution
plt.subplot(2, 2, 2)
node_colors_greedy = [greedy_coloring[i] for i in G.nodes()]
color_map_greedy = [selected_colors[c] for c in node_colors_greedy]

nx.draw(G, pos, with_labels=True, node_color=color_map_greedy,
        node_size=500, font_color="white", font_weight="bold",
        edge_color="gray", width=2, alpha=0.9)

plt.title(f"Greedy Coloring: {num_greedy_colors} Colors")

# Create a subplot for the DSatur solution
plt.subplot(2, 2, 3)
node_colors_dsatur = [dsatur_coloring[i] for i in G.nodes()]
color_map_dsatur = [selected_colors[c] for c in node_colors_dsatur]

nx.draw(G, pos, with_labels=True, node_color=color_map_dsatur,
        node_size=500, font_color="white", font_weight="bold",
        edge_color="gray", width=2, alpha=0.9)

plt.title(f"DSatur Coloring: {num_dsatur_colors} Colors")

# Generate a legend for the colors
plt.subplot(2, 2, 4)
plt.axis('off')
for i in range(max(num_colors_used, num_greedy_colors, num_dsatur_colors)):
    plt.plot([0], [0], 'o', color=selected_colors[i], label=f'Color {i}')
plt.legend(loc='center', fontsize=12)
plt.title("Color Legend")

plt.tight_layout()

# Algorithm performance comparison
plt.figure(figsize=(10, 6))
methods = ['Optimal', 'Greedy', 'DSatur']
colors_count = [num_colors_used, num_greedy_colors, num_dsatur_colors]

bars = plt.bar(methods, colors_count, color=['green', 'blue', 'orange'])
plt.ylabel('Number of Colors Used')
plt.title('Comparison of Graph Coloring Algorithms')
plt.grid(axis='y', linestyle='--', alpha=0.7)

# Add value labels on top of bars
for bar in bars:
    height = bar.get_height()
    plt.text(bar.get_x() + bar.get_width()/2., height + 0.05,
             f'{height}', ha='center', va='bottom')

# Graph statistics visualization
plt.figure(figsize=(12, 6))

# Calculate degree distribution
degree_sequence = sorted([d for n, d in G.degree()], reverse=True)
degree_count = {}
for d in degree_sequence:
    if d in degree_count:
        degree_count[d] += 1
    else:
        degree_count[d] = 1

plt.subplot(1, 2, 1)
plt.bar(degree_count.keys(), degree_count.values(), color='skyblue')
plt.title('Degree Distribution')
plt.xlabel('Degree')
plt.ylabel('Count')
plt.grid(axis='y', linestyle='--', alpha=0.7)

# Create conflict matrix visualization
plt.subplot(1, 2, 2)
adj_matrix = nx.to_numpy_array(G)
plt.imshow(adj_matrix, cmap='Blues', interpolation='none')
plt.colorbar(label='Connection')
plt.title('Adjacency Matrix')
plt.xlabel('Vertex')
plt.ylabel('Vertex')

plt.tight_layout()

# Chromatic number analysis based on graph density
plt.figure(figsize=(10, 6))

# Calculate graph density
density = nx.density(G)
print(f"Graph density: {density:.3f}")

# Theoretical bounds
vertices = G.number_of_nodes()
edges = G.number_of_edges()
max_degree = max(dict(G.degree()).values())
brooks_bound = max(2, min(vertices, max_degree))

# Generate points for the theoretical bounds
densities = np.linspace(0, 1, 100)
clique_bounds = [max(2, int(vertices * d) + 1) for d in densities]  # Rough estimate based on clique number

plt.plot(densities, clique_bounds, 'r--', label='Theoretical upper bound')
plt.scatter([density], [num_colors_used], color='green', s=100, label='Optimal solution')
plt.scatter([density], [num_greedy_colors], color='blue', s=100, label='Greedy algorithm')
plt.scatter([density], [num_dsatur_colors], color='orange', s=100, label='DSatur algorithm')
plt.scatter([density], [brooks_bound], color='purple', s=100, label="Brooks' bound")

plt.axvline(x=density, color='gray', linestyle='--', alpha=0.5)
plt.text(density+0.02, 2, f'Current density: {density:.3f}', rotation=90, alpha=0.7)

plt.xlabel('Graph Density')
plt.ylabel('Chromatic Number (Colors Required)')
plt.title('Relationship Between Graph Density and Chromatic Number')
plt.legend()
plt.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

Code Explanation

Let me walk through the key components of this solution:

  1. Problem Setup:

    • We generate a random graph with a specified number of vertices and edge probability
    • The goal is to color the vertices such that no adjacent vertices share the same color, while minimizing the total number of colors used
  2. Mathematical Model:

    • We use PuLP to formulate a binary integer programming model
    • Decision variables represent the assignment of colors to vertices and whether each color is used
    • The objective function minimizes the total number of colors used
    • Constraints ensure:
      • Each vertex gets exactly one color
      • Adjacent vertices have different colors
      • Colors are used in sequence (to break symmetry)
  3. Solution Approaches:

    • Optimal method: Integer Linear Programming using PuLP
    • Greedy method: A simple sequential coloring heuristic
    • DSatur algorithm: A more sophisticated heuristic that prioritizes vertices with the highest saturation degree
  4. Visualizations:

    • Colored graph visualizations showing the optimal and heuristic solutions
    • A comparison chart of the number of colors used by different algorithms
    • Degree distribution and adjacency matrix to understand the graph structure
    • An analysis of the relationship between graph density and chromatic number
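To make the DSatur idea concrete, here is a minimal hand-rolled version (a sketch; the code above relies on networkx's built-in `saturation_largest_first` strategy instead):

```python
import networkx as nx

def dsatur(G):
    """Color vertices one at a time, always picking the uncolored vertex whose
    neighbors already use the most distinct colors (highest saturation)."""
    coloring = {}
    while len(coloring) < G.number_of_nodes():
        # Saturation = number of distinct colors among colored neighbors;
        # ties are broken by degree, as DSatur prescribes.
        v = max((n for n in G.nodes() if n not in coloring),
                key=lambda n: (len({coloring[u] for u in G[n] if u in coloring}),
                               G.degree(n)))
        forbidden = {coloring[u] for u in G[v] if u in coloring}
        coloring[v] = next(c for c in range(G.number_of_nodes()) if c not in forbidden)
    return coloring

G = nx.petersen_graph()
coloring = dsatur(G)
assert all(coloring[u] != coloring[v] for u, v in G.edges())
print(max(coloring.values()) + 1)
```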

Results

Graph with 20 vertices and 42 edges
Average degree: 4.20
Maximum number of colors needed (worst case): 9
Optimal solution status: Optimal
Minimum number of colors needed: 4
Vertex coloring:
Vertex 0: Color 0
Vertex 1: Color 2
Vertex 2: Color 1
Vertex 3: Color 2
Vertex 4: Color 1
Vertex 5: Color 1
Vertex 6: Color 3
Vertex 7: Color 0
Vertex 8: Color 0
Vertex 9: Color 3
Vertex 10: Color 2
Vertex 11: Color 1
Vertex 12: Color 0
Vertex 13: Color 2
Vertex 14: Color 0
Vertex 15: Color 3
Vertex 16: Color 1
Vertex 17: Color 1
Vertex 18: Color 2
Vertex 19: Color 1
Number of colors with greedy algorithm: 4
Number of colors with DSatur algorithm: 4
Graph density: 0.221




Results Interpretation

The solution provides several key insights:

  1. Optimal Coloring: The integer programming approach finds the minimum number of colors needed (chromatic number of the graph).

  2. Heuristic Performance: We can see how well the greedy and DSatur heuristics perform compared to the optimal solution.
    For many graphs, DSatur performs very well with much less computational effort.

  3. Graph Structure Impact: The degree distribution, density, and connectivity patterns of the graph directly affect how many colors are needed.

  4. Theoretical Bounds: We compare the results with Brooks’ bound (at most the maximum degree $\Delta$ for connected graphs that are neither complete nor odd cycles) and density-based estimates.
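As a quick sanity check of these bounds on a comparable random graph (a sketch, independent of the analysis code above; the seed is chosen here only for determinism):

```python
import networkx as nx

# A random graph of the same size and density as in the article.
G = nx.gnp_random_graph(20, 0.2, seed=42)
max_degree = max(d for _, d in G.degree())
greedy_bound = max_degree + 1  # greedy coloring never needs more than Δ + 1 colors

dsatur_colors = max(nx.greedy_color(G, strategy="saturation_largest_first").values()) + 1
assert dsatur_colors <= greedy_bound
print(f"Δ = {max_degree}, greedy bound = {greedy_bound}, DSatur used = {dsatur_colors}")
```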

Real-World Applications

The Graph Coloring Problem has numerous practical applications:

  1. Scheduling: Assigning time slots to events, classes, or exams to avoid conflicts.

  2. Frequency Assignment: Allocating radio frequencies to transmitters to avoid interference.

  3. Register Allocation: Assigning variables to CPU registers in compiler optimization.

  4. Map Coloring: Coloring geographical maps such that no adjacent regions share the same color.

  5. Task Assignment: Allocating incompatible tasks to different time slots or resources.

  6. Circuit Board Testing: Minimizing the number of tests needed for a printed circuit board.

  7. Sports Scheduling: Creating tournament schedules where teams must play against each other.

This problem demonstrates how discrete optimization can help solve complex allocation problems where resources must be assigned while respecting conflict constraints.

The comparisons between exact and heuristic methods illustrate the trade-offs between solution quality and computational efficiency that are typical in many real-world optimization scenarios.

Quantum Harmonic Oscillator

Numerical Simulation and Visualization with Python

I’ll solve a practical quantum mechanics problem using $Python$ in Google Colab, including clear explanations and visualization.

import numpy as np
import matplotlib.pyplot as plt
from scipy import sparse
import scipy.sparse.linalg  # needed so that sparse.linalg.eigs is available
from scipy.linalg import expm
from scipy.special import hermite, factorial

# Set plotting parameters for better visualization
plt.rcParams['figure.figsize'] = [12, 8]
plt.rcParams['font.size'] = 14

# Physical constants
hbar = 1.0   # Reduced Planck constant (using natural units)
m = 1.0      # Mass of the particle
omega = 1.0  # Angular frequency of the oscillator

# Simulation parameters
n_points = 1000  # Number of position points for discretization
x_range = 10.0   # Position range for simulation
dx = 2.0 * x_range / n_points  # Position step size
x = np.linspace(-x_range, x_range, n_points)  # Position array

# Create potential energy function (harmonic oscillator: V(x) = 0.5 * m * omega^2 * x^2)
V = 0.5 * m * omega**2 * x**2

# Create second derivative operator (kinetic energy part)
# Central differences: psi'' ~ (psi_{i-1} - 2*psi_i + psi_{i+1}) / dx^2
diag = np.ones(n_points) * (-2.0 / dx**2)
off_diag = np.ones(n_points-1) * (1.0 / dx**2)
D2 = sparse.diags([off_diag, diag, off_diag], [-1, 0, 1])

# Create Hamiltonian operator: H = -hbar^2/(2m) * d^2/dx^2 + V(x)
H = sparse.diags(V) - (hbar**2 / (2.0 * m)) * D2

# Find eigenvalues and eigenvectors of the Hamiltonian
eigenvalues, eigenvectors = sparse.linalg.eigs(H, k=10, which='SM')  # Find lowest 10 energy states
idx = eigenvalues.argsort()  # Sort by energy
eigenvalues = eigenvalues[idx].real
eigenvectors = eigenvectors[:, idx]

# Normalize eigenvectors
for i in range(len(eigenvalues)):
    eigenvectors[:, i] = eigenvectors[:, i] / np.sqrt(np.sum(np.abs(eigenvectors[:, i])**2) * dx)

# Analytical solutions for comparison
def analytical_wavefunction(n, x):
    """
    Calculate the analytical wavefunction for quantum harmonic oscillator
    for quantum number n at position x
    """
    # Constants for the harmonic oscillator
    alpha = m * omega / hbar

    # Normalization constant
    normalization = (alpha / np.pi)**0.25 * (1.0 / np.sqrt(2**n * factorial(n)))

    # Hermite polynomial
    herm = hermite(n)(np.sqrt(alpha) * x)

    # Full wavefunction
    psi = normalization * herm * np.exp(-alpha * x**2 / 2.0)

    return psi

# Calculate analytical energy levels
def analytical_energy(n):
    """Calculate analytical energy for quantum harmonic oscillator with quantum number n"""
    return hbar * omega * (n + 0.5)

# Compute analytical solutions
analytical_energies = [analytical_energy(n) for n in range(10)]
analytical_wavefunctions = [analytical_wavefunction(n, x) for n in range(10)]

# Plot and compare numerical and analytical solutions
fig, axs = plt.subplots(5, 2, figsize=(15, 20))
axs = axs.flatten()

for i in range(10):
    axs[i].plot(x, eigenvectors[:, i].real, 'b-', label=f'Numerical ψ{i}')
    axs[i].plot(x, analytical_wavefunctions[i], 'r--', label=f'Analytical ψ{i}')
    axs[i].axhline(y=0, color='k', linestyle='-', alpha=0.3)
    axs[i].set_title(f'Energy Level {i}: E_numerical = {eigenvalues[i]:.5f}, E_analytical = {analytical_energies[i]:.5f}')
    axs[i].set_xlabel('Position (x)')
    axs[i].set_ylabel('Wave Function (ψ)')
    axs[i].legend()
    axs[i].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Time evolution of a wavepacket
# Create a wavepacket as initial state (superposition of ground and first excited states)
psi_0 = 0.7 * eigenvectors[:, 0] + 0.7 * eigenvectors[:, 1]
psi_0 = psi_0 / np.sqrt(np.sum(np.abs(psi_0)**2) * dx)  # Normalize

# Time parameters
t_total = 10.0  # Total simulation time
n_steps = 100   # Number of time steps
dt = t_total / n_steps  # Time step

# Store wavefunction at different times
psi_t = np.zeros((n_steps, n_points), dtype=complex)
psi_t[0] = psi_0

# Time evolution operator
def time_evolution(psi, H, dt):
    """Time evolution using the operator exp(-i*H*dt/hbar)"""
    # We use the fact that eigenvectors of H provide a basis for expansion
    coeffs = np.zeros(10, dtype=complex)
    for i in range(10):
        coeffs[i] = np.sum(np.conj(eigenvectors[:, i]) * psi) * dx

    # Evolve each component and recombine
    psi_evolved = np.zeros_like(psi, dtype=complex)
    for i in range(10):
        psi_evolved += coeffs[i] * np.exp(-1j * eigenvalues[i] * dt / hbar) * eigenvectors[:, i]

    return psi_evolved

# Perform time evolution
for t in range(1, n_steps):
    psi_t[t] = time_evolution(psi_t[t-1], H, dt)

# Probability densities for animation
prob_densities = np.abs(psi_t)**2

# Plot time evolution
fig, ax = plt.subplots(figsize=(12, 8))

# Plot potential energy curve (scaled for visibility)
V_scaled = V / np.max(V) * np.max(prob_densities) * 0.8
ax.plot(x, V_scaled, 'k-', alpha=0.3, label='Potential (scaled)')

# Select specific timesteps to plot
time_points = [0, 25, 50, 75, 99]
colors = ['b', 'g', 'r', 'c', 'm']

for i, t in enumerate(time_points):
    ax.plot(x, prob_densities[t], colors[i], label=f't = {t*dt:.2f}')

ax.set_xlabel('Position (x)')
ax.set_ylabel('Probability Density |ψ(x,t)|²')
ax.set_title('Time Evolution of Quantum Harmonic Oscillator Wavepacket')
ax.legend()
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Create an energy level diagram
n_levels = 10
fig, ax = plt.subplots(figsize=(8, 10))

# Plot energy levels
for n in range(n_levels):
    E = analytical_energy(n)
    ax.plot([-0.5, 0.5], [E, E], 'b-', linewidth=2)
    ax.text(0.6, E, f'n = {n}, E = {E:.2f}ħω', va='center')

# Add potential well visualization
x_pot = np.linspace(-0.4, 0.4, 100)
V_pot = 0.5 * m * omega**2 * x_pot**2
ax.plot(x_pot, V_pot, 'r--', label='V(x) = ½mω²x²')

ax.set_xlim(-1, 1.5)
ax.set_ylim(-0.5, analytical_energy(n_levels-1) + 1)
ax.set_ylabel('Energy (in units of ħω)')
ax.set_title('Energy Levels of Quantum Harmonic Oscillator')
ax.set_xticks([])
ax.legend(loc='upper right')
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

Quantum Harmonic Oscillator Simulation

In this example, I’ll solve the quantum harmonic oscillator, one of the most important systems in quantum mechanics.

It models a particle moving in a parabolic potential well, like a mass on a spring at the quantum scale.

Theoretical Background

The quantum harmonic oscillator is described by the Hamiltonian:

$$\hat{H} = -\frac{\hbar^2}{2m}\frac{d^2}{dx^2} + \frac{1}{2}m\omega^2 x^2$$

Where:

  • $\hbar$ is the reduced Planck constant
  • $m$ is the mass of the particle
  • $\omega$ is the angular frequency of the oscillator
  • $x$ is the position

The energy eigenvalues are given by:

$$E_n = \hbar\omega(n + \frac{1}{2})$$

And the wavefunctions are:

$$\psi_n(x) = \left(\frac{\alpha}{\pi}\right)^{1/4}\frac{1}{\sqrt{2^n n!}}H_n(\sqrt{\alpha}x)e^{-\alpha x^2/2}$$

Where:

  • $\alpha = \frac{m\omega}{\hbar}$
  • $H_n$ are the Hermite polynomials
  • $n$ is the quantum number $(0, 1, 2, …)$
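As a quick sanity check of these formulas, the analytical eigenfunctions can be evaluated directly with `scipy.special.hermite`; this is a minimal sketch assuming natural units $\hbar = m = \omega = 1$ (an assumption for illustration, matching the convention used elsewhere in this article):

```python
import numpy as np
from math import factorial
from scipy.special import hermite

hbar = m = omega = 1.0           # natural units (assumption)
alpha = m * omega / hbar

def psi_n(n, x):
    """Analytical harmonic-oscillator eigenfunction ψ_n(x)."""
    Hn = hermite(n)              # physicists' Hermite polynomial H_n
    norm = (alpha / np.pi)**0.25 / np.sqrt(2.0**n * factorial(n))
    return norm * Hn(np.sqrt(alpha) * x) * np.exp(-alpha * x**2 / 2)

# Each state should be normalized: ∫|ψ_n|² dx = 1
x = np.linspace(-10, 10, 4001)
dx = x[1] - x[0]
for n in range(3):
    print(np.sum(psi_n(n, x)**2) * dx)   # ≈ 1.0 for every n
```

The normalization check also guards against a common pitfall: `scipy.special.hermite` returns the physicists' (not probabilists') polynomials, which is what the formula above requires.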

Code Explanation

The code solves this problem using two methods:

  1. Numerical solution: Using a discretized position grid and sparse matrix techniques
  2. Analytical solution: Calculating the exact expressions for comparison

Key Components:

  1. Set up the position grid and potential:

    • Create a discretized position space
    • Define the harmonic potential $V(x) = \frac{1}{2}m\omega^2 x^2$
  2. Create the Hamiltonian matrix:

    • The kinetic energy term uses a finite difference approximation for the second derivative
    • The potential energy is added as a diagonal matrix
  3. Find eigenvalues and eigenvectors:

    • Compute the lowest $10$ energy states
    • Sort them by energy
  4. Compare with analytical solutions:

    • Calculate the exact energy levels $E_n = \hbar\omega(n + \frac{1}{2})$
    • Calculate the exact wavefunctions using Hermite polynomials
  5. Simulate time evolution:

    • Create a wavepacket as a superposition of ground and first excited states
    • Evolve it through time and visualize the probability density

Results and Visualization

The code produces three main visualizations:

  1. Wavefunction Comparison:
    The first plot shows the first $10$ energy eigenstates, comparing the numerical solutions (blue solid lines) with the analytical solutions (red dashed lines).
    The close match verifies our numerical method is working correctly.

  2. Time Evolution:
    The second plot shows the time evolution of a wavepacket (initially a superposition of ground and first excited states).
    The probability density is shown at five different times, demonstrating the oscillatory behavior of the wavepacket.

  3. Energy Level Diagram:
    The final plot shows the quantized energy levels of the harmonic oscillator.
    Notice how they are equally spaced by $\hbar\omega$, which is a key characteristic of the quantum harmonic oscillator.

Physical Interpretation

  1. Quantization of Energy: Unlike a classical oscillator that can have any energy, the quantum harmonic oscillator can only have discrete energy values.

  2. Zero-Point Energy: Even in the ground state (n=0), the energy is $E_0 = \frac{1}{2}\hbar\omega$, not zero. This represents the quantum mechanical uncertainty principle at work.

  3. Probability Distribution: The squared magnitude of the wavefunction, $|\psi(x)|^2$, gives the probability density of finding the particle at a particular position.
    For the ground state, this density is concentrated around the center, but higher-energy states have nodes where the probability vanishes.

  4. Time Evolution: The time-dependent wavepacket oscillates back and forth, similar to a classical oscillator, but maintains its quantum nature with the probability spread according to the uncertainty principle.

This simulation provides a clear visualization of these fundamental quantum mechanical concepts using a system that has direct analogies to classical mechanics.

Geometric Brownian Motion

Stock Price Simulation and Probability Analysis

I’ll solve a practical probability theory problem using $Python$, present the mathematical formulas, and visualize the results with graphs.

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats

# Set random seed for reproducibility
np.random.seed(42)

# Parameters for our stock price model
initial_price = 100
drift = 0.0002 # Daily drift (expected return)
volatility = 0.015 # Daily volatility
trading_days = 252 # Number of trading days in a year
num_simulations = 1000 # Number of simulations to run
time_horizon = 252 # Simulate for 1 year

# Create an array to store all simulation results
all_simulations = np.zeros((num_simulations, time_horizon + 1))
all_simulations[:, 0] = initial_price # Set initial price

# Perform the simulations using Geometric Brownian Motion
for i in range(num_simulations):
    # Generate random shocks
    random_shocks = np.random.normal(0, 1, time_horizon)

    # Calculate daily returns
    daily_returns = np.exp(drift - 0.5 * volatility**2 + volatility * random_shocks)

    # Calculate price path
    price_path = initial_price * np.cumprod(daily_returns)

    # Store the simulation
    all_simulations[i, 1:] = price_path

# Calculate statistics
final_prices = all_simulations[:, -1]
mean_final_price = np.mean(final_prices)
median_final_price = np.median(final_prices)
std_final_price = np.std(final_prices)

# Calculate probability of ending above certain thresholds
prob_above_initial = np.mean(final_prices > initial_price)
prob_above_110 = np.mean(final_prices > 110)
prob_above_120 = np.mean(final_prices > 120)
prob_below_90 = np.mean(final_prices < 90)

# Calculate confidence intervals
confidence_level = 0.95
lower_percentile = (1 - confidence_level) / 2
upper_percentile = 1 - lower_percentile
confidence_interval = np.percentile(final_prices, [lower_percentile * 100, upper_percentile * 100])

# Plotting
plt.figure(figsize=(15, 10))

# Plot 1: Sample paths
plt.subplot(2, 2, 1)
for i in range(50): # Plot just 50 sample paths for clarity
    plt.plot(all_simulations[i], linewidth=0.8, alpha=0.6)
plt.title('Sample Stock Price Paths')
plt.xlabel('Trading Days')
plt.ylabel('Stock Price')
plt.grid(True)

# Plot 2: Histogram of final prices
plt.subplot(2, 2, 2)
sns.histplot(final_prices, kde=True, bins=30)
plt.axvline(initial_price, color='r', linestyle='--', label='Initial Price')
plt.axvline(mean_final_price, color='g', linestyle='-', label='Mean Final Price')
plt.axvline(confidence_interval[0], color='orange', linestyle=':', label=f'{confidence_level*100}% CI Lower')
plt.axvline(confidence_interval[1], color='orange', linestyle=':', label=f'{confidence_level*100}% CI Upper')
plt.title('Distribution of Final Stock Prices')
plt.xlabel('Stock Price')
plt.ylabel('Frequency')
plt.legend()
plt.grid(True)

# Plot 3: Empirical CDF
plt.subplot(2, 2, 3)
x = np.sort(final_prices)
y = np.arange(1, len(x) + 1) / len(x)
plt.plot(x, y)
plt.axvline(initial_price, color='r', linestyle='--', label='Initial Price')
plt.axhline(prob_above_initial, color='b', linestyle='--')
plt.axvline(110, color='g', linestyle='--', label='$110')
plt.axhline(prob_above_110, color='b', linestyle=':')
plt.title('Empirical Cumulative Distribution Function')
plt.xlabel('Stock Price')
plt.ylabel('Probability')
plt.grid(True)
plt.legend()

# Plot 4: Q-Q plot to check for normality of returns
plt.subplot(2, 2, 4)
log_returns = np.log(all_simulations[:, 1:] / all_simulations[:, :-1])
log_returns = log_returns.flatten()
stats.probplot(log_returns, dist="norm", plot=plt)
plt.title('Q-Q Plot of Log Returns')
plt.grid(True)

plt.tight_layout()
plt.savefig('stock_price_simulation.png', dpi=300)
plt.show()

# Display the statistical results
print(f"Initial stock price: ${initial_price:.2f}")
print(f"Mean final price: ${mean_final_price:.2f}")
print(f"Median final price: ${median_final_price:.2f}")
print(f"Standard deviation of final price: ${std_final_price:.2f}")
print(f"95% confidence interval: [${confidence_interval[0]:.2f}, ${confidence_interval[1]:.2f}]")
print(f"Probability of ending above initial price (${initial_price:.2f}): {prob_above_initial:.2%}")
print(f"Probability of ending above $110: {prob_above_110:.2%}")
print(f"Probability of ending above $120: {prob_above_120:.2%}")
print(f"Probability of ending below $90: {prob_below_90:.2%}")

Stock Price Movement Simulation Using Geometric Brownian Motion

In this example, I’ll simulate stock price movements using a probability model called Geometric Brownian Motion (GBM), which is widely used in financial mathematics and forms the foundation of the Black-Scholes option pricing model.

Mathematical Model

The stock price movement follows Geometric Brownian Motion, which can be expressed as:

$$dS_t = \mu S_t dt + \sigma S_t dW_t$$

Where:

  • $S_t$ is the stock price at time $t$
  • $\mu$ is the drift (expected return)
  • $\sigma$ is the volatility
  • $W_t$ is a Wiener process (standard Brownian motion)

The solution to this stochastic differential equation gives us the stock price at time $t$:

$$S_t = S_0 \exp\left(\left(\mu - \frac{\sigma^2}{2}\right)t + \sigma W_t\right)$$

For discrete time simulations, we can use:

$$S_{t+\Delta t} = S_t \exp\left(\left(\mu - \frac{\sigma^2}{2}\right)\Delta t + \sigma \sqrt{\Delta t} Z_t\right)$$

Where $Z_t \sim N(0,1)$ is a standard normal random variable.
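A minimal check of this update rule: over one step, the simulated mean should match $E[S_{t+\Delta t}] = S_t e^{\mu \Delta t}$ and the simulated median should match $S_t e^{(\mu - \sigma^2/2)\Delta t}$ (the parameter values below are illustrative annual figures, not the article's daily ones):

```python
import numpy as np

np.random.seed(0)
S0, mu, sigma, dt = 100.0, 0.05, 0.2, 1.0   # illustrative parameters (assumption)
n = 200_000

# One discrete GBM step for n independent paths
Z = np.random.standard_normal(n)
S1 = S0 * np.exp((mu - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * Z)

print(S1.mean())       # close to 100·e^0.05  ≈ 105.13
print(np.median(S1))   # close to 100·e^0.03  ≈ 103.05
```

The gap between mean and median is the lognormal skew at work: the $-\sigma^2/2$ correction keeps the *expected* price at $S_0 e^{\mu \Delta t}$ even though the *typical* (median) path grows more slowly.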

Code Explanation

  1. Setup and Parameters:

    • We set initial stock price to $100
    • Drift (µ) of $0.0002$ per day (approximately $5$% annual return)
    • Volatility (σ) of $0.015$ per day (approximately $24$% annualized)
    • Simulating $1,000$ possible price paths over $252$ trading days ($1$ year)
  2. Simulation:

    • We generate random shocks from a normal distribution
    • Calculate daily returns using the GBM formula
    • Compute cumulative price paths starting from the initial price
    • Store all simulations in an array
  3. Statistical Analysis:

    • Calculate mean, median, and standard deviation of final prices
    • Determine probabilities of the price exceeding certain thresholds
    • Calculate 95% confidence intervals
  4. Visualization:

    • Sample price paths to visualize potential trajectories
    • Histogram showing the distribution of final prices
    • Empirical cumulative distribution function (ECDF)
    • Q-Q plot to check if returns follow a normal distribution

Initial stock price: $100.00
Mean final price: $105.10
Median final price: $103.21
Standard deviation of final price: $24.78
95% confidence interval: [$63.76, $159.43]
Probability of ending above initial price ($100.00): 56.10%
Probability of ending above $110: 37.20%
Probability of ending above $120: 24.60%
Probability of ending below $90: 29.10%

Key Insights from the Visualization

  1. Sample Paths:
    The first plot shows $50$ possible price paths, illustrating the random nature of stock movements while maintaining the overall drift.

  2. Final Price Distribution:
    The histogram shows that final prices follow a log-normal distribution (which is expected from GBM).
    The mean final price is typically higher than the median due to the right-skewed nature of the distribution.

  3. Probabilities:
    The ECDF graph shows cumulative probabilities, allowing us to read off values like:

    • Probability of ending above $100 (initial price)
    • Probability of ending above $110
    • Probability of ending below $90
  4. Log Returns Normality:
    The Q-Q plot confirms that log returns closely follow a normal distribution, which is a key assumption of the GBM model.

Practical Applications

This simulation helps answer questions like:

  • What is the probability of a stock gaining more than $10$% in a year?
  • What is the expected range of stock prices after one year (confidence interval)?
  • What is the risk of the stock falling below a certain threshold?

These probability calculations are essential for risk management, option pricing, portfolio optimization, and investment decision-making.

The simulation demonstrates how seemingly complex financial phenomena can be modeled using probability theory and simulated using $Python$, providing actionable insights for investors and risk managers.

Option Pricing Analysis

Implementing the Black-Scholes Model with Python

I’ll provide you with a financial mathematics example and solve it using $Python$, including proper mathematical notation and visualization.

import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats
from matplotlib import cm
import pandas as pd
from mpl_toolkits.mplot3d import Axes3D

# Black-Scholes Option Pricing Model
def black_scholes(S, K, T, r, sigma, option='call'):
    """
    Calculate Black-Scholes option price for a call or put

    Parameters:
    S: Current stock price
    K: Strike price
    T: Time to maturity (in years)
    r: Risk-free interest rate
    sigma: Volatility
    option: 'call' or 'put'

    Returns:
    Option price
    """
    d1 = (np.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * np.sqrt(T))
    d2 = d1 - sigma * np.sqrt(T)

    if option == 'call':
        price = S * stats.norm.cdf(d1) - K * np.exp(-r * T) * stats.norm.cdf(d2)
    else:
        price = K * np.exp(-r * T) * stats.norm.cdf(-d2) - S * stats.norm.cdf(-d1)

    return price

# Example 1: Visualize call option price as a function of stock price and time to maturity
# Parameters
S_range = np.linspace(80, 120, 50) # Stock price range
T_range = np.linspace(0.1, 1, 50) # Time to maturity range
K = 100 # Strike price
r = 0.05 # Risk-free rate
sigma = 0.2 # Volatility

# Create meshgrid
S_mesh, T_mesh = np.meshgrid(S_range, T_range)
option_price = np.zeros_like(S_mesh)

# Calculate option price for each combination
for i in range(len(T_range)):
    for j in range(len(S_range)):
        option_price[i, j] = black_scholes(S_mesh[i, j], K, T_mesh[i, j], r, sigma, 'call')

# Plot the 3D surface
fig = plt.figure(figsize=(12, 8))
ax = fig.add_subplot(111, projection='3d')
surf = ax.plot_surface(S_mesh, T_mesh, option_price, cmap=cm.coolwarm, alpha=0.8)

# Add labels
ax.set_xlabel('Stock Price ($)')
ax.set_ylabel('Time to Maturity (years)')
ax.set_zlabel('Call Option Price ($)')
ax.set_title('Black-Scholes Call Option Price')

# Add a color bar
fig.colorbar(surf, ax=ax, shrink=0.5, aspect=5)

plt.savefig('black_scholes_3d.png')
plt.show()

# Example 2: Visualize the Greeks (Delta, Gamma, Theta)
def bs_delta(S, K, T, r, sigma, option='call'):
    """Calculate option Delta"""
    d1 = (np.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * np.sqrt(T))

    if option == 'call':
        return stats.norm.cdf(d1)
    else:
        return stats.norm.cdf(d1) - 1

def bs_gamma(S, K, T, r, sigma):
    """Calculate option Gamma (same for calls and puts)"""
    d1 = (np.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * np.sqrt(T))
    return stats.norm.pdf(d1) / (S * sigma * np.sqrt(T))

def bs_theta(S, K, T, r, sigma, option='call'):
    """Calculate option Theta"""
    d1 = (np.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * np.sqrt(T))
    d2 = d1 - sigma * np.sqrt(T)

    if option == 'call':
        theta = -S * stats.norm.pdf(d1) * sigma / (2 * np.sqrt(T)) - r * K * np.exp(-r * T) * stats.norm.cdf(d2)
    else:
        theta = -S * stats.norm.pdf(d1) * sigma / (2 * np.sqrt(T)) + r * K * np.exp(-r * T) * stats.norm.cdf(-d2)

    return theta / 365 # Convert to daily

# Calculate Greeks for various stock prices
S_range = np.linspace(80, 120, 100)
T = 0.5 # 6 months

delta_call = [bs_delta(S, K, T, r, sigma, 'call') for S in S_range]
delta_put = [bs_delta(S, K, T, r, sigma, 'put') for S in S_range]
gamma = [bs_gamma(S, K, T, r, sigma) for S in S_range]
theta_call = [bs_theta(S, K, T, r, sigma, 'call') for S in S_range]
theta_put = [bs_theta(S, K, T, r, sigma, 'put') for S in S_range]

# Plot the Greeks
fig, axs = plt.subplots(3, 1, figsize=(12, 15))

# Delta
axs[0].plot(S_range, delta_call, 'b-', label='Call Delta')
axs[0].plot(S_range, delta_put, 'r-', label='Put Delta')
axs[0].axhline(y=0, color='k', linestyle='-', alpha=0.3)
axs[0].axvline(x=K, color='k', linestyle='--', alpha=0.3)
axs[0].set_xlabel('Stock Price ($)')
axs[0].set_ylabel('Delta')
axs[0].set_title('Option Delta vs Stock Price')
axs[0].grid(True)
axs[0].legend()

# Gamma
axs[1].plot(S_range, gamma, 'g-')
axs[1].axhline(y=0, color='k', linestyle='-', alpha=0.3)
axs[1].axvline(x=K, color='k', linestyle='--', alpha=0.3)
axs[1].set_xlabel('Stock Price ($)')
axs[1].set_ylabel('Gamma')
axs[1].set_title('Option Gamma vs Stock Price')
axs[1].grid(True)

# Theta
axs[2].plot(S_range, theta_call, 'b-', label='Call Theta')
axs[2].plot(S_range, theta_put, 'r-', label='Put Theta')
axs[2].axhline(y=0, color='k', linestyle='-', alpha=0.3)
axs[2].axvline(x=K, color='k', linestyle='--', alpha=0.3)
axs[2].set_xlabel('Stock Price ($)')
axs[2].set_ylabel('Theta ($ per day)')
axs[2].set_title('Option Theta vs Stock Price')
axs[2].grid(True)
axs[2].legend()

plt.tight_layout()
plt.savefig('option_greeks.png')
plt.show()

# Example 3: Monte Carlo simulation for option pricing
def monte_carlo_option_pricing(S0, K, T, r, sigma, num_simulations=10000, option='call'):
    """
    Price options using Monte Carlo simulation

    Parameters:
    S0: Initial stock price
    K: Strike price
    T: Time to maturity (in years)
    r: Risk-free interest rate
    sigma: Volatility
    num_simulations: Number of simulation paths
    option: 'call' or 'put'

    Returns:
    Option price and the simulated terminal stock prices
    """
    # Generate random simulations of stock price at expiration
    z = np.random.standard_normal(num_simulations)
    ST = S0 * np.exp((r - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * z)

    # Calculate payoff at expiration
    if option == 'call':
        payoff = np.maximum(ST - K, 0)
    else:
        payoff = np.maximum(K - ST, 0)

    # Calculate present value
    option_price = np.exp(-r * T) * np.mean(payoff)

    return option_price, ST

# Run Monte Carlo simulation
np.random.seed(42) # For reproducibility
S0 = 100
num_simulations = 10000

mc_price, ST = monte_carlo_option_pricing(S0, K, T, r, sigma, num_simulations, 'call')
bs_price = black_scholes(S0, K, T, r, sigma, 'call')

# Create histogram of simulated stock prices at expiration
plt.figure(figsize=(12, 6))
plt.hist(ST, bins=50, alpha=0.5, color='blue', density=True)

# Overlay the theoretical lognormal density of S_T under the risk-neutral measure
x = np.linspace(min(ST), max(ST), 100)
shape = sigma * np.sqrt(T)                      # lognormal shape parameter
scale = S0 * np.exp((r - 0.5 * sigma**2) * T)   # scale = median of S_T
plt.plot(x, stats.lognorm.pdf(x, shape, scale=scale), 'r-', linewidth=2)

plt.axvline(x=K, color='k', linestyle='--', label=f'Strike Price (K={K})')
plt.axvline(x=S0, color='g', linestyle='-', label=f'Initial Price (S0={S0})')

plt.title(f'Monte Carlo Simulation of Stock Prices at Expiration (T={T} years)')
plt.xlabel('Stock Price at Expiration ($)')
plt.ylabel('Probability Density')
plt.legend()
plt.grid(True)
plt.savefig('monte_carlo_simulation.png')
plt.show()

# Compare Monte Carlo with Black-Scholes
print(f"Monte Carlo Call Option Price: ${mc_price:.4f}")
print(f"Black-Scholes Call Option Price: ${bs_price:.4f}")
print(f"Difference: ${abs(mc_price - bs_price):.4f}")

Black-Scholes Option Pricing Model in Financial Mathematics

In this example, I’ll solve and analyze a classic financial mathematics problem: option pricing using the Black-Scholes model.

The Black-Scholes model is fundamental in financial mathematics for pricing European-style options.

Mathematical Background

The Black-Scholes formula for European call and put options is given by:

For a call option:
$$C(S,t) = S \cdot N(d_1) - K e^{-r(T-t)} \cdot N(d_2)$$

For a put option:
$$P(S,t) = K e^{-r(T-t)} \cdot N(-d_2) - S \cdot N(-d_1)$$

Where:
$$d_1 = \frac{\ln(S/K) + (r + \sigma^2/2)(T-t)}{\sigma\sqrt{T-t}}$$
$$d_2 = d_1 - \sigma\sqrt{T-t}$$

And:

  • $S$ is the current stock price
  • $K$ is the strike price
  • $r$ is the risk-free interest rate
  • $T-t$ is the time to maturity
  • $\sigma$ is the volatility of the stock
  • $N(·)$ is the cumulative distribution function of the standard normal distribution
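The call and put formulas can be checked against each other via put-call parity, $C - P = S - Ke^{-r(T-t)}$. Here is a minimal self-contained sketch (the parameter values match the at-the-money example used later in this section):

```python
import numpy as np
from scipy.stats import norm

def bs_price(S, K, T, r, sigma, option='call'):
    """Black-Scholes price of a European call or put."""
    d1 = (np.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * np.sqrt(T))
    d2 = d1 - sigma * np.sqrt(T)
    if option == 'call':
        return S * norm.cdf(d1) - K * np.exp(-r * T) * norm.cdf(d2)
    return K * np.exp(-r * T) * norm.cdf(-d2) - S * norm.cdf(-d1)

S, K, T, r, sigma = 100, 100, 0.5, 0.05, 0.2
call = bs_price(S, K, T, r, sigma, 'call')
put = bs_price(S, K, T, r, sigma, 'put')

# Put-call parity: C - P should equal S - K·e^{-rT}
print(call - put, S - K * np.exp(-r * T))
```

Parity holds for any parameter choice, so it makes a cheap regression test for a Black-Scholes implementation: a sign error in either branch breaks the identity immediately.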

Code Explanation

The code implements the Black-Scholes model and provides three detailed examples:

  1. 3D Visualization of Option Prices: Shows how call option prices change with stock price and time to maturity
  2. Option Greeks Analysis: Calculates and visualizes Delta, Gamma, and Theta
  3. Monte Carlo Simulation: Prices options using simulation and compares with the analytical solution

Key Functions:

  1. black_scholes(): Implements the analytical Black-Scholes formula
  2. bs_delta(), bs_gamma(), bs_theta(): Calculate option Greeks
  3. monte_carlo_option_pricing(): Simulates stock price paths using geometric Brownian motion

Visualization Analysis

1. 3D Surface Plot of Call Option Prices

The 3D plot shows how the call option price (z-axis) varies with:

  • Stock price (x-axis): As stock price increases, call option value increases
  • Time to maturity (y-axis): The time value of the option is visible

This visualization helps understand the non-linear relationship between these variables.

2. Option Greeks Visualization

The Greeks represent sensitivities of option prices to various factors:

  • Delta: Measures the rate of change of option price with respect to changes in the underlying asset’s price

    • Call delta ranges from $0$ to $1$
    • Put delta ranges from $-1$ to $0$
    • At-the-money options have delta near $0.5$ (calls) or $-0.5$ (puts)
  • Gamma: Measures the rate of change of delta with respect to changes in the underlying price

    • Highest at-the-money and decreases as the option moves in or out of the money
    • Same for both calls and puts
  • Theta: Measures the rate of change of option price with respect to the passage of time (time decay)

    • Generally negative for both calls and puts (options lose value as time passes)
    • Most significant for at-the-money options

3. Monte Carlo Simulation

Monte Carlo Call Option Price: $6.8874
Black-Scholes Call Option Price: $6.8887
Difference: $0.0013

The histogram shows the distribution of simulated stock prices at expiration:

  • The red curve represents the theoretical lognormal distribution
  • The vertical lines mark the initial stock price and strike price
  • The simulation validates the Black-Scholes model by producing a similar price

When we compare the Monte Carlo estimate with the analytical Black-Scholes solution, they should be very close, with differences attributable to simulation error.
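That simulation error can itself be quantified: the Monte Carlo estimate is a sample mean of discounted payoffs, so its standard error is $s/\sqrt{n}$, which yields a confidence interval around the price. A sketch using the same parameters and seed as above:

```python
import numpy as np

np.random.seed(42)
S0, K, T, r, sigma, n = 100, 100, 0.5, 0.05, 0.2, 10_000

# Simulate terminal prices and discounted call payoffs
Z = np.random.standard_normal(n)
ST = S0 * np.exp((r - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * Z)
payoff = np.exp(-r * T) * np.maximum(ST - K, 0)

price = payoff.mean()
stderr = payoff.std(ddof=1) / np.sqrt(n)     # Monte Carlo standard error
print(f"{price:.4f} ± {1.96 * stderr:.4f}")  # 95% interval for the estimate
```

If the analytical Black-Scholes price falls inside this interval, the observed gap is consistent with pure sampling noise; the interval also shows why quadrupling `n` only halves the error.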

Conclusion

This example demonstrates several fundamental concepts in financial mathematics:

  1. Analytical solution of the Black-Scholes equation
  2. Option Greeks for risk management
  3. Monte Carlo methods for pricing financial instruments

The visualizations help develop intuition about option pricing behavior and the relationships between different variables in the model.