from bokeh.plotting import figure, show, output_notebook from bokeh.layouts import gridplot from bokeh.models import ColumnDataSource import numpy as np
# Prepare data x = np.linspace(0, 4 * np.pi, 100) y = np.sin(x)
Here is a basic example of using $OR-Tools$ in $Python$ to solve a simple linear programming problem.
$OR-Tools$ is a powerful optimization library provided by Google, and it can be used to solve a wide range of problems, including linear programming, mixed-integer programming, constraint programming, and more.
Example: Linear Programming with OR-Tools
This example demonstrates solving a linear programming problem where we want to maximize the objective function $3x + 4y$ subject to some constraints.
defmain(): # Create the solver with the SCIP backend. solver = pywraplp.Solver.CreateSolver('SCIP') ifnot solver: return
# Create the variables x and y. x = solver.NumVar(0, solver.infinity(), 'x') y = solver.NumVar(0, solver.infinity(), 'y')
# Create the constraints. solver.Add(2 * x + 3 * y <= 12) solver.Add(4 * x + y <= 14) solver.Add(3 * x - y >= 0)
# Define the objective function. objective = solver.Maximize(3 * x + 4 * y)
# Solve the problem. status = solver.Solve()
# Check the result status. if status == pywraplp.Solver.OPTIMAL: print('Solution:') print('Objective value =', solver.Objective().Value()) print('x =', x.solution_value()) print('y =', y.solution_value()) else: print('The problem does not have an optimal solution.')
if __name__ == '__main__': main()
Explanation
Solver: We create a solver instance using pywraplp.Solver.CreateSolver('SCIP'). $SCIP$ is a powerful mixed-integer programming solver, and $OR-Tools$ uses it as one of its backends.
Variables: We define two variables, x and y, both with a lower bound of 0 and an upper bound of infinity.
Constraints: We add three constraints:
$(2x + 3y \leq 12)$
$(4x + y \leq 14)$
$(3x - y \geq 0)$
Objective: We want to maximize the function $3x + 4y$.
Solve: The solver solves the problem, and we check if an optimal solution was found.
Result: If a solution is found, it prints the optimal objective value and the values of x and y.
Output
1 2 3 4
Solution: Objective value = 17.0 x = 2.9999999999999996 y = 2.0000000000000018
Let’s break down the result of the optimization problem using $OR-Tools$:
Objective Value:
Objective value = 17.0: This is the maximum value of the objective function 3x + 4y given the constraints. The solver found that this is the highest value that can be achieved without violating any of the constraints.
Variable Values:
x = 2.9999999999999996: This is the optimal value of the variable x that maximizes the objective function. Due to floating-point precision in computational mathematics, this value is extremely close to 3 (but not exactly 3).
y = 2.0000000000000018: Similarly, this is the optimal value of the variable y. This value is extremely close to 2.
Interpretation:
Floating-Point Precision: The values 2.9999999999999996 for x and 2.0000000000000018 for y are due to the way computers handle floating-point arithmetic. In practice, these values can be considered as x = 3 and y = 2.
Objective Function Calculation: Given the optimal values of x and y, we can calculate the objective function: $$ 3x + 4y = 3(3) + 4(2) = 9 + 8 = 17 $$ This confirms that the objective value of 17.0 is indeed the maximum value that can be achieved under the given constraints.
Summary:
The solver has determined that to achieve the maximum value of 17 for the objective function 3x + 4y, the values of x and y should be approximately 3 and 2, respectively.
The slight deviations from exact integers are due to the limitations of floating-point representation in computers.
Running the Code
To run this code, ensure you have installed the $OR-Tools$ package. You can install it using pip:
1
pip install ortools
This example should give you a good starting point for working with $OR-Tools$ in $Python$.
# Calculate some basic metrics print("Nodes in the graph:", G.nodes()) print("Edges in the graph:", G.edges())
# Degree of each node print("\nNode degrees:") for node, degree in G.degree(): print(f"{node}: {degree}")
# Shortest path from A to D print("\nShortest path from A to D:", nx.shortest_path(G, source="A", target="D"))
# Clustering coefficient of each node print("\nClustering coefficient:") for node, clustering in nx.clustering(G).items(): print(f"{node}: {clustering}")
# Graph density print("\nGraph density:", nx.density(G))
Explanation of the Code:
Graph Creation:
We create a new undirected graph using nx.Graph().
Nodes (“A”, “B”, “C”, “D”) are added individually.
Edges are added between nodes to define the relationships.
Graph Visualization:
The nx.draw() function is used to visualize the graph. Nodes and edges are displayed with specified colors and sizes.
plt.show() displays the plot.
Basic Graph Metrics:
Nodes and Edges: G.nodes() and G.edges() list all nodes and edges in the graph.
Degree: The degree of each node is calculated using G.degree(), which tells you how many connections each node has.
Shortest Path: The shortest path between two nodes is calculated using nx.shortest_path().
Clustering Coefficient: The clustering coefficient measures the degree to which nodes in the graph tend to cluster together.
Density: The density of a graph is calculated using nx.density(), which gives the ratio of the number of edges to the number of possible edges.
Output
The graph is visualized with nodes labeled “A”, “B”, “C”, and “D”.
The console will display the nodes, edges, degree of each node, the shortest path from node “A” to node “D”, the clustering coefficient of each node, and the overall graph density.
3. Advanced Example: Directed Graph with Weighted Edges
Here’s an example with a directed graph, weighted edges, and calculation of PageRank.
# Define the data for the problem c = np.array([1, 2]) # Coefficients for the objective function A = np.array([[1, 1], [1, -1], [-1, 2]]) # Coefficients for the constraints b = np.array([2, 1, 2]) # Right-hand side values for the constraints
# Define the optimization variables x = cp.Variable(2)
# Define the objective function: minimize c^T x objective = cp.Minimize(c @ x)
# Define the constraints: Ax <= b and x >= 0 constraints = [A @ x <= b, x >= 0]
# Formulate the problem problem = cp.Problem(objective, constraints)
# Solve the problem problem.solve()
# Print the results print("Status:", problem.status) print("Optimal value of the objective function:", problem.value) print("Optimal values of the variables x:", x.value)
Explanation
Objective Function: c @ x is the dot product of the vector c with the variable vector x. We aim to minimize this value.
Constraints: A @ x <= b represents the inequality constraints, and x >= 0 ensures that the variables are non-negative.
Optimization: problem.solve() solves the optimization problem, and the optimal solution is stored in x.value.
Output
When you run this code, it will output the status of the optimization (e.g., “optimal”), the optimal value of the objective function, and the optimal values of the decision variables $( x )$.
1 2 3
Status: optimal Optimal value of the objective function: 3.698490338406144e-10 Optimal values of the variables x: [3.59660015e-10 5.09450927e-12]
This example is a basic introduction, but $CVXPY$ can handle more complex problems, including quadratic programming, mixed-integer programming, and other types of convex optimization problems.
# Step 6: Plot the forecast model.plot(forecast) plt.show()
# Step 7: Plot the forecast components model.plot_components(forecast) plt.show()
Explanation
ds: The column containing the dates.
y: The column containing the values to be forecasted.
fit: The method to train the model with your time series data.
make_future_dataframe: Prepares a dataframe to hold future predictions.
predict: Generates predictions for the given dates.
plot: Visualizes the forecast along with the observed data.
plot_components: Breaks down the forecast into its components (e.g., trend, weekly seasonality).
Result
Running this code will generate a plot of the time series data with the forecasted values and their uncertainty intervals, as well as a breakdown of the forecast components.
# Step 4: Set up the evolutionary algorithm defmain(): population = toolbox.population(n=300) NGEN = 40 CXPB = 0.7 MUTPB = 0.2
# Evaluate the entire population fitnesses = list(map(toolbox.evaluate, population)) for ind, fit inzip(population, fitnesses): ind.fitness.values = fit
# Step 5: Run the algorithm for gen inrange(NGEN): offspring = toolbox.select(population, len(population)) offspring = list(map(toolbox.clone, offspring))
for child1, child2 inzip(offspring[::2], offspring[1::2]): if random.random() < CXPB: toolbox.mate(child1, child2) del child1.fitness.values del child2.fitness.values
for mutant in offspring: if random.random() < MUTPB: toolbox.mutate(mutant) del mutant.fitness.values
invalid_ind = [ind for ind in offspring ifnot ind.fitness.valid] fitnesses = map(toolbox.evaluate, invalid_ind) for ind, fit inzip(invalid_ind, fitnesses): ind.fitness.values = fit
population[:] = offspring
fits = [ind.fitness.values[0] for ind in population]
length = len(population) mean = sum(fits) / length sum2 = sum(x*x for x in fits) std = abs(sum2 / length - mean**2)**0.5
print(f"Gen {gen}: Min {min(fits)} Max {max(fits)} Avg {mean} Std {std}")
Problem Definition: The fitness function (evalOneMax) is defined to maximize x^2.
Individual and Population: An individual is a list of integers, and the population is a list of individuals.
Evolutionary Operators:
mate: Uses two-point crossover.
mutate: Uses bit flip mutation with a small probability.
select: Uses tournament selection.
Algorithm Flow: The algorithm runs for a specified number of generations (NGEN), performing selection, crossover, mutation, and evaluation in each generation.
Output: The algorithm prints the minimum, maximum, average, and standard deviation of fitness values at each generation and finally outputs the best individual found.
Gen 0: Min 309.0 Max 787.0 Avg 562.7166666666667 Std 85.22716540060568 Gen 1: Min 358.0 Max 832.0 Avg 627.3066666666666 Std 76.52445767349342 Gen 2: Min 500.0 Max 862.0 Avg 680.6666666666666 Std 69.2917182801974 Gen 3: Min 521.0 Max 868.0 Avg 741.4666666666667 Std 60.18418027651102 Gen 4: Min 596.0 Max 924.0 Avg 781.9566666666667 Std 53.113226120136446 Gen 5: Min 624.0 Max 929.0 Avg 820.27 Std 50.70007001967588 Gen 6: Min 567.0 Max 941.0 Avg 849.4633333333334 Std 57.31010954059929 Gen 7: Min 681.0 Max 948.0 Avg 883.84 Std 42.49165878930357 Gen 8: Min 645.0 Max 971.0 Avg 904.7033333333334 Std 44.194969422119485 Gen 9: Min 744.0 Max 971.0 Avg 920.1233333333333 Std 34.01541006988307 Gen 10: Min 663.0 Max 976.0 Avg 930.0766666666667 Std 35.58638488086127 Gen 11: Min 671.0 Max 978.0 Avg 936.6533333333333 Std 38.082539597856474 Gen 12: Min 745.0 Max 981.0 Avg 947.5033333333333 Std 31.812628345080054 Gen 13: Min 854.0 Max 983.0 Avg 957.0966666666667 Std 25.950606715236244 Gen 14: Min 764.0 Max 987.0 Avg 960.02 Std 33.497755148667416 Gen 15: Min 691.0 Max 987.0 Avg 967.7833333333333 Std 32.976401494941406 Gen 16: Min 587.0 Max 988.0 Avg 966.6833333333333 Std 43.08553185880072 Gen 17: Min 784.0 Max 988.0 Avg 973.6033333333334 Std 27.84276067889405 Gen 18: Min 692.0 Max 988.0 Avg 974.3433333333334 Std 33.028151460366374 Gen 19: Min 690.0 Max 988.0 Avg 973.64 Std 39.178187809035656 Gen 20: Min 785.0 Max 988.0 Avg 980.99 Std 27.707578385703428 Gen 21: Min 789.0 Max 988.0 Avg 980.0 Std 31.1358314486693 Gen 22: Min 690.0 Max 988.0 Avg 978.8 Std 32.96391966984678 Gen 23: Min 794.0 Max 988.0 Avg 981.1233333333333 Std 26.293499619149838 Gen 24: Min 691.0 Max 988.0 Avg 981.0566666666666 Std 33.116463008937174 Gen 25: Min 788.0 Max 988.0 Avg 979.43 Std 31.13858967048373 Gen 26: Min 694.0 Max 988.0 Avg 981.4366666666666 Std 30.476209118300122 Gen 27: Min 788.0 Max 988.0 Avg 975.47 Std 38.43005464476852 Gen 28: Min 689.0 Max 988.0 Avg 977.1333333333333 Std 40.12283915953909 Gen 29: Min 789.0 Max 988.0 Avg 979.78 Std 32.672694001772946 Gen 30: Min 688.0 Max 988.0 Avg 978.4633333333334 Std 35.322164744281395 Gen 31: Min 697.0 Max 988.0 Avg 977.1433333333333 Std 35.65991384672285 Gen 32: Min 691.0 Max 988.0 Avg 977.4666666666667 Std 39.729530019313195 Gen 33: Min 689.0 Max 988.0 Avg 977.0733333333334 Std 35.092752654769896 Gen 34: Min 694.0 Max 988.0 Avg 979.7866666666666 Std 33.611919843347344 Gen 35: Min 689.0 Max 988.0 Avg 981.41 Std 29.468094045369302 Gen 36: Min 788.0 Max 988.0 Avg 979.11 Std 34.46947296763569 Gen 37: Min 694.0 Max 988.0 Avg 980.1366666666667 Std 31.0949833395823 Gen 38: Min 788.0 Max 988.0 Avg 981.07 Std 30.00364033468625 Gen 39: Min 691.0 Max 988.0 Avg 981.0266666666666 Std 32.25077707935549 Best individual is: [98, 100, 100, 97, 94, 99, 100, 100, 100, 100] (988.0,)
The output represents the results of an evolutionary algorithm running for $40$ generations.
Here’s a breakdown of what each part of the output means:
Generations (Gen 0, Gen 1, etc.):
Each “Gen” line corresponds to a generation in the evolutionary process. The algorithm starts at Gen 0 and evolves through to Gen $39$.
Min, Max, Avg, Std:
Min: The minimum fitness value in the population for that generation.
Max: The maximum fitness value in the population for that generation.
Avg: The average fitness value across the entire population for that generation.
Std: The standard deviation of fitness values in the population, indicating how spread out the fitness values are.
Trends over Generations:
Min: The minimum fitness value generally increases over time, suggesting that the worst-performing individuals are improving.
Max: The maximum fitness value also increases, indicating that the best individuals are becoming fitter.
Avg: The average fitness steadily rises, showing that the overall population is improving.
Std: The standard deviation decreases over time, meaning the population is becoming more homogenous, with individuals having similar fitness values.
Final Generation (Gen 39):
Min 691.0, Max 988.0, Avg 981.0266666666666, Std 32.25077707935549: By the final generation, the population has a high average fitness, with the best individual achieving a fitness of $988.0$.
Best Individual:
The best individual found during the evolutionary process is [98, 100, 100, 97, 94, 99, 100, 100, 100, 100] with a fitness value of 988.0. This individual likely represents an optimal or near-optimal solution to the problem.
In summary, the algorithm successfully evolves a population over $40$ generations, improving the fitness of individuals, and converging towards an optimal solution, as indicated by the high average and maximum fitness values in the later generations.
Customizing DEAP
Custom Fitness Functions: You can create custom fitness functions based on the problem at hand.
Custom Selection, Crossover, and Mutation: $DEAP$ allows you to define and use custom operators.
Multi-Objective Optimization: $DEAP$ also supports multi-objective optimization, where the fitness function can return multiple values (e.g., minimizing both time and cost).
This example demonstrates how to set up and run a basic genetic algorithm using $DEAP$.
You can extend it by adjusting the problem, operators, and parameters to suit different optimization tasks.
# Filter rows where age > 30 filtered_df = df.filter(pl.col('age') > 30) print("\nFiltered DataFrame (age > 30):") print(filtered_df)
# Add a new column 'bonus' which is 10% of the salary df = df.with_column((pl.col('salary') * 0.1).alias('bonus')) print("\nDataFrame with Bonus Column:") print(df)
# Group by 'department' and calculate the average salary and total bonus grouped_df = df.groupby('department').agg([ pl.col('salary').mean().alias('avg_salary'), pl.col('bonus').sum().alias('total_bonus') ]) print("\nGrouped by Department with Aggregations:") print(grouped_df)
# Sort the DataFrame by 'salary' in descending order sorted_df = df.sort('salary', reverse=True) print("\nSorted DataFrame by Salary (Descending):") print(sorted_df)
Explanation
Reading CSV:
pl.read_csv('data.csv') reads the CSV file into a $Polars$ DataFrame.
Displaying the DataFrame:
print(df) shows the original DataFrame.
Selecting Specific Columns:
df.select(['name', 'age', 'salary']) selects only the name, age, and salary columns.
Filtering Rows:
df.filter(pl.col('age') > 30) filters the DataFrame to include only rows where the age is greater than $30$.
Adding a New Column:
df.with_column((pl.col('salary') * 0.1).alias('bonus')) adds a new column bonus which is $10$% of the salary.
Grouping and Aggregating:
df.groupby('department').agg([pl.col('salary').mean().alias('avg_salary'), pl.col('bonus').sum().alias('total_bonus')]) groups the DataFrame by department and calculates the average salary and total bonus for each department.
Sorting:
df.sort('salary', reverse=True) sorts the DataFrame by salary in descending order.
Here are a few useful SciPy sample codes showcasing different aspects of the library, including optimization, integration, interpolation, and signal processing.
1. Optimization: Minimizing a Function
This example demonstrates how to minimize a mathematical function using scipy.optimize.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
import numpy as np from scipy.optimize import minimize
# Define a function to minimize defobjective_function(x): return x**2 + 10*np.sin(x)
# Initial guess x0 = 2.0
# Perform the minimization result = minimize(objective_function, x0, method='BFGS')
print("Minimum value of the function:", result.fun) print("Value of x that minimizes the function:", result.x)
[Output]
Minimum value of the function: 8.31558557947746
Value of x that minimizes the function: [3.83746713]
2. Integration: Calculating the Area Under a Curve
This example shows how to calculate the definite integral of a function using scipy.integrate.quad.
1 2 3 4 5 6 7 8 9 10
from scipy.integrate import quad
# Define a function to integrate defintegrand(x): return np.exp(-x**2)
# Compute the integral from 0 to infinity result, error = quad(integrand, 0, np.inf)
print("Integral result:", result)
[Output]
Integral result: 0.8862269254527579
3. Interpolation: Interpolating Data Points
This example demonstrates how to interpolate data points using scipy.interpolate.interp1d.
import numpy as np import matplotlib.pyplot as plt from scipy.signal import butter, lfilter
# Generate a noisy signal np.random.seed(0) t = np.linspace(0, 1, 500, endpoint=False) signal = np.sin(2 * np.pi * 7 * t) + 0.5 * np.random.randn(500)
# Design a low-pass filter defbutter_lowpass(cutoff, fs, order=5): nyquist = 0.5 * fs normal_cutoff = cutoff / nyquist b, a = butter(order, normal_cutoff, btype='low', analog=False) return b, a
# Apply the filter to the signal deflowpass_filter(data, cutoff, fs, order=5): b, a = butter_lowpass(cutoff, fs, order=order) y = lfilter(b, a, data) return y
# Parameters cutoff = 3.5# Desired cutoff frequency of the filter, Hz fs = 50.0# Sample rate, Hz
# Filter the signal filtered_signal = lowpass_filter(signal, cutoff, fs)
These examples provide a glimpse of the powerful tools available in SciPy for scientific computing, including optimization, integration, interpolation, and signal processing.
PuLP is a Python library used for linear programming (LP) and mixed-integer linear programming (MILP).
It allows you to define and solve optimization problems where you want to minimize or maximize a linear objective function subject to linear constraints.
Installation
First, you need to install PuLP. You can do this using pip:
1
pip install pulp
Basic Usage of PuLP
Here’s a step-by-step guide to solving a simple linear programming problem with PuLP.
Example Problem:
Suppose you are a factory manager and you want to determine the optimal number of products $A$ and $B$ to produce to maximize your profit. Each product requires a certain amount of resources (time, materials) and yields a certain profit.
Objective: Maximize profit.
Constraints:
You have $100$ units of material.
You have $80$ hours of labor.
Product $A$ requires $4$ units of material and $2$ hours of labor.
Product $B$ requires $3$ units of material and $5$ hours of labor.
# Step 1: Define the problem # Create a maximization problem problem = pulp.LpProblem("Maximize_Profit", pulp.LpMaximize)
# Step 2: Define the decision variables # Let x be the number of product A to produce, y be the number of product B to produce x = pulp.LpVariable('x', lowBound=0, cat='Continuous') y = pulp.LpVariable('y', lowBound=0, cat='Continuous')
# Step 3: Define the objective function # Objective function: Maximize 20x + 25y problem += 20*x + 25*y, "Profit"
# Step 6: Display the results print("Status:", pulp.LpStatus[problem.status]) print("Optimal number of product A to produce:", pulp.value(x)) print("Optimal number of product B to produce:", pulp.value(y)) print("Maximum Profit:", pulp.value(problem.objective))
Explanation of the Code:
Define the Problem: We create a linear programming problem with the objective to maximize profit.
Decision Variables: We define the decision variables x and y for the number of products A and B to produce. These variables are continuous and non-negative.
Objective Function: We define the objective function to maximize, which is the total profit: 20*x + 25*y.
Constraints: We add constraints based on the available resources:
Material constraint: 4*x + 3*y <= 100
Labor constraint: 2*x + 5*y <= 80
Solve the Problem: We solve the linear programming problem using PuLP’s solve method.
Display the Results: We print the optimal values of x and y, and the maximum profit.
Interpretation of Output:
Output
1 2 3 4
Status: Optimal Optimal number of product A to produce: 18.571429 Optimal number of product B to produce: 8.5714286 Maximum Profit: 585.714295
Interpretation
The optimal solution is to produce $18.5$ units of product A and $8.57$ units of product B, which yields a maximum profit of $585.71.
This is a basic example of using PuLP in Python.
The library is powerful and can handle more complex constraints, variables, and objectives, including mixed-integer programming.
Here’s a advanced and useful sample code using Altair, which demonstrates how to create a layered chart with interactivity, such as tooltips and selection, along with custom encoding:
import altair as alt from vega_datasets import data
# Load dataset source = data.cars()
# Define a brush selection brush = alt.selection(type='interval')
# Base chart with a scatter plot base = alt.Chart(source).mark_point().encode( x='Horsepower:Q', y='Miles_per_Gallon:Q', color=alt.condition(brush, 'Origin:N', alt.value('lightgray')), tooltip=['Name:N', 'Origin:N', 'Horsepower:Q', 'Miles_per_Gallon:Q'] ).properties( width=600, height=400 ).add_selection( brush )
# Layer a bar chart on top that shows the distribution of 'Origin' for selected points bars = alt.Chart(source).mark_bar().encode( x='count():Q', y='Origin:N', color='Origin:N' ).transform_filter( brush )
# Combine the scatter plot and the bar chart chart = base & bars chart
Result
Explanation
Brush Selection: An interactive selection that allows users to drag over the scatter plot to select a region.
Scatter Plot: A basic plot where points are colored by their origin, and the color changes based on the selection.
Bar Chart: Shows the distribution of car origins, updated based on the selection made in the scatter plot.
Interactivity:
Tooltips provide detailed information when hovering over the points, and the chart updates dynamically based on the user’s selection.
This example combines interactivity with advanced encoding and layout, making it useful for exploratory data analysis.