Bokeh in Python

August 12, 2024

Bokeh in Python

Here’s an example of how to create some useful graphs using $Bokeh$ in $Python$:

Line Plot:
A simple line plot showing trends over time.
Bar Plot:
A bar plot to compare categories.
Scatter Plot:
A scatter plot to see the relationship between two variables.
Histogram:
A histogram to visualize the distribution of data.

First, you need to install $Bokeh$ if you haven’t already:

1	pip install bokeh

Here’s the code to create these plots:

from bokeh.plotting import figure, show, output_notebook
from bokeh.layouts import gridplot
from bokeh.models import ColumnDataSource
import numpy as np

# Prepare data
x = np.linspace(0, 4 * np.pi, 100)
y = np.sin(x)

categories = ['A', 'B', 'C', 'D']
values = [10, 20, 30, 40]

np.random.seed(1)
x_scatter = np.random.rand(100)
y_scatter = np.random.rand(100)

data = np.random.randn(1000)

# Output to notebook
output_notebook()

# Line plot
p1 = figure(title="Line Plot", x_axis_label='x', y_axis_label='y')
p1.line(x, y, legend_label="sin(x)", line_width=2)

# Bar plot
p2 = figure(x_range=categories, title="Bar Plot", x_axis_label='Category', y_axis_label='Values')
p2.vbar(x=categories, top=values, width=0.5)

# Scatter plot
p3 = figure(title="Scatter Plot", x_axis_label='X', y_axis_label='Y')
p3.scatter(x_scatter, y_scatter, size=8, color="navy", alpha=0.5)

# Histogram
hist, edges = np.histogram(data, bins=30)
p4 = figure(title="Histogram", x_axis_label='Value', y_axis_label='Frequency')
p4.quad(top=hist, bottom=0, left=edges[:-1], right=edges[1:], fill_color="navy", line_color="white", alpha=0.7)

# Arrange plots in a grid
grid = gridplot([[p1, p2], [p3, p4]])

# Show the plots
show(grid)

Explanation

Line Plot:
- figure() creates a new plot with a title and axis labels.
- p1.line(x, y, ...) adds a line to the plot using the data in x and y.
Bar Plot:
- The x-axis is categorical, so x_range=categories.
- p2.vbar(...) draws vertical bars.
Scatter Plot:
- A scatter plot shows the relationship between two variables using dots.
- p3.scatter(...) plots the points.
Histogram:
- np.histogram computes the histogram of the data.
- p4.quad(...) creates the bars of the histogram.

Running the Code

This code should be run in a $Jupyter notebook$ or a similar environment that supports $Bokeh$’s interactive plots.

It will output the plots inline.

Output

Google OR-Tools

August 11, 2024

Google OR-Tools

Here is a basic example of using $OR-Tools$ in $Python$ to solve a simple linear programming problem.

$OR-Tools$ is a powerful optimization library provided by Google, and it can be used to solve a wide range of problems, including linear programming, mixed-integer programming, constraint programming, and more.

Example: Linear Programming with OR-Tools

This example demonstrates solving a linear programming problem where we want to maximize the objective function $3x + 4y$ subject to some constraints.

from ortools.linear_solver import pywraplp

def main():
    # Create the solver with the SCIP backend.
    solver = pywraplp.Solver.CreateSolver('SCIP')
    if not solver:
        return

    # Create the variables x and y.
    x = solver.NumVar(0, solver.infinity(), 'x')
    y = solver.NumVar(0, solver.infinity(), 'y')

    # Create the constraints.
    solver.Add(2 * x + 3 * y <= 12)
    solver.Add(4 * x + y <= 14)
    solver.Add(3 * x - y >= 0)

    # Define the objective function.
    objective = solver.Maximize(3 * x + 4 * y)

    # Solve the problem.
    status = solver.Solve()

    # Check the result status.
    if status == pywraplp.Solver.OPTIMAL:
        print('Solution:')
        print('Objective value =', solver.Objective().Value())
        print('x =', x.solution_value())
        print('y =', y.solution_value())
    else:
        print('The problem does not have an optimal solution.')

if __name__ == '__main__':
    main()

Explanation

Solver: We create a solver instance using pywraplp.Solver.CreateSolver('SCIP'). $SCIP$ is a powerful mixed-integer programming solver, and $OR-Tools$ uses it as one of its backends.
Variables: We define two variables, x and y, both with a lower bound of 0 and an upper bound of infinity.
Constraints: We add three constraints:
1. $(2x + 3y \leq 12)$
2. $(4x + y \leq 14)$
3. $(3x - y \geq 0)$
Objective: We want to maximize the function $3x + 4y$.
Solve: The solver solves the problem, and we check if an optimal solution was found.
Result: If a solution is found, it prints the optimal objective value and the values of x and y.

Output

Solution:
Objective value = 17.0
x = 2.9999999999999996
y = 2.0000000000000018

Let’s break down the result of the optimization problem using $OR-Tools$:

Objective Value:

Objective value = 17.0: This is the maximum value of the objective function 3x + 4y given the constraints.
The solver found that this is the highest value that can be achieved without violating any of the constraints.

Variable Values:

x = 2.9999999999999996: This is the optimal value of the variable x that maximizes the objective function.
Due to floating-point precision in computational mathematics, this value is extremely close to 3 (but not exactly 3).
y = 2.0000000000000018: Similarly, this is the optimal value of the variable y. This value is extremely close to 2.

Interpretation:

Floating-Point Precision: The values 2.9999999999999996 for x and 2.0000000000000018 for y are due to the way computers handle floating-point arithmetic. In practice, these values can be considered as x = 3 and y = 2.
Objective Function Calculation: Given the optimal values of x and y, we can calculate the objective function:
$$
3x + 4y = 3(3) + 4(2) = 9 + 8 = 17
$$
This confirms that the objective value of 17.0 is indeed the maximum value that can be achieved under the given constraints.

Summary:

The solver has determined that to achieve the maximum value of 17 for the objective function 3x + 4y, the values of x and y should be approximately 3 and 2, respectively.

The slight deviations from exact integers are due to the limitations of floating-point representation in computers.

Running the Code

To run this code, ensure you have installed the $OR-Tools$ package.
You can install it using pip:

1	pip install ortools

This example should give you a good starting point for working with $OR-Tools$ in $Python$.

NetworkX in Python

August 10, 2024

NetworkX in Python

Here’s a useful sample code in $Python$ that uses the $NetworkX$ library to create, visualize, and analyze a graph.

This example covers basic graph operations, such as adding nodes and edges, visualizing the graph, and calculating common network metrics.

1. Installation

First, if you haven’t installed $NetworkX$, you can do so using:

1 2	!pip install networkx !pip install matplotlib

2. Creating and Visualizing a Graph

Here’s a basic example of how to create a graph, add nodes and edges, visualize it, and calculate some metrics.

import networkx as nx
import matplotlib.pyplot as plt

# Create a new graph
G = nx.Graph()

# Add nodes
G.add_node("A")
G.add_node("B")
G.add_node("C")
G.add_node("D")

# Add edges
G.add_edge("A", "B")
G.add_edge("A", "C")
G.add_edge("B", "D")
G.add_edge("C", "D")
G.add_edge("A", "D")

# Draw the graph
nx.draw(G, with_labels=True, node_color='lightblue', edge_color='gray', node_size=3000, font_size=20)
plt.show()

# Calculate some basic metrics
print("Nodes in the graph:", G.nodes())
print("Edges in the graph:", G.edges())

# Degree of each node
print("\nNode degrees:")
for node, degree in G.degree():
    print(f"{node}: {degree}")

# Shortest path from A to D
print("\nShortest path from A to D:", nx.shortest_path(G, source="A", target="D"))

# Clustering coefficient of each node
print("\nClustering coefficient:")
for node, clustering in nx.clustering(G).items():
    print(f"{node}: {clustering}")

# Graph density
print("\nGraph density:", nx.density(G))

Explanation of the Code:

Graph Creation:
- We create a new undirected graph using nx.Graph().
- Nodes (“A”, “B”, “C”, “D”) are added individually.
- Edges are added between nodes to define the relationships.
Graph Visualization:
- The nx.draw() function is used to visualize the graph.
  Nodes and edges are displayed with specified colors and sizes.
- plt.show() displays the plot.
Basic Graph Metrics:
- Nodes and Edges: G.nodes() and G.edges() list all nodes and edges in the graph.
- Degree: The degree of each node is calculated using G.degree(), which tells you how many connections each node has.
- Shortest Path: The shortest path between two nodes is calculated using nx.shortest_path().
- Clustering Coefficient: The clustering coefficient measures the degree to which nodes in the graph tend to cluster together.
- Density: The density of a graph is calculated using nx.density(), which gives the ratio of the number of edges to the number of possible edges.

Output

The graph is visualized with nodes labeled “A”, “B”, “C”, and “D”.

The console will display the nodes, edges, degree of each node, the shortest path from node “A” to node “D”, the clustering coefficient of each node, and the overall graph density.

3. Advanced Example: Directed Graph with Weighted Edges

Here’s an example with a directed graph, weighted edges, and calculation of PageRank.

import networkx as nx
import matplotlib.pyplot as plt

# Create a directed graph
DG = nx.DiGraph()

# Add nodes
DG.add_nodes_from(["A", "B", "C", "D"])

# Add weighted edges
DG.add_weighted_edges_from([("A", "B", 0.6), ("B", "C", 0.2), ("C", "D", 0.1), ("A", "D", 0.7)])

# Draw the directed graph with edge labels showing weights
pos = nx.spring_layout(DG)
nx.draw(DG, pos, with_labels=True, node_color='lightgreen', edge_color='gray', node_size=3000, font_size=20)
edge_labels = nx.get_edge_attributes(DG, 'weight')
nx.draw_networkx_edge_labels(DG, pos, edge_labels=edge_labels)
plt.show()

# PageRank calculation
pagerank = nx.pagerank(DG)
print("\nPageRank:")
for node, rank in pagerank.items():
    print(f"{node}: {rank:.4f}")

Output:

Explanation:

Directed Graph:
A directed graph is created using nx.DiGraph().
Weighted Edges:
Edges are added with weights, which represent the strength or importance of the connection.
Visualization:
The directed graph is visualized with edge labels showing weights.
PageRank:
The PageRank algorithm is used to rank nodes, showing the importance of each node in the graph.

This should give you a good starting point for working with $NetworkX$ in Python.

CVXPY in Python

August 9, 2024

CVXPY in Python

Here’s a basic example of using $CVXPY$, a $Python$ library for convex optimization.

This example solves a simple linear programming problem:

Problem

Minimize the function $( c^T x )$ subject to the constraint $( Ax \leq b )$ and $( x \geq 0 )$, where:

$( c )$ is a vector of coefficients for the objective function.
$( A )$ is a matrix of coefficients for the inequality constraints.
$( b )$ is a vector representing the upper bounds for the constraints.

Sample Code

import cvxpy as cp
import numpy as np

# Define the data for the problem
c = np.array([1, 2])        # Coefficients for the objective function
A = np.array([[1, 1], [1, -1], [-1, 2]])  # Coefficients for the constraints
b = np.array([2, 1, 2])     # Right-hand side values for the constraints

# Define the optimization variables
x = cp.Variable(2)

# Define the objective function: minimize c^T x
objective = cp.Minimize(c @ x)

# Define the constraints: Ax <= b and x >= 0
constraints = [A @ x <= b, x >= 0]

# Formulate the problem
problem = cp.Problem(objective, constraints)

# Solve the problem
problem.solve()

# Print the results
print("Status:", problem.status)
print("Optimal value of the objective function:", problem.value)
print("Optimal values of the variables x:", x.value)

Explanation

Objective Function:
c @ x is the dot product of the vector c with the variable vector x. We aim to minimize this value.
Constraints:
A @ x <= b represents the inequality constraints, and x >= 0 ensures that the variables are non-negative.
Optimization:
problem.solve() solves the optimization problem, and the optimal solution is stored in x.value.

Output

When you run this code, it will output the status of the optimization (e.g., “optimal”), the optimal value of the objective function, and the optimal values of the decision variables $( x )$.

1
2
3

Status: optimal
Optimal value of the objective function: 3.698490338406144e-10
Optimal values of the variables x: [3.59660015e-10 5.09450927e-12]

This example is a basic introduction, but $CVXPY$ can handle more complex problems, including quadratic programming, mixed-integer programming, and other types of convex optimization problems.

Prophet in Python

August 8, 2024

Prophet in Python

$Prophet$ is a popular library developed by $Facebook$ for time series forecasting.

It’s particularly effective for data that has strong seasonal effects and multiple seasonality with daily observations.

Here’s a basic example of how to use $Prophet$ in $Python$:

Step-by-Step Example:

Install Prophet (if you haven’t already):
1
pip install prophet

Import Required Libraries:

1
2
3

import pandas as pd
from prophet import Prophet
import matplotlib.pyplot as plt

Load Your Data:
For this example, let’s create a simple time series dataset.

# Create a simple time series dataset
dates = pd.date_range(start='2022-01-01', periods=365)
data = pd.DataFrame({
    'ds': dates,
    'y': 100 + (dates.dayofyear - 183) ** 2 / 100 + np.random.randn(365) * 5
})

Initialize and Fit the Prophet Model:

# Initialize the Prophet model
model = Prophet()

# Fit the model to the data
model.fit(data)

Make Predictions:
You can make future predictions using the model by specifying the number of days into the future you want to forecast.

# Create a dataframe for future predictions
future = model.make_future_dataframe(periods=30)  # Predict 30 days into the future

# Predict future values
forecast = model.predict(future)

Visualize the Forecast:
$Prophet$ has a built-in plot function to visualize the forecasted data.
1
2
3
# Plot the forecast
model.plot(forecast)
plt.show()
Plot Components:
You can also plot the components (trend, weekly seasonality, yearly seasonality) of the forecast.
1
2
3
# Plot the forecast components
model.plot_components(forecast)
plt.show()

Full Code Example:

import pandas as pd
import numpy as np
from prophet import Prophet
import matplotlib.pyplot as plt

# Step 1: Create a simple time series dataset
dates = pd.date_range(start='2022-01-01', periods=365)
data = pd.DataFrame({
    'ds': dates,
    'y': 100 + (dates.dayofyear - 183) ** 2 / 100 + np.random.randn(365) * 5
})

# Step 2: Initialize the Prophet model
model = Prophet()

# Step 3: Fit the model to the data
model.fit(data)

# Step 4: Create a dataframe for future predictions
future = model.make_future_dataframe(periods=30)  # Predict 30 days into the future

# Step 5: Predict future values
forecast = model.predict(future)

# Step 6: Plot the forecast
model.plot(forecast)
plt.show()

# Step 7: Plot the forecast components
model.plot_components(forecast)
plt.show()

Explanation

ds: The column containing the dates.
y: The column containing the values to be forecasted.
fit: The method to train the model with your time series data.
make_future_dataframe: Prepares a dataframe to hold future predictions.
predict: Generates predictions for the given dates.
plot: Visualizes the forecast along with the observed data.
plot_components: Breaks down the forecast into its components (e.g., trend, weekly seasonality).

Result

Running this code will generate a plot of the time series data with the forecasted values and their uncertainty intervals, as well as a breakdown of the forecast components.

DEAP in Python

August 7, 2024

The $DEAP (Distributed Evolutionary Algorithms in Python)$ library is a powerful and flexible library used for implementing evolutionary algorithms.

It is widely used for optimization tasks, particularly in the fields of genetic algorithms, genetic programming, and other evolutionary strategies.

Steps to Use DEAP Library in Python

Installation
You can install $DEAP$ using pip:
1
pip install deap
Basic Structure of a $DEAP$ Program
A typical $DEAP$ program involves the following steps:
- Importing $DEAP$ modules
- Defining the problem (fitness function)
- Creating individuals and population
- Defining the evolutionary operators (selection, mutation, crossover)
- Setting up the algorithm flow
- Running the algorithm

Example: Simple Genetic Algorithm

Below is an example of a simple genetic algorithm using $DEAP$ to maximize the function f(x) = x^2, where x is an integer.

import random
from deap import base, creator, tools, algorithms

# Step 1: Define the problem (Fitness Function)
creator.create("FitnessMax", base.Fitness, weights=(1.0,))
creator.create("Individual", list, fitness=creator.FitnessMax)

# Step 2: Define the individual and population
toolbox = base.Toolbox()
toolbox.register("attr_int", random.randint, 0, 100)
toolbox.register("individual", tools.initRepeat, creator.Individual, toolbox.attr_int, 10)
toolbox.register("population", tools.initRepeat, list, toolbox.individual)

# Step 3: Define the fitness function
def evalOneMax(individual):
   return sum(individual),  # Summing all values in the individual

toolbox.register("evaluate", evalOneMax)
toolbox.register("mate", tools.cxTwoPoint)
toolbox.register("mutate", tools.mutFlipBit, indpb=0.05)
toolbox.register("select", tools.selTournament, tournsize=3)

# Step 4: Set up the evolutionary algorithm
def main():
   population = toolbox.population(n=300)
   NGEN = 40
   CXPB = 0.7
   MUTPB = 0.2

   # Evaluate the entire population
   fitnesses = list(map(toolbox.evaluate, population))
   for ind, fit in zip(population, fitnesses):
       ind.fitness.values = fit

   # Step 5: Run the algorithm
   for gen in range(NGEN):
       offspring = toolbox.select(population, len(population))
       offspring = list(map(toolbox.clone, offspring))

       for child1, child2 in zip(offspring[::2], offspring[1::2]):
           if random.random() < CXPB:
               toolbox.mate(child1, child2)
               del child1.fitness.values
               del child2.fitness.values

       for mutant in offspring:
           if random.random() < MUTPB:
               toolbox.mutate(mutant)
               del mutant.fitness.values

       invalid_ind = [ind for ind in offspring if not ind.fitness.valid]
       fitnesses = map(toolbox.evaluate, invalid_ind)
       for ind, fit in zip(invalid_ind, fitnesses):
           ind.fitness.values = fit

       population[:] = offspring

       fits = [ind.fitness.values[0] for ind in population]

       length = len(population)
       mean = sum(fits) / length
       sum2 = sum(x*x for x in fits)
       std = abs(sum2 / length - mean**2)**0.5

       print(f"Gen {gen}: Min {min(fits)} Max {max(fits)} Avg {mean} Std {std}")

   best_ind = tools.selBest(population, 1)[0]
   print("Best individual is:", best_ind, best_ind.fitness.values)

if __name__ == "__main__":
   main()

Explanation

Problem Definition: The fitness function (evalOneMax) is defined to maximize x^2.
Individual and Population: An individual is a list of integers, and the population is a list of individuals.
Evolutionary Operators:
- mate: Uses two-point crossover.
- mutate: Uses bit flip mutation with a small probability.
- select: Uses tournament selection.
Algorithm Flow: The algorithm runs for a specified number of generations (NGEN), performing selection, crossover, mutation, and evaluation in each generation.
Output: The algorithm prints the minimum, maximum, average, and standard deviation of fitness values at each generation and finally outputs the best individual found.

Output

Gen 0: Min 309.0 Max 787.0 Avg 562.7166666666667 Std 85.22716540060568
Gen 1: Min 358.0 Max 832.0 Avg 627.3066666666666 Std 76.52445767349342
Gen 2: Min 500.0 Max 862.0 Avg 680.6666666666666 Std 69.2917182801974
Gen 3: Min 521.0 Max 868.0 Avg 741.4666666666667 Std 60.18418027651102
Gen 4: Min 596.0 Max 924.0 Avg 781.9566666666667 Std 53.113226120136446
Gen 5: Min 624.0 Max 929.0 Avg 820.27 Std 50.70007001967588
Gen 6: Min 567.0 Max 941.0 Avg 849.4633333333334 Std 57.31010954059929
Gen 7: Min 681.0 Max 948.0 Avg 883.84 Std 42.49165878930357
Gen 8: Min 645.0 Max 971.0 Avg 904.7033333333334 Std 44.194969422119485
Gen 9: Min 744.0 Max 971.0 Avg 920.1233333333333 Std 34.01541006988307
Gen 10: Min 663.0 Max 976.0 Avg 930.0766666666667 Std 35.58638488086127
Gen 11: Min 671.0 Max 978.0 Avg 936.6533333333333 Std 38.082539597856474
Gen 12: Min 745.0 Max 981.0 Avg 947.5033333333333 Std 31.812628345080054
Gen 13: Min 854.0 Max 983.0 Avg 957.0966666666667 Std 25.950606715236244
Gen 14: Min 764.0 Max 987.0 Avg 960.02 Std 33.497755148667416
Gen 15: Min 691.0 Max 987.0 Avg 967.7833333333333 Std 32.976401494941406
Gen 16: Min 587.0 Max 988.0 Avg 966.6833333333333 Std 43.08553185880072
Gen 17: Min 784.0 Max 988.0 Avg 973.6033333333334 Std 27.84276067889405
Gen 18: Min 692.0 Max 988.0 Avg 974.3433333333334 Std 33.028151460366374
Gen 19: Min 690.0 Max 988.0 Avg 973.64 Std 39.178187809035656
Gen 20: Min 785.0 Max 988.0 Avg 980.99 Std 27.707578385703428
Gen 21: Min 789.0 Max 988.0 Avg 980.0 Std 31.1358314486693
Gen 22: Min 690.0 Max 988.0 Avg 978.8 Std 32.96391966984678
Gen 23: Min 794.0 Max 988.0 Avg 981.1233333333333 Std 26.293499619149838
Gen 24: Min 691.0 Max 988.0 Avg 981.0566666666666 Std 33.116463008937174
Gen 25: Min 788.0 Max 988.0 Avg 979.43 Std 31.13858967048373
Gen 26: Min 694.0 Max 988.0 Avg 981.4366666666666 Std 30.476209118300122
Gen 27: Min 788.0 Max 988.0 Avg 975.47 Std 38.43005464476852
Gen 28: Min 689.0 Max 988.0 Avg 977.1333333333333 Std 40.12283915953909
Gen 29: Min 789.0 Max 988.0 Avg 979.78 Std 32.672694001772946
Gen 30: Min 688.0 Max 988.0 Avg 978.4633333333334 Std 35.322164744281395
Gen 31: Min 697.0 Max 988.0 Avg 977.1433333333333 Std 35.65991384672285
Gen 32: Min 691.0 Max 988.0 Avg 977.4666666666667 Std 39.729530019313195
Gen 33: Min 689.0 Max 988.0 Avg 977.0733333333334 Std 35.092752654769896
Gen 34: Min 694.0 Max 988.0 Avg 979.7866666666666 Std 33.611919843347344
Gen 35: Min 689.0 Max 988.0 Avg 981.41 Std 29.468094045369302
Gen 36: Min 788.0 Max 988.0 Avg 979.11 Std 34.46947296763569
Gen 37: Min 694.0 Max 988.0 Avg 980.1366666666667 Std 31.0949833395823
Gen 38: Min 788.0 Max 988.0 Avg 981.07 Std 30.00364033468625
Gen 39: Min 691.0 Max 988.0 Avg 981.0266666666666 Std 32.25077707935549
Best individual is: [98, 100, 100, 97, 94, 99, 100, 100, 100, 100] (988.0,)

The output represents the results of an evolutionary algorithm running for $40$ generations.

Here’s a breakdown of what each part of the output means:

Generations (Gen 0, Gen 1, etc.):
- Each “Gen” line corresponds to a generation in the evolutionary process. The algorithm starts at Gen 0 and evolves through to Gen $39$.
Min, Max, Avg, Std:
- Min: The minimum fitness value in the population for that generation.
- Max: The maximum fitness value in the population for that generation.
- Avg: The average fitness value across the entire population for that generation.
- Std: The standard deviation of fitness values in the population, indicating how spread out the fitness values are.
Trends over Generations:
- Min: The minimum fitness value generally increases over time, suggesting that the worst-performing individuals are improving.
- Max: The maximum fitness value also increases, indicating that the best individuals are becoming fitter.
- Avg: The average fitness steadily rises, showing that the overall population is improving.
- Std: The standard deviation decreases over time, meaning the population is becoming more homogenous, with individuals having similar fitness values.
Final Generation (Gen 39):
- Min 691.0, Max 988.0, Avg 981.0266666666666, Std 32.25077707935549: By the final generation, the population has a high average fitness, with the best individual achieving a fitness of $988.0$.
Best Individual:
- The best individual found during the evolutionary process is [98, 100, 100, 97, 94, 99, 100, 100, 100, 100] with a fitness value of 988.0. This individual likely represents an optimal or near-optimal solution to the problem.

In summary, the algorithm successfully evolves a population over $40$ generations, improving the fitness of individuals, and converging towards an optimal solution, as indicated by the high average and maximum fitness values in the later generations.

Customizing DEAP

Custom Fitness Functions: You can create custom fitness functions based on the problem at hand.
Custom Selection, Crossover, and Mutation: $DEAP$ allows you to define and use custom operators.
Multi-Objective Optimization: $DEAP$ also supports multi-objective optimization, where the fitness function can return multiple values (e.g., minimizing both time and cost).

This example demonstrates how to set up and run a basic genetic algorithm using $DEAP$.

You can extend it by adjusting the problem, operators, and parameters to suit different optimization tasks.

Polars in Python

August 6, 2024

Polars in Python

$Polars$ is a powerful DataFrame library in Python, known for its speed and efficiency, especially with large datasets.

Here’s a useful sample demonstrating some key features of $Polars$, including reading data, data manipulation, and aggregations.

We’ll use a sample dataset to illustrate these functionalities.

Sample Data

Let’s assume we have a CSV file named data.csv with the following content:

id,name,age,salary,department
1,Alice,30,70000,HR
2,Bob,40,80000,Engineering
3,Charlie,25,65000,Marketing
4,Diana,35,72000,HR
5,Edward,50,90000,Engineering
6,Frank,45,85000,Marketing

Code Example

import polars as pl

# Read CSV file into a Polars DataFrame
df = pl.read_csv('data.csv')

# Display the DataFrame
print("Original DataFrame:")
print(df)

# Select specific columns
selected_df = df.select(['name', 'age', 'salary'])
print("\nSelected Columns (name, age, salary):")
print(selected_df)

# Filter rows where age > 30
filtered_df = df.filter(pl.col('age') > 30)
print("\nFiltered DataFrame (age > 30):")
print(filtered_df)

# Add a new column 'bonus' which is 10% of the salary
df = df.with_column((pl.col('salary') * 0.1).alias('bonus'))
print("\nDataFrame with Bonus Column:")
print(df)

# Group by 'department' and calculate the average salary and total bonus
grouped_df = df.groupby('department').agg([
    pl.col('salary').mean().alias('avg_salary'),
    pl.col('bonus').sum().alias('total_bonus')
])
print("\nGrouped by Department with Aggregations:")
print(grouped_df)

# Sort the DataFrame by 'salary' in descending order
sorted_df = df.sort('salary', reverse=True)
print("\nSorted DataFrame by Salary (Descending):")
print(sorted_df)

Explanation

Reading CSV:
- pl.read_csv('data.csv') reads the CSV file into a $Polars$ DataFrame.
Displaying the DataFrame:
- print(df) shows the original DataFrame.
Selecting Specific Columns:
- df.select(['name', 'age', 'salary']) selects only the name, age, and salary columns.
Filtering Rows:
- df.filter(pl.col('age') > 30) filters the DataFrame to include only rows where the age is greater than $30$.
Adding a New Column:
- df.with_column((pl.col('salary') * 0.1).alias('bonus')) adds a new column bonus which is $10$% of the salary.
Grouping and Aggregating:
- df.groupby('department').agg([pl.col('salary').mean().alias('avg_salary'), pl.col('bonus').sum().alias('total_bonus')]) groups the DataFrame by department and calculates the average salary and total bonus for each department.
Sorting:
- df.sort('salary', reverse=True) sorts the DataFrame by salary in descending order.

Output

The output will look something like this:

Original DataFrame:
shape: (6, 5)
┌─────┬─────────┬─────┬────────┬─────────────┐
│ id  │ name    │ age │ salary │ department  │
│ --- │ ---     │ --- │ ---    │ ---         │
│ i64 │ str     │ i64 │ i64    │ str         │
├─────┼─────────┼─────┼────────┼─────────────┤
│ 1   │ Alice   │ 30  │ 70000  │ HR          │
│ 2   │ Bob     │ 40  │ 80000  │ Engineering │
│ 3   │ Charlie │ 25  │ 65000  │ Marketing   │
│ 4   │ Diana   │ 35  │ 72000  │ HR          │
│ 5   │ Edward  │ 50  │ 90000  │ Engineering │
│ 6   │ Frank   │ 45  │ 85000  │ Marketing   │
└─────┴─────────┴─────┴────────┴─────────────┘

Selected Columns (name, age, salary):
shape: (6, 3)
┌─────────┬─────┬────────┐
│ name    │ age │ salary │
│ ---     │ --- │ ---    │
│ str     │ i64 │ i64    │
├─────────┼─────┼────────┤
│ Alice   │ 30  │ 70000  │
│ Bob     │ 40  │ 80000  │
│ Charlie │ 25  │ 65000  │
│ Diana   │ 35  │ 72000  │
│ Edward  │ 50  │ 90000  │
│ Frank   │ 45  │ 85000  │
└─────────┴─────┴────────┘

Filtered DataFrame (age > 30):
shape: (4, 5)
┌─────┬─────────┬─────┬────────┬─────────────┐
│ id  │ name    │ age │ salary │ department  │
│ --- │ ---     │ --- │ ---    │ ---         │
│ i64 │ str     │ i64 │ i64    │ str         │
├─────┼─────────┼─────┼────────┼─────────────┤
│ 2   │ Bob     │ 40  │ 80000  │ Engineering │
│ 4   │ Diana   │ 35  │ 72000  │ HR          │
│ 5   │ Edward  │ 50  │ 90000  │ Engineering │
│ 6   │ Frank   │ 45  │ 85000  │ Marketing   │
└─────┴─────────┴─────┴────────┴─────────────┘

DataFrame with Bonus Column:
shape: (6, 6)
┌─────┬─────────┬─────┬────────┬─────────────┬───────┐
│ id  │ name    │ age │ salary │ department  │ bonus │
│ --- │ ---     │ --- │ ---    │ ---         │ ---   │
│ i64 │ str     │ i64 │ i64    │ str         │ f64   │
├─────┼─────────┼─────┼────────┼─────────────┼───────┤
│ 1   │ Alice   │ 30  │ 70000  │ HR          │ 7000  │
│ 2   │ Bob     │ 40  │ 80000  │ Engineering │ 8000  │
│ 3   │ Charlie │ 25  │ 65000  │ Marketing   │ 6500  │
│ 4   │ Diana   │ 35  │ 72000  │ HR          │ 7200  │
│ 5   │ Edward  │ 50  │ 90000  │ Engineering │ 9000  │
│ 6   │ Frank   │ 45  │ 85000  │ Marketing   │ 8500  │
└─────┴─────────┴─────┴────────┴─────────────┴───────┘

Grouped by Department with Aggregations:
shape: (3, 3)
┌─────────────┬────────────┬────────────┐
│ department  │ avg_salary │ total_bonus│
│ ---         │ ---        │ ---        │
│ str         │ f64        │ f64        │
├─────────────┼────────────┼────────────┤
│ HR          │ 71000      │ 14200      │
│ Engineering │ 85000      │ 17000      │
│ Marketing   │ 75000      │ 15000      │
└─────────────┴────────────┴────────────┘

Sorted DataFrame by Salary (Descending):
shape: (6, 6)
┌─────┬─────────┬─────┬────────┬─────────────┬───────┐
│ id  │ name    │ age │ salary │ department  │ bonus │
│ --- │ ---     │ --- │ ---    │ ---         │ ---   │
│ i64 │ str     │ i64 │ i64    │ str         │ f64   │
├─────┼─────────┼─────┼────────┼─────────────┼───────┤
│ 5   │ Edward  │ 50  │ 90000  │ Engineering │ 9000  │
│ 6   │ Frank   │ 45  │ 85000  │ Marketing   │ 8500  │
│ 2   │ Bob     │ 40  │ 80000  │ Engineering │ 8000  │
│ 4   │ Diana   │ 35  │ 72000  │ HR          │ 7200  │
│ 

1   │ Alice   │ 30  │ 70000  │ HR          │ 7000  │
│ 3   │ Charlie │ 25  │ 65000  │ Marketing   │ 6500  │
└─────┴─────────┴─────┴────────┴─────────────┴───────┘

This example covers some fundamental operations with the $Polars$ library, making it a useful starting point for data manipulation and analysis tasks.

SciPy in Python

August 5, 2024

SciPy in Python

Here are a few useful SciPy sample codes showcasing different aspects of the library, including optimization, integration, interpolation, and signal processing.

1. Optimization: Minimizing a Function

This example demonstrates how to minimize a mathematical function using scipy.optimize.

import numpy as np
from scipy.optimize import minimize

# Define a function to minimize
def objective_function(x):
    return x**2 + 10*np.sin(x)

# Initial guess
x0 = 2.0

# Perform the minimization
result = minimize(objective_function, x0, method='BFGS')

print("Minimum value of the function:", result.fun)
print("Value of x that minimizes the function:", result.x)

[Output]

Minimum value of the function: 8.31558557947746
Value of x that minimizes the function: [3.83746713]

2. Integration: Calculating the Area Under a Curve

This example shows how to calculate the definite integral of a function using scipy.integrate.quad.

from scipy.integrate import quad

# Define a function to integrate
def integrand(x):
    return np.exp(-x**2)

# Compute the integral from 0 to infinity
result, error = quad(integrand, 0, np.inf)

print("Integral result:", result)

[Output]

Integral result: 0.8862269254527579

3. Interpolation: Interpolating Data Points

This example demonstrates how to interpolate data points using scipy.interpolate.interp1d.

import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import interp1d

# Sample data points
x = np.linspace(0, 10, 10)
y = np.sin(x)

# Create an interpolation function
f = interp1d(x, y, kind='cubic')

# Generate new x values and interpolate
x_new = np.linspace(0, 10, 100)
y_new = f(x_new)

# Plot the results
plt.plot(x, y, 'o', label='Data points')
plt.plot(x_new, y_new, '-', label='Cubic interpolation')
plt.legend()
plt.show()

[Output]

4. Signal Processing: Filtering a Signal

This example shows how to apply a low-pass filter to a noisy signal using scipy.signal.

import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import butter, lfilter

# Generate a noisy signal
np.random.seed(0)
t = np.linspace(0, 1, 500, endpoint=False)
signal = np.sin(2 * np.pi * 7 * t) + 0.5 * np.random.randn(500)

# Design a low-pass filter
def butter_lowpass(cutoff, fs, order=5):
    nyquist = 0.5 * fs
    normal_cutoff = cutoff / nyquist
    b, a = butter(order, normal_cutoff, btype='low', analog=False)
    return b, a

# Apply the filter to the signal
def lowpass_filter(data, cutoff, fs, order=5):
    b, a = butter_lowpass(cutoff, fs, order=order)
    y = lfilter(b, a, data)
    return y

# Parameters
cutoff = 3.5  # Desired cutoff frequency of the filter, Hz
fs = 50.0     # Sample rate, Hz

# Filter the signal
filtered_signal = lowpass_filter(signal, cutoff, fs)

# Plot the results
plt.plot(t, signal, label='Noisy signal')
plt.plot(t, filtered_signal, label='Filtered signal', linewidth=2)
plt.legend()
plt.show()

[Output]

These examples provide a glimpse of the powerful tools available in SciPy for scientific computing, including optimization, integration, interpolation, and signal processing.

PuLP in Python

August 4, 2024

PuLP in Python

PuLP is a Python library used for linear programming (LP) and mixed-integer linear programming (MILP).

It allows you to define and solve optimization problems where you want to minimize or maximize a linear objective function subject to linear constraints.

Installation

First, you need to install PuLP. You can do this using pip:

1	pip install pulp

Basic Usage of PuLP

Here’s a step-by-step guide to solving a simple linear programming problem with PuLP.

Example Problem:

Suppose you are a factory manager and you want to determine the optimal number of products $A$ and $B$ to produce to maximize your profit.
Each product requires a certain amount of resources (time, materials) and yields a certain profit.

Objective: Maximize profit.
Constraints:
- You have $100$ units of material.
- You have $80$ hours of labor.
- Product $A$ requires $4$ units of material and $2$ hours of labor.
- Product $B$ requires $3$ units of material and $5$ hours of labor.
Profit:
- Product $A$ gives $20 profit per unit.
- Product $B$ gives $25 profit per unit.

Step-by-Step Implementation

import pulp

# Step 1: Define the problem
# Create a maximization problem
problem = pulp.LpProblem("Maximize_Profit", pulp.LpMaximize)

# Step 2: Define the decision variables
# Let x be the number of product A to produce, y be the number of product B to produce
x = pulp.LpVariable('x', lowBound=0, cat='Continuous')
y = pulp.LpVariable('y', lowBound=0, cat='Continuous')

# Step 3: Define the objective function
# Objective function: Maximize 20x + 25y
problem += 20*x + 25*y, "Profit"

# Step 4: Define the constraints
# Constraint 1: 4x + 3y <= 100 (Material constraint)
problem += 4*x + 3*y <= 100, "Material"

# Constraint 2: 2x + 5y <= 80 (Labor constraint)
problem += 2*x + 5*y <= 80, "Labor"

# Step 5: Solve the problem
problem.solve()

# Step 6: Display the results
print("Status:", pulp.LpStatus[problem.status])
print("Optimal number of product A to produce:", pulp.value(x))
print("Optimal number of product B to produce:", pulp.value(y))
print("Maximum Profit:", pulp.value(problem.objective))

Explanation of the Code:

Define the Problem: We create a linear programming problem with the objective to maximize profit.
Decision Variables: We define the decision variables x and y for the number of products A and B to produce.
These variables are continuous and non-negative.
Objective Function: We define the objective function to maximize, which is the total profit: 20*x + 25*y.
Constraints: We add constraints based on the available resources:
- Material constraint: 4*x + 3*y <= 100
- Labor constraint: 2*x + 5*y <= 80
Solve the Problem: We solve the linear programming problem using PuLP’s solve method.
Display the Results: We print the optimal values of x and y, and the maximum profit.

Interpretation of Output:

Output

Status: Optimal
Optimal number of product A to produce: 18.571429
Optimal number of product B to produce: 8.5714286
Maximum Profit: 585.714295

Interpretation

The optimal solution is to produce $18.5$ units of product A and $8.57$ units of product B, which yields a maximum profit of $585.71.

This is a basic example of using PuLP in Python.

The library is powerful and can handle more complex constraints, variables, and objectives, including mixed-integer programming.

Altair

August 3, 2024

Altair

Here’s a advanced and useful sample code using Altair, which demonstrates how to create a layered chart with interactivity, such as tooltips and selection, along with custom encoding:

import altair as alt
from vega_datasets import data

# Load dataset
source = data.cars()

# Define a brush selection
brush = alt.selection(type='interval')

# Base chart with a scatter plot
base = alt.Chart(source).mark_point().encode(
    x='Horsepower:Q',
    y='Miles_per_Gallon:Q',
    color=alt.condition(brush, 'Origin:N', alt.value('lightgray')),
    tooltip=['Name:N', 'Origin:N', 'Horsepower:Q', 'Miles_per_Gallon:Q']
).properties(
    width=600,
    height=400
).add_selection(
    brush
)

# Layer a bar chart on top that shows the distribution of 'Origin' for selected points
bars = alt.Chart(source).mark_bar().encode(
    x='count():Q',
    y='Origin:N',
    color='Origin:N'
).transform_filter(
    brush
)

# Combine the scatter plot and the bar chart
chart = base & bars
chart

Result

Explanation

Brush Selection:
An interactive selection that allows users to drag over the scatter plot to select a region.
Scatter Plot:
A basic plot where points are colored by their origin, and the color changes based on the selection.
Bar Chart:
Shows the distribution of car origins, updated based on the selection made in the scatter plot.
Interactivity:

Tooltips provide detailed information when hovering over the points, and the chart updates dynamically based on the user’s selection.

This example combines interactivity with advanced encoding and layout, making it useful for exploratory data analysis.