Portfolio Optimization (CVXPY)

CVXPY

Let’s dive into a complex optimization problem using $CVXPY$.

This example involves solving a portfolio optimization problem.

The goal is to allocate investments across multiple assets to maximize expected returns while minimizing risk, with constraints on the investment.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
import cvxpy as cp
import numpy as np
import matplotlib.pyplot as plt

# Seed for reproducibility
np.random.seed(42)

# Number of assets
n = 10

# Generate random expected returns for each asset
mu = np.random.rand(n)

# Generate a random covariance matrix for asset returns
Sigma = np.random.rand(n, n)
Sigma = Sigma.T @ Sigma # To ensure it is positive semi-definite

# Define the variables (allocation of investments in each asset)
w = cp.Variable(n)

# Define the constraints
constraints = [
cp.sum(w) == 1, # The weights must sum to 1 (100% investment)
w >= 0, # No short-selling (non-negative weights)
]

# Define the objective function (maximize returns, minimize risk)
gamma = cp.Parameter(nonneg=True) # Risk aversion parameter
objective = cp.Maximize(mu.T @ w - gamma * cp.quad_form(w, Sigma))

# Define the problem
problem = cp.Problem(objective, constraints)

# Solve the problem for different values of gamma (risk aversion)
gammas = np.logspace(-3, 3, 100)
returns = []
risks = []

for g in gammas:
gamma.value = g
problem.solve()
returns.append(mu.T @ w.value)
risks.append(np.sqrt(w.value.T @ Sigma @ w.value))

# Plot the efficient frontier
plt.figure(figsize=(10, 6))
plt.plot(risks, returns, marker='o')
plt.xlabel("Risk (Standard Deviation)")
plt.ylabel("Expected Return")
plt.title("Efficient Frontier in Portfolio Optimization")
plt.grid(True)
plt.show()

Explanation of the Code:

  1. Problem Context:

    • The task is to allocate capital across n assets to maximize returns while managing risk. This is a typical problem in finance known as portfolio optimization.
  2. Expected Returns (mu):

    • mu represents the expected returns of each asset. In this example, these are randomly generated.
  3. Covariance Matrix (Sigma):

    • Sigma represents the covariance matrix of asset returns, showing how the returns on different assets move together. It’s crucial for understanding the risk of the portfolio.
  4. Investment Weights (w):

    • w is the decision variable, representing the fraction of total capital allocated to each asset.
  5. Objective Function:

    • The objective is to maximize the portfolio’s expected return (mu.T @ w) while penalizing risk (gamma * cp.quad_form(w, Sigma)). The risk aversion parameter gamma controls the trade-off between return and risk.
  6. Constraints:

    • The constraints ensure that the sum of all investments equals 1 (cp.sum(w) == 1), meaning all capital is invested, and no short-selling is allowed (w >= 0).
  7. Efficient Frontier:

    • The code solves the optimization problem for different levels of risk aversion (gamma). The results are used to plot the “efficient frontier,” a curve that shows the best possible expected return for each level of risk.
  8. Plotting:

    • The efficient frontier is plotted, showing the relationship between risk and return for the optimized portfolios.

Important Notes:

  • Efficient Frontier: The efficient frontier is a key concept in portfolio theory. It represents the set of portfolios that offer the highest expected return for a given level of risk.
  • Quadratic Programming: The problem involves quadratic programming due to the risk term (cp.quad_form(w, Sigma)), which is a quadratic function of the weights.
  • Risk Aversion: By varying gamma, the level of risk aversion, you can explore how different trade-offs between risk and return affect the optimal portfolio.

This example demonstrates a more advanced use of $CVXPY$ in a practical financial context, incorporating quadratic programming and exploring the trade-offs in a multi-objective optimization problem.

Output:

The graph you’ve provided is an Efficient Frontier plot in the context of portfolio optimization.

Here’s a breakdown of what this graph represents:

1. Axes:

  • X-axis (Risk - Standard Deviation): This axis represents the risk associated with the portfolio, measured by the standard deviation of the portfolio’s returns. Higher standard deviation indicates higher risk.
  • Y-axis (Expected Return): This axis shows the expected return of the portfolio. Higher values indicate greater expected returns.

2. Efficient Frontier:

  • The curve itself represents the Efficient Frontier. This is a fundamental concept in portfolio theory. Each point on the curve corresponds to a portfolio that offers the highest expected return for a given level of risk.
  • Moving along the curve from left to right, the risk increases (standard deviation increases), and the expected return also increases. However, the rate of increase in return diminishes as risk increases, which reflects the diminishing returns of taking on additional risk.

3. Interpretation:

  • Leftmost part of the curve: This section represents portfolios with the lowest risk. These portfolios also tend to have lower returns, but they are considered the “safest” in terms of volatility.
  • Rightmost part of the curve: This section shows portfolios with higher risk and higher expected returns. However, these portfolios are more volatile and less predictable.
  • Middle of the curve: This section typically represents the best trade-offs between risk and return. These portfolios are often preferred by investors seeking an optimal balance between risk and return.

4. Why the Curve?:

  • The shape of the curve is due to the fact that increasing risk does not linearly increase expected returns. Initially, small increases in risk can lead to significant increases in return, but after a certain point, additional risk leads to only marginal increases in return.

5. Use in Investment Decisions:

  • Investors use the efficient frontier to choose a portfolio that matches their risk tolerance. For example:
    • A risk-averse investor might choose a portfolio on the lower left of the curve, accepting lower returns for lower risk.
    • A risk-seeking investor might choose a portfolio on the upper right, accepting higher risk for higher potential returns.

6. The Underlying Optimization:

  • The curve is derived from solving multiple optimization problems where the objective is to maximize expected return for a given level of risk or minimize risk for a given level of expected return. These problems are typically solved using mathematical optimization techniques like quadratic programming.

In summary, this graph visualizes the trade-offs between risk and return for a set of optimal portfolios.

Investors can use it to select a portfolio that best aligns with their financial goals and risk tolerance.

Seaborn in Python(3)

Seaborn in Python(3)

Here’s a more complicated visualization example using $Seaborn$, which includes a combination of different plot types, customized aesthetics, and advanced features like hue, size, and style mappings.

This graph will visualize the relationship between multiple variables in a dataset, offering insights into complex interactions.

Complex Seaborn Visualization: Multi-Variable Plot with Customizations

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd

# Load the 'penguins' dataset from Seaborn
penguins = sns.load_dataset("penguins")

# Set the aesthetic style of the plots
sns.set_style("whitegrid")

# Create a scatter plot with different markers for species and a regression line
plt.figure(figsize=(12, 8))
sns.scatterplot(
data=penguins,
x="bill_length_mm",
y="bill_depth_mm",
hue="species",
style="island",
size="flipper_length_mm",
sizes=(20, 200),
palette="deep",
markers=["o", "s", "D"],
edgecolor="black"
)

# Add a regression line for each species
sns.regplot(
data=penguins,
x="bill_length_mm",
y="bill_depth_mm",
scatter=False,
color="gray",
line_kws={'lw': 1, 'linestyle': '--'}
)

# Customize the legend
plt.legend(
title='Species, Island, & Flipper Length',
title_fontsize='13',
fontsize='10',
loc='upper left',
bbox_to_anchor=(1, 1),
borderaxespad=0
)

# Add a title and labels
plt.title("Penguins: Bill Length vs. Bill Depth with Flipper Length and Island Information", fontsize=16)
plt.xlabel("Bill Length (mm)", fontsize=14)
plt.ylabel("Bill Depth (mm)", fontsize=14)

# Show the plot
plt.tight_layout()
plt.show()

Explanation of the Plot

  • Scatter Plot:
    The scatter plot displays the relationship between bill length and bill depth for penguins, with points representing individual penguins.
  • Hue:
    Different species of penguins are distinguished by different colors.
  • Style:
    Penguins from different islands are shown with different marker styles (circle, square, diamond).
  • Size:
    The size of the markers corresponds to the flipper length, with larger markers representing longer flippers.
  • Regression Line:
    A dashed regression line is overlaid to show the trend between bill length and depth for the entire dataset.
  • Legend:
    The legend provides information on species, island, and flipper length, helping to interpret the plot.

This visualization combines multiple layers of information into a single, interpretable plot, making it ideal for exploring complex datasets where multiple variables interact.

Output

Seaborn in Python(2)

Seaborn in Python

Here’s a more complex example using $Seaborn$ that involves multiple types of plots combined into a single figure.

This example demonstrates a $FacetGrid$ with customizations, including different plots for subsets of data and a combined heatmap with a categorical scatter plot.

1. FacetGrid with Multiple Plots

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib import gridspec

# Load the Titanic dataset
titanic = sns.load_dataset("titanic")

# Create a FacetGrid showing survival rate across different classes and genders
g = sns.FacetGrid(titanic, row="sex", col="class", margin_titles=True, height=4)
g.map(sns.histplot, "age", bins=20, kde=True)

# Add a title
plt.subplots_adjust(top=0.9)
g.fig.suptitle('Survival Rate by Age, Sex, and Class')

# Show the plot
plt.show()

Output:

2. Combined Heatmap and Categorical Scatter Plot

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd

# Load the flights dataset
flights = sns.load_dataset("flights")

# Pivot the dataset for a heatmap
flights_pivot = flights.pivot(index="month", columns="year", values="passengers")

# Set up the figure with a specific layout
fig = plt.figure(figsize=(14, 8))
gs = gridspec.GridSpec(2, 2, width_ratios=[3, 1], height_ratios=[1, 3])

# Heatmap on the left
ax0 = plt.subplot(gs[:, 0])
sns.heatmap(flights_pivot, annot=True, fmt="d", cmap="YlGnBu", ax=ax0)

# Categorical scatter plot (strip plot) on the right
ax1 = plt.subplot(gs[1, 1])
sns.stripplot(x="year", y="passengers", data=flights, jitter=True, ax=ax1)

# Add a title and labels
ax0.set_title('Flights Heatmap (Year vs Month)')
ax1.set_title('Yearly Passenger Distribution')
ax1.set_ylabel('')

# Show the plot
plt.tight_layout()
plt.show()

Output:

3. Violin Plot with a Swarm Plot Overlay

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
import seaborn as sns
import matplotlib.pyplot as plt

# Load the iris dataset
iris = sns.load_dataset("iris")

# Create a violin plot
sns.violinplot(x="species", y="petal_length", data=iris, inner=None, palette="Set2")

# Overlay with a swarm plot
sns.swarmplot(x="species", y="petal_length", data=iris, color="k", alpha=0.7)

# Add a title
plt.title("Violin and Swarm Plot of Iris Petal Length by Species")

# Show the plot
plt.show()

Output:

4. PairGrid with Custom Plots

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
import seaborn as sns
import matplotlib.pyplot as plt

# Load the tips dataset
tips = sns.load_dataset("tips")

# Create a PairGrid with different plots on the diagonal, upper, and lower triangles
g = sns.PairGrid(tips, diag_sharey=False)
g.map_upper(sns.scatterplot)
g.map_lower(sns.kdeplot, cmap="Blues_d")
g.map_diag(sns.histplot, kde_kws={"color": "k"})

# Add a title
g.fig.suptitle('PairGrid with Custom Plots', y=1.02)

# Show the plot
plt.show()

Output:


These examples demonstrate more advanced uses of $Seaborn$, including combining different plot types, customizing layouts, and visualizing complex datasets.

Seaborn in Python

Seaborn in Python

Here’s a sample code using $Seaborn$, a $Python$ visualization library based on $Matplotlib$, that creates a variety of plots.

This example demonstrates how to create a simple scatter plot with regression lines, a pair plot, and a heatmap.

1. Scatter Plot with Regression Line

1
2
3
4
5
6
7
8
9
10
11
import seaborn as sns
import matplotlib.pyplot as plt

# Load the example dataset for tips
tips = sns.load_dataset("tips")

# Create a scatter plot with a regression line
sns.lmplot(x="total_bill", y="tip", data=tips)

# Show the plot
plt.show()

Output:

2. Pair Plot

1
2
3
4
5
6
7
8
9
10
11
import seaborn as sns
import matplotlib.pyplot as plt

# Load the iris dataset
iris = sns.load_dataset("iris")

# Create a pair plot
sns.pairplot(iris, hue="species")

# Show the plot
plt.show()

Output:

3. Heatmap

1
2
3
4
5
6
7
8
9
10
11
12
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np

# Create a random matrix
data = np.random.rand(10, 12)

# Create a heatmap
sns.heatmap(data, annot=True, cmap="coolwarm")

# Show the plot
plt.show()

Output:

4. Box Plot

1
2
3
4
5
6
7
8
9
10
11
import seaborn as sns
import matplotlib.pyplot as plt

# Load the Titanic dataset
titanic = sns.load_dataset("titanic")

# Create a box plot
sns.boxplot(x="class", y="age", data=titanic, palette="Set3")

# Show the plot
plt.show()

Output:


These examples showcase the basics of using $Seaborn$ for data visualization.

The library provides a variety of other plot types and customization options to explore!

XGBoost in Python

XGBoost in Python

Here’s a basic example of how to use $XGBoost$ in $Python$ for a classification task.

This example uses the popular $Iris$ $dataset$.

Install XGBoost

If you don’t have $XGBoost$ installed, you can install it using pip:

1
pip install xgboost

Sample Code

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
import xgboost as xgb
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Convert the dataset into DMatrix, which is XGBoost's internal data structure
dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)

# Set parameters for XGBoost
params = {
'objective': 'multi:softmax', # Specify the loss function
'num_class': 3, # Number of classes in the dataset
'max_depth': 3, # Maximum depth of a tree
'eta': 0.3, # Step size shrinkage
'eval_metric': 'mlogloss' # Evaluation metric
}

# Train the model
num_rounds = 50 # Number of boosting rounds
bst = xgb.train(params, dtrain, num_rounds)

# Make predictions on the test set
y_pred = bst.predict(dtest)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy * 100:.2f}%')

Explanation

  1. Loading the Dataset: We use the $Iris$ $dataset$, which is a common dataset for classification tasks.
    It contains three classes of flowers.

  2. Splitting the Data: The dataset is split into training and testing sets using train_test_split.

  3. DMatrix: $XGBoost$ uses its own data structure called DMatrix for training.
    It is more efficient and optimized for $XGBoost$ operations.

  4. Setting Parameters:

    • objective: Defines the learning task and the corresponding objective function.
      In this case, multi:softmax is used for multiclass classification.
    • num_class: Specifies the number of classes.
    • max_depth: The maximum depth of the trees.
    • eta: The learning rate.
  5. Training: The model is trained using the train function with the specified parameters and number of boosting rounds.

  6. Prediction: The trained model makes predictions on the test set, and the accuracy is calculated using accuracy_score.

This is a basic example, but $XGBoost$ offers a wide range of parameters and options that can be fine-tuned for different types of data and tasks.

Output

1
Accuracy: 100.00%

statsmodels in Python

statsmodels in Python

$statsmodels$ is a powerful $Python$ library used for statistical modeling and analysis.

It provides classes and functions for many statistical models, such as linear regression, generalized linear models, time series analysis, and more.

Here are some sample codes using $statsmodels$:

1. Ordinary Least Squares (OLS) Regression

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
import statsmodels.api as sm
import numpy as np
import pandas as pd

# Generate some example data
np.random.seed(0)
X = np.random.rand(100)
y = 2 * X + 1 + np.random.randn(100) * 0.1

# Add a constant term (intercept) to the independent variable
X = sm.add_constant(X)

# Fit an OLS model
model = sm.OLS(y, X)
results = model.fit()

# Print out the summary of the regression
print(results.summary())

Explanation:

  • X is the independent variable, and y is the dependent variable.
  • sm.add_constant(X) adds a constant (intercept) to the model.
  • model = sm.OLS(y, X) creates an OLS model.
  • results = model.fit() fits the model to the data.
  • results.summary() provides a detailed summary of the regression results.

Output:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
                            OLS Regression Results                            
==============================================================================
Dep. Variable: y R-squared: 0.971
Model: OLS Adj. R-squared: 0.971
Method: Least Squares F-statistic: 3262.
Date: Mon, 12 Aug 2024 Prob (F-statistic): 4.88e-77
Time: 02:46:51 Log-Likelihood: 88.744
No. Observations: 100 AIC: -173.5
Df Residuals: 98 BIC: -168.3
Df Model: 1
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
const 1.0222 0.019 52.884 0.000 0.984 1.061
x1 1.9937 0.035 57.117 0.000 1.924 2.063
==============================================================================
Omnibus: 11.746 Durbin-Watson: 2.083
Prob(Omnibus): 0.003 Jarque-Bera (JB): 4.097
Skew: 0.138 Prob(JB): 0.129
Kurtosis: 2.047 Cond. No. 4.30
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

2. Logistic Regression

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
import statsmodels.api as sm
import numpy as np

# Generate some example data
np.random.seed(0)
n_samples = 100
X = np.random.rand(n_samples, 1)
X = sm.add_constant(X) # Add intercept

# Binary outcome (0 or 1)
y = (X[:, 1] + np.random.randn(n_samples) * 0.1 > 0.5).astype(int)

# Fit a logistic regression model
model = sm.Logit(y, X)
results = model.fit()

# Print out the summary of the logistic regression
print(results.summary())

Explanation:

  • y is a binary outcome ($0$ or $1$).
  • sm.Logit(y, X) creates a logistic regression model.
  • results = model.fit() fits the logistic regression model to the data.
  • results.summary() provides a summary of the logistic regression results.

Output:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Optimization terminated successfully.
Current function value: 0.195832
Iterations 9
Logit Regression Results
==============================================================================
Dep. Variable: y No. Observations: 100
Model: Logit Df Residuals: 98
Method: MLE Df Model: 1
Date: Mon, 12 Aug 2024 Pseudo R-squ.: 0.7175
Time: 02:47:53 Log-Likelihood: -19.583
converged: True LL-Null: -69.315
Covariance Type: nonrobust LLR p-value: 1.999e-23
==============================================================================
coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
const -8.1375 1.924 -4.229 0.000 -11.909 -4.366
x1 16.9601 3.829 4.430 0.000 9.456 24.464
==============================================================================

3. Time Series Analysis using ARIMA

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
import statsmodels.api as sm
import numpy as np

# Generate some example time series data
np.random.seed(0)
n_samples = 100
time_series_data = np.cumsum(np.random.randn(n_samples))

# Fit an ARIMA model (ARIMA(p, d, q))
model = sm.tsa.ARIMA(time_series_data, order=(1, 1, 1))
results = model.fit()

# Print out the summary of the ARIMA model
print(results.summary())

# Plot the forecast
forecast = results.get_forecast(steps=10)
forecast_index = np.arange(len(time_series_data), len(time_series_data) + 10)
forecast_mean = forecast.predicted_mean

import matplotlib.pyplot as plt
plt.plot(time_series_data, label='Original Time Series')
plt.plot(forecast_index, forecast_mean, label='Forecast')
plt.legend()
plt.show()

Explanation:

  • time_series_data is the time series data to model.
  • sm.tsa.ARIMA(time_series_data, order=(1, 1, 1)) creates an ARIMA model with specific orders for AR, differencing, and MA.
  • results.get_forecast(steps=10) forecasts the next $10$ steps in the time series.

Output:

4. ANOVA (Analysis of Variance)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
import statsmodels.api as sm
import pandas as pd
from statsmodels.formula.api import ols

# Create example data
np.random.seed(0)
df = pd.DataFrame({
'Group': np.repeat(['A', 'B', 'C'], 20),
'Value': np.concatenate([np.random.randn(20) + 1, np.random.randn(20), np.random.randn(20) - 1])
})

# Fit an OLS model with categorical variable
model = ols('Value ~ Group', data=df).fit()

# Perform ANOVA
anova_table = sm.stats.anova_lm(model, typ=2)
print(anova_table)

Explanation:

  • df contains a categorical variable Group and a continuous variable Value.
  • ols('Value ~ Group', data=df) fits an OLS model treating Group as a categorical variable.
  • sm.stats.anova_lm(model, typ=2) performs ANOVA on the fitted model.

Output:

1
2
3
             sum_sq    df          F        PR(>F)
Group 87.890846 2.0 43.127056 3.920858e-12
Residual 58.081616 57.0 NaN NaN

5. Autoregressive Model (AR)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
import statsmodels.api as sm
from statsmodels.tsa.ar_model import AutoReg
import numpy as np
import matplotlib.pyplot as plt

# Generate some example time series data
np.random.seed(0)
n_samples = 100
time_series_data = np.cumsum(np.random.randn(n_samples))

# Fit an Autoregressive (AR) model using AutoReg
model = AutoReg(time_series_data, lags=1)
results = model.fit()

# Print out the summary of the AR model
print(results.summary())

# Plot the predictions
predictions = results.predict(start=90, end=109)

plt.plot(time_series_data, label='Original Time Series')
plt.plot(range(90, 110), predictions, label='Predicted')
plt.legend()
plt.show()

Explanation:

  • AutoReg(time_series_data, lags=1) creates an autoregressive model with a specified number of lags.
  • The lags parameter specifies how many previous time points to use for predicting the next value.
  • results.predict(start=90, end=109) generates predictions for the specified range.

Output:


These examples demonstrate the versatility of $statsmodels$ in performing various statistical analyses and generating useful insights from data.

Bokeh in Python

Bokeh in Python

Here’s an example of how to create some useful graphs using $Bokeh$ in $Python$:

  1. Line Plot:
    A simple line plot showing trends over time.
  2. Bar Plot:
    A bar plot to compare categories.
  3. Scatter Plot:
    A scatter plot to see the relationship between two variables.
  4. Histogram:
    A histogram to visualize the distribution of data.

First, you need to install $Bokeh$ if you haven’t already:

1
pip install bokeh

Here’s the code to create these plots:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
from bokeh.plotting import figure, show, output_notebook
from bokeh.layouts import gridplot
from bokeh.models import ColumnDataSource
import numpy as np

# Prepare data
x = np.linspace(0, 4 * np.pi, 100)
y = np.sin(x)

categories = ['A', 'B', 'C', 'D']
values = [10, 20, 30, 40]

np.random.seed(1)
x_scatter = np.random.rand(100)
y_scatter = np.random.rand(100)

data = np.random.randn(1000)

# Output to notebook
output_notebook()

# Line plot
p1 = figure(title="Line Plot", x_axis_label='x', y_axis_label='y')
p1.line(x, y, legend_label="sin(x)", line_width=2)

# Bar plot
p2 = figure(x_range=categories, title="Bar Plot", x_axis_label='Category', y_axis_label='Values')
p2.vbar(x=categories, top=values, width=0.5)

# Scatter plot
p3 = figure(title="Scatter Plot", x_axis_label='X', y_axis_label='Y')
p3.scatter(x_scatter, y_scatter, size=8, color="navy", alpha=0.5)

# Histogram
hist, edges = np.histogram(data, bins=30)
p4 = figure(title="Histogram", x_axis_label='Value', y_axis_label='Frequency')
p4.quad(top=hist, bottom=0, left=edges[:-1], right=edges[1:], fill_color="navy", line_color="white", alpha=0.7)

# Arrange plots in a grid
grid = gridplot([[p1, p2], [p3, p4]])

# Show the plots
show(grid)

Explanation

  1. Line Plot:

    • figure() creates a new plot with a title and axis labels.
    • p1.line(x, y, ...) adds a line to the plot using the data in x and y.
  2. Bar Plot:

    • The x-axis is categorical, so x_range=categories.
    • p2.vbar(...) draws vertical bars.
  3. Scatter Plot:

    • A scatter plot shows the relationship between two variables using dots.
    • p3.scatter(...) plots the points.
  4. Histogram:

    • np.histogram computes the histogram of the data.
    • p4.quad(...) creates the bars of the histogram.

Running the Code

This code should be run in a $Jupyter notebook$ or a similar environment that supports $Bokeh$’s interactive plots.

It will output the plots inline.

Output


Google OR-Tools

Google OR-Tools

Here is a basic example of using $OR-Tools$ in $Python$ to solve a simple linear programming problem.

$OR-Tools$ is a powerful optimization library provided by Google, and it can be used to solve a wide range of problems, including linear programming, mixed-integer programming, constraint programming, and more.

Example: Linear Programming with OR-Tools

This example demonstrates solving a linear programming problem where we want to maximize the objective function $3x + 4y$ subject to some constraints.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
from ortools.linear_solver import pywraplp

def main():
# Create the solver with the SCIP backend.
solver = pywraplp.Solver.CreateSolver('SCIP')
if not solver:
return

# Create the variables x and y.
x = solver.NumVar(0, solver.infinity(), 'x')
y = solver.NumVar(0, solver.infinity(), 'y')

# Create the constraints.
solver.Add(2 * x + 3 * y <= 12)
solver.Add(4 * x + y <= 14)
solver.Add(3 * x - y >= 0)

# Define the objective function.
objective = solver.Maximize(3 * x + 4 * y)

# Solve the problem.
status = solver.Solve()

# Check the result status.
if status == pywraplp.Solver.OPTIMAL:
print('Solution:')
print('Objective value =', solver.Objective().Value())
print('x =', x.solution_value())
print('y =', y.solution_value())
else:
print('The problem does not have an optimal solution.')

if __name__ == '__main__':
main()

Explanation

  • Solver: We create a solver instance using pywraplp.Solver.CreateSolver('SCIP'). $SCIP$ is a powerful mixed-integer programming solver, and $OR-Tools$ uses it as one of its backends.
  • Variables: We define two variables, x and y, both with a lower bound of 0 and an upper bound of infinity.
  • Constraints: We add three constraints:
    1. $(2x + 3y \leq 12)$
    2. $(4x + y \leq 14)$
    3. $(3x - y \geq 0)$
  • Objective: We want to maximize the function $3x + 4y$.
  • Solve: The solver solves the problem, and we check if an optimal solution was found.
  • Result: If a solution is found, it prints the optimal objective value and the values of x and y.

Output

1
2
3
4
Solution:
Objective value = 17.0
x = 2.9999999999999996
y = 2.0000000000000018

Let’s break down the result of the optimization problem using $OR-Tools$:

Objective Value:

  • Objective value = 17.0: This is the maximum value of the objective function 3x + 4y given the constraints.
    The solver found that this is the highest value that can be achieved without violating any of the constraints.

Variable Values:

  • x = 2.9999999999999996: This is the optimal value of the variable x that maximizes the objective function.
    Due to floating-point precision in computational mathematics, this value is extremely close to 3 (but not exactly 3).
  • y = 2.0000000000000018: Similarly, this is the optimal value of the variable y. This value is extremely close to 2.

Interpretation:

  • Floating-Point Precision: The values 2.9999999999999996 for x and 2.0000000000000018 for y are due to the way computers handle floating-point arithmetic. In practice, these values can be considered as x = 3 and y = 2.

  • Objective Function Calculation: Given the optimal values of x and y, we can calculate the objective function:
    $$
    3x + 4y = 3(3) + 4(2) = 9 + 8 = 17
    $$
    This confirms that the objective value of 17.0 is indeed the maximum value that can be achieved under the given constraints.

Summary:

The solver has determined that to achieve the maximum value of 17 for the objective function 3x + 4y, the values of x and y should be approximately 3 and 2, respectively.

The slight deviations from exact integers are due to the limitations of floating-point representation in computers.

Running the Code

To run this code, ensure you have installed the $OR-Tools$ package.
You can install it using pip:

1
pip install ortools

This example should give you a good starting point for working with $OR-Tools$ in $Python$.

NetworkX in Python

NetworkX in Python

Here’s a useful sample code in $Python$ that uses the $NetworkX$ library to create, visualize, and analyze a graph.

This example covers basic graph operations, such as adding nodes and edges, visualizing the graph, and calculating common network metrics.

1. Installation

First, if you haven’t installed $NetworkX$, you can do so using:

1
2
!pip install networkx
!pip install matplotlib

2. Creating and Visualizing a Graph

Here’s a basic example of how to create a graph, add nodes and edges, visualize it, and calculate some metrics.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
import networkx as nx
import matplotlib.pyplot as plt

# Create a new graph
G = nx.Graph()

# Add nodes
G.add_node("A")
G.add_node("B")
G.add_node("C")
G.add_node("D")

# Add edges
G.add_edge("A", "B")
G.add_edge("A", "C")
G.add_edge("B", "D")
G.add_edge("C", "D")
G.add_edge("A", "D")

# Draw the graph
nx.draw(G, with_labels=True, node_color='lightblue', edge_color='gray', node_size=3000, font_size=20)
plt.show()

# Calculate some basic metrics
print("Nodes in the graph:", G.nodes())
print("Edges in the graph:", G.edges())

# Degree of each node
print("\nNode degrees:")
for node, degree in G.degree():
print(f"{node}: {degree}")

# Shortest path from A to D
print("\nShortest path from A to D:", nx.shortest_path(G, source="A", target="D"))

# Clustering coefficient of each node
print("\nClustering coefficient:")
for node, clustering in nx.clustering(G).items():
print(f"{node}: {clustering}")

# Graph density
print("\nGraph density:", nx.density(G))

Explanation of the Code:

  1. Graph Creation:

    • We create a new undirected graph using nx.Graph().
    • Nodes (“A”, “B”, “C”, “D”) are added individually.
    • Edges are added between nodes to define the relationships.
  2. Graph Visualization:

    • The nx.draw() function is used to visualize the graph.
      Nodes and edges are displayed with specified colors and sizes.
    • plt.show() displays the plot.
  3. Basic Graph Metrics:

    • Nodes and Edges: G.nodes() and G.edges() list all nodes and edges in the graph.
    • Degree: The degree of each node is calculated using G.degree(), which tells you how many connections each node has.
    • Shortest Path: The shortest path between two nodes is calculated using nx.shortest_path().
    • Clustering Coefficient: The clustering coefficient measures the degree to which nodes in the graph tend to cluster together.
    • Density: The density of a graph is calculated using nx.density(), which gives the ratio of the number of edges to the number of possible edges.

Output

The graph is visualized with nodes labeled “A”, “B”, “C”, and “D”.

The console will display the nodes, edges, degree of each node, the shortest path from node “A” to node “D”, the clustering coefficient of each node, and the overall graph density.

3. Advanced Example: Directed Graph with Weighted Edges

Here’s an example with a directed graph, weighted edges, and calculation of PageRank.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
import networkx as nx
import matplotlib.pyplot as plt

# Create a directed graph
DG = nx.DiGraph()

# Add nodes
DG.add_nodes_from(["A", "B", "C", "D"])

# Add weighted edges
DG.add_weighted_edges_from([("A", "B", 0.6), ("B", "C", 0.2), ("C", "D", 0.1), ("A", "D", 0.7)])

# Draw the directed graph with edge labels showing weights
pos = nx.spring_layout(DG)
nx.draw(DG, pos, with_labels=True, node_color='lightgreen', edge_color='gray', node_size=3000, font_size=20)
edge_labels = nx.get_edge_attributes(DG, 'weight')
nx.draw_networkx_edge_labels(DG, pos, edge_labels=edge_labels)
plt.show()

# PageRank calculation
pagerank = nx.pagerank(DG)
print("\nPageRank:")
for node, rank in pagerank.items():
print(f"{node}: {rank:.4f}")

Output:

Explanation:

  • Directed Graph:
    A directed graph is created using nx.DiGraph().
  • Weighted Edges:
    Edges are added with weights, which represent the strength or importance of the connection.
  • Visualization:
    The directed graph is visualized with edge labels showing weights.
  • PageRank:
    The PageRank algorithm is used to rank nodes, showing the importance of each node in the graph.

This should give you a good starting point for working with $NetworkX$ in Python.

CVXPY in Python

CVXPY in Python

Here’s a basic example of using $CVXPY$, a $Python$ library for convex optimization.

This example solves a simple linear programming problem:

Problem

Minimize the function $( c^T x )$ subject to the constraint $( Ax \leq b )$ and $( x \geq 0 )$, where:

  • $( c )$ is a vector of coefficients for the objective function.
  • $( A )$ is a matrix of coefficients for the inequality constraints.
  • $( b )$ is a vector representing the upper bounds for the constraints.

Sample Code

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
import cvxpy as cp
import numpy as np

# Define the data for the problem
c = np.array([1, 2]) # Coefficients for the objective function
A = np.array([[1, 1], [1, -1], [-1, 2]]) # Coefficients for the constraints
b = np.array([2, 1, 2]) # Right-hand side values for the constraints

# Define the optimization variables
x = cp.Variable(2)

# Define the objective function: minimize c^T x
objective = cp.Minimize(c @ x)

# Define the constraints: Ax <= b and x >= 0
constraints = [A @ x <= b, x >= 0]

# Formulate the problem
problem = cp.Problem(objective, constraints)

# Solve the problem
problem.solve()

# Print the results
print("Status:", problem.status)
print("Optimal value of the objective function:", problem.value)
print("Optimal values of the variables x:", x.value)

Explanation

  • Objective Function:
    c @ x is the dot product of the vector c with the variable vector x. We aim to minimize this value.
  • Constraints:
    A @ x <= b represents the inequality constraints, and x >= 0 ensures that the variables are non-negative.
  • Optimization:
    problem.solve() solves the optimization problem, and the optimal solution is stored in x.value.

Output

When you run this code, it will output the status of the optimization (e.g., “optimal”), the optimal value of the objective function, and the optimal values of the decision variables $( x )$.

1
2
3
Status: optimal
Optimal value of the objective function: 3.698490338406144e-10
Optimal values of the variables x: [3.59660015e-10 5.09450927e-12]

This example is a basic introduction, but $CVXPY$ can handle more complex problems, including quadratic programming, mixed-integer programming, and other types of convex optimization problems.