Seaborn in Python (3)

Here’s a more complex visualization example using Seaborn that combines different plot types, customized aesthetics, and advanced features such as hue, size, and style mappings.

The graph visualizes the relationship between multiple variables in a dataset, offering insight into complex interactions.

Complex Seaborn Visualization: Multi-Variable Plot with Customizations

import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd

# Load the 'penguins' dataset from Seaborn
penguins = sns.load_dataset("penguins")

# Set the aesthetic style of the plots
sns.set_style("whitegrid")

# Create a scatter plot with different markers for species and a regression line
plt.figure(figsize=(12, 8))
sns.scatterplot(
    data=penguins,
    x="bill_length_mm",
    y="bill_depth_mm",
    hue="species",
    style="island",
    size="flipper_length_mm",
    sizes=(20, 200),
    palette="deep",
    markers=["o", "s", "D"],
    edgecolor="black"
)

# Overlay a single regression line for the whole dataset
sns.regplot(
    data=penguins,
    x="bill_length_mm",
    y="bill_depth_mm",
    scatter=False,
    color="gray",
    line_kws={'lw': 1, 'linestyle': '--'}
)

# Customize the legend
plt.legend(
    title='Species, Island, & Flipper Length',
    title_fontsize='13',
    fontsize='10',
    loc='upper left',
    bbox_to_anchor=(1, 1),
    borderaxespad=0
)

# Add a title and labels
plt.title("Penguins: Bill Length vs. Bill Depth with Flipper Length and Island Information", fontsize=16)
plt.xlabel("Bill Length (mm)", fontsize=14)
plt.ylabel("Bill Depth (mm)", fontsize=14)

# Show the plot
plt.tight_layout()
plt.show()

Explanation of the Plot

  • Scatter Plot:
    The scatter plot displays the relationship between bill length and bill depth for penguins, with points representing individual penguins.
  • Hue:
    Different species of penguins are distinguished by different colors.
  • Style:
    Penguins from different islands are shown with different marker styles (circle, square, diamond).
  • Size:
    The size of the markers corresponds to the flipper length, with larger markers representing longer flippers.
  • Regression Line:
    A dashed regression line is overlaid to show the trend between bill length and depth for the entire dataset.
  • Legend:
    The legend provides information on species, island, and flipper length, helping to interpret the plot.

This visualization combines multiple layers of information into a single, interpretable plot, making it ideal for exploring complex datasets where multiple variables interact.
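If you want to keep a copy of the figure, you can save it just before plt.show(). A minimal sketch (the filename and DPI are arbitrary choices); because the legend sits outside the axes via bbox_to_anchor, passing bbox_inches='tight' keeps it from being clipped:

# Save the figure; bbox_inches='tight' keeps the outside-the-axes legend visible
plt.savefig("penguins_scatter.png", dpi=150, bbox_inches="tight")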

Output

Seaborn in Python (2)

Here’s a more complex example using Seaborn that combines multiple types of plots into a single figure.

This example demonstrates a FacetGrid with customizations, including different plots for subsets of data and a combined heatmap with a categorical scatter plot.

1. FacetGrid with Multiple Plots

import seaborn as sns
import matplotlib.pyplot as plt

# Load the Titanic dataset
titanic = sns.load_dataset("titanic")

# Create a FacetGrid showing the age distribution across classes and sexes
g = sns.FacetGrid(titanic, row="sex", col="class", margin_titles=True, height=4)
g.map(sns.histplot, "age", bins=20, kde=True)

# Add a title
plt.subplots_adjust(top=0.9)
g.fig.suptitle('Age Distribution by Sex and Class')

# Show the plot
plt.show()

Output:

2. Combined Heatmap and Categorical Scatter Plot

import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib import gridspec

# Load the flights dataset
flights = sns.load_dataset("flights")

# Pivot the dataset for a heatmap
flights_pivot = flights.pivot(index="month", columns="year", values="passengers")

# Set up the figure with a specific layout
fig = plt.figure(figsize=(14, 8))
gs = gridspec.GridSpec(2, 2, width_ratios=[3, 1], height_ratios=[1, 3])

# Heatmap on the left
ax0 = plt.subplot(gs[:, 0])
sns.heatmap(flights_pivot, annot=True, fmt="d", cmap="YlGnBu", ax=ax0)

# Categorical scatter plot (strip plot) on the right
ax1 = plt.subplot(gs[1, 1])
sns.stripplot(x="year", y="passengers", data=flights, jitter=True, ax=ax1)

# Add a title and labels
ax0.set_title('Flights Heatmap (Year vs Month)')
ax1.set_title('Yearly Passenger Distribution')
ax1.set_ylabel('')

# Show the plot
plt.tight_layout()
plt.show()

Output:

3. Violin Plot with a Swarm Plot Overlay

import seaborn as sns
import matplotlib.pyplot as plt

# Load the iris dataset
iris = sns.load_dataset("iris")

# Create a violin plot
sns.violinplot(x="species", y="petal_length", data=iris, inner=None, palette="Set2")

# Overlay with a swarm plot
sns.swarmplot(x="species", y="petal_length", data=iris, color="k", alpha=0.7)

# Add a title
plt.title("Violin and Swarm Plot of Iris Petal Length by Species")

# Show the plot
plt.show()

Output:

4. PairGrid with Custom Plots

import seaborn as sns
import matplotlib.pyplot as plt

# Load the tips dataset
tips = sns.load_dataset("tips")

# Create a PairGrid with different plots on the diagonal, upper, and lower triangles
g = sns.PairGrid(tips, diag_sharey=False)
g.map_upper(sns.scatterplot)
g.map_lower(sns.kdeplot, cmap="Blues_d")
g.map_diag(sns.histplot, kde=True, line_kws={"color": "k"})

# Add a title
g.fig.suptitle('PairGrid with Custom Plots', y=1.02)

# Show the plot
plt.show()

Output:


These examples demonstrate more advanced uses of Seaborn, including combining different plot types, customizing layouts, and visualizing complex datasets.

Seaborn in Python

Here’s sample code using Seaborn, a Python visualization library built on Matplotlib, that creates a variety of plots.

This example demonstrates how to create a simple scatter plot with a regression line, a pair plot, a heatmap, and a box plot.

1. Scatter Plot with Regression Line

import seaborn as sns
import matplotlib.pyplot as plt

# Load the example dataset for tips
tips = sns.load_dataset("tips")

# Create a scatter plot with a regression line
sns.lmplot(x="total_bill", y="tip", data=tips)

# Show the plot
plt.show()

Output:

2. Pair Plot

import seaborn as sns
import matplotlib.pyplot as plt

# Load the iris dataset
iris = sns.load_dataset("iris")

# Create a pair plot
sns.pairplot(iris, hue="species")

# Show the plot
plt.show()

Output:

3. Heatmap

import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np

# Create a random matrix
data = np.random.rand(10, 12)

# Create a heatmap
sns.heatmap(data, annot=True, cmap="coolwarm")

# Show the plot
plt.show()

Output:

4. Box Plot

import seaborn as sns
import matplotlib.pyplot as plt

# Load the Titanic dataset
titanic = sns.load_dataset("titanic")

# Create a box plot
sns.boxplot(x="class", y="age", data=titanic, palette="Set3")

# Show the plot
plt.show()

Output:


These examples showcase the basics of using Seaborn for data visualization.

The library provides a variety of other plot types and customization options to explore!
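As one natural next step, a short sketch of sns.jointplot, which pairs a bivariate plot with marginal distributions (the hexbin style is just one of several kind options):

import seaborn as sns
import matplotlib.pyplot as plt

# Joint plot: hexbin of two variables plus marginal histograms
tips = sns.load_dataset("tips")
sns.jointplot(data=tips, x="total_bill", y="tip", kind="hex")
plt.show()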

XGBoost in Python

Here’s a basic example of how to use XGBoost in Python for a classification task.

This example uses the popular Iris dataset.

Install XGBoost

If you don’t have XGBoost installed, you can install it using pip:

pip install xgboost

Sample Code

import xgboost as xgb
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Convert the dataset into DMatrix, which is XGBoost's internal data structure
dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)

# Set parameters for XGBoost
params = {
    'objective': 'multi:softmax',  # Specify the loss function
    'num_class': 3,                # Number of classes in the dataset
    'max_depth': 3,                # Maximum depth of a tree
    'eta': 0.3,                    # Step size shrinkage (learning rate)
    'eval_metric': 'mlogloss'      # Evaluation metric
}

# Train the model
num_rounds = 50 # Number of boosting rounds
bst = xgb.train(params, dtrain, num_rounds)

# Make predictions on the test set
y_pred = bst.predict(dtest)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy * 100:.2f}%')

Explanation

  1. Loading the Dataset: We use the Iris dataset, a common dataset for classification tasks.
    It contains three classes of flowers.

  2. Splitting the Data: The dataset is split into training and testing sets using train_test_split.

  3. DMatrix: XGBoost uses its own data structure called DMatrix for training.
    It is more efficient and optimized for XGBoost operations.

  4. Setting Parameters:

    • objective: Defines the learning task and the corresponding objective function.
      In this case, multi:softmax is used for multiclass classification.
    • num_class: Specifies the number of classes.
    • max_depth: The maximum depth of the trees.
    • eta: The learning rate.
  5. Training: The model is trained using the train function with the specified parameters and number of boosting rounds.

  6. Prediction: The trained model makes predictions on the test set, and the accuracy is calculated using accuracy_score.

This is a basic example, but XGBoost offers a wide range of parameters and options that can be fine-tuned for different types of data and tasks; a sketch of its scikit-learn-style interface follows the output below.

Output

Accuracy: 100.00%
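As mentioned above, XGBoost also provides a scikit-learn-compatible wrapper, XGBClassifier, which accepts NumPy arrays directly and skips the explicit DMatrix step. A minimal sketch (the hyperparameter values mirror the example above and are illustrative, not tuned):

from xgboost import XGBClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Same data split as before
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# The wrapper handles the DMatrix conversion internally
clf = XGBClassifier(max_depth=3, learning_rate=0.3, n_estimators=50)
clf.fit(X_train, y_train)

print(f'Accuracy: {accuracy_score(y_test, clf.predict(X_test)) * 100:.2f}%')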

statsmodels in Python

statsmodels is a powerful Python library for statistical modeling and analysis.

It provides classes and functions for many statistical models, such as linear regression, generalized linear models, time series analysis, and more.

Here are some examples using statsmodels:

1. Ordinary Least Squares (OLS) Regression

import statsmodels.api as sm
import numpy as np
import pandas as pd

# Generate some example data
np.random.seed(0)
X = np.random.rand(100)
y = 2 * X + 1 + np.random.randn(100) * 0.1

# Add a constant term (intercept) to the independent variable
X = sm.add_constant(X)

# Fit an OLS model
model = sm.OLS(y, X)
results = model.fit()

# Print out the summary of the regression
print(results.summary())

Explanation:

  • X is the independent variable, and y is the dependent variable.
  • sm.add_constant(X) adds a constant (intercept) to the model.
  • model = sm.OLS(y, X) creates an OLS model.
  • results = model.fit() fits the model to the data.
  • results.summary() provides a detailed summary of the regression results.

Output:

                            OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.971
Model:                            OLS   Adj. R-squared:                  0.971
Method:                 Least Squares   F-statistic:                     3262.
Date:                Mon, 12 Aug 2024   Prob (F-statistic):           4.88e-77
Time:                        02:46:51   Log-Likelihood:                 88.744
No. Observations:                 100   AIC:                            -173.5
Df Residuals:                      98   BIC:                            -168.3
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          1.0222      0.019     52.884      0.000       0.984       1.061
x1             1.9937      0.035     57.117      0.000       1.924       2.063
==============================================================================
Omnibus:                       11.746   Durbin-Watson:                   2.083
Prob(Omnibus):                  0.003   Jarque-Bera (JB):                4.097
Skew:                           0.138   Prob(JB):                        0.129
Kurtosis:                       2.047   Cond. No.                         4.30
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
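Beyond the printed summary, the fitted quantities are available programmatically on the results object; a brief sketch using standard statsmodels accessors:

# Pull individual results out of the fit
print(results.params)       # [intercept, slope]
print(results.pvalues)      # p-value for each coefficient
print(results.rsquared)     # R-squared of the fit
y_hat = results.predict(X)  # in-sample predictions for the design matrix X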

2. Logistic Regression

import statsmodels.api as sm
import numpy as np

# Generate some example data
np.random.seed(0)
n_samples = 100
X = np.random.rand(n_samples, 1)
X = sm.add_constant(X) # Add intercept

# Binary outcome (0 or 1)
y = (X[:, 1] + np.random.randn(n_samples) * 0.1 > 0.5).astype(int)

# Fit a logistic regression model
model = sm.Logit(y, X)
results = model.fit()

# Print out the summary of the logistic regression
print(results.summary())

Explanation:

  • y is a binary outcome (0 or 1).
  • sm.Logit(y, X) creates a logistic regression model.
  • results = model.fit() fits the logistic regression model to the data.
  • results.summary() provides a summary of the logistic regression results.

Output:

Optimization terminated successfully.
         Current function value: 0.195832
         Iterations 9
                           Logit Regression Results
==============================================================================
Dep. Variable:                      y   No. Observations:                  100
Model:                          Logit   Df Residuals:                       98
Method:                           MLE   Df Model:                            1
Date:                Mon, 12 Aug 2024   Pseudo R-squ.:                  0.7175
Time:                        02:47:53   Log-Likelihood:                -19.583
converged:                       True   LL-Null:                       -69.315
Covariance Type:            nonrobust   LLR p-value:                 1.999e-23
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const         -8.1375      1.924     -4.229      0.000     -11.909      -4.366
x1            16.9601      3.829      4.430      0.000       9.456      24.464
==============================================================================
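Note that for a Logit model, results.predict returns fitted probabilities rather than class labels, so a cutoff is needed to classify; a small sketch (the 0.5 threshold is the conventional default, not a statsmodels requirement):

# Predicted probabilities of y = 1, turned into hard labels at 0.5
probs = results.predict(X)
labels = (probs > 0.5).astype(int)
print(labels[:10])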

3. Time Series Analysis using ARIMA

import statsmodels.api as sm
import numpy as np
import matplotlib.pyplot as plt

# Generate some example time series data
np.random.seed(0)
n_samples = 100
time_series_data = np.cumsum(np.random.randn(n_samples))

# Fit an ARIMA model (ARIMA(p, d, q))
model = sm.tsa.ARIMA(time_series_data, order=(1, 1, 1))
results = model.fit()

# Print out the summary of the ARIMA model
print(results.summary())

# Forecast the next 10 steps
forecast = results.get_forecast(steps=10)
forecast_index = np.arange(len(time_series_data), len(time_series_data) + 10)
forecast_mean = forecast.predicted_mean

# Plot the original series and the forecast
plt.plot(time_series_data, label='Original Time Series')
plt.plot(forecast_index, forecast_mean, label='Forecast')
plt.legend()
plt.show()

Explanation:

  • time_series_data is the time series data to model.
  • sm.tsa.ARIMA(time_series_data, order=(1, 1, 1)) creates an ARIMA model with specific orders for AR, differencing, and MA.
  • results.get_forecast(steps=10) forecasts the next $10$ steps in the time series.

Output:
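The forecast object also carries uncertainty estimates; a short sketch using the standard conf_int accessor (95% is the default level):

# Confidence intervals for the forecast, one row of (lower, upper) per step
conf_int = forecast.conf_int()
print(conf_int[:3])  # bounds for the first 3 forecast steps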

4. ANOVA (Analysis of Variance)

import statsmodels.api as sm
import pandas as pd
import numpy as np
from statsmodels.formula.api import ols

# Create example data
np.random.seed(0)
df = pd.DataFrame({
    'Group': np.repeat(['A', 'B', 'C'], 20),
    'Value': np.concatenate([np.random.randn(20) + 1, np.random.randn(20), np.random.randn(20) - 1])
})

# Fit an OLS model with categorical variable
model = ols('Value ~ Group', data=df).fit()

# Perform ANOVA
anova_table = sm.stats.anova_lm(model, typ=2)
print(anova_table)

Explanation:

  • df contains a categorical variable Group and a continuous variable Value.
  • ols('Value ~ Group', data=df) fits an OLS model treating Group as a categorical variable.
  • sm.stats.anova_lm(model, typ=2) performs ANOVA on the fitted model.

Output:

             sum_sq    df          F        PR(>F)
Group     87.890846   2.0  43.127056  3.920858e-12
Residual  58.081616  57.0        NaN           NaN
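ANOVA only tells you that the group means differ somewhere, not which pairs differ; a follow-up sketch using statsmodels’ Tukey HSD post-hoc test, a standard companion to anova_lm:

from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Pairwise group comparisons with family-wise error control
tukey = pairwise_tukeyhsd(endog=df['Value'], groups=df['Group'], alpha=0.05)
print(tukey.summary())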

5. Autoregressive Model (AR)

import statsmodels.api as sm
from statsmodels.tsa.ar_model import AutoReg
import numpy as np
import matplotlib.pyplot as plt

# Generate some example time series data
np.random.seed(0)
n_samples = 100
time_series_data = np.cumsum(np.random.randn(n_samples))

# Fit an Autoregressive (AR) model using AutoReg
model = AutoReg(time_series_data, lags=1)
results = model.fit()

# Print out the summary of the AR model
print(results.summary())

# Plot the predictions
predictions = results.predict(start=90, end=109)

plt.plot(time_series_data, label='Original Time Series')
plt.plot(range(90, 110), predictions, label='Predicted')
plt.legend()
plt.show()

Explanation:

  • AutoReg(time_series_data, lags=1) creates an autoregressive model with a specified number of lags.
  • The lags parameter specifies how many previous time points to use for predicting the next value.
  • results.predict(start=90, end=109) generates predictions for the specified range.

Output:
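Rather than fixing lags=1 by hand, statsmodels can choose the lag order by information criterion; a brief sketch with ar_select_order (the maxlag of 10 is an arbitrary search bound):

from statsmodels.tsa.ar_model import ar_select_order

# Search lag orders up to 10 and keep the one with the best AIC
sel = ar_select_order(time_series_data, maxlag=10, ic='aic')
print("Selected lags:", sel.ar_lags)
results = sel.model.fit()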


These examples demonstrate the versatility of statsmodels in performing various statistical analyses and generating useful insights from data.

Bokeh in Python

Here’s an example of how to create some useful graphs using Bokeh in Python:

  1. Line Plot:
    A simple line plot showing trends over time.
  2. Bar Plot:
    A bar plot to compare categories.
  3. Scatter Plot:
    A scatter plot to see the relationship between two variables.
  4. Histogram:
    A histogram to visualize the distribution of data.

First, you need to install Bokeh if you haven’t already:

pip install bokeh

Here’s the code to create these plots:

from bokeh.plotting import figure, show, output_notebook
from bokeh.layouts import gridplot
from bokeh.models import ColumnDataSource
import numpy as np

# Prepare data
x = np.linspace(0, 4 * np.pi, 100)
y = np.sin(x)

categories = ['A', 'B', 'C', 'D']
values = [10, 20, 30, 40]

np.random.seed(1)
x_scatter = np.random.rand(100)
y_scatter = np.random.rand(100)

data = np.random.randn(1000)

# Output to notebook
output_notebook()

# Line plot
p1 = figure(title="Line Plot", x_axis_label='x', y_axis_label='y')
p1.line(x, y, legend_label="sin(x)", line_width=2)

# Bar plot
p2 = figure(x_range=categories, title="Bar Plot", x_axis_label='Category', y_axis_label='Values')
p2.vbar(x=categories, top=values, width=0.5)

# Scatter plot
p3 = figure(title="Scatter Plot", x_axis_label='X', y_axis_label='Y')
p3.scatter(x_scatter, y_scatter, size=8, color="navy", alpha=0.5)

# Histogram
hist, edges = np.histogram(data, bins=30)
p4 = figure(title="Histogram", x_axis_label='Value', y_axis_label='Frequency')
p4.quad(top=hist, bottom=0, left=edges[:-1], right=edges[1:], fill_color="navy", line_color="white", alpha=0.7)

# Arrange plots in a grid
grid = gridplot([[p1, p2], [p3, p4]])

# Show the plots
show(grid)

Explanation

  1. Line Plot:

    • figure() creates a new plot with a title and axis labels.
    • p1.line(x, y, ...) adds a line to the plot using the data in x and y.
  2. Bar Plot:

    • The x-axis is categorical, so x_range=categories.
    • p2.vbar(...) draws vertical bars.
  3. Scatter Plot:

    • A scatter plot shows the relationship between two variables using dots.
    • p3.scatter(...) plots the points.
  4. Histogram:

    • np.histogram computes the histogram of the data.
    • p4.quad(...) creates the bars of the histogram.

Running the Code

This code should be run in a Jupyter notebook or a similar environment that supports Bokeh’s interactive plots.

It will output the plots inline.
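Outside a notebook, you can write the same grid to a standalone HTML file instead; a minimal sketch using Bokeh’s output_file (the filename is arbitrary):

from bokeh.plotting import output_file, save

# Write the grid to a self-contained HTML file
output_file("bokeh_plots.html")
save(grid)  # or show(grid) to also open it in a browser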

Output


Google OR-Tools

Here is a basic example of using OR-Tools in Python to solve a simple linear programming problem.

OR-Tools is a powerful optimization library from Google that can solve a wide range of problems, including linear programming, mixed-integer programming, constraint programming, and more.

Example: Linear Programming with OR-Tools

This example demonstrates solving a linear programming problem where we want to maximize the objective function $3x + 4y$ subject to some constraints.

from ortools.linear_solver import pywraplp

def main():
    # Create the solver with the SCIP backend.
    solver = pywraplp.Solver.CreateSolver('SCIP')
    if not solver:
        return

    # Create the variables x and y.
    x = solver.NumVar(0, solver.infinity(), 'x')
    y = solver.NumVar(0, solver.infinity(), 'y')

    # Create the constraints.
    solver.Add(2 * x + 3 * y <= 12)
    solver.Add(4 * x + y <= 14)
    solver.Add(3 * x - y >= 0)

    # Define the objective function.
    solver.Maximize(3 * x + 4 * y)

    # Solve the problem.
    status = solver.Solve()

    # Check the result status.
    if status == pywraplp.Solver.OPTIMAL:
        print('Solution:')
        print('Objective value =', solver.Objective().Value())
        print('x =', x.solution_value())
        print('y =', y.solution_value())
    else:
        print('The problem does not have an optimal solution.')

if __name__ == '__main__':
    main()

Explanation

  • Solver: We create a solver instance using pywraplp.Solver.CreateSolver('SCIP'). SCIP is a powerful mixed-integer programming solver, and OR-Tools uses it as one of its backends.
  • Variables: We define two variables, x and y, both with a lower bound of 0 and an upper bound of infinity.
  • Constraints: We add three constraints:
    1. $2x + 3y \leq 12$
    2. $4x + y \leq 14$
    3. $3x - y \geq 0$
  • Objective: We want to maximize the function $3x + 4y$.
  • Solve: The solver solves the problem, and we check if an optimal solution was found.
  • Result: If a solution is found, it prints the optimal objective value and the values of x and y.

Output

Solution:
Objective value = 17.0
x = 2.9999999999999996
y = 2.0000000000000018

Let’s break down the result of the optimization problem solved with OR-Tools:

Objective Value:

  • Objective value = 17.0: This is the maximum value of the objective function 3x + 4y given the constraints.
    The solver found that this is the highest value that can be achieved without violating any of the constraints.

Variable Values:

  • x = 2.9999999999999996: This is the optimal value of the variable x that maximizes the objective function.
    Due to floating-point precision in computational mathematics, this value is extremely close to 3 (but not exactly 3).
  • y = 2.0000000000000018: Similarly, this is the optimal value of the variable y. This value is extremely close to 2.

Interpretation:

  • Floating-Point Precision: The values 2.9999999999999996 for x and 2.0000000000000018 for y are due to the way computers handle floating-point arithmetic. In practice, these values can be considered as x = 3 and y = 2.

  • Objective Function Calculation: Given the optimal values of x and y, we can calculate the objective function:
    $$
    3x + 4y = 3(3) + 4(2) = 9 + 8 = 17
    $$
    This confirms that the objective value of 17.0 is indeed the maximum value that can be achieved under the given constraints.

Summary:

The solver has determined that to achieve the maximum value of 17 for the objective function 3x + 4y, the values of x and y should be approximately 3 and 2, respectively.

The slight deviations from exact integers are due to the limitations of floating-point representation in computers.

Running the Code

To run this code, ensure you have the OR-Tools package installed.
You can install it using pip:

pip install ortools

This example should give you a good starting point for working with OR-Tools in Python.
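Because the SCIP backend also handles mixed-integer programs, restricting the variables to integer values only requires swapping NumVar for IntVar. A hedged sketch of the integer variant of the same model:

# Same problem, but x and y must take integer values (a MIP)
x = solver.IntVar(0, solver.infinity(), 'x')
y = solver.IntVar(0, solver.infinity(), 'y')
# ...add the same constraints and objective, then call solver.Solve() as before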

NetworkX in Python

Here’s a sample Python script that uses the NetworkX library to create, visualize, and analyze a graph.

This example covers basic graph operations, such as adding nodes and edges, visualizing the graph, and calculating common network metrics.

1. Installation

First, if you haven’t installed NetworkX, you can do so with:

!pip install networkx
!pip install matplotlib

2. Creating and Visualizing a Graph

Here’s a basic example of how to create a graph, add nodes and edges, visualize it, and calculate some metrics.

import networkx as nx
import matplotlib.pyplot as plt

# Create a new graph
G = nx.Graph()

# Add nodes
G.add_node("A")
G.add_node("B")
G.add_node("C")
G.add_node("D")

# Add edges
G.add_edge("A", "B")
G.add_edge("A", "C")
G.add_edge("B", "D")
G.add_edge("C", "D")
G.add_edge("A", "D")

# Draw the graph
nx.draw(G, with_labels=True, node_color='lightblue', edge_color='gray', node_size=3000, font_size=20)
plt.show()

# Calculate some basic metrics
print("Nodes in the graph:", G.nodes())
print("Edges in the graph:", G.edges())

# Degree of each node
print("\nNode degrees:")
for node, degree in G.degree():
    print(f"{node}: {degree}")

# Shortest path from A to D
print("\nShortest path from A to D:", nx.shortest_path(G, source="A", target="D"))

# Clustering coefficient of each node
print("\nClustering coefficient:")
for node, clustering in nx.clustering(G).items():
    print(f"{node}: {clustering}")

# Graph density
print("\nGraph density:", nx.density(G))

Explanation of the Code:

  1. Graph Creation:

    • We create a new undirected graph using nx.Graph().
    • Nodes (“A”, “B”, “C”, “D”) are added individually.
    • Edges are added between nodes to define the relationships.
  2. Graph Visualization:

    • The nx.draw() function is used to visualize the graph.
      Nodes and edges are displayed with specified colors and sizes.
    • plt.show() displays the plot.
  3. Basic Graph Metrics:

    • Nodes and Edges: G.nodes() and G.edges() list all nodes and edges in the graph.
    • Degree: The degree of each node is calculated using G.degree(), which tells you how many connections each node has.
    • Shortest Path: The shortest path between two nodes is calculated using nx.shortest_path().
    • Clustering Coefficient: The clustering coefficient measures the degree to which nodes in the graph tend to cluster together.
    • Density: The density of a graph is calculated using nx.density(), which gives the ratio of the number of edges to the number of possible edges.

Output

The graph is visualized with nodes labeled “A”, “B”, “C”, and “D”.

The console will display the nodes, edges, degree of each node, the shortest path from node “A” to node “D”, the clustering coefficient of each node, and the overall graph density.
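NetworkX computes many other metrics in the same style; for example, here is a short sketch of betweenness centrality, which measures how often a node lies on shortest paths between other pairs of nodes:

# Betweenness centrality: fraction of shortest paths passing through each node
for node, bc in nx.betweenness_centrality(G).items():
    print(f"{node}: {bc:.3f}")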

3. Advanced Example: Directed Graph with Weighted Edges

Here’s an example with a directed graph, weighted edges, and calculation of PageRank.

import networkx as nx
import matplotlib.pyplot as plt

# Create a directed graph
DG = nx.DiGraph()

# Add nodes
DG.add_nodes_from(["A", "B", "C", "D"])

# Add weighted edges
DG.add_weighted_edges_from([("A", "B", 0.6), ("B", "C", 0.2), ("C", "D", 0.1), ("A", "D", 0.7)])

# Draw the directed graph with edge labels showing weights
pos = nx.spring_layout(DG)
nx.draw(DG, pos, with_labels=True, node_color='lightgreen', edge_color='gray', node_size=3000, font_size=20)
edge_labels = nx.get_edge_attributes(DG, 'weight')
nx.draw_networkx_edge_labels(DG, pos, edge_labels=edge_labels)
plt.show()

# PageRank calculation
pagerank = nx.pagerank(DG)
print("\nPageRank:")
for node, rank in pagerank.items():
    print(f"{node}: {rank:.4f}")

Output:

Explanation:

  • Directed Graph:
    A directed graph is created using nx.DiGraph().
  • Weighted Edges:
    Edges are added with weights, which represent the strength or importance of the connection.
  • Visualization:
    The directed graph is visualized with edge labels showing weights.
  • PageRank:
    The PageRank algorithm is used to rank nodes, showing the importance of each node in the graph.
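Edge weights also feed into path algorithms; a brief sketch of a weight-aware shortest path (Dijkstra) on the same graph, where 'weight' is the edge attribute set above:

# Shortest path from A to D taking edge weights into account
path = nx.shortest_path(DG, source="A", target="D", weight="weight")
print("Weighted shortest path:", path)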

This should give you a good starting point for working with NetworkX in Python.

CVXPY in Python

Here’s a basic example of using CVXPY, a Python library for convex optimization.

This example solves a simple linear programming problem:

Problem

Minimize the function $c^T x$ subject to the constraints $Ax \leq b$ and $x \geq 0$, where:

  • $c$ is a vector of coefficients for the objective function.
  • $A$ is a matrix of coefficients for the inequality constraints.
  • $b$ is a vector of upper bounds for the constraints.

Sample Code

import cvxpy as cp
import numpy as np

# Define the data for the problem
c = np.array([1, 2]) # Coefficients for the objective function
A = np.array([[1, 1], [1, -1], [-1, 2]]) # Coefficients for the constraints
b = np.array([2, 1, 2]) # Right-hand side values for the constraints

# Define the optimization variables
x = cp.Variable(2)

# Define the objective function: minimize c^T x
objective = cp.Minimize(c @ x)

# Define the constraints: Ax <= b and x >= 0
constraints = [A @ x <= b, x >= 0]

# Formulate the problem
problem = cp.Problem(objective, constraints)

# Solve the problem
problem.solve()

# Print the results
print("Status:", problem.status)
print("Optimal value of the objective function:", problem.value)
print("Optimal values of the variables x:", x.value)

Explanation

  • Objective Function:
    c @ x is the dot product of the vector c with the variable vector x. We aim to minimize this value.
  • Constraints:
    A @ x <= b represents the inequality constraints, and x >= 0 ensures that the variables are non-negative.
  • Optimization:
    problem.solve() solves the optimization problem, and the optimal solution is stored in x.value.

Output

When you run this code, it will output the status of the optimization (e.g., “optimal”), the optimal value of the objective function, and the optimal values of the decision variables $x$. Since the coefficients in $c$ are positive and $x \geq 0$, the minimum is attained at $x = 0$, so the reported values are zero up to solver tolerance.

Status: optimal
Optimal value of the objective function: 3.698490338406144e-10
Optimal values of the variables x: [3.59660015e-10 5.09450927e-12]

This example is a basic introduction, but CVXPY can handle more complex problems, including quadratic programming, mixed-integer programming, and other types of convex optimization.
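As one small illustration, here is a hedged sketch of a least-squares problem in CVXPY, reusing the A and b defined above; cp.sum_squares makes the objective quadratic while keeping the problem convex:

# Least squares: minimize ||Ax - b||^2 subject to x >= 0
x = cp.Variable(2)
problem = cp.Problem(cp.Minimize(cp.sum_squares(A @ x - b)), [x >= 0])
problem.solve()
print("Least-squares solution:", x.value)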

Prophet in Python

Prophet is a popular library developed by Facebook for time series forecasting.

It’s particularly effective for data with strong seasonal effects and several seasons of historical observations.

Here’s a basic example of how to use Prophet in Python:

Step-by-Step Example:

  1. Install Prophet (if you haven’t already):

    pip install prophet
  2. Import Required Libraries:

    import pandas as pd
    import numpy as np
    from prophet import Prophet
    import matplotlib.pyplot as plt
  3. Load Your Data:
    For this example, let’s create a simple time series dataset.

    # Create a simple time series dataset
    dates = pd.date_range(start='2022-01-01', periods=365)
    data = pd.DataFrame({
        'ds': dates,
        'y': 100 + (dates.dayofyear - 183) ** 2 / 100 + np.random.randn(365) * 5
    })
  4. Initialize and Fit the Prophet Model:

    # Initialize the Prophet model
    model = Prophet()

    # Fit the model to the data
    model.fit(data)
  5. Make Predictions:
    You can make future predictions using the model by specifying the number of days into the future you want to forecast.

    # Create a dataframe for future predictions
    future = model.make_future_dataframe(periods=30) # Predict 30 days into the future

    # Predict future values
    forecast = model.predict(future)
  6. Visualize the Forecast:
    Prophet has a built-in plot function to visualize the forecasted data.

    # Plot the forecast
    model.plot(forecast)
    plt.show()
  7. Plot Components:
    You can also plot the components (trend, weekly seasonality, yearly seasonality) of the forecast.

    # Plot the forecast components
    model.plot_components(forecast)
    plt.show()

Full Code Example:

import pandas as pd
import numpy as np
from prophet import Prophet
import matplotlib.pyplot as plt

# Step 1: Create a simple time series dataset
dates = pd.date_range(start='2022-01-01', periods=365)
data = pd.DataFrame({
    'ds': dates,
    'y': 100 + (dates.dayofyear - 183) ** 2 / 100 + np.random.randn(365) * 5
})

# Step 2: Initialize the Prophet model
model = Prophet()

# Step 3: Fit the model to the data
model.fit(data)

# Step 4: Create a dataframe for future predictions
future = model.make_future_dataframe(periods=30) # Predict 30 days into the future

# Step 5: Predict future values
forecast = model.predict(future)

# Step 6: Plot the forecast
model.plot(forecast)
plt.show()

# Step 7: Plot the forecast components
model.plot_components(forecast)
plt.show()

Explanation

  • ds: The column containing the dates.
  • y: The column containing the values to be forecasted.
  • fit: The method to train the model with your time series data.
  • make_future_dataframe: Prepares a dataframe to hold future predictions.
  • predict: Generates predictions for the given dates.
  • plot: Visualizes the forecast along with the observed data.
  • plot_components: Breaks down the forecast into its components (e.g., trend, weekly seasonality).

Result

Running this code will generate a plot of the time series data with the forecasted values and their uncertainty intervals, as well as a breakdown of the forecast components.
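To gauge forecast accuracy beyond visual inspection, Prophet also ships diagnostics utilities; a hedged sketch of rolling-origin cross-validation (the initial, period, and horizon windows here are illustrative choices for one year of daily data):

from prophet.diagnostics import cross_validation, performance_metrics

# Train on an initial 180-day window, then evaluate 30-day forecasts
# made every 15 days as the origin rolls forward
df_cv = cross_validation(model, initial='180 days', period='15 days', horizon='30 days')
print(performance_metrics(df_cv).head())  # MAE, RMSE, MAPE, etc. by horizon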