Business Cycle Analysis with Real GDP Data

Problem Statement: Business Cycle Analysis with Real GDP Data

Objective:
Analyze the business cycle by identifying and visualizing the cyclical component of Real GDP using the Hodrick-Prescott (HP) filter.

This method separates the trend and cyclical components of GDP, allowing us to examine deviations from the long-term trend.


Steps:

  1. Data:
    Use simulated or publicly available Real GDP time series data.

  2. Hodrick-Prescott Filter:

    • Decompose GDP into its trend $(T_t)$ and cyclical $(C_t)$ components:
      $$
      GDP_t = T_t + C_t
      $$
    • The HP filter minimizes the following loss function:
      $$
      \sum_t (GDP_t - T_t)^2 + \lambda \sum_t \left[(T_{t+1} - T_t) - (T_t - T_{t-1})\right]^2
      $$
      where $(\lambda)$ is a smoothing parameter (typically $1600$ for quarterly data).
  3. Visualize:
    Plot the original GDP, trend, and cyclical components to analyze economic fluctuations.


Python Code

import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.filters.hp_filter import hpfilter

# Simulated GDP Data (quarterly, 50 periods)
np.random.seed(42)
time = np.arange(50)
trend = 2.5 * time + 100 # Linear trend
cyclical = 15 * np.sin(0.3 * time) # Cyclical component
noise = np.random.normal(0, 5, size=time.shape) # Random noise
gdp = trend + cyclical + noise # Simulated GDP

# Apply the HP filter
gdp_cycle, gdp_trend = hpfilter(gdp, lamb=1600)  # hpfilter returns (cycle, trend)

# Plotting
plt.figure(figsize=(14, 8))

# Original and Trend
plt.subplot(2, 1, 1)
plt.plot(time, gdp, label="Original GDP", color="blue")
plt.plot(time, gdp_trend, label="Trend (HP Filter)", color="red", linestyle="--")
plt.title("Real GDP and Trend Component")
plt.xlabel("Time (Quarters)")
plt.ylabel("GDP")
plt.legend()
plt.grid()

# Cyclical Component
plt.subplot(2, 1, 2)
plt.plot(time, gdp_cycle, label="Cyclical Component", color="green")
plt.axhline(0, color="black", linestyle="--", linewidth=0.8)
plt.title("Cyclical Component of GDP (Deviations from Trend)")
plt.xlabel("Time (Quarters)")
plt.ylabel("GDP Deviations")
plt.legend()
plt.grid()

plt.tight_layout()
plt.show()

# Print summary statistics
print(f"Original GDP Mean: {np.mean(gdp):.2f}")
print(f"Trend Component Mean: {np.mean(gdp_trend):.2f}")
print(f"Cyclical Component Mean: {np.mean(gdp_cycle):.2f} (should be ~0)")

Explanation of the Code

  1. Simulated GDP Data:

    • Real GDP is modeled as a combination of a long-term trend, a cyclical fluctuation, and random noise.
  2. Hodrick-Prescott Filter:

    • The HP filter separates GDP into a smooth trend and short-term deviations (the cyclical component).
    • The smoothing parameter $(\lambda)$ controls the smoothness of the trend; $1600$ is standard for quarterly data (the sketch after this list shows the effect of varying it).
  3. Visualization:

    • The first plot shows the original GDP and its trend.
    • The second plot highlights the cyclical component, indicating deviations from the long-term trend.
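
A minimal sketch of the effect of $\lambda$ (self-contained, with its own illustrative series so it does not disturb the variables in the main listing):

import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.filters.hp_filter import hpfilter

# Illustrative series: linear trend + sine cycle + noise (mirrors the main listing)
np.random.seed(0)
time = np.arange(50)
gdp = 2.5 * time + 100 + 15 * np.sin(0.3 * time) + np.random.normal(0, 5, 50)

# hpfilter returns (cycle, trend); a smaller lambda lets the trend track GDP more closely
_, trend_smooth = hpfilter(gdp, lamb=1600)  # standard value for quarterly data
_, trend_rough = hpfilter(gdp, lamb=100)    # much less smoothing

plt.plot(time, gdp, label="GDP", color="gray", alpha=0.6)
plt.plot(time, trend_smooth, label="Trend, lambda=1600", color="red")
plt.plot(time, trend_rough, label="Trend, lambda=100", color="blue", linestyle="--")
plt.legend()
plt.show()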

Results

Original GDP Mean: 161.77
Trend Component Mean: 161.77
Cyclical Component Mean: 0.00 (should be ~0)
  1. Trend Component:

    • Represents the long-term economic growth path.
  2. Cyclical Component:

    • Indicates business cycle fluctuations around the trend.
    • Peaks and troughs correspond to economic expansions and contractions, respectively.
  3. Insights:

    • The cyclical component helps identify recessions and booms.
    • Policymakers and economists use this analysis to design countercyclical measures.

Price Discrimination in a Monopoly

Problem Statement: Price Discrimination in a Monopoly

Objective:
A monopolist sells its product to two distinct markets with different demand elasticities.
The monopolist can price discriminate, setting different prices for each market to maximize overall profit.

The goal is to:

  1. Simulate the optimal pricing strategy for the monopolist.
  2. Compute the quantities sold and profits in each market.
  3. Visualize the results to illustrate price discrimination.

Assumptions

  1. Demand Functions:

    • Market $1$: $( Q_1 = a_1 - b_1 \cdot P_1 )$
    • Market $2$: $( Q_2 = a_2 - b_2 \cdot P_2 )$
      where $ P_1 $ and $ P_2 $ are the prices set in each market.
  2. Profit Function:
    The monopolist’s total profit is:
    $$
    \pi = (P_1 - c) \cdot Q_1 + (P_2 - c) \cdot Q_2
    $$
    where $c$ is the marginal cost.

  3. Goal:
    Maximize $\pi$ by choosing optimal $P_1$ and $P_2$.
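
Because each market's profit $(P_i - c)(a_i - b_i P_i)$ is a concave quadratic in its own price, the optimum also has a closed form, which the numerical optimizer below should reproduce:
$$
\frac{\partial \pi}{\partial P_i} = a_i - 2 b_i P_i + b_i c = 0
\quad\Longrightarrow\quad
P_i^* = \frac{a_i + b_i c}{2 b_i}.
$$
With the parameters used in the code ($a_1 = 100$, $b_1 = 2$, $a_2 = 120$, $b_2 = 4$, $c = 20$), this gives $P_1^* = 35$ and $P_2^* = 25$.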


Python Code

import numpy as np
from scipy.optimize import minimize
import matplotlib.pyplot as plt

# Parameters
a1, b1 = 100, 2 # Market 1: Demand intercept and slope
a2, b2 = 120, 4 # Market 2: Demand intercept and slope
c = 20 # Marginal cost

# Demand functions
def demand1(p1):
    return max(0, a1 - b1 * p1)

def demand2(p2):
    return max(0, a2 - b2 * p2)

# Profit function
def total_profit(prices):
    p1, p2 = prices
    q1 = demand1(p1)
    q2 = demand2(p2)
    return -((p1 - c) * q1 + (p2 - c) * q2)  # Negative for minimization

# Initial guesses and bounds
initial_prices = [50, 50]
bounds = [(c, a1 / b1), (c, a2 / b2)]  # Prices between marginal cost and the choke price (where demand reaches zero)

# Solve for optimal prices
result = minimize(total_profit, initial_prices, bounds=bounds)
p1_opt, p2_opt = result.x
q1_opt = demand1(p1_opt)
q2_opt = demand2(p2_opt)
profit_opt = -(result.fun)

# Visualization
prices = np.linspace(0, 80, 500)  # Candidate prices
profits1 = [(p - c) * demand1(p) for p in prices]  # Market 1 profit at each candidate price
profits2 = [(p - c) * demand2(p) for p in prices]  # Market 2 profit at each candidate price

plt.figure(figsize=(12, 6))

# Profit curves
plt.plot(prices, profits1, label="Market 1 Profit", color="blue")
plt.plot(prices, profits2, label="Market 2 Profit", color="orange")

# Optimal points
plt.scatter(p1_opt, (p1_opt - c) * q1_opt, color="blue", label=f"Opt. Price Market 1: ${p1_opt:.2f}")
plt.scatter(p2_opt, (p2_opt - c) * q2_opt, color="orange", label=f"Opt. Price Market 2: ${p2_opt:.2f}")

plt.axvline(p1_opt, color="blue", linestyle="--", alpha=0.7)
plt.axvline(p2_opt, color="orange", linestyle="--", alpha=0.7)

plt.title("Profit Maximization with Price Discrimination")
plt.xlabel("Price")
plt.ylabel("Profit")
plt.legend()
plt.grid()
plt.tight_layout()
plt.show()

# Print results
print(f"Optimal Price in Market 1: ${p1_opt:.2f}")
print(f"Optimal Price in Market 2: ${p2_opt:.2f}")
print(f"Quantity Sold in Market 1: {q1_opt:.2f}")
print(f"Quantity Sold in Market 2: {q2_opt:.2f}")
print(f"Total Profit: ${profit_opt:.2f}")

Explanation of Code

  1. Demand Functions:

    • Market $1$ has a lower price sensitivity ($b_1 < b_2$), indicating it is less elastic.
    • Market $2$ is more elastic ($b_2 > b_1$), meaning customers are more price-sensitive.
  2. Profit Maximization:

    • The monopolist maximizes profit by finding the best prices for each market using scipy.optimize.minimize.
    • The constraints ensure prices are at least equal to the marginal cost.
  3. Visualization:

    • The profit curves for both markets show how profit changes with price.
    • The optimal prices for both markets are highlighted.

Results

Optimal Price in Market $1$: $35.00
Optimal Price in Market $2$: $25.00
Quantity Sold in Market $1$: 30.00
Quantity Sold in Market $2$: 20.00
Total Profit: $550.00
  1. Optimal Prices:

    • The monopolist sets a higher price in the less elastic market (Market $1$).
    • The price is lower in the more elastic market (Market $2$).
  2. Quantities and Profit:

    • More units are sold in Market $1$ ($30$ vs. $20$): although Market $2$ has the larger demand intercept, its demand is much more price-sensitive.
    • The monopolist achieves maximum profit by price discrimination.
  3. Graph:

    • The profit curves illustrate the profit-maximizing prices visually.
    • The peak of each curve marks the optimal price in the corresponding market.

Optimal Taxation and the Laffer Curve

Problem Statement: Optimal Taxation and the Laffer Curve

The Laffer Curve illustrates the relationship between tax rates and government revenue.

At very low tax rates, government revenue is minimal, and at very high tax rates, revenue also decreases because of reduced economic activity.

The goal is to:

  1. Simulate the Laffer Curve based on a theoretical economy.
  2. Identify the optimal tax rate that maximizes government revenue.
  3. Visualize the results.

Python Code

import numpy as np
import matplotlib.pyplot as plt

# Define the economy model
def government_revenue(tax_rate, max_income=1_000_000):
    """
    Simulates government revenue based on a tax rate.
    Revenue decreases at very high tax rates due to reduced economic activity.
    """
    # Economic activity decreases as the tax rate increases
    economic_activity = max_income * (1 - 0.5 * tax_rate**2)
    revenue = tax_rate * economic_activity
    return max(0, revenue)  # Ensure revenue is non-negative

# Generate tax rates
tax_rates = np.linspace(0, 1, 101) # Tax rates from 0% to 100%

# Compute government revenues for each tax rate
revenues = [government_revenue(rate) for rate in tax_rates]

# Find the optimal tax rate
optimal_index = np.argmax(revenues)
optimal_tax_rate = tax_rates[optimal_index]
optimal_revenue = revenues[optimal_index]

# Visualization
plt.figure(figsize=(12, 6))

# Laffer curve
plt.plot(tax_rates, revenues, label="Laffer Curve", color="blue", linewidth=2)
plt.axvline(optimal_tax_rate, color="red", linestyle="--", label="Optimal Tax Rate")
plt.scatter(optimal_tax_rate, optimal_revenue, color="red", zorder=5)
plt.title("Optimal Taxation and the Laffer Curve")
plt.xlabel("Tax Rate")
plt.ylabel("Government Revenue")
plt.legend()
plt.grid()

# Annotate the optimal point
plt.annotate(
    f"Optimal Tax Rate = {optimal_tax_rate:.2%}\nRevenue = ${optimal_revenue:,.0f}",
    (optimal_tax_rate, optimal_revenue),
    textcoords="offset points",
    xytext=(10, -50),
    arrowprops=dict(arrowstyle="->", color="red")
)

plt.tight_layout()
plt.show()

Explanation of the Code

  1. Economic Model:
    • The government_revenue function calculates revenue as $( \text{Revenue} = \text{Tax Rate} \times \text{Economic Activity} )$.
    • Economic activity decreases quadratically with higher tax rates, representing reduced incentives for productivity at high taxation levels.
  2. Simulation:
    • Tax rates range from $0$% to $100$%.
    • Government revenue is computed for each tax rate.
  3. Optimization:
    • The optimal tax rate corresponds to the maximum government revenue, determined using np.argmax (an analytic check follows this list).
  4. Visualization:
    • The Laffer Curve shows the relationship between tax rates and revenue.
    • The optimal tax rate is highlighted with a vertical line and annotated.
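
For the quadratic-decay model used in the code, the revenue-maximizing rate can also be derived analytically, which serves as a check on the grid search:
$$
R(t) = t \cdot M\left(1 - 0.5\,t^{2}\right) = M\left(t - 0.5\,t^{3}\right),
\qquad
\frac{dR}{dt} = M\left(1 - 1.5\,t^{2}\right) = 0
\;\Longrightarrow\;
t^{*} = \sqrt{2/3} \approx 0.82,
$$
where $M$ is the maximum income ($1{,}000{,}000$ in the simulation), so the grid search should report an optimal rate close to $82\%$.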

Results

  1. Laffer Curve:
    • The curve rises as the tax rate increases initially but falls after a certain point due to reduced economic activity.
  2. Optimal Tax Rate:
    • This is the tax rate where government revenue is maximized.
    • The plot includes a clear marker and annotation for this point.

Principal Component Analysis (PCA) on High-Dimensional Data

Problem Statement: Principal Component Analysis (PCA) on High-Dimensional Data

Objective:
We have a dataset with multiple highly correlated features, which makes it difficult to interpret and use for predictive modeling.
The goal is to use Principal Component Analysis (PCA) to reduce the dimensionality of the dataset while retaining most of its variance.

The steps are:

  1. Generate synthetic high-dimensional data.
  2. Perform PCA to reduce dimensions.
  3. Visualize the explained variance and transformed dataset in $2D$ space.

Python Code

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Generate synthetic high-dimensional data
np.random.seed(42)
n_samples = 500
n_features = 10

# Highly correlated features
base_feature = np.random.normal(0, 1, n_samples)
data = np.array([
    base_feature + np.random.normal(0, 0.1, n_samples) for _ in range(n_features)
]).T

# Add some noise to create weaker correlations
for i in range(n_features):
    data[:, i] += np.random.normal(0, 0.5, n_samples)

# Create a DataFrame
columns = [f"Feature_{i+1}" for i in range(n_features)]
df = pd.DataFrame(data, columns=columns)

# Standardize the data
scaler = StandardScaler()
data_scaled = scaler.fit_transform(df)

# Perform PCA
pca = PCA()
data_pca = pca.fit_transform(data_scaled)

# Explained variance ratio
explained_variance_ratio = pca.explained_variance_ratio_

# Reduce to 2 components for visualization
pca_2d = PCA(n_components=2)
data_2d = pca_2d.fit_transform(data_scaled)

# Visualization
plt.figure(figsize=(14, 6))

# Scree plot of explained variance
plt.subplot(1, 2, 1)
plt.plot(np.cumsum(explained_variance_ratio), marker='o', linestyle='--', color='b')
plt.title("Cumulative Explained Variance")
plt.xlabel("Number of Components")
plt.ylabel("Explained Variance Ratio")
plt.grid()

# Scatter plot of first two principal components
plt.subplot(1, 2, 2)
plt.scatter(data_2d[:, 0], data_2d[:, 1], alpha=0.6, color='green')
plt.title("2D Projection of Data (PCA)")
plt.xlabel("Principal Component 1")
plt.ylabel("Principal Component 2")
plt.grid()

plt.tight_layout()
plt.show()

# Print the explained variance ratio
print("Explained Variance Ratio for each component:")
for i, ratio in enumerate(explained_variance_ratio):
    print(f"Component {i+1}: {ratio:.4f}")

Explanation of Code

  1. Data Generation:
    • A synthetic dataset with $10$ highly correlated features is generated by adding small random noise to a base feature.
    • Additional noise ensures some weaker correlations for realism.
  2. Standardization:
    • Features are standardized to have a mean of $0$ and a standard deviation of $1$, which is necessary for PCA.
  3. PCA:
    • Principal components are computed to transform the high-dimensional dataset into a lower-dimensional space.
    • The explained variance ratio shows how much variance each principal component captures.
  4. Visualization:
    • A scree plot shows the cumulative variance explained by the components.
    • A $2D$ scatter plot visualizes the dataset in the first two principal components.

Results

Explained Variance Ratio for each component:
Component 1: 0.8043
Component 2: 0.0274
Component 3: 0.0267
Component 4: 0.0237
Component 5: 0.0220
Component 6: 0.0209
Component 7: 0.0201
Component 8: 0.0198
Component 9: 0.0178
Component 10: 0.0172
  1. Scree Plot:
    • Shows how many components are required to explain most of the variance.
    • Typically, we select enough components to explain $90$-$95$% of the variance (see the sketch after this list).
  2. 2D Scatter Plot:
    • Displays the high-dimensional data projected onto the first two principal components, which often reveal underlying patterns or clusters.
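
A short sketch (assuming the explained_variance_ratio array from the listing above) shows how to pick the number of components needed to reach a target share of variance:

import numpy as np

# Smallest number of components whose cumulative explained variance reaches 95%.
cumulative = np.cumsum(explained_variance_ratio)
n_components_95 = int(np.argmax(cumulative >= 0.95)) + 1
print(f"Components needed for 95% of the variance: {n_components_95}")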

Epidemic Spread Model (SIR Model)

Problem Statement: Epidemic Spread Model (SIR Model)

The SIR model is a basic mathematical model to describe the spread of infectious diseases.
It divides the population into three compartments:

  • S (Susceptible): People who can catch the disease.
  • I (Infected): People currently infected and able to spread the disease.
  • R (Recovered): People who have recovered and are now immune.

The dynamics are governed by these equations:
$$
\frac{dS}{dt} = -\beta \frac{S I}{N}
$$
$$
\frac{dI}{dt} = \beta \frac{S I}{N} - \gamma I
$$
$$
\frac{dR}{dt} = \gamma I
$$

  • $(\beta)$ is the infection rate.
  • $(\gamma)$ is the recovery rate.
  • $(N = S + I + R)$ is the total population, which stays constant.

Simulate the SIR model for $160$ days with a population of $1,000$, assuming an initial infection of $1$ person, $(\beta = 0.3)$, and $(\gamma = 0.1)$.
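
The ratio of these two rates is the basic reproduction number, written here as $\mathcal{R}_0$ to avoid confusion with the code variable R0 (which stores the initial number of recovered individuals):
$$
\mathcal{R}_0 = \frac{\beta}{\gamma} = \frac{0.3}{0.1} = 3,
$$
so each infected person initially causes about three new infections, and the epidemic stops growing once the susceptible share of the population falls below $1/\mathcal{R}_0 \approx 33\%$.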

Visualize the results.


Python Code

import numpy as np
from scipy.integrate import odeint
import matplotlib.pyplot as plt

# Parameters
N = 1000 # Total population
beta = 0.3 # Infection rate
gamma = 0.1 # Recovery rate
I0 = 1 # Initial infected
R0 = 0 # Initial recovered
S0 = N - I0 - R0 # Initial susceptible
days = 160 # Simulation duration

# SIR model differential equations
def sir_model(y, t, N, beta, gamma):
    S, I, R = y
    dSdt = -beta * S * I / N
    dIdt = beta * S * I / N - gamma * I
    dRdt = gamma * I
    return [dSdt, dIdt, dRdt]

# Initial conditions
y0 = [S0, I0, R0]

# Time points
t = np.linspace(0, days, days)

# Solve ODEs
result = odeint(sir_model, y0, t, args=(N, beta, gamma))
S, I, R = result.T

# Visualization
plt.figure(figsize=(12, 6))

# Plot the results
plt.plot(t, S, label="Susceptible", color="blue")
plt.plot(t, I, label="Infected", color="red")
plt.plot(t, R, label="Recovered", color="green")
plt.title("SIR Epidemic Model")
plt.xlabel("Days")
plt.ylabel("Number of People")
plt.legend()
plt.grid()

plt.tight_layout()
plt.show()

Explanation of Code

  1. Model Dynamics:
    • The sir_model function defines the SIR equations.
    • The odeint function solves these differential equations numerically.
  2. Parameters:
    • $(N)$: Total population.
    • $(\beta)$: Controls how fast the disease spreads.
    • $(\gamma)$: Determines how fast infected individuals recover.
  3. Initial Conditions:
    • Start with $1$ infected person, $0$ recovered, and the rest susceptible.
  4. Time Points:
    • Simulate the system for $160$ days.
  5. Visualization:
    • Plot the population dynamics of the S, I, and R compartments over time.

Results

  1. Blue Curve (Susceptible): Shows the decreasing number of people vulnerable to infection.
  2. Red Curve (Infected): Demonstrates the rise and eventual decline of infections as the epidemic progresses.
  3. Green Curve (Recovered): Represents the growing number of immune individuals over time.

This visualization provides insight into how diseases spread and decline in a population.

Practical Example in Econometrics: Estimating the Effect of Advertising on Sales Using Multiple Regression

In this example, we analyze the impact of different advertising channels (TV, radio, and online ads) on product sales.

This example demonstrates the application of econometrics in marketing analysis.


Problem Statement

We aim to estimate the following relationship:
$$
\text{Sales} = \beta_0 + \beta_1 \cdot \text{TV} + \beta_2 \cdot \text{Radio} + \beta_3 \cdot \text{Online} + \epsilon
$$
where:

  • $ \text{Sales} $: Units sold.
  • $ \text{TV}, \text{Radio}, \text{Online} $: Advertising budgets in respective channels.
  • $ \beta_0, \beta_1, \beta_2, \beta_3 $: Coefficients to be estimated.

Python Implementation

import numpy as np
import pandas as pd
import statsmodels.api as sm
import matplotlib.pyplot as plt
import seaborn as sns

# Generate synthetic data
np.random.seed(42)
n = 100
tv = np.random.uniform(0, 100, n) # TV advertising budget
radio = np.random.uniform(0, 50, n) # Radio advertising budget
online = np.random.uniform(0, 30, n) # Online advertising budget
error = np.random.normal(0, 10, n) # Random noise
sales = 5 + 0.3 * tv + 0.4 * radio + 0.2 * online + error # Sales formula

# Create a DataFrame
data = pd.DataFrame({
    'TV': tv,
    'Radio': radio,
    'Online': online,
    'Sales': sales
})

# Prepare data for regression
X = data[['TV', 'Radio', 'Online']]
X = sm.add_constant(X) # Add intercept
y = data['Sales']

# Fit multiple regression model
model = sm.OLS(y, X).fit()
print(model.summary())

# Visualize coefficients
coefficients = model.params[1:]
plt.figure(figsize=(8, 5))
coefficients.plot(kind='bar', color=['blue', 'green', 'orange'])
plt.title('Impact of Advertising Budgets on Sales')
plt.ylabel('Coefficient Value')
plt.xticks(rotation=0)
plt.grid(axis='y')
plt.show()

# Pairplot to examine relationships
sns.pairplot(data, x_vars=['TV', 'Radio', 'Online'], y_vars='Sales', kind='reg', height=4)
plt.suptitle("Relationships Between Advertising and Sales", y=1.02)
plt.show()

Explanation of Code

  1. Synthetic Data Generation:

    • Simulates advertising budgets and their effect on sales.
    • Adds random noise $ \epsilon $ to make the data realistic.
  2. Regression Analysis:

    • OLS (Ordinary Least Squares) is used to estimate the coefficients $(\beta_0, \beta_1, \beta_2, \beta_3)$.
  3. Visualizations:

    • Bar Plot: Displays the estimated impact (coefficients) of each advertising channel on sales.
    • Pairplot: Shows scatter plots with regression lines to visualize individual relationships.

Key Outputs

  1. Regression Summary:
    • Coefficients:
      • Quantify the effect of each advertising channel on sales.
    • R-squared:
      • Indicates how much of the variability in sales is explained by the model.
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                  Sales   R-squared:                       0.600
Model:                            OLS   Adj. R-squared:                  0.588
Method:                 Least Squares   F-statistic:                     48.08
Date:                Tue, 03 Dec 2024   Prob (F-statistic):           4.65e-19
Time:                        03:01:43   Log-Likelihood:                -368.43
No. Observations:                 100   AIC:                             744.9
Df Residuals:                      96   BIC:                             755.3
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          0.9997      3.278      0.305      0.761      -5.507       7.507
TV             0.3416      0.033     10.267      0.000       0.276       0.408
Radio          0.4417      0.068      6.474      0.000       0.306       0.577
Online         0.3098      0.114      2.727      0.008       0.084       0.535
==============================================================================
Omnibus:                        5.375   Durbin-Watson:                   2.376
Prob(Omnibus):                  0.068   Jarque-Bera (JB):                4.964
Skew:                          -0.402   Prob(JB):                       0.0836
Kurtosis:                       3.738   Cond. No.                         205.
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
  2. Bar Plot:
    • Highlights the relative importance of each advertising channel.
  3. Pairplot:
    • Provides an intuitive visualization of relationships between budgets and sales.


This example can be extended to include interaction terms (e.g., $ \text{TV} \times \text{Online} $) or to model diminishing returns on advertising investments.
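
As a sketch of the interaction-term extension (using the formula interface of statsmodels together with the synthetic DataFrame data from the listing above; the particular interaction is purely illustrative):

import statsmodels.formula.api as smf

# Hypothetical extension: allow the TV and Online effects to reinforce each other.
interaction_model = smf.ols('Sales ~ TV + Radio + Online + TV:Online', data=data).fit()
print(interaction_model.summary())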

Practical Example in Geometry: Finding the Intersection of Two Circles

We solve a common geometric problem: determining the intersection points of two circles.

This has applications in fields like computer graphics, robotics, and navigation.


Problem Statement

Two circles are defined by:

  1. Circle 1: Center $(x_1, y_1)$, radius $r_1$.
  2. Circle 2: Center $(x_2, y_2)$, radius $r_2$.

The task is to find the intersection points of these circles.

If they intersect, there can be two points, one point (tangent), or no intersection.
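
Writing $d$ for the distance between the centers, the cases can be stated compactly; this is the test implemented in the code below:
$$
d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2},
\qquad
|r_1 - r_2| \le d \le r_1 + r_2 .
$$
Two intersection points exist when both inequalities are strict; a single tangent point occurs when $d = r_1 + r_2$ (external tangency) or $d = |r_1 - r_2|$ with $d > 0$ (internal tangency); otherwise the circles do not intersect (or coincide entirely when $d = 0$ and $r_1 = r_2$).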


Python Implementation

import numpy as np
import matplotlib.pyplot as plt

def find_circle_intersection(x1, y1, r1, x2, y2, r2):
    # Distance between circle centers
    d = np.sqrt((x2 - x1)**2 + (y2 - y1)**2)

    # No intersection: circles too far apart, or one contained within the other
    if d > r1 + r2 or d < abs(r1 - r2):
        return None, None  # No intersection

    # Identical circles
    if d == 0 and r1 == r2:
        return "Infinite", None  # Infinite intersections (identical circles)

    # Calculate intersection points
    a = (r1**2 - r2**2 + d**2) / (2 * d)
    h = np.sqrt(r1**2 - a**2)

    # Point on the line between the centers at distance a from circle 1
    # (the midpoint of the intersection chord)
    x3 = x1 + a * (x2 - x1) / d
    y3 = y1 + a * (y2 - y1) / d

    # Offsetting perpendicular to the center line gives the two intersection points
    x_intersect1 = x3 + h * (y2 - y1) / d
    y_intersect1 = y3 - h * (x2 - x1) / d

    x_intersect2 = x3 - h * (y2 - y1) / d
    y_intersect2 = y3 + h * (x2 - x1) / d

    return (x_intersect1, y_intersect1), (x_intersect2, y_intersect2)

# Example: Define two circles
x1, y1, r1 = 0, 0, 5
x2, y2, r2 = 6, 0, 5

# Find intersection points
intersection1, intersection2 = find_circle_intersection(x1, y1, r1, x2, y2, r2)

# Visualization
circle1 = plt.Circle((x1, y1), r1, color='blue', fill=False, label='Circle 1')
circle2 = plt.Circle((x2, y2), r2, color='red', fill=False, label='Circle 2')

fig, ax = plt.subplots(figsize=(8, 8))
ax.add_artist(circle1)
ax.add_artist(circle2)
plt.scatter([x1, x2], [y1, y2], color='black', label='Centers') # Centers of circles

# Plot intersection points if they exist
if intersection1 and intersection2:
    plt.scatter([intersection1[0], intersection2[0]], [intersection1[1], intersection2[1]],
                color='green', label='Intersection Points')
elif intersection1 == "Infinite":
    plt.text(0, 0, "Infinite Intersections (Identical Circles)", color="purple")

# Adjust plot
plt.axhline(0, color='gray', linestyle='--', linewidth=0.5)
plt.axvline(0, color='gray', linestyle='--', linewidth=0.5)
plt.xlim(-10, 15)
plt.ylim(-10, 10)
plt.gca().set_aspect('equal', adjustable='box')
plt.legend()
plt.title("Intersection of Two Circles")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.grid()
plt.show()

Explanation of Code

  1. Distance Check:

    • The distance $( d )$ between the circle centers determines whether they intersect:
      • $( d > r_1 + r_2 )$: No intersection (too far apart).
      • $( d < |r_1 - r_2| )$: No intersection (one circle is contained within the other).
      • $( d = 0 )$ and $( r_1 = r_2 )$: Infinite intersections (identical circles).
  2. Intersection Calculation:

    • The formula for $( a )$ calculates the distance from the first circle’s center to the midpoint between intersection points.
    • $( h )$ is the perpendicular distance from the midpoint to the intersection points.
  3. Visualization:

    • The circles are drawn with plt.Circle.
    • Intersection points are plotted in green, and the centers are marked in black.

Key Outputs

  1. Intersection Points:

    • The two green points indicate where the circles intersect.
    • If no points exist, the code returns None.
  2. Graph:

    • The visualization shows the circles, their centers, and their intersection points clearly.

This example can be extended to solve 3D sphere intersection problems or optimized for large datasets of circles.

Predicting House Prices Using Linear Regression

Practical Example in Data Science: Predicting House Prices Using Linear Regression

We will solve a common data science problem: predicting house prices based on features like square footage and number of bedrooms.

This showcases the application of linear regression, a fundamental machine learning algorithm.


Problem Statement

We aim to predict house prices $( Y )$ using two features:

  1. Square footage $( X_1 )$
  2. Number of bedrooms $( X_2 )$.

The relationship is assumed to be linear:
$$
\text{Price} = \beta_0 + \beta_1 \cdot \text{SquareFootage} + \beta_2 \cdot \text{Bedrooms} + \epsilon
$$
where $( \epsilon )$ is the error term.


Python Implementation

import numpy as np
import pandas as pd
import statsmodels.api as sm
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

# Generate synthetic data
np.random.seed(42)
n = 100
square_footage = np.random.normal(2000, 500, n) # Average size: 2000 sq ft
bedrooms = np.random.randint(2, 6, n) # Bedrooms: 2 to 5
error = np.random.normal(0, 10000, n) # Random noise
price = 50000 + 150 * square_footage + 30000 * bedrooms + error

# Create a DataFrame
data = pd.DataFrame({
    'SquareFootage': square_footage,
    'Bedrooms': bedrooms,
    'Price': price
})

# Prepare data for regression
X = data[['SquareFootage', 'Bedrooms']]
X = sm.add_constant(X) # Add intercept
y = data['Price']

# Fit linear regression model
model = sm.OLS(y, X).fit()
print(model.summary())

# Visualization
fig = plt.figure(figsize=(10, 6))
ax = fig.add_subplot(111, projection='3d')
ax.scatter(data['SquareFootage'], data['Bedrooms'], data['Price'], color='blue', label='Observed Data')

# Predicted values
predicted = model.predict(X)
ax.plot_trisurf(data['SquareFootage'], data['Bedrooms'], predicted, color='red', alpha=0.7)

# Labels and legend
ax.set_xlabel('Square Footage')
ax.set_ylabel('Bedrooms')
ax.set_zlabel('Price')
ax.set_title('3D Visualization of Predicted House Prices')
plt.legend()
plt.show()

Explanation of Code

  1. Data Generation:

    • square_footage and bedrooms simulate the main features of a house.
    • price is calculated using a predefined linear relationship plus random noise for realism.
  2. Linear Regression:

    • Using statsmodels.OLS, we estimate coefficients $( \beta_0 )$, $( \beta_1 )$, and $( \beta_2 )$.
  3. Visualization:

    • A 3D scatter plot shows observed data points (blue dots).
    • The red surface represents the predicted house prices based on the model.

Key Outputs

  • Regression Summary:
    • Coefficients: Show how house prices change with each feature.
    • R-squared: Measures how well the model explains the variability in house prices.
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                  Price   R-squared:                       0.984
Model:                            OLS   Adj. R-squared:                  0.984
Method:                 Least Squares   F-statistic:                     3049.
Date:                Sun, 01 Dec 2024   Prob (F-statistic):           2.77e-88
Time:                        03:00:00   Log-Likelihood:                -1060.8
No. Observations:                 100   AIC:                             2128.
Df Residuals:                      97   BIC:                             2135.
Df Model:                           2                                         
Covariance Type:            nonrobust                                         
=================================================================================
                    coef    std err          t      P>|t|      [0.025      0.975]
---------------------------------------------------------------------------------
const          4.882e+04   5331.911      9.157      0.000    3.82e+04    5.94e+04
SquareFootage   152.0480      2.201     69.074      0.000     147.679     156.417
Bedrooms       2.954e+04    868.797     33.999      0.000    2.78e+04    3.13e+04
==============================================================================
Omnibus:                        7.854   Durbin-Watson:                   2.034
Prob(Omnibus):                  0.020   Jarque-Bera (JB):                8.131
Skew:                           0.502   Prob(JB):                       0.0172
Kurtosis:                       3.971   Cond. No.                     1.08e+04
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 1.08e+04. This might indicate that there are
strong multicollinearity or other numerical problems.
  • 3D Plot:
    • The blue points are the actual data.
    • The red plane shows the predicted relationship, helping to visualize the regression fit.


This example can be extended to include feature scaling, regularization (e.g., Ridge or Lasso regression), or additional predictors like location or age of the house.
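
As a sketch of the regularization extension mentioned above (this uses scikit-learn, which is not part of the listing; the column names follow the synthetic DataFrame data):

from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Ridge shrinks coefficients toward zero; standardizing first makes the penalty
# treat SquareFootage and Bedrooms on a comparable scale.
ridge = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
ridge.fit(data[['SquareFootage', 'Bedrooms']], data['Price'])
print("Ridge coefficients (standardized features):", ridge.named_steps['ridge'].coef_)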

Estimating the Effect of Education on Wages

Practical Example in Econometrics: Estimating the Effect of Education on Wages

We will use a simplified dataset to estimate how education affects wages using Ordinary Least Squares (OLS) regression.

This example demonstrates a common econometric problem: understanding causal relationships using regression analysis.


Problem

The objective is to estimate the relationship between education (measured in years) and wages (hourly wage).

We hypothesize that higher education leads to higher wages.

Assumptions

  • The relationship is linear: $( \text{Wages} = \beta_0 + \beta_1 \cdot \text{Education} + \epsilon )$, where $( \epsilon )$ is the error term (the closed-form OLS estimator is sketched after this list).
  • No omitted variable bias for simplicity.
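
For a single regressor, the OLS estimates have a familiar closed form, which is what statsmodels computes in the implementation below:
$$
\hat{\beta}_1 = \frac{\sum_i (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_i (X_i - \bar{X})^2},
\qquad
\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X},
$$
where $X$ denotes years of education and $Y$ the hourly wage.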

Python Implementation

Below is the Python implementation:

import numpy as np
import pandas as pd
import statsmodels.api as sm
import matplotlib.pyplot as plt

# Generate synthetic data
np.random.seed(42)
n = 100
education = np.random.normal(12, 2, n) # Average education years: 12
error = np.random.normal(0, 5, n)
wages = 5 + 2.5 * education + error # True relationship: beta_0=5, beta_1=2.5

# Create a DataFrame
data = pd.DataFrame({'Education': education, 'Wages': wages})

# OLS Regression
X = sm.add_constant(data['Education']) # Add intercept
model = sm.OLS(data['Wages'], X).fit()
print(model.summary())

# Plot the data and regression line
plt.scatter(data['Education'], data['Wages'], color='blue', label='Observed Data')
plt.plot(data['Education'], model.predict(X), color='red', label='Fitted Line')
plt.xlabel('Education (Years)')
plt.ylabel('Wages (Hourly)')
plt.title('Relationship Between Education and Wages')
plt.legend()
plt.show()

Explanation of Code

  1. Data Generation:

    • education is randomly generated to simulate years of schooling.
    • error introduces random noise to mimic real-world data variability.
    • wages is computed using the true relationship with some error.
  2. OLS Regression:

    • statsmodels.OLS is used to estimate the parameters $( \beta_0 )$ (intercept) and $( \beta_1 )$ (slope).
  3. Visualization:

    • A scatter plot shows observed data (blue dots).
    • The regression line (red) represents the predicted relationship between education and wages.

Key Outputs

  • Regression Summary:

    • Coefficients $(\beta_0, \beta_1)$: These indicate the estimated impact of education on wages.
    • R-squared: Indicates the goodness of fit (closer to $1$ is better).
  • Graph:

    • The red line demonstrates the estimated relationship.
      The slope corresponds to the $ \beta_1 $ value, showing how wages change with education.

Real-World Problem in Algebra: Polynomial Root Finding in Economics

Problem Statement

A company produces a product whose profit $ P(x) $ (in dollars) as a function of production quantity $ x $ (in units) is modeled by the polynomial:
$$
P(x) = -2x^3 + 15x^2 - 36x + 50
$$

  1. Determine the production quantities $ x $ (roots of $ P(x) = 0 $) where the profit becomes zero (break-even points).
  2. Visualize the profit function $ P(x) $ to show its behavior and the identified roots.

Python Implementation

Here’s the Python code to solve the problem and visualize the results:

import numpy as np
import matplotlib.pyplot as plt

# Define the profit polynomial P(x) = -2x^3 + 15x^2 - 36x + 50
coefficients = [-2, 15, -36, 50] # Coefficients in decreasing order of powers
poly = np.poly1d(coefficients)

# Find the roots of the polynomial
roots = np.roots(coefficients)

# Generate data for visualization
x = np.linspace(0, 10, 500) # Production quantity range
y = poly(x)

# Plotting the polynomial
plt.figure(figsize=(10, 6))
plt.plot(x, y, label="$P(x)$ (Profit Function)", color="blue")

# Highlight the roots
real_roots = roots[np.isreal(roots)].real # Filter real roots
for root in real_roots:
    plt.scatter(root, 0, color="red", zorder=5, label=f"Root at x = {root:.2f}")

# Add annotations and labels
plt.title("Profit Function $P(x)$ and Break-Even Points", fontsize=14)
plt.axhline(0, color="black", linewidth=0.7, linestyle="--", alpha=0.7)
plt.xlabel("Production Quantity $x$", fontsize=12)
plt.ylabel("Profit $P(x)$", fontsize=12)
plt.legend()
plt.grid(alpha=0.3)

# Show plot
plt.show()

# Print results
print("Roots of P(x):", roots)
print("Real roots (Break-even points):", real_roots)

Explanation of Code

  1. Polynomial Representation:

    • The profit function $ P(x) = -2x^3 + 15x^2 - 36x + 50 $ is represented by its coefficients in decreasing powers of $ x $.
    • The numpy.poly1d function creates a polynomial object for evaluation and visualization.
  2. Root Finding:

    • The roots of $ P(x) $ are found using numpy.roots, which computes all roots (real and complex) of the polynomial.
    • Only real roots are relevant in this context as production quantities $ x $ must be real numbers.
  3. Visualization:

    • The profit function is plotted over a reasonable range of $ x $ values (e.g., $ x \in [0, 10] $).
    • Real roots (break-even points) are highlighted on the graph to show where the profit becomes zero.

Results and Graph Explanation

  1. Numerical Results:

    • Roots of $ P(x) $: These are the solutions to the equation $ P(x) = 0 $.
      Some may be complex, but only real roots are relevant for this real-world context.
    • Real roots (Break-even points): These are production levels where the company neither makes a profit nor a loss.
  2. Graph:

    • The blue curve represents the profit function $ P(x) $.
    • The red points indicate the break-even points where $ P(x) = 0 $.
    • The curve helps visualize how profit changes with production quantity $ x $, showing regions of profit and loss.

By finding and plotting the roots of the polynomial, the company can identify critical production levels to optimize operations.
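
As a quick sanity check (assuming the poly object and real_roots array from the listing above), the real roots can be substituted back into the polynomial; the profit at each break-even point should be zero up to floating-point error:

# Evaluate P(x) at each real break-even point; values should be ~0.
for root in real_roots:
    print(f"P({root:.4f}) = {poly(root):.2e}")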