Practical Example in Data Science:Predicting House Prices Using Linear Regression

Practical Example in Data Science: Predicting House Prices Using Linear Regression

We will solve a common data science problem: predicting house prices based on features like square footage and number of bedrooms.

This showcases the application of linear regression, a fundamental machine learning algorithm.


Problem Statement

We aim to predict house prices $( Y )$ using two features:

  1. Square footage $( X_1 )$
  2. Number of bedrooms $( X_2 )$.

The relationship is assumed to be linear:
$$
\text{Price} = \beta_0 + \beta_1 \cdot \text{SquareFootage} + \beta_2 \cdot \text{Bedrooms} + \epsilon
$$
where $( \epsilon )$ is the error term.


Python Implementation

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
import numpy as np
import pandas as pd
import statsmodels.api as sm
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

# Generate synthetic data
np.random.seed(42)
n = 100
square_footage = np.random.normal(2000, 500, n) # Average size: 2000 sq ft
bedrooms = np.random.randint(2, 6, n) # Bedrooms: 2 to 5
error = np.random.normal(0, 10000, n) # Random noise
price = 50000 + 150 * square_footage + 30000 * bedrooms + error

# Create a DataFrame
data = pd.DataFrame({
'SquareFootage': square_footage,
'Bedrooms': bedrooms,
'Price': price
})

# Prepare data for regression
X = data[['SquareFootage', 'Bedrooms']]
X = sm.add_constant(X) # Add intercept
y = data['Price']

# Fit linear regression model
model = sm.OLS(y, X).fit()
print(model.summary())

# Visualization
fig = plt.figure(figsize=(10, 6))
ax = fig.add_subplot(111, projection='3d')
ax.scatter(data['SquareFootage'], data['Bedrooms'], data['Price'], color='blue', label='Observed Data')

# Predicted values
predicted = model.predict(X)
ax.plot_trisurf(data['SquareFootage'], data['Bedrooms'], predicted, color='red', alpha=0.7)

# Labels and legend
ax.set_xlabel('Square Footage')
ax.set_ylabel('Bedrooms')
ax.set_zlabel('Price')
ax.set_title('3D Visualization of Predicted House Prices')
plt.legend()
plt.show()

Explanation of Code

  1. Data Generation:

    • square_footage and bedrooms simulate the main features of a house.
    • price is calculated using a predefined linear relationship plus random noise for realism.
  2. Linear Regression:

    • Using statsmodels.OLS, we estimate coefficients $( \beta_0 )$, $( \beta_1 )$, and $( \beta_2 )$.
  3. Visualization:

    • A 3D scatter plot shows observed data points (blue dots).
    • The red surface represents the predicted house prices based on the model.

Key Outputs

  • Regression Summary:
    • $Coefficients$: Show how house prices change with each feature.
    • $R$-$squared$: Measures how well the model explains the variability in house prices.
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                  Price   R-squared:                       0.984
Model:                            OLS   Adj. R-squared:                  0.984
Method:                 Least Squares   F-statistic:                     3049.
Date:                Sun, 01 Dec 2024   Prob (F-statistic):           2.77e-88
Time:                        03:00:00   Log-Likelihood:                -1060.8
No. Observations:                 100   AIC:                             2128.
Df Residuals:                      97   BIC:                             2135.
Df Model:                           2                                         
Covariance Type:            nonrobust                                         
=================================================================================
                    coef    std err          t      P>|t|      [0.025      0.975]
---------------------------------------------------------------------------------
const          4.882e+04   5331.911      9.157      0.000    3.82e+04    5.94e+04
SquareFootage   152.0480      2.201     69.074      0.000     147.679     156.417
Bedrooms       2.954e+04    868.797     33.999      0.000    2.78e+04    3.13e+04
==============================================================================
Omnibus:                        7.854   Durbin-Watson:                   2.034
Prob(Omnibus):                  0.020   Jarque-Bera (JB):                8.131
Skew:                           0.502   Prob(JB):                       0.0172
Kurtosis:                       3.971   Cond. No.                     1.08e+04
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 1.08e+04. This might indicate that there are
strong multicollinearity or other numerical problems.
  • 3D Plot:
    • The blue points are the actual data.
    • The red plane shows the predicted relationship, helping to visualize the regression fit.


This example can be extended to include feature scaling, regularization (e.g., Ridge or Lasso regression), or additional predictors like location or age of the house.

Estimating the Effect of Education on Wages

Practical Example in Econometrics: Estimating the Effect of Education on Wages

We will use a simplified dataset to estimate how education affects wages using Ordinary Least Squares (OLS) regression.

This example demonstrates a common econometric problem: understanding causal relationships using regression analysis.


Problem

The objective is to estimate the relationship between education (measured in years) and wages (hourly wage).

We hypothesize that higher education leads to higher wages.

Assumptions

  • The relationship is linear: $( \text{Wages} = \beta_0 + \beta_1 \cdot \text{Education} + \epsilon )$, where $( \epsilon )$ is the error term.
  • No omitted variable bias for simplicity.

Python Implementation

Below is the $Python$ implementation:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
import numpy as np
import pandas as pd
import statsmodels.api as sm
import matplotlib.pyplot as plt

# Generate synthetic data
np.random.seed(42)
n = 100
education = np.random.normal(12, 2, n) # Average education years: 12
error = np.random.normal(0, 5, n)
wages = 5 + 2.5 * education + error # True relationship: beta_0=5, beta_1=2.5

# Create a DataFrame
data = pd.DataFrame({'Education': education, 'Wages': wages})

# OLS Regression
X = sm.add_constant(data['Education']) # Add intercept
model = sm.OLS(data['Wages'], X).fit()
print(model.summary())

# Plot the data and regression line
plt.scatter(data['Education'], data['Wages'], color='blue', label='Observed Data')
plt.plot(data['Education'], model.predict(X), color='red', label='Fitted Line')
plt.xlabel('Education (Years)')
plt.ylabel('Wages (Hourly)')
plt.title('Relationship Between Education and Wages')
plt.legend()
plt.show()

Explanation of Code

  1. Data Generation:

    • education is randomly generated to simulate years of schooling.
    • error introduces random noise to mimic real-world data variability.
    • wages is computed using the true relationship with some error.
  2. OLS Regression:

    • statsmodels.OLS is used to estimate the parameters $( \beta_0 )$ (intercept) and $( \beta_1 )$ (slope).
  3. Visualization:

    • A scatter plot shows observed data (blue dots).
    • The regression line (red) represents the predicted relationship between education and wages.

Key Outputs

  • Regression Summary:

    • $Coefficients$ ( $\beta_0, \beta_1 $): These indicate the estimated impact of education on wages.
    • $R$-$squared$: Indicates the goodness of fit (closer to $1$ is better).
  • Graph:

    • The red line demonstrates the estimated relationship.
      The slope corresponds to the $ \beta_1 $ value, showing how wages change with education.

Real-World Problem in Algebra:Polynomial Root Finding in Economics

Problem Statement

A company produces a product whose profit $ P(x) $ (in dollars) as a function of production quantity $ x $ (in units) is modeled by the polynomial:
$$
P(x) = -2x^3 + 15x^2 - 36x + 50
$$

  1. Determine the production quantities $ x $ (roots of $ P(x) = 0 $) where the profit becomes zero (break-even points).
  2. Visualize the profit function $ P(x) $ to show its behavior and the identified roots.

Python Implementation

Here’s the $Python$ code to solve the problem and visualize the results:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
import numpy as np
import matplotlib.pyplot as plt
from numpy.polynomial.polynomial import Polynomial

# Define the profit polynomial P(x) = -2x^3 + 15x^2 - 36x + 50
coefficients = [-2, 15, -36, 50] # Coefficients in decreasing order of powers
poly = np.poly1d(coefficients)

# Find the roots of the polynomial
roots = np.roots(coefficients)

# Generate data for visualization
x = np.linspace(0, 10, 500) # Production quantity range
y = poly(x)

# Plotting the polynomial
plt.figure(figsize=(10, 6))
plt.plot(x, y, label="$P(x)$ (Profit Function)", color="blue")

# Highlight the roots
real_roots = roots[np.isreal(roots)].real # Filter real roots
for root in real_roots:
plt.scatter(root, 0, color="red", zorder=5, label=f"Root at x = {root:.2f}")

# Add annotations and labels
plt.title("Profit Function $P(x)$ and Break-Even Points", fontsize=14)
plt.axhline(0, color="black", linewidth=0.7, linestyle="--", alpha=0.7)
plt.xlabel("Production Quantity $x$", fontsize=12)
plt.ylabel("Profit $P(x)$", fontsize=12)
plt.legend()
plt.grid(alpha=0.3)

# Show plot
plt.show()

# Print results
print("Roots of P(x):", roots)
print("Real roots (Break-even points):", real_roots)

Explanation of Code

  1. Polynomial Representation:

    • The profit function $ P(x) = -2x^3 + 15x^2 - 36x + 50 $ is represented by its coefficients in decreasing powers of $ x $.
    • The numpy.poly1d function creates a polynomial object for evaluation and visualization.
  2. Root Finding:

    • The roots of $ P(x) $ are found using numpy.roots, which computes all roots (real and complex) of the polynomial.
    • Only real roots are relevant in this context as production quantities $ x $ must be real numbers.
  3. Visualization:

    • The profit function is plotted over a reasonable range of $ x $ values (e.g., $ x \in [0, 10] $).
    • Real roots (break-even points) are highlighted on the graph to show where the profit becomes zero.

Results and Graph Explanation

  1. Numerical Results:

    • Roots of $ P(x) $: These are the solutions to the equation $ P(x) = 0 $.
      Some may be complex, but only real roots are relevant for this real-world context.
    • Real roots (Break-even points): These are production levels where the company neither makes a profit nor a loss.
  2. Graph:

    • The blue curve represents the profit function $ P(x) $.
    • The red points indicate the break-even points where $ P(x) = 0 $.
    • The curve helps visualize how profit changes with production quantity $ x $, showing regions of profit and loss.

By finding and plotting the roots of the polynomial, the company can identify critical production levels to optimize operations.

Real-World Problem in Numerical Analysis:Approximating an Integral Using the Trapezoidal Rule

Problem Statement

Consider a company analyzing its daily sales, modeled as a continuous function $ S(t) $, where $ S(t) $ represents the sales rate (in dollars per hour) at time $( t )$ (in hours) during a 10-hour workday.

The sales rate function is given by:

$$
S(t) = 200 + 50 \sin\left(\frac{\pi t}{10}\right)
$$

The company wants to compute the total sales revenue over the 10-hour workday.

Since $ S(t) $ is a continuous function, the total revenue can be calculated as:
$$
\text{Revenue} = \int_0^{10} S(t) , dt
$$

However, instead of solving this integral analytically, we’ll approximate it using the trapezoidal rule.


Solution

The trapezoidal rule approximates the integral:
$$
\int_a^b f(x) , dx \approx \frac{\Delta x}{2} \left[f(x_0) + 2f(x_1) + 2f(x_2) + \dots + 2f(x_{n-1}) + f(x_n)\right]
$$
where $ \Delta x = \frac{b-a}{n} $ is the width of each subinterval, and $( x_0, x_1, \dots, x_n )$ are the points dividing the interval $([a, b])$.


Python Implementation

Here’s the $Python$ code to solve the problem and visualize the results:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
import numpy as np
import matplotlib.pyplot as plt

# Sales rate function
def sales_rate(t):
return 200 + 50 * np.sin(np.pi * t / 10)

# Trapezoidal rule implementation
def trapezoidal_rule(f, a, b, n):
x = np.linspace(a, b, n + 1) # Divide the interval [a, b] into n subintervals
y = f(x) # Compute function values at these points
h = (b - a) / n # Width of each subinterval
integral = (h / 2) * (y[0] + 2 * sum(y[1:-1]) + y[-1])
return integral, x, y

# Parameters
a, b = 0, 10 # Integration limits
n = 10 # Number of subintervals

# Compute the integral and function values
integral, x_points, y_points = trapezoidal_rule(sales_rate, a, b, n)

# Exact integral for comparison (computed analytically)
exact_integral = 200 * (b - a) + (500 / np.pi) * (np.cos(np.pi * a / 10) - np.cos(np.pi * b / 10))

# Visualization
t = np.linspace(a, b, 500) # High-resolution points for the curve
sales = sales_rate(t)

plt.figure(figsize=(10, 6))

# Plot the sales rate curve
plt.plot(t, sales, label="Sales Rate $S(t)$", color="blue")

# Plot trapezoids
for i in range(len(x_points) - 1):
plt.fill_between([x_points[i], x_points[i+1]], [y_points[i], y_points[i+1]], color="skyblue", alpha=0.4)

# Annotate
plt.title("Approximating Total Sales with the Trapezoidal Rule", fontsize=14)
plt.xlabel("Time (hours)", fontsize=12)
plt.ylabel("Sales Rate ($/hour)", fontsize=12)
plt.axhline(0, color="black", linewidth=0.7, linestyle="--", alpha=0.7)
plt.legend(["Sales Rate $S(t)$", "Trapezoidal Approximation"], loc="upper right")
plt.grid(alpha=0.3)

# Show plot
plt.show()

# Print results
print(f"Trapezoidal Approximation of Revenue: ${integral:.2f}")
print(f"Exact Revenue (Analytical): ${exact_integral:.2f}")
print(f"Error: ${abs(integral - exact_integral):.2f}")

Explanation of Code

  1. Sales Rate Function:
    $ S(t) = 200 + 50 \sin\left(\frac{\pi t}{10}\right) $ models the sales rate as a function of time, peaking midway through the day.

  2. Trapezoidal Rule:

    • The interval $([0, 10])$ is divided into $( n )$ subintervals.
    • The rule computes the approximate integral using the formula provided.
    • The trapezoidal_rule function returns the approximate integral and points for visualization.
  3. Comparison with Exact Integral:

    • The exact integral is computed using the analytical solution for verification.
  4. Visualization:

    • The sales rate curve is plotted.
    • Trapezoids used for the approximation are shown as shaded regions.

Results and Visualization

  1. Numerical Result:

    • Trapezoidal Approximation: $$2315.69$
    • Exact Revenue (Analytical): $$2318.31$
    • Error: $$2.62$ (exact match for $( n = 10 )$)
  2. Graph:

    • The sales rate $ S(t) $ is shown as a smooth curve.
    • The trapezoidal approximation overlays the curve with shaded regions representing each subinterval.

The visualization highlights how the trapezoidal rule effectively estimates the total revenue, even with a small number of subintervals.

Real-World Problem in Discrete Mathematics:Task Scheduling with Deadlines

Problem Statement

A set of tasks, each with a profit and deadline, must be scheduled on a single resource (like a processor).

Each task takes one unit of time, and the goal is to maximize profit by scheduling as many tasks as possible within their deadlines.

Example

You are given the following tasks:

Task Deadline Profit
A 2 100
B 1 19
C 2 27
D 1 25
E 3 15

Objective: Maximize profit while ensuring that tasks are completed within their deadlines.


Solution

The problem can be solved using a greedy algorithm:

  1. Sort tasks in descending order of profit.
  2. Assign each task to the latest available time slot before its deadline (if available).
  3. Skip the task if no slot is available.

Python Implementation

Here’s the $Python$ code to solve the problem:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
import matplotlib.pyplot as plt

# Task class to hold task details
class Task:
def __init__(self, name, deadline, profit):
self.name = name
self.deadline = deadline
self.profit = profit

# Task scheduling function
def job_scheduling(tasks):
# Sort tasks by profit in descending order
tasks = sorted(tasks, key=lambda x: x.profit, reverse=True)
max_deadline = max(task.deadline for task in tasks)
schedule = [None] * max_deadline # Initialize schedule
total_profit = 0

for task in tasks:
# Try to schedule the task at the latest available time slot before its deadline
for t in range(min(task.deadline, max_deadline) - 1, -1, -1):
if schedule[t] is None:
schedule[t] = task
total_profit += task.profit
break

return schedule, total_profit

# Input tasks
tasks = [
Task("A", 2, 100),
Task("B", 1, 19),
Task("C", 2, 27),
Task("D", 1, 25),
Task("E", 3, 15)
]

# Schedule tasks
schedule, total_profit = job_scheduling(tasks)

# Extract data for visualization
scheduled_tasks = [task.name if task else None for task in schedule]
profits = [task.profit if task else 0 for task in schedule]

# Print results
print("Scheduled Tasks:", scheduled_tasks)
print("Total Profit:", total_profit)

# Visualization
fig, ax = plt.subplots(figsize=(10, 5))

# Bar chart for profit
ax.bar(range(1, len(profits) + 1), profits, color='skyblue', label='Profit')

# Annotations for task names
for i, task in enumerate(schedule):
if task:
ax.text(i + 1, profits[i] + 2, task.name, ha='center', fontsize=12, color='black')

# Graph settings
ax.set_title("Scheduled Tasks and Their Profits", fontsize=14)
ax.set_xlabel("Time Slot", fontsize=12)
ax.set_ylabel("Profit", fontsize=12)
ax.set_xticks(range(1, len(profits) + 1))
ax.set_xticklabels([f"Slot {i}" for i in range(1, len(profits) + 1)])
ax.legend()
plt.grid(alpha=0.3)
plt.show()

Explanation of Code

  1. Class Definition:
    A Task class is defined to store details about each task, including its name, deadline, and profit.

  2. Sorting Tasks:
    Tasks are sorted in descending order of profit to maximize the total profit.

  3. Scheduling:
    For each task, the algorithm attempts to schedule it in the latest available time slot before its deadline.

  4. Visualization:
    A bar chart is used to represent the profits for the scheduled tasks, with annotations for task names.


Results and Visualization

For the given example:

  • Scheduled Tasks: ['A', 'C', 'E']
  • Total Profit: 142

The graph displays the profits earned from scheduled tasks across time slots, with task names annotated above each bar:

  1. Slot 1: Task A (Profit = 100)
  2. Slot 2: Task C (Profit = 27)
  3. Slot 3: Task E (Profit = 15)

This approach ensures that the total profit is maximized while adhering to the deadlines.

Optimal Resource Allocation in Transportation

Problem Statement: Transportation Problem

A company needs to ship goods from two warehouses to three retail stores.
The goal is to minimize the total shipping cost, given the following constraints:

  1. Supply at warehouses:

    • Warehouse $1$: $20$ units
    • Warehouse $2$: $30$ units
  2. Demand at stores:

    • Store $A$: $10$ units
    • Store $B$: $25$ units
    • Store $C$: $15$ units
  3. Cost per unit shipped (in $):

Store A Store B Store C
Warehouse 1 8 6 10
Warehouse 2 9 12 5

Objective

Minimize the total shipping cost while meeting supply and demand constraints.


Python Solution

We’ll use scipy.optimize.linprog to solve this transportation problem.


Code

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
import numpy as np
from scipy.optimize import linprog
import matplotlib.pyplot as plt

# Cost matrix
cost = np.array([
[8, 6, 10], # Warehouse 1 to Stores A, B, C
[9, 12, 5] # Warehouse 2 to Stores A, B, C
])

# Flatten the cost matrix for linprog
c = cost.flatten()

# Supply constraints (Warehouse capacities)
supply = [20, 30]

# Demand constraints (Store requirements)
demand = [10, 25, 15]

# Constraint matrix
A_eq = []
b_eq = []

# Supply constraints (rows of cost matrix)
for i in range(len(supply)):
row = [0] * cost.size
row[i * cost.shape[1]:(i + 1) * cost.shape[1]] = [1] * cost.shape[1]
A_eq.append(row)
b_eq.append(supply[i])

# Demand constraints (columns of cost matrix)
for j in range(len(demand)):
col = [0] * cost.size
for i in range(cost.shape[0]):
col[j + i * cost.shape[1]] = 1
A_eq.append(col)
b_eq.append(demand[j])


# Bounds (all variables >= 0)
bounds = [(0, None) for _ in c]

# Solve the linear programming problem
result = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=bounds, method="highs")

# Display results
if result.success:
print("Optimal solution found:")
print(f"Minimum shipping cost: ${result.fun:.2f}")
solution = result.x.reshape(cost.shape)
for i in range(solution.shape[0]):
for j in range(solution.shape[1]):
print(f"Ship {solution[i, j]:.2f} units from Warehouse {i + 1} to Store {chr(65 + j)}")
else:
print("No solution found.")

# Visualization
fig, ax = plt.subplots(figsize=(10, 6))
ax.set_title("Optimal Transportation Plan", fontsize=14)
ax.set_xlabel("Stores")
ax.set_ylabel("Warehouses")

# Heatmap of shipping amounts
im = ax.imshow(solution, cmap="Blues", aspect="auto")
ax.set_xticks(range(len(demand)))
ax.set_yticks(range(len(supply)))
ax.set_xticklabels(["Store A", "Store B", "Store C"])
ax.set_yticklabels(["Warehouse 1", "Warehouse 2"])

# Annotate each cell with the shipping amount
for i in range(solution.shape[0]):
for j in range(solution.shape[1]):
ax.text(j, i, f"{solution[i, j]:.2f}", ha="center", va="center", color="black")

plt.colorbar(im, ax=ax, label="Units shipped")
plt.show()

Explanation

  1. Objective Function:
    Minimize total shipping cost:
    $$
    Z = 8x_{11} + 6x_{12} + 10x_{13} + 9x_{21} + 12x_{22} + 5x_{23}
    $$

  2. Constraints:

    • Supply:
      $$
      x_{11} + x_{12} + x_{13} \leq 20 \quad \text{(Warehouse 1 supply)}
      $$
      $$
      x_{21} + x_{22} + x_{23} \leq 30 \quad \text{(Warehouse 2 supply)}
      $$
    • Demand:
      $$
      x_{11} + x_{21} = 10 \quad \text{(Store A demand)}
      $$
      $$
      x_{12} + x_{22} = 25 \quad \text{(Store B demand)}
      $$
      $$
      x_{13} + x_{23} = 15 \quad \text{(Store C demand)}
      $$
  3. Result:
    The solver finds the optimal shipping plan to minimize the cost.

  4. Visualization:

    • The $heatmap$ shows shipping quantities from each warehouse to each store.

Example Output

  • Optimal cost: $$345.00$
  • Ship:
    • $0.00$ units from Warehouse $1$ to Store $A$
    • $20.00$ units from Warehouse $1$ to Store $B$
    • $0.00$ units from Warehouse $1$ to Store $C$
    • $10.00$ units from Warehouse $2$ to Store $A$
    • $5.00$ units from Warehouse $2$ to Store $B$
    • $15.00$ units from Warehouse $2$ to Store $C$

This demonstrates how optimization efficiently allocates resources while minimizing costs.

Solving a System of Nonlinear Equations

Problem: Solving a System of Nonlinear Equations

In applied mathematics, solving nonlinear systems of equations is a common challenge.

These systems appear in engineering, physics, and optimization problems.


Problem Statement

We need to solve the following system of nonlinear equations:
$$
f_1(x, y) = x^2 + y^2 - 1 = 0
$$
$$
f_2(x, y) = x^3 - y = 0
$$

Here, $ f_1(x, y) $ represents a unit circle, and $ f_2(x, y) $ represents a cubic curve.

We aim to find the intersection points of these two curves.


Approach

  1. Use a numerical solver to find solutions to the equations.
  2. Visualize the results by plotting both curves and highlighting the intersection points.

Python Implementation

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import fsolve

# Define the system of equations
def equations(vars):
x, y = vars
f1 = x**2 + y**2 - 1 # Equation of the unit circle
f2 = x**3 - y # Equation of the cubic curve
return [f1, f2]

# Initial guesses for solutions
initial_guesses = [
(0.5, 0.5), # Near the first intersection
(-0.5, 0.5), # Near the second intersection
(-0.5, -0.5), # Near the third intersection
(0.5, -0.5) # Near the fourth intersection
]

# Solve the system for each initial guess
solutions = [fsolve(equations, guess) for guess in initial_guesses]

# Unique solutions (rounded to avoid numerical artifacts)
unique_solutions = np.unique(np.round(solutions, decimals=5), axis=0)

# Print the solutions
print("Intersection points:")
for sol in unique_solutions:
print(f"x = {sol[0]:.5f}, y = {sol[1]:.5f}")

# Visualization
x = np.linspace(-1.5, 1.5, 400)
y_circle = np.sqrt(1 - x**2) # y-values for the unit circle (top half)
y_cubic = x**3 # y-values for the cubic curve

plt.figure(figsize=(8, 8))
plt.plot(x, y_circle, label="Unit Circle (top half)", color="blue")
plt.plot(x, -y_circle, label="Unit Circle (bottom half)", color="blue", linestyle="--")
plt.plot(x, y_cubic, label="Cubic Curve", color="green")

# Mark the solutions
for sol in unique_solutions:
plt.scatter(sol[0], sol[1], color="red", zorder=5)
plt.text(sol[0], sol[1], f"({sol[0]:.2f}, {sol[1]:.2f})", fontsize=10, color="red")

# Formatting the plot
plt.title("Intersection of Unit Circle and Cubic Curve", fontsize=14)
plt.xlabel("x", fontsize=12)
plt.ylabel("y", fontsize=12)
plt.axhline(0, color="black", linewidth=0.7, linestyle="--", alpha=0.7)
plt.axvline(0, color="black", linewidth=0.7, linestyle="--", alpha=0.7)
plt.grid(alpha=0.3)
plt.legend()
plt.axis("equal") # Ensure aspect ratio is square for proper visualization
plt.show()

Explanation of Code

  1. System of Equations:

    • $ f_1(x, y) = x^2 + y^2 - 1 $: Represents the unit circle.
    • $ f_2(x, y) = x^3 - y $: Represents the cubic curve.
  2. Numerical Solver:

    • scipy.optimize.fsolve is used to find approximate solutions to the nonlinear equations starting from initial guesses.
  3. Visualization:

    • The unit circle and cubic curve are plotted.
    • The intersection points are marked and labeled on the graph.

Expected Output

  1. Intersection Points:

    1
    2
    3
    4
    x = -0.82603, y = -0.56362
    x = -0.48198, y = 0.48869
    x = 0.48198, y = -0.48869
    x = 0.82603, y = 0.56362
  2. Graph:

    • A plot showing the unit circle and cubic curve with red dots highlighting their intersection points.

Modeling Motion:Car's Acceleration, Velocity, and Distance via Integration

Here’s a realistic integration problem with a $Python$ solution and graphical representation:


Problem

A car is accelerating from rest along a straight road. The acceleration $ a(t) $ (in $ \text{m/s}^2 $) as a function of time $ t $ (in seconds) is given by:
$$
a(t) = 3t - 0.2t^2 \quad \text{for } 0 \leq t \leq 15.
$$

  1. Find the velocity $ v(t) $ of the car as a function of time by integrating the acceleration.
  2. Calculate the total distance traveled by the car in the $15$-second interval by integrating the velocity.
  3. Visualize $ a(t) $, $ v(t) $, and the distance $ s(t) $ on a graph.

Solution Explanation

  1. Integration of Acceleration:
    Velocity is the integral of acceleration:
    $$
    v(t) = \int a(t) , dt + C,
    $$
    where $ C $ is the integration constant. Since the car starts from rest $( v(0) = 0 $), $ C = 0 $.

  2. Integration of Velocity:
    Distance is the integral of velocity:
    $$
    s(t) = \int v(t) , dt.
    $$

  3. Visualization:
    Plot $ a(t) $, $ v(t) $, and $ s(t) $ on the same graph to understand how acceleration, velocity, and distance evolve over time.


Python Implementation

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
import numpy as np
import matplotlib.pyplot as plt
from scipy.integrate import quad

# Acceleration function
def acceleration(t):
return 3 * t - 0.2 * t**2

# Velocity function (integrating acceleration)
def velocity(t):
v, _ = quad(acceleration, 0, t) # Integrate acceleration from 0 to t
return v

# Distance function (integrating velocity)
def distance(t):
s, _ = quad(velocity, 0, t) # Integrate velocity from 0 to t
return s

# Generate data
time = np.linspace(0, 15, 500)
acc_values = acceleration(time)
vel_values = [velocity(t) for t in time]
dist_values = [distance(t) for t in time]

# Plotting
plt.figure(figsize=(10, 6))

# Acceleration
plt.plot(time, acc_values, label='Acceleration $a(t)$', color='red')
# Velocity
plt.plot(time, vel_values, label='Velocity $v(t)$', color='blue')
# Distance
plt.plot(time, dist_values, label='Distance $s(t)$', color='green')

# Annotations and labels
plt.title("Car's Acceleration, Velocity, and Distance over Time", fontsize=14)
plt.xlabel("Time (s)", fontsize=12)
plt.ylabel("Value", fontsize=12)
plt.axhline(0, color='black', linewidth=0.7, linestyle='--', alpha=0.7)
plt.legend()
plt.grid(alpha=0.3)

# Display the plot
plt.show()

# Total distance traveled
total_distance = distance(15)
print(f"Total distance traveled in 15 seconds: {total_distance:.2f} meters")

Explanation of Code

  1. Acceleration Function:
    $ a(t) = 3t - 0.2t^2 $ defines the instantaneous acceleration.
  2. Integration:
    • scipy.integrate.quad is used for numerical integration of $ a(t) $ and $ v(t) $.
  3. Visualization:
    • All three functions $ a(t), v(t), s(t) $ are plotted together for easy comparison.
  4. Total Distance:
    • The result of $ s(15) $ gives the total distance traveled by the car in $15$ seconds.

Expected Output

This graph illustrates the relationship between acceleration, velocity, and distance over time for a car accelerating along a straight road:

  1. Red Curve (Acceleration, $ a(t) $):

    • The car’s acceleration starts at a high value, increases initially, and then decreases to zero at $ t = 15 $ seconds. This indicates the car is initially speeding up but stops accelerating as $ t $ approaches $15$ seconds.
  2. Blue Curve (Velocity, $ v(t) $):

    • Velocity increases over time, but the rate of increase slows down as acceleration decreases. This reflects that while the car continues to gain speed, it does so more gradually as $ t $ approaches $15$ seconds.
  3. Green Curve (Distance, $ s(t) $):

    • The distance traveled by the car increases monotonically and steeply. The curvature reflects the combined effect of velocity and acceleration, resulting in greater distances covered as time progresses.
  4. Total Distance Traveled:

    • The text at the bottom indicates the total distance traveled by the car in $15$ seconds, which is approximately 843.75 meters.

This visualization effectively demonstrates how changes in acceleration impact velocity and how these two factors contribute to the total distance covered by the car.

PMF of Number of Tests Passed in a Dual Test System

Here’s a challenging probability problem with a $Python$ solution and a graphical visualization:


Problem

A factory produces items that pass through two quality control tests: Test $A$ and Test $B$.

The probabilities are as follows:

  1. The probability of passing Test $A$ is $( P(A) = 0.7 )$.
  2. The probability of passing Test $B$ is $( P(B) = 0.8 )$.
  3. The probability of passing both tests is $( P(A \cap B) = 0.6 )$.

An item is selected at random.

Let $( X )$ denote the number of tests the item passes.

Find and visualize the probability mass function ($PMF$) of $( X )$, i.e., the probabilities of passing $0$, $1$, or $2$ tests.


Solution Explanation

1. Definitions and Relationships:

  • $ P(X = 0) $: Probability of failing both tests.
    $ P(X = 0) = 1 - P(A \cup B) $, where $ P(A \cup B) = P(A) + P(B) - P(A \cap B) $.
  • $ P(X = 1) $: Probability of passing exactly one test.
    $ P(X = 1) = P(A) + P(B) - 2P(A \cap B) $.
  • $ P(X = 2) $: Probability of passing both tests.
    $ P(X = 2) = P(A \cap B) $.

2. PMF:

The $PMF$ of $( X )$ is:
$$
P(X = k) \quad \text{for } k \in {0, 1, 2}.
$$

3. Visualization:

We will plot the $PMF$ as a bar chart.


Python Implementation

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
import matplotlib.pyplot as plt
import numpy as np

# Given probabilities
P_A = 0.7
P_B = 0.8
P_A_and_B = 0.6

# Calculating probabilities
P_A_or_B = P_A + P_B - P_A_and_B
P_X_0 = 1 - P_A_or_B # Probability of failing both
P_X_1 = P_A + P_B - 2 * P_A_and_B # Probability of passing exactly one test
P_X_2 = P_A_and_B # Probability of passing both tests

# PMF
x_values = [0, 1, 2]
pmf = [P_X_0, P_X_1, P_X_2]

# Visualization
plt.figure(figsize=(8, 5))
plt.bar(x_values, pmf, color=['red', 'blue', 'green'], alpha=0.7)
plt.xticks(x_values, labels=[f"Pass {i} Tests" for i in x_values])
plt.ylim(0, 1)
plt.title("Probability Mass Function of X (Number of Tests Passed)", fontsize=14)
plt.ylabel("Probability", fontsize=12)
plt.xlabel("Number of Tests Passed (X)", fontsize=12)
plt.grid(axis='y', alpha=0.3)
for i, p in enumerate(pmf):
plt.text(x_values[i], p + 0.02, f"{p:.2f}", ha='center', fontsize=10)
plt.show()

# Print probabilities
print(f"PMF:")
print(f"P(X = 0) = {P_X_0:.2f}")
print(f"P(X = 1) = {P_X_1:.2f}")
print(f"P(X = 2) = {P_X_2:.2f}")

Explanation of Code

  1. PMF Calculation:
    • Using set relationships to calculate $ P(X = 0) $, $ P(X = 1) $, and $ P(X = 2) $.
  2. Bar Chart:
    • Bar heights represent probabilities for $ X = 0, 1, 2 $.
    • Labels and annotations make the graph interpretable.

Expected Output

  • Text:

    1
    2
    3
    4
    PMF:
    P(X = 0) = 0.10
    P(X = 1) = 0.30
    P(X = 2) = 0.60
  • Graph:

    • A bar chart with $( X = 0, 1, 2 )$ on the x-axis.
    • Heights correspond to $( P(X = 0) )$, $( P(X = 1) )$, $( P(X = 2) )$.
    • $( P(X = 2) )$ is the highest due to the significant overlap between Test $A$ and Test $B$.

Visualizing Solutions to Simultaneous Modular Congruences

Here’s a challenging number theory problem solved using $Python$, with a clear explanation and visualization.

The problem is related to modular arithmetic and the Chinese Remainder Theorem (CRT):


Problem

Find all integers $( x )$ such that:
$$
x \equiv 2 \pmod{3}, \quad x \equiv 3 \pmod{5}, \quad x \equiv 2 \pmod{7}
$$
Then visualize the results within a range, highlighting the congruences visually.


Explanation

This is a system of simultaneous congruences.

The Chinese Remainder Theorem guarantees a unique solution modulo $( n = 3 \cdot 5 \cdot 7 = 105 )$ since the moduli are pairwise coprime.

  1. Solve for $( x )$ satisfying the given congruences.
  2. Verify solutions by computing $( x \mod 3 )$, $( x \mod 5 )$, and $( x \mod 7 )$.
  3. Visualize the congruences within a given range, e.g., $( 0 \leq x < 200 )$.

Python Implementation:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
import numpy as np
import matplotlib.pyplot as plt
from sympy.ntheory.modular import solve_congruence

# Define the congruences
congruences = [(2, 3), (3, 5), (2, 7)]

# Solve using the Chinese Remainder Theorem
solution, modulus = solve_congruence(*congruences)

print(f"Solution: x ≡ {solution} (mod {modulus})")

# Generate values of x within a range
x_range = 200 # Define the range for visualization
x_values = [solution + i * modulus for i in range((x_range // modulus) + 1)]

# Verify the solution
assert all(x % m == r for r, m in congruences for x in x_values)

# Visualization
plt.figure(figsize=(10, 6))

# Plot the values of x
plt.scatter(x_values, [0] * len(x_values), color='blue', label='Solutions (x)')

# Show congruences modulo 3, 5, and 7
moduli = [3, 5, 7]
colors = ['red', 'green', 'orange']
for mod, color in zip(moduli, colors):
congruent_vals = [x for x in range(x_range) if x % mod == solution % mod]
plt.scatter(congruent_vals, [mod] * len(congruent_vals), color=color, alpha=0.5, label=f'x ≡ {solution % mod} (mod {mod})')

# Annotations and labels
plt.title('Visualization of Solutions to Simultaneous Congruences', fontsize=14)
plt.xlabel('x', fontsize=12)
plt.ylabel('Modulo Conditions', fontsize=12)
plt.legend()
plt.grid(alpha=0.3)
plt.show()

Explanation of Code

  1. Solve_congruence: Utilizes $SymPy$ to solve the system of congruences.
  2. x_values: Computes the values satisfying the congruences within a specified range.
  3. Visualization:
    • Scatter plot for $( x )$ values.
    • Highlights modulo congruence relationships $( x \mod 3 )$, $( x \mod 5 )$, $( x \mod 7 )$.

Expected Output

  • Text Output:

    1
    Solution: x ≡ 23 (mod 105)

    This means the smallest solution is $( x = 23 )$, and all solutions are of the form $( x = 23 + 105k )$ for integers $( k )$.

  • Graphical Output:

    • Blue points indicate solutions $( x )$.
    • Colored scatter plots at $( y = 3, 5, 7 )$ show congruences modulo $3$, $5$, and $7$ respectively.