Simulating and Visualizing a Normal Distribution in Python

Problem

Suppose we want to simulate a normal distribution $\mathcal{N}(\mu, \sigma^2)$ with mean $\mu = 50$ and standard deviation $\sigma = 10$.

Using this distribution, we will:

  1. Generate a random sample of size $1000$.
  2. Compute the sample mean and standard deviation to verify they align with the theoretical values.
  3. Visualize the distribution as a histogram overlaid with the theoretical probability density function (PDF).

Python Code

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

# Parameters of the normal distribution
mu = 50 # Mean
sigma = 10 # Standard deviation
sample_size = 1000

# Generate a random sample
np.random.seed(42) # For reproducibility
sample = np.random.normal(mu, sigma, sample_size)

# Compute sample statistics
sample_mean = np.mean(sample)
sample_std = np.std(sample, ddof=1) # ddof=1: sample standard deviation (divides by n - 1)

print(f"Sample Mean: {sample_mean:.2f}")
print(f"Sample Standard Deviation: {sample_std:.2f}")

# Visualization
x = np.linspace(mu - 4 * sigma, mu + 4 * sigma, 1000) # x-axis range
pdf = norm.pdf(x, mu, sigma) # Theoretical PDF

plt.figure(figsize=(10, 6))

# Plot the histogram of the sample
plt.hist(sample, bins=30, density=True, alpha=0.6, color='skyblue', label="Sample Histogram")

# Overlay the theoretical PDF
plt.plot(x, pdf, color='red', linewidth=2, label="Theoretical PDF")

# Add labels and legend
plt.title("Normal Distribution: Histogram and Theoretical PDF")
plt.xlabel("Value")
plt.ylabel("Density")
plt.axvline(sample_mean, color='green', linestyle='--', label=f"Sample Mean: {sample_mean:.2f}")
plt.axvline(mu, color='purple', linestyle=':', label=f"Theoretical Mean: {mu}")
plt.legend()
plt.grid(True)
plt.show()

Explanation

  1. Random Sample:

    • We generate a sample of size $1000$ from the normal distribution $\mathcal{N}(\mu = 50, \sigma^2 = 100)$ using np.random.normal, which takes the standard deviation (not the variance) as its scale argument.
  2. Sample Statistics:

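    • The mean $\bar{x}$ and standard deviation $s$ of the sample are computed using np.mean and np.std. Note that np.std divides by $n$ by default (ddof=0); passing ddof=1 divides by $n - 1$ instead, and at $n = 1000$ the two results differ by only about $0.05\%$.
      These values should closely match the theoretical values $\mu = 50$ and $\sigma = 10$ because sampling error shrinks as the sample grows: the standard error of the mean is $\sigma / \sqrt{n}$.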
  3. Visualization:

    • The histogram of the sample shows the empirical distribution of the generated data.
    • The theoretical PDF $f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$ is overlaid to illustrate how closely the sample follows the theoretical distribution.
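
To make the PDF formula concrete, the following minimal sketch writes it out term by term and runs two sanity checks: the peak height at the mean and the area under the curve over four standard deviations on either side. The helper name normal_pdf and the trapezoidal integration are illustrative choices, not part of the original script, which relies on scipy.stats.norm.pdf for the same values.

```python
import numpy as np

mu, sigma = 50, 10


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2), written out term by term (hypothetical helper)."""
    coefficient = 1.0 / np.sqrt(2.0 * np.pi * sigma**2)
    exponent = -((x - mu) ** 2) / (2.0 * sigma**2)
    return coefficient * np.exp(exponent)


# Same grid as the plotting code: 1000 points spanning mu +/- 4 sigma
x = np.linspace(mu - 4 * sigma, mu + 4 * sigma, 1000)
pdf = normal_pdf(x, mu, sigma)

# At x = mu the exponent is zero, so the peak is 1 / (sigma * sqrt(2 * pi))
peak = normal_pdf(mu, mu, sigma)

# Trapezoidal estimate of the area under the curve on the plotted range
area = np.sum((pdf[1:] + pdf[:-1]) / 2.0 * np.diff(x))

print(f"Peak density at x = mu: {peak:.5f}")
print(f"Area within 4 sigma:    {area:.5f}")
```

The peak comes out at about 0.0399, which is why the y-axis of the density histogram tops out near 0.04, and the area is about 0.9999, confirming that the plotted range covers nearly all of the probability mass.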

Output

  1. Sample Statistics:

    Sample Mean: 50.19
    Sample Standard Deviation: 9.79

    Both values are close to the theoretical $\mu = 50$ and $\sigma = 10$: the sample mean is off by $0.19$, less than one standard error ($10 / \sqrt{1000} \approx 0.32$), so the output is consistent with a correct simulation.

  2. Graph:

    • The histogram of the sample aligns closely with the red line (theoretical PDF).
    • Vertical lines indicate the sample mean (green) and the theoretical mean (purple), showing their proximity.
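
How close is "close"? One way to put a number on it, sketched here under the assumption that the same seed and parameters as the main script are used, is to express the gap between the sample mean and $\mu$ in units of the standard error $\sigma / \sqrt{n}$. The variable names below (standard_error, z_score, std_population, std_sample) are introduced for illustration only. The sketch also prints the standard deviation under both ddof settings to show how little the choice matters at this sample size.

```python
import numpy as np

mu, sigma, n = 50, 10, 1000

np.random.seed(42)  # same seed as the main script
sample = np.random.normal(mu, sigma, n)

sample_mean = sample.mean()
std_population = sample.std(ddof=0)  # divides by n (NumPy default)
std_sample = sample.std(ddof=1)      # divides by n - 1

# Standard error of the mean, using the known theoretical sigma
standard_error = sigma / np.sqrt(n)

# Distance of the sample mean from mu, in standard errors
z_score = (sample_mean - mu) / standard_error

print(f"Standard error of the mean: {standard_error:.3f}")
print(f"Sample mean deviation: {sample_mean - mu:+.3f} ({z_score:+.2f} standard errors)")
print(f"Std with ddof=0: {std_population:.4f}, with ddof=1: {std_sample:.4f}")
```

A deviation well inside two or three standard errors is what a correctly working sampler is expected to produce most of the time; a much larger value would be a reason to recheck the parameters passed to np.random.normal.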

This example demonstrates how to simulate, analyze, and visualize data from a probability distribution using Python.