A Visual Exploration with Python
I’ll solve a probability theory problem using Python, explain the mathematical concepts, and visualize the results with graphs.
Let’s look at a classic probability problem: the Central Limit Theorem demonstration.
import numpy as np
Central Limit Theorem Example
The Central Limit Theorem (CLT) is one of the most important results in probability theory.
It states that when independent random variables are added, their properly normalized sum tends toward a normal distribution even if the original variables are not normally distributed.
Mathematical Formulation
Let $X_1, X_2, \ldots, X_n$ be independent and identically distributed random variables with mean $\mu$ and finite variance $\sigma^2$.
The CLT states that as $n$ approaches infinity, the distribution of:
$$Z_n = \frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}}$$
converges to the standard normal distribution $N(0,1)$,
where $\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$ is the sample mean.
In other words, for large $n$ the sampling distribution of the mean is approximately normal with mean $\mu$ and standard deviation $\frac{\sigma}{\sqrt{n}}$.
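The standardization above can be checked numerically. The following is a minimal sketch, not the code from this post: the seed, the sample size `n`, and the number of repetitions `num_samples` are illustrative assumptions. It draws repeated samples from an exponential distribution with scale 1, computes $Z_n$ for each sample, and checks that the results look roughly like $N(0,1)$.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # fixed seed, chosen only for reproducibility

n = 50                 # sample size (illustrative choice)
num_samples = 20_000   # number of repeated samples (illustrative choice)
mu, sigma = 1.0, 1.0   # exponential with scale=1 has mean 1 and standard deviation 1

# Each row is one sample of size n; take the mean of each row.
samples = rng.exponential(scale=1.0, size=(num_samples, n))
sample_means = samples.mean(axis=1)

# Standardize each sample mean: Z_n = (mean - mu) / (sigma / sqrt(n))
z = (sample_means - mu) / (sigma / np.sqrt(n))

# For a standard normal, about 95% of the mass lies within +/-1.96.
frac_within = np.mean(np.abs(z) < 1.96)

print(f"mean of Z_n: {z.mean():.3f}")
print(f"std of Z_n:  {z.std(ddof=1):.3f}")
print(f"fraction with |Z_n| < 1.96: {frac_within:.3f}")
```

The mean and standard deviation of `z` should land near 0 and 1, and the fraction within ±1.96 should land near 0.95, up to simulation noise.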
Code Explanation
My code demonstrates the Central Limit Theorem by:
- Generating samples from three different distributions (uniform, exponential, and Poisson)
- Computing the mean of each sample for various sample sizes
- Plotting the distribution of these sample means alongside a normal distribution curve
- Tracking how quickly the standard deviation of the sample means converges to the theoretical value
Key components:
- generate_sample_means() creates multiple samples of a specified size and calculates their means
- The first plot shows histograms of sample means for different distributions and sample sizes
- The second plot shows how the standard deviation ratio converges to the expected value
- The third plot visualizes the original distributions to highlight their different shapes
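The full code is not reproduced in this section, so here is a hedged sketch of what a function like generate_sample_means() could look like. The signature, the `sampler` callable, and the default `num_samples` are assumptions made for illustration, not the original implementation.

```python
import numpy as np

def generate_sample_means(sampler, sample_size, num_samples=10_000):
    """Draw num_samples samples of sample_size values each and return their means.

    sampler is any callable that accepts a `size` keyword and returns an array
    of that shape (a hypothetical interface, chosen for this sketch).
    """
    draws = sampler(size=(num_samples, sample_size))
    return draws.mean(axis=1)

rng = np.random.default_rng(seed=42)  # fixed seed for reproducibility

# The three distributions named in the text, with the parameters used in this post.
samplers = {
    "uniform": lambda size: rng.uniform(0.0, 1.0, size=size),
    "exponential": lambda size: rng.exponential(scale=1.0, size=size),
    "poisson": lambda size: rng.poisson(lam=5.0, size=size),
}

for name, sampler in samplers.items():
    for n in (1, 5, 30):
        means = generate_sample_means(sampler, sample_size=n)
        print(f"{name:>11}, n={n:>2}: mean={means.mean():.3f}, std={means.std(ddof=1):.3f}")
```

Passing the distribution in as a callable keeps the helper independent of any particular distribution, so the same function serves all three cases.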
Theoretical Insights
For each distribution, I’ve calculated the true mean and standard deviation:
Uniform distribution on [0,1]:
- Mean: $\mu = \frac{0+1}{2} = 0.5$
- Variance: $\sigma^2 = \frac{(1-0)^2}{12} = \frac{1}{12}$
Exponential distribution with scale=1:
- Mean: $\mu = \beta = 1$, where $\beta$ is the scale parameter (equivalently $\mu = 1/\lambda$ for rate $\lambda = 1$)
- Variance: $\sigma^2 = \beta^2 = 1$
Poisson distribution with λ=5:
- Mean: $\mu = \lambda = 5$
- Variance: $\sigma^2 = \lambda = 5$
The standard deviation of the sampling distribution should be $\frac{\sigma}{\sqrt{n}}$ where n is the sample size.
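These theoretical values can be sanity-checked against large simulated draws. The sketch below assumes NumPy's `Generator` sampling methods. The seed, the number of draws, and the example sample size of 30 are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(seed=1)  # fixed seed for reproducibility
size = 200_000                       # number of draws per distribution (illustrative)

# name -> (theoretical mean, theoretical variance, simulated draws)
checks = {
    "uniform[0,1]": (0.5, 1.0 / 12.0, rng.uniform(0.0, 1.0, size=size)),
    "exponential(scale=1)": (1.0, 1.0, rng.exponential(scale=1.0, size=size)),
    "poisson(lam=5)": (5.0, 5.0, rng.poisson(lam=5.0, size=size)),
}

n = 30  # example sample size for the standard deviation of the sample mean
for name, (mu, var, draws) in checks.items():
    se = np.sqrt(var / n)  # sigma / sqrt(n)
    print(
        f"{name:>21}: mean {draws.mean():.3f} (theory {mu:.3f}), "
        f"var {draws.var(ddof=1):.3f} (theory {var:.3f}), "
        f"sigma/sqrt({n}) = {se:.3f}"
    )
```

The empirical mean and variance should agree with the theoretical columns to within simulation noise.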
Results Interpretation
The graphs demonstrate several important aspects of the CLT:
- As sample size increases, the distribution of sample means becomes more normal (bell-shaped), regardless of the original distribution shape
- The larger the sample size, the smaller the standard deviation of the sampling distribution
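- The convergence happens relatively quickly: even with a sample size of 30, the approximation is quite good, although strongly skewed distributions such as the exponential converge more slowly than symmetric ones such as the uniform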
- The standard deviation of the sample means scales proportionally to $\frac{1}{\sqrt{n}}$
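One way to quantify "becomes more normal" is the skewness of the sample means, which is 0 for a normal distribution. The sketch below is an illustration rather than part of the original code: the helper `sample_skewness`, the seed, and the chosen sample sizes are assumptions. For the exponential distribution the skewness of the sample mean is $2/\sqrt{n}$, so it should shrink as $n$ grows.

```python
import numpy as np

def sample_skewness(x):
    """Return the (biased) sample skewness: mean cubed deviation over std cubed."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    return np.mean(centered**3) / np.std(x) ** 3

rng = np.random.default_rng(seed=3)  # fixed seed for reproducibility
num_samples = 20_000                 # repeated samples per sample size (illustrative)

skews = {}
for n in (1, 5, 30, 100):
    # Means of num_samples exponential samples, each of size n
    means = rng.exponential(scale=1.0, size=(num_samples, n)).mean(axis=1)
    skews[n] = sample_skewness(means)
    print(f"n={n:>3}: skewness={skews[n]:.3f}  (theory 2/sqrt(n)={2 / np.sqrt(n):.3f})")
```

The skewness falls steadily with $n$ but is still visibly nonzero at $n = 30$, which is why skewed distributions need larger samples for the normal approximation to be tight.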
The plot showing the ratio of observed to theoretical standard deviation confirms that the experimental results match the theory: the ratio approaches 1.0 for all distributions as sample size increases.
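The ratio described here can be computed directly. This is a minimal sketch for the exponential case only. The seed, the sample sizes, and the number of repetitions are illustrative assumptions, and it is not the plotting code used for the figures.

```python
import numpy as np

rng = np.random.default_rng(seed=7)  # fixed seed for reproducibility
num_samples = 10_000                 # repeated samples per sample size (illustrative)
sigma = 1.0                          # standard deviation of exponential with scale=1

ratios = {}
for n in (2, 10, 30, 100, 200):
    means = rng.exponential(scale=1.0, size=(num_samples, n)).mean(axis=1)
    observed = means.std(ddof=1)      # empirical std of the sample means
    theoretical = sigma / np.sqrt(n)  # sigma / sqrt(n)
    ratios[n] = observed / theoretical
    print(f"n={n:>3}: observed={observed:.4f}, theoretical={theoretical:.4f}, ratio={ratios[n]:.3f}")
```

The identity $\mathrm{sd}(\bar{X}_n) = \sigma/\sqrt{n}$ holds exactly for every $n$ when the draws are independent. Any deviation of the ratio from 1.0 in a simulation like this therefore reflects the finite number of repeated samples, not a failure of the theory at small $n$.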
This demonstrates why the CLT is so powerful in statistics: it allows us to make inferences about population parameters regardless of the shape of the underlying distribution, as long as that distribution has finite variance and we have a sufficiently large sample size.