Let’s tackle a basic econometrics problem often used to understand relationships between variables in economics: Simple Linear Regression.
This example can be applied to analyze the relationship between two variables, such as income and expenditure, or education and wage.
Problem: Relationship Between Education Level and Wage
The goal is to analyze the relationship between years of education and wage.
This example assumes we have a dataset where each data point represents an individual’s years of education and corresponding wage.
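By fitting a simple linear regression model, we can estimate the direction and strength of the relationship between these variables and interpret the results. Judging whether the relationship is statistically significant additionally requires a standard error or p-value for the slope.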
Objective
Use Python to:
- Fit a simple linear regression model where the independent variable $(X)$ is Years of Education and the dependent variable $(Y)$ is Wage.
- Plot the regression line with the data points to visualize the relationship.
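Written out, the model in the first objective takes the form below. The symbols $\beta_0$, $\beta_1$, and $\varepsilon_i$ are standard textbook notation introduced here for illustration, not names used elsewhere in this example:

$$
Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \qquad i = 1, \dots, n
$$

Here $\beta_0$ is the intercept, $\beta_1$ is the slope, and $\varepsilon_i$ is the error term for individual $i$. Assuming the model is fitted by ordinary least squares, the estimates are:

$$
\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}, \qquad \hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}
$$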
Example Dataset
For simplicity, we’ll create a synthetic dataset of years of education and wage, simulating a realistic scenario where wage generally increases with more years of education, though with some variability.
Python Code
Here’s the Python code to create the dataset, fit the linear regression model, and visualize the results:
    import numpy as np
Explanation of the Code
- Data Generation:
- We generate random years of education for $50$ individuals, ranging from $10$ to $20$ years.
- Wage is simulated as a linear function of education with added noise to introduce variability.
- Model Fitting:
- The `LinearRegression` model is used to fit the data, with Years of Education as the predictor and Wage as the outcome.
- Visualization:
- A scatter plot displays the actual data points.
- The regression line represents the predicted wage based on years of education.
- Model Evaluation:
- The R-squared value measures how well the model explains the variance in wage based on years of education.
- The slope of the regression line indicates the increase in wage for each additional year of education.
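The steps above can be sketched as follows. This is a minimal illustration rather than the original listing: the random seed, the coefficients used to simulate wage (`true_intercept`, `true_slope`), the noise level (`noise_sd`), and the output file name are all assumptions chosen for the example.

```python
import matplotlib
matplotlib.use("Agg")  # render to a file; drop this line and call plt.show() for an interactive window
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)  # fixed seed (assumption) so the run is reproducible

# Data generation: 50 individuals with 10 to 20 years of education.
n = 50
education = rng.uniform(10, 20, size=n)

# Hypothetical "true" parameters, chosen only for illustration.
true_intercept, true_slope, noise_sd = 5.0, 2.5, 4.0
wage = true_intercept + true_slope * education + rng.normal(0, noise_sd, size=n)

# Model fitting: scikit-learn expects a 2-D feature matrix of shape (n_samples, n_features).
X = education.reshape(-1, 1)
model = LinearRegression().fit(X, wage)

# Model evaluation: slope, intercept, and R-squared on the fitted data.
slope = model.coef_[0]
intercept = model.intercept_
r_squared = model.score(X, wage)
print(f"Slope:     {slope:.3f}")
print(f"Intercept: {intercept:.3f}")
print(f"R-squared: {r_squared:.3f}")

# Visualization: scatter plot of the data with the fitted line in red.
grid = np.linspace(10, 20, 100).reshape(-1, 1)
fig, ax = plt.subplots()
ax.scatter(education, wage, label="Data points")
ax.plot(grid.ravel(), model.predict(grid), color="red", label="Regression line")
ax.set_xlabel("Years of Education")
ax.set_ylabel("Wage")
ax.set_title("Wage vs. Years of Education")
ax.legend()
fig.savefig("education_wage_regression.png")
plt.close(fig)
```

Because the data are simulated, the estimated slope should land near the assumed `true_slope`, with the gap driven by the noise level and the sample size.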
Visualization

The plot shows:
- Data Points: Each point represents an individual’s education level and wage.
- Regression Line: The red line shows the model’s predictions, helping visualize the general trend that wage increases with education level.
Interpretation
- Slope: If the slope is, for example, $2.5$, this suggests that each additional year of education is associated with an average increase of \$2.50 in wage.
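- Intercept: This is the estimated wage for an individual with $0$ years of education. Because the data only cover $10$ to $20$ years of education, this value is an extrapolation and should not be read literally.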
- R-squared: A higher R-squared value (close to $1$) indicates that the model effectively explains the variance in wage, though it’s normal to have a moderate value in real-world data with noise.
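The quantities above do not by themselves say whether the slope is statistically distinguishable from zero. One way to check this, sketched here with `scipy.stats.linregress`, is to look at the slope's standard error, p-value, and confidence interval. The data are simulated again so the block runs on its own; the seed, coefficients, and noise level are hypothetical values, not taken from the original example.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # fixed seed (assumption) for reproducibility

# Simulated data under assumed parameters: wage = 5 + 2.5 * education + noise.
education = rng.uniform(10, 20, size=50)
wage = 5.0 + 2.5 * education + rng.normal(0, 4.0, size=50)

# linregress returns the OLS slope and intercept, the correlation coefficient,
# the two-sided p-value for H0: slope = 0, and the slope's standard error.
result = stats.linregress(education, wage)
r_squared = result.rvalue ** 2

# 95% confidence interval for the slope, using a t distribution with n - 2 degrees of freedom.
t_crit = stats.t.ppf(0.975, df=len(education) - 2)
ci_low = result.slope - t_crit * result.stderr
ci_high = result.slope + t_crit * result.stderr

print(f"Slope:     {result.slope:.3f} (std. error {result.stderr:.3f})")
print(f"Intercept: {result.intercept:.3f}")
print(f"R-squared: {r_squared:.3f}")
print(f"p-value for H0 (slope = 0): {result.pvalue:.3g}")
print(f"95% CI for slope: [{ci_low:.3f}, {ci_high:.3f}]")
```

A small p-value, or a confidence interval that excludes zero, is the usual evidence for a statistically significant relationship. With simulated data this mostly confirms that the procedure recovers the relationship that was built in.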
Applications in Econometrics
Simple linear regression is a foundational tool in econometrics used to explore and quantify relationships between economic variables.
By understanding such relationships, economists can make informed predictions and policy recommendations.







