ECON 4370/6370 — Homework 1

- This homework is a coding assignment. Please follow the instructions to complete the tasks.
- Use Quarto/Knitr chunks for all work.
- Turn in the rendered HTML and your .qmd source via Canvas.

0. Setup

set.seed(123) # Please don't change this seed.
# You may load packages you plan to use (optional):
library(tidyverse)
1. OLS Estimation
Recall that we implemented the OLS estimator in the lecture (link).
Please write a function ols_est(X, y) that implements the OLS estimator and returns both the estimated coefficients and the t-values.
# TODO: finish the function ols_est(X, y).
ols_est <- function(X, y) {
  bhat <- solve( t(X) %*% X, t(X) %*% y )
  # ... please continue
}
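For reference, here is one possible completion (a minimal sketch, not the only way; the helper names `bhat`, `se`, and `tval` are illustrative):

```r
# A sketch of a completed ols_est(); returns coefficients and t-values.
ols_est <- function(X, y) {
  n    <- nrow(X)
  k    <- ncol(X)
  XtX  <- t(X) %*% X
  bhat <- solve(XtX, t(X) %*% y)        # OLS coefficients
  e    <- y - X %*% bhat                # residuals
  s2   <- sum(e^2) / (n - k)            # error variance estimate
  se   <- sqrt(diag(s2 * solve(XtX)))   # conventional standard errors
  tval <- as.vector(bhat) / se          # t-values for H0: coefficient = 0
  list(coefficients = as.vector(bhat), t_values = tval)
}
```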
Please generate a simple test dataset to check that your function works properly by comparing its output with the result from lm(). Use ?lm to see how to call lm() to run the linear regression.
# TODO: Test if ols_est() is working properly
# Generate a simple test dataset
# Compare the result from ols_est() and lm()
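One possible check (a sketch; it assumes `ols_est()` has been completed as above and that `X` carries an explicit intercept column):

```r
# Sanity check against lm(); the test data below are arbitrary.
n_test <- 50
x1 <- rnorm(n_test)
x2 <- rnorm(n_test)
y  <- 1 + 0.5 * x1 - 0.3 * x2 + rnorm(n_test)
X  <- cbind(1, x1, x2)                   # explicit intercept column

ols_est(X, y)
summary(lm(y ~ x1 + x2))$coefficients    # compare "Estimate" and "t value"
```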
2. Monte Carlo Simulation
Let’s apply the skills learned in the class to implement a Monte Carlo simulation.
In such experiments, we sample data from a specified statistical model and examine the finite sample performance of estimation/inference procedures.
For different data generating processes, different primitive parameter values, and different sample sizes \(n\), we simulate data and estimate the model many times, and we summarize the results (in most cases) by measures like (empirical) bias, RMSE, size and empirical power functions, or empirical density plots, to examine the finite sample behavior of the estimation/inference procedures.
In this exercise, we will focus on the finite sample estimation accuracy of the OLS estimator in linear regression models. The accuracy can be measured by bias and root mean square error (RMSE).
Generically, bias and root mean square error (RMSE) are calculated by \[\mathrm{bias} = R^{-1}\sum_{r=1}^R \left( \hat{\theta}^{(r)} - \theta_0 \right),\] \[\mathrm{RMSE} = \left(R^{-1}\sum_{r=1}^R \left( \hat{\theta}^{(r)} - \theta_0 \right)^2\right)^{1/2},\] where \(\theta_0\) is the true parameter, \(\hat{\theta}^{(r)}\) is its estimate in replication \(r\), and \(R\) is the number of replications.
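In R, for a scalar parameter these are one line each (a sketch; `theta_hat` and `theta0` are hypothetical names for the vector of replication estimates and the true value):

```r
# theta_hat: vector of R replication estimates; theta0: true parameter value
bias <- mean(theta_hat - theta0)
rmse <- sqrt(mean((theta_hat - theta0)^2))
```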
Model
Consider a linear regression model \[y_i = \alpha + x_{i1}\beta_1 + x_{i2}\beta_2 + u_i\] for \(i = 1, 2, \ldots, n\), where \(n = 100\). \((y_i, x_i)\) are independently and identically distributed (i.i.d.) with \[u_i\sim i.i.d.\,N(0,1), \quad (x_{i1}, x_{i2})^\prime \sim i.i.d.\,N\left(\begin{pmatrix}0 \\ 1 \end{pmatrix}, \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix} \right).\] True parameters are \(\alpha = 0.11\), \(\beta = (0.22, 0.33)^\prime\), \(\rho = 0.5\), \(\sigma_1 = 1\), and \(\sigma_2 = 4\).
Step 1: Data generating function
Please write a function dgp(...) that takes the sample size \(n\) and model parameters as inputs and returns the simulated data \((y_i, x_{i1}, x_{i2})\) for \(i = 1, 2, \ldots, n\).
# TODO: simulate data
dgp <- function(...) {
  # please continue
}
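One way to fill this in (a sketch; it draws the regressors with MASS::mvrnorm(), and the argument names are illustrative):

```r
# A sketch of dgp(): simulate (y, x1, x2) from the model above.
dgp <- function(n, alpha, beta, rho, sigma1, sigma2) {
  Sigma <- matrix(c(sigma1^2,              rho * sigma1 * sigma2,
                    rho * sigma1 * sigma2, sigma2^2), nrow = 2)
  X <- MASS::mvrnorm(n, mu = c(0, 1), Sigma = Sigma)  # regressors
  u <- rnorm(n)                                       # errors, N(0, 1)
  y <- alpha + X %*% beta + u
  data.frame(y = as.vector(y), x1 = X[, 1], x2 = X[, 2])
}
```

MASS comes bundled with R, so no extra installation is needed; calling it as `MASS::mvrnorm()` also avoids masking dplyr::select() after library(tidyverse).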
Step 2: Set Up Primitive Parameters
# TODO: setup primitive parameters: sample size, true parameters, etc.
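For example (values taken from the Model section; `R_rep` is an illustrative name for the number of replications):

```r
# Primitive parameters from the Model section.
n      <- 100
alpha  <- 0.11
beta   <- c(0.22, 0.33)
rho    <- 0.5
sigma1 <- 1
sigma2 <- 4
R_rep  <- 1000   # number of Monte Carlo replications
```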
Step 3: Run Simulation
# TODO: Run the simulation (generate data - estimation - save results) for 1000 replications
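A sketch of one possible replication loop, assuming `dgp()` and `ols_est()` from the previous steps:

```r
# Generate data, estimate, and save the coefficients for each replication.
est <- matrix(NA_real_, nrow = R_rep, ncol = 3,
              dimnames = list(NULL, c("alpha", "beta1", "beta2")))
for (r in seq_len(R_rep)) {
  d        <- dgp(n, alpha, beta, rho, sigma1, sigma2)
  X        <- cbind(1, d$x1, d$x2)               # add the intercept column
  est[r, ] <- ols_est(X, d$y)$coefficients
}
```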
Step 4: Summarize Results
# TODO: Write a function to calculate bias and RMSE
sum_results <- function(...) {
  # please continue
}
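One possible version, applied column by column to the matrix of estimates from Step 3 (a sketch):

```r
# est: matrix of estimates (replications in rows); theta0: vector of true values.
sum_results <- function(est, theta0) {
  dev <- sweep(est, 2, theta0)          # deviations from the true values
  data.frame(bias = colMeans(dev),
             rmse = sqrt(colMeans(dev^2)))
}

sum_results(est, c(alpha, beta))        # one row per coefficient
```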
# TODO: Summarize and report the bias and RMSE
# ...
# TODO: plot the empirical density of the estimated coefficient beta_1 across replications
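For the density plot, one option is ggplot2 (already loaded via tidyverse); a sketch using the `est` matrix from Step 3:

```r
# Empirical density of the beta_1 estimates; the dashed line marks the truth.
ggplot(data.frame(beta1 = est[, "beta1"]), aes(x = beta1)) +
  geom_density() +
  geom_vline(xintercept = beta[1], linetype = "dashed") +
  labs(x = expression(hat(beta)[1]),
       title = "Empirical density of the beta_1 estimates")
```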
Step 5: Interpret your results
Please interpret your results and discuss the findings.
Step 6: Run simulation with different sample sizes
Let’s investigate how the estimation accuracy of the OLS estimator changes as the sample size increases.
# TODO: Run the simulation with different sample sizes
sample_sizes <- c(100, 200, 500, 1000)
# please continue
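One way to organize this (a sketch; `run_sim()` is a hypothetical helper that wraps Steps 3–4 for a single sample size):

```r
# Run the full simulation for one sample size, then stack the summaries.
run_sim <- function(n, R_rep = 1000) {
  est <- matrix(NA_real_, nrow = R_rep, ncol = 3,
                dimnames = list(NULL, c("alpha", "beta1", "beta2")))
  for (r in seq_len(R_rep)) {
    d        <- dgp(n, alpha, beta, rho, sigma1, sigma2)
    est[r, ] <- ols_est(cbind(1, d$x1, d$x2), d$y)$coefficients
  }
  cbind(n = n, sum_results(est, c(alpha, beta)))
}

do.call(rbind, lapply(sample_sizes, run_sim))
```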
What do you observe? Why?
Step 7: Run simulation with misspecified model
Now, we redo the simulation with the same data generating process.
However, when we run the regression, we only include the first regressor \(x_{i1}\) and the intercept while omitting the second regressor \(x_{i2}\).
# TODO: Run the simulation with misspecified model
# please continue
# TODO: Check the bias and RMSE for beta_1 with the misspecified model
# please continue
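A sketch of the misspecified regressions, keeping the same dgp() but dropping \(x_{i2}\) from the design matrix:

```r
# Same DGP, but the regression omits x2 (misspecified model).
est_mis <- matrix(NA_real_, nrow = R_rep, ncol = 2,
                  dimnames = list(NULL, c("alpha", "beta1")))
for (r in seq_len(R_rep)) {
  d            <- dgp(n, alpha, beta, rho, sigma1, sigma2)
  X_mis        <- cbind(1, d$x1)          # omit the second regressor
  est_mis[r, ] <- ols_est(X_mis, d$y)$coefficients
}

sum_results(est_mis, c(alpha, beta[1]))   # bias and RMSE for the misspecified fit
```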
What do you observe? Why?