ECON 4370/6370 — Homework 1

Author

Your Name (SMU ID)

Published

September 2, 2025

  • This homework is a coding assignment. Please follow the instructions to complete the tasks.
  • Use Quarto/Knitr chunks for all work.
  • Turn in the rendered HTML and your .qmd source via Canvas.

0. Setup

set.seed(123)  # Please don't change this seed.

# You may load packages you plan to use (optional):
library(tidyverse)

1. OLS Estimation

Recall that we implemented the OLS estimator in the lecture (link).

Please write a function ols_est(X, y) that implements the OLS estimator and returns both the estimated coefficients and the t-values.

# TODO: finish the function ols_est(X, y).

ols_est <- function(X, y) {
    bhat <- solve( t(X) %*% X, t(X) %*% y )
    # ... please continue
}

Please generate a simple test dataset and check that your function works properly by comparing its output with the result from lm().

Tip

Use ?lm to see how to use lm() to run the linear regression.
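As a minimal illustration of the lm() interface (the data here are made up for demonstration and are not the test dataset you should construct):

```r
# Illustrative only: fit a simple regression with lm() on made-up data.
x <- rnorm(20)
y <- 1 + 2 * x + rnorm(20)
fit <- lm(y ~ x)              # an intercept is included by default
coef(fit)                     # estimated coefficients
summary(fit)$coefficients     # estimates, std. errors, t-values, p-values
```

Note that summary(fit)$coefficients gives you the t-values to compare against your own ols_est() output.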

# TODO: Test if ols_est() is working properly

# Generate a simple test dataset

# Compare the result from ols_est() and lm()

2. Monte Carlo Simulation

Let’s apply the skills learned in class to implement a Monte Carlo simulation.

In such experiments, we sample data from a specified statistical model and examine the finite sample performance of estimation/inference procedures.

For different data generating processes, different primitive parameter values, and different sample sizes \(n\), we simulate data and estimate the model many times, then summarize the results (in most cases) by measures such as (empirical) bias, RMSE, size, empirical power functions, or empirical density plots to examine the finite sample behavior of the estimation/inference procedures.
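The generate–estimate–summarize loop can be sketched generically. This toy example (unrelated to the homework model) uses Monte Carlo to check the well-known downward bias of the variance estimator with divisor \(n\):

```r
# Generic Monte Carlo skeleton (illustrative toy example, not the homework model):
# estimate the bias of the variance estimator with divisor n for N(0,1) data.
R <- 1000   # number of replications
n <- 20     # sample size
est <- replicate(R, {
  x <- rnorm(n)              # 1. generate data from the model
  mean((x - mean(x))^2)      # 2. compute the estimate (divisor n, not n - 1)
})
mean(est) - 1                # 3. summarize: empirical bias; theory says -1/n
```

The same three-step pattern (generate, estimate, summarize) carries over directly to the OLS exercise below.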

In this exercise, we will focus on the finite sample estimation accuracy of the OLS estimator in linear regression models. The accuracy can be measured by bias and root mean square error (RMSE).

Generically, bias and root mean square error (RMSE) are calculated by \[\mathrm{bias} = R^{-1}\sum_{r=1}^R \left( \hat{\theta}^{(r)} - \theta_0 \right),\] \[\mathrm{RMSE} = \left(R^{-1}\sum_{r=1}^R \left( \hat{\theta}^{(r)} - \theta_0 \right)^2\right)^{1/2},\] where \(\theta_0\) is the true parameter, \(\hat{\theta}^{(r)}\) is its estimate in replication \(r\), and \(R\) is the number of replications.
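In R these formulas translate directly into vectorized one-liners. Here is a sketch with a made-up true value and a made-up vector of replication estimates:

```r
# Illustrative only: bias and RMSE for a toy vector of estimates.
theta0    <- 0.5                        # true parameter (made up)
theta_hat <- c(0.48, 0.55, 0.51, 0.46)  # estimates across R = 4 replications

bias <- mean(theta_hat - theta0)              # average deviation from theta0
rmse <- sqrt(mean((theta_hat - theta0)^2))    # root of average squared deviation
```

In your simulation, theta_hat will instead be the length-\(R\) vector of estimates saved across replications.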

Model

Consider a linear regression model \[y_i = \alpha + x_{i1}\beta_1 + x_{i2}\beta_2 + u_i\] for \(i = 1, 2, \ldots, n\), where \(n = 100\). \((y_i, x_i)\) are independently and identically distributed (i.i.d.) with \[u_i\sim i.i.d.N(0,1), \quad (x_{i1}, x_{i2})^\prime \sim i.i.d. N\left(\begin{pmatrix}0 \\ 1 \end{pmatrix}, \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix} \right).\] True parameters are \(\alpha = 0.11\), \(\beta = (0.22, 0.33)^\prime\), \(\rho = 0.5\), \(\sigma_1 = 1\), and \(\sigma_2 = 4\).
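One standard way to draw the correlated regressor pairs \((x_{i1}, x_{i2})^\prime\) in base R is via a Cholesky factor of the covariance matrix (a sketch using the parameter values above; MASS::mvrnorm is an alternative if you prefer a packaged function):

```r
# Sketch: draw n i.i.d. bivariate normal rows with mean mu and covariance Sigma.
n   <- 100
rho <- 0.5; s1 <- 1; s2 <- 4
mu    <- c(0, 1)                             # mean vector from the model above
Sigma <- matrix(c(s1^2,      rho*s1*s2,
                  rho*s1*s2, s2^2), 2, 2)    # covariance matrix

Z <- matrix(rnorm(n * 2), n, 2)   # n rows of independent standard normals
X <- Z %*% chol(Sigma)            # each row now has covariance Sigma
X <- sweep(X, 2, mu, "+")         # shift each column by its mean
```

This works because chol() returns the upper-triangular factor \(U\) with \(\Sigma = U^\prime U\), so each transformed row \(z^\prime U\) has covariance \(U^\prime U = \Sigma\).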

Step 1: Data generating function

Please write a function dgp(...) that takes the sample size \(n\) and model parameters as inputs and returns the simulated data \((y_i, x_{i1}, x_{i2})\) for \(i = 1, 2, \ldots, n\).

# TODO: simulate data
dgp <- function(...) {
  # please continue
}

Step 2: Set Up Primitive Parameters

# TODO: setup primitive parameters: sample size, true parameters, etc.

Step 3: Run Simulation

# TODO: Run the simulation (generate data - estimation - save results) for 1000 replications

Step 4: Summarize Results

# TODO: Write a function to calculate bias and RMSE
sum_results <- function(...) {
  # please continue
}

# TODO: Summarize and report the bias and RMSE
# ...

# TODO: plot the empirical density of the estimated coefficient beta_1 across replications

Step 5: Interpret your results

Please interpret your results and discuss the findings.

Step 6: Run simulation with different sample sizes

Let’s investigate how the estimation accuracy of the OLS estimator changes as the sample size increases.

# TODO: Run the simulation with different sample sizes
sample_sizes <- c(100, 200, 500, 1000)

# please continue

What do you observe? Why?

Step 7: Run simulation with misspecified model

Now, we redo the simulation with the same data generating process.

However, when we run the regression, we include only the intercept and the first regressor \(x_{i1}\), omitting the second regressor \(x_{i2}\).

# TODO: Run the simulation with misspecified model
# please continue
# TODO: Check the bias and RMSE for beta_1 with the misspecified model
# please continue

What do you observe? Why?