Problem Set 5

Handed out: February 23, 2026 | Due: March 4, 2026

Submit on Gradescope

Download Persistence_preferences_rural_Guatemala.dta from Canvas

1. Time Series Theory

(35 points total)

(15 points) Recall from class that the AR(1) process can be written as: \[ Y_t = \alpha_0 \sum_{j=0}^{t-1} \alpha_1^j + \alpha_1^t Y_0 + \sum_{j=0}^{t-1} \alpha_1^j \varepsilon_{t-j} \tag{14.26} \]

Show that if \(\alpha_1 \le -1\), then \(\sum_{j=0}^{\infty} \alpha_1^j = \infty\).

Solution (a)

Recall that a necessary condition for \(\sum_{j=0}^{\infty} a_j\) to converge is that \(a_j \to 0\).

Case 1: \(\alpha_1 = -1\). The terms are \(\alpha_1^j = (-1)^j\), which alternate between \(+1\) and \(-1\) and never converge to zero. The partial sums \(S_t = \sum_{j=0}^{t-1}(-1)^j\) alternate between 1 and 0, so the series diverges.

Case 2: \(\alpha_1 < -1\). Then \(|\alpha_1| > 1\), so \(|\alpha_1^j| = |\alpha_1|^j \to \infty\) as \(j \to \infty\). In particular \(\alpha_1^j \not\to 0\), so the necessary condition for convergence fails and \(\sum_{j=0}^{\infty} \alpha_1^j\) diverges.

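In both cases the series fails to converge. Strictly speaking it does not diverge to \(+\infty\): the partial sums \(\sum_{j=0}^{t-1} \alpha_1^j = (1 - \alpha_1^t)/(1 - \alpha_1)\) oscillate between 1 and 0 when \(\alpha_1 = -1\) and alternate in sign with unboundedly growing magnitude when \(\alpha_1 < -1\), so \(\sum_{j=0}^{\infty} \alpha_1^j = \infty\) should be read as "the sum has no finite limit". Returning to (14.26), for \(\alpha_0 \neq 0\) the term \(\alpha_0 \sum_{j=0}^{t-1} \alpha_1^j\) has no limit as \(t \to \infty\). In addition, the variance of the shock term \(\sum_{j=0}^{t-1} \alpha_1^j \varepsilon_{t-j}\), which is proportional to \(\sum_{j=0}^{t-1} \alpha_1^{2j}\) for uncorrelated homoskedastic shocks, grows without bound. Hence \(Y_t\) has no finite limiting distribution and the AR(1) process is non-stationary.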
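As a quick numerical check of the two cases, the sketch below computes partial sums for \(\alpha_1 = -1\) and \(\alpha_1 = -2\) and compares the latter with the geometric-sum formula \((1 - \alpha_1^t)/(1 - \alpha_1)\). The value \(-2\) and the helper name partial_sums are arbitrary choices for illustration.

```r
# Partial sums S_t = sum_{j=0}^{t-1} alpha1^j for two values with alpha1 <= -1
partial_sums <- function(alpha1, n_terms) cumsum(alpha1^(0:(n_terms - 1)))

s_unit <- partial_sums(-1, 8)  # alpha1 = -1: oscillates between 1 and 0
s_expl <- partial_sums(-2, 6)  # alpha1 = -2: alternating sign, growing magnitude

# Closed form for alpha1 != 1: S_t = (1 - alpha1^t) / (1 - alpha1), t = 1, ..., 6
closed_form <- (1 - (-2)^(1:6)) / (1 - (-2))

print(s_unit)
print(s_expl)
print(all.equal(s_expl, closed_form))
```

Neither sequence settles down, which is all that non-convergence requires; only the second is unbounded.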

(20 points) Recall the AR(2) process, which can be written as \[ \tilde{Y}_t = A \tilde{Y}_{t-1} + \tilde{e}_t \] where \[ A = \begin{pmatrix} \alpha_1 & \alpha_2 \\ 1 & 0 \end{pmatrix}, \qquad \tilde{e}_t = \begin{pmatrix} \alpha_0 + \varepsilon_t \\ 0 \end{pmatrix} \]

Theorem 15.6 states that this process is stationary if and only if all eigenvalues \(\lambda\) of \(A\) satisfy \(|\lambda| < 1\). The eigenvalues solve \[ \det(A - \lambda I) = 0 \implies \lambda^2 - \alpha_1 \lambda - \alpha_2 = 0 \] so \[ \lambda_j = \frac{\alpha_1 \pm \sqrt{\alpha_1^2 + 4\alpha_2}}{2} \]

Show that the AR(2) process is stationary if and only if: \[ \alpha_1 + \alpha_2 < 1 \tag{14.35} \] \[ \alpha_2 - \alpha_1 < 1 \tag{14.36} \] \[ \alpha_2 > -1 \tag{14.37} \]

Solution (b)

The AR(2) is stationary iff \(|\lambda_1| < 1\) and \(|\lambda_2| < 1\), where \(\lambda_1, \lambda_2\) are roots of \(p(\lambda) = \lambda^2 - \alpha_1\lambda - \alpha_2\). Note that \(p(\lambda)\) is an upward-opening parabola in \(\lambda\), and we can write: \[ p(\lambda) = (\lambda - \lambda_1)(\lambda - \lambda_2) \]

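Because \(p\) opens upward, \(p(\lambda) > 0\) whenever \(\lambda\) lies outside the interval between two real roots, so real roots inside \((-1, 1)\) require \(p(1) > 0\) and \(p(-1) > 0\). These two inequalities alone are not sufficient: they also hold when both real roots lie on the same side of \([-1, 1]\), and they hold automatically when the roots are a complex-conjugate pair. The third condition comes from the product of the roots, \(\lambda_1 \lambda_2 = -\alpha_2\), which must satisfy \(|\lambda_1 \lambda_2| < 1\).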

Evaluating at \(\lambda = 1\): \[ p(1) = 1 - \alpha_1 - \alpha_2 > 0 \iff \alpha_1 + \alpha_2 < 1 \tag{14.35} \]

Evaluating at \(\lambda = -1\): \[ p(-1) = 1 + \alpha_1 - \alpha_2 > 0 \iff \alpha_2 - \alpha_1 < 1 \tag{14.36} \]

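The product condition \(|\lambda_1 \lambda_2| = |\alpha_2| < 1\) has two halves. The upper bound \(\alpha_2 < 1\) is already implied by the first two conditions: adding (14.35) and (14.36) gives \(2\alpha_2 < 2\). The lower bound is not implied and must be imposed separately: \[ \lambda_1 \lambda_2 = -\alpha_2 < 1 \iff \alpha_2 > -1 \tag{14.37} \]

To see that the three conditions are sufficient, consider two cases. If the roots are complex conjugates, then \(|\lambda_1|^2 = |\lambda_2|^2 = \lambda_1 \lambda_2 = -\alpha_2\), so (14.37) alone places both inside the unit circle. If the roots are real, \(p(1) > 0\) and \(p(-1) > 0\) mean that neither \(1\) nor \(-1\) lies between the roots, so the roots are both in \((-1, 1)\), both above \(1\), or both below \(-1\); in the last two cases \(\lambda_1 \lambda_2 > 1\), which (14.37) rules out.

The three conditions \(p(1) > 0\), \(p(-1) > 0\), and \(\lambda_1 \lambda_2 < 1\) are therefore necessary and sufficient for both roots to lie inside the unit circle, giving (14.35)–(14.37).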
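The equivalence can also be checked numerically. The sketch below compares the eigenvalue criterion for the companion matrix \(A\) with the three inequalities on randomly drawn \((\alpha_1, \alpha_2)\) pairs; the function names, the seed, and the sampling box are choices made for this illustration.

```r
# Eigenvalue criterion: stationary iff all eigenvalue moduli of A are below 1
stationary_by_eigen <- function(a1, a2) {
  A <- matrix(c(a1, 1, a2, 0), nrow = 2)  # column-major fill: rows (a1, a2) and (1, 0)
  all(Mod(eigen(A, only.values = TRUE)$values) < 1)
}

# Inequality criterion: conditions (14.35), (14.36), (14.37)
stationary_by_triangle <- function(a1, a2) {
  (a1 + a2 < 1) && (a2 - a1 < 1) && (a2 > -1)
}

set.seed(1)
draws <- data.frame(a1 = runif(2000, -3, 3), a2 = runif(2000, -2, 2))
by_eigen    <- mapply(stationary_by_eigen,    draws$a1, draws$a2)
by_triangle <- mapply(stationary_by_triangle, draws$a1, draws$a2)

print(all(by_eigen == by_triangle))  # do the two criteria agree on every draw?
print(mean(by_triangle))             # share of draws inside the stationarity triangle
```

A simulation like this is not a proof, but a disagreement on any draw would flag an error in the derivation.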

2. AR(4) Time Series Estimation

(25 points total)

Download the S&P CoreLogic Case-Shiller U.S. National Home Price Index (series ID: CSUSHPISA) from FRED. The series is monthly beginning in 1987. For this problem, convert to a quarterly series by taking the value from the last month of each quarter (March, June, September, December).

Transform the series by taking first differences of the log index (i.e., compute \(\Delta \log P_t\), the approximate quarterly growth rate of home prices). (7 points)

Solution (a)

Code

library(data.table)  # fread, setorder, shift, month
library(ggplot2)

# Read FRED download (CSV with observation_date and CSUSHPISA columns)
hpi_raw <- fread(here::here("assignment", "data", "CSUSHPISA.csv"))

# Convert to quarterly (last month of each quarter) and first-difference log
hpi <- hpi_raw[month(observation_date) %in% c(3, 6, 9, 12)]
setorder(hpi, observation_date)
hpi[, log_hpi := log(CSUSHPISA)]
hpi[, d_log_hpi := log_hpi - shift(log_hpi, 1, type = "lag")]
hpi <- hpi[!is.na(d_log_hpi)]

ggplot(hpi, aes(x = observation_date, y = d_log_hpi)) +
  geom_line(color = "steelblue") +
  labs(
    title = "Quarterly Log-Difference of Case-Shiller Home Price Index",
    x     = "Date",
    y     = expression(Delta * log(P[t]))
  ) +
  theme_bw()
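To illustrate why \(\Delta \log P_t\) is described as an approximate growth rate, the sketch below compares log differences with exact percentage changes. The index values are made up for the illustration and are not Case-Shiller data.

```r
# Hypothetical index levels, for illustration only (not Case-Shiller data)
p <- c(100, 101, 103, 110)

log_diff   <- diff(log(p))           # Delta log P_t
pct_change <- diff(p) / head(p, -1)  # (P_t - P_{t-1}) / P_{t-1}

print(round(cbind(log_diff, pct_change), 4))
```

The two measures are close for small changes and drift apart as the change grows, with the log difference always the smaller of the two.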

Estimate an AR(4) model. Report using heteroskedasticity-consistent standard errors. (6 points)

Solution (b)

Code

# Create lags
hpi_ar <- copy(hpi)
hpi_ar[, y    := d_log_hpi]
hpi_ar[, y_l1 := shift(y, 1, type = "lag")]
hpi_ar[, y_l2 := shift(y, 2, type = "lag")]
hpi_ar[, y_l3 := shift(y, 3, type = "lag")]
hpi_ar[, y_l4 := shift(y, 4, type = "lag")]
hpi_ar <- hpi_ar[!is.na(y_l4)]

ar4_fit <- lm(y ~ y_l1 + y_l2 + y_l3 + y_l4, data = hpi_ar)

# HC-robust (HC3) standard errors; namespaces made explicit (lmtest, sandwich)
lmtest::coeftest(ar4_fit, vcov = sandwich::vcovHC(ar4_fit, type = "HC3"))


t test of coefficients:

              Estimate Std. Error t value  Pr(>|t|)    
(Intercept)  0.0015292  0.0010987  1.3919    0.1661    
y_l1         0.9328360  0.1284740  7.2609 2.115e-11 ***
y_l2        -0.2288571  0.2014554 -1.1360    0.2578    
y_l3         0.2258756  0.1930617  1.1700    0.2439    
y_l4        -0.0801517  0.1521765 -0.5267    0.5992    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Test the hypothesis that log home prices follow a random walk (with drift) by testing that the four AR coefficients jointly equal zero. (6 points)

Solution (c)

Code

# Wald test with HC3 covariance; namespaces made explicit (car, sandwich)
car::linearHypothesis(
  ar4_fit,
  c("y_l1 = 0", "y_l2 = 0", "y_l3 = 0", "y_l4 = 0"),
  vcov = sandwich::vcovHC(ar4_fit, type = "HC3")
)

Linear hypothesis test

Hypothesis:
y_l1 = 0
y_l2 = 0
y_l3 = 0
y_l4 = 0

Model 1: restricted model
Model 2: y ~ y_l1 + y_l2 + y_l3 + y_l4

Note: Coefficient covariance matrix supplied.

  Res.Df Df      F    Pr(>F)    
1    150                        
2    146  4 56.221 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
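For reference, the F statistic above is the robust Wald statistic divided by the number of restrictions. Write \(\hat{\beta}\) for the five OLS coefficients, \(\hat{V}\) for the HC3 covariance estimate, and \(R\) for the \(4 \times 5\) matrix that selects the four lag coefficients (notation introduced here for illustration): \[ W = (R\hat{\beta})' \left( R \hat{V} R' \right)^{-1} (R\hat{\beta}), \qquad F = \frac{W}{q}, \quad q = 4 \] With \(F = 56.221\), the implied chi-square form is \(W = 4 \times 56.221 \approx 224.9\) on 4 degrees of freedom.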

Interpret the coefficient estimates and test result. (6 points)

Solution (d)

The coefficient on y_l1 (the first lag) is large (0.93), positive, and highly significant (p < 0.001). This indicates strong momentum: quarterly home price growth is highly persistent from one quarter to the next. The coefficients on y_l2, y_l3, and y_l4 are individually insignificant (all p > 0.2), so there is little evidence that longer lags add predictive power once the first lag is included.

The joint Wald test (F = 56.22, p < 2.2e-16) strongly rejects the null hypothesis that all four AR coefficients are zero. Under a random walk with drift in log prices, \(\Delta \log P_t\) would be unpredictable from its own past, so the rejection means log home prices do not follow a random walk: past home price growth helps predict future growth.
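Two summary quantities can be backed out of the point estimates in part (b): the sum of the AR coefficients and the implied long-run mean growth rate. The sketch below hard-codes the reported estimates and also checks the largest eigenvalue modulus of the AR(4) companion matrix. It ignores sampling uncertainty, and the variable names are chosen for this illustration.

```r
# Point estimates copied from the coeftest output in part (b)
intercept <- 0.0015292
ar_coefs  <- c(0.9328360, -0.2288571, 0.2258756, -0.0801517)

# Sum of AR coefficients: a common summary of overall persistence
persistence <- sum(ar_coefs)

# Implied long-run mean of quarterly growth (meaningful only if stationary)
long_run_mean <- intercept / (1 - persistence)

# Companion matrix: AR coefficients in the first row, shifted identity below
companion   <- rbind(ar_coefs, cbind(diag(3), 0))
max_modulus <- max(Mod(eigen(companion, only.values = TRUE)$values))

print(c(persistence = persistence,
        long_run_mean = long_run_mean,
        max_modulus = max_modulus))
```

The coefficients sum to about 0.85, and the implied long-run mean growth is roughly 1% per quarter at these point estimates.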

3. Panel IV Estimation

(40 points total)

Load the rural Guatemala panel dataset (Persistence_preferences_rural_Guatemala.dta). This dataset tracks households across four survey waves: 2019 (pre-COVID, wave 0), 2020, 2021, and 2022. The variable risk measures willingness to take risks on a scale from 0 (not willing) to 10 (fully willing), recorded as risk_0 through risk_3. The variable Tcases records confirmed COVID-19 cases per 10,000 people in the household’s municipality, recorded as Tcases_0 through Tcases_3. Treat risk as a continuous outcome.

You will estimate the panel AR(1) model: \[ R_{it} = \alpha R_{i,t-1} + u_i + \varepsilon_{it} \]

where \(R_{it}\) is risk tolerance for household \(i\) in wave \(t\) and \(u_i\) is a household fixed effect.

Reshape the data to long format and estimate the model using Arellano-Bond two-step GMM with all available lags as instruments and clustered standard errors. Report and briefly interpret the estimated persistence coefficient \(\hat{\alpha}\). (16 points)

Solution (a)

Code

guat <- haven::read_dta(here::here("assignment", "data",
                        "Persistence_preferences_rural_Guatemala.dta"))

Code

library(data.table)  # melt, as.data.table
library(plm)

# Reshape risk and Tcases to long using data.table
risk_long <- melt(
  as.data.table(guat)[, .(SbjNum, risk_0, risk_1, risk_2, risk_3, Tcases_0, Tcases_1, Tcases_2, Tcases_3, A_com_mobrest_pmo_0, A_com_mobrest_pmo_1, A_com_mobrest_pmo_2, A_com_mobrest_pmo_3, hh_Hurri2020)],
  id.vars       = c("SbjNum", "hh_Hurri2020"),
  measure.vars  = list(
    risk   = c("risk_0",   "risk_1",   "risk_2",   "risk_3"),
    Tcases = c("Tcases_0", "Tcases_1", "Tcases_2", "Tcases_3"),
    A_com_mobrest_pmo    = c("A_com_mobrest_pmo_0", "A_com_mobrest_pmo_1", "A_com_mobrest_pmo_2", "A_com_mobrest_pmo_3")
  ),
  variable.name = "wave",
  value.name    = c("risk", "Tcases", "A_com_mobrest_pmo")
)[, wave := as.integer(wave) - 1L
][!is.na(risk)
][order(SbjNum, wave)]

p_risk <- pdata.frame(risk_long, index = c("SbjNum", "wave"))

# Arellano-Bond two-step, all available lags
ab_all <- pgmm(
  risk ~ lag(risk, 1) | lag(risk, 2:99),
  data           = p_risk,
  effect         = "individual",
  model          = "twosteps",
  transformation = "d",
  robust         = TRUE
)

summary(ab_all, robust = TRUE)

Oneway (individual) effect Two-steps model Difference GMM 

Call:
pgmm(formula = risk ~ lag(risk, 1) | lag(risk, 2:99), data = p_risk, 
    effect = "individual", model = "twosteps", transformation = "d", 
    robust = TRUE)

Balanced Panel: n = 1262, T = 4, N = 5048

Number of Observations Used: 2489
Residuals:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
-20.604  -6.362  -1.879  -1.698   3.060  20.604 

Coefficients:
             Estimate Std. Error z-value  Pr(>|z|)    
lag(risk, 1)  1.06040    0.17036  6.2244 4.836e-10 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Sargan test: chisq(2) = 54.07719 (p-value = 1.8084e-12)
Autocorrelation test (1): normal = -8.576952 (p-value = < 2.22e-16)
Wald test for coefficients: chisq(1) = 38.74254 (p-value = 4.8355e-10)

Code

ab_alpha     <- round(coef(ab_all)[["lag(risk, 1)"]], 3)
ab_alpha_se  <- round(summary(ab_all, robust=TRUE)$coefficients["lag(risk, 1)", "Std. Error"], 3)
ab_sargan    <- round(summary(ab_all)$sargan[["statistic"]], 3)

The estimated persistence coefficient is 1.06 (SE = 0.17, p < 0.001). A point estimate above 1 implies explosive dynamics in risk tolerance, which is implausible for a variable bounded between 0 and 10, although the 95% confidence interval (roughly 0.73 to 1.39) also covers values below 1. This is better read as a sign of estimator failure than as a genuine finding. The Sargan test strongly rejects the overidentifying restrictions (\(\chi^2(2)\) = 54.077, p < 0.001), which is evidence against the validity of the AB moment conditions. A further likely problem is weak instruments: when risk is highly persistent, lagged levels carry little information about subsequent differences, which is the classic setting in which AB breaks down. We should not take \(\hat{\alpha}\) at face value here.
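The degrees of freedom of the Sargan test can be checked by counting moment conditions. The following is a sketch under the standard Arellano-Bond assumption that \(\varepsilon_{it}\) is serially uncorrelated. First-differencing removes \(u_i\): \[ \Delta R_{it} = \alpha \, \Delta R_{i,t-1} + \Delta\varepsilon_{it}, \qquad E[R_{is} \, \Delta\varepsilon_{it}] = 0 \ \text{ for } s \le t-2 \] With waves \(t = 0, 1, 2, 3\), the differenced equation is available for \(t = 2\) (instrument \(R_{i0}\)) and \(t = 3\) (instruments \(R_{i0}\) and \(R_{i1}\)). That gives 3 moment conditions for 1 parameter, hence 2 overidentifying restrictions, matching the \(\chi^2(2)\) in the output.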

Re-estimate using Blundell-Bond two-step GMM with all available lags as instruments. Report the estimated \(\hat{\alpha}\) and compare it to the Arellano-Bond estimate. What does the difference, if any, suggest about the AB instruments? (12 points)

Solution (b)

Code

# Blundell-Bond two-step, all available lags
bb_all <- pgmm(
  risk ~ lag(risk, 1) | lag(risk, 2:99),
  data           = p_risk,
  effect         = "individual",
  model          = "twosteps",
  transformation = "ld",
  robust         = TRUE
)

summary(bb_all, robust = TRUE)

Oneway (individual) effect Two-steps model System GMM 

Call:
pgmm(formula = risk ~ lag(risk, 1) | lag(risk, 2:99), data = p_risk, 
    effect = "individual", model = "twosteps", transformation = "ld", 
    robust = TRUE)

Balanced Panel: n = 1262, T = 4, N = 5048

Number of Observations Used: 6256
Residuals:
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
-15.8468  -2.1694   1.4919   0.9122   4.1532  15.8468 

Coefficients:
             Estimate Std. Error z-value  Pr(>|z|)    
lag(risk, 1)  0.58468    0.02165  27.006 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Sargan test: chisq(4) = 184.889 (p-value = < 2.22e-16)
Autocorrelation test (1): normal = -7.623322 (p-value = 2.4723e-14)
Autocorrelation test (2): normal = NaN (p-value = NA)
Wald test for coefficients: chisq(1) = 729.3529 (p-value = < 2.22e-16)

Code

bb_alpha     <- round(coef(bb_all)[["lag(risk, 1)"]], 3)
bb_sargan    <- round(summary(bb_all)$sargan[["statistic"]], 3)

The BB estimate of 0.585 is substantially lower than the AB estimate of 1.06, far more precise (SE = 0.022 versus 0.170), and more plausible, indicating moderate persistence in risk tolerance across waves. The large divergence between the two estimators is itself informative: it is consistent with the AB instruments being weak, with BB’s additional levels moment conditions supplying identifying information that lagged levels alone do not.

However, the BB Sargan test also strongly rejects (\(\chi^2(4)\) = 184.889, p < 0.001), casting doubt on BB’s own moment conditions, specifically the assumption of mean stationarity. This assumption requires that deviations of \(R_{i0}\) from the long-run mean are uncorrelated with the fixed effect \(u_i\), which is plausibly violated here: the 2020 hurricanes and COVID pandemic hit mid-panel and may have permanently shifted risk tolerance for affected households rather than producing transitory deviations. With \(n = 1262\) households the test can detect even modest violations, so the rejection signals misspecification without revealing how large the resulting bias is. Neither estimator is fully credible in isolation. Part (c) attempts to address this by controlling for the shocks directly.
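The Sargan degrees of freedom for system GMM can be verified by counting moment conditions, again as a sketch under the standard Blundell-Bond assumptions. The levels equation uses lagged differences as instruments: \[ E[\Delta R_{i,t-1} \, (u_i + \varepsilon_{it})] = 0, \qquad t = 2, 3 \] These 2 conditions are added to the 3 from the differenced equation (\(R_{i0}\) for \(t = 2\); \(R_{i0}\) and \(R_{i1}\) for \(t = 3\)), for 5 moment conditions and 1 parameter. This matches the 4 degrees of freedom reported in the output. Mean stationarity is exactly what makes \(\Delta R_{i,t-1}\) uncorrelated with \(u_i\), which is why a violation shows up in these extra conditions.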

Re-estimate using Blundell-Bond two-step GMM, adding Tcases (wave-varying municipal COVID case rate), hh_Hurri2020 (an indicator for whether the household was affected by the 2020 hurricanes), and A_com_mobrest_pmo (percentage of months the community was under mobility restriction) as controls. Why is BB more appropriate than AB here for identifying the effect of hh_Hurri2020? How does controlling for these shocks affect the estimated persistence coefficient \(\hat{\alpha}\) relative to part (b)? (12 points)

Solution (c)

Code

bb_covars <- pgmm(
  risk ~ lag(risk, 1) + Tcases + hh_Hurri2020 + A_com_mobrest_pmo | lag(risk, 2:99),
  data           = p_risk,
  effect         = "individual",
  model          = "twosteps",
  transformation = "ld",
  robust         = TRUE
)

summary(bb_covars, robust = TRUE)

Oneway (individual) effect Two-steps model System GMM 

Call:
pgmm(formula = risk ~ lag(risk, 1) + Tcases + hh_Hurri2020 + 
    A_com_mobrest_pmo | lag(risk, 2:99), data = p_risk, effect = "individual", 
    model = "twosteps", transformation = "ld", robust = TRUE)

Balanced Panel: n = 1262, T = 4, N = 5048

Number of Observations Used: 6309
Residuals:
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
-14.3424  -2.8288   0.4553   0.1467   3.0724  17.0656 

Coefficients:
                    Estimate Std. Error z-value  Pr(>|z|)    
lag(risk, 1)      0.42147428 0.02415460 17.4490 < 2.2e-16 ***
Tcases            0.00249882 0.00033819  7.3887 1.483e-13 ***
hh_Hurri2020      0.96518216 0.11988793  8.0507 8.232e-16 ***
A_com_mobrest_pmo 0.06935580 0.00270331 25.6559 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Sargan test: chisq(7) = 252.2463 (p-value = < 2.22e-16)
Autocorrelation test (1): normal = -7.41554 (p-value = 1.2113e-13)
Autocorrelation test (2): normal = NaN (p-value = NA)
Wald test for coefficients: chisq(4) = 10178.89 (p-value = < 2.22e-16)

Code

bb_cov_alpha        <- round(coef(bb_covars)[["lag(risk, 1)"]], 3)
bb_cov_hurri        <- round(coef(bb_covars)[["hh_Hurri2020"]], 3)
bb_cov_tcases       <- round(coef(bb_covars)[["Tcases"]], 3)
bb_cov_mobrest      <- round(coef(bb_covars)[["A_com_mobrest_pmo"]], 3)
bb_cov_sargan       <- round(summary(bb_covars)$sargan[["statistic"]], 3)

The persistence coefficient falls from 0.585 in part (b) to 0.421 once shocks are controlled for, a meaningful reduction suggesting that part of the apparent persistence in part (b) was the sustained nature of the pandemic being absorbed into \(\hat{\alpha}\) rather than genuine state dependence in risk preferences.

BB is more appropriate than AB for hh_Hurri2020 because the indicator is time-invariant: first-differencing removes it, so the AB differenced equation cannot identify its coefficient, whereas BB’s levels equation retains it (\(\hat{\beta}\) = 0.965, p < 0.001). Identification through the levels equation relies on hurricane exposure being uncorrelated with the household effect \(u_i\), an assumption the differenced equation does not need. The positive coefficient indicates that hurricane-affected households report higher risk tolerance on average, conditional on lagged risk tolerance and the other controls. Both Tcases (\(\hat{\beta}\) = 0.002, p < 0.001) and A_com_mobrest_pmo (\(\hat{\beta}\) = 0.069, p < 0.001) also enter positively: higher municipal COVID caseloads and longer mobility restrictions are associated with greater willingness to take risks. This is counterintuitive at first glance. Possible explanations include wealth effects, reduced opportunity costs, or fatalistic responses to sustained hardship, but these are conjectures that the estimates here cannot distinguish.
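To make the differencing argument concrete, write \(H_i\) for hh_Hurri2020 and \(\beta_H\) for its coefficient (symbols introduced here for illustration), and suppress the time-varying controls. In levels and in first differences: \[ R_{it} = \alpha R_{i,t-1} + \beta_H H_i + u_i + \varepsilon_{it} \] \[ \Delta R_{it} = \alpha \, \Delta R_{i,t-1} + \beta_H (H_i - H_i) + \Delta\varepsilon_{it} = \alpha \, \Delta R_{i,t-1} + \Delta\varepsilon_{it} \] Since \(\beta_H\) drops out of the differenced equation, only the levels moment conditions carry information about it.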

The Sargan test continues to reject (\(\chi^2(7)\) = 252.246, p < 0.001). This is unsurprising given the same mean stationarity concerns from part (b), and the addition of covariates does not resolve the underlying violation. Results should be interpreted with caution. The part (c) specification is the most defensible of the three, but all estimates are sensitive to the GMM moment conditions that the data appear to violate.