IV and Dynamic Panel Data Models

Coding Review

Setup: Data and Packages

Key Packages

For Panel IV and Dynamic GMM:

  • plm::plm(): Fixed effects, random effects, and panel IV (2SLS)
  • plm::pgmm(): Arellano-Bond and Blundell-Bond GMM
  • plm::pdata.frame(): Panel-indexed data frame
  • lmtest::coeftest() + sandwich::vcovHC(): Robust standard errors

The Dataset: EmplUK

We use the EmplUK dataset from the plm package, the same dataset used in the original Arellano and Bond (1991) paper. It is an unbalanced panel of 140 UK manufacturing firms, each observed for 7–9 years between 1976 and 1984, with information on employment, wages, capital, and output.

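Variable   Description
firm       Firm identifier
year       Year (1976–1984)
emp        Employment
wage       Real wage
capital    Capital stock
output     Real output

In plm's copy of the data these four variables are stored in levels, not logs; wrap them in log() inside the formula if you want the log specification of the original paper.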
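A minimal sketch of the setup step, assuming plm, lmtest, and sandwich are installed. It loads the packages listed above, pulls in EmplUK, and checks the panel dimensions described in this section.

```r
# Load the packages listed under "Key Packages" (assumes they are installed)
library(plm)
library(lmtest)
library(sandwich)

# EmplUK ships with plm
data("EmplUK", package = "plm")

str(EmplUK)                              # variable names and types

n_firms      <- length(unique(EmplUK$firm))
year_range   <- range(EmplUK$year)
obs_per_firm <- table(EmplUK$firm)       # number of years observed for each firm

n_firms
year_range
range(obs_per_firm)                      # unbalanced: firms differ in years observed
```

The last line is the quickest way to see that the panel is unbalanced, which matters later when counting available instruments.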

The model of interest is a dynamic employment equation. Firms smooth employment over time (persistence), but wages and capital also affect hiring decisions. The fixed effect \(u_i\) captures permanent firm-level differences (management quality, sector, location).


Visualizing the Problem: Why Fixed Effects Fails

Exercise 1: The Bias in Simulation

Before touching the real data, we simulate the bias directly to build intuition. We generate panels under a known AR(1) DGP and compare fixed effects to a consistent estimator across different values of \(T\).

Key result: as \(T\) grows, the Nickell bias shrinks toward zero. At \(T = 3\), the bias is close to the theoretical \(-(1 + 0.7)/2 = -0.85\) for a true \(\alpha = 0.7\).

Visualizing how the bias shrinks with T

The fixed effects estimator converges to the true \(\alpha\) only as \(T \to \infty\). For the short panels typical in economic research (\(T = 3\)–\(6\)), the bias is severe, often large enough to flip the sign of the estimated coefficient. This is not a small-sample curiosity; it is a structural inconsistency that does not vanish with more individuals \(N\).
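The following base-R sketch illustrates the claim with a small Monte Carlo. It is not the exercise's original code: the function names, sample size, burn-in length, and the choice \(\alpha = 0.7\) are assumptions made for illustration. Here n_periods counts usable (current, lagged) pairs per unit, so n_periods = 2 means three observations per unit, the case in which Nickell's (1981) formula reduces to \(-(1+\alpha)/2\).

```r
set.seed(123)

# Simulate y_it = alpha * y_{i,t-1} + u_i + e_it and return the within (FE)
# estimate of alpha. Each unit contributes n_periods + 1 observations.
simulate_fe <- function(n_units, n_periods, alpha, burn_in = 50) {
  total <- burn_in + n_periods + 1
  u <- rnorm(n_units)                               # unit fixed effects
  y <- matrix(0, nrow = n_units, ncol = total)
  y[, 1] <- u / (1 - alpha) + rnorm(n_units)        # start near the unit-specific mean
  for (s in 2:total) {
    y[, s] <- alpha * y[, s - 1] + u + rnorm(n_units)
  }
  y <- y[, (burn_in + 1):total, drop = FALSE]       # discard burn-in periods
  y_cur <- y[, -1, drop = FALSE]                    # y_it
  y_lag <- y[, -ncol(y), drop = FALSE]              # y_{i,t-1}
  y_cur_dm <- y_cur - rowMeans(y_cur)               # within transformation
  y_lag_dm <- y_lag - rowMeans(y_lag)
  sum(y_lag_dm * y_cur_dm) / sum(y_lag_dm^2)
}

# Nickell's large-N bias formula for the within estimator (n_periods >= 2)
nickell_bias <- function(n_periods, alpha) {
  a <- 1 - (1 - alpha^n_periods) / (n_periods * (1 - alpha))
  -(1 + alpha) / (n_periods - 1) * a /
    (1 - 2 * alpha / ((1 - alpha) * (n_periods - 1)) * a)
}

alpha_true <- 0.7
periods    <- c(2, 3, 5, 10, 20)

results <- data.frame(
  n_periods   = periods,
  fe_estimate = sapply(periods, function(p) simulate_fe(5000, p, alpha_true)),
  nickell     = sapply(periods, nickell_bias, alpha = alpha_true)
)
results$simulated_bias <- results$fe_estimate - alpha_true
print(results)
```

Increasing n_units in simulate_fe() tightens the match between simulated_bias and nickell but does not move either toward zero; only a longer panel does.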


Panel Data Setup with plm

Exercise 2: Creating a pdata.frame and Exploring Persistence

Before estimating anything, we need to tell R that the data has a panel structure. The pdata.frame() function indexes the data by individual and time, enabling panel-aware functions.

Note

The index argument takes a length-2 character vector: the individual identifier and the time identifier. For EmplUK, these are "firm" and "year".

p_empl <- pdata.frame(EmplUK, index = c("firm", "year"))
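A short sketch of the "exploring persistence" part of this exercise, under the assumption that a raw correlation between employment and its own first lag is an acceptable first look. The object names emp_lag1 and persistence are illustrative.

```r
library(plm)
data("EmplUK", package = "plm")
p_empl <- pdata.frame(EmplUK, index = c("firm", "year"))

# Panel dimensions: number of firms, periods per firm, balanced or not
dims <- pdim(p_empl)
dims

# plm's lag() method for panel series shifts within each firm, so a firm's
# first year becomes NA instead of borrowing the previous firm's last value.
emp_lag1 <- lag(p_empl$emp, 1)
head(data.frame(emp = as.numeric(p_empl$emp), emp_lag1 = as.numeric(emp_lag1)), 10)

# Raw persistence: correlation between employment and its first lag
persistence <- cor(as.numeric(p_empl$emp), as.numeric(emp_lag1),
                   use = "complete.obs")
persistence
```

A correlation this simple mixes true state dependence with permanent firm differences, which is exactly the identification problem the rest of the section addresses.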

The Three Estimators Side by Side

Exercise 3: OLS, Fixed Effects, and the Bias

We now estimate the dynamic employment equation by pooled OLS and by fixed effects, and compare both with the range in which theory says the true value should lie, so that the bias can be observed directly in real data.

The model is:

\[ \text{emp}_{it} = \alpha \, \text{emp}_{i,t-1} + \beta_1 \text{wage}_{it} + \beta_2 \text{capital}_{it} + u_i + \varepsilon_{it} \]
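One way to estimate this equation by pooled OLS and by fixed effects is sketched below. It uses plm() for both fits so that lag() respects the panel index; the helper alpha_hat() is a hypothetical convenience function, not part of plm.

```r
library(plm)
data("EmplUK", package = "plm")
p_empl <- pdata.frame(EmplUK, index = c("firm", "year"))

# Pooled OLS: ignores u_i, so the lag coefficient picks up firm heterogeneity
ols_fit <- plm(emp ~ lag(emp, 1) + wage + capital,
               data = p_empl, model = "pooling")

# Fixed effects (within): removes u_i but induces the Nickell bias
fe_fit <- plm(emp ~ lag(emp, 1) + wage + capital,
              data = p_empl, model = "within", effect = "individual")

# Hypothetical helper: pull out the coefficient on the lagged dependent variable
alpha_hat <- function(fit) {
  cf <- coef(fit)
  unname(cf[grep("^lag\\(emp", names(cf))[1]])
}

alpha_ols <- alpha_hat(ols_fit)
alpha_fe  <- alpha_hat(fe_fit)
c(pooled_ols = alpha_ols, fixed_effects = alpha_fe)
```

The two numbers are usually read as an upper and a lower bound for a credible estimate of \(\alpha\).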


Arellano-Bond: Difference GMM

Exercise 4: Estimating Arellano-Bond with pgmm()

The pgmm() function implements both Arellano-Bond (difference GMM, transformation = "d") and Blundell-Bond (system GMM, transformation = "ld"). The instrument specification uses the pipe | to separate the model formula from the instrument formula.

Note

The instrument formula lag(emp, 2:3) uses lags 2 and 3 of emp. The | separates the model from the instruments: everything after it specifies the GMM instruments for the lagged dependent variable.

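# Two-step difference GMM (Arellano-Bond).
# Left of |: the model. Right of |: GMM instruments (lags 2 and 3 of emp).
ab_fit <- pgmm(
  emp ~ lag(emp, 1) + wage + capital | lag(emp, 2:3),
  data           = p_empl,
  effect         = "individual",   # firm effects only, no time dummies
  model          = "twosteps",     # two-step weighting matrix
  transformation = "d",            # "d" = first differences
  robust         = TRUE
)
# Coefficient table with robust standard errors, plus the Sargan,
# AR(1)/AR(2), and Wald tests discussed below
summary(ab_fit, robust = TRUE)

Reading the pgmm output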

The summary() output includes several key diagnostics beyond the coefficient table. Here are the actual results for the Arellano-Bond (Difference GMM, lags 2:3) fit:

Sargan test: chisq(12) = 17.86 (p-value = 0.12)

The null hypothesis is that all instruments are exogenous. Here, p = 0.12 > 0.05, so we do not reject the null: the instruments pass the test (no evidence they are invalid).

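Autocorrelation test:

  • AR(1): normal = -1.32 (p-value = 0.19)
  • AR(2): normal = -1.15 (p-value = 0.25)

These tests are run on the differenced residuals. Differencing mechanically induces first-order autocorrelation, so for AR(1) you normally expect to reject. Here the AR(1) test does not reject at the 5% level (p = 0.19), which is weaker evidence of first-order correlation than differencing would lead you to expect. The check that matters for instrument validity is AR(2), where you want to not reject. With p = 0.25 > 0.05, that is the case here: no evidence of problematic second-order autocorrelation.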

Wald test for coefficients: chisq(3) = 51.97 (p-value = 3.0e-11)

The null hypothesis is that all coefficients are zero. Here, p < 0.001, so we reject the null: at least one coefficient is statistically significant.
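The same diagnostics can also be pulled out as objects rather than read off the printed summary. The sketch below uses plm's sargan() and mtest() functions; the fit is repeated so the block runs on its own, and the object names are illustrative. The numbers it prints are not guaranteed to match those quoted above, which depend on the plm version and the exact call.

```r
library(plm)
data("EmplUK", package = "plm")
p_empl <- pdata.frame(EmplUK, index = c("firm", "year"))

ab_fit <- pgmm(
  emp ~ lag(emp, 1) + wage + capital | lag(emp, 2:3),
  data = p_empl, effect = "individual",
  model = "twosteps", transformation = "d"
)

# Sargan test of the overidentifying restrictions
sargan_test <- sargan(ab_fit)

# Arellano-Bond serial correlation tests on the differenced residuals,
# computed with the robust covariance matrix
rob_vcov <- vcovHC(ab_fit)
ar1_test <- mtest(ab_fit, order = 1, vcov = rob_vcov)
ar2_test <- mtest(ab_fit, order = 2, vcov = rob_vcov)

# Collect the pieces used in the decision rules
data.frame(
  test      = c("Sargan", "AR(1)", "AR(2)"),
  statistic = c(sargan_test$statistic, ar1_test$statistic, ar2_test$statistic),
  p_value   = c(sargan_test$p.value, ar1_test$p.value, ar2_test$p.value)
)
```

Having the p-values in a data frame makes it easy to apply the same checks across several specifications.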

Exercise 5: The Many-Instruments Problem

One danger with Arellano-Bond is using too many instruments. Here we compare a restricted instrument set (lags 2–3) against the full set (all available lags) and examine how the Sargan test responds.

ab_limited <- pgmm(
  emp ~ lag(emp, 1) + wage + capital | lag(emp, 2:3),
  data = p_empl, effect = "individual",
  model = "twosteps", transformation = "d", robust = TRUE
)

ab_all <- pgmm(
  emp ~ lag(emp, 1) + wage + capital | lag(emp, 2:99),
  data = p_empl, effect = "individual",
  model = "twosteps", transformation = "d", robust = TRUE
)
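To see how the Sargan test responds, the two fits can be compared directly. This is a sketch rather than the exercise's original comparison code; the fits are repeated so the block runs on its own. The degrees of freedom of the Sargan statistic equal the number of instruments minus the number of estimated coefficients, so they serve as a proxy for the instrument count.

```r
library(plm)
data("EmplUK", package = "plm")
p_empl <- pdata.frame(EmplUK, index = c("firm", "year"))

# Restricted instrument set: lags 2 and 3 only
ab_limited <- pgmm(
  emp ~ lag(emp, 1) + wage + capital | lag(emp, 2:3),
  data = p_empl, effect = "individual",
  model = "twosteps", transformation = "d"
)

# Full instrument set: every available lag from 2 onward
ab_all <- pgmm(
  emp ~ lag(emp, 1) + wage + capital | lag(emp, 2:99),
  data = p_empl, effect = "individual",
  model = "twosteps", transformation = "d"
)

sargan_limited <- sargan(ab_limited)
sargan_all     <- sargan(ab_all)

sargan_table <- data.frame(
  instruments = c("lags 2:3", "lags 2:99"),
  sargan_df   = c(unname(sargan_limited$parameter), unname(sargan_all$parameter)),
  sargan_stat = c(unname(sargan_limited$statistic), unname(sargan_all$statistic)),
  p_value     = c(sargan_limited$p.value, sargan_all$p.value)
)
sargan_table
```

A Sargan p-value that drifts toward 1 as instruments are added is usually read as a symptom of instrument proliferation, not as stronger evidence of validity.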

Blundell-Bond: System GMM

Exercise 6: When AB is Weak -> Blundell-Bond

Blundell-Bond adds a second set of moment conditions using the level equation, instrumented with lagged differences. The only change in pgmm() is transformation = "ld" (levels + differences = system GMM).

Note

For Blundell-Bond (system GMM), use transformation = "ld": “l” for the levels equation, “d” for the differenced equation. For Arellano-Bond (difference GMM only), use transformation = "d".

bb_fit <- pgmm(
  emp ~ lag(emp, 1) + wage + capital | lag(emp, 2:3),
  data           = p_empl,
  effect         = "individual",
  model          = "twosteps",
  transformation = "ld",
  robust         = TRUE
)

Comparing All Estimators

Exercise 7: Building the Comparison Table

This is the key diagnostic exercise: side-by-side comparison of all four estimates of \(\alpha\) with a visualization that connects directly to the theory.
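A sketch of how such a comparison table could be assembled. It produces only the table of estimates and 95% intervals, not the plot, and alpha_row() is a hypothetical helper written for this example. It assumes the coefficient tables returned by summary() for plm and pgmm fits keep the estimate in column 1 and the standard error in column 2.

```r
library(plm)
data("EmplUK", package = "plm")
p_empl <- pdata.frame(EmplUK, index = c("firm", "year"))

ols_fit <- plm(emp ~ lag(emp, 1) + wage + capital,
               data = p_empl, model = "pooling")
fe_fit  <- plm(emp ~ lag(emp, 1) + wage + capital,
               data = p_empl, model = "within", effect = "individual")
ab_fit  <- pgmm(emp ~ lag(emp, 1) + wage + capital | lag(emp, 2:3),
                data = p_empl, effect = "individual",
                model = "twosteps", transformation = "d")
bb_fit  <- pgmm(emp ~ lag(emp, 1) + wage + capital | lag(emp, 2:3),
                data = p_empl, effect = "individual",
                model = "twosteps", transformation = "ld")

# Hypothetical helper: estimate and 95% interval for the lagged-emp coefficient
alpha_row <- function(fit, label) {
  cf  <- summary(fit)$coefficients
  row <- grep("^lag\\(emp", rownames(cf))[1]
  est <- cf[row, 1]
  se  <- cf[row, 2]
  data.frame(estimator = label, alpha = est,
             lower = est - 1.96 * se, upper = est + 1.96 * se)
}

comparison <- rbind(
  alpha_row(ols_fit, "Pooled OLS"),
  alpha_row(fe_fit,  "Fixed Effects"),
  alpha_row(ab_fit,  "Arellano-Bond"),
  alpha_row(bb_fit,  "Blundell-Bond")
)
rownames(comparison) <- NULL
comparison
```

The OLS and FE rows give the two biased bounds; the GMM rows can then be checked against that range.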

The plot shows the four estimates with 95% confidence intervals. The green band marks the range between the two biased bounds (fixed effects below, pooled OLS above). AB and BB should fall within or near this band.

What the plot tells you

Reading the plot from left to right, we see how the estimators behave in real data:

Pooled OLS (red): The OLS estimate of \(\alpha\) is upward biased, as it absorbs the persistent individual effect \(u_i\) into the lagged coefficient. Firms with high permanent employment appear to have high persistence, even if that’s not truly the case.

Fixed Effects (blue): The FE estimate is much lower, reflecting the downward Nickell bias. With \(T \approx 7\)–\(9\) for most firms in this dataset, the bias is moderate but still present.

Arellano-Bond and Blundell-Bond (green): In theory, these GMM estimates should fall between OLS and FE, correcting the bias without absorbing \(u_i\). However, in this real-world example, both GMM estimates are actually above OLS. This can happen due to weak instruments, short panel length, or other data quirks, even when diagnostic tests (like Sargan and AR(2)) do not reject the model. This highlights the importance of not just relying on theory, but also carefully checking diagnostics and interpreting results in context.


Panel IV for Endogenous Regressors

Exercise 8: Fixed Effects 2SLS with plm()

How does panel IV differ from GMM?

Panel IV (2SLS) estimation, as shown here, is used when you suspect a specific regressor (like wage) is endogenous—correlated with the idiosyncratic error term—often due to simultaneity or omitted variables. You provide external instruments (e.g., lagged values of wage) that are assumed to be correlated with the endogenous regressor but not with the error.

In contrast, GMM methods for dynamic panels (like Arellano-Bond and Blundell-Bond) are designed to address the bias that arises when the lagged dependent variable is included as a regressor (dynamic endogeneity). GMM uses internal instruments: lags of the dependent variable itself, exploiting the panel structure to generate valid instruments under certain assumptions. GMM can also handle additional endogenous regressors, but its main innovation is dealing with the dynamic panel bias.

Summary:

  • Use panel IV when you have a specific endogenous regressor and valid external instruments.
  • Use GMM when you need to address dynamic panel bias (lagged dependent variable) and want to use internal instruments (lags of the outcome and/or regressors).

Sometimes a time-varying regressor is correlated with the idiosyncratic error, not just with the fixed effect. The plm() function handles panel IV estimation through a two-part formula, with the instruments listed after |.

Here we treat wages (wage) as potentially endogenous in the employment equation (reverse causality: firms with unexpectedly high employment growth may bid up wages). We use lagged wages as an instrument.

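library(lmtest)  # provides coeftest()

# Fixed effects without instruments: wage treated as exogenous
fe_ols <- plm(emp ~ wage + capital,
              data = p_empl, model = "within", effect = "individual")

# Fixed effects 2SLS: wage instrumented by its first lag; capital instruments itself
fe_iv  <- plm(emp ~ wage + capital | lag(wage, 1) + capital,
              data = p_empl, model = "within", effect = "individual")

# Heteroskedasticity-robust (HC3) standard errors for both fits
coeftest(fe_ols, vcov = vcovHC(fe_ols, type = "HC3"))
coeftest(fe_iv,  vcov = vcovHC(fe_iv,  type = "HC3"))

Interpretation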

If wages are endogenous (correlated with \(\varepsilon_{it}\) in the employment equation), the FE-OLS estimate of the wage coefficient will be biased. The direction depends on the nature of the simultaneity: if positive demand shocks both raise employment and wages, OLS overstates the wage elasticity. If wages are sticky and high-wage firms cut employment, OLS could understate it.

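The lagged wage \(\text{wage}_{i,t-1}\) is a natural instrument because past wages predict current wages (relevant) and were set before the current-period idiosyncratic shock was realized. Exogeneity is the stronger requirement: because the within transformation mixes errors from all periods, lagged wages must be uncorrelated with the idiosyncratic errors in every period, not just the current one. The comparison between FE-OLS and FE-IV tells you how much the endogeneity bias matters in practice.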
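The relevance half of that argument can be checked with a first-stage regression. The sketch below is an illustrative addition, not part of the original exercise: it regresses wage on its first lag and capital with firm fixed effects and reports a robust test on the instrument. It says nothing about exogeneity, which cannot be tested with a single instrument.

```r
library(plm)
library(lmtest)
data("EmplUK", package = "plm")
p_empl <- pdata.frame(EmplUK, index = c("firm", "year"))

# First stage: does lagged wage predict current wage within firms?
first_stage <- plm(wage ~ lag(wage, 1) + capital,
                   data = p_empl, model = "within", effect = "individual")

# Robust (HC3) inference on the first-stage coefficients
first_stage_test <- coeftest(first_stage,
                             vcov = vcovHC(first_stage, type = "HC3"))
first_stage_test

# Coefficient on the instrument
instrument_coef <- unname(coef(first_stage)[grep("^lag\\(wage", names(coef(first_stage)))[1]])
instrument_coef
```

A small or imprecisely estimated coefficient on lag(wage, 1) would point to a weak instrument, in which case the FE-IV estimates above should be read with caution.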


Summary

Estimator          R function                         Key argument          Bias                          When to use
Pooled OLS         lm()                               (none)                Upward (absorbs \(u_i\))      Never for dynamic models
Fixed Effects      plm(..., model="within")           (none)                Nickell (downward, fixed-T)   Never for dynamic models
Arellano-Bond      pgmm(..., transformation="d")      lag(y, 2:k)           Consistent                    Moderate \(\alpha\), larger \(T\)
Blundell-Bond      pgmm(..., transformation="ld")     lag(y, 2:k)           Consistent                    Near-unit-root, small \(T\)
Panel IV (static)  plm(y ~ X | Z, model="within")     instruments after |   Consistent if Z valid         Endogenous time-varying X