Problem Set 3

Handed out: February 9, 2026 | Due: February 18, 2026

Submit on Gradescope

Download CHJ2004.dta from Canvas

Download DDK2011.dta from Canvas

1. Tobit and CLAD Estimation

(30 points total)

Load the CHJ2004 dataset. The variables tinkind and income are household transfers received in-kind and household income, respectively. Divide both variables by 1000 to standardize. Create the regressor \(\text{Dincome} = (\text{income} - 1) \times \mathbb{1}\{\text{income} > 1\}\).

  (a) Estimate a linear regression of tinkind on income and Dincome. Interpret the results. (5 points)
Code
# Load packages and data
library(dplyr)   # provides %>% and mutate()

data <- haven::read_dta(here::here("assignment", "data", "CHJ2004.dta"))
data$tinkind <- as.numeric(data$tinkind)
data$income <- as.numeric(data$income)

# Standardize variables
data <- data %>%
  mutate(
    tinkind = tinkind / 1000,
    income = income / 1000,      # now in thousands: median ~33
    Dincome = (income - 1) * as.numeric(income > 1)  # knot at $1,000
  )

# Linear regression
ols_model <- lm(tinkind ~ income + Dincome, data = data)
summary(ols_model)

Call:
lm(formula = tinkind ~ income + Dincome, data = data)

Residuals:
    Min      1Q  Median      3Q     Max 
 -4.647  -1.547  -1.166  -0.136 124.881 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   2.7024     0.6614   4.086 4.43e-05 ***
income       -1.5349     0.6674  -2.300   0.0215 *  
Dincome       1.5474     0.6675   2.318   0.0205 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.971 on 8681 degrees of freedom
Multiple R-squared:  0.01003,   Adjusted R-squared:  0.009801 
F-statistic: 43.97 on 2 and 8681 DF,  p-value: < 2.2e-16

The OLS regression estimates the linear relationship between in-kind transfers and income. The coefficient on income represents the marginal effect of income on transfers for income ≤ 1 (i.e., ≤$1000). The coefficient on Dincome captures the additional marginal effect for income > 1, so the total marginal effect for higher incomes is the sum of both coefficients. If transfers are targeted to lower-income households, we expect a negative relationship. However, OLS ignores the censoring at zero, which may lead to biased estimates.
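
As a sanity check on this piecewise-linear reading, the implied slopes on either side of the knot can be recovered from the point estimates above (coefficients transcribed from the OLS output; a small illustrative sketch in Python):

```python
# Implied slopes of the piecewise-linear specification, using the OLS
# point estimates reported above (transcribed; illustrative only)
b_income = -1.5349    # marginal effect of income for income <= 1
b_dincome = 1.5474    # additional slope once income exceeds the knot

slope_low = b_income                # slope for income <= 1 (<= $1,000)
slope_high = b_income + b_dincome   # slope for income > 1

print(f"slope below knot: {slope_low:.4f}")
print(f"slope above knot: {slope_high:.4f}")
```

The near-zero slope above the knot is consistent with transfers declining in income only among the poorest households.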

  (b) Calculate the percentage of censored observations (the percentage for which tinkind = 0). Do you expect censoring bias to be a problem in this example? (5 points)
Code
# Calculate percentage of censored observations
censored_pct <- mean(data$tinkind == 0) * 100

# Visualize the distribution
library(ggplot2)
ggplot(data, aes(x = tinkind)) +
  geom_histogram(bins = 50, fill = "steelblue", alpha = 0.7) +
  labs(
    title = "Distribution of In-Kind Transfers",
    x = "In-Kind Transfers (thousands)",
    y = "Frequency"
  ) +
  theme_minimal()

In this sample, 25.37% of observations are censored at zero, a substantial fraction (censoring is typically a serious concern once it exceeds roughly 20-30% of the sample). The censored observations represent a qualitatively different outcome (no transfers received) that standard OLS treats as just another value of the dependent variable. This can bias the coefficient estimates, particularly if censoring is related to the covariates, and the bias grows with the censored fraction.

  (c) Suppose you try to fix the problem by omitting the censored observations. Estimate the regression on the subsample of observations for which tinkind > 0. (5 points)
Code
# Estimate on uncensored subsample only
data_uncensored <- data %>% filter(tinkind > 0)

ols_uncensored <- lm(tinkind ~ income + Dincome, data = data_uncensored)
summary(ols_uncensored)

Call:
lm(formula = tinkind ~ income + Dincome, data = data_uncensored)

Residuals:
    Min      1Q  Median      3Q     Max 
 -7.261  -1.826  -1.291  -0.065 124.248 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   3.5602     0.8578   4.150 3.36e-05 ***
income       -2.1383     0.8658  -2.470   0.0135 *  
Dincome       2.1592     0.8660   2.493   0.0127 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 5.606 on 6478 degrees of freedom
Multiple R-squared:  0.02043,   Adjusted R-squared:  0.02013 
F-statistic: 67.57 on 2 and 6478 DF,  p-value: < 2.2e-16

Restricting the sample to positive transfers creates selection bias. In this sample, the full sample size is 8684 and the uncensored sample size is 6481. We’re only analyzing households that received transfers, which is a non-random subset of the population. This can produce inconsistent estimates because the conditional expectation of the error term is non-zero for the selected sample. The estimates from this truncated sample cannot be generalized to the full population and may give misleading inferences about the income-transfer relationship.
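
The mechanism can be illustrated with a small simulation (a hypothetical data-generating process sketched in Python, not the CHJ2004 data): dropping the censored observations and running OLS on the remainder attenuates the slope relative to the truth.

```python
import random
random.seed(42)

# True model: y* = x + e with slope 1; y* is censored at 0, and we
# (incorrectly) keep only the uncensored observations
n = 20000
xs, ys = [], []
for _ in range(n):
    x = random.gauss(0, 1)
    y_star = x + random.gauss(0, 1)
    if y_star > 0:          # truncation: drop censored observations
        xs.append(x)
        ys.append(y_star)

# OLS slope on the truncated sample
mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
print(f"true slope: 1.00, truncated-sample OLS slope: {slope:.2f}")
```

Because E[e | x + e > 0] falls with x, the selected errors are negatively correlated with the regressor, flattening the estimated slope, exactly the non-zero conditional expectation described above.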

  (d) Estimate a Tobit regression of tinkind on income and Dincome. (5 points)
Code
# Tobit model (left-censored at 0); tobit() is in the AER package
library(AER)
tobit_model <- tobit(tinkind ~ income + Dincome, 
                     left = 0, 
                     data = data)
summary(tobit_model)

Call:
tobit(formula = tinkind ~ income + Dincome, left = 0, data = data)

Observations:
         Total  Left-censored     Uncensored Right-censored 
          8684           2203           6481              0 

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept)  1.730699   0.828436   2.089   0.0367 *  
income      -1.564160   0.835993  -1.871   0.0613 .  
Dincome      1.573073   0.836165   1.881   0.0599 .  
Log(scale)   1.789990   0.008987 199.176   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Scale: 5.989 

Gaussian distribution
Number of Newton-Raphson Iterations: 2 
Log-likelihood: -2.242e+04 on 4 Df
Wald-statistic: 29.38 on 2 Df, p-value: 4.1679e-07 

The Tobit model explicitly accounts for the censoring at zero by modeling both the probability of receiving transfers (extensive margin) and the amount received conditional on receiving them (intensive margin). The coefficients represent the effect of covariates on the latent variable (desired transfers), not the observed transfers. To interpret effects on observed transfers, we need to compute marginal effects, which account for the censoring probability. Tobit is consistent under the assumption of normally distributed and homoskedastic errors.
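
To make the last point concrete, the marginal effect on observed transfers is \(\partial E[\text{tinkind} \mid x]/\partial \text{income} = \Phi(x\beta/\sigma)\,\beta\); here is a sketch evaluating it at one income level, with coefficients and scale transcribed from the Tobit output above (Python, for illustration only):

```python
from math import erf, sqrt

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# Tobit point estimates transcribed from the output above
b0, b_inc, b_dinc, sigma = 1.7307, -1.5642, 1.5731, 5.989

income = 0.5                        # $500, below the knot, so Dincome = 0
xb = b0 + b_inc * income            # latent index
me = norm_cdf(xb / sigma) * b_inc   # marginal effect on observed transfers
print(f"latent slope: {b_inc:.3f}, marginal effect at income 0.5: {me:.3f}")
```

The effect on observed transfers is scaled down by the probability of being uncensored, so it is smaller in magnitude than the latent coefficient.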

  (e) Estimate the same regression using CLAD. (5 points)
Code
# CLAD (Censored Least Absolute Deviations): Powell's censored median
# regression, via crq() from the quantreg package
library(quantreg)
clad_model <- crq(
  Curv(tinkind, rep(0, nrow(data))) ~ income + Dincome,
  tau = 0.5,
  data = data,
  method = "Powell"
)
summary(clad_model)

tau: [1] 0.5

Coefficients:
            Value    Lower Bd Upper Bd Std Error T Value  Pr(>|t|)
(Intercept)  0.96400  0.08120  1.84680  0.45035   2.14054  0.03234
income      -0.56248 -1.45264  0.32768  0.45411  -1.23864  0.21551
Dincome      0.56257 -0.32770  1.45284  0.45416   1.23870  0.21549

CLAD (Censored Least Absolute Deviations) is a robust alternative to Tobit that estimates the conditional median instead of the conditional mean. Unlike Tobit, CLAD does not assume normality or homoskedasticity and is robust to outliers and distributional misspecification. However, it is computationally more intensive and may be less efficient than Tobit if the Tobit assumptions hold.

  (f) Interpret and explain the differences between your results in (a)-(e). (5 points)
Code
# Compare coefficients across models
# Extract coefficients for comparison
coef_comparison <- data.frame(
  Model = c("OLS (Full)", "OLS (Uncensored)", "Tobit", "CLAD"),
  Income_Coef = c(
    coef(ols_model)["income"],
    coef(ols_uncensored)["income"],
    coef(tobit_model)["income"],
    clad_model$coefficients[2]
  ),
  Dincome_Coef = c(
    coef(ols_model)["Dincome"],
    coef(ols_uncensored)["Dincome"],
    coef(tobit_model)["Dincome"],
    clad_model$coefficients[3]
  )
)

print(coef_comparison)
             Model Income_Coef Dincome_Coef
1       OLS (Full)  -1.5349414     1.547391
2 OLS (Uncensored)  -2.1383392     2.159177
3            Tobit  -1.5641601     1.573073
4             CLAD  -0.5624775     0.562572

The CLAD coefficients are substantially smaller in magnitude than OLS/Tobit estimates, suggesting the median transfer is much less sensitive to income than the mean transfer. This indicates a right-skewed distribution where a few households receive large transfers, pulling up the mean but not the median.

  1. OLS (Full Sample): Treats censored observations as actual zeros, leading to biased estimates when censoring is present. The estimates are inconsistent because the conditional expectation of errors is non-zero at the censoring point.

  2. OLS (Uncensored Only): Creates selection bias by analyzing only the selected subsample that received transfers. This produces inconsistent estimates that don’t represent the population relationship.

  3. Tobit: Accounts for censoring under normality and homoskedasticity assumptions. Provides consistent estimates of the latent variable coefficients if assumptions hold. Most efficient when correctly specified but sensitive to misspecification.

  4. CLAD: Robust to distributional assumptions, estimates conditional median effects. More reliable when Tobit assumptions are violated but potentially less efficient when they hold.

2. Censored Outcomes

(30 points total)

Load the DDK2011 dataset. Create a variable testscore which is totalscore standardized to have mean zero and variance one. The variable tracking is a dummy indicating that the students were assigned to different classes based on initial test scores. The variable percentile is the student’s percentile in the initial distribution. For the following regressions, cluster by school.

  (a) Estimate a linear regression of testscore on tracking, percentile, and \(\text{percentile}^2\). Interpret the results. (8 points)
Code
# Load packages and the data
library(haven)     # read_dta()
library(lmtest)    # coeftest()
library(sandwich)  # vcovCL()

data2 <- read_dta(here::here("assignment", "data", "DDK2011.dta"))

# Standardize test score
data2 <- data2 %>%
  mutate(
    testscore = (totalscore - mean(totalscore, na.rm = TRUE)) / 
                 sd(totalscore, na.rm = TRUE),
    percentile2 = percentile^2
  )

# Linear regression with clustered standard errors
reg_a <- lm(testscore ~ tracking + percentile + percentile2, data = data2)

# Cluster standard errors by school
coeftest(reg_a, vcov = vcovCL, cluster = ~ schoolid, data = data2)

t test of coefficients:

               Estimate  Std. Error  t value  Pr(>|t|)    
(Intercept) -8.3886e-01  6.0695e-02 -13.8210 < 2.2e-16 ***
tracking     1.6196e-01  7.6135e-02   2.1273 0.0334414 *  
percentile   1.0368e-02  1.7724e-03   5.8499 5.212e-09 ***
percentile2  6.4021e-05  1.7122e-05   3.7391 0.0001866 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

This regression estimates the effect of tracking on standardized test scores, controlling for students’ initial percentile rank (allowing for a nonlinear relationship via the quadratic term).

  • The tracking coefficient measures the average treatment effect of being in a tracked classroom.
  • The percentile and percentile2 coefficients capture the baseline relationship between initial ability and outcomes (potentially U-shaped or inverted U-shaped).
  • Clustering by school accounts for within-school correlation in outcomes, providing robust standard errors for inference.

If tracking is positive and significant, it suggests tracking improves test scores on average. The percentile terms control for mean reversion and differential effects by initial ability.
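
The quadratic implies a marginal effect of \(d(\text{testscore})/d(\text{percentile}) = b_1 + 2 b_2 \cdot \text{percentile}\), which can be evaluated at a few ranks (coefficients transcribed from the clustered output above; illustrative Python):

```python
# Marginal effect of initial percentile implied by the quadratic:
# d(testscore)/d(percentile) = b1 + 2 * b2 * percentile
b1, b2 = 1.0368e-2, 6.4021e-5   # transcribed from the output above

for p in (10, 50, 90):
    me = b1 + 2 * b2 * p
    print(f"percentile {p:>2}: marginal effect = {me:.4f} sd per rank")
```

Both coefficients are positive here, so the estimated percentile profile is increasing and convex over the observed range rather than U-shaped.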

  (b) Suppose the scores were censored from below. Create a variable ctest which is testscore censored at 0. Estimate a linear regression of ctest on tracking, percentile, and \(\text{percentile}^2\). How would you interpret these results if you were unaware that the dependent variable was censored? (8 points)
Code
# Create censored test score
data2 <- data2 %>%
  mutate(ctest = pmax(testscore, 0))

# Check censoring
censored_count <- sum(data2$testscore < 0)
censored_pct2 <- mean(data2$testscore < 0) * 100
cat(sprintf("Number of censored observations: %d (%.2f%%)\n", 
            censored_count, censored_pct2))
Number of censored observations: 3343 (57.69%)
Code
# Linear regression on censored outcome
reg_b <- lm(ctest ~ tracking + percentile + percentile2, data = data2)
coeftest(reg_b, vcov = vcovCL, cluster = ~ schoolid, data = data2)

t test of coefficients:

               Estimate  Std. Error t value Pr(>|t|)    
(Intercept)  8.5641e-02  3.3674e-02  2.5433  0.01101 *  
tracking     8.8785e-02  4.9456e-02  1.7952  0.07267 .  
percentile  -1.9514e-03  1.1308e-03 -1.7256  0.08447 .  
percentile2  1.0806e-04  1.2589e-05  8.5833  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

In this sample, the number of censored observations is 3343 (57.69%). If you were unaware that the dependent variable was censored, you might interpret the coefficients as usual linear regression estimates of the effects on test scores. However, this would be misleading because:

  1. Attenuation Bias: The censoring compresses the distribution from below, attenuating the estimated effects toward zero. The true effects are likely larger in magnitude.
  2. Nonlinearity Ignored: OLS assumes a linear relationship, but censoring introduces a nonlinearity (kink at zero) that OLS cannot capture.
  3. Heterogeneous Effects: Censoring affects different groups differently. Students near the censoring point experience larger measurement distortions, so the estimated average effects don’t represent any particular subgroup well.
  4. Incorrect Standard Errors: The standard errors don’t account for the censoring, leading to invalid inference.
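
The attenuation in point 1 can be seen in a small simulation (hypothetical data sketched in Python, not DDK2011): regressing a censored outcome on the regressor flattens the slope, to about half the true value in this symmetric design.

```python
import random
random.seed(0)

# True model: y* = x + e with slope 1; we observe y = max(y*, 0)
n = 20000
xs = [random.gauss(0, 1) for _ in range(n)]
ys = [max(x + random.gauss(0, 1), 0.0) for x in xs]

# OLS slope of the censored outcome on x
mx, my = sum(xs) / n, sum(ys) / n
slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
print(f"true slope: 1.00, censored-outcome OLS slope: {slope:.2f}")
```

With x and e standard normal and censoring at zero, dE[y|x]/dx = Φ(x), which averages to 1/2, so the population OLS slope is 0.5, half the latent coefficient.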

  (c) Suppose you try to fix the problem by omitting the censored observations. Estimate the regression on the subsample of observations for which ctest is positive. (7 points)
Code
# Estimate on uncensored subsample only
data2_uncensored <- data2 %>% filter(ctest > 0)

reg_c <- lm(ctest ~ tracking + percentile + percentile2, data = data2_uncensored)
coeftest(reg_c, vcov = vcovCL, cluster = ~ schoolid, data = data2_uncensored)

t test of coefficients:

               Estimate  Std. Error t value  Pr(>|t|)    
(Intercept)  7.5590e-01  9.4651e-02  7.9861 2.193e-15 ***
tracking     7.5412e-02  6.1186e-02  1.2325   0.21788    
percentile  -6.1390e-03  2.9928e-03 -2.0513   0.04035 *  
percentile2  1.1357e-04  2.4721e-05  4.5940 4.587e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Dropping censored observations creates selection bias. In this sample, the full sample size is 5795, the uncensored sample size is 2452, and the number of observations dropped is 3343. We’re now estimating the effect of tracking only among students who scored above zero (higher-performing students). This produces several problems:

  1. Non-representative Sample: The estimates apply only to the selected subsample, not the full population of students.
  2. Sample Selection Bias: If tracking affects who gets censored, we’re conditioning on a post-treatment outcome, which violates the exogeneity assumption.
  3. Truncation Bias: Even if selection is exogenous, truncating the sample changes the conditional distribution being estimated, leading to inconsistent estimates of the population parameters.
  4. Loss of Information: Discarding observations reduces statistical power and efficiency.

The estimates from this truncated regression cannot be interpreted as causal effects for the full population.

  (d) Interpret and explain the differences between your results in (a), (b), and (c). (7 points)
Code
# Create comparison table
comparison <- data.frame(
  Model = c("(a) Uncensored", "(b) OLS Censored", "(c) OLS Truncated"),
  Tracking = c(
    coef(reg_a)["tracking"],
    coef(reg_b)["tracking"],
    coef(reg_c)["tracking"]
  ),
  SE_Tracking = c(
    sqrt(vcovCL(reg_a, cluster = ~ schoolid)["tracking", "tracking"]),
    sqrt(vcovCL(reg_b, cluster = ~ schoolid)["tracking", "tracking"]),
    sqrt(vcovCL(reg_c, cluster = ~ schoolid)["tracking", "tracking"])
  ),
  N = c(nrow(data2), nrow(data2), nrow(data2_uncensored))
)

print(comparison)
              Model   Tracking SE_Tracking    N
1    (a) Uncensored 0.16196137  0.07613480 5795
2  (b) OLS Censored 0.08878514  0.04945559 5795
3 (c) OLS Truncated 0.07541222  0.06118566 2452
Code
# Visualization comparing distributions
library(tidyr)   # pivot_longer()
plot_data <- data2 %>%
  dplyr::select(testscore, ctest) %>%
  pivot_longer(everything(), names_to = "Variable", values_to = "Score")

ggplot(plot_data, aes(x = Score, fill = Variable)) +
  geom_histogram(alpha = 0.5, bins = 50, position = "identity") +
  geom_vline(xintercept = 0, linetype = "dashed", color = "red") +
  labs(
    title = "Distribution of Test Scores: Censored vs Uncensored",
    x = "Test Score",
    y = "Frequency"
  ) +
  theme_minimal()

Model (a) - True Uncensored Data:

  • Estimates the true relationship between tracking and test scores
  • No bias from censoring or selection
  • Provides the most reliable estimates for population inference
  • Serves as the benchmark for comparison

Model (b) - OLS on Censored Data:

  • Attenuation Bias: Coefficients are biased toward zero because censoring compresses the distribution
  • The pile-up of observations at zero (censoring point) violates OLS assumptions
  • Standard errors are incorrect because they don’t account for the discrete mass at zero
  • Underestimates the true treatment effect magnitude
  • Despite bias, uses full sample information

Model (c) - OLS on Truncated Data:

  • Selection Bias: Estimates apply only to higher-scoring students (ctest > 0)
  • Cannot be generalized to the full population
  • May over- or under-estimate effects depending on whether tracking differentially affects low vs. high scorers
  • Loss of precision due to smaller sample size
  • Conditioning on outcome-related variable (post-treatment) creates endogeneity

3. Silverman’s Optimal Bandwidth Rule of Thumb

(40 points total)

Recall that the asymptotically optimal bandwidth for kernel density estimation, obtained by minimizing the asymptotic integrated mean squared error (AIMSE), is

\[ h_0 = \left( \frac{R(K)}{\mu_2(K)^2 R(f'')} \right)^{1/5} n^{-1/5}, \tag{17.11} \]

where \[ R(K) = \int K(u)^2 \, du, \quad \mu_2(K) = \int u^2 K(u)\, du, \quad R(f'') = \int (f''(x))^2 \, dx. \]

The quantity \(R(f'')\) is unknown in practice.

Assume that the data are generated from a normal distribution \(X \sim \mathcal{N}(0, \sigma^2)\). Derive Silverman’s rule-of-thumb bandwidth and show that the optimal bandwidth can be written as \(h_r = \sigma C_K n^{-1/5}\), where the constant \(C_K\) depends only on the kernel function. For the Gaussian kernel, show that \(C_K \approx 1.059\).

Hint: Start from equation (17.11). Compute \(R(f'')\) explicitly under the assumption that \(f\) is the normal density, and substitute the result into the optimal bandwidth formula. You may use the following standard Gaussian integrals for \(n\) even:

\[\int_{-\infty}^{\infty} z^n e^{-z^2} \, dz = \frac{(n-1)!!}{2^{n/2}} \sqrt{\pi}\]

where \((n-1)!! = (n-1)(n-3)\cdots 3 \cdot 1\) is the double factorial.

Specifically: \(\int_{-\infty}^{\infty} e^{-z^2} \, dz = \sqrt{\pi}\), \(\int_{-\infty}^{\infty} z^2 e^{-z^2} \, dz = \frac{\sqrt{\pi}}{2}\), \(\int_{-\infty}^{\infty} z^4 e^{-z^2} \, dz = \frac{3\sqrt{\pi}}{4}\).

Overview

Silverman’s rule-of-thumb provides a simple, data-driven bandwidth selection method for kernel density estimation by assuming the underlying distribution is normal. This allows us to compute the unknown quantity \(R(f'')\) analytically and derive a closed-form bandwidth that depends only on the sample standard deviation and sample size.

Step 1: Write the Normal Density and Its Derivatives

Let \(X \sim \mathcal{N}(0, \sigma^2)\), so the probability density function is

\[ f(x) = \frac{1}{\sigma \sqrt{2\pi}} \exp\!\left(-\frac{x^2}{2\sigma^2}\right). \]

First Derivative:

\[ f'(x) = \frac{d}{dx}\left[\frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{x^2}{2\sigma^2}\right)\right] = -\frac{x}{\sigma^3\sqrt{2\pi}} \exp\!\left(-\frac{x^2}{2\sigma^2}\right). \]

Second Derivative:

Using the product rule on \(f'(x) = -\frac{x}{\sigma^3\sqrt{2\pi}} e^{-x^2/(2\sigma^2)}\):

\[ f''(x) = \frac{d}{dx}\left[-\frac{x}{\sigma^3\sqrt{2\pi}} e^{-x^2/(2\sigma^2)}\right] \]

\[ = -\frac{1}{\sigma^3\sqrt{2\pi}} e^{-x^2/(2\sigma^2)} + \frac{x^2}{\sigma^5\sqrt{2\pi}} e^{-x^2/(2\sigma^2)} \]

\[ = \frac{1}{\sigma^5\sqrt{2\pi}} (x^2 - \sigma^2) \exp\!\left(-\frac{x^2}{2\sigma^2}\right). \]

Step 2: Compute \(R(f'') = \int_{-\infty}^{\infty} [f''(x)]^2 \, dx\)

Squaring \(f''(x)\):

\[ [f''(x)]^2 = \frac{1}{\sigma^{10} \cdot 2\pi} (x^2 - \sigma^2)^2 \exp\!\left(-\frac{x^2}{\sigma^2}\right). \]

Therefore,

\[ R(f'') = \int_{-\infty}^{\infty} [f''(x)]^2 \, dx = \frac{1}{2\pi \sigma^{10}} \int_{-\infty}^{\infty} (x^2 - \sigma^2)^2 \exp\!\left(-\frac{x^2}{\sigma^2}\right) dx. \]

Change of Variables: Let \(z = \frac{x}{\sigma}\), so \(x = \sigma z\) and \(dx = \sigma \, dz\):

\[ R(f'') = \frac{1}{2\pi\sigma^{10}} \int_{-\infty}^{\infty} (\sigma^2 z^2 - \sigma^2)^2 e^{-z^2} \sigma \, dz \]

\[ = \frac{1}{2\pi\sigma^{10}} \cdot \sigma^4 \cdot \sigma \int_{-\infty}^{\infty} (z^2 - 1)^2 e^{-z^2} dz \]

\[ = \frac{1}{2\pi\sigma^{5}} \int_{-\infty}^{\infty} (z^2 - 1)^2 e^{-z^2} dz. \]

Expand \((z^2 - 1)^2\):

\[ (z^2 - 1)^2 = z^4 - 2z^2 + 1. \]

So we need:

\[ \int_{-\infty}^{\infty} (z^4 - 2z^2 + 1) e^{-z^2} dz = \int_{-\infty}^{\infty} z^4 e^{-z^2} dz - 2\int_{-\infty}^{\infty} z^2 e^{-z^2} dz + \int_{-\infty}^{\infty} e^{-z^2} dz. \]

Using Standard Gaussian Integrals:

Recall that for \(n\) even, \[ \int_{-\infty}^{\infty} z^n e^{-z^2} dz = \frac{(n-1)!!}{2^{n/2}} \sqrt{\pi}, \] where \((n-1)!! = (n-1)(n-3)\cdots 3 \cdot 1\).

  • For \(n=0\): \(\int e^{-z^2} dz = \sqrt{\pi}\)
  • For \(n=2\): \(\int z^2 e^{-z^2} dz = \frac{1}{2}\sqrt{\pi}\)
  • For \(n=4\): \(\int z^4 e^{-z^2} dz = \frac{3}{4}\sqrt{\pi}\)

Therefore:

\[ \int_{-\infty}^{\infty} (z^2-1)^2 e^{-z^2} dz = \frac{3\sqrt{\pi}}{4} - 2 \cdot \frac{\sqrt{\pi}}{2} + \sqrt{\pi} = \frac{3\sqrt{\pi}}{4} - \sqrt{\pi} + \sqrt{\pi} = \frac{3\sqrt{\pi}}{4}. \]

Substitute back:

\[ R(f'') = \frac{1}{2\pi\sigma^5} \cdot \frac{3\sqrt{\pi}}{4} = \frac{3}{8\sqrt{\pi}\sigma^5}. \]
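
This closed form can be checked numerically (taking σ = 1; a small Python sketch using midpoint-rule quadrature on a wide grid):

```python
from math import exp, pi, sqrt

# Second derivative of the standard normal density (sigma = 1)
def f2(x):
    return (x * x - 1.0) * exp(-x * x / 2.0) / sqrt(2.0 * pi)

# Midpoint-rule approximation of R(f'') = integral of f''(x)^2 dx
h, lo = 1e-3, -10.0
n = int(20.0 / h)
approx = sum(f2(lo + (i + 0.5) * h) ** 2 for i in range(n)) * h

exact = 3.0 / (8.0 * sqrt(pi))   # the closed form derived above
print(f"numeric: {approx:.6f}   closed form: {exact:.6f}")
```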

Step 3: Substitute into Optimal Bandwidth Formula

From equation (17.11):

\[ h_0 = \left(\frac{R(K)}{\mu_2(K)^2 R(f'')}\right)^{1/5} n^{-1/5}. \]

Substituting \(R(f'') = \frac{3}{8\sqrt{\pi}\sigma^5}\):

\[ h_0 = \left(\frac{R(K)}{\mu_2(K)^2} \cdot \frac{8\sqrt{\pi}\sigma^5}{3}\right)^{1/5} n^{-1/5} \]

\[ = \left(\frac{8\sqrt{\pi} R(K)}{3\mu_2(K)^2}\right)^{1/5} \sigma n^{-1/5}. \]

Define the constant:

\[ C_K = \left(\frac{8\sqrt{\pi} R(K)}{3\mu_2(K)^2}\right)^{1/5}. \]

Thus, Silverman’s rule-of-thumb bandwidth is:

\[ h_r = \sigma C_K n^{-1/5} \]

Step 4: Evaluate \(C_K\) for the Gaussian Kernel

For the Gaussian kernel:

\[ K(u) = \frac{1}{\sqrt{2\pi}} \exp\!\left(-\frac{u^2}{2}\right), \]

we need to compute \(R(K)\) and \(\mu_2(K)\).

Compute \(R(K) = \int K(u)^2 \, du\):

\[ K(u)^2 = \frac{1}{2\pi} \exp(-u^2). \]

\[ R(K) = \int_{-\infty}^{\infty} \frac{1}{2\pi} e^{-u^2} du = \frac{1}{2\pi} \cdot \sqrt{\pi} = \frac{1}{2\sqrt{\pi}}. \]

Compute \(\mu_2(K) = \int u^2 K(u) \, du\):

\[ \mu_2(K) = \int_{-\infty}^{\infty} u^2 \cdot \frac{1}{\sqrt{2\pi}} e^{-u^2/2} du. \]

This is the second moment of a standard normal distribution, so:

\[ \mu_2(K) = 1. \]

Substitute into \(C_K\):

\[ C_K = \left(\frac{8\sqrt{\pi} \cdot \frac{1}{2\sqrt{\pi}}}{3 \cdot 1^2}\right)^{1/5} = \left(\frac{8\sqrt{\pi}}{2\sqrt{\pi} \cdot 3}\right)^{1/5} = \left(\frac{4}{3}\right)^{1/5}. \]

Numerical evaluation:

\[ \left(\frac{4}{3}\right)^{1/5} = (1.3333...)^{0.2} \approx 1.0593. \]

Therefore:

\[ C_K \approx 1.059 \]

and Silverman’s rule-of-thumb bandwidth for the Gaussian kernel is:

\[ h_r = 1.059 \, \sigma \, n^{-1/5}. \]
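
As a final check, \(R(K)\), \(\mu_2(K)\), and \(C_K\) for the Gaussian kernel can be verified numerically (a small Python sketch using midpoint quadrature):

```python
from math import exp, pi, sqrt

# Gaussian kernel
def K(u):
    return exp(-u * u / 2.0) / sqrt(2.0 * pi)

# Midpoint quadrature for R(K) = int K(u)^2 du and mu2(K) = int u^2 K(u) du
h, lo = 1e-3, -10.0
RK = mu2 = 0.0
for i in range(int(20.0 / h)):
    u = lo + (i + 0.5) * h
    RK += K(u) ** 2 * h
    mu2 += u * u * K(u) * h

CK = (8.0 * sqrt(pi) * RK / (3.0 * mu2 ** 2)) ** 0.2
print(f"R(K)   = {RK:.6f}  (closed form 1/(2*sqrt(pi)) = {1 / (2 * sqrt(pi)):.6f})")
print(f"mu2(K) = {mu2:.6f}")
print(f"C_K    = {CK:.4f}")
```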