Problem Set 1

Handed out: January 26, 2026 | Due: February 4, 2026

Submit on Gradescope

Download grogger.dta from Canvas

1. Maximum Likelihood

(40 points total)

Let \(X\) be distributed as \(N(\mu,\sigma^2)\), and suppose we observe an i.i.d. sample \(x_1,\dots,x_n\). The unknown parameters are \(\mu\) and \(\sigma^2\).

  a. Find the log-likelihood function \(\ell_n (\mu,\sigma^2)\). (8 points)

The probability density function for a normal distribution is: \[f(x|\mu,\sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)\]

The log-likelihood for a single observation is: \[\log f(x_i|\mu,\sigma^2) = -\frac{1}{2}\log(2\pi) - \frac{1}{2}\log(\sigma^2) - \frac{(x_i-\mu)^2}{2\sigma^2}\]

For \(n\) observations, the log-likelihood function is: \[\ell_n(\mu,\sigma^2) = \sum_{i=1}^n \log f(x_i|\mu,\sigma^2) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log(\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^n(x_i-\mu)^2\]

  b. Take the first order condition with respect to \(\mu\), and show that the solution for \(\hat{\mu}\) does not depend on \(\sigma^2\). (10 points)

Taking the derivative with respect to \(\mu\): \[\frac{\partial \ell_n}{\partial \mu} = \frac{1}{2\sigma^2} \cdot 2\sum_{i=1}^n(x_i-\mu) = \frac{1}{\sigma^2}\sum_{i=1}^n(x_i-\mu)\]

Setting the first order condition equal to zero: \[\frac{1}{\sigma^2}\sum_{i=1}^n(x_i-\mu) = 0\]

\[\sum_{i=1}^n x_i - n\mu = 0\]

\[\hat{\mu} = \frac{1}{n}\sum_{i=1}^n x_i = \bar{x}\]

Note that \(\sigma^2\) cancels out, so \(\hat{\mu}\) does not depend on \(\sigma^2\).
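As a quick numerical sanity check (a sketch, not part of the assignment; the seed and sample below are purely illustrative), maximizing the log-likelihood over \(\mu\) at two different fixed values of \(\sigma^2\) returns the same maximizer, the sample mean:

```r
# Illustrative data (not the assignment's data)
set.seed(1)
x <- rnorm(50, mean = 3, sd = 2)

# Log-likelihood as a function of mu, holding sigma^2 fixed
ll_mu <- function(mu, sigma2) {
  -length(x)/2 * log(2 * pi * sigma2) - sum((x - mu)^2) / (2 * sigma2)
}

# Maximize over mu at two different values of sigma^2
mu_hat_s1 <- optimize(ll_mu, c(-10, 10), sigma2 = 1, maximum = TRUE, tol = 1e-10)$maximum
mu_hat_s9 <- optimize(ll_mu, c(-10, 10), sigma2 = 9, maximum = TRUE, tol = 1e-10)$maximum

c(mu_hat_s1, mu_hat_s9, mean(x))  # all three agree
```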

  c. Define the concentrated log-likelihood function \(\ell_n (\hat{\mu},\sigma^2)\). The concentrated log-likelihood substitutes the MLE \(\hat{\mu} = \bar{x}\) from part (b) into the original log-likelihood, treating it as a function of \(\sigma^2\) only. Take the first order condition with respect to \(\sigma^2\), and find the MLE \(\hat{\sigma}^2\) for \(\sigma^2\). (12 points)

From part (b), we found that \(\hat{\mu} = \bar{x}\). We now substitute this into the log-likelihood from part (a), treating \(\hat{\mu}\) as given (no longer a parameter to estimate).

The concentrated log-likelihood is: \[\ell_n(\hat{\mu},\sigma^2) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log(\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^n(x_i-\bar{x})^2\]

Taking the derivative with respect to \(\sigma^2\): \[\frac{\partial \ell_n}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2(\sigma^2)^2}\sum_{i=1}^n(x_i-\bar{x})^2\]

Setting equal to zero: \[-\frac{n}{2\sigma^2} + \frac{1}{2(\sigma^2)^2}\sum_{i=1}^n(x_i-\bar{x})^2 = 0\]

\[\frac{1}{(\sigma^2)^2}\sum_{i=1}^n(x_i-\bar{x})^2 = \frac{n}{\sigma^2}\]

\[\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^n(x_i-\bar{x})^2\]

This is the MLE for \(\sigma^2\). By concentrating out \(\mu\) first, we simplified the optimization to a single-parameter problem.
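The closed form can also be verified numerically (a sketch with an illustrative seed and sample, not the assignment's data): maximizing the concentrated log-likelihood recovers the \(n\)-denominator variance.

```r
# Illustrative data
set.seed(2)
x <- rnorm(200, mean = 0, sd = 3)
xbar <- mean(x)

# Concentrated log-likelihood: mu has been replaced by the sample mean
ll_conc <- function(s2) {
  -length(x)/2 * log(2 * pi * s2) - sum((x - xbar)^2) / (2 * s2)
}

s2_num     <- optimize(ll_conc, c(0.01, 100), maximum = TRUE, tol = 1e-10)$maximum
s2_formula <- mean((x - xbar)^2)  # n in the denominator, not n - 1

c(s2_num, s2_formula)  # agree up to optimizer tolerance
```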

  d. Generate 1000 observations from N(\(\mu = 10, \sigma^2 = 4\)). (10 points)
  i. Write an R function loglik_normal(theta, y) that computes the log-likelihood, where theta = c(mu, sigma2).

  ii. Use optim() to estimate \(\mu\) and \(\sigma^2\). Use starting values theta_init = c(0, 1) and set.seed(8213).

  iii. Compare your MLE estimates to:

    • The true parameters (μ = 10, σ² = 4)
    • The sample mean \(\bar{y}\) and sample variance \(s^2 = \frac{1}{n-1}\sum_{i=1}^n(y_i - \bar{y})^2\)

    Present your results in a table.

Code
# Set seed for reproducibility
set.seed(8213)

# Generate data
n <- 1000
mu_true <- 10
sigma2_true <- 4
y <- rnorm(n, mean = mu_true, sd = sqrt(sigma2_true))

# i. Log-likelihood function
loglik_normal <- function(theta, y) {
  mu <- theta[1]
  sigma2 <- theta[2]
  if (sigma2 <= 0) return(-Inf)
  n <- length(y)
  ll <- -n/2 * log(2*pi) - n/2 * log(sigma2) - 
        sum((y - mu)^2) / (2*sigma2)
  return(ll)
}

# ii. Estimate using optim
theta_init <- c(0, 1)
result <- optim(
  par = theta_init,
  fn = loglik_normal,
  y = y,
  control = list(fnscale = -1),
  method = "L-BFGS-B",
  lower = c(-Inf, 0.01)
)

mu_hat <- result$par[1]
sigma2_hat <- result$par[2]

# iii. Compare estimates
sample_mean <- mean(y)
sample_var_mle <- sum((y - mean(y))^2) / n

comparison <- data.frame(
  Parameter = c("μ", "μ", "σ²", "σ²"),
  Method = c("True value", "MLE / Sample mean", 
             "True value", "MLE (n denominator)"),
  Estimate = c(mu_true, mu_hat, sigma2_true, sigma2_hat)
)

knitr::kable(comparison, digits = 4, 
             caption = "Comparison of MLE Estimates")
Comparison of MLE Estimates

|Parameter |Method              | Estimate|
|:---------|:-------------------|--------:|
|μ         |True value          |  10.0000|
|μ         |MLE / Sample mean   |  10.0274|
|σ²        |True value          |   4.0000|
|σ²        |MLE (n denominator) |   4.1313|

The MLE of \(\mu\) coincides with the sample mean exactly. The MLE of \(\sigma^2\) uses an \(n\) denominator, so the sample variance with the \(n-1\) denominator is larger by a factor of \(n/(n-1)\), roughly \(s^2 \approx 4.135\) here; at \(n = 1000\) the two are essentially indistinguishable, and both estimates are close to the true parameter values.

2. Binary Choice Logistic Distribution

(20 points total)

For the logistic distribution \(\Lambda(x) = (1+\exp(-x))^{-1}\), verify that

  a. \(\frac{d}{dx} \Lambda(x) = \Lambda(x)(1-\Lambda(x))\). (5 points)

Let \(\Lambda(x) = (1+e^{-x})^{-1}\). Using the chain rule:

\[\frac{d}{dx}\Lambda(x) = -(1+e^{-x})^{-2} \cdot (-e^{-x}) = \frac{e^{-x}}{(1+e^{-x})^2}\]

Now note that: \[1 - \Lambda(x) = 1 - \frac{1}{1+e^{-x}} = \frac{1+e^{-x}-1}{1+e^{-x}} = \frac{e^{-x}}{1+e^{-x}}\]

Therefore: \[\Lambda(x)(1-\Lambda(x)) = \frac{1}{1+e^{-x}} \cdot \frac{e^{-x}}{1+e^{-x}} = \frac{e^{-x}}{(1+e^{-x})^2} = \frac{d}{dx}\Lambda(x)\]
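The identity can be spot-checked numerically: R's built-in plogis() is exactly \(\Lambda\), and a central finite difference approximates the derivative (the evaluation points below are arbitrary).

```r
# Check Lambda'(x) = Lambda(x)(1 - Lambda(x)) at a few points
x <- c(-2, -0.5, 0, 1, 3)
h <- 1e-6
num_deriv <- (plogis(x + h) - plogis(x - h)) / (2 * h)  # central difference
analytic  <- plogis(x) * (1 - plogis(x))
max(abs(num_deriv - analytic))  # tiny, as the identity predicts
```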

  b. \(h_{\text{logit}}(x) = \frac{d}{dx}\log\Lambda(x) = 1-\Lambda(x)\). (5 points)

Using the chain rule and the result from part (a): \[\frac{d}{dx}\log\Lambda(x) = \frac{1}{\Lambda(x)} \cdot \frac{d\Lambda(x)}{dx} = \frac{1}{\Lambda(x)} \cdot \Lambda(x)(1-\Lambda(x)) = 1-\Lambda(x)\]

  c. \(H_{\text{logit}}(x) = -\frac{d^2}{dx^2}\log\Lambda(x) = \Lambda(x)(1-\Lambda(x))\). (5 points)

From part (b), we have \(\frac{d}{dx}\log\Lambda(x) = 1-\Lambda(x)\).

Taking the derivative again: \[\frac{d^2}{dx^2}\log\Lambda(x) = \frac{d}{dx}[1-\Lambda(x)] = -\frac{d\Lambda(x)}{dx}\]

From part (a), we know \(\frac{d\Lambda(x)}{dx} = \Lambda(x)(1-\Lambda(x))\).

Therefore: \[H_{\text{logit}}(x) = -\frac{d^2}{dx^2}\log\Lambda(x) = -[-\Lambda(x)(1-\Lambda(x))] = \Lambda(x)(1-\Lambda(x))\]

  d. \(| H_{\text{logit}}(x) | \le 1\). (5 points)

From part (c), \(H_{\text{logit}}(x) = \Lambda(x)(1-\Lambda(x))\).

Since \(\Lambda(x) = \frac{1}{1+e^{-x}}\), we have \(0 < \Lambda(x) < 1\) for all \(x\).

Therefore, \(0 < 1-\Lambda(x) < 1\) as well.

The product \(\Lambda(x)(1-\Lambda(x))\) is maximized when \(\Lambda(x) = 1-\Lambda(x) = 0.5\), which gives: \[H_{\text{logit}}(x) \le 0.5 \times 0.5 = 0.25 < 1\]

Since \(H_{\text{logit}}(x) > 0\) for all \(x\) (as both factors are positive), we have: \[0 < H_{\text{logit}}(x) \le 0.25 < 1\]

Therefore, \(|H_{\text{logit}}(x)| \le 1\).
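A short numerical sketch confirms the bound (the grid below is illustrative): the product peaks at 0.25 at \(x = 0\) and never approaches 1.

```r
# Trace Lambda(x)(1 - Lambda(x)) over a grid
x <- seq(-10, 10, by = 0.01)
H <- plogis(x) * (1 - plogis(x))
max(H)           # 0.25
x[which.max(H)]  # 0
```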

3. Binary Choice Estimation

(40 points total)

The data for Question 3 come from Grogger, J. (1991), “Certainty vs. Severity of Punishment,” Economic Inquiry 29, 297-309. The dataset grogger.dta is available on Canvas.

Dataset description:

  • Unit of observation: Individual men
  • Sample: Men arrested at least once prior to 1986
  • Time period: 1986 arrests and prior criminal history
  • Key variables:
    • narr86: Number of arrests in 1986
    • pcnv: Proportion of prior arrests resulting in conviction (probability of conviction)
    • avgsen: Average sentence length (months) from prior convictions
    • tottime: Total time spent in prison since age 18 (months)
    • ptime86: Months spent in prison during 1986
    • inc86: Legal income in 1986 (hundreds of dollars)
    • black: =1 if black
    • hispan: =1 if Hispanic
    • born60: =1 if born in 1960 or later
Code
# Packages used in Question 3 (dplyr for %>%/mutate; sandwich, car,
# and stargazer are used in the parts below)
library(dplyr)
library(sandwich)
library(car)
library(stargazer)

# Load data
crime <- haven::read_dta(here::here("assignment", "data", "grogger.dta"))

# Create binary arrest variable
crime <- crime %>%
  mutate(arr86 = ifelse(narr86 > 0, 1, 0))
  a. Define a binary variable, \(arr86\), equal to unity if a man was arrested at least once during 1986, and zero otherwise. Estimate an LPM relating \(arr86\) to \(pcnv\), \(avgsen\), \(tottime\), \(ptime86\), \(inc86\), \(black\), \(hispan\), and \(born60\). Report the usual and heteroskedasticity-robust standard errors. What is the estimated effect on the probability of arrest if \(pcnv\) goes from .25 to .75? (8 points)
Code
# Estimate LPM
lpm <- lm(arr86 ~ pcnv + avgsen + tottime + ptime86 + inc86 + 
          black + hispan + born60, data = crime)

# Get robust standard errors
robust_se <- sqrt(diag(vcovHC(lpm, type = "HC1")))

# Create comparison table
stargazer(lpm, lpm,
          se = list(NULL, robust_se),
          type = "text",
          column.labels = c("Standard SE", "Robust SE"),
          model.names = FALSE,
          title = "Linear Probability Model Results")

Linear Probability Model Results
============================================================
                                    Dependent variable:     
                                ----------------------------
                                           arr86            
                                  Standard SE    Robust SE  
                                      (1)           (2)     
------------------------------------------------------------
pcnv                               -0.154***     -0.154***  
                                    (0.021)       (0.019)   
                                                            
avgsen                               0.004         0.004    
                                    (0.006)       (0.006)   
                                                            
tottime                             -0.002         -0.002   
                                    (0.005)       (0.004)   
                                                            
ptime86                            -0.022***     -0.022***  
                                    (0.004)       (0.003)   
                                                            
inc86                              -0.001***     -0.001***  
                                   (0.0001)       (0.0001)  
                                                            
black                              0.162***       0.162***  
                                    (0.024)       (0.026)   
                                                            
hispan                             0.089***       0.089***  
                                    (0.021)       (0.021)   
                                                            
born60                               0.003         0.003    
                                    (0.017)       (0.017)   
                                                            
Constant                           0.361***       0.361***  
                                    (0.016)       (0.017)   
                                                            
------------------------------------------------------------
Observations                         2,725         2,725    
R2                                   0.082         0.082    
Adjusted R2                          0.080         0.080    
Residual Std. Error (df = 2716)      0.429         0.429    
F Statistic (df = 8; 2716)         30.485***     30.485***  
============================================================
Note:                            *p<0.1; **p<0.05; ***p<0.01

The estimated effect of increasing pcnv from 0.25 to 0.75 is the coefficient on pcnv times the change: \(-0.154 \times 0.50 \approx -0.0772\). That is, raising the probability of conviction by 0.50 is associated with a 7.72 percentage point decrease in the probability of arrest in 1986. The robust standard errors are close to the conventional ones (slightly larger for some coefficients, such as black, and slightly smaller for others, such as pcnv and ptime86), so heteroskedasticity does not materially affect inference here.
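Because the fitted model is linear, the number above follows directly from the coefficient on pcnv. A minimal arithmetic sketch, using the rounded estimate from the table (the -0.0772 in the text comes from the unrounded coef(lpm)["pcnv"]):

```r
# In an LPM the effect of a change in a regressor is its coefficient
# times the change; the other covariates do not enter
beta_pcnv <- -0.154                 # rounded estimate from the table above
effect <- beta_pcnv * (0.75 - 0.25)
effect                              # -0.077, i.e. -7.7 percentage points
```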

  b. Test the joint significance of \(avgsen\) and \(tottime\), using a nonrobust and robust test. (6 points)
Code
# Nonrobust F-test
f_test <- linearHypothesis(lpm, c("avgsen = 0", "tottime = 0"))
print(f_test)
Linear hypothesis test

Hypothesis:
avgsen = 0
tottime = 0

Model 1: restricted model
Model 2: arr86 ~ pcnv + avgsen + tottime + ptime86 + inc86 + black + hispan + 
    born60

  Res.Df    RSS Df Sum of Sq      F Pr(>F)
1   2718 500.91                           
2   2716 500.84  2   0.06606 0.1791  0.836
Code
# Robust F-test
f_test_robust <- linearHypothesis(lpm, c("avgsen = 0", "tottime = 0"),
                                  vcov = vcovHC(lpm, type = "HC1"))
print(f_test_robust)
Linear hypothesis test

Hypothesis:
avgsen = 0
tottime = 0

Model 1: restricted model
Model 2: arr86 ~ pcnv + avgsen + tottime + ptime86 + inc86 + black + hispan + 
    born60

Note: Coefficient covariance matrix supplied.

  Res.Df Df      F Pr(>F)
1   2718                 
2   2716  2 0.1839  0.832

The nonrobust F-test gives an F-statistic of 0.179 with a p-value of 0.836. The robust F-test gives an F-statistic of 0.184 with a p-value of 0.832. Both tests fail to reject the null hypothesis that the coefficients on avgsen and tottime are jointly zero at the 5% level. Indeed, with p-values above 0.80, these variables appear to contribute virtually nothing to explaining arrest probability once the other regressors are included.
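For reference, the nonrobust statistic can be reconstructed by hand from the printed ANOVA output; since the printed RSS values are rounded, this reproduces the reported 0.179 only approximately.

```r
# F = ((RSS_restricted - RSS_unrestricted) / q) / (RSS_unrestricted / df)
rss_r <- 500.91   # restricted model,   df = 2718
rss_u <- 500.84   # unrestricted model, df = 2716
q <- 2            # number of restrictions
f_stat <- ((rss_r - rss_u) / q) / (rss_u / 2716)
f_stat  # roughly 0.19; differs from the printed 0.179 only through rounding of the RSS values
```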

  c. Now estimate the model by probit. At the average values of \(avgsen\), \(tottime\), \(inc86\), and \(ptime86\) in the sample, and with \(black=1\), \(hispan=0\), and \(born60=1\), what is the estimated effect on the probability of arrest if \(pcnv\) goes from .25 to .75? Compare this result with the answer from part a. (8 points)
Code
# Estimate probit model
probit <- glm(arr86 ~ pcnv + avgsen + tottime + ptime86 + inc86 + 
              black + hispan + born60, 
              data = crime, family = binomial(link = "probit"))

# Create comparison table
stargazer(lpm, probit,
          se = list(robust_se, NULL),
          type = "text",
          column.labels = c("LPM", "Probit Model"),
          model.names = FALSE,
          title = "Model Results")

Model Results
=========================================================
                             Dependent variable:         
                    -------------------------------------
                                    arr86                
                              LPM            Probit Model
                              (1)                (2)     
---------------------------------------------------------
pcnv                       -0.154***          -0.553***  
                            (0.019)            (0.072)   
                                                         
avgsen                       0.004              0.013    
                            (0.006)            (0.021)   
                                                         
tottime                      -0.002             -0.008   
                            (0.004)            (0.017)   
                                                         
ptime86                    -0.022***          -0.081***  
                            (0.003)            (0.017)   
                                                         
inc86                      -0.001***          -0.005***  
                            (0.0001)           (0.0005)  
                                                         
black                       0.162***           0.467***  
                            (0.026)            (0.072)   
                                                         
hispan                      0.089***           0.291***  
                            (0.021)            (0.065)   
                                                         
born60                       0.003              0.011    
                            (0.017)            (0.056)   
                                                         
Constant                    0.361***          -0.314***  
                            (0.017)            (0.051)   
                                                         
---------------------------------------------------------
Observations                 2,725              2,725    
R2                           0.082                       
Adjusted R2                  0.080                       
Log Likelihood                                -1,483.641 
Akaike Inf. Crit.                             2,985.281  
Residual Std. Error    0.429 (df = 2716)                 
F Statistic         30.485*** (df = 8; 2716)             
=========================================================
Note:                         *p<0.1; **p<0.05; ***p<0.01
Code
# Calculate averages for continuous variables
avg_avgsen <- mean(crime$avgsen, na.rm = TRUE)
avg_tottime <- mean(crime$tottime, na.rm = TRUE)
avg_inc86 <- mean(crime$inc86, na.rm = TRUE)
avg_ptime86 <- mean(crime$ptime86, na.rm = TRUE)

# Create data for prediction at pcnv = 0.25
newdata_25 <- data.frame(
  pcnv = 0.25, avgsen = avg_avgsen, tottime = avg_tottime,
  ptime86 = avg_ptime86, inc86 = avg_inc86,
  black = 1, hispan = 0, born60 = 1
)

# Create data for prediction at pcnv = 0.75
newdata_75 <- data.frame(
  pcnv = 0.75, avgsen = avg_avgsen, tottime = avg_tottime,
  ptime86 = avg_ptime86, inc86 = avg_inc86,
  black = 1, hispan = 0, born60 = 1
)

# Predict probabilities
prob_25 <- predict(probit, newdata = newdata_25, type = "response")
prob_75 <- predict(probit, newdata = newdata_75, type = "response")

# Calculate effect
probit_effect <- prob_75 - prob_25

At the specified covariate values, the probit model predicts an arrest probability of 0.3979 when pcnv = 0.25 and 0.2962 when pcnv = 0.75, an effect of -0.1017 (a 10.17 percentage point decrease). The LPM estimate was a 7.72 percentage point decrease. The probit effect is larger in magnitude because the probit marginal effect depends on where the covariates place the linear index: at these values the predicted probability lies in a range where the normal density is relatively high, so the response to pcnv is steep, whereas the LPM imposes the same slope everywhere.
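To see where the probit numbers come from, note that the predicted probability is a normal CDF of the linear index, so the effect of pcnv is a difference of CDF values rather than a constant slope. A back-of-the-envelope sketch using only the reported probability at pcnv = .25 and the probit coefficient on pcnv (both rounded, so the result is approximate):

```r
# Recover the linear index implied by the reported P(arrest) at pcnv = 0.25,
# shift it by beta_pcnv times the change in pcnv, and difference the CDFs
idx_25 <- qnorm(0.3979)            # implied index at pcnv = 0.25
idx_75 <- idx_25 + (-0.553) * 0.5  # add beta_pcnv * (0.75 - 0.25)
pnorm(idx_75) - pnorm(idx_25)      # approximately -0.102
```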

  d. For the probit model estimated in part c, obtain the percent correctly predicted. What is the percent correctly predicted when \(narr86=0\)? When \(narr86=1\)? What do you make of these findings? (8 points)
Code
# Get predicted probabilities for all observations
crime$pred_prob <- predict(probit, type = "response")

# Use 0.5 as cutoff for prediction
crime$pred_arr86 <- ifelse(crime$pred_prob > 0.5, 1, 0)

# Overall percent correctly predicted
overall_correct <- mean(crime$pred_arr86 == crime$arr86, na.rm = TRUE)

# Percent correct when narr86 = 0 (arr86 = 0)
correct_0 <- mean(crime$pred_arr86[crime$arr86 == 0] == 0, na.rm = TRUE)

# Percent correct when narr86 >= 1 (arr86 = 1)
correct_1 <- mean(crime$pred_arr86[crime$arr86 == 1] == 1, na.rm = TRUE)

# Create summary table
accuracy_table <- data.frame(
  Category = c("Overall", "When arr86=0", "When arr86=1"),
  Percent_Correct = c(overall_correct, correct_0, correct_1) * 100
)

knitr::kable(accuracy_table, digits = 2,
             col.names = c("Category", "Percent Correct"),
             caption = "Probit Model Prediction Accuracy")
Probit Model Prediction Accuracy

|Category     | Percent Correct|
|:------------|---------------:|
|Overall      |           72.70|
|When arr86=0 |           96.60|
|When arr86=1 |           10.33|
Code
# Confusion matrix
table_pred <- table(Predicted = crime$pred_arr86, Actual = crime$arr86)
print(table_pred)
         Actual
Predicted    0    1
        0 1903  677
        1   67   78

The overall percent correctly predicted is 72.7%. When arr86=0, the model correctly predicts 96.6% of cases. When arr86=1, the model correctly predicts 10.33% of cases.

These findings show that the model is far better at predicting non-arrest (arr86=0) than arrest (arr86=1). This is typical of binary choice models with a 0.5 cutoff when one outcome is much more frequent than the other: few observations have a predicted probability above 0.5, so the model predicts the common outcome (no arrest) for almost everyone. The model therefore has high overall accuracy but limited ability to identify actual arrests, which may be the outcome of greater interest.

  e. In the probit model, add the terms \(pcnv^2\), \(ptime86^2\), and \(inc86^2\) to the model. Are these individually or jointly significant? Describe the estimated relationship between the probability of arrest and pcnv. In particular, at what point does the probability of conviction have a negative effect on probability of arrest? (10 points)
Code
# Create squared terms
crime <- crime %>%
  mutate(
    pcnv2 = pcnv^2,
    ptime862 = ptime86^2,
    inc862 = inc86^2
  )

# Estimate expanded probit model
probit_quad <- glm(arr86 ~ pcnv + pcnv2 + avgsen + tottime + 
                   ptime86 + ptime862 + inc86 + inc862 + 
                   black + hispan + born60, 
                   data = crime, family = binomial(link = "probit"))

# Compare models with stargazer
stargazer(lpm, probit, probit_quad,
          se = list(robust_se, NULL, NULL),
          type = "text",
          column.labels = c("LPM", "Probit Model", "Probit with Quadratics"),
          model.names = FALSE,
          title = "Model Results")

Model Results
================================================================================
                                        Dependent variable:                     
                    ------------------------------------------------------------
                                               arr86                            
                              LPM            Probit Model Probit with Quadratics
                              (1)                (2)               (3)          
--------------------------------------------------------------------------------
pcnv                       -0.154***          -0.553***           0.217         
                            (0.019)            (0.072)           (0.259)        
                                                                                
pcnv2                                                           -0.857***       
                                                                 (0.270)        
                                                                                
avgsen                       0.004              0.013             0.014         
                            (0.006)            (0.021)           (0.026)        
                                                                                
tottime                      -0.002             -0.008            -0.018        
                            (0.004)            (0.017)           (0.021)        
                                                                                
ptime86                    -0.022***          -0.081***          0.745***       
                            (0.003)            (0.017)           (0.144)        
                                                                                
ptime862                                                        -0.104***       
                                                                 (0.022)        
                                                                                
inc86                      -0.001***          -0.005***         -0.006***       
                            (0.0001)           (0.0005)          (0.001)        
                                                                                
inc862                                                          0.00001**       
                                                                (0.00000)       
                                                                                
black                       0.162***           0.467***          0.437***       
                            (0.026)            (0.072)           (0.073)        
                                                                                
hispan                      0.089***           0.291***          0.266***       
                            (0.021)            (0.065)           (0.067)        
                                                                                
born60                       0.003              0.011             -0.015        
                            (0.017)            (0.056)           (0.057)        
                                                                                
Constant                    0.361***          -0.314***         -0.337***       
                            (0.017)            (0.051)           (0.056)        
                                                                                
--------------------------------------------------------------------------------
Observations                 2,725              2,725             2,725         
R2                           0.082                                              
Adjusted R2                  0.080                                              
Log Likelihood                                -1,483.641        -1,439.800      
Akaike Inf. Crit.                             2,985.281         2,903.601       
Residual Std. Error    0.429 (df = 2716)                                        
F Statistic         30.485*** (df = 8; 2716)                                    
================================================================================
Note:                                                *p<0.1; **p<0.05; ***p<0.01
Code
# Test joint significance of squared terms
library(lmtest)
lrtest(probit, probit_quad)
Likelihood ratio test

Model 1: arr86 ~ pcnv + avgsen + tottime + ptime86 + inc86 + black + hispan + 
    born60
Model 2: arr86 ~ pcnv + pcnv2 + avgsen + tottime + ptime86 + ptime862 + 
    inc86 + inc862 + black + hispan + born60
  #Df  LogLik Df Chisq Pr(>Chisq)    
1   9 -1483.6                        
2  12 -1439.8  3 87.68  < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Code
# Extract coefficients for interpretation
coef_pcnv <- coef(probit_quad)["pcnv"]
coef_pcnv2 <- coef(probit_quad)["pcnv2"]
coef_ptime862 <- coef(probit_quad)["ptime862"]
coef_inc862 <- coef(probit_quad)["inc862"]

# Extract p-values
summary_quad <- summary(probit_quad)
pval_pcnv2 <- summary_quad$coefficients["pcnv2", "Pr(>|z|)"]
pval_ptime862 <- summary_quad$coefficients["ptime862", "Pr(>|z|)"]
pval_inc862 <- summary_quad$coefficients["inc862", "Pr(>|z|)"]

# Turning point: where beta_pcnv + 2*beta_pcnv2 * pcnv = 0
turning_point <- -coef_pcnv / (2 * coef_pcnv2)

# Plot the relationship
pcnv_seq <- seq(0, 1, length.out = 100)
newdata_plot <- data.frame(
  pcnv = pcnv_seq,
  pcnv2 = pcnv_seq^2,
  avgsen = avg_avgsen,
  tottime = avg_tottime,
  ptime86 = avg_ptime86,
  ptime862 = avg_ptime86^2,
  inc86 = avg_inc86,
  inc862 = avg_inc86^2,
  black = 1,
  hispan = 0,
  born60 = 1
)

pred_probs <- predict(probit_quad, newdata = newdata_plot, type = "response")

plot(pcnv_seq, pred_probs, type = "l", lwd = 2,
     xlab = "Prior Conviction Rate (pcnv)", 
     ylab = "Predicted Probability of Arrest",
     main = "Relationship between pcnv and Arrest Probability")
grid()
abline(v = turning_point, col = "red", lty = 2, lwd = 2)
text(turning_point, max(pred_probs) * 0.9, 
     sprintf("Turning point\npcnv = %.3f", turning_point),
     pos = 4, col = "red")

The likelihood ratio test gives a chi-square statistic of 87.68 with 3 degrees of freedom (p < 0.001), indicating that the squared terms are jointly highly significant.

Individually examining the coefficients:

  • pcnv²: coefficient = -0.8571, p = 0.0015 (significant)
  • ptime86²: coefficient = -0.1035, p < 0.001 (significant)
  • inc86²: coefficient = 8.747e-06, p = 0.035 (significant)

The relationship between pcnv and arrest probability is nonlinear. The coefficient on pcnv is 0.2168 (positive but not significant), while the coefficient on pcnv² is -0.8571 (negative and significant). This creates an inverted-U relationship.

The turning point occurs at pcnv = 0.1265. Below this threshold, increases in conviction probability are associated with increases in arrest probability (contrary to deterrence theory). However, beyond pcnv = 0.1265, further increases in conviction probability are associated with decreases in arrest probability, consistent with a strong deterrence effect.

This suggests that deterrence effects may be weak or absent for individuals with low conviction rates, but become substantial for those with higher conviction histories. Alternatively, this could reflect selection effects where individuals with very high conviction rates modify their behavior to avoid detection.