Maximum Likelihood & Binary Choice Models
Coding Exercises
Setup: R Packages
Key Packages
For MLE:
stats::optim(): Base R optimization for MLEmaxLik::maxLik(): Dedicated MLE package with better output
For Binary Choice Models:
glm(): Generalized linear models (logit and probit)marginaleffects::comparisons(): Calculate marginal effects
Key Dataset:
mtcars: We’ll use transmission type (am: 0=automatic, 1=manual) as our binary outcome
MLE for Bernoulli Distribution
Let’s start with estimating the probability parameter from binary data.
Generate sample data for a Bernoulli distribution with 100 observations and a probability of 0.6.
Use rbinom to generate binary data.
Set size = 1 for Bernoulli trials (each trial is a single 0/1 outcome).
# Generate Bernoulli data (binary 0/1 outcomes)
data <- rbinom(100, size = 1, prob = 0.6)Complete the log-likelihood function for a Bernoulli distribution, then use optim() to estimate p.
Bernoulli MLE: Negative Log-Likelihood
Recall the Bernoulli distribution: \(P(X = x | p) = p^x (1-p)^{1-x}\)
In R: sum(data * log(p) + (1 - data) * log(1 - p))
Return the negative log-likelihood: -ll
loglik_bernoulli <- function(p, data) {
ll <- sum(data * log(p) + (1 - data) * log(1 - p))
return(-ll)
}Bernoulli MLE: Optimization
Always specify the starting values for par (in optim). This ensures your results match the solution and makes your code reproducible. For example, use par = 0.5 for Bernoulli MLE.
Use par = 0.5, fn = loglik_bernoulli, data = data, method = "BFGS", lower = 0.01, upper = 0.99
result <- optim(
par = 0.5,
fn = loglik_bernoulli,
data = data,
method = "BFGS",
lower = 0.01,
upper = 0.99
)Logit Estimation
Load mtcars Data
Load the mtcars dataset, create a binary outcome variable Y for transmission type (am), and a predictor matrix X with an intercept and the variables mpg, hp, and wt. Fill in the blanks below.
Use data(mtcars) to load the dataset. Outcome is am, predictors are mpg, hp, wt.
data(mtcars)
Y <- mtcars$am
X <- cbind(1, mtcars$mpg, mtcars$hp, mtcars$wt)
colnames(X) <- c("intercept", "mpg", "hp", "wt")Write the Log-Likelihood Function
Recall the logistic distribution: \(P(Y=1|X) = \frac{1}{1+e^{-X\beta}}\)
Use: ll <- sum(Y * xb - log(1 + exp(xb)))
loglik_logit <- function(beta, Y, X) {
xb <- X %*% beta
ll <- sum(Y * xb - log(1 + exp(xb)))
return(ll)
}Estimate Logit Coefficients Using optim
Always specify the starting values for par in optim. This ensures your results match the solution and makes your code reproducible. For logit, use par = rep(0, 4).
Use fn = function(beta, Y, X) -loglik_logit(beta, Y, X) to minimize the negative log-likelihood with optim. Pass Y and X as arguments.
result_optim <- optim(
par = rep(0, 4),
fn = function(beta, Y, X) -loglik_logit(beta, Y, X),
Y = Y,
X = X,
method = "BFGS"
)Estimate Logit Model Using maxLik
Always specify the starting values for start in maxLik. This ensures your results match the solution and makes your code reproducible. For logit, use start = rep(0, 4).
Use the same log-likelihood function as above. maxLik maximizes, so no need to negate.
library(maxLik)
result_maxlik <- maxLik(
logLik = loglik_logit,
start = rep(0, 4),
Y = Y,
X = X
)Estimate Logit Model Using glm
Use glm(am ~ mpg + hp + wt, data = mtcars, family = binomial(link = "logit"))
model_glm <- glm(am ~ mpg + hp + wt, data = mtcars, family = binomial(link = "logit"))Binary Choice Models Post Estimation
Calculate Effect of hp Change (Manual)
Calculate the effect of increasing hp from 100 to 150 on the predicted probability, holding mpg and wt at their mean values (use model_glm).
Calculate predicted probabilities at hp = 100 and hp = 150 (with other variables at their means), then take the difference: prob_150 - prob_100
effect_hp <- prob_150 - prob_100Marginal Effect at the Mean for hp
Calculate the average marginal effect at the mean (MEM) for hp using the logit model. The AME averages the marginal effect for each observation.
Use mean(coef(model_glm)[“hp”] * g_xb)
manual_ame_hp <- mean(coef(model_glm)[‘hp’] * g_xb)
Marginal Effects with marginaleffects package
Compare the average marginal effect (AME) for hp from the manual calculation to the mean of comparisons().
Use mean(comp_ame_hp$estimate) to extract the AME from marginaleffects results.
comp_ame <- mean(comp_ame_hpv$estimate)