Modeling Binary Outcomes
Mathematical Model
\[
\left(\begin{array}{c}
\text{Yes} \\
\text{No}
\end{array}\right) = \beta_0 + \beta_1X
\]
A Yes/No label cannot be placed directly on the left-hand side of a regression equation, so first code the outcome numerically. Let
\[
Y = \left\{\begin{array}{cc}
1 & \text{Yes} \\
0 & \text{No}
\end{array}\right.
\]
Construct a Model
\[
P\left(Y = 1\right) = \beta_0 + \beta_1X
\]
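This linear form is not guaranteed to produce a valid probability. For example, with hypothetical values \(\beta_0 = 0.2\), \(\beta_1 = 0.3\), and \(X = 5\):
\[
P(Y = 1) = 0.2 + 0.3(5) = 1.7 > 1
\]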
Construct a Model
A probability must lie between 0 and 1, so instead apply the logistic (inverse-logit) function to the linear predictor:
\[
P\left(Y = 1\right) = \frac{e^{\beta_0 + \beta_1X}}{1 + e^{\beta_0 + \beta_1X}}
\]
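A minimal sketch of this transformation in Python (the function name and coefficient values below are illustrative, not from the source):

```python
import numpy as np

def logistic_probability(x, beta0, beta1):
    """Map the linear predictor beta0 + beta1*x to a probability in (0, 1)."""
    eta = beta0 + beta1 * x                  # linear predictor
    return np.exp(eta) / (1 + np.exp(eta))   # logistic (inverse-logit) function

# Hypothetical coefficients: the output always stays strictly between 0 and 1.
print(logistic_probability(np.array([-10.0, 0.0, 5.0]), beta0=0.2, beta1=0.3))
```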
Construct a Model
\[
\frac{P(Y = 1)}{P(Y = 0)} = e^{\beta_0 + \beta_1X}
\]
where \(\frac{P(Y = 1)}{P(Y = 0)}\) is known as the odds of observing \(Y = 1\).
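This odds expression follows directly from the probability model above (no new assumptions):
\[
P(Y = 0) = 1 - \frac{e^{\beta_0 + \beta_1X}}{1 + e^{\beta_0 + \beta_1X}} = \frac{1}{1 + e^{\beta_0 + \beta_1X}},
\qquad
\frac{P(Y = 1)}{P(Y = 0)} = \frac{e^{\beta_0 + \beta_1X}}{1 + e^{\beta_0 + \beta_1X}} \cdot \left(1 + e^{\beta_0 + \beta_1X}\right) = e^{\beta_0 + \beta_1X}
\]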
The Logistic Model
\[
\log\left\{\frac{P(Y = 1)}{P(Y = 0)}\right\} = \beta_0 + \beta_1X
\]
Notation
\[
\log\left\{\frac{P(Y = 1)}{P(Y = 0)}\right\} =
\log\left\{\text{odds of } 1\right\} =
\mathrm{lo}(1)
\]
Logistic Regression
Logistic Regression is used to model the association between a predictor and a binary outcome variable.
This is similar to Linear Regression, which models the association between a predictor and a numerical outcome variable.
Logistic Regression
Logistic Regression uses the logistic model to formulate the relationship between a predictor and the outcome.
More specifically, for an outcome \(Y\) coded as:
\[
Y = \left\{\begin{array}{cc}
1 & \text{Category 1} \\
0 & \text{Category 2}
\end{array}\right.
\]
The model uses the predictor variable to describe the probability of observing Category 1 (\(P(Y=1)\)).
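In practice the model is fit with standard software. A minimal sketch in Python using statsmodels (the simulated data, coefficient values, and variable names are hypothetical):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

# Simulate a binary outcome from a known logistic model (hypothetical values).
x = rng.normal(size=500)                      # continuous predictor
p = 1 / (1 + np.exp(-(-0.5 + 1.2 * x)))       # true P(Y = 1)
y = rng.binomial(1, p)                        # binary outcome coded 1/0

# Fit log{P(Y=1)/P(Y=0)} = beta0 + beta1 * x by maximum likelihood.
X = sm.add_constant(x)                        # adds the intercept column
fit = sm.Logit(y, X).fit()
print(fit.params)                             # beta0_hat, beta1_hat (log-odds scale)
```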
Logistic Model
\[
\log\left\{\frac{P(Y = 1)}{P(Y = 0)}\right\} = \beta_0 + \beta_1X
\]
Regression Coefficients \(\beta\)
The regression coefficients quantify how a specific predictor changes the log odds of observing the first category of the outcome (\(Y = 1\))
Estimating \(\beta\)
To obtain numerical estimates of \(\beta\), denoted \(\hat \beta\), we find the values of \(\beta_0\) and \(\beta_1\) that maximize the likelihood function:
\[
L(\boldsymbol \beta) = \prod_{i=1}^n \left(\frac{e^{\beta_0 + \beta_1X_i}}{1 + e^{\beta_0 + \beta_1X_i}}\right)^{Y_i}\left(\frac{1}{1 + e^{\beta_0 + \beta_1X_i}}\right)^{1-Y_i}
\]
The likelihood function can be thought of as the probability of observing the entire data set. Therefore, we choose the values of \(\beta_0\) and \(\beta_1\) that give the highest probability of observing the data.
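A minimal sketch of this maximization in Python, working with the log of the likelihood above and minimizing its negative (the data and starting values are hypothetical; in practice this is handled by standard model-fitting routines):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
x = rng.normal(size=500)                                   # hypothetical predictor
y = rng.binomial(1, 1 / (1 + np.exp(-(0.3 + 0.8 * x))))    # hypothetical 0/1 outcome

def neg_log_likelihood(beta):
    """Negative log of L(beta) for the logistic model."""
    eta = beta[0] + beta[1] * x              # linear predictor beta0 + beta1*x_i
    p = 1 / (1 + np.exp(-eta))               # P(Y_i = 1)
    p = np.clip(p, 1e-12, 1 - 1e-12)         # guard against log(0) during optimization
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

# beta_hat maximizes the likelihood (equivalently, minimizes its negative log).
beta_hat = minimize(neg_log_likelihood, x0=np.zeros(2)).x
print(beta_hat)                              # estimates of beta0, beta1
```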
Estimated Parameters
The values you obtain (\(\hat \beta\)) tell you the relationship between a predictor variable and the log odds of observing the first category of the outcome \(Y=1\).
Exponentiating the estimate (\(e^{\hat \beta}\)) will give you the relationship between a predictor variable and the odds of observing the first category of the outcome \(Y=1\).
Interpreting \(\hat \beta\)
For a continuous predictor variable:
As X increases by 1 unit, the odds of observing the first category (\(Y = 1\)) are multiplied by \(e^{\hat\beta}\).
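As a concrete, purely hypothetical example, suppose the estimated coefficient is \(\hat\beta = 0.5\):
\[
e^{\hat\beta} = e^{0.5} \approx 1.65
\]
so each 1-unit increase in X multiplies the odds of \(Y = 1\) by about 1.65 (roughly a 65% increase in the odds).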