Motivation
Modeling Binary Outcomes
Logistic Regression
Prediction with Models
Melanoma is a type of skin cancer in which the cells that produce melanin grow out of control. What makes melanoma so dangerous is that it can metastasize to other parts of the body.
We are interested in learning how different factors affect an individual's chances of survival. Therefore, we record whether each patient lived or died during the study period.
We will be using the Melanoma data set with the following variables: dead (died from melanoma, 1 = yes, 0 = no) and thickness (tumour thickness in mm).
Modeling Binary Outcomes
\[ \left(\begin{array}{c} Dead \\ Alive \end{array}\right) = \beta_0 + \beta_1X \]
\[ Y = \left\{\begin{array}{cc} 1 & Dead \\ 0 & Alive \end{array}\right. \]
\[ P\left(Y = 1\right) = \beta_0 + \beta_1X \]
\[ P\left(Y = 1\right) = \frac{e^{\beta_0 + \beta_1X}}{1 + e^{\beta_0 + \beta_1X}} \]
\[ \frac{P(Y = 1)}{P(Y = 0)} = e^{\beta_0 + \beta_1X} \]
where \(\frac{P(Y = 1)}{P(Y = 0)}\) is the odds of observing \(Y = 1\).
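The step from the logistic form to the odds can be written out explicitly. It relies only on \(P(Y = 0) = 1 - P(Y = 1)\) and uses the same symbols as the equations above:
\[ P(Y = 0) = 1 - \frac{e^{\beta_0 + \beta_1X}}{1 + e^{\beta_0 + \beta_1X}} = \frac{1}{1 + e^{\beta_0 + \beta_1X}} \]
\[ \frac{P(Y = 1)}{P(Y = 0)} = \frac{e^{\beta_0 + \beta_1X}}{1 + e^{\beta_0 + \beta_1X}} \cdot \left(1 + e^{\beta_0 + \beta_1X}\right) = e^{\beta_0 + \beta_1X} \]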
\[ \log\left\{\frac{P(Y = 1)}{P(Y = 0)}\right\} = \beta_0 + \beta_1X \]
Logistic Regression
Logistic Regression is used to model the association between a set of predictors and a binary outcome variable.
This is similar to Linear Regression, which models the association between a set of predictors and a numerical outcome variable.
Logistic Regression uses the logistic model to formulate the relationship between the predictors and the outcome.
More specifically, for a binary outcome \(Y\):
\[ Y = \left\{\begin{array}{cc} 1 & \text{Category 1} \\ 0 & \text{Category 2} \end{array}\right. \]
The predictor variables are used to model the probability of observing category 1, \(P(Y=1)\):
\[ \log\left\{\frac{P(Y = 1)}{P(Y = 0)}\right\} = \beta_0 + \beta_1X \]
The regression coefficients quantify how a specific predictor changes the odds of observing the first category of the outcome (\(Y = 1\))
To obtain numerical values for \(\beta\), denoted \(\hat \beta\), we find the values that maximize the likelihood function:
\[ L(\boldsymbol \beta) = \prod_{i=1}^n \left(\frac{e^{\beta_0 + \beta_1X_i}}{1 + e^{\beta_0 + \beta_1X_i}}\right)^{Y_i}\left(\frac{1}{1 + e^{\beta_0 + \beta_1X_i}}\right)^{1-Y_i} \]
The likelihood function can be thought of as the probability of observing the entire data set. Therefore, we want to choose the values of \(\beta_0\) and \(\beta_1\) that result in the highest probability of observing the data.
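To make the maximization concrete, here is a minimal sketch that maximizes the log of the likelihood numerically and compares the result with `glm()`. The data are simulated, and the "true" coefficients (-1.5 and 0.2), the sample size, and all object names are illustrative assumptions, not values from the Melanoma study.

```r
# Simulated data: illustrative assumptions, not the Melanoma data
set.seed(1)
n <- 500
x <- runif(n, 0, 10)
y <- rbinom(n, 1, plogis(-1.5 + 0.2 * x))

# Negative log-likelihood of the logistic model.
# log L = sum( y * eta - log(1 + exp(eta)) ), with eta = b0 + b1 * x
neg_loglik <- function(beta) {
  eta <- beta[1] + beta[2] * x
  -sum(y * eta - log1p(exp(eta)))
}

# Minimizing the negative log-likelihood maximizes the likelihood
fit_optim <- optim(c(0, 0), neg_loglik, method = "BFGS")
fit_glm <- glm(y ~ x, family = binomial())

beta_optim <- fit_optim$par
beta_glm <- unname(coef(fit_glm))
print(rbind(optim = beta_optim, glm = beta_glm))
```

The two rows should agree closely, because `glm()` with `family = binomial()` also returns the maximum likelihood estimates; it just uses a different numerical algorithm to get there.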
The values you obtain (\(\hat \beta\)) tell you the relationship between a predictor variable and the log odds of observing the first category of the outcome, \(Y=1\).
Exponentiating the estimate (\(e^{\hat \beta}\)) will give you the relationship between a predictor variable and the odds of observing the first category of the outcome \(Y=1\).
For a continuous predictor variable:
As \(X\) increases by 1 unit, the odds of observing the first category (\(Y = 1\)) change by a factor of \(e^{\hat\beta}\): they increase if \(\hat\beta > 0\) and decrease if \(\hat\beta < 0\).
Modelling dead by thickness:
#>
#> Call: glm(formula = dead ~ thickness, family = binomial(), data = Melanoma)
#>
#> Coefficients:
#> (Intercept) thickness
#> -1.6140 0.2088
#>
#> Degrees of Freedom: 204 Total (i.e. Null); 203 Residual
#> Null Deviance: 242.4
#> Residual Deviance: 226.1 AIC: 230.1
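A fit like the one above could be produced along the following lines. This is a sketch under assumptions: it supposes the data are the `Melanoma` data frame from the MASS package, where `status == 1` codes death from melanoma, and it derives the `dead` column from that. The copy named `melanoma` is hypothetical, and the output is not guaranteed to match the printed values exactly.

```r
# Assumption: the data come from MASS::Melanoma, where status == 1 means
# the patient died from melanoma. The `dead` indicator is derived here.
library(MASS)

melanoma <- Melanoma
melanoma$dead <- as.integer(melanoma$status == 1)

# Logistic regression of death on tumour thickness (mm)
fit <- glm(dead ~ thickness, family = binomial(), data = melanoma)
print(coef(fit))
```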
\[ \log(odds\ of\ dying ) = -1.614 + 0.21 (thickness) \]
As tumor thickness increases by 1 mm, the odds of death increase by a factor of 1.232.
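The factor of 1.232 is the exponentiated thickness coefficient from the model output. As a quick check, using the unrounded estimate 0.2088 (the variable names here are illustrative):

```r
# Thickness coefficient taken from the printed glm output
b1 <- 0.2088

# Multiplicative change in the odds of death per 1 mm of thickness
odds_ratio <- exp(b1)
print(round(odds_ratio, 3))  # 1.232
```

Using the rounded value 0.21 from the equation instead gives a slightly larger factor, so the unrounded coefficient is the safer one to exponentiate.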
Prediction with Models
As you can see, odds can be unintuitive for the average person. It is often clearer to predict the probability of the outcome and report that instead.
\[ \hat P\left(Y = 1\right) = \frac{e^{\hat\beta_0 + \hat\beta_1X}}{1 + e^{\hat\beta_0 + \hat\beta_1X}} \]
Predict the probability of observing death for a patient with a tumor thickness of 2.9 mm.
Predict the probability of observing death for a patient with a tumor thickness of 1.9 mm.
Predict the probability of observing death for a patient with a tumor thickness of 3.9 mm.
| Tumor thickness (mm) | 1.9 | 2.9 | 3.9 |
|---|---|---|---|
| Predicted probability of death | 22.8% | 26.7% | 31.0% |
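The probabilities in the table can be reproduced by plugging the estimated coefficients into the logistic formula. The sketch below uses the coefficients from the printed model output and base R's `plogis()`, which computes \(e^{x}/(1 + e^{x})\); the variable names are illustrative.

```r
# Coefficients taken from the printed glm output
b0 <- -1.6140
b1 <- 0.2088

# Tumor thickness values (mm) to predict for
thickness <- c(1.9, 2.9, 3.9)

# plogis() is the inverse logit: exp(x) / (1 + exp(x))
prob <- plogis(b0 + b1 * thickness)
print(round(100 * prob, 1))  # 22.8 26.7 31.0
```

With a fitted `glm` object available, `predict()` with `newdata` and `type = "response"` should return the same probabilities without typing in the coefficients by hand.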