Multivariable Linear Regression

2025-03-04

Modeling Outcomes

  • Modeling Outcomes

  • Multivariable Linear Regression

  • Adjusted \(R^2\)

  • Model Selection

Explaining Variation

Explaining variation is the process of reducing the unexplained variation in an outcome with the use of other variables.

It can be thought of as making our educated guesses less wrong.

Danceability of Taylor Swift’s Songs

Code
library(taylor)    # provides the taylor_album_songs data
library(tidyverse) # ggplot2, tidyr (drop_na), etc.

ggplot(taylor_album_songs, aes(danceability)) +
  geom_density()

Danceability by mode_name

Code
taylor_album_songs |> 
  drop_na(mode_name) |> 
  ggplot(aes(danceability)) +
  geom_density() +
  facet_wrap(~ mode_name)

Danceability by Valence

Code
ggplot(taylor_album_songs, aes(valence, danceability)) +
  geom_point() +
  geom_smooth(method = "lm")

Danceability by Energy

Code
ggplot(taylor_album_songs, aes(energy, danceability)) +
  geom_point() +
  geom_smooth(method = "lm")

Modeling Danceability

How do we use all the variables to explain danceability?

Multivariable Linear Regression

  • Modeling Outcomes

  • Multivariable Linear Regression

  • Adjusted \(R^2\)

  • Model Selection

MLR

Multivariable Linear Regression (MLR) is used to model an outcome variable (\(Y\)) with multiple predictor variables (\(X_1, X_2, \ldots, X_p\)).

Using MLR, you propose that the outcome variable is constructed from a set of predictors, each with a corresponding regression coefficient (\(\beta\)), plus a bit of error:

\[ Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p + \varepsilon \]

\[ \varepsilon \sim DGP \]
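
To make this concrete, here is a minimal simulation sketch of the idea, with arbitrarily chosen coefficients and a normal error DGP:

Code
# An outcome built from two predictors, chosen coefficients,
# and random error (a normal DGP here)
set.seed(42)
n   <- 100
X1  <- runif(n)
X2  <- runif(n)
eps <- rnorm(n, mean = 0, sd = 0.1)     # the "bit of error"
Y   <- 0.5 + 0.2 * X1 - 0.1 * X2 + eps  # Y constructed from the predictors

Fitting lm(Y ~ X1 + X2) to data generated this way should return estimates close to 0.5, 0.2, and -0.1.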

Model Data

\[ Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p + \varepsilon \]

\[ \varepsilon \sim DGP \]

\[ Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_p X_{ip} + \varepsilon_i \]

\[ \varepsilon_i \sim DGP \]

Unknown Parameters

\[ Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_p X_{ip} + \varepsilon_i \]

\[ \beta_0, \beta_1, \beta_2, \beta_3, \ldots, \beta_p \]

Estimated Model

\[ Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_p X_{ip} + \varepsilon_i \]

\[ \hat Y_i = \hat\beta_0 + \hat\beta_1 X_{i1} + \hat\beta_2 X_{i2} + \cdots + \hat\beta_p X_{ip} \]

Estimating Parameters

\(\beta_0, \beta_1, \beta_2, \beta_3, \ldots, \beta_p\) are estimated by minimizing the sum of squared residuals (the least-squares criterion):

\[ \sum^n_{i=1} (Y_i-\hat Y_i)^2 \]
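
As a quick sketch of this criterion, using the danceability-by-valence model from the earlier plot: the coefficients chosen by lm() give a smaller sum of squared residuals than any perturbed alternative.

Code
fit <- lm(danceability ~ valence, data = taylor_album_songs)

sum(residuals(fit)^2)           # SSE at the least-squares estimates

# Any other coefficients do worse; e.g. shifting the intercept up
# by 0.01 lowers every residual by 0.01 and increases the sum:
sum((residuals(fit) - 0.01)^2)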

Fitting a Model in R

Code
lm(Y ~ X1 + X2 + ... + Xp, data = DATA)

Modeling Danceability

Code
lm(danceability ~ mode_name + valence + energy, 
   data = taylor_album_songs)
#> 
#> Call:
#> lm(formula = danceability ~ mode_name + valence + energy, data = taylor_album_songs)
#> 
#> Coefficients:
#>    (Intercept)  mode_nameminor         valence          energy  
#>        0.53816         0.08386         0.17554        -0.05944

Estimated Model

\[ dance = 0.54 + 0.08 \times Minor + 0.18 \times valence - 0.06 \times energy \]
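
These rounded values come directly from the fitted coefficients:

Code
fit <- lm(danceability ~ mode_name + valence + energy, 
          data = taylor_album_songs)
round(coef(fit), 2)  # 0.54, 0.08, 0.18, -0.06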

\(\hat \beta_i\) Representation

Each regression coefficient \(\beta_i\) represents how the outcome changes as the corresponding predictor variable increases by 1 unit.

For categorical dummy variables, \(\beta_i\) represents how the outcome changes when the data point belongs to that category rather than the reference category.

\(\hat \beta_i\) Interpretation

For \(\hat \beta_i\), which is the regression coefficient (slope) of \(X_i\):

As \(X_i\) increases by 1 unit, the outcome (\(Y\)) will increase/decrease by \(\hat \beta_i\) units, adjusting for all other predictor variables.

For categorical dummy variables \(X_i\):

The outcome \(Y\) increases/decreases by \(\hat \beta_i\) units for that category of \(X_i\) compared to the reference category, adjusting for all other predictor variables.
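
A small sketch of what “adjusting for all other predictors” means in practice, using two hypothetical songs that differ only in valence (the mode_name levels "major"/"minor" are inferred from the coefficient names above):

Code
fit <- lm(danceability ~ mode_name + valence + energy, 
          data = taylor_album_songs)

song_a <- data.frame(mode_name = "major", valence = 0.3, energy = 0.6)
song_b <- data.frame(mode_name = "major", valence = 1.3, energy = 0.6)

# The predictions differ by exactly the valence coefficient
# (a 1-unit increase is outside valence's 0-1 range, but the
# arithmetic is the point here):
predict(fit, song_b) - predict(fit, song_a)
coef(fit)["valence"]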

Interpreting the Minor Coefficient

\[ dance = 0.54 + 0.08 \times Minor + 0.18 \times valence - 0.06 \times energy \]

A Minor song’s average danceability score is 0.08 units higher than a Major song’s, adjusting for valence and energy.

Interpreting the Valence Coefficient

\[ dance = 0.54 + 0.08 \times Minor + 0.18 \times valence - 0.06 \times energy \]

As valence increases by 1 unit, danceability increases by an average of 0.18 units, adjusting for energy and type of song.

Interpreting the Energy Coefficient

\[ dance = 0.54 + 0.08 \times Minor + 0.18 \times valence - 0.06 \times energy \]

As energy increases by 1 unit, danceability decreases by an average of 0.06 units, adjusting for valence and type of song.

Adjusted \(R^2\)

  • Modeling Outcomes

  • Multivariable Linear Regression

  • Adjusted \(R^2\)

  • Model Selection

\(R^2\)

\(R^2\) is computed by dividing the variation in the outcome that is explained by the model by the total variation of the outcome.

\[ R^2 = \frac{\text{variation explained by model}}{\text{variation of outcome}} = 1-\frac{\text{variation of residuals}}{\text{variation of outcome}} \]
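
A minimal sketch of this computation for the danceability model, using sums of squares as the measure of variation:

Code
fit <- lm(danceability ~ mode_name + valence + energy, 
          data = taylor_album_songs)
y <- model.frame(fit)$danceability   # outcome values used in the fit

1 - sum(residuals(fit)^2) / sum((y - mean(y))^2)
summary(fit)$r.squared               # matches the built-in value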

\(R^2\)

Problems arise when multiple predictors are added to the model. Each new predictor adds information, which can never increase the variation of the residuals. Therefore, \(R^2\) never decreases as predictors are added, even when a predictor is useless.
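
A quick demonstration: even a pure-noise predictor cannot lower \(R^2\).

Code
set.seed(1)
dat <- na.omit(taylor_album_songs[, c("danceability", "mode_name",
                                      "valence", "energy")])
dat$noise <- rnorm(nrow(dat))        # predictor unrelated to anything

summary(lm(danceability ~ mode_name + valence + energy,
           data = dat))$r.squared
summary(lm(danceability ~ mode_name + valence + energy + noise,
           data = dat))$r.squared    # never smaller, despite the noise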

Problems with \(R^2\) in MLR

As the number of variables increases, the regular \(R^2\) paints an overly optimistic picture of how well the model will predict new data.

Therefore, statisticians use the adjusted \(R^2\), which penalizes the model when more variables are added. A new variable only raises the adjusted \(R^2\) if it adds enough explanatory power to outweigh the penalty.

Adjusted \(R^2\)

\[ R_a^2 = 1-\frac{\text{variation of residuals}}{\text{variation of outcome}}\times \frac{n-1}{n-k-1} \]

  • \(n\): Number of data points
  • \(k\): Number of predictor variables in the model
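
Applying this formula by hand to the danceability model (here \(k = 3\), since mode_name contributes a single dummy variable):

Code
fit <- lm(danceability ~ mode_name + valence + energy, 
          data = taylor_album_songs)
y <- model.frame(fit)$danceability
n <- length(y)   # number of data points used in the fit
k <- 3           # number of predictor variables

1 - (sum(residuals(fit)^2) / sum((y - mean(y))^2)) * (n - 1) / (n - k - 1)
summary(fit)$adj.r.squared   # matches the built-in value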

Adjusted \(R^2\) in R

Code
xlm <- lm(Y ~ X1 + X2 + ... + Xp, data = DATA)
ar2(xlm)  # ar2() is not base R; summary(xlm)$adj.r.squared gives the same value

Example

Code
xlm <- lm(danceability ~ mode_name + valence + energy, 
   data = taylor_album_songs)
ar2(xlm)
#> [1] 0.09148268

9.1% of the variation in danceability is explained by the model, after adjusting for the number of predictors.

Model Selection

  • Modeling Outcomes

  • Multivariable Linear Regression

  • Adjusted \(R^2\)

  • Model Selection

Model Selection

Model selection is the process of arriving at a “final” model that contains all the necessary predictors and eliminates any that are unnecessary.

Forward Selection

Begin with the null model (\(Y\sim 1\)) and add variables until a final model is chosen.

Backward Selection

Begin with the full model, and remove variables until the final model is chosen.

Hybrid Selection

A hybrid of the forward and backward approaches: variables can be added or removed at each step.
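
All three procedures can be automated; a sketch using base R’s step(), which selects by AIC rather than adjusted \(R^2\):

Code
# na.omit() keeps the data fixed as variables enter and leave
dat  <- na.omit(taylor_album_songs[, c("danceability", "mode_name",
                                       "valence", "energy")])
null <- lm(danceability ~ 1, data = dat)                             # null model
full <- lm(danceability ~ mode_name + valence + energy, data = dat)  # full model

step(null, scope = formula(full), direction = "forward")   # forward
step(full, direction = "backward")                          # backward
step(null, scope = formula(full), direction = "both")       # hybrid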

About Model Selection

Generally, it is not a good idea to conduct automated model selection. The predictor variables in your model should instead be guided by a literature review that identifies the important predictors.