Statistical Inference

2025-04-10

Statistical Inference

  • Statistical Inference

  • Hypothesis Testing

  • Decision Making

  • Power Analysis

  • Confidence Intervals

  • Linear Regression Inference in R

  • Linear Regression Example

  • Logistic Regression Inference in R

  • Logistic Regression Example

What is Statistical Inference?

  • Drawing conclusions about a population based on a sample
  • Population = entire group
  • Sample = subset

Two Main Types of Inference

  1. Estimation
  2. Hypothesis Testing

Estimation

  • Point Estimate: Single best guess (e.g., \(\hat \beta_1\))
  • Interval Estimate: Range likely to contain the true value

Hypothesis Testing

  • \(H_0\): No effect or difference
  • \(H_1\): Some effect or difference
  • We use sample data to decide whether to reject \(H_0\)

Key Concepts and Tools

  • Sampling Distribution
  • Central Limit Theorem
  • Standard Error

p-values

  • Probability of observing data at least as extreme as ours, assuming \(H_0\) is true

Misinterpretation of p-values is common. Emphasize: a low p-value means the data are unusual under \(H_0\), not that \(H_0\) has a low probability of being true.
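
As a quick illustration, a two-sided p-value can be computed directly from a test statistic in R (a minimal sketch; the value of z below is made up):

Code
# Hypothetical observed test statistic on the standard normal scale
z <- 2.1

# Two-sided p-value: probability mass in both tails beyond |z|
p_value <- 2 * pnorm(-abs(z))
p_value  # about 0.036, so the data are unusual under H0 at alpha = 0.05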

Confidence Intervals

  • A range where we expect the true value to fall

Hypothesis Testing

Hypothesis Tests

Hypothesis tests are used to assess whether a claim is supported by data. This is done by collecting data and specifying a null hypothesis and an alternative hypothesis.

Null Hypothesis \(H_0\)

The null hypothesis is the claim that is initially assumed to be true. It typically states that the parameter is equal to the hypothesized value.

Alternative Hypothesis \(H_a\)

The alternative hypothesis contradicts the null hypothesis.

Example of Null and Alternative Hypothesis

We want to see if \(\beta\) is different from \(\beta^*\)

Null Hypothesis Alternative Hypothesis
\(H_0: \beta=\beta^*\) \(H_a: \beta\ne\beta^*\)
\(H_0: \beta\le\beta^*\) \(H_a: \beta>\beta^*\)
\(H_0: \beta\ge\beta^*\) \(H_a: \beta<\beta^*\)

One-Sided vs Two-Sided Hypothesis Tests

Notice that there are three types of null and alternative hypotheses. The first type (\(H_a:\beta\ne\beta^*\)) is considered a two-sided hypothesis because the rejection region is split across both tails of the distribution. The remaining two are considered one-sided because the rejection region is located in only one tail.

Null Hypothesis Alternative Hypothesis Side
\(H_0: \beta=\beta^*\) \(H_a: \beta\ne\beta^*\) Two-Sided
\(H_0: \beta\le\beta^*\) \(H_a: \beta>\beta^*\) One-Sided
\(H_0: \beta\ge\beta^*\) \(H_a: \beta<\beta^*\) One-Sided

Hypothesis Testing Steps

  1. State \(H_0\) and \(H_a\)
  2. Choose a significance level \(\alpha\)
  3. Compute a confidence interval/p-value
  4. Make a decision (see the sketch below)
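
A minimal sketch of these steps in R, using the built-in sleep data and a two-sample t-test (both chosen purely for illustration):

Code
# 1. State H0: no difference in mean extra sleep vs Ha: some difference
# 2. Choose a significance level
alpha <- 0.05

# 3. Compute the p-value and confidence interval
tt <- t.test(extra ~ group, data = sleep)
tt$p.value
tt$conf.int

# 4. Make a decision
if (tt$p.value < alpha) "Reject H0" else "Fail to reject H0"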

Rejection Region

Code
library(ggplot2)

alpha <- 0.05

# Critical values for a two-tailed test
z_critical <- qnorm(1 - alpha / 2)

# Create data for the standard normal curve
x <- seq(-4, 4, length = 1000)
y <- dnorm(x)

df <- data.frame(x = x, y = y)

# Shade the rejection regions in both tails
ggplot(df, aes(x = x, y = y)) +
  geom_line(color = "deepskyblue", linewidth = 1) +
  geom_area(data = subset(df, x <= -z_critical), aes(y = y), fill = "firebrick", alpha = 0.5) +
  geom_area(data = subset(df, x >= z_critical), aes(y = y), fill = "firebrick", alpha = 0.5) +
  geom_vline(xintercept = c(-z_critical, z_critical), linetype = "dashed", color = "black") +
  theme_bw()

Decision Making

A hypothesis test forces you to make one of two decisions: Reject \(H_0\) OR Fail to Reject \(H_0\)

Reject \(H_0\): The effect seen is not due to random chance; an underlying process is contributing to the effect.

Fail to Reject \(H_0\): The effect seen is consistent with random chance. Random sampling, not an underlying process, could explain the observed effect.

Decision Making: P-Value

The p-value approach is one of the most common ways to report significant results. The p-value is easy to interpret: it is the probability of observing our test statistic, or something more extreme, given that the null hypothesis is true.

If \(p < \alpha\), then you reject \(H_0\); otherwise, you will fail to reject \(H_0\).

Decision Making: Confidence Interval Approach

The confidence interval approach can evaluate a two-sided hypothesis test, where the alternative hypothesis is \(\beta\ne\beta^*\). Constructing the interval (for example, via bootstrapping) yields a lower and an upper bound, denoted \((LB, UB)\).

If \(\beta^*\) is in \((LB, UB)\), then you fail to reject \(H_0\). If \(\beta^*\) is not in \((LB,UB)\), then you reject \(H_0\).
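
A minimal sketch of this decision rule in R, using a simple regression on the built-in mtcars data and \(\beta^* = 0\) (both chosen for illustration):

Code
m <- lm(mpg ~ wt, data = mtcars)

# (LB, UB) for the slope on wt
ci <- confint(m, "wt", level = 0.95)

beta_star <- 0  # hypothesized value under H0
if (beta_star < ci[1] || beta_star > ci[2]) "Reject H0" else "Fail to reject H0"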

Significance Level \(\alpha\)

The significance level \(\alpha\) is the probability of rejecting the null hypothesis given that it is true.

In other words, \(\alpha\) is the error rate that a researcher controls.

Typically, we want this error rate to be small; \(\alpha = 0.05\) is a common choice.

Power Analysis

What is Statistical Power?

  • Statistical Power is the probability of correctly rejecting a false null hypothesis.
  • In other words, it’s the chance of detecting a real effect when it exists.

Why Power Matters

  • Low power → high risk of Type II Error (false negatives)
  • High power → better chance of finding true effects
  • Common threshold: 80% power

Errors in Inference

Term Definition Interpretation
Type I Reject \(H_0\) when it is true False positive
Type II Fail to reject \(H_0\) when it is false False negative
Power \(1 - P(\text{Type II})\) Detecting a true effect

Type I Error (False Positive)

  • Rejecting \(H_0\) when it is actually true
  • Probability = \(\alpha\) (significance level)

Type II Error (False Negative)

  • Failing to reject \(H_0\) when it is actually false
  • Probability = \(\beta\)
  • Power = \(1 - \beta\)

Balancing Errors

  • Lowering \(\alpha\) reduces Type I errors, but increases risk of Type II errors.
  • To reduce both:
    • Increase sample size
    • Use more appropriate statistical tests

What Affects Power?

  1. Effect Size
    • Bigger effects are easier to detect
  2. Sample Size (\(n\))
    • Larger samples reduce standard error
  3. Significance Level (\(\alpha\))
    • Higher \(\alpha\) increases power (but riskier!)
  4. Variability
    • Less noise in the data = better power (see the sketch below)
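
These factors can be explored with base R's power.t.test() (a minimal sketch; the effect size, standard deviation, and sample sizes are made up):

Code
# Power of a two-sample t-test at n = 30 per group
power.t.test(n = 30, delta = 0.5, sd = 1, sig.level = 0.05)$power

# Larger sample size -> higher power
power.t.test(n = 100, delta = 0.5, sd = 1, sig.level = 0.05)$power

# Sample size per group needed to reach 80% power
power.t.test(delta = 0.5, sd = 1, sig.level = 0.05, power = 0.80)$n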

Boosting Power

  • Power = Probability of rejecting \(H_0\) when it’s false
  • Helps avoid Type II Errors
  • Driven by:
    • Sample size
    • Effect size
    • \(\alpha\)
    • Variability
  • Aim for 80% or higher

Confidence Intervals

  • A confidence interval gives a range of plausible values for a population parameter.
  • It reflects uncertainty in point estimates from sample data.

Interpretation

“We are 95% confident that the true mean lies between A and B.”

  • This does not mean there’s a 95% chance the mean is in that interval.
  • It means: if we repeated the sampling process many times, 95% of the intervals would contain the true value.

Factors Affecting CI Width

  • Sample size (\(n\)): larger \(n\) → narrower CI
  • Standard deviation (\(s\) or \(\sigma\)): more variability → wider CI
  • Confidence level: higher confidence → wider CI (see the sketch below)
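
A minimal sketch of these effects in R, using randomly generated data (the sample sizes, mean, and standard deviation are made up):

Code
set.seed(42)
x_small <- rnorm(20,  mean = 5, sd = 2)
x_large <- rnorm(500, mean = 5, sd = 2)

# Larger n -> narrower interval
t.test(x_small, conf.level = 0.95)$conf.int
t.test(x_large, conf.level = 0.95)$conf.int

# Higher confidence level -> wider interval
t.test(x_small, conf.level = 0.99)$conf.int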

Linear Regression Inference in R

Conducting HT of \(\beta_j\)

Code
xlm <- lm(Y ~ X, data = DATA)
summary(xlm)
  • xlm: name of the stored model
  • Y: Name of the outcome variable in DATA
  • X: Name of the Predictor Variable(s) in DATA
  • DATA: Name of the data set

Example

Code
library(palmerpenguins)  # provides the penguins data set

m1 <- lm(body_mass_g ~ species + flipper_length_mm, penguins)
summary(m1)
#> 
#> Call:
#> lm(formula = body_mass_g ~ species + flipper_length_mm, data = penguins)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -927.70 -254.82  -23.92  241.16 1191.68 
#> 
#> Coefficients:
#>                    Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)       -4031.477    584.151  -6.901 2.55e-11 ***
#> speciesChinstrap   -206.510     57.731  -3.577 0.000398 ***
#> speciesGentoo       266.810     95.264   2.801 0.005392 ** 
#> flipper_length_mm    40.705      3.071  13.255  < 2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 375.5 on 338 degrees of freedom
#>   (2 observations deleted due to missingness)
#> Multiple R-squared:  0.7826, Adjusted R-squared:  0.7807 
#> F-statistic: 405.7 on 3 and 338 DF,  p-value: < 2.2e-16

Confidence Interval

Code
confint(xlm, level = LEVEL)
  • xlm: Name of the model saved in R
  • LEVEL: A number between 0 and 1 to specify confidence level

Example

Code
confint(m1, level = 0.90)
#>                           5 %        95 %
#> (Intercept)       -4994.96108 -3067.99270
#> speciesChinstrap   -301.72956  -111.29068
#> speciesGentoo       109.68404   423.93517
#> flipper_length_mm    35.64014    45.77066

Linear Regression Example

Wage Data Example

The Wage data set contains data on 3,000 male workers in the Mid-Atlantic region. We are interested in whether the predictor variable age has a significant effect on the outcome wage, adjusting for marital status (maritl), race (race), and education level (education).
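
A sketch of the corresponding model fit, assuming the Wage data from the ISLR package:

Code
library(ISLR)  # provides the Wage data set

wage_fit <- lm(wage ~ age + maritl + race + education, data = Wage)
summary(wage_fit)
confint(wage_fit, level = 0.95)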

Red Wine Data

The Wine Quality data set contains information on red and white wines from northern Portugal. We are interested in seeing whether the density of the red wine (predictor variable) affects its quality (outcome variable), adjusting for alcohol, p_h, residual_sugar, and fixed_acidity.

Code
url <- "https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv"
wine <- read_delim(url, delim = ";")
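
A sketch of the corresponding model fit. The raw column names contain spaces and capitals (e.g., pH, residual sugar), so janitor::clean_names() is assumed here to produce the snake_case names used above:

Code
library(janitor)

# "pH" -> "p_h", "residual sugar" -> "residual_sugar", etc.
wine <- clean_names(wine)

wine_fit <- lm(quality ~ density + alcohol + p_h + residual_sugar + fixed_acidity,
               data = wine)
summary(wine_fit)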

Logistic Regression Inference in R

Conducting HT of \(\beta_j\)

Code
xlm <- glm(Y ~ X, data = DATA, family = binomial())
summary(xlm)
  • xlm: name of the stored model
  • Y: Name of the outcome variable in DATA
  • X: Name of the Predictor Variable(s) in DATA
  • DATA: Name of the data set

Example

Code
library(survival)  # bladder1 comes from the survival package

m1 <- glm(death ~ recur + number + size, bladder1, family = binomial())
summary(m1)
#> 
#> Call:
#> glm(formula = death ~ recur + number + size, family = binomial(), 
#>     data = bladder1)
#> 
#> Coefficients:
#>               Estimate Std. Error z value Pr(>|z|)    
#> (Intercept) -0.8525259  0.4462559  -1.910 0.056082 .  
#> recur       -0.3897480  0.1062848  -3.667 0.000245 ***
#> number       0.0008451  0.1124503   0.008 0.994004    
#> size        -0.2240419  0.1626749  -1.377 0.168439    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> (Dispersion parameter for binomial family taken to be 1)
#> 
#>     Null deviance: 189.38  on 293  degrees of freedom
#> Residual deviance: 166.43  on 290  degrees of freedom
#> AIC: 174.43
#> 
#> Number of Fisher Scoring iterations: 6

Confidence Interval

Code
confint(xlm, level = LEVEL)
  • xlm: Name of the model saved in R
  • LEVEL: A number between 0 and 1 to specify confidence level

Example

Code
confint(m1, level = 0.95)
#>                  2.5 %      97.5 %
#> (Intercept) -1.7353779  0.02529523
#> recur       -0.6217831 -0.20078281
#> number      -0.2421738  0.20731479
#> size        -0.5880581  0.06061498

Confidence Interval for Odds Ratio

Exponentiating the bounds converts the interval from the log-odds scale to the odds-ratio scale.

Code
exp(confint(m1, level = 0.95))
#>                 2.5 %    97.5 %
#> (Intercept) 0.1763335 1.0256179
#> recur       0.5369861 0.8180901
#> number      0.7849197 1.2303698
#> size        0.5554048 1.0624898

Logistic Regression Example

Breast Cancer Data

The Breast Cancer data set contains information derived from diagnostic images of breast masses for individuals from Wisconsin. We are interested in whether breast cancer diagnosis (outcome variable; Benign or Malignant) is affected by tumor radius, adjusting for texture, perimeter, and smoothness.

Code
url <- "https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/wdbc.data"
bc <- read.csv(url, header = FALSE)

# Add column names
colnames(bc) <- c("id", "diagnosis", paste0("V", 3:32))

# Convert diagnosis to factor
bc$diagnosis <- factor(bc$diagnosis, levels = c("B", "M"), labels = c("Benign", "Malignant"))
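
A sketch of the corresponding model fit. The WDBC file ships without column names; the renaming below assumes the documented feature order, in which the first five mean features are radius, texture, perimeter, area, and smoothness:

Code
# Name the first five feature columns (assumes the documented WDBC ordering)
names(bc)[3:7] <- c("radius", "texture", "perimeter", "area", "smoothness")

bc_fit <- glm(diagnosis ~ radius + texture + perimeter + smoothness,
              data = bc, family = binomial())
summary(bc_fit)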

Bank Note Classification

The Bank Note data set contains information for authenticating bank notes based on images. We are interested in seeing whether class (outcome variable; Genuine or Forged) is associated with image skewness (predictor variable), adjusting for variance and entropy.

Code
url <- "https://archive.ics.uci.edu/ml/machine-learning-databases/00267/data_banknote_authentication.txt"
bank <- read.csv(url, header = FALSE)

colnames(bank) <- c("variance", "skewness", "curtosis", "entropy", "class")
bank$class <- factor(bank$class, levels = c(0, 1), labels = c("Genuine", "Forged"))
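
A sketch of the corresponding model fit, following the glm() template from earlier:

Code
bank_fit <- glm(class ~ skewness + variance + entropy,
                data = bank, family = binomial())
summary(bank_fit)

# Confidence intervals on the odds-ratio scale
exp(confint(bank_fit, level = 0.95))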