Sampling Distribution

2025-04-01

Sampling Distribution

  • Sampling Distribution

  • Simulating Unicorns

  • Central Limit Theorem

  • Common Sampling Distributions

  • Sampling Distributions for Regression Models

Sampling Distribution

Sampling Distribution is the idea that the statistics you generate (slopes and intercepts) have their own data generation process.

In other words, the numerical values you obtain from the lm and glm functions would be different if you collected a different data set.

Some values will be more common than others. Because of this, the statistics have their own data generating process, just as the outcome of interest has its own data generating process.

Sampling Distributions

  • Distribution of a statistic over repeated samples

  • Different samples yield different statistics

Standard Error

The Standard Error (SE) is the standard deviation of a statistic itself.

SE tells us how much a statistic varies from sample to sample. Smaller SE = more precision.
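A quick simulation makes the definition concrete (a sketch using only base R): draw many samples of size 25 from \(N(5, 2)\) and compare the standard deviation of the resulting sample means to the theoretical SE \(\sigma/\sqrt{n} = 2/\sqrt{25} = 0.4\).

Code
# Draw 1000 samples of size 25 from N(5, 2) and compute each sample mean
means <- replicate(1000, mean(rnorm(25, mean = 5, sd = 2)))

# The empirical SE is the standard deviation of the statistic itself
sd(means)      # close to the theoretical value
2 / sqrt(25)   # theoretical SE = sigma / sqrt(n) = 0.4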

Modelling the Data

\[ Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \]

  • \(Y_i\): Outcome data
  • \(X_i\): Predictor data
  • \(\beta_0, \beta_1\): parameters
  • \(\varepsilon_i\): error term

Error Term

\[ \varepsilon_i \sim DGP \]

Randomness Effect

The randomness effect is a sampling phenomenon: every time you sample a population, you will get a different sample.

Getting different samples means you will get different statistics.

These statistics will have a distribution on their own.

Randomness Effect 1

Code
x <- rnorm(1000)
y <- 4 + 5 * x + rnorm(1000)
bb <- round(b(lm(y ~ x),1),2)
ggplot(tibble(x = x, y = y), aes(x,y)) +
  geom_point() +
  annotate("text", 
           x = -1, y = 15, 
           label = TeX(sprintf(r'($\hat{\beta}_1 = %g$)', bb)),
           parse = TRUE,
           size = 8) 

Randomness Effect 2

Code
x <- rnorm(1000)
y <- 4 + 5 * x + rnorm(1000)
bb <- round(b(lm(y ~ x),1),2)
ggplot(tibble(x = x, y = y), aes(x,y)) +
  geom_point() +
  annotate("text", 
           x = -1, y = 15, 
           label = TeX(sprintf(r'($\hat{\beta}_1 = %g$)', bb)),
           parse = TRUE,
           size = 8) 

Randomness Effect 3

Code
x <- rnorm(1000)
y <- 4 + 5 * x + rnorm(1000)
bb <- round(b(lm(y ~ x),1),2)
ggplot(tibble(x = x, y = y), aes(x,y)) +
  geom_point() +
  annotate("text", 
           x = -1, y = 15, 
           label = TeX(sprintf(r'($\hat{\beta}_1 = %g$)', bb)),
           parse = TRUE,
           size = 8) 

Randomness Effect 4

Code
x <- rnorm(1000)
y <- 4 + 5 * x + rnorm(1000)
bb <- round(b(lm(y ~ x),1),2)
ggplot(tibble(x = x, y = y), aes(x,y)) +
  geom_point() +
  annotate("text", 
           x = -1, y = 15, 
           label = TeX(sprintf(r'($\hat{\beta}_1 = %g$)', bb)),
           parse = TRUE,
           size = 8) 

Randomness Effect 5

Code
x <- rnorm(1000)
y <- 4 + 5 * x + rnorm(1000)
bb <- round(b(lm(y ~ x),1),2)
ggplot(tibble(x = x, y = y), aes(x,y)) +
  geom_point() +
  annotate("text", 
           x = -1, y = 15, 
           label = TeX(sprintf(r'($\hat{\beta}_1 = %g$)', bb)),
           parse = TRUE,
           size = 8) 

Simulating Unicorns

  • Sampling Distribution

  • Simulating Unicorns

  • Central Limit Theorem

  • Common Sampling Distributions

  • Sampling Distributions for Regression Models

Simulating Unicorns

To better understand the variation in statistics, let’s simulate a data set of unicorn characteristics.

We will simulate a data set with the unicorns function, which only needs to know how many unicorns to simulate.

Simulating Unicorn Data

Code
unicorns(10)

Unicorn Data Variables

Code
names(unicorns(10))
#>  [1] "Unicorn_ID"        "Age"               "Gender"           
#>  [4] "Color"             "Type_of_Unicorn"   "Type_of_Horn"     
#>  [7] "Horn_Length"       "Horn_Strength"     "Weight"           
#> [10] "Health_Score"      "Personality_Score" "Magical_Score"    
#> [13] "Elusiveness_Score" "Gentleness_Score"  "Nature_Score"

We will only look at Magical_Score and Nature_Score.

Magical and Nature Score

\[ \text{Magical} = 3423 + 8 \times \text{Nature} + \varepsilon \]

\[ \varepsilon \sim N(0, 3.24) \]

Here \(3.24\) is the variance, so the standard deviation is \(\sqrt{3.24} = 1.8\).

Simulating \(N(0, 3.24)\)

Code
rnorm(1, 0, sqrt(3.24))
#> [1] -0.02052839

Collecting

Code
unicorns(10) |> select(Nature_Score, Magical_Score)

DGP of Magical Score 1

Code
ggplot(unicorns(500), aes(Magical_Score)) +
  geom_density()

DGP of Magical Score 2

Code
ggplot(unicorns(500), aes(Magical_Score)) +
  geom_density()

Estimating \(\beta_1\) via lm

Code
u1 <- unicorns(500)
lm(Magical_Score ~ Nature_Score, u1)
#> 
#> Call:
#> lm(formula = Magical_Score ~ Nature_Score, data = u1)
#> 
#> Coefficients:
#>  (Intercept)  Nature_Score  
#>     3427.447         7.995

Collecting a new sample

Code
u2 <- unicorns(500)
lm(Magical_Score ~ Nature_Score, u2)
#> 
#> Call:
#> lm(formula = Magical_Score ~ Nature_Score, data = u2)
#> 
#> Coefficients:
#>  (Intercept)  Nature_Score  
#>     3420.995         8.002

Collecting a new sample

Code
u3 <- unicorns(500)
lm(Magical_Score ~ Nature_Score, u3)
#> 
#> Call:
#> lm(formula = Magical_Score ~ Nature_Score, data = u3)
#> 
#> Coefficients:
#>  (Intercept)  Nature_Score  
#>     3424.498         7.998

Collecting a new sample

Code
u4 <- unicorns(500)
lm(Magical_Score ~ Nature_Score, u4)
#> 
#> Call:
#> lm(formula = Magical_Score ~ Nature_Score, data = u4)
#> 
#> Coefficients:
#>  (Intercept)  Nature_Score  
#>     3422.049         8.001

Replicating Processes

Code
replicate(N, CODE)
  • N: number of times to repeat a process
  • CODE: the expression to be repeated
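For instance, a minimal sketch repeating a random calculation three times:

Code
# Each of the 3 repetitions draws a fresh sample of size 5
replicate(3, mean(rnorm(5)))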

Extracting \(\hat \beta\) Coefficients

Code
b(MODEL, INDEX)
  • MODEL: a model that can be used to extract components
  • INDEX: which coefficient you want to extract
    • 0: Intercept
    • 1: first slope
    • 2: second slope
    • ...
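The b function is a course helper, so its exact source is not shown here; one possible definition (an assumption, not the actual implementation) built on base R’s coef is:

Code
# Hypothetical sketch of b(): coef() returns the estimates starting at the
# intercept, so INDEX 0 maps to position 1, INDEX 1 to position 2, and so on
b <- function(MODEL, INDEX) unname(coef(MODEL)[INDEX + 1])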

Collecting 1000 Samples

Code
betas <- replicate(1000,
                   b(lm(Magical_Score ~ Nature_Score, unicorns(500)), 1))

betas
#>    [1] 7.997632 7.997991 7.998313 8.001240 7.995315 8.001904 7.998877 8.000884
#>    [9] 8.004513 7.994004 7.998884 7.999972 8.002184 8.007035 7.990211 7.993707
#>   [17] 7.996064 7.999105 8.001524 8.000990 7.998822 8.000173 7.997829 8.004573
#>  ... (remaining 976 estimates omitted; all fall near the true slope of 8)

Distributions of \(\hat \beta_1\)

Code
ggplot(data.frame(x = betas), aes(x = x)) +
  geom_density()
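The spread of this distribution is the standard error of \(\hat\beta_1\), and its center should sit near the true slope; both can be summarized directly from the simulated betas:

Code
mean(betas)  # centered near the true slope of 8
sd(betas)    # the simulation-based standard error of the slope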

Central Limit Theorem

  • Sampling Distribution

  • Simulating Unicorns

  • Central Limit Theorem

  • Common Sampling Distributions

  • Sampling Distributions for Regression Models

Central Limit Theorem

The Central Limit Theorem (CLT) is a fundamental concept in probability and statistics. It states that the distribution of the sum (or average) of a large number of independent, identically distributed (i.i.d.) random variables will be approximately normal, regardless of the underlying distribution of those individual variables.

Formal Statement of the CLT

  • Let \(X_1\), \(X_2\), …, \(X_n\) be a sequence of i.i.d. random variables with mean \(\mu\) and standard deviation \(\sigma\).
  • Let \(\bar X\) be the sample mean of these variables.
  • As n (the sample size) approaches infinity, the distribution of \(\bar X\) approaches a normal distribution with:
    • Mean: \(\mu\)
    • Standard Deviation: \(\sigma/\sqrt{n}\)

CLT Example

  • Imagine: You’re flipping a fair coin many times.
    • Each flip is an independent event (heads or tails).
    • The probability of heads/tails is the same for each flip.
  • Now: Calculate the average number of heads after each set of 10 flips, then each set of 100 flips, and so on.
  • Observation: As the number of flips in each set increases, the distribution of these averages will start to resemble a bell-shaped curve (normal distribution), even though the individual coin flips are not normally distributed.
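The coin-flip experiment above is easy to run (a sketch using only base R): simulate the proportion of heads in sets of 10 and 100 flips and compare the spread of the averages.

Code
# Proportion of heads in 500 sets of 10 flips and 500 sets of 100 flips
p10  <- replicate(500, mean(rbinom(10,  1, 0.5)))
p100 <- replicate(500, mean(rbinom(100, 1, 0.5)))

# Both center at 0.5, but the larger sets vary less and look more bell-shaped
sd(p10)   # roughly sqrt(0.25 / 10)
sd(p100)  # roughly sqrt(0.25 / 100)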

CLT Implications

  • Approximation: Even if the underlying data is not normally distributed, the distribution of the sample means will be approximately normal for large enough sample sizes.
  • Practical Rule: A common rule of thumb is that the sample size (n) should be at least 30 for the CLT to provide a good approximation. However, this is a guideline, and the actual required sample size can vary depending on the shape of the original distribution.

Normal Example \(n = 10\)

Simulating 500 samples of size 10 from a normal distribution with mean 5 and standard deviation of 2.

Code
#rnorm(10, 5, 2)
sims <- replicate(500, rnorm(10, 5, 2))
sims_mean <- colMeans(sims)
ggplot(data.frame(x = sims_mean), aes(x)) +
  geom_density() +
  stat_function(fun = dnorm, 
                args = list(mean = 5, sd = 2 / sqrt(10)),
                col = "red")

Normal Example \(n = 30\)

Simulating 500 samples of size 30 from a normal distribution with mean 5 and standard deviation of 2.

Code
# rnorm(30, 5, 2)
sims <- replicate(500, rnorm(30, 5, 2))
sims_mean <- colMeans(sims)
ggplot(data.frame(x = sims_mean), aes(x)) +
  geom_density() +
  stat_function(fun = dnorm, 
                args = list(mean = 5, sd = 2 / sqrt(30)),
                col = "red")

Normal Example \(n = 50\)

Simulating 500 samples of size 50 from a normal distribution with mean 5 and standard deviation of 2.

Code
# rnorm(50, 5, 2)
sims <- replicate(500, rnorm(50, 5, 2))
sims_mean <- colMeans(sims)
ggplot(data.frame(x = sims_mean), aes(x)) +
  geom_density() +
  stat_function(fun = dnorm, 
                args = list(mean = 5, sd = 2 / sqrt(50)),
                col = "red")

Normal Example \(n = 100\)

Simulating 500 samples of size 100 from a normal distribution with mean 5 and standard deviation of 2.

Code
# rnorm(100, 5, 2)
sims <- replicate(500, rnorm(100, 5, 2))
sims_mean <- colMeans(sims)
ggplot(data.frame(x = sims_mean), aes(x)) +
  geom_density() +
  stat_function(fun = dnorm, 
                args = list(mean = 5, sd = 2 / sqrt(100)),
                col = "red")
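All four examples above start from a normal DGP, where the result is exact. A skewed starting point (an Exponential(1) distribution, chosen here as an illustrative assumption) shows the approximation side of the CLT using the same plotting pattern:

Code
# Means of 500 samples of size 50 from a skewed Exponential(1) distribution
sims <- replicate(500, rexp(50, rate = 1))
sims_mean <- colMeans(sims)

# The sample means are approximately N(1, 1 / sqrt(50)) despite the skew
ggplot(data.frame(x = sims_mean), aes(x)) +
  geom_density() +
  stat_function(fun = dnorm, 
                args = list(mean = 1, sd = 1 / sqrt(50)),
                col = "red")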

Common Sampling Distributions

  • Sampling Distribution

  • Simulating Unicorns

  • Central Limit Theorem

  • Common Sampling Distributions

  • Sampling Distributions for Regression Models

Normal DGP

When the data are generated from a normal distribution (a normal DGP), the statistics below have exactly known sampling distributions involving the mean and standard deviation, regardless of sample size.

Statistics

Mean \[ \bar X = \frac{1}{n}\sum ^n_{i=1} X_i \]

Variance \[ s^2 = \frac{1}{n-1}\sum ^n_{i=1} (X_i - \bar X)^2 \]

When the true \(\mu\) and \(\sigma\) are known

A data sample of size \(n\) is generated from: \[ X_i \sim N(\mu, \sigma) \]

Distribution of \(\bar X\)

\[ \bar X \sim N(\mu, \sigma/\sqrt{n}) \]

Distribution of Z

\[ Z = \frac{\bar X - \mu}{\sigma/\sqrt{n}} \sim N(0,1) \]
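A quick simulation check of this result (a sketch using only base R): standardize many sample means and verify the mean and spread of a standard normal.

Code
# 1000 samples of size 25 from N(5, 2); standardize each sample mean
z <- replicate(1000, (mean(rnorm(25, 5, 2)) - 5) / (2 / sqrt(25)))

mean(z)  # close to 0
sd(z)    # close to 1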

When the true \(\mu\) and \(\sigma\) are unknown

A data sample of size \(n\) is generated from: \[ X_i \sim N(\mu, \sigma) \]

Distribution of \(s^2\) (unknown \(\mu\))

\[ (n-1)s^2/\sigma^2 \sim \chi^2(n-1) \]

Distribution of Z (unknown \(\sigma\))

\[ Z = \frac{\bar X - \mu}{\sigma/\sqrt{n}} \rightarrow \frac{\bar X - \mu}{s/\sqrt{n}} \sim t(n-1) \]

Sampling Distributions for Regression Models

  • Sampling Distribution

  • Simulating Unicorns

  • Central Limit Theorem

  • Common Sampling Distributions

  • Sampling Distributions for Regression Models

Regression Coefficients

The estimates of regression coefficients (slopes) have a distribution!

Depending on the outcome (and hence the model), we will work with one of two reference distributions: normal or \(t\).

Linear Regression

\[ \frac{\hat\beta_j-\beta_j}{\mathrm{se}(\hat\beta_j)} \sim t_{n-p^\prime} \]

where \(p^\prime\) is the number of estimated coefficients (including the intercept).

\(\beta_j = 0\)

\[ \frac{\hat\beta_j}{\mathrm{se}(\hat\beta_j)} \sim t_{n-p^\prime} \]
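R reports exactly these \(t\) statistics in the summary of a fitted model; a small sketch with a built-in data set (mtcars, used here only for illustration):

Code
fit <- lm(mpg ~ wt, data = mtcars)

# The "t value" column is beta-hat / se(beta-hat);
# p-values come from t with n - p' degrees of freedom
summary(fit)$coefficients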

Logistic Regression

\[ \frac{\hat\beta_j - \beta_j}{\mathrm{se}(\hat\beta_j)} \sim N(0,1) \]

\(\beta_j = 0\)

\[ \frac{\hat\beta_j}{\mathrm{se}(\hat\beta_j)} \sim N(0,1) \]