Sampling Distribution

  • Sampling Distribution

  • Simulating Unicorns

  • Central Limit Theorem

  • Common Sampling Distributions

  • Sampling Distributions for Regression Models

Sampling Distribution

A sampling distribution is the idea that the statistics you compute (such as slopes and intercepts) have their own data-generating process.

In other words, the numerical values you obtain from the lm and glm functions would be different if we had collected a different data set.

Some values will be more common than others. Because of this, the statistics have their own data-generating process, just as the outcome of interest has its own data-generating process.

Sampling Distributions

  • Distribution of a statistic over repeated samples

  • Different Samples yield different statistics

Standard Error

The standard error (SE) is the standard deviation of a statistic, that is, the standard deviation of its sampling distribution.

SE tells us how much a statistic varies from sample to sample. Smaller SE = more precision.
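
As a minimal sketch of this idea, assume a made-up population that is normal with mean 50 and standard deviation 10 (illustrative numbers, not taken from any data set here). Drawing many samples, computing the mean of each, and taking the standard deviation of those means gives a simulated SE, which can be compared with the textbook value \(\sigma/\sqrt{n}\) for a sample mean.

```r
set.seed(1)  # illustrative seed so the sketch is reproducible

n <- 25  # size of each sample (assumed)
# 5000 sample means, each computed from a fresh sample of size n
sample_means <- replicate(5000, mean(rnorm(n, mean = 50, sd = 10)))

# The standard error is the standard deviation of the statistic
se_simulated <- sd(sample_means)
# For a sample mean, theory gives sigma / sqrt(n)
se_theory <- 10 / sqrt(n)

c(simulated = se_simulated, theory = se_theory)
```

The two numbers should be close; increasing n makes both shrink, which is the "more precision" mentioned above.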

Modelling the Data

\[ Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \]

  • \(Y_i\): Outcome data
  • \(X_i\): Predictor data
  • \(\beta_0, \beta_1\): parameters
  • \(\varepsilon_i\): error term

Error Term

\[ \varepsilon_i \sim DGP \]

Randomness Effect

The randomness effect is a sampling phenomenon: every time you sample a population, you get a different sample.

Getting different samples means you will get different statistics.

These statistics will have a distribution of their own.

Randomness Effect 1

Code
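library(ggplot2)    # ggplot(), geom_point(), annotate()
library(tibble)     # tibble()
library(latex2exp)  # TeX()
# b(MODEL, INDEX) extracts a coefficient from a fitted model (index 1 = first slope)

# Simulate 1000 observations from y = 4 + 5x + noise
x <- rnorm(1000)
y <- 4 + 5 * x + rnorm(1000)
# Estimated slope, rounded to two decimals
bb <- round(b(lm(y ~ x), 1), 2)
# Scatter plot annotated with the estimated slope
ggplot(tibble(x = x, y = y), aes(x, y)) +
  geom_point() +
  annotate("text",
           x = -1, y = 15,
           label = TeX(sprintf(r'($\hat{\beta} = %g$)', bb),
                       output = "character"),
           parse = TRUE,
           size = 8)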

Randomness Effect 2

Code
x <- rnorm(1000)
y <- 4 + 5 * x + rnorm(1000)
bb <- round(b(lm(y ~ x),1),2)
ggplot(tibble(x = x, y = y), aes(x,y)) +
  geom_point() +
  annotate("text", 
           x = -1, y = 15, 
           label = TeX(sprintf(r'($\hat{\beta} = %g$)', bb),
                       output = "character"),
           parse = TRUE,
           size = 8) 

Randomness Effect 3

Code
x <- rnorm(1000)
y <- 4 + 5 * x + rnorm(1000)
bb <- round(b(lm(y ~ x),1),2)
ggplot(tibble(x = x, y = y), aes(x,y)) +
  geom_point() +
  annotate("text", 
           x = -1, y = 15, 
           label = TeX(sprintf(r'($\hat{\beta} = %g$)', bb),
                       output = "character"),
           parse = TRUE,
           size = 8) 

Randomness Effect 4

Code
x <- rnorm(1000)
y <- 4 + 5 * x + rnorm(1000)
bb <- round(b(lm(y ~ x),1),2)
ggplot(tibble(x = x, y = y), aes(x,y)) +
  geom_point() +
  annotate("text", 
           x = -1, y = 15, 
           label = TeX(sprintf(r'($\hat{\beta} = %g$)', bb),
                       output = "character"),
           parse = TRUE,
           size = 8) 

Randomness Effect 5

Code
x <- rnorm(1000)
y <- 4 + 5 * x + rnorm(1000)
bb <- round(b(lm(y ~ x),1),2)
ggplot(tibble(x = x, y = y), aes(x,y)) +
  geom_point() +
  annotate("text", 
           x = -1, y = 15, 
           label = TeX(sprintf(r'($\hat{\beta} = %g$)', bb),
                       output = "character"),
           parse = TRUE,
           size = 8) 

Simulating Unicorns

To better understand how statistics vary from sample to sample, let’s simulate a data set of unicorn characteristics and visualize that variation.

We will simulate a data set using the unicorns function; we only need to specify how many unicorns to simulate.

Simulating Unicorn Data

Code
unicorns(10)
#>    Unicorn_ID Age      Gender  Color Type_of_Unicorn Type_of_Horn Horn_Length
#> 1           1  18  Non-binary Silver         Rainbow   Aquamarine    4.988714
#> 2           2  16        Male   Gold           Ember         Opal    5.072202
#> 3           3   4 Genderfluid Silver           Ember         Opal    5.011309
#> 4           4   3 Genderfluid  Black         Rainbow         Opal    5.208816
#> 5           5  11        Male   Pink         Rainbow   Aquamarine    5.155314
#> 6           6  12      Female   Gray           Jewel         Opal    5.452588
#> 7           7  15     Agender  White         Rainbow   Aquamarine    4.840136
#> 8           8  11        Male  Black           Ember   Aquamarine    4.603378
#> 9           9   7 Genderfluid  Brown         Rainbow         Opal    4.862706
#> 10         10  14      Female  White           Jewel   Aquamarine    5.165843
#>    Horn_Strength    Weight Health_Score Personality_Score Magical_Score
#> 1       29.08917 158.24746            9        1.24258693      11178.12
#> 2       28.79146  98.13809            5        1.21624385      11174.57
#> 3       24.38329 118.56501            7        0.41849898      10855.76
#> 4       30.60107 102.44132            1        0.05771830      10774.50
#> 5       30.74040  97.73896            5        0.16768542      10977.25
#> 6       26.79192 162.32819            6        0.02227278      11087.51
#> 7       32.36461 130.40431           10        2.00743797      11102.24
#> 8       29.55155 153.61094            3        2.17824506      10986.32
#> 9       27.69844 129.47895            7        0.71708960      10892.58
#> 10      30.19733  67.38834            1        0.53934383      11116.27
#>    Elusiveness_Score Gentleness_Score Nature_Score
#> 1           36.01488         7.821928     969.6667
#> 2           36.69148        15.658062     968.9276
#> 3           39.46995         5.539503     929.3217
#> 4           36.08758        13.870742     918.8624
#> 5           35.82860        55.510164     944.4110
#> 6           30.99412        72.735760     957.9081
#> 7           38.22692         7.780179     959.7173
#> 8           39.41312        -1.743923     945.2409
#> 9           29.93956        15.172132     933.7903
#> 10          32.67157        50.991588     961.5624

Unicorn Data Variables

Code
names(unicorns(10))
#>  [1] "Unicorn_ID"        "Age"               "Gender"           
#>  [4] "Color"             "Type_of_Unicorn"   "Type_of_Horn"     
#>  [7] "Horn_Length"       "Horn_Strength"     "Weight"           
#> [10] "Health_Score"      "Personality_Score" "Magical_Score"    
#> [13] "Elusiveness_Score" "Gentleness_Score"  "Nature_Score"

We will only look at Magical_Score and Nature_Score.

Magical and Nature Score

\[ Magical = 3423 + 8 \times Nature + \varepsilon \]

\[ \varepsilon \sim N(0, 3.24) \]

Here 3.24 is the variance of the error term, so its standard deviation is \(\sqrt{3.24} = 1.8\).

Simulating \(N(0, 3.24)\)

Code
rnorm(1, 0, sqrt(3.24))
#> [1] -4.456811
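
The model for the Magical score can also be simulated by hand. The sketch below is only a stand-in and makes no claim about how unicorns() works internally: the Nature scores are drawn uniformly between 900 and 1000 purely for illustration, and the names nature, epsilon and magical are hypothetical.

```r
set.seed(2)  # illustrative seed

n <- 500
# Assumption: Nature scores spread uniformly over 900 to 1000 (illustration only)
nature <- runif(n, min = 900, max = 1000)
# Error term: variance 3.24, so standard deviation sqrt(3.24) = 1.8
epsilon <- rnorm(n, mean = 0, sd = sqrt(3.24))
# Outcome built from the stated model
magical <- 3423 + 8 * nature + epsilon

fit <- lm(magical ~ nature)
coef(fit)  # estimates should land near 3423 and 8
```

Because the data are generated from known parameters, the fitted coefficients can be checked directly against the truth.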

Collecting a sample

Code
unicorns(10) |> select(Nature_Score, Magical_Score)
#>    Nature_Score Magical_Score
#> 1      915.4536      10751.59
#> 2      977.5376      11242.84
#> 3      959.3878      11096.57
#> 4      966.8805      11157.59
#> 5      915.4945      10747.38
#> 6      971.3866      11194.83
#> 7      950.1761      11023.78
#> 8      959.9471      11102.46
#> 9      923.4441      10808.77
#> 10     929.2300      10856.48

DGP of Magical Score 1

Code
ggplot(unicorns(500), aes(Magical_Score)) +
  geom_density()

DGP of Magical Score 2

Code
ggplot(unicorns(500), aes(Magical_Score)) +
  geom_density()

Estimating \(\beta_1\) via lm

Code
u1 <- unicorns(500)
lm(Magical_Score ~ Nature_Score, u1)
#> 
#> Call:
#> lm(formula = Magical_Score ~ Nature_Score, data = u1)
#> 
#> Coefficients:
#>  (Intercept)  Nature_Score  
#>         3423             8

Collecting a new sample

Code
u2 <- unicorns(500)
lm(Magical_Score ~ Nature_Score, u2)
#> 
#> Call:
#> lm(formula = Magical_Score ~ Nature_Score, data = u2)
#> 
#> Coefficients:
#>  (Intercept)  Nature_Score  
#>     3416.602         8.007

Collecting a new sample

Code
u3 <- unicorns(500)
lm(Magical_Score ~ Nature_Score, u3)
#> 
#> Call:
#> lm(formula = Magical_Score ~ Nature_Score, data = u3)
#> 
#> Coefficients:
#>  (Intercept)  Nature_Score  
#>         3423             8

Collecting a new sample

Code
u4 <- unicorns(500)
lm(Magical_Score ~ Nature_Score, u4)
#> 
#> Call:
#> lm(formula = Magical_Score ~ Nature_Score, data = u4)
#> 
#> Coefficients:
#>  (Intercept)  Nature_Score  
#>      3413.31          8.01

Replicating Processes

Code
replicate(N, CODE)
  • N: number of times to repeat a process
  • CODE: the code to be repeated
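
A small illustration of replicate, using only base R: the expression in the second argument is re-evaluated on every repetition, so each repetition draws new random values.

```r
set.seed(3)  # illustrative seed

# Repeat "draw 10 standard normal values and average them" 5 times
means <- replicate(5, mean(rnorm(10)))
means          # a numeric vector with one mean per repetition
length(means)  # 5
```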

Extracting \(\hat \beta\) Coefficients

Code
b(MODEL, INDEX)
  • MODEL: a fitted model from which coefficients are extracted
  • INDEX: which coefficient to extract
    • 0: Intercept
    • 1: first slope
    • 2: second slope
    • ...
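
The definition of b() is not shown here, so the following is only a guess at equivalent behaviour built on base R's coef(). The name b_sketch is hypothetical, and the real b() may handle more model types or edge cases.

```r
# Hypothetical stand-in for b(): index 0 is the intercept, 1 the first slope, ...
b_sketch <- function(model, index) {
  # coef() returns c(intercept, slope1, slope2, ...); R vectors are 1-based,
  # so the requested index is shifted by one
  unname(coef(model)[index + 1])
}

fit <- lm(mpg ~ wt + hp, data = mtcars)  # built-in example data
b_sketch(fit, 0)  # intercept
b_sketch(fit, 1)  # slope for wt
```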

Collecting 1000 Samples

Code
betas <- replicate(1000,
                   b(lm(Magical_Score ~ Nature_Score, unicorns(500)), 1))

betas
#>    [1] 8.006005 8.000749 8.000242 8.000970 7.999794 8.004019 7.995709 8.004831
#>    [9] 8.007596 7.999619 7.997973 7.997775 7.997454 7.998365 7.998978 7.996386
#>   [17] 8.001305 8.004366 8.000576 7.994035 7.996230 7.996973 7.997003 8.006913
#>   [25] 8.003995 8.000555 7.991549 7.998029 8.007378 7.997363 7.993366 8.004517
#>   [33] 8.006399 8.007510 8.006994 8.000419 8.004252 7.986660 8.005040 8.001847
#>   [41] 8.000644 8.001306 7.994865 8.000104 7.994877 7.996606 8.000823 8.002073
#>   [49] 7.999559 7.995925 7.999690 7.999020 8.009687 7.999169 7.996707 7.994817
#>   [57] 7.997977 7.993292 7.994568 8.003915 8.001575 7.996282 8.000760 7.998352
#>   [65] 7.999548 7.998483 7.996060 8.001019 8.005415 7.999924 7.994840 7.996303
#>   [73] 8.002643 7.995999 7.995157 8.004132 7.997768 8.000082 8.001275 7.993533
#>   [81] 8.006586 8.007608 7.997862 7.997768 7.994237 8.000904 7.998071 7.991187
#>   [89] 7.992225 8.002259 7.996533 8.001052 7.996837 8.004348 7.998156 7.995158
#>   [97] 8.000112 8.000963 8.003050 7.991453 8.004259 7.989114 7.997063 7.995339
#>  [105] 7.997602 7.994243 7.998824 7.998586 7.996620 8.003392 8.000615 8.001509
#>  [113] 7.996245 7.997885 8.000442 7.993744 8.001818 8.002449 7.998761 8.005148
#>  [121] 8.000152 7.997489 8.006753 8.002872 8.003703 7.997286 7.994461 7.996828
#>  [129] 8.002158 8.000710 7.999843 7.997874 8.000839 8.001316 7.994218 7.999182
#>  [137] 8.001597 8.001391 7.999268 8.005483 8.001285 7.995361 8.001581 8.006647
#>  [145] 7.999567 7.999511 8.000324 8.003506 8.000902 8.000685 8.004237 8.002677
#>  [153] 7.999206 8.004737 7.994880 7.999574 8.006911 7.995514 7.999009 8.003851
#>  [161] 8.002484 8.004293 8.001428 8.000410 7.999496 8.003297 8.000536 8.002892
#>  [169] 7.999473 7.994189 7.997220 7.999287 7.997034 8.002018 7.998543 8.002261
#>  [177] 7.995196 7.998689 8.001253 7.995664 7.998828 8.002591 8.004963 8.003620
#>  [185] 8.004057 7.998868 7.996352 7.998437 7.996710 8.004129 8.002998 8.005867
#>  [193] 7.991396 8.007123 7.997639 8.002234 7.998377 7.998766 8.002824 8.003768
#>  [201] 7.995479 8.006218 7.998912 7.994977 7.998956 8.003910 8.001464 8.002885
#>  [209] 7.999469 8.005164 8.003965 8.001886 8.000391 7.999974 7.995777 8.001860
#>  [217] 7.998013 7.999373 8.008529 8.002646 7.996948 8.010393 8.002950 8.002737
#>  [225] 8.004115 8.003315 8.003611 8.001848 8.002638 8.003148 8.003750 7.997785
#>  [233] 8.010139 7.996002 7.997798 8.001096 8.007033 7.995334 7.996814 7.995428
#>  [241] 8.001995 7.993982 8.006621 7.993753 7.999565 8.004752 8.004959 8.000594
#>  [249] 7.991523 7.998524 7.996239 7.999738 7.996767 7.998253 8.002541 7.991995
#>  [257] 7.993276 7.997417 8.006750 8.005255 7.997924 7.997751 7.997938 7.999794
#>  [265] 7.995946 8.000156 8.003493 7.998647 8.002542 7.993344 7.999939 7.993507
#>  [273] 7.998779 8.002759 7.997687 8.002172 8.004199 8.001715 8.001212 8.005758
#>  [281] 7.993880 8.004775 7.998163 8.007551 8.001031 7.998477 7.996789 7.998817
#>  [289] 7.995704 7.999332 7.999271 8.001597 8.001965 7.995646 8.006281 8.007053
#>  [297] 7.994469 7.998912 8.001451 7.993374 7.996599 8.003826 7.991420 7.999191
#>  [305] 8.002195 8.000359 7.997842 8.007016 8.004230 7.999342 8.000073 7.999261
#>  [313] 8.003717 7.999851 7.998204 7.998501 7.999636 8.005209 8.002458 7.999753
#>  [321] 7.998577 7.995993 7.998715 8.000532 8.005190 8.003919 8.000339 8.004897
#>  [329] 8.001475 8.006686 8.003568 7.998474 8.010947 8.009024 8.002097 8.001776
#>  [337] 7.998887 8.002130 7.995754 7.991093 8.002153 8.002419 8.002633 8.000632
#>  [345] 8.001994 7.997948 8.002352 8.000983 7.997894 8.002909 8.000952 7.995252
#>  [353] 7.998691 8.003381 7.996509 7.998064 8.005946 8.000778 7.998161 7.997987
#>  [361] 8.004676 7.992660 8.002370 7.996660 7.997940 8.000988 8.004288 8.001952
#>  [369] 7.999146 7.999341 8.003840 8.000806 8.002022 8.005483 8.003165 8.000701
#>  [377] 8.006683 8.009538 8.004917 8.003710 8.000014 7.995408 8.002574 8.000452
#>  [385] 7.997078 8.003276 8.005062 7.996275 8.000037 8.010198 8.002064 8.008357
#>  [393] 7.997277 8.008149 8.004711 7.996503 7.994436 7.993485 7.999311 8.009997
#>  [401] 8.004745 7.987293 8.001589 8.002879 7.995938 8.006645 7.992860 7.995130
#>  [409] 8.001504 7.992851 8.004797 8.005040 8.007440 7.997578 7.987032 8.003157
#>  [417] 8.004510 8.000624 7.998198 8.005845 7.999626 7.998326 7.997761 7.997089
#>  [425] 8.001810 7.997948 8.004894 7.992254 8.002345 7.997851 8.004147 7.994927
#>  [433] 7.998588 7.994647 7.993691 8.003844 7.997349 8.011645 8.006924 7.994578
#>  [441] 7.995582 7.998880 7.999336 7.992738 7.993633 8.002799 8.001707 7.992957
#>  [449] 8.000315 8.003799 8.000724 7.996752 8.000551 7.991191 7.998707 8.002051
#>  [457] 8.003549 7.997704 7.999792 7.999042 8.000836 8.004398 7.999083 7.998689
#>  [465] 8.004498 7.996733 8.001847 7.998845 8.010162 8.002379 7.997016 7.996539
#>  [473] 7.999495 7.996435 7.994657 8.011524 7.999356 8.006519 7.988773 8.000225
#>  [481] 8.002975 7.998449 8.003293 8.001882 8.001251 8.001482 7.999436 8.000665
#>  [489] 8.004600 8.004074 7.998633 8.001669 7.998561 7.998467 7.998123 7.998541
#>  [497] 8.001742 7.998274 7.996458 7.998605 7.998043 7.999703 8.010412 7.998309
#>  [505] 8.002328 7.994785 7.994667 7.996487 8.000806 7.997448 7.999411 8.003865
#>  [513] 8.000105 7.999801 7.999720 7.999751 7.999840 8.004566 8.003932 7.999072
#>  [521] 8.006003 7.996394 7.999593 7.998265 8.000048 8.000101 7.991196 7.997728
#>  [529] 7.997101 7.998847 7.992676 8.000998 7.998463 8.002438 8.004156 8.000932
#>  [537] 8.003173 7.998595 8.002036 8.002978 7.999782 7.996014 7.996475 8.004365
#>  [545] 7.997711 8.003079 8.002907 7.999365 7.993490 7.992843 8.001101 7.996941
#>  [553] 8.003964 8.005878 8.005543 7.993178 8.003159 7.995026 7.996859 8.002714
#>  [561] 8.002406 8.006406 7.995205 7.988903 7.994209 8.004422 8.002465 7.999083
#>  [569] 8.001943 8.006390 7.995300 8.000965 8.002986 8.003540 7.998119 8.002417
#>  [577] 8.000901 8.000121 8.001069 8.009597 8.001675 8.001479 7.993162 8.009085
#>  [585] 8.003806 8.002336 8.004740 7.997728 8.000490 8.003730 7.996883 8.003166
#>  [593] 7.992441 8.003567 7.993107 7.997414 7.996661 7.997489 7.996555 8.000574
#>  [601] 7.998582 8.004767 8.003132 7.997465 8.002089 7.999709 7.999259 7.995737
#>  [609] 7.997047 8.000338 8.006531 8.006078 8.003725 7.997750 8.003000 8.000018
#>  [617] 8.005052 7.997268 8.001639 7.990979 8.001252 7.998654 7.995711 8.002732
#>  [625] 7.996906 7.999097 7.999740 7.998948 7.995581 8.000313 8.006183 7.999571
#>  [633] 8.005510 8.003471 8.006436 8.005281 8.003222 8.001325 7.990984 7.998500
#>  [641] 8.002261 8.001133 7.992839 8.003087 7.994868 7.996228 8.001203 7.998164
#>  [649] 8.002304 8.004581 8.005261 7.996005 7.999089 7.995226 7.997893 7.996988
#>  [657] 8.004387 7.992728 8.004273 7.991483 8.007438 8.003723 7.996726 7.998202
#>  [665] 7.995578 7.999798 7.998169 7.999947 7.999002 8.001032 8.000137 8.000689
#>  [673] 8.001463 7.990861 7.996499 8.000715 7.994415 8.004065 8.000141 8.001473
#>  [681] 8.003729 8.005708 7.996172 7.995647 8.007183 7.996375 7.995762 7.996956
#>  [689] 8.005013 8.001118 7.997133 8.002603 7.995251 8.000704 8.004775 7.992064
#>  [697] 7.999326 8.001423 8.001110 8.010605 8.005676 7.998125 8.005279 7.997307
#>  [705] 7.999588 7.996618 7.993493 7.999982 8.007297 7.998640 8.002334 8.006300
#>  [713] 7.999717 7.995867 7.999067 7.993488 7.998045 7.999077 8.004019 7.992221
#>  [721] 8.000816 7.997677 8.007396 8.001282 7.996884 7.992911 7.996625 7.995049
#>  [729] 8.005926 8.000446 8.004369 8.002994 7.992723 7.998204 7.991110 7.998333
#>  [737] 7.992127 8.001307 8.003563 7.996561 7.997971 7.991682 8.013060 8.000834
#>  [745] 7.997423 7.992637 7.996278 8.006671 7.997989 7.997439 7.993536 8.000334
#>  [753] 7.997850 8.004511 8.001089 8.000544 7.995928 7.999231 8.000098 8.006629
#>  [761] 8.002121 8.007308 8.005817 8.001170 7.994373 8.000899 8.003127 8.004416
#>  [769] 7.994818 8.006052 8.001788 8.002023 8.000535 7.999967 7.993877 7.999782
#>  [777] 8.000256 8.001998 8.004530 7.999223 7.992855 8.001456 8.002113 8.001354
#>  [785] 7.999032 8.003057 7.996276 8.004822 7.998606 8.001770 7.998497 7.998044
#>  [793] 7.999154 7.997282 7.999231 8.008736 7.999926 7.998852 7.997421 8.004628
#>  [801] 7.999885 7.998100 7.999401 8.000069 8.004708 8.005711 7.997410 8.003802
#>  [809] 7.997304 8.002205 7.987116 8.010234 8.001602 8.003602 7.995958 8.005756
#>  [817] 7.991758 7.996843 8.003196 8.003607 7.998393 8.004438 8.002108 8.000802
#>  [825] 7.995868 8.002883 8.001276 7.998054 8.000580 7.999039 8.001628 8.001807
#>  [833] 7.995678 8.003286 8.008315 8.003468 7.989349 7.999718 7.996887 7.998978
#>  [841] 8.002041 7.984988 7.994769 8.011143 7.993913 7.998172 8.002448 8.004698
#>  [849] 8.000538 8.003942 7.990700 8.000784 7.998765 8.002061 8.002792 7.997581
#>  [857] 8.007502 8.006741 7.991396 7.996231 8.004555 8.010579 7.998787 7.994651
#>  [865] 7.999804 8.000790 8.000797 7.999258 8.006050 8.001132 7.999770 8.003834
#>  [873] 8.000026 8.006769 7.997277 7.999573 7.998330 7.993863 7.996323 7.995160
#>  [881] 8.004870 7.995001 8.005242 8.000583 8.002539 7.999040 8.002472 8.004648
#>  [889] 8.003831 7.996224 8.002614 7.997678 8.001527 8.008071 8.003216 7.998664
#>  [897] 7.998510 7.997161 7.992343 7.997547 8.000843 7.999923 7.998131 8.000496
#>  [905] 7.999225 8.006539 7.998148 7.994758 8.005142 8.000250 8.000364 7.995563
#>  [913] 8.005904 8.001970 8.004786 7.999882 8.004474 7.992855 7.995388 8.002836
#>  [921] 7.995282 7.996902 7.999073 7.999165 7.996337 7.995192 7.993515 7.998223
#>  [929] 8.006490 8.002603 7.995310 8.004004 8.001495 8.001074 7.994019 8.001612
#>  [937] 7.997956 8.003876 7.993290 8.001043 7.999398 7.997879 8.006828 8.001572
#>  [945] 8.006243 8.000783 8.001805 7.992247 7.996892 7.998589 8.003084 8.001178
#>  [953] 8.005162 7.997835 8.003680 8.004798 8.000865 8.000598 7.998172 8.008019
#>  [961] 8.002210 7.992428 7.993999 8.001785 7.998352 8.004404 8.001358 8.000511
#>  [969] 8.001908 8.002787 8.001974 8.001293 8.000670 7.997405 8.002455 8.005334
#>  [977] 7.998905 8.004823 8.009505 7.999660 7.998636 8.002939 8.002640 8.001892
#>  [985] 8.006981 7.995572 7.999681 8.005191 7.998535 7.997529 7.997061 7.999304
#>  [993] 8.005253 7.998253 7.996060 7.997975 8.001365 7.999579 7.993138 7.996120

Distributions of \(\hat \beta_1\)

Code
ggplot(data.frame(x = betas), aes(x = x)) +
  geom_density()
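
The spread of this density is the standard error of the slope. As a self-contained sketch (it does not call unicorns(); it assumes Nature scores are uniform on 900 to 1000, and one_slope is a hypothetical helper), the standard deviation of many simulated slopes can be compared with the standard error that lm reports from a single sample.

```r
set.seed(4)  # illustrative seed

# Hypothetical helper: simulate one sample from the stated model, return the slope
one_slope <- function(n = 500) {
  nature  <- runif(n, 900, 1000)  # assumed predictor distribution
  magical <- 3423 + 8 * nature + rnorm(n, 0, sqrt(3.24))
  unname(coef(lm(magical ~ nature))[2])
}

slopes <- replicate(1000, one_slope())

mean(slopes)  # centred near the true slope of 8
sd(slopes)    # empirical standard error of the slope

# Standard error estimated by lm from one sample only
nature  <- runif(500, 900, 1000)
magical <- 3423 + 8 * nature + rnorm(500, 0, sqrt(3.24))
se_lm <- summary(lm(magical ~ nature))$coefficients["nature", "Std. Error"]
se_lm
```

In practice only one sample is available, which is why the single-sample estimate se_lm matters: it approximates the spread that would be seen across repeated samples.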

Central Limit Theorem

The Central Limit Theorem (CLT) is a fundamental concept in probability and statistics. It states that the distribution of the sum (or average) of a large number of independent, identically distributed (i.i.d.) random variables will be approximately normal, regardless of the underlying distribution of those individual variables, provided that distribution has a finite variance.

Formal Statement of the CLT

  • Let \(X_1\), \(X_2\), …, \(X_n\) be a sequence of i.i.d. random variables with mean \(\mu\) and standard deviation \(\sigma\).
  • Let \(\bar X\) be the sample mean of these variables.
  • As n (the sample size) approaches infinity, the distribution of \(\bar X\) approaches a normal distribution with:
    • Mean: \(\mu\)
    • Standard Deviation: \(\sigma/\sqrt{n}\)
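
Written in the standardized form that is commonly used for the formal statement, with the same symbols as above (the arrow denotes convergence in distribution):

```latex
\[
\frac{\bar X - \mu}{\sigma/\sqrt{n}} \xrightarrow{d} N(0, 1)
\quad \text{as } n \to \infty,
\qquad \text{where } \bar X = \frac{1}{n}\sum_{i=1}^{n} X_i .
\]
```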

CLT Example

  • Imagine: You’re flipping a fair coin many times.
    • Each flip is an independent event (heads or tails).
    • The probability of heads/tails is the same for each flip.
  • Now: Calculate the proportion of heads in each set of 10 flips, then in each set of 100 flips, and so on.
  • Observation: As the number of flips in each set increases, the distribution of these averages will start to resemble a bell-shaped curve (normal distribution), even though the individual coin flips are not normally distributed.
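
The coin example can be sketched in a few lines of base R. Each flip is coded as 0 (tails) or 1 (heads), so the average of a set is the proportion of heads; the function name prop_heads and the choice of 2000 sets are assumptions made for illustration.

```r
set.seed(5)  # illustrative seed

# Proportion of heads in one set of n fair coin flips
prop_heads <- function(n) mean(rbinom(n, size = 1, prob = 0.5))

# 2000 sets for each set size
props_10  <- replicate(2000, prop_heads(10))
props_100 <- replicate(2000, prop_heads(100))

# Both centre on 0.5; the spread shrinks like sqrt(0.25 / n)
c(mean(props_10), mean(props_100))
c(sd(props_10),  sqrt(0.25 / 10))
c(sd(props_100), sqrt(0.25 / 100))

# hist(props_100) would show the bell shape
```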

CLT Implications

  • Approximation: Even if the underlying data is not normally distributed, the distribution of the sample means will be approximately normal for large enough sample sizes.
  • Practical Rule: A common rule of thumb is that the sample size (n) should be at least 30 for the CLT to provide a good approximation. However, this is a guideline, and the actual required sample size can vary depending on the shape of the original distribution.
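
To see how the required sample size depends on the shape of the original distribution, the sketch below uses a strongly right-skewed population (exponential with rate 1, chosen here only as an example) and measures how skewed the sample means still are. The helpers skewness and means_of are hypothetical names.

```r
set.seed(6)  # illustrative seed

# Sample skewness: third central moment divided by the cubed standard deviation
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3

# Means of 5000 samples of size n from an Exponential(rate = 1) population
means_of <- function(n) replicate(5000, mean(rexp(n, rate = 1)))

skew_small <- skewness(means_of(5))    # still clearly right-skewed
skew_large <- skewness(means_of(100))  # much closer to 0 (symmetric)

c(n_5 = skew_small, n_100 = skew_large)
```

A skewness near 0 indicates a roughly symmetric, bell-like shape; for skewed populations it takes a larger n to get there.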

Normal Example \(n = 10\)

Simulating 500 samples of size 10 from a normal distribution with mean 5 and standard deviation of 2.

Code
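# A single sample would be rnorm(10, 5, 2); replicate() draws 500 of them,
# one per column of a 10 x 500 matrix
sims <- replicate(500, rnorm(10, 5, 2))
sims_mean <- colMeans(sims)  # mean of each sample
ggplot(data.frame(x = sims_mean), aes(x)) +
  geom_density() +                      # simulated sampling distribution
  stat_function(fun = dnorm,            # theoretical normal curve for the mean
                args = list(mean = 5, sd = 2 / sqrt(10)),
                col = "red")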

Normal Example \(n = 30\)

Simulating 500 samples of size 30 from a normal distribution with mean 5 and standard deviation of 2.

Code
# rnorm(30, 5, 2)
sims <- replicate(500, rnorm(30, 5, 2))
sims_mean <- colMeans(sims)
ggplot(data.frame(x = sims_mean), aes(x)) +
  geom_density() +
  stat_function(fun = dnorm, 
                args = list(mean = 5, sd = 2 / sqrt(30)),
                col = "red")

Normal Example \(n = 50\)

Simulating 500 samples of size 50 from a normal distribution with mean 5 and standard deviation of 2.

Code
# rnorm(50, 5, 2)
sims <- replicate(500, rnorm(50, 5, 2))
sims_mean <- colMeans(sims)
ggplot(data.frame(x = sims_mean), aes(x)) +
  geom_density() +
  stat_function(fun = dnorm, 
                args = list(mean = 5, sd = 2 / sqrt(50)),
                col = "red")

Normal Example \(n = 100\)

Simulating 500 samples of size 100 from a normal distribution with mean 5 and standard deviation of 2.

Code
# rnorm(100, 5, 2)
sims <- replicate(500, rnorm(100, 5, 2))
sims_mean <- colMeans(sims)
ggplot(data.frame(x = sims_mean), aes(x)) +
  geom_density() +
  stat_function(fun = dnorm, 
                args = list(mean = 5, sd = 2 / sqrt(100)),
                col = "red")

Common Sampling Distributions

Normal DGP

When the data are generated from a normal distribution (the DGP), the sample mean and the sample standard deviation have known sampling distributions that hold exactly, regardless of sample size.

Statistics

Mean \[ \bar X = \frac{1}{n}\sum ^n_{i=1} X_i \]

Variance \[ s^2 = \frac{1}{n-1}\sum ^n_{i=1} (X_i - \bar X)^2 \]

The standard deviation is \(s = \sqrt{s^2}\).
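
As a quick check of these statistics, the sketch below computes the sample mean as the sum divided by \(n\) and the sample variance with the \(n - 1\) divisor, which is the convention R's var() and sd() use. The data values are made up for illustration.

```r
x <- c(4.2, 5.1, 3.8, 6.0, 5.5)  # made-up data for illustration
n <- length(x)

x_bar <- sum(x) / n                    # sample mean
s2    <- sum((x - x_bar)^2) / (n - 1)  # sample variance (n - 1 divisor)
s     <- sqrt(s2)                      # sample standard deviation

# The hand calculations agree with R's built-in functions
all.equal(x_bar, mean(x))
all.equal(s2, var(x))
all.equal(s, sd(x))
```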

When the true \(\mu\) and \(\sigma\) are known

A data sample of size \(n\) is generated from: \[ X_i \sim N(\mu, \sigma) \] where the second argument is the standard deviation.

Distribution of \(\bar X\)

\[ \bar X \sim N(\mu, \sigma/\sqrt{n}) \]

Distribution of Z

\[ Z = \frac{\bar X - \mu}{\sigma/\sqrt{n}} \sim N(0,1) \]

When the true \(\mu\) and \(\sigma\) are unknown

A data sample of size \(n\) is generated from: \[ X_i \sim N(\mu, \sigma) \]

Distribution of \(s^2\) (unknown \(\mu\))

\[ (n-1)s^2/\sigma^2 \sim \chi^2(n-1) \]
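
A simulation sketch of this result, with assumed values \(n = 10\), \(\mu = 5\) and \(\sigma = 2\): a chi-square distribution with \(n - 1\) degrees of freedom has mean \(n - 1\) and variance \(2(n - 1)\), so the simulated statistic should match both.

```r
set.seed(7)  # illustrative seed

n <- 10; mu <- 5; sigma <- 2  # assumed values for illustration
# (n - 1) * s^2 / sigma^2 for 5000 independent samples of size n
stat <- replicate(5000, (n - 1) * var(rnorm(n, mu, sigma)) / sigma^2)

# A chi-square with n - 1 degrees of freedom has mean n - 1 and variance 2(n - 1)
c(mean(stat), n - 1)
c(var(stat), 2 * (n - 1))
```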

Distribution of Z (unknown \(\sigma\))

\[ Z = \frac{\bar X - \mu}{\sigma/\sqrt{n}} \rightarrow \frac{\bar X - \mu}{s/\sqrt{n}} \sim t(n-1) \]
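
The practical effect of replacing \(\sigma\) with \(s\) is heavier tails, which is easiest to see with a small sample. The sketch below assumes \(n = 5\), \(\mu = 5\) and \(\sigma = 2\), and compares how often the statistic falls beyond \(\pm 1.96\) with what the \(t(n-1)\) and \(N(0,1)\) distributions predict.

```r
set.seed(8)  # illustrative seed

n <- 5; mu <- 5; sigma <- 2  # small n makes the difference visible (assumed values)
t_stat <- replicate(10000, {
  x <- rnorm(n, mu, sigma)
  (mean(x) - mu) / (sd(x) / sqrt(n))  # sigma replaced by s
})

# Share of statistics beyond +/- 1.96
tail_simulated <- mean(abs(t_stat) > 1.96)
tail_t         <- 2 * pt(-1.96, df = n - 1)  # t(n - 1) prediction
tail_normal    <- 2 * pnorm(-1.96)           # N(0, 1) prediction, about 0.05

c(simulated = tail_simulated, t = tail_t, normal = tail_normal)
```

The simulated share should track the t prediction and sit well above the normal one; the gap closes as n grows.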

Sampling Distributions for Regression Models

Regression Coefficients

The estimates of regression coefficients (slopes) have a distribution!

Depending on the outcome, and therefore the model, we work with one of two distributions: the t distribution (linear regression) or the normal distribution (logistic regression).

Linear Regression

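\[ \frac{\hat\beta_j-\beta_j}{\mathrm{se}(\hat\beta_j)} \sim t_{n-p^\prime} \]

Here \(n\) is the sample size and \(p^\prime\) is the number of estimated coefficients, including the intercept.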

Under the null hypothesis \(\beta_j = 0\):

\[ \frac{\hat\beta_j}{\mathrm{se}(\hat\beta_j)} \sim t_{n-p^\prime} \]
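
As a sketch of how this statistic is used, the code below fits a linear model to R's built-in mtcars data (an arbitrary example, unrelated to the unicorn data) and rebuilds the t statistic and two-sided p-value for one slope by hand. The degrees of freedom are the sample size minus the number of estimated coefficients.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)  # built-in example data
coefs <- summary(fit)$coefficients

# Test statistic for the wt slope under beta_j = 0: estimate / standard error
t_wt <- coefs["wt", "Estimate"] / coefs["wt", "Std. Error"]

# Degrees of freedom: n minus the number of estimated coefficients
df_resid <- nrow(mtcars) - length(coef(fit))

# Two-sided p-value from the t distribution
p_wt <- 2 * pt(-abs(t_wt), df = df_resid)

c(t = t_wt, df = df_resid, p = p_wt)
coefs["wt", c("t value", "Pr(>|t|)")]  # same values as reported by summary()
```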

Logistic Regression

\[ \frac{\hat\beta_j - \beta_j}{\mathrm{se}(\hat\beta_j)} \sim N(0,1) \]

This holds approximately, for large samples.

Under the null hypothesis \(\beta_j = 0\):

\[ \frac{\hat\beta_j}{\mathrm{se}(\hat\beta_j)} \sim N(0,1) \]
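
The same calculation for a logistic regression, again sketched on the built-in mtcars data (an arbitrary example): the statistic is compared with the standard normal distribution instead of a t distribution.

```r
# Logistic regression on built-in data: transmission type (am) against weight
fit <- glm(am ~ wt, data = mtcars, family = binomial)
coefs <- summary(fit)$coefficients

# Test statistic under beta_j = 0: estimate / standard error
z_wt <- coefs["wt", "Estimate"] / coefs["wt", "Std. Error"]

# Two-sided p-value from the standard normal distribution
p_wt <- 2 * pnorm(-abs(z_wt))

c(z = z_wt, p = p_wt)
coefs["wt", c("z value", "Pr(>|z|)")]  # same values as reported by summary()
```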