Sampling Distribution
Simulating Unicorns
Central Limit Theorem
Common Sampling Distributions
Sampling Distributions for Regression Models
A sampling distribution captures the idea that the statistics you compute (such as slopes and intercepts) have their own data generating process.
In other words, the numerical values you obtain from the lm and glm functions would be different if you had collected a different data set.
Some values will be more common than others. Because of this, the statistics have their own data generating process, just as the outcome of interest has its own data generating process.
Distribution of a statistic over repeated samples
Different samples yield different statistics
The Standard Error (SE) is the standard deviation of a statistic's sampling distribution.
SE tells us how much a statistic varies from sample to sample. Smaller SE = more precision.
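A minimal sketch of this idea in base R, assuming a normal population with mean 5 and standard deviation 2 and a sample size of 25 (all values chosen only for illustration): draw many samples, compute the mean of each, and take the standard deviation of those means.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

n <- 25         # size of each sample (illustrative choice)
n_reps <- 2000  # number of repeated samples (illustrative choice)

# Draw n_reps samples and keep the mean of each one
sample_means <- replicate(n_reps, mean(rnorm(n, mean = 5, sd = 2)))

# The SE of the mean is the standard deviation of these sample means
se_simulated <- sd(sample_means)

# For a sample mean, theory gives SE = sigma / sqrt(n)
se_theory <- 2 / sqrt(n)

c(simulated = se_simulated, theory = se_theory)
```

The two numbers should be close to each other; increasing `n` makes both smaller, which is the "more precision" described above.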
\[ Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \]
\[ \varepsilon_i \sim DGP \]
The randomness effect is a sampling phenomenon: you will get a different sample every time you sample from a population.
Getting different samples means you will get different statistics.
These statistics will have a distribution of their own.
Simulating Unicorns
To better understand how statistics vary from sample to sample, let’s simulate a data set of unicorn characteristics and visualize the variation.
We will simulate the data set with the unicorns function; we only need to specify how many unicorns to simulate.
#> Unicorn_ID Age Gender Color Type_of_Unicorn Type_of_Horn Horn_Length
#> 1 1 18 Non-binary Silver Rainbow Aquamarine 4.988714
#> 2 2 16 Male Gold Ember Opal 5.072202
#> 3 3 4 Genderfluid Silver Ember Opal 5.011309
#> 4 4 3 Genderfluid Black Rainbow Opal 5.208816
#> 5 5 11 Male Pink Rainbow Aquamarine 5.155314
#> 6 6 12 Female Gray Jewel Opal 5.452588
#> 7 7 15 Agender White Rainbow Aquamarine 4.840136
#> 8 8 11 Male Black Ember Aquamarine 4.603378
#> 9 9 7 Genderfluid Brown Rainbow Opal 4.862706
#> 10 10 14 Female White Jewel Aquamarine 5.165843
#> Horn_Strength Weight Health_Score Personality_Score Magical_Score
#> 1 29.08917 158.24746 9 1.24258693 11178.12
#> 2 28.79146 98.13809 5 1.21624385 11174.57
#> 3 24.38329 118.56501 7 0.41849898 10855.76
#> 4 30.60107 102.44132 1 0.05771830 10774.50
#> 5 30.74040 97.73896 5 0.16768542 10977.25
#> 6 26.79192 162.32819 6 0.02227278 11087.51
#> 7 32.36461 130.40431 10 2.00743797 11102.24
#> 8 29.55155 153.61094 3 2.17824506 10986.32
#> 9 27.69844 129.47895 7 0.71708960 10892.58
#> 10 30.19733 67.38834 1 0.53934383 11116.27
#> Elusiveness_Score Gentleness_Score Nature_Score
#> 1 36.01488 7.821928 969.6667
#> 2 36.69148 15.658062 968.9276
#> 3 39.46995 5.539503 929.3217
#> 4 36.08758 13.870742 918.8624
#> 5 35.82860 55.510164 944.4110
#> 6 30.99412 72.735760 957.9081
#> 7 38.22692 7.780179 959.7173
#> 8 39.41312 -1.743923 945.2409
#> 9 29.93956 15.172132 933.7903
#> 10 32.67157 50.991588 961.5624
#> [1] "Unicorn_ID" "Age" "Gender"
#> [4] "Color" "Type_of_Unicorn" "Type_of_Horn"
#> [7] "Horn_Length" "Horn_Strength" "Weight"
#> [10] "Health_Score" "Personality_Score" "Magical_Score"
#> [13] "Elusiveness_Score" "Gentleness_Score" "Nature_Score"
We will only look at Magical_Score and Nature_Score.
\[ Magical = 3423 + 8 \times Nature + \varepsilon \]
\[ \varepsilon \sim N(0, 3.24) \]
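The model above can be sketched in base R without the unicorns function. Two things here are assumptions, not taken from the source: how Nature scores are generated (a normal distribution with mean 950 and standard deviation 20 is used), and the reading of 3.24 as the standard deviation of the error term.

```r
set.seed(2)  # arbitrary seed so the sketch is reproducible

n_unicorns <- 10

# Assumption: Nature scores are roughly normal around 950 with SD 20;
# the source does not state how Nature_Score is generated.
nature <- rnorm(n_unicorns, mean = 950, sd = 20)

# Assumption: 3.24 is treated as the standard deviation of the error term.
epsilon <- rnorm(n_unicorns, mean = 0, sd = 3.24)

# Magical = 3423 + 8 * Nature + error
magical <- 3423 + 8 * nature + epsilon

sim_data <- data.frame(Nature_Score = nature, Magical_Score = magical)

# Fit the regression and look at the estimated intercept and slope
fit <- lm(Magical_Score ~ Nature_Score, data = sim_data)
coef(fit)
```

The estimated slope should land near 8 but will not equal it exactly, and rerunning with a different seed gives a different estimate.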
#> Nature_Score Magical_Score
#> 1 915.4536 10751.59
#> 2 977.5376 11242.84
#> 3 959.3878 11096.57
#> 4 966.8805 11157.59
#> 5 915.4945 10747.38
#> 6 971.3866 11194.83
#> 7 950.1761 11023.78
#> 8 959.9471 11102.46
#> 9 923.4441 10808.77
#> 10 929.2300 10856.48
lm
N: number of times to repeat a process
CODE: what is to be repeated
MODEL: a model that can be used to extract components
INDEX: which component do you want to use
0: Intercept
1: first slope
2: second slope
...
#> [1] 8.006005 8.000749 8.000242 8.000970 7.999794 8.004019 7.995709 8.004831
#> [9] 8.007596 7.999619 7.997973 7.997775 7.997454 7.998365 7.998978 7.996386
#> [17] 8.001305 8.004366 8.000576 7.994035 7.996230 7.996973 7.997003 8.006913
#> [25] 8.003995 8.000555 7.991549 7.998029 8.007378 7.997363 7.993366 8.004517
#> [33] 8.006399 8.007510 8.006994 8.000419 8.004252 7.986660 8.005040 8.001847
#> [41] 8.000644 8.001306 7.994865 8.000104 7.994877 7.996606 8.000823 8.002073
#> [49] 7.999559 7.995925 7.999690 7.999020 8.009687 7.999169 7.996707 7.994817
#> [57] 7.997977 7.993292 7.994568 8.003915 8.001575 7.996282 8.000760 7.998352
#> [65] 7.999548 7.998483 7.996060 8.001019 8.005415 7.999924 7.994840 7.996303
#> [73] 8.002643 7.995999 7.995157 8.004132 7.997768 8.000082 8.001275 7.993533
#> [81] 8.006586 8.007608 7.997862 7.997768 7.994237 8.000904 7.998071 7.991187
#> [89] 7.992225 8.002259 7.996533 8.001052 7.996837 8.004348 7.998156 7.995158
#> [97] 8.000112 8.000963 8.003050 7.991453 8.004259 7.989114 7.997063 7.995339
#> [105] 7.997602 7.994243 7.998824 7.998586 7.996620 8.003392 8.000615 8.001509
#> [113] 7.996245 7.997885 8.000442 7.993744 8.001818 8.002449 7.998761 8.005148
#> [121] 8.000152 7.997489 8.006753 8.002872 8.003703 7.997286 7.994461 7.996828
#> [129] 8.002158 8.000710 7.999843 7.997874 8.000839 8.001316 7.994218 7.999182
#> [137] 8.001597 8.001391 7.999268 8.005483 8.001285 7.995361 8.001581 8.006647
#> [145] 7.999567 7.999511 8.000324 8.003506 8.000902 8.000685 8.004237 8.002677
#> [153] 7.999206 8.004737 7.994880 7.999574 8.006911 7.995514 7.999009 8.003851
#> [161] 8.002484 8.004293 8.001428 8.000410 7.999496 8.003297 8.000536 8.002892
#> [169] 7.999473 7.994189 7.997220 7.999287 7.997034 8.002018 7.998543 8.002261
#> [177] 7.995196 7.998689 8.001253 7.995664 7.998828 8.002591 8.004963 8.003620
#> [185] 8.004057 7.998868 7.996352 7.998437 7.996710 8.004129 8.002998 8.005867
#> [193] 7.991396 8.007123 7.997639 8.002234 7.998377 7.998766 8.002824 8.003768
#> [201] 7.995479 8.006218 7.998912 7.994977 7.998956 8.003910 8.001464 8.002885
#> [209] 7.999469 8.005164 8.003965 8.001886 8.000391 7.999974 7.995777 8.001860
#> [217] 7.998013 7.999373 8.008529 8.002646 7.996948 8.010393 8.002950 8.002737
#> [225] 8.004115 8.003315 8.003611 8.001848 8.002638 8.003148 8.003750 7.997785
#> [233] 8.010139 7.996002 7.997798 8.001096 8.007033 7.995334 7.996814 7.995428
#> [241] 8.001995 7.993982 8.006621 7.993753 7.999565 8.004752 8.004959 8.000594
#> [249] 7.991523 7.998524 7.996239 7.999738 7.996767 7.998253 8.002541 7.991995
#> [257] 7.993276 7.997417 8.006750 8.005255 7.997924 7.997751 7.997938 7.999794
#> [265] 7.995946 8.000156 8.003493 7.998647 8.002542 7.993344 7.999939 7.993507
#> [273] 7.998779 8.002759 7.997687 8.002172 8.004199 8.001715 8.001212 8.005758
#> [281] 7.993880 8.004775 7.998163 8.007551 8.001031 7.998477 7.996789 7.998817
#> [289] 7.995704 7.999332 7.999271 8.001597 8.001965 7.995646 8.006281 8.007053
#> [297] 7.994469 7.998912 8.001451 7.993374 7.996599 8.003826 7.991420 7.999191
#> [305] 8.002195 8.000359 7.997842 8.007016 8.004230 7.999342 8.000073 7.999261
#> [313] 8.003717 7.999851 7.998204 7.998501 7.999636 8.005209 8.002458 7.999753
#> [321] 7.998577 7.995993 7.998715 8.000532 8.005190 8.003919 8.000339 8.004897
#> [329] 8.001475 8.006686 8.003568 7.998474 8.010947 8.009024 8.002097 8.001776
#> [337] 7.998887 8.002130 7.995754 7.991093 8.002153 8.002419 8.002633 8.000632
#> [345] 8.001994 7.997948 8.002352 8.000983 7.997894 8.002909 8.000952 7.995252
#> [353] 7.998691 8.003381 7.996509 7.998064 8.005946 8.000778 7.998161 7.997987
#> [361] 8.004676 7.992660 8.002370 7.996660 7.997940 8.000988 8.004288 8.001952
#> [369] 7.999146 7.999341 8.003840 8.000806 8.002022 8.005483 8.003165 8.000701
#> [377] 8.006683 8.009538 8.004917 8.003710 8.000014 7.995408 8.002574 8.000452
#> [385] 7.997078 8.003276 8.005062 7.996275 8.000037 8.010198 8.002064 8.008357
#> [393] 7.997277 8.008149 8.004711 7.996503 7.994436 7.993485 7.999311 8.009997
#> [401] 8.004745 7.987293 8.001589 8.002879 7.995938 8.006645 7.992860 7.995130
#> [409] 8.001504 7.992851 8.004797 8.005040 8.007440 7.997578 7.987032 8.003157
#> [417] 8.004510 8.000624 7.998198 8.005845 7.999626 7.998326 7.997761 7.997089
#> [425] 8.001810 7.997948 8.004894 7.992254 8.002345 7.997851 8.004147 7.994927
#> [433] 7.998588 7.994647 7.993691 8.003844 7.997349 8.011645 8.006924 7.994578
#> [441] 7.995582 7.998880 7.999336 7.992738 7.993633 8.002799 8.001707 7.992957
#> [449] 8.000315 8.003799 8.000724 7.996752 8.000551 7.991191 7.998707 8.002051
#> [457] 8.003549 7.997704 7.999792 7.999042 8.000836 8.004398 7.999083 7.998689
#> [465] 8.004498 7.996733 8.001847 7.998845 8.010162 8.002379 7.997016 7.996539
#> [473] 7.999495 7.996435 7.994657 8.011524 7.999356 8.006519 7.988773 8.000225
#> [481] 8.002975 7.998449 8.003293 8.001882 8.001251 8.001482 7.999436 8.000665
#> [489] 8.004600 8.004074 7.998633 8.001669 7.998561 7.998467 7.998123 7.998541
#> [497] 8.001742 7.998274 7.996458 7.998605 7.998043 7.999703 8.010412 7.998309
#> [505] 8.002328 7.994785 7.994667 7.996487 8.000806 7.997448 7.999411 8.003865
#> [513] 8.000105 7.999801 7.999720 7.999751 7.999840 8.004566 8.003932 7.999072
#> [521] 8.006003 7.996394 7.999593 7.998265 8.000048 8.000101 7.991196 7.997728
#> [529] 7.997101 7.998847 7.992676 8.000998 7.998463 8.002438 8.004156 8.000932
#> [537] 8.003173 7.998595 8.002036 8.002978 7.999782 7.996014 7.996475 8.004365
#> [545] 7.997711 8.003079 8.002907 7.999365 7.993490 7.992843 8.001101 7.996941
#> [553] 8.003964 8.005878 8.005543 7.993178 8.003159 7.995026 7.996859 8.002714
#> [561] 8.002406 8.006406 7.995205 7.988903 7.994209 8.004422 8.002465 7.999083
#> [569] 8.001943 8.006390 7.995300 8.000965 8.002986 8.003540 7.998119 8.002417
#> [577] 8.000901 8.000121 8.001069 8.009597 8.001675 8.001479 7.993162 8.009085
#> [585] 8.003806 8.002336 8.004740 7.997728 8.000490 8.003730 7.996883 8.003166
#> [593] 7.992441 8.003567 7.993107 7.997414 7.996661 7.997489 7.996555 8.000574
#> [601] 7.998582 8.004767 8.003132 7.997465 8.002089 7.999709 7.999259 7.995737
#> [609] 7.997047 8.000338 8.006531 8.006078 8.003725 7.997750 8.003000 8.000018
#> [617] 8.005052 7.997268 8.001639 7.990979 8.001252 7.998654 7.995711 8.002732
#> [625] 7.996906 7.999097 7.999740 7.998948 7.995581 8.000313 8.006183 7.999571
#> [633] 8.005510 8.003471 8.006436 8.005281 8.003222 8.001325 7.990984 7.998500
#> [641] 8.002261 8.001133 7.992839 8.003087 7.994868 7.996228 8.001203 7.998164
#> [649] 8.002304 8.004581 8.005261 7.996005 7.999089 7.995226 7.997893 7.996988
#> [657] 8.004387 7.992728 8.004273 7.991483 8.007438 8.003723 7.996726 7.998202
#> [665] 7.995578 7.999798 7.998169 7.999947 7.999002 8.001032 8.000137 8.000689
#> [673] 8.001463 7.990861 7.996499 8.000715 7.994415 8.004065 8.000141 8.001473
#> [681] 8.003729 8.005708 7.996172 7.995647 8.007183 7.996375 7.995762 7.996956
#> [689] 8.005013 8.001118 7.997133 8.002603 7.995251 8.000704 8.004775 7.992064
#> [697] 7.999326 8.001423 8.001110 8.010605 8.005676 7.998125 8.005279 7.997307
#> [705] 7.999588 7.996618 7.993493 7.999982 8.007297 7.998640 8.002334 8.006300
#> [713] 7.999717 7.995867 7.999067 7.993488 7.998045 7.999077 8.004019 7.992221
#> [721] 8.000816 7.997677 8.007396 8.001282 7.996884 7.992911 7.996625 7.995049
#> [729] 8.005926 8.000446 8.004369 8.002994 7.992723 7.998204 7.991110 7.998333
#> [737] 7.992127 8.001307 8.003563 7.996561 7.997971 7.991682 8.013060 8.000834
#> [745] 7.997423 7.992637 7.996278 8.006671 7.997989 7.997439 7.993536 8.000334
#> [753] 7.997850 8.004511 8.001089 8.000544 7.995928 7.999231 8.000098 8.006629
#> [761] 8.002121 8.007308 8.005817 8.001170 7.994373 8.000899 8.003127 8.004416
#> [769] 7.994818 8.006052 8.001788 8.002023 8.000535 7.999967 7.993877 7.999782
#> [777] 8.000256 8.001998 8.004530 7.999223 7.992855 8.001456 8.002113 8.001354
#> [785] 7.999032 8.003057 7.996276 8.004822 7.998606 8.001770 7.998497 7.998044
#> [793] 7.999154 7.997282 7.999231 8.008736 7.999926 7.998852 7.997421 8.004628
#> [801] 7.999885 7.998100 7.999401 8.000069 8.004708 8.005711 7.997410 8.003802
#> [809] 7.997304 8.002205 7.987116 8.010234 8.001602 8.003602 7.995958 8.005756
#> [817] 7.991758 7.996843 8.003196 8.003607 7.998393 8.004438 8.002108 8.000802
#> [825] 7.995868 8.002883 8.001276 7.998054 8.000580 7.999039 8.001628 8.001807
#> [833] 7.995678 8.003286 8.008315 8.003468 7.989349 7.999718 7.996887 7.998978
#> [841] 8.002041 7.984988 7.994769 8.011143 7.993913 7.998172 8.002448 8.004698
#> [849] 8.000538 8.003942 7.990700 8.000784 7.998765 8.002061 8.002792 7.997581
#> [857] 8.007502 8.006741 7.991396 7.996231 8.004555 8.010579 7.998787 7.994651
#> [865] 7.999804 8.000790 8.000797 7.999258 8.006050 8.001132 7.999770 8.003834
#> [873] 8.000026 8.006769 7.997277 7.999573 7.998330 7.993863 7.996323 7.995160
#> [881] 8.004870 7.995001 8.005242 8.000583 8.002539 7.999040 8.002472 8.004648
#> [889] 8.003831 7.996224 8.002614 7.997678 8.001527 8.008071 8.003216 7.998664
#> [897] 7.998510 7.997161 7.992343 7.997547 8.000843 7.999923 7.998131 8.000496
#> [905] 7.999225 8.006539 7.998148 7.994758 8.005142 8.000250 8.000364 7.995563
#> [913] 8.005904 8.001970 8.004786 7.999882 8.004474 7.992855 7.995388 8.002836
#> [921] 7.995282 7.996902 7.999073 7.999165 7.996337 7.995192 7.993515 7.998223
#> [929] 8.006490 8.002603 7.995310 8.004004 8.001495 8.001074 7.994019 8.001612
#> [937] 7.997956 8.003876 7.993290 8.001043 7.999398 7.997879 8.006828 8.001572
#> [945] 8.006243 8.000783 8.001805 7.992247 7.996892 7.998589 8.003084 8.001178
#> [953] 8.005162 7.997835 8.003680 8.004798 8.000865 8.000598 7.998172 8.008019
#> [961] 8.002210 7.992428 7.993999 8.001785 7.998352 8.004404 8.001358 8.000511
#> [969] 8.001908 8.002787 8.001974 8.001293 8.000670 7.997405 8.002455 8.005334
#> [977] 7.998905 8.004823 8.009505 7.999660 7.998636 8.002939 8.002640 8.001892
#> [985] 8.006981 7.995572 7.999681 8.005191 7.998535 7.997529 7.997061 7.999304
#> [993] 8.005253 7.998253 7.996060 7.997975 8.001365 7.999579 7.993138 7.996120
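The 1000 slope estimates above can be imitated with base R's `replicate()` and `coef()`. This is only a sketch: `simulate_slope` is a hypothetical helper, and the Nature distribution, the error standard deviation, and the 10 unicorns per data set are assumptions, so the spread will not match the output above.

```r
set.seed(3)  # arbitrary seed so the sketch is reproducible

# Hypothetical helper: simulate one data set and return the fitted slope.
# Assumptions (not from the source): Nature ~ N(950, 20), error SD = 3.24,
# and 10 unicorns per data set.
simulate_slope <- function(n_unicorns = 10) {
  nature <- rnorm(n_unicorns, mean = 950, sd = 20)
  magical <- 3423 + 8 * nature + rnorm(n_unicorns, mean = 0, sd = 3.24)
  fit <- lm(magical ~ nature)
  coef(fit)[["nature"]]  # coef(fit)[1] is the intercept, coef(fit)[2] the slope
}

# Repeat the whole process 1000 times and collect the slopes
slopes <- replicate(1000, simulate_slope())

mean(slopes)  # center of the simulated sampling distribution
sd(slopes)    # simulated standard error of the slope
```

The mean of the slopes sits near the true value of 8, and their standard deviation is a simulation-based estimate of the slope's standard error.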
Central Limit Theorem
The Central Limit Theorem (CLT) is a fundamental concept in probability and statistics. It states that the distribution of the sum (or average) of a large number of independent, identically distributed (i.i.d.) random variables will be approximately normal, regardless of the underlying distribution of those individual variables, provided that distribution has a finite variance.
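To see the theorem at work on data that are clearly not normal, here is a small sketch that averages draws from an exponential distribution. The choice of distribution, the sample size of 50, and the 5000 repetitions are illustrative assumptions.

```r
set.seed(4)  # arbitrary seed so the sketch is reproducible

# Exponential(rate = 1) is strongly right-skewed, with mean 1 and SD 1
n <- 50
sample_means <- replicate(5000, mean(rexp(n, rate = 1)))

mean(sample_means)  # close to the population mean, 1
sd(sample_means)    # close to 1 / sqrt(n)

# Share of sample means within 1.96 theoretical SEs of the population mean;
# a normal distribution would put about 95% of them there
coverage <- mean(abs(sample_means - 1) < 1.96 / sqrt(n))
coverage
```

Even though individual exponential draws are skewed, the share of sample means inside the normal-based interval should be close to 0.95.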
Simulating 500 samples of size 10 from a normal distribution with mean 5 and standard deviation of 2.
Simulating 500 samples of size 30 from a normal distribution with mean 5 and standard deviation of 2.
Simulating 500 samples of size 50 from a normal distribution with mean 5 and standard deviation of 2.
Simulating 500 samples of size 100 from a normal distribution with mean 5 and standard deviation of 2.
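The four simulations above can be sketched in one pass with base R. The object names are illustrative; the sample sizes, number of samples, mean, and standard deviation are the ones stated above.

```r
set.seed(5)  # arbitrary seed so the sketch is reproducible

sample_sizes <- c(10, 30, 50, 100)

# For each sample size, simulate 500 sample means from N(5, 2)
sim_means <- lapply(sample_sizes, function(n) {
  replicate(500, mean(rnorm(n, mean = 5, sd = 2)))
})

# Compare the simulated spread of the means with sigma / sqrt(n)
results <- data.frame(
  n = sample_sizes,
  mean_of_means = sapply(sim_means, mean),
  sd_of_means = sapply(sim_means, sd),
  theory_se = 2 / sqrt(sample_sizes)
)
results
```

The means stay centered near 5 for every sample size, while the spread of the sample means shrinks as the sample size grows.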
Common Sampling Distributions
When the data come from a normal distribution (the DGP), the sample mean and sample variance have known sampling distributions that hold exactly for any sample size.
Mean \[ \bar X = \frac{1}{n}\sum ^n_{i=1} X_i \]
Variance \[ s^2 = \frac{1}{n-1}\sum ^n_{i=1} (X_i - \bar X)^2 \] The standard deviation is \(s = \sqrt{s^2}\).
A data sample of size \(n\) is generated from: \[ X_i \sim N(\mu, \sigma) \]
\[ \bar X \sim N(\mu, \sigma/\sqrt{n}) \]
\[ Z = \frac{\bar X - \mu}{\sigma/\sqrt{n}} \sim N(0,1) \]
A data sample of size \(n\) is generated from: \[ X_i \sim N(\mu, \sigma) \]
\[ (n-1)s^2/\sigma^2 \sim \chi^2(n-1) \]
Replacing \(\sigma\) with \(s\) in \(Z\) gives: \[ T = \frac{\bar X - \mu}{s/\sqrt{n}} \sim t(n-1) \]
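These two results can be checked by simulation. The sketch below assumes a small sample size of 8 with mean 5 and standard deviation 2 (illustrative values), where the difference between the t and normal distributions is easy to see.

```r
set.seed(6)  # arbitrary seed so the sketch is reproducible

n <- 8
mu <- 5
sigma <- 2

# One row per simulated sample: sample mean and sample variance
sim_stats <- t(replicate(5000, {
  x <- rnorm(n, mean = mu, sd = sigma)
  c(xbar = mean(x), s2 = var(x))  # var() divides by n - 1
}))

# (n - 1) s^2 / sigma^2, which should follow a chi-square with n - 1 df
chisq_stat <- (n - 1) * sim_stats[, "s2"] / sigma^2

# Standardized mean using s instead of sigma, which should follow t(n - 1)
t_stat <- (sim_stats[, "xbar"] - mu) / (sqrt(sim_stats[, "s2"]) / sqrt(n))

# A chi-square distribution with n - 1 df has mean n - 1
mean(chisq_stat)

# Simulated 97.5th percentile next to the t and normal reference values
q_sim <- unname(quantile(t_stat, 0.975))
c(simulated = q_sim, t_theory = qt(0.975, df = n - 1), normal = qnorm(0.975))
```

The simulated percentile should sit near the t reference value and clearly above the normal one, reflecting the extra uncertainty from estimating \(\sigma\) with \(s\).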
Sampling Distributions for Regression Models
The estimates of regression coefficients (slopes) have a distribution!
Depending on the type of outcome, we will work with one of two distributions: Normal or t.
\[ \frac{\hat\beta_j-\beta_j}{\mathrm{se}(\hat\beta_j)} \sim t_{n-p^\prime} \]
When \(\beta_j = 0\): \[ \frac{\hat\beta_j}{\mathrm{se}(\hat\beta_j)} \sim t_{n-p^\prime} \]
\[ \frac{\hat\beta_j - \beta_j}{\mathrm{se}(\hat\beta_j)} \sim N(0,1) \]
When \(\beta_j = 0\): \[ \frac{\hat\beta_j}{\mathrm{se}(\hat\beta_j)} \sim N(0,1) \]
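For a linear model, the t-based statistic can be reproduced by hand from the `summary()` output of `lm`. The toy data below are an assumption made for illustration, and the sketch reads \(p^\prime\) as the number of estimated coefficients, intercept included.

```r
set.seed(7)  # arbitrary seed so the sketch is reproducible

# Assumed toy data: the true intercept is 1 and the true slope is 2
n <- 40
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)

fit <- lm(y ~ x)
coef_table <- summary(fit)$coefficients

# Statistic for a true slope of 0: estimate divided by its standard error
t_by_hand <- coef_table["x", "Estimate"] / coef_table["x", "Std. Error"]

# Residual degrees of freedom, n - p' (here p' = 2: intercept and one slope)
df_resid <- df.residual(fit)

# Two-sided p-value from the t distribution with n - p' df
p_by_hand <- 2 * pt(abs(t_by_hand), df = df_resid, lower.tail = FALSE)

c(t_by_hand = t_by_hand, t_from_summary = coef_table["x", "t value"])
c(p_by_hand = p_by_hand, p_from_summary = coef_table["x", "Pr(>|t|)"])
```

Both pairs should agree, which shows that the `t value` column is the estimate divided by its standard error and that the p-value comes from the t distribution with \(n-p^\prime\) degrees of freedom.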
m201.inqs.info/lectures/9