Sampling Distribution
Simulating Unicorns
Central Limit Theorem
Common Sampling Distributions
Sampling Distributions for Regression Models
Scientific Notation
A sampling distribution captures the idea that the statistics you compute (such as slopes and intercepts) have their own data generating process.
In other words, the numerical values you obtain from the lm and glm functions would be different if you had collected a different data set.
Some values will be more common than others. Because of this, statistics have their own data generating process, just as the outcome of interest has its own data generating process.
Distribution of a statistic over repeated samples
Different Samples yield different statistics
The Standard Error (SE) is the standard deviation of a statistic's sampling distribution.
SE tells us how much a statistic varies from sample to sample. Smaller SE = more precision.
\[ Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \]
\[ \varepsilon_i \sim DGP \]
The randomness effect is a sampling phenomenon: you get a different sample every time you sample from a population.
Getting different samples means you will get different statistics.
These statistics will have a distribution of their own.
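The idea that a statistic has its own distribution can be checked by simulation. The sketch below is illustrative only: the population mean, standard deviation, and sample size are hypothetical values, not taken from the slides. It draws many samples, computes the mean of each, and compares the standard deviation of those means with the theoretical standard error.

```r
set.seed(1)

# Hypothetical population settings, chosen only for illustration
mu <- 50
sigma <- 10
n <- 25

# Draw 2000 samples of size n and keep the mean of each one
sample_means <- replicate(2000, mean(rnorm(n, mean = mu, sd = sigma)))

# The standard deviation of the statistic across samples is its standard error
se_simulated <- sd(sample_means)
se_theory <- sigma / sqrt(n)   # 10 / 5 = 2

c(simulated = se_simulated, theory = se_theory)
```

The two numbers should be close; increasing n makes both smaller, which is the "smaller SE = more precision" statement in action.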
To better understand how statistics vary from sample to sample, let’s simulate a data set of unicorn characteristics.
We will simulate a data set using the unicorns function; we only need to specify how many unicorns to simulate.
#> Unicorn_ID Age Gender Color Type_of_Unicorn Type_of_Horn Horn_Length
#> 1 1 5 Non-binary Silver Jewel Opal 4.951910
#> 2 2 5 Non-binary Black Ruvas Aquamarine 4.872996
#> 3 3 10 Genderfluid Brown Rainbow Opal 4.742437
#> 4 4 8 Genderfluid Gray Jewel Aquamarine 4.562934
#> 5 5 13 Male Gold Rainbow Opal 5.060623
#> 6 6 4 Non-binary White Ruvas Opal 5.220545
#> 7 7 4 Female Gold Ember Aquamarine 4.922968
#> 8 8 18 Agender Gold Jewel Opal 5.143903
#> 9 9 3 Genderfluid Gold Ember Opal 5.348174
#> 10 10 11 Agender Gold Ember Aquamarine 4.944412
#> Horn_Strength Weight Health_Score Personality_Score Magical_Score
#> 1 29.39855 101.20251 4 0.43747137 10865.68
#> 2 29.82135 126.83994 9 0.45074685 10833.80
#> 3 31.18717 120.63401 5 1.88302942 10972.52
#> 4 30.93950 118.56818 9 0.15418617 10943.94
#> 5 30.31468 145.43005 10 0.49375757 11062.99
#> 6 27.72867 93.98067 5 0.21029380 10839.68
#> 7 29.80613 135.90553 2 1.91010886 10792.55
#> 8 32.12727 151.10988 10 1.51589676 11236.40
#> 9 29.70228 116.02724 10 3.53243783 10798.15
#> 10 24.50514 122.09621 3 0.07098974 11003.79
#> Elusiveness_Score Gentleness_Score Nature_Score
#> 1 35.30540 68.9829417 930.1933
#> 2 33.52608 16.0238988 926.9225
#> 3 31.10193 36.4801673 944.1614
#> 4 36.13097 18.9782923 940.2227
#> 5 39.19512 -3.7024106 955.0634
#> 6 40.16632 7.3193552 926.6107
#> 7 32.46548 19.9564981 920.8988
#> 8 31.01421 34.8743531 976.9195
#> 9 40.37698 0.8154278 921.4826
#> 10 36.92682 26.0854586 946.9687
#> [1] "Unicorn_ID" "Age" "Gender"
#> [4] "Color" "Type_of_Unicorn" "Type_of_Horn"
#> [7] "Horn_Length" "Horn_Strength" "Weight"
#> [10] "Health_Score" "Personality_Score" "Magical_Score"
#> [13] "Elusiveness_Score" "Gentleness_Score" "Nature_Score"
We will only look at Magical_Score and Nature_Score.
\[ Magical = 3423 + 8 \times Nature + \varepsilon \]
\[ \varepsilon \sim N(0, 3.24) \]
#> Nature_Score Magical_Score
#> 1 956.6238 11075.50
#> 2 916.9463 10755.28
#> 3 938.1597 10930.60
#> 4 932.0630 10877.50
#> 5 951.1510 11032.36
#> 6 917.6937 10763.65
#> 7 947.4190 10999.03
#> 8 947.6509 11006.81
#> 9 970.1088 11183.30
#> 10 962.1259 11120.21
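A data set like the one above can be sketched in base R without the unicorns function. Two assumptions are made here and are not confirmed by the slides: Nature_Score is drawn from a normal distribution with mean 940 and standard deviation 15, and the 3.24 in the error distribution is read as a variance (so the standard deviation is 1.8).

```r
set.seed(2)

n <- 10

# Assumption: Nature_Score is roughly normal; 940 and 15 are illustrative values
nature <- rnorm(n, mean = 940, sd = 15)

# Assumption: 3.24 is the error variance, so sd = sqrt(3.24) = 1.8
epsilon <- rnorm(n, mean = 0, sd = sqrt(3.24))

# Data generating process from the slide: Magical = 3423 + 8 * Nature + error
magical <- 3423 + 8 * nature + epsilon

sim_data <- data.frame(Nature_Score = nature, Magical_Score = magical)

# Fit the regression and look at the estimated intercept and slope
fit <- lm(Magical_Score ~ Nature_Score, data = sim_data)
coef(fit)
```

The estimated slope should land near 8 but will not equal it exactly, because it is computed from one random sample.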
lm
N: number of times to repeat a process
CODE: what is to be repeated
MODEL: a model that can be used to extract components
INDEX: which component do you want to use
0: Intercept
1: first slope
2: second slope
...
#> [1] 7.993656 7.999690 8.001198 7.993221 8.004441 7.999825 8.001543 8.006482
#> [9] 8.006514 8.001987 8.005656 7.997864 8.002317 8.003379 7.998318 7.994921
#> [17] 7.990697 8.002850 7.997028 7.995882 7.997129 8.003754 8.007337 7.999664
#> [25] 7.999592 8.003175 8.005357 8.000002 7.997193 7.993761 8.002808 8.006323
#> [33] 8.005159 8.002546 7.999062 7.997466 8.004567 8.000997 8.007392 7.997930
#> [41] 8.001092 7.999066 8.002222 8.001094 8.002372 8.000599 7.999838 7.995123
#> [49] 8.000061 8.004573 8.000384 8.003016 7.998903 7.997579 8.004011 8.001953
#> [57] 8.005887 8.000591 7.995478 7.993457 7.997983 7.992394 7.996457 8.000833
#> [65] 7.998270 7.997194 7.994570 8.001122 7.999953 8.003748 8.004779 7.991768
#> [73] 8.004973 8.006890 8.000144 7.999265 8.000072 7.998224 7.993940 7.997529
#> [81] 8.002825 7.993976 8.003274 7.999534 8.007691 7.995399 8.004377 7.999901
#> [89] 8.000895 8.003512 8.005528 8.001967 8.000197 8.000630 8.002022 8.002010
#> [97] 8.001163 7.998601 7.996225 7.998667 7.993385 7.996722 8.006068 7.993321
#> [105] 7.997869 7.999490 7.998475 7.994163 7.993667 8.003766 8.005743 8.001028
#> [113] 8.003598 8.002787 7.999035 8.001892 7.998708 7.990676 7.997503 7.989309
#> [121] 7.995000 8.002351 7.996767 8.003173 8.002730 8.005466 8.002359 8.000457
#> [129] 7.999757 7.999714 7.999015 7.996821 7.999615 8.000697 8.007788 8.005649
#> [137] 8.006232 7.991873 8.000846 8.000052 7.993224 7.998269 7.997678 8.001057
#> [145] 7.999467 8.004883 8.002979 7.998309 8.004031 8.001765 7.999211 7.997191
#> [153] 7.997190 8.000057 7.996684 8.006073 8.000022 8.004828 8.007117 8.000906
#> [161] 8.003613 8.004025 8.007511 7.999572 8.008117 8.001852 7.999641 8.006247
#> [169] 7.997141 7.998012 7.991940 8.002577 8.009914 8.005415 8.001082 8.001695
#> [177] 8.002419 7.998647 8.004693 7.995217 8.000318 8.004454 7.995363 8.001330
#> [185] 7.996487 7.995085 8.002614 8.001953 8.005349 8.002102 7.994469 8.002642
#> [193] 7.997636 7.996689 7.999547 7.993334 7.996597 8.002239 7.999894 8.002126
#> [201] 7.997832 8.000456 7.994331 8.001353 7.999186 7.995129 7.992566 8.007869
#> [209] 8.003302 7.998648 8.007370 7.997596 8.000829 7.990216 8.004353 8.002254
#> [217] 7.997271 7.997385 7.993200 7.994223 8.000320 8.002742 7.996383 7.998740
#> [225] 7.999118 8.002637 7.996163 8.006223 8.000076 8.003450 7.997182 8.002282
#> [233] 7.998304 7.996264 8.001572 8.003495 7.998688 8.000087 7.998967 7.997929
#> [241] 7.996501 8.007279 7.995109 8.005124 7.994793 8.001051 7.995168 8.002491
#> [249] 8.000920 8.002628 8.003129 7.996830 8.003179 8.002890 7.996416 7.995109
#> [257] 7.996695 7.998979 8.006989 7.999497 8.001632 7.995931 8.002040 7.995809
#> [265] 7.999015 7.998938 7.996792 8.005672 7.996803 8.004194 7.993460 7.998614
#> [273] 8.005326 7.999983 7.998427 7.996915 7.993304 8.000353 7.999507 8.003016
#> [281] 7.994866 8.001112 7.997118 7.999881 7.995265 7.994515 8.000504 7.996676
#> [289] 8.004658 8.005490 7.996514 7.997943 8.006465 7.995604 7.997163 8.002765
#> [297] 7.991581 7.997130 8.001778 8.006046 8.001279 7.996982 8.003120 8.004321
#> [305] 7.995276 7.997094 8.001085 7.997306 8.004889 7.995561 7.994586 7.999778
#> [313] 7.997758 7.994777 8.000222 8.003563 8.011871 8.003802 7.995412 7.993600
#> [321] 8.005828 8.001594 7.997874 7.992859 8.004795 7.998239 7.993787 8.000150
#> [329] 7.998166 8.001027 8.009464 7.997939 8.000866 8.006277 8.001261 8.000836
#> [337] 8.000561 8.005793 7.999016 7.997973 8.003603 7.998520 8.001929 7.996807
#> [345] 8.000888 7.993513 8.002998 7.998766 8.007056 8.001985 7.999240 8.001638
#> [353] 7.999399 7.992633 7.996959 7.996585 8.003005 8.000914 8.000032 7.989436
#> [361] 8.002016 8.008695 7.997116 7.992545 7.999794 7.998784 8.004697 7.996443
#> [369] 8.006321 7.996042 7.996581 8.001330 8.001329 8.000877 8.006062 7.995357
#> [377] 8.000774 8.007953 8.002373 7.993216 8.001411 7.992596 7.993885 8.001095
#> [385] 7.998682 8.002511 8.000296 8.006829 8.005165 8.003278 8.000488 7.992142
#> [393] 7.989619 7.997095 7.995613 8.002372 8.000172 8.007166 7.998793 7.990125
#> [401] 7.997629 8.001087 7.996056 8.000106 7.995812 7.996864 7.995779 7.995610
#> [409] 7.994889 7.998727 7.998206 7.997257 8.002273 8.001277 8.005228 8.005837
#> [417] 8.002841 8.002972 7.998298 8.001369 8.000616 7.990777 8.004418 7.997433
#> [425] 8.007227 8.006599 7.993694 7.998908 7.998783 7.996144 8.001381 8.003276
#> [433] 8.006026 7.997998 8.007165 8.000583 7.997001 8.000389 8.002644 7.998811
#> [441] 8.000179 8.004989 7.995315 8.002251 8.005782 8.005203 7.998009 7.999689
#> [449] 8.004805 8.008117 8.004772 8.002114 7.999082 7.999469 7.991620 8.008216
#> [457] 7.999311 7.997780 8.003282 8.004237 7.997365 8.000974 7.995752 7.999819
#> [465] 7.998938 7.999174 7.998820 7.993689 7.999752 8.006252 7.995619 8.001164
#> [473] 8.001448 7.998733 7.994680 7.996605 8.003394 8.001011 7.996033 7.996615
#> [481] 8.001048 7.998055 8.005474 8.000651 7.999415 7.999283 7.998749 7.998797
#> [489] 8.001240 7.995873 7.998248 7.998585 8.003648 7.999958 7.999987 7.996024
#> [497] 8.003265 7.993844 7.993053 8.000676 7.989724 8.003678 7.999329 7.994561
#> [505] 8.003344 8.001379 7.999074 8.002234 8.008086 7.999987 7.997556 8.002967
#> [513] 7.999109 8.001560 8.002630 8.004656 7.999941 7.996773 8.000107 8.004588
#> [521] 8.005994 8.003032 7.998047 7.994429 8.000164 8.001867 8.007621 7.990777
#> [529] 7.997493 8.000303 8.009507 8.002621 8.003299 7.996629 8.001061 7.994639
#> [537] 8.003137 7.996108 7.997487 7.997164 8.001542 7.996276 7.998566 8.001741
#> [545] 7.996649 7.997363 7.992175 8.000706 8.009285 8.001895 7.996784 7.997696
#> [553] 7.999706 8.005308 7.996932 8.000104 7.998593 7.996936 7.997143 8.001444
#> [561] 8.001216 8.002172 8.001111 8.001480 7.997736 7.997234 8.002695 7.995080
#> [569] 7.991579 8.000782 8.001992 7.998090 8.001970 8.001644 8.005893 7.995938
#> [577] 7.996367 7.995374 8.000255 7.999639 7.992823 7.998008 8.001651 7.991974
#> [585] 7.994398 8.001471 7.997475 8.001882 8.004609 7.999540 8.003847 7.998844
#> [593] 7.998137 7.997708 7.994401 8.004038 8.000553 8.000947 7.997157 8.006518
#> [601] 8.005602 8.001288 8.006513 8.002771 7.990496 7.994121 7.997878 7.995691
#> [609] 7.997464 7.999842 7.998620 7.998730 8.011159 8.001067 7.999442 8.001591
#> [617] 8.001337 8.000414 7.997394 8.003377 8.010424 7.999034 8.001702 7.996595
#> [625] 8.004419 8.006238 7.998633 7.998476 7.997274 8.002216 7.999248 7.998132
#> [633] 7.998279 7.999876 8.000194 7.996906 7.998710 8.000418 8.006416 8.004206
#> [641] 7.995972 8.000041 8.001186 8.001223 8.001055 7.999361 7.998779 7.998905
#> [649] 7.999730 7.998823 8.003683 7.997253 7.993687 8.002840 7.995973 8.003563
#> [657] 7.990765 8.001760 7.995621 7.996890 8.000066 8.003806 7.995195 8.000917
#> [665] 8.004240 8.000259 7.995743 8.000550 8.001266 8.002948 8.005902 7.999230
#> [673] 7.996463 7.999423 7.999244 8.005886 7.993702 8.006581 7.997174 8.007371
#> [681] 7.996579 7.998645 7.991134 7.993845 7.995641 7.999017 8.001209 8.004217
#> [689] 8.003326 8.003319 7.996755 8.000083 8.007736 7.997238 7.996216 7.999672
#> [697] 7.999790 7.997571 8.000419 7.999898 8.004140 7.996078 8.001302 8.001826
#> [705] 8.004581 8.003796 8.002392 8.006140 7.999968 7.995746 7.998474 7.996009
#> [713] 8.001163 7.996848 7.999185 7.995735 8.002709 7.999834 7.999113 8.003107
#> [721] 7.999107 8.003067 7.999949 7.991588 8.006225 8.003228 7.998812 8.001137
#> [729] 8.005041 7.999273 7.999878 8.003063 7.994737 8.001007 8.001926 8.006076
#> [737] 7.995679 8.004245 7.998943 7.996628 7.996980 8.000790 7.998217 7.991912
#> [745] 8.003771 8.001768 8.003839 8.002060 8.001651 8.009515 8.001980 8.000811
#> [753] 7.998532 8.002432 8.005563 7.994124 8.004772 7.997444 7.997548 8.002125
#> [761] 7.994879 8.000146 8.006171 7.997810 7.996581 8.004562 7.995223 8.003483
#> [769] 8.001398 7.989631 8.007811 8.000729 7.998694 7.994836 7.996238 7.996553
#> [777] 8.004146 8.006999 8.002633 7.998827 7.993554 7.993712 8.000532 8.002011
#> [785] 7.996649 8.001729 7.996563 7.993744 7.996148 7.997351 8.001222 7.995523
#> [793] 8.000790 7.994888 7.992621 8.002905 7.995877 7.990865 7.999730 8.009705
#> [801] 7.998444 8.002551 7.999142 7.993166 8.001941 7.992097 8.000774 8.000909
#> [809] 7.995149 7.998417 7.995671 8.007671 8.006773 8.000359 8.004616 7.994395
#> [817] 7.999258 7.994453 7.999177 8.005401 8.004785 8.004088 7.996173 8.001438
#> [825] 7.999278 8.001063 7.997859 8.000294 7.993605 7.993606 8.000955 8.003085
#> [833] 7.997039 7.997328 8.002280 7.997463 8.002422 8.005707 7.997955 8.003229
#> [841] 7.998555 7.999101 8.000033 8.001891 8.003937 7.993499 7.995179 7.998195
#> [849] 7.998353 7.994297 8.001726 8.001316 8.002282 7.993639 7.997171 7.997351
#> [857] 7.999910 7.999163 8.001983 8.004974 8.004379 8.003645 8.003654 7.999988
#> [865] 7.996824 8.001547 8.000548 8.000178 8.002110 8.002439 7.997762 7.999460
#> [873] 7.997102 7.998224 7.996107 8.002130 8.005693 7.993522 7.995625 8.006664
#> [881] 8.002300 8.001126 7.995530 8.006664 8.004149 8.002678 8.000967 8.003790
#> [889] 8.001882 8.003173 7.998561 8.004332 7.998460 8.004160 7.995857 8.004241
#> [897] 7.997648 8.006215 8.000060 8.003297 7.989020 8.000340 8.002147 8.003592
#> [905] 8.003411 8.000942 7.999571 7.998270 8.004105 7.993873 7.995837 8.002504
#> [913] 8.003065 8.000666 7.994605 8.002010 8.000512 7.997459 7.998314 8.001058
#> [921] 8.008064 8.001249 7.992442 7.998330 8.001584 8.000801 7.996156 8.000920
#> [929] 8.005537 7.997296 7.998391 7.999807 7.996090 7.997975 7.998623 8.009616
#> [937] 8.003399 8.000941 7.997280 8.005439 7.997933 8.005383 8.004816 7.997955
#> [945] 7.998318 7.995374 7.998718 7.996529 7.995161 8.003814 8.002190 7.996802
#> [953] 8.000886 8.004891 8.000324 8.001367 7.999951 8.010050 8.003547 7.999592
#> [961] 8.003298 8.005285 8.006691 7.998901 8.000284 7.988529 8.003017 8.004817
#> [969] 8.008156 8.001708 7.994504 8.000221 7.999190 7.993903 7.994274 7.996829
#> [977] 7.996098 8.000400 8.000691 8.004224 8.006820 8.000166 7.999767 8.001706
#> [985] 8.000223 7.999157 7.999289 8.006795 8.007572 8.008376 8.001983 8.000214
#> [993] 7.999887 7.995714 8.003901 7.999882 7.999978 7.996359 8.001119 7.999299
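The 1000 slopes above come from repeating the simulate-and-fit process. The helper functions used on the slide are not shown, so the sketch below uses only base R (replicate, lm, coef). The sample size of 10, the distribution of the predictor, and reading 3.24 as a variance are all assumptions, so the spread will not match the output above.

```r
set.seed(3)

# One simulated data set -> one estimated slope
simulate_slope <- function(n = 10) {
  nature <- rnorm(n, mean = 940, sd = 15)             # assumed predictor distribution
  magical <- 3423 + 8 * nature + rnorm(n, sd = 1.8)   # assumes 3.24 is a variance
  fit <- lm(magical ~ nature)
  coef(fit)[[2]]   # in base R, position 1 is the intercept and position 2 the first slope
}

# Repeat the process 1000 times and collect the slopes
slopes <- replicate(1000, simulate_slope())

mean(slopes)   # centre of the sampling distribution, close to the true slope 8
sd(slopes)     # simulated standard error of the slope
```

Calling hist(slopes) afterwards shows the shape of the sampling distribution.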
The Central Limit Theorem (CLT) is a fundamental concept in probability and statistics. It states that the distribution of the sum (or average) of a large number of independent, identically distributed (i.i.d.) random variables will be approximately normal, regardless of the underlying distribution of those individual variables, provided that distribution has a finite variance.
Simulating 500 samples of size 10 from a normal distribution with mean 5 and standard deviation of 2.
Simulating 500 samples of size 30 from a normal distribution with mean 5 and standard deviation of 2.
Simulating 500 samples of size 50 from a normal distribution with mean 5 and standard deviation of 2.
Simulating 500 samples of size 100 from a normal distribution with mean 5 and standard deviation of 2.
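The four simulations above can be sketched in base R as follows. The function name sample_means is a hypothetical helper introduced for illustration; the settings (500 samples, mean 5, standard deviation 2, sizes 10, 30, 50, 100) come from the slide.

```r
set.seed(4)

# Means of `reps` samples of size n, each drawn from N(mean = 5, sd = 2)
sample_means <- function(n, reps = 500) {
  replicate(reps, mean(rnorm(n, mean = 5, sd = 2)))
}

sizes <- c(10, 30, 50, 100)
means_by_size <- lapply(sizes, sample_means)
names(means_by_size) <- paste0("n = ", sizes)

# The spread of the sample means shrinks roughly like 2 / sqrt(n)
data.frame(
  n = sizes,
  simulated_sd = sapply(means_by_size, sd),
  theory_sd = 2 / sqrt(sizes)
)
```

Plotting each element with hist() shows the distributions tightening around 5 as the sample size grows.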
When the data generating process (DGP) is a normal distribution, the sample mean and sample variance have known sampling distributions, regardless of sample size.
Mean \[ \bar X = \frac{1}{n}\sum ^n_{i=1} X_i \]
Variance (the standard deviation \(s\) is its square root) \[ s^2 = \frac{1}{n-1}\sum ^n_{i=1} (X_i - \bar X)^2 \]
A data sample of size \(n\) is generated from: \[ X_i \sim N(\mu, \sigma) \]
\[ \bar X \sim N(\mu, \sigma/\sqrt{n}) \]
\[ Z = \frac{\bar X - \mu}{\sigma/\sqrt{n}} \sim N(0,1) \]
A data sample of size \(n\) is generated from: \[ X_i \sim N(\mu, \sigma) \]
\[ (n-1)s^2/\sigma^2 \sim \chi^2(n-1) \]
Replacing \(\sigma\) in \(Z\) with the sample standard deviation \(s\): \[ \frac{\bar X - \mu}{s/\sqrt{n}} \sim t(n-1) \]
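These three results can be checked numerically. The sketch below uses hypothetical settings (mean 5, standard deviation 2, sample size 10) and computes the z statistic, the t statistic, and the scaled variance for many simulated samples.

```r
set.seed(5)

# Hypothetical settings for illustration
mu <- 5
sigma <- 2
n <- 10

one_sample <- function() {
  x <- rnorm(n, mean = mu, sd = sigma)
  c(
    z    = (mean(x) - mu) / (sigma / sqrt(n)),   # uses the known sigma
    t    = (mean(x) - mu) / (sd(x) / sqrt(n)),   # replaces sigma with s
    chi2 = (n - 1) * var(x) / sigma^2            # scaled sample variance
  )
}

# 3 x 5000 matrix with one row per statistic
stats <- replicate(5000, one_sample())

mean(stats["chi2", ])   # a chi-square(n - 1) variable has mean n - 1 = 9
sd(stats["z", ])        # close to 1, as expected for N(0, 1)
sd(stats["t", ])        # larger than 1; a t(9) variable has sd sqrt(9 / 7)
```

The heavier tails of the t statistic reflect the extra uncertainty from estimating the standard deviation with s.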
The estimates of regression coefficients (slopes) have a distribution!
Depending on the type of outcome being modeled, we will work with one of two distributions: Normal or t.
\[ \frac{\hat\beta_j-\beta_j}{\mathrm{se}(\hat\beta_j)} \sim t_{n-p^\prime} \]
Under the null hypothesis \(\beta_j = 0\): \[ \frac{\hat\beta_j}{\mathrm{se}(\hat\beta_j)} \sim t_{n-p^\prime} \]
\[ \frac{\hat\beta_j - \beta_j}{\mathrm{se}(\hat\beta_j)} \sim N(0,1) \]
Under the null hypothesis \(\beta_j = 0\): \[ \frac{\hat\beta_j}{\mathrm{se}(\hat\beta_j)} \sim N(0,1) \]
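For a linear model fit with lm, the ratio of an estimate to its standard error is the "t value" that summary() reports. The sketch below uses made-up data (the coefficients 2 and 0.5 are hypothetical) to compute the statistic by hand and compare it with R's output.

```r
set.seed(6)

# Hypothetical data, for illustration only
n <- 50
x <- rnorm(n)
y <- 2 + 0.5 * x + rnorm(n)

fit <- lm(y ~ x)
coefs <- summary(fit)$coefficients   # columns: Estimate, Std. Error, t value, Pr(>|t|)

# Test statistic for the slope when the hypothesized value is 0
t_stat <- coefs["x", "Estimate"] / coefs["x", "Std. Error"]

# Two-sided p-value from the t distribution with the residual degrees of freedom
p_value <- 2 * pt(abs(t_stat), df = df.residual(fit), lower.tail = FALSE)

c(by_hand = t_stat, reported = coefs["x", "t value"])
c(by_hand = p_value, reported = coefs["x", "Pr(>|t|)"])
```

With one predictor and an intercept, the residual degrees of freedom are n - 2 = 48.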
We often work with very large or very small numbers.
Problems with standard form:
Scientific notation makes numbers compact and standardized.
A number is in scientific notation if it is written as:
\[ a \times 10^n \]
where \(1 \le |a| < 10\) and \(n\) is an integer.
Write 45,000 in scientific notation.
Move decimal:
\[ 45000 \rightarrow 4.5 \]
Moved 4 places left:
\[ 4.5 \times 10^4 \]
Write 0.00072 in scientific notation.
Move decimal:
\[ 0.00072 \rightarrow 7.2 \]
Moved 4 places right:
\[ 7.2 \times 10^{-4} \]
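The two conversions above can be reproduced with a small helper. The function to_scientific is hypothetical, written here only to illustrate the steps; it assumes a nonzero input.

```r
# Split a nonzero number into coefficient a and exponent n so that x = a * 10^n
to_scientific <- function(x) {
  n <- floor(log10(abs(x)))   # how many places the decimal point moves
  a <- x / 10^n               # coefficient with one nonzero digit before the decimal
  c(a = a, n = n)
}

to_scientific(45000)     # a = 4.5, n = 4
to_scientific(0.00072)   # a = 7.2, n = -4
```

A positive n corresponds to moving the decimal point left to reach the coefficient, and a negative n to moving it right.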
Positive exponents → big numbers
Example:
\[ 2.1 \times 10^6 = 2{,}100{,}000 \]
Negative exponents → small numbers
Example:
\[ 4.3 \times 10^{-3} = 0.0043 \]
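Rule: a positive exponent moves the decimal point to the right; a negative exponent moves it to the left.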
\[ 6.2 \times 10^5 \]
Move decimal 5 places right:
\[ 620{,}000 \]
\[ 9.1 \times 10^{-4} \]
Move decimal 4 places left:
\[ 0.00091 \]
Step 1: Compare exponents
Step 2: If exponents match, compare coefficients \(a\)
Example:
Since \(10^5 > 10^4\), the first number is larger.
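The two comparison steps can be illustrated in R. The numbers below are hypothetical values chosen for this sketch, not the ones from the original example.

```r
# Hypothetical values for illustration
x <- 3.2e5   # 3.2 x 10^5
y <- 9.8e4   # 9.8 x 10^4

# Step 1: compare exponents
exponent <- function(v) floor(log10(abs(v)))
exponent(x)   # 5
exponent(y)   # 4

# The exponents differ, so the number with the larger exponent is larger,
# even though its coefficient (3.2) is smaller than the other one (9.8)
x > y         # TRUE
```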
R often displays very large/small numbers using e notation.
\[ a \times 10^n \quad \text{is shown as} \quad a\text{e}n \]
Examples:
3e+06 means \(3 \times 10^6\)
4.5e-04 means \(4.5 \times 10^{-4}\)
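The format() function in base R can switch between the two displays, which is a quick way to check how a value in e notation reads in standard form. A short sketch:

```r
# Force e notation
sci_big   <- format(3000000, scientific = TRUE)   # "3e+06"
sci_small <- format(0.00045, scientific = TRUE)   # "4.5e-04"

# Force standard (fixed) notation
fixed_big   <- format(3e+06, scientific = FALSE)    # "3000000"
fixed_small <- format(4.5e-04, scientific = FALSE)  # "0.00045"

c(sci_big, sci_small, fixed_big, fixed_small)
```

Regression output often shows very small p-values this way, so reading e notation is worth practicing.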