Distribution Functions

2025-02-11

Probability Theory

Probability Theory
Joint Probability
Distribution Functions
Normal Distribution
Empirical Rule
Binomial Distribution
Poisson Distribution
Probability in R

What is Probability?

Probability is the measure of how likely an event is to occur. It ranges from 0 to 1:

$P(A) = 0$: The event $A$ will definitely not happen.
$P(A) = 1$: The event $A$ will definitely happen.
Values between 0 and 1 represent varying degrees of likelihood.

Everyday Examples

What’s the probability it will rain tomorrow?
What are the chances of rolling a 6 on a standard die?
How likely is it that a randomly chosen student has a GPA above 3.0?

An action or process that generates outcomes.
- Example: Rolling a die.

The set of all possible outcomes of an experiment.
- Example: For rolling a die, $S = \{1, 2, 3, 4, 5, 6\}$.

A subset of the sample space, representing outcomes of interest.

Example: Rolling an even number ($A = \{2, 4, 6\}$).

The proportion of times an event is expected to occur if the experiment is repeated many times.

The Probability Formula

The probability of an event $A$ is defined as:

\[ P(A) = \frac{\text{Number of favorable outcomes}}{\text{Total number of possible outcomes}} \]

Example

If we roll a die, what is the probability of rolling a 4?

Favorable outcomes: 1 (rolling a 4).
Total outcomes: 6 (since $S = \{1, 2, 3, 4, 5, 6\}$).

\[ P(\text{rolling a 4}) = \frac{1}{6} \approx 0.167 \]

Rules of Probability

Rule 1
Rule 2
Rule 3

Probability of the Sample Space

The probability of the sample space is always 1: \[ P(S) = 1 \]

Probability of Impossible Events

The probability of an event that cannot happen is 0:

\[ P(\emptyset) = 0 \]

Complement Rule

The probability of the complement of an event $A$ (not $A$) is:

\[ P(A^c) = 1 - P(A) \]

Example: If $P(\text{rain}) = 0.3$, then $P(\text{no rain}) = 1 - 0.3 = 0.7$.

Applications

Drawing a Card
Tossing a Coin Twice
Traffic Lights

If you draw a card from a standard deck of 52 cards, what is the probability of drawing:

A heart?
\[ P(\text{heart}) = \frac{13}{52} = 0.25 \]
A red card (heart or diamond)?
\[ P(\text{red card}) = \frac{26}{52} = 0.5 \]

What is the probability of getting:

Exactly one head?
Sample space: $S = \{\text{HH, HT, TH, TT}\}$.
Event: $A = \{\text{HT, TH}\}$.
\[ P(A) = \frac{2}{4} = 0.5 \]
At least one head?
Event: $B = \{\text{HH, HT, TH}\}$.
\[ P(B) = \frac{3}{4} = 0.75 \]

A commuter encounters three traffic lights, each with a 70% chance of being green. Assuming independence, what is the probability that all three lights are green?

\[ P(\text{all green}) = 0.7 \cdot 0.7 \cdot 0.7 = 0.343 \]

1	2	3	4	5	6
1/6	1/6	1/6	1/6	1/6	1/6

Joint Probability

Probability Theory
Joint Probability
Distribution Functions
Normal Distribution
Empirical Rule
Binomial Distribution
Poisson Distribution
Probability in R

Joint Probability

Often, we’re interested in the probabilities of multiple events occurring.
This presentation focuses on calculating probabilities involving two events.
We’ll explore key concepts like:
- Joint Probability
- Union of Events
- Conditional Probability
- Independence

Joint Probability

The probability of both events A and B occurring.
Denoted as $P(A\ \mathrm{and}\ B)$ or $P(A \cap B)$.
Examples:
- Drawing a King and a Heart from a deck of cards.
- Flipping two coins and getting heads on both.

Union of Events

The probability of either event A or event B (or both) occurring.
Denoted as $P(A\ \mathrm{or}\ B)$ or $P(A \cup B)$.
Examples:
- Rolling a 3 or a 5 on a die.
- Drawing a red card or a face card.

Calculating Union Probability

General Case: $P(A \cup B) = P(A) + P(B) - P(A \cap B)$ (Inclusion-Exclusion Principle)
Mutually Exclusive Events: If A and B are mutually exclusive (cannot both occur), then $P(A \cap B) = 0$, and $P(A \cup B) = P(A) + P(B)$

Example: Union Probability

What is the probability of rolling a number greater than 4 or an even number on a six-sided die?
- Event A: Rolling a number greater than 4 (5 or 6).
- Event B: Rolling an even number (2, 4, or 6).

$P(A) = 2/6$
$P(B) = 3/6$
$P(A \cap B) = 1/6$ (rolling a 6)
$P(A \cup B) = (2/6) + (3/6) - (1/6) = 4/6 = 2/3$

Conditional Probability

The conditional probability of event A occurring given that event B has occurred: $P(A|B)$ (read as “the probability of A given B”).
Formula: $P(A|B) =\frac{P(A \cap B)}{P(B)}$
- $P(A|B)$: Conditional probability of A given B.
- $P(A \cap B)$: Probability of both A and B occurring.
- $P(B)$: Probability of B occurring.

Independence

Two events are independent if the occurrence of one does not affect the probability of the other.
If A and B are independent:
- $P(A|B) = P(A)$
- $P(B|A) = P(B)$
- $P(A \cap B) = P(A) P(B)$

Bayes’ Theorem

Reverses the conditioning: Relates P(A|B) to P(B|A).
$P(A|B) = \frac{P(B|A) * P(A)}{P(B)}$
Useful when we know P(B|A) but want P(A|B).

Distribution Functions

Probability Theory
Joint Probability
Distribution Functions
Normal Distribution
Empirical Rule
Binomial Distribution
Poisson Distribution
Probability in R

Random Variable

A random variable is a variable whose value is a numerical outcome of a random phenomenon. It’s a way to map the outcomes of a probabilistic event to numbers. This allows us to analyze these outcomes mathematically.

RV: Key Concepts

Random Phenomenon: This is any process or experiment whose outcome is uncertain.
Outcomes: The possible results of a random phenomenon are called outcomes.
Numerical Value: A random variable assigns a numerical value to each outcome.

Types of Random Variables

Discrete Random Variable
Continuous Random Variable

A discrete random variable can only take on a finite number of values or a countably infinite number of values. The values are often integers. Examples:

The number of heads in three coin flips (0, 1, 2, or 3).
The number of defective light bulbs in a box of 100.
The number of customers entering a store in an hour.

A continuous random variable can take on any value within a given range. Examples:

The height of a person.
The temperature of a room.
The time it takes to complete a task.

Distribution Functions

A distribution function describes the probabilities of a random variable across its possible values. It answers questions like:

How likely is a random variable to take on a specific value or fall within a range?
What is the overall “shape” of the distribution of outcomes?

Cumulative Distribution Function

The Cumulative Distribution Function (CDF) describes the probability that a random variable $X$ is less than or equal to a certain value $x$: \[ F_X(x) = P(X \leq x) \]

Properties of the CDF

Range: The CDF is always between 0 and 1: \[ 0 \leq F_X(x) \leq 1 \]
Non-Decreasing: The CDF never decreases as $x$ increases.
Limits:
- $\lim_{x \to -\infty} F_X(x) = 0$
- $\lim_{x \to \infty} F_X(x) = 1$

CDF Example

For a die roll (discrete case):

$F_X(2) = P(X \leq 2) = P(X = 1) + P(X = 2) = \frac{1}{6} + \frac{1}{6} = \frac{2}{6}$.

For a normal distribution (continuous case):

Use the CDF to find probabilities, typically provided via tables or software.

Probability Density Function (PDF)

The Probability Density Function (PDF) is used for continuous random variables and describes the likelihood of the variable falling within a small interval.

\[ P(a \leq X \leq b) = \int_a^b f_X(x) \, dx \]

Important

\[ P(X = a) = P(a \leq X \leq a) = \int_a^a f_X(x) \, dx = 0 \]

Properties of the PDF:

Non-Negative: \[ f_X(x) \geq 0 \quad \forall x \]
Total Area Under Curve: \[ \int_{-\infty}^\infty f_X(x) \, dx = 1 \]

PDF Examples

The PDF of the normal distribution is: \[ f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x-\mu)^2}{2\sigma^2}} \]

Here, $\mu$ is the mean and $\sigma$ is the standard deviation.

Probability Mass Function (PMF)

The Probability Mass Function (PMF) is used for discrete random variables and gives the probability of each possible value: \[ P(X = x) = p_X(x) \]

Properties of the PMF

Non-Negative: \[ p_X(x) \geq 0 \quad \forall x \]
Sum of Probabilities: \[ \sum_x p_X(x) = 1 \]

PMF Example

For a fair die, the PMF is: \[ p_X(x) = \frac{1}{6}, \quad x \in \{1, 2, 3, 4, 5, 6\} \]

Percentiles

A percentile is a measure indicating the value below which a given percentage of observations in a group of observations falls. It is used in statistics to provide information about the relative standing of a particular value within a dataset.

Percentile Example

Suppose we have a random variable X representing the height of adults in a population. If the 90th percentile of X is 180 cm, it means that the probability of a randomly selected adult being 180 cm or shorter is 90%.

Precentile

Code

data <- rnorm(500000, mean = 100, sd = 15) |> data.frame(x = _)  # Example: normally distributed data

# Calculate the 75th percentile
percentile_75 <- qnorm(0.75, 100, 15)

ggplot(data = data, aes(x = x)) +
  geom_density(fill = "skyblue", alpha = 0.5) +  # Density curve
  geom_vline(xintercept = percentile_75, color = "red", linetype = "dashed", size = 1) +
  labs(title = "Density Plot with 75th Percentile Highlighted", x = "Data Value", y = "Density") +
  theme_bw()

Applications

PMF of a Coin Toss
CDF of an Exp. Dist.
Apps. in Real Life

For a fair coin tossed once:

$X = 0$: Tails, $P(X = 0) = 0.5$
$X = 1$: Heads, $P(X = 1) = 0.5$

The PMF is: \[ p_X(x) = \begin{cases} 0.5 & x = 0 \text{ or } x = 1 \\ 0 & \text{otherwise} \end{cases} \]

An exponential random variable with rate $\lambda > 0$ has a CDF: \[ F_X(x) = \begin{cases} 1 - e^{-\lambda x}, & x \geq 0 \\ 0, & x < 0 \end{cases} \]

This can be used to model waiting times between events.

PMF:
- Number of emails received per hour.
PDF:
- Heights of students in a class.
CDF:
- The likelihood of completing a task within a certain time frame.

Common Contisuous RV

Continuous RV	Parameters	Notation	Support
Uniform	a (minimum), b (maximum)	Unif(a, b)	[a, b]
Normal (Gaussian)	μ (mean), σ (standard deviation)	N(μ, σ²)	(-∞, ∞)
Exponential	λ (rate)	Exp(λ)	[0, ∞)
Gamma	k (shape), θ (scale)	Γ(k, θ)	[0, ∞)
Beta	α (shape 1), β (shape 2)	Beta(α, β)	[0,1]
Chi-Squared	k (degrees of freedom)	χ²(k)	[0, ∞)
t-distribution	ν (degrees of freedom)	t(ν)	(-∞, ∞)
F-distribution	ν₁ (degrees of freedom 1), ν₂ (degrees of freedom 2)	F(ν₁, ν₂)	[0, ∞)
Weibull	k (shape), λ (scale)	Weibull(k, λ)	[0, ∞)
Lognormal	μ (location), σ (scale)	Lognormal(μ, σ)	[0, ∞)

Common Discrete RV

Discrete RV	Parameters	Notation	Support
Bernoulli	p (probability of success)	Bernoulli(p)	{0, 1}
Binomial	n (number of trials), p (probability of success)	Bin(n, p)	{0, 1, 2, …, n}
Geometric	p (probability of success)	Geo(p)	{1, 2, 3, …}
Poisson	λ (average rate of events)	Pois(λ)	{0, 1, 2, …}
Negative Binomial	r (number of successes), p (probability of success)	NB(r, p)	{r, r+1, r+2, …}
Discrete Uniform	n (number of possible outcomes)	DUnif(1, n)	{1, 2, …, n} or a set of n values

Probability and PD/MF

Continuous
Discrete

Code

mean <- 0  # Mean of the distribution
sd <- 1   # Standard deviation of the distribution

# Generate data points for the curve (more points = smoother curve)
x <- seq(mean - 3*sd, mean + 3*sd, length.out = 100)  # Range covering most of the distribution
y <- dnorm(x, mean = mean, sd = sd)  # Calculate the density at each x value

# Create the plot
ggplot(data = data.frame(x, y), aes(x = x, y = y)) +
  geom_line(color = "blue", size = 1) +  # Line for the distribution curve
  labs(x = "X",
       y = "Probability Density") +
  theme_bw() + # A clean theme
  scale_x_continuous(breaks = seq(mean - 3*sd, mean + 3*sd, by = sd)) # Ticks at standard deviations

Code

n <- 10   # Number of trials
p <- 0.3  # Probability of success on each trial

# Generate the data for the plot
x <- 0:n  # Possible values for the random variable (number of successes)
y <- dbinom(x, size = n, prob = p)  # Calculate the probabilities

# Create the plot using geom_col (for discrete data)
ggplot(data = data.frame(x, y), aes(x = factor(x), y = y)) +  # x as factor for discrete bars
  geom_col(fill = "skyblue", color = "black") +
  labs(title = paste("Binomial Distribution (n =", n, ", p =", p, ")"),
       x = "Number of Successes (x)",
       y = "Probability P(X = x)") +
  theme_bw()

Normal Distribution

Probability Theory
Joint Probability
Distribution Functions
Normal Distribution
Empirical Rule
Binomial Distribution
Poisson Distribution
Probability in R

Normal Distribution

The Normal Distribution is a probability distribution that is symmetric, with most of the data points clustering around the mean.

It’s bell-shaped and is defined mathematically by two parameters:
- Mean ($\mu$): The center or peak of the distribution.
- Standard Deviation ($\sigma$): Controls the spread of the distribution.

Important

For any normally distributed data, the highest probability density is at the mean, and as you move away from the mean, the probability density gradually decreases.

Properties

Symmetry: It is perfectly symmetric about the mean, meaning the left side is a mirror image of the right.
Unimodal: There is a single peak at the mean.
Mean, Median, and Mode are Equal: In a normal distribution, these three measures of central tendency are located at the same point.
68-95-99.7 Rule (Empirical Rule)

Normal Distribution

\[ X\sim N(\mu, \sigma) \]

\[ -\infty < X < \infty \]

\[ f_X(x) = \frac{1}{\sqrt{2\pi \sigma}} e^{-\frac{(x-\mu)^2}{2\sigma^2}} \]

The Standard Normal Distribution

The Standard Normal Distribution is a special type of normal distribution with a mean of 0 and a standard deviation of 1. It’s often used as a reference to convert any normal distribution to a standard form.

Z-Scores

A Z-score (or standard score) tells us how many standard deviations an individual data point is from the mean. It’s calculated as:

\[ Z = \frac{X - \mu}{\sigma} \]

If $Z$ is positive, the data point is above the mean.
If $Z$ is negative, the data point is below the mean.
Using Z-scores, we can compare values across different normal distributions or find the probability associated with a particular score.

Why the Normal Distribution is Important

It Describes Many Natural Phenomena: Heights, weights, test scores, measurement errors, and countless other variables follow a normal distribution, especially when influenced by many small, random factors.
Predictive Power: With normally distributed data, we can make predictions and infer probabilities, thanks to the 68-95-99.7 rule.
Central Limit Theorem: The normal distribution is foundational to the Central Limit Theorem, which tells us that, regardless of the original data distribution, the sampling distribution of the sample mean will approach a normal distribution as sample size increases.
Ease of Use in Statistical Methods: Many statistical tests and methods assume normality, allowing for simplified calculations and reliable inferences.

Apps of the Normal Dist.

Standardized Testing
Finance

Scores on standardized tests, such as IQ tests or the SAT, are often designed to follow a normal distribution. By knowing a student’s score in terms of Z-scores, we can determine their percentile or how they compare to other test-takers.

In finance, stock returns and other economic factors are often modeled with a normal distribution to estimate risk, forecast trends, and make informed investment decisions.

Finding Probability

\[ X \sim N(\mu, \sigma) \]

\[ P(a \leq X \leq b) = P(a < X < b) = \int_a^b \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x-\mu)^2}{2\sigma^2}} dx \]

Empirical Rule

Probability Theory
Joint Probability
Distribution Functions
Normal Distribution
Empirical Rule
Binomial Distribution
Poisson Distribution
Probability in R

What is the Empirical Rule?

The Empirical Rule provides a way to understand the spread of data in a normal distribution by describing how data points cluster around the mean. According to this rule:

Approximately 68% of data points fall within one standard deviation of the mean.
Approximately 95% of data points fall within two standard deviations of the mean.
Approximately 99.7% of data points fall within three standard deviations of the mean.

The empirical rule is very helpful because, with just the mean and standard deviation, we can quickly estimate how data is distributed within a normal curve.

Empirical Rule to the Normal Dist.

Let’s define the key terms and apply the empirical rule to the normal distribution.

Mean (μ): This is the central point of the normal distribution where the data clusters around.
Standard Deviation (σ): This is a measure of how spread out the data points are from the mean.

Empirical Rule to the Normal Dist.

In a normal distribution:

68% of data lies between $(\mu - \sigma)$ and $(\mu + \sigma)$.
95% of data lies between $(\mu - 2\sigma)$ and $(\mu + 2\sigma)$.
99.7% of data lies between $(\mu - 3\sigma)$ and $(\mu + 3\sigma)$.

These intervals allow us to estimate probabilities for data within each range without needing to calculate exact probabilities.

Visualizing the Empirical Rule

To better understand the empirical rule, imagine a symmetric, bell-shaped normal curve. Here’s how it would look based on the empirical rule:

The 68% region represents the middle of the curve, starting one standard deviation left of the mean and ending one standard deviation right.
The 95% region stretches further out, covering almost the entire curve except for the outer tails.
The 99.7% region includes nearly all data points, covering the entire curve except for a tiny fraction at each extreme.

This visualization shows how the data is most concentrated around the mean, with less data appearing as we move further away.

Using the Empirical Rule for Probabilities

The empirical rule helps us answer questions like:

What percentage of data points fall within a certain range?
How unusual is a data point located far from the mean?

Examples

If we know that a data point lies more than two standard deviations away from the mean, we know it’s in the outer 5% of the distribution, making it relatively rare.
Using the rule, we can estimate that around 95% of values should lie within two standard deviations of the mean. If we observe data points outside of this range, we might consider them outliers.

More Examples

Exam Scores
Heights of Adults

Suppose exam scores are normally distributed with a mean of 70 and a standard deviation of 10.

68% of students scored between 60 and 80 (70 ± 10).
95% of students scored between 50 and 90 (70 ± 20).
99.7% of students scored between 40 and 100 (70 ± 30).

Assume that adult heights follow a normal distribution with a mean height of 170 cm and a standard deviation of 8 cm.

68% of adults have heights between 162 cm and 178 cm (170 ± 8).
95% of adults have heights between 154 cm and 186 cm (170 ± 16).
99.7% of adults have heights between 146 cm and 194 cm (170 ± 24).

Binomial Distribution

Probability Theory
Joint Probability
Distribution Functions
Normal Distribution
Empirical Rule
Binomial Distribution
Poisson Distribution
Probability in R

Binomial Distribution

The Binomial Distribution is a probability distribution that models the number of successes in a fixed number of trials, where each trial has:

Two possible outcomes: typically called “success” and “failure.”
A constant probability of success, $p$, on each trial.

Real-World Examples:

Tossing a coin $n$ times and counting how many heads you get.
Rolling a die $n$ times and counting how many times you roll a 6.
Administering a medical treatment to $n$ patients and recording how many recover.

The binomial distribution answers questions like:

“What is the probability of getting exactly 3 heads in 5 coin tosses?”
“What is the likelihood of at least 4 successes in 10 trials?”

Conditions for a Binomial Experiment

Fixed Number of Trials ($n$):
- The experiment consists of a set number of trials.
Two Possible Outcomes:
- Each trial results in either a success (e.g., heads) or a failure (e.g., tails).
Constant Probability of Success ($p$):
- The probability of success remains the same for each trial.
Independence:
- The outcome of one trial does not affect the outcomes of other trials.

The Binomial Probability Formula

\[ P(X = k) = \binom{n}{k} p^k (1-p)^{n-k} \]

Where:

$P(X = k)$: Probability of exactly $k$ successes.
$n$: Number of trials.
$k$: Number of successes.
$p$: Probability of success on a single trial.
$1-p$: Probability of failure.
$\binom{n}{k} = \frac{n!}{k!(n-k)!}$: Represents the number of ways to choose $k$ successes from $n$ trials

Example Calculation

Suppose you flip a fair coin 5 times ($n = 5, p = 0.5$) and want to know the probability of getting exactly 3 heads ($k = 3$).

\[ P(X = 3) = \binom{5}{3} (0.5)^3 (1-0.5)^{5-3} \]

Properties of the Binomial Dist.

Mean and Variance
Shape of the Distribution

For a binomial random variable $X$:

Mean (Expected Value): $\mu = n \cdot p$
Variance: $\sigma^2 = n \cdot p \cdot (1-p)$
Standard Deviation: $\sigma = \sqrt{n \cdot p \cdot (1-p)}$

If $p = 0.5$, the distribution is symmetric.
If $p > 0.5$, the distribution is skewed left.
If $p < 0.5$, the distribution is skewed right.

Applications

Quality Control
Sports
Clinical Trials

A factory produces lightbulbs, and 95% of them meet quality standards. If you randomly test 10 bulbs, what is the probability that exactly 8 bulbs pass the test?

Here:

$n = 10$, $p = 0.95$, $k = 8$.

\[ P(X = 8) = \binom{10}{8} (0.95)^8 (0.05)^2 \]

A basketball player has a 60% chance of making a free throw. If they take 5 shots, what is the probability of making at least 3 shots?

Here: - $n = 5$, $p = 0.6$.

\[ P(X \geq 3) = P(X = 3) + P(X = 4) + P(X = 5) \]

In a clinical trial, a new drug has a 70% success rate. If 15 patients are treated, what is the probability that exactly 10 respond positively to the treatment?

Poisson Distribution

Probability Theory
Joint Probability
Distribution Functions
Normal Distribution
Empirical Rule
Binomial Distribution
Poisson Distribution
Probability in R

Poisson Distribution

The Poisson Distribution is a discrete probability distribution that models the number of events occurring within a fixed interval. These events must happen independently and at a constant average rate.

Real-World Examples

The number of emails you receive in an hour.
The number of cars passing through a toll booth in 10 minutes.
The number of defects in a square meter of fabric.

The Poisson distribution helps us answer questions like:

“What is the probability of receiving 5 emails in the next hour?”
“How likely is it to have 2 defects in a single square meter?”

Conditions for the Poisson Dist.

Events Occur Randomly:
- The events are random and unpredictable.
Independence:
- The occurrence of one event does not affect the probability of another event occurring.
Constant Average Rate ($\lambda$):
- The average number of events ($\lambda$) over a fixed interval remains constant.
Non-Overlapping Intervals:
- Events in one interval do not influence events in another.

The Poisson Probability Formula

\[ P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!} \]

Where:

$P(X = k)$: Probability of $k$ events occurring.
$\lambda$: Average number of events in the interval.
$e \approx 2.718$
$k!$: Factorial of $k$, calculated as $k \times (k-1) \times \dots \times 1$.

Example Calculation

Suppose a call center receives an average of 10 calls per hour ($\lambda = 10$). What is the probability of receiving exactly 7 calls in the next hour ($ k = 7 $)?

\[ P(X = 7) = \frac{10^7 e^{-10}}{7!} \]

Poisson Distribution Properties

Mean: $\mu = \lambda$
Variance: $\sigma^2 = \lambda$
Standard Deviation: $\sigma = \sqrt{\lambda}$

Applications

Traffic Flow
Defective Products

A toll booth observes an average of 3 cars passing through every 5 minutes ($\lambda = 3$). What is the probability of seeing exactly 5 cars in the next 5 minutes?

Using the formula: \[ P(X = 5) = \frac{3^5 e^{-3}}{5!} = \frac{243 \cdot 0.0498}{120} \approx 0.1008 \]

A factory produces an average of 2 defective items per day ($\lambda = 2$). What is the probability of finding no defective items in a day ($k = 0$)?

\[ P(X = 0) = \frac{2^0 e^{-2}}{0!} = e^{-2} \approx 0.1353 \]

Probability in R

Probability Theory
Joint Probability
Distribution Functions
Normal Distribution
Empirical Rule
Binomial Distribution
Poisson Distribution
Probability in R

Distributions in R

?Distributions

Function Structure

Letter	Functionality
“d”	returns the height of the probability density function
“p”	returns the cummulative density function value
“q”	returns the inverse cummulative density function (percentiles)
“r”	returns a randomly generated number

Probabilities

R can compute the probabilities of a distribution given the correct parameters:

Cummulative probability: p is used in front of the distribution R function
Probability for a discrete distribution: d is used in front of the distribution R function
- Note: Continuous Distribution Functions will not yield a valid probability value.

Examples

Find $P(X \leq 5 )$ where $X \sim N(6,2)$.

Code

pnorm(5, mean = 6, sd = 2)

Find $P(X \geq 7 )$ where $X \sim N(6,2)$.

Code

1 - pnorm(7, mean = 6, sd = 2)

Find $P(X = 20 )$ where $X \sim Bin(30,0.8)$.

Code

dbinom(20, size = 30, prob = 0.8)

More Examples

Find $P(2 \leq X \leq 5 )$ where $X \sim N(6,2)$.

Code

pnorm(5, 6, 2) - pnorm(2, 6, 2)

Find $P(14 < X < 20)$ where $X \sim Pois(16)$.

Code

ppois(19, lambda = 16) - ppois(14, lambda = 16)

## OR

dpois(15, 16) + dpois(16, 16) + dpois(17, 16) + dpois(18, 16) + dpois(19, 16)

Find $P(23 < X)$ where $X \sim Pois(12)$.

Code

1 - ppois(23, 12)

Percentiles

Finding the values (percentiles) for any distributions can be found by using the q-based distribution R function such as qnorm(), qpois(), and qbinom() functions.

Examples

Finding the $95^{th}$ percentile from $N(0,1)$, we will use the qnorm().

Code

qnorm(.95, 0, 1)

Finding the $95^{th}$ percentile from a Poisson distribution with $\lambda = 9.5$.

Code

qpois(.95, 9.5)

Finding the $75^{th}$ percentile for $Bin(45,.4)$.

Code

qbinom(.75, 45, .4)

Random Number Generator

R is capable of generating random numbers. For example if we want to generate a random sample of size fifty from a normal distribution with mean eight and variance three, we will use the rnorm(). If we want to generate a random sample from any distribution, use the distribution function with r in front of it.

Examples

Let’s first generate the random sample of fifty from $X \sim N(8,3)$.

Code

rnorm(50, 8, sqrt(3))

Generate a random sample of 100 form an $X \sim Gamma (2,3)$.

Code

rgamma(100, 2,3)

Generate a random sample of 100 form an $X \sim Binom (25,.23)$.

Code

rbinom(n = 100, size  = 25, prob = 0.23)

Generate a random sample of 100 form an $X \sim Pois (34.4)$.

Code

rpois(n = 100, lambda = 34.4)

Distribution Functions

Probability Theory

What is Probability?

Everyday Examples

Key Terms and Definitions

The Probability Formula

Example

Rules of Probability

Applications

More Problems

Joint Probability

Joint Probability

Joint Probability

Union of Events

Calculating Union Probability

Example: Union Probability

Conditional Probability

Independence

Bayes’ Theorem

Distribution Functions

Random Variable

RV: Key Concepts

Types of Random Variables

Distribution Functions

Cumulative Distribution Function

Properties of the CDF

CDF Example

Probability Density Function (PDF)

Properties of the PDF:

PDF Examples

Probability Mass Function (PMF)

Properties of the PMF

PMF Example

Percentiles

Percentile Example

Precentile

Applications

Common Contisuous RV

Common Discrete RV

Probability and PD/MF

Normal Distribution

Normal Distribution

Properties

Normal Distribution

The Standard Normal Distribution

Z-Scores

Why the Normal Distribution is Important

Apps of the Normal Dist.

Finding Probability

Empirical Rule

What is the Empirical Rule?

Empirical Rule to the Normal Dist.

Empirical Rule to the Normal Dist.

Visualizing the Empirical Rule

Using the Empirical Rule for Probabilities

Examples

More Examples

Binomial Distribution

Binomial Distribution

Real-World Examples:

Conditions for a Binomial Experiment

The Binomial Probability Formula

Example Calculation

Properties of the Binomial Dist.

Applications

Poisson Distribution

Poisson Distribution

Real-World Examples

Conditions for the Poisson Dist.

The Poisson Probability Formula

Example Calculation

Poisson Distribution Properties

Applications

Probability in R

Distributions in R

Function Structure

Probabilities

Examples

More Examples

Percentiles