Probability models

Binomial (& Bernoulli)

Let discrete RV $X$ be the number of successes in $n$ trials where:

the trials are independent;
each trial has an equal probability $p$ of success

Then $X$ is Binomial with parameters $n$ and $p$:
\[X \sim Bin(n,p)\] with pdf \[f(x) = P(X=x) = \left(\begin{array}{c} n \\ x \end{array} \right) p^x (1-p)^{n-x} \;\; \text{ for } x \in \{0,1,2,...,n\}\]

Properties:

\[\begin{split}E(X) & = np \\ Var(X) & = np(1-p) \\ \end{split}\]

NOTE:

In the special case in which $n=1$, ie. we only observe 1 trial, then \[X \sim Bern(p)\] That is, the $Bern(p)$ and $Bin(1,p)$ models are equivalent.

Consider some plots of Binomial RVs for different parameters $n$ and $p$:

<img src="354_manual_spring_20_files/figure-html/unnamed-chunk-229-1.png" width="576" style="display: block; margin: auto;" />

In R: Suppose $X \sim Bin(n,p)$…

# Calculate f(x)
dbinom(x, size=n, prob=p)

# Calculate P(X <= x)    
pbinom(x, size=n, prob=p)

# Plot f(x) for x from 0 to n
library(ggplot2)
x <- c(0:n)
binom_data <- data.frame(x=x, fx=dbinom(x, size=n, prob=p))
ggplot(binom_data, aes(x=x, y=fx)) + 
    geom_point()

Uniform (discrete)

Suppose discrete RV $X$ is equally likely to be any value in the discrete set $S = \{s_1,s_2,...,s_n\}$. Then $X$ has a discrete uniform distribution on $S$ with pdf

\[f(x) = P(X=x) = \frac{1}{n} \;\; \text{ for } x \in \{s_1,s_2,...,s_n\}\]

Geometric

Let discrete RV $X$ be the number of trials until the 1st success where:

the trials are independent;
each trial has an equal probability $p$ of success

Then $X$ is Geometric with parameter $p \in (0,1]$:

\[X \sim Geo(p)\]

with pdf \[f(x) = P(X=x) = (1-p)^{x-1}p \;\; \text{ for } x \in \{1,2,...\}\]

Properties:

\[\begin{split} E(X) & = \frac{1}{p} \\ \text{Mode}(X) & = 1 \\ Var(X) & = \frac{1-p}{p^2} \\ \end{split}\]

Notes:

$f(x) = (\text{probability of $x-1$ failures}) * (\text{probability of 1 success})$
There are other parameterizations of the geometric distribution. They are used in different ways but produce the same “answer”.

Consider some plots of Geometric RVs for different parameters $p$:

In R: Suppose $X \sim Geo(p)$…

# Calculate f(x)
dgeom(x-1, prob = p)

# Calculate P(X <= x)
pgeom(x-1, prob = p)

# Plot f(x) for x from 1 to m (you pick m)
library(ggplot2)
x <- c(1:m)
geo_data <- data.frame(x = x, fx = dgeom(x-1, prob = p))
ggplot(geo_data, aes(x = x, y = fx)) +
    geom_point()

Poisson

Let discrete RV $X \in \{0,1,2,...\}$ be the number of events in a given time period. The outcome of $X$ depends upon parameter $\lambda>0$, the rate at which the events occur (ie. the average number of events per time interval). In this setting, we can often model $X$ by a Poisson distribution: \[X \sim Pois(\lambda)\] with pdf \[f(x) = P(X=x) = \frac{\lambda^x e^{-\lambda}}{x!} \;\; \text{ for } x \in \{0,1,2,...\}\]

Properties: \[\begin{split}E(X) & = \lambda \\ Var(X) & = \lambda \\ \end{split}\]

Consider some plots of Poisson RVs for different parameters $\lambda$:

In R: Suppose $X \sim Pois(a)$…

# Calculate f(x)
dpois(x, lambda=a)

# Calculate P(X <= x)    
ppois(x, lambda=a)

# Plot f(x) for x from 0 to m (you choose m)
library(ggplot2)
x <- c(0:m)
pois_data <- data.frame(x=x, fx=dpois(x, lambda=a))
ggplot(pois_data, aes(x=x, y=fx)) + 
    geom_point()

Uniform (continuous)

If continuous RV $X$ is uniformly distributed across the interval $[a,b]$, then

\[X \sim Unif(a,b)\]

with pdf

\[f(x) = \frac{1}{b-a} \;\; \text{ for } x \in [a,b]\]

Properties: \[\begin{split}E(X) & = \frac{a+b}{2} \\ Var(X) & = \frac{(b-a)^2}{12} \\ \end{split}\]

Consider the Uniform distribution for different parameters $a$ and $b$:

In R: Suppose $X \sim Unif(a,b)$…

# Calculate f(x)
dunif(x, min = a, max = b)

# Calculate P(X <= x)
punif(x, min = a, max = b)

# Plot f(x) for x from a-1 to b+1 (you pick the range)
ggplot(NULL, aes(x = c(a-1, b+1))) +
    stat_function(fun = dunif, args = list(min = a, max = b))

Beta

The Beta distribution can be used to model a continuous RV $X$ that’s restricted to values in the interval [0,1]. It’s defined by two shape parameters, $\alpha$ and $\beta$:

\[X \sim Beta(\alpha,\beta)\]

with pdf

\[f(x) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)} x^{\alpha-1} (1-x)^{\beta-1} \;\; \text{ for } x \in [0,1]\]

NOTE: For positive integers $n$, the gamma function simplifies to $\Gamma(n) = (n-1)!$. Otherwise, $\Gamma(z) = \int_0^\infty x^{z-1}e^{-x}dx$.

Properties: \[\begin{split} E(X) & = \frac{\alpha}{\alpha + \beta} \\ Mode(X) & = \frac{\alpha-1}{\alpha+\beta-2} \\ Var(X) & = \frac{\alpha \beta}{(\alpha + \beta)^2(\alpha+\beta+1)}\\ \end{split}\]

Consider the Beta distribution for different parameters $\alpha$ and $\beta$:

In R: Suppose $X \sim Beta(a,b)$…

# Calculate f(x)
dbeta(x, shape1 = a, shape2 = b)

# Calculate P(X <= x)
pbeta(x, shape1 = a, shape2 = b)

# Plot f(x) for x from 0 to m (you pick the range)
ggplot(NULL, aes(x = c(0, m))) +
    stat_function(fun = dbeta, args = list(shape1 = a, shape2 = b))

Exponential

Let $X$ be the waiting time until a given event occurs. For example, $X$ might be the time until a lightbulb burns out or the time until a bus arrives. The outcome of $X$ depends upon parameter $\lambda>0$, the rate at which events occur (ie. the number of events per time period). In this scenario, $X$ can often be modeled by an Exponential distribution: \[X \sim \text{Exp}(\lambda)\]

with pdf

\[f(x) = \lambda e^{-\lambda x} \;\; \text{ for } x \in [0,\infty)\]

Properties: \[\begin{split} E(X) & = \frac{1}{\lambda} \\ Var(X) & = \frac{1}{\lambda^2} \\ \end{split}\]

Consider the Exponential distribution for different parameters $\lambda$:

In R: Suppose $X \sim Exp(a)$…

# Calculate f(x)
dexp(x, rate=a)

# Calculate P(X <= x)
pexp(x, rate=a)

# Plot of f(x) for x from 0 to m (you pick the range)
ggplot(NULL, aes(x = c(0, m))) +
    stat_function(fun = dbeta, args = list(shape1 = a, shape2 = b))

Gamma

The Gamma distribution is often appropriate for modeling positive RVs $X$ ($X>0$) with right skew. For example, let $X$ be the waiting time until a given event occurs $s$ times. The outcome of $X$ depends upon both the shape parameter $s>0$ (the number of events we’re waiting for) and the rate parameter $r>0$ (the rate at which the events occur). In this setting, $X$ can often be modeled by a Gamma distribution: \[X \sim \text{Gamma}(s,r)\]

with pdf \[f(x) = \frac{r^s}{\Gamma(s)}*x^{s-1}e^{-r x} \hspace{.2in} \text{ for } x \in [0,\infty)\] NOTE: For positive integers $s \in \{1,2,...\}$, the gamma function simplifies to $\Gamma(s) = (s-1)!$. Otherwise, $\Gamma(s) = \int_0^\infty x^{s-1}e^{-x}dx$.

Properties: \[\begin{split} E(X) & = \frac{s}{r} \\ Mode(X) & = \frac{s-1}{r} \;\; \text{ (if $s>1$)} \\ Var(X) & = \frac{s}{r^2} \\ \end{split}\]

Consider the Gamma distribution for different parameters $(s,r)$:

In R: Suppose $X \sim Gamma(s,r)$…

# Calculate f(x)
dgamma(x, shape = s, rate = r)

# Calculate P(X <= x)
pgamma(x, shape = s, rate = r)

# Plot of f(x) for x from 0 to m (you pick the range)
ggplot(NULL, aes(x = c(0, m))) +
    stat_function(fun = dgamma, args = list(shape = s, rate = f))

Normal

Let $X$ be an RV with a bell-shaped distribution centered at $\mu$ and with variance $\sigma^2$. Then $X$ is often well-approximated by a Normal distribution: \[X \sim \text{Normal}(\mu,\sigma^2)\]

with pdf \[f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}*\exp\left\lbrace - \frac{1}{2}\left(\frac{x-\mu}{\sigma} \right)^2 \right\rbrace \hspace{.2in} \text{ for } x \in (-\infty,\infty)\]

Properties: \[\begin{split} E(X) & = \mu \\ Var(X) & = \sigma^2 \\ \end{split}\]

Consider the Normal distribution for different parameters $(\mu,\sigma^2)$:

In R: Suppose $X \sim N(m,s^2)$…

# Calculate f(x)
dnorm(x, mean = m, sd = s)

# Calculate P(X <= x)
pnorm(x, mean = m, sd = s)

# Plot f(x) for x from a to b (you pick the range)
ggplot(NULL, aes(x = c(a, b))) +
    stat_function(fun = dnorm, args = list(mean = m, sd = s))