10 Properties of RVs: Expected Value & Variance



READING:

For more on this topic, read B & H Chapter 4.1 & 4.6.



10.1 Discussion



OVERARCHING GOAL OF UNIT 2
Explore univariate probability models.



BIG QUESTION 1: What model is appropriate for any given random variable?





BIG QUESTION 2: What are the features of this model? How might data generated from this model behave in the long run?

  1. Measuring central tendency: Expected Value of \(X\)
    \(E(X)\) measures the trend in, or long-run average of, \(X\). It’s calculated as a weighted average of all possible values of \(X\) (i.e., each possible \(x\) is weighted by its corresponding PMF \(p_X(x)\) or PDF \(f_X(x)\)): \[\begin{split} \text{discrete $X$: } & \;\;\; E(X) = \sum_{\text{all } x} x \; p_X(x) \\ \text{continuous $X$: } & \;\;\; E(X) = \int_{\text{all } x} x \; f_X(x)dx \\ \end{split}\]

    Similarly, let \(g(X)\) be some function of RV \(X\). Then \[\begin{split} \text{discrete $X$: } & \;\;\; E(g(X)) = \sum_{\text{all } x} g(x) \; p_X(x) \\ \text{continuous $X$: } & \;\;\; E(g(X)) = \int_{\text{all } x} g(x) \; f_X(x)dx \\ \end{split}\]

    NOTE: \(E(X)\) is reflected in the sample mean of a sample of data \((X_1,X_2,...,X_n)\), \(\overline{x} = \frac{1}{n}\sum_{i=1}^n x_i\).


  2. Measuring spread: Variance & Standard Deviation of \(X\)

    \(Var(X)\) measures the typical squared deviation from the trend, or long-run variance of, \(X\). \(SD(X)\) measures the typical deviation from the trend, or long-run standard deviation of, \(X\). \[\begin{split} Var(X) & = E((X - E(X))^2) = \begin{cases} \sum_{\text{all } x} (x - E(X))^2 \; p_X(x) & \;\; X \text{ discrete} \\ \int_{\text{all } x} (x - E(X))^2 \; f_X(x)dx & \;\; X \text{ continuous} \\ \end{cases} \\ Var(X) & = E(X^2) - \left[E(X)\right]^2 \;\;\;\;\; \text{(typically easier to calculate!)}\\ SD(X) & = \sqrt{Var(X)} \\ \end{split}\]

    NOTE: \(Var(X)\) is reflected in the sample variance of a sample of data \((X_1,X_2,...,X_n)\), \(s^2 = \frac{1}{n-1}\sum_{i=1}^n (x_i - \overline{x})^2\).
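As a minimal sketch of the formulas above for a discrete RV, assuming the PMF is stored as a Python dict mapping each value \(x\) to \(p_X(x)\) (the helper names `expect`, `variance_def`, and `variance_shortcut` and the fair-die example are hypothetical, chosen only for illustration), \(E(g(X))\) is a weighted sum, both variance formulas agree, and a long simulated sample has a mean and sample variance close to \(E(X)\) and \(Var(X)\).

```python
import random
import statistics

def expect(pmf, g=lambda x: x):
    """E(g(X)) for a discrete RV: the sum of g(x) * p_X(x) over all x."""
    return sum(g(x) * p for x, p in pmf.items())

def variance_def(pmf):
    """Var(X) = E((X - E(X))^2), straight from the definition."""
    mu = expect(pmf)
    return expect(pmf, lambda x: (x - mu) ** 2)

def variance_shortcut(pmf):
    """Var(X) = E(X^2) - [E(X)]^2, usually the easier route."""
    return expect(pmf, lambda x: x**2) - expect(pmf) ** 2

# Hypothetical example RV: the roll of a fair six-sided die.
die = {x: 1 / 6 for x in range(1, 7)}

mu = expect(die)          # theory: 7/2 = 3.5
var = variance_def(die)   # theory: 35/12, about 2.917
sd = var ** 0.5
print("E(X) =", round(mu, 3), " Var(X) =", round(var, 3), " SD(X) =", round(sd, 3))

# Long-run check: simulate many draws and compare the sample mean and the
# sample variance (statistics.variance divides by n - 1) to E(X) and Var(X).
random.seed(1)
sample = random.choices(list(die), weights=list(die.values()), k=100_000)
print("sample mean =", round(statistics.mean(sample), 3),
      " sample variance =", round(statistics.variance(sample), 3))
```

Swapping in a different dict for `die` (or a different `g`) reuses the same weighted-sum logic for any finite discrete PMF.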








EXAMPLE 1

Let \(X\) be the measurement error of a weighing device (in pounds) used by a certain company.

\(x\) -1 0 1 2 Total
\(p_X(x)\) 0.4 0.3 0.2 0.1 1


  1. Calculate \(E(X)\), the typical measurement error.
  2. Calculate \(E(X^2)\), the typical squared measurement error.
  3. Calculate \(Var(X)\), the variance in the weighing errors, two ways: \[\begin{split} Var(X) & = E((X - E(X))^2) = \sum_{\text{all } x} (x - E(X))^2 p_X(x) \\ Var(X) & = E(X^2) - \left[E(X)\right]^2 \\ \end{split}\]
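A quick arithmetic check for Example 1, sketched with Python's `fractions.Fraction` so the weighted sums are exact rather than floating-point approximations. It evaluates \(E(X)\), \(E(X^2)\), and both variance formulas directly from the table.

```python
from fractions import Fraction as F

# PMF of the measurement error X (pounds), from the table in Example 1
pmf = {-1: F("0.4"), 0: F("0.3"), 1: F("0.2"), 2: F("0.1")}
assert sum(pmf.values()) == 1  # a valid PMF sums to 1

E_X = sum(x * p for x, p in pmf.items())        # E(X)
E_X2 = sum(x**2 * p for x, p in pmf.items())    # E(X^2)

var_def = sum((x - E_X) ** 2 * p for x, p in pmf.items())  # E((X - E(X))^2)
var_short = E_X2 - E_X**2                                   # E(X^2) - [E(X)]^2

print(E_X, E_X2, var_def, var_short)  # 0 1 1 1
```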



EXAMPLE 2

Consider a generic discrete RV \(X\). Show that \[Var(X) = E((X - E(X))^2) = E(X^2) - \left[E(X)\right]^2\]

HINT: Note that \(E(X)\) is a constant.
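One way to write out the Example 2 derivation, letting \(\mu = E(X)\) denote the (constant) mean and using only the discrete definition of expected value plus the fact that \(\sum_{\text{all } x} p_X(x) = 1\):

```latex
\[\begin{split}
E\left((X - \mu)^2\right) & = \sum_{\text{all } x} (x - \mu)^2 \; p_X(x) \\
& = \sum_{\text{all } x} \left(x^2 - 2\mu x + \mu^2\right) p_X(x) \\
& = \sum_{\text{all } x} x^2 \; p_X(x) - 2\mu \sum_{\text{all } x} x \; p_X(x) + \mu^2 \sum_{\text{all } x} p_X(x) \\
& = E(X^2) - 2\mu \cdot \mu + \mu^2 \cdot 1 \\
& = E(X^2) - \left[E(X)\right]^2 \\
\end{split}\]
```

The key step is pulling the constants \(2\mu\) and \(\mu^2\) outside the sums, which is exactly where the hint is used.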





10.2 Exercises

  1. The Uniform model
    Suppose random variable \(X\) is uniformly distributed across the interval \([a, b]\). In notation, \[X \sim Unif(a, b)\] with pdf \[f_X(x) = \frac{1}{b-a} \;\; \text{ for } x \in [a,b]\] (and \(f_X(x) = 0\) outside \([a,b]\)). Thus the behavior of \(X\) depends upon parameters \(a\) and \(b\). Prove that \[\begin{split} E(X) & = \frac{1}{2}(a + b) \\ Var(X) & = E(X^2) - [E(X)]^2 = \frac{1}{12}(b - a)^2\\ \end{split}\] NOTE: The features of the following Uniforms will help confirm your work.

    Model \(E(X)\) \(Var(X)\) \(SD(X)\)
    Unif(-1,1) 0 0.333 0.577
    Unif(0,1) 0.50 0.083 0.289
    Unif(1.5,2) 1.75 0.021 0.144
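A numerical sanity check for the Uniform table, as a minimal sketch: it approximates \(E(X) = \int x f_X(x)dx\) and \(E(X^2)\) with a midpoint Riemann sum (an assumed choice of integration method; any quadrature would do) and prints the results next to \(\frac{1}{2}(a+b)\) and \(\frac{1}{12}(b-a)^2\). The helper name `uniform_moments` is hypothetical.

```python
import math

def uniform_moments(a, b, n=100_000):
    """Approximate E(X) and Var(X) for X ~ Unif(a, b) by a midpoint Riemann sum."""
    width = (b - a) / n
    density = 1 / (b - a)                      # f_X(x) on [a, b]
    mids = [a + (i + 0.5) * width for i in range(n)]
    e_x = sum(x * density * width for x in mids)
    e_x2 = sum(x**2 * density * width for x in mids)
    return e_x, e_x2 - e_x**2                  # Var(X) = E(X^2) - [E(X)]^2

for a, b in [(-1, 1), (0, 1), (1.5, 2)]:
    mean, var = uniform_moments(a, b)
    print(f"Unif({a},{b}): E={mean:.3f}  Var={var:.3f}  SD={math.sqrt(var):.3f}  "
          f"formulas: {(a + b) / 2:.3f}, {(b - a) ** 2 / 12:.3f}")
```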



  2. The Binomial model: gut check
    Let \(X \sim Bin(n,p)\). Thus the behavior of \(X\) depends upon parameters \(n\) and \(p\). Use your gut to answer the following questions.
    1. Let \(X\) be the number of Heads in 100 flips (\(X \sim Bin(100, 0.5)\)). What’s \(E(X)\), i.e., how many Heads would you expect?
    2. Let \(X\) be the number of 1s in 24 dice rolls (\(X \sim Bin(24, 1/6)\)). What’s \(E(X)\), i.e., how many 1s would you expect?
    3. Generalize it: Let \(X \sim Bin(n,p)\). What’s a general formula for \(E(X)\)? NOTE: this formula should depend upon \(n\) and \(p\).
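To test a gut answer empirically, here is a minimal simulation sketch (the function name `simulate_binomial_mean`, the number of repetitions, and the seed are illustrative choices). Each simulated Binomial value is built as the count of successes in \(n\) independent trials with success probability \(p\); the average of many such values should settle near the long-run value \(E(X)\).

```python
import random

def simulate_binomial_mean(n, p, reps=10_000, seed=0):
    """Average of `reps` simulated Bin(n, p) draws (each = # successes in n trials)."""
    rng = random.Random(seed)  # seeded generator for reproducible output
    draws = [sum(rng.random() < p for _ in range(n)) for _ in range(reps)]
    return sum(draws) / reps

print(simulate_binomial_mean(100, 0.5))   # Heads in 100 coin flips
print(simulate_binomial_mean(24, 1 / 6))  # 1s in 24 dice rolls
```

Comparing these averages with \(n \times p\) for each case is one way to check a proposed general formula.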



  3. The Binomial model
    Let \(X \sim Bin(n,p)\) with PMF \[p_X(x) = \left(\begin{array}{c} n \\ x \end{array} \right) p^x (1-p)^{n-x} \; \text{ for } x \in \{0,1,\ldots,n\}\] Then \[\begin{split} E(X) & = \sum_{\text{all } x} x p_X(x) = np \\ Var(X) & = E\left[(X - E(X))^2\right] = E(X^2) - \left[E(X)\right]^2 = np(1-p) \\ \end{split}\]

    1. Calculate \(E(X)\) and \(Var(X)\) for each Binomial model below:

    2. For what value of \(p\) is \(Var(X)\) maximized? Why does this make intuitive sense?

    3. CHALLENGE: prove that \(E(X) = np\) and \(Var(X) = np(1-p)\). This is quite tricky and will be easier after we get some practice with other RVs.
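A sketch that evaluates the Binomial sums directly from the PMF with `math.comb` (available in Python 3.8+) and compares them with \(np\) and \(np(1-p)\). The \((n, p)\) pairs are illustrative assumptions, not the models from the exercise; the last lines scan a grid of \(p\) values to see where \(np(1-p)\) peaks for a fixed \(n\).

```python
import math

def binom_pmf(x, n, p):
    """P(X = x) for X ~ Bin(n, p)."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)

def binom_moments(n, p):
    """E(X) and Var(X) by summing over the support {0, 1, ..., n}."""
    e_x = sum(x * binom_pmf(x, n, p) for x in range(n + 1))
    e_x2 = sum(x**2 * binom_pmf(x, n, p) for x in range(n + 1))
    return e_x, e_x2 - e_x**2        # Var(X) = E(X^2) - [E(X)]^2

# Illustrative (n, p) pairs, not taken from the exercise
for n, p in [(10, 0.2), (10, 0.5), (30, 0.9)]:
    mean, var = binom_moments(n, p)
    print(f"Bin({n}, {p}): E(X)={mean:.4f} vs np={n * p:.4f}; "
          f"Var(X)={var:.4f} vs np(1-p)={n * p * (1 - p):.4f}")

# Scan p on a grid for fixed n = 10: where does np(1 - p) peak?
grid = [i / 100 for i in range(101)]
best_p = max(grid, key=lambda p: 10 * p * (1 - p))
print("Var(X) for Bin(10, p) is largest on this grid at p =", best_p)
```

A numerical check like this does not replace the CHALLENGE proof, but it can confirm that a derivation lands on the right answer.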



  4. The Poisson model
    Let \(X \sim Pois(\lambda)\) with PMF \[p_X(x) = \frac{\lambda^x e^{-\lambda}}{x!} \; \text{ for } x \in \{0,1,2,\ldots\}\] Then the Poisson expected value and variance are both equal to \(\lambda\): \[\begin{split} E(X) & = \sum_{\text{all } x} x p_X(x) = \lambda \\ Var(X) & = E\left[(X - E(X))^2\right] = E(X^2) - \left[E(X)\right]^2 = \lambda \\ \end{split}\]

    1. On average, calls come into a call center at a rate of 4 per minute. Let \(X\) be the number of calls in 1 minute. Thus \(X \sim Pois(4)\). Calculate \(E(X)\) and \(SD(X)\), the expected number of calls and standard deviation of calls per minute.
    2. Let \(Y\) be the number of calls in 30 seconds. Thus \(Y\) is Poisson with some rate \(\lambda\). Identify \(\lambda\) and calculate \(E(Y)\) and \(SD(Y)\).
    3. Let \(Z\) be the number of calls in 2 minutes. Thus \(Z\) is Poisson with some rate \(\lambda\). Identify \(\lambda\) and calculate \(E(Z)\) and \(SD(Z)\).
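A minimal sketch for the call-center questions, assuming (as is standard for a Poisson process) that the rate scales in proportion to the length of the time window, so \(\lambda = 4t\) for a window of \(t\) minutes. It computes \(E\) and \(SD\) by summing the PMF over \(x = 0, 1, \ldots\), truncated at an assumed cutoff of 100, beyond which the tail probability is negligible for rates this small. The helper names are hypothetical.

```python
import math

def pois_pmf(x, lam):
    """P(X = x) for X ~ Pois(lam)."""
    return lam**x * math.exp(-lam) / math.factorial(x)

def pois_moments(lam, cutoff=100):
    """E(X) and SD(X) by summing the PMF over x = 0..cutoff (tail ignored)."""
    e_x = sum(x * pois_pmf(x, lam) for x in range(cutoff + 1))
    e_x2 = sum(x**2 * pois_pmf(x, lam) for x in range(cutoff + 1))
    return e_x, math.sqrt(e_x2 - e_x**2)

rate_per_minute = 4
for label, minutes in [("X: 1 minute", 1), ("Y: 30 seconds", 0.5), ("Z: 2 minutes", 2)]:
    lam = rate_per_minute * minutes          # rate scales with window length
    mean, sd = pois_moments(lam)
    print(f"{label}: lambda={lam}  E={mean:.4f}  SD={sd:.4f}  "
          f"sqrt(lambda)={math.sqrt(lam):.4f}")
```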



  5. The Normal model
    Let \(X \sim N(\mu, \sigma^2)\). Then \(E(X) = \mu\) and \(Var(X) = \sigma^2\).
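A numerical check of \(E(X) = \mu\) and \(Var(X) = \sigma^2\), as a sketch: it integrates against the Normal density \(f_X(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-(x-\mu)^2/(2\sigma^2)}\) with a midpoint Riemann sum over \(\mu \pm 10\sigma\), an assumed cutoff outside of which the density is negligible. The values \(\mu = 3\), \(\sigma = 2\) and the helper names are illustrative.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2)."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

def normal_moments(mu, sigma, n=20_000):
    """Approximate E(X) and Var(X) by a midpoint sum over [mu - 10 sigma, mu + 10 sigma]."""
    lo, hi = mu - 10 * sigma, mu + 10 * sigma
    width = (hi - lo) / n
    mids = [lo + (i + 0.5) * width for i in range(n)]
    e_x = sum(x * normal_pdf(x, mu, sigma) * width for x in mids)
    var = sum((x - e_x) ** 2 * normal_pdf(x, mu, sigma) * width for x in mids)
    return e_x, var

mean, var = normal_moments(3, 2)   # illustrative mu = 3, sigma = 2
print(round(mean, 4), round(var, 4))
```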