Random Variables and Expectation

Often we are interested in the magnitude of an outcome as well as its probability. E.g. in a gambling game, the amount you win or lose is as important as the probability of each outcome.


Definition [Random Variable] A random variable is a function $X : \Omega \rightarrow \mathbb R$ that gives a value for every outcome in the sample space.

For example, if we roll two dice then $\Omega = \{ (d_1,d_2): d_1,d_2 \in \{1,2,3,4,5,6 \}\}$ is our sample space and the sum of the two dice can be given by $X((d_1,d_2)) = d_1+d_2$.
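As a quick illustration (a Python sketch for these notes' two-dice example, not part of the original text), we can enumerate the sample space and evaluate probabilities of events defined through $X$:

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely ordered pairs (d1, d2)
omega = list(product(range(1, 7), repeat=2))

# The random variable X maps each outcome to the sum of the two dice
def X(outcome):
    d1, d2 = outcome
    return d1 + d2

# P(X = 2): count the outcomes with X = 2 (only (1,1)) over |Omega| = 36
p_X_is_2 = Fraction(sum(1 for w in omega if X(w) == 2), len(omega))
print(p_X_is_2)  # 1/36
```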

Random variables (RVs) can be discrete, e.g. taking values $1,2,3,...$, or they can be continuous, e.g. taking any value in $\mathbb R$.

Typically we use capital letters $X,Y,Z$ to denote random variables. Often we suppress the underlying sample space when writing probabilities. E.g. for our sum of two dice example, we might write

$$\mathbb P ( X = 2 ) = \mathbb P ( \{ (1,1) \} ) = \frac{1}{36},$$

where here by $\{ X =2 \}$ we really mean the event $\{ \omega \in \Omega : X(\omega) =2 \}$.

Discrete Probability Distributions. A key way to characterize a random variable is with its distribution.

Definition [Probability Mass Function] The probability distribution of a discrete random variable $X$ with values in the finite or countable set $\mathcal X \subseteq \mathbb R$ is given by

$$p(x) = \mathbb P ( X = x ), \qquad x \in \mathcal X.$$

This is sometimes called the probability mass function (PMF). Notice it satisfies the following properties:

• (Positive) For all $x \in \mathcal X$, $p(x) \geq 0$.

• (Sums to one) $\sum_{x \in \mathcal X} p(x) = 1$.

Another way to characterize the distribution of a random variable is through its cumulative distribution function, which gives the probability that the random variable is below a given value.

Definition [Cumulative Distribution Function] The cumulative distribution function (CDF) of a random variable $X$ is

$$F(x) = \mathbb P ( X \leq x ), \qquad x \in \mathbb R.$$
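To make the definition concrete, here is a small Python sketch (an illustration, not from the notes) that builds the CDF of the two-dice sum from its PMF:

```python
from fractions import Fraction
from itertools import product

# PMF of the sum of two fair dice
counts = {}
for d1, d2 in product(range(1, 7), repeat=2):
    counts[d1 + d2] = counts.get(d1 + d2, 0) + 1
pmf = {x: Fraction(c, 36) for x, c in counts.items()}

# CDF: F(x) = P(X <= x), i.e. the running total of the PMF up to x
def F(x):
    return sum(p for value, p in pmf.items() if value <= x)

print(F(3))   # 1/12: the three outcomes summing to 2 or 3, out of 36
print(F(12))  # 1: the sum is always at most 12
```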

We can define more than one random variable on the same probability space. E.g. from our two dice throws, $X$ could be the value of the first die, $Y$ the value of the second die, and $Z$ the sum of the two dice.

Definition [Joint Probability Distribution] Suppose there are two random variables defined on the same probability space, $X: \Omega \rightarrow \mathcal X \subseteq \mathbb R$ and $Y: \Omega \rightarrow \mathcal Y \subseteq \mathbb R$, then the joint probability mass function is given by

$$p(x,y) = \mathbb P ( X = x , Y = y )$$

for $x \in \mathcal X$ and $y \in \mathcal Y$.

Definition [Independent Random Variables] We say that $X$ and $Y$ are independent random variables if

$$\mathbb P ( X = x , Y = y ) = \mathbb P ( X = x )\, \mathbb P ( Y = y )$$

for all $x \in \mathcal X$, $y \in \mathcal Y$ (i.e. $p(x,y) = p_X(x) p_Y(y)$).
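For instance (a numerical check, not part of the notes), in the two-dice example the values of the two dice are independent, while the first die and the sum are not:

```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))

def prob(event):
    # Probability of an event (a predicate on outcomes) under equally likely outcomes
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

# X = first die, Y = second die: p(x, y) == p_X(x) * p_Y(y) for every pair
independent = all(
    prob(lambda w: w[0] == x and w[1] == y)
    == prob(lambda w: w[0] == x) * prob(lambda w: w[1] == y)
    for x, y in product(range(1, 7), repeat=2)
)
print(independent)  # True

# X and Z = X + Y are not independent: P(X=1, Z=12) = 0, yet both marginals are positive
lhs = prob(lambda w: w[0] == 1 and w[0] + w[1] == 12)
rhs = prob(lambda w: w[0] == 1) * prob(lambda w: w[0] + w[1] == 12)
print(lhs == rhs)  # False
```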

We can extend the above definition to any number of random variables. E.g. a set of random variables $X_1,X_2,X_3,\dots,X_n$ are independent if

$$\mathbb P ( X_1 = x_1 , \dots , X_n = x_n ) = \prod_{i=1}^n \mathbb P ( X_i = x_i )$$

for all $x_1,\dots,x_n$.

A common situation that we are interested in is where $X_1,\dots,X_n$ are independent identically distributed random variables, or IIDRVs for short. Here, in addition to being independent, the random variables each have the same CDF. We think of IIDRVs as repeating the same random experiment $n$ times and recording the outcome of each.

Expectation and Variance

Definition [Expectation] The expectation of a discrete random variable $X$ is

$$\mathbb E [ X ] = \sum_{x \in \mathcal X} x \, \mathbb P ( X = x ).$$

The expectation gives an average value of the random variable. We could think of placing one unit of mass along the number line, where at point $x$ we place a weight of $\mathbb P(X=x)$. The expectation, $\mathbb E[X]$, is then the point of the number line that balances the weights on the left with the right.

Example. Calculate the expectation for the following random variable
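As a concrete illustration (a fair die roll — a hypothetical example, not necessarily the one intended above), the expectation can be computed directly from the definition $\mathbb E[X] = \sum_x x \, \mathbb P(X=x)$:

```python
from fractions import Fraction

# Hypothetical example: a single fair die, P(X = x) = 1/6 for x = 1, ..., 6
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

# E[X] = sum over x of x * P(X = x)
expectation = sum(x * p for x, p in pmf.items())
print(expectation)  # 7/2
```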

Properties of the expectation

Here are various properties of the expectation. Proofs are included and are good to know but are not essential reading for exams. (For the most part the following lemmas are really properties of summation.)

Lemma 1. For a function $g: \mathcal X \rightarrow \mathbb R$,

$$\mathbb E [ g(X) ] = \sum_{x \in \mathcal X} g(x) \, \mathbb P ( X = x ).$$

Lemma 2. For constants $a$ and $b$,

$$\mathbb E [ a X + b ] = a \mathbb E [ X ] + b.$$

Proof. Applying Lemma 1 with $g(x) = ax + b$,

$$\mathbb E [ a X + b ] = \sum_{x \in \mathcal X} ( a x + b ) \, \mathbb P ( X = x ) = a \sum_{x \in \mathcal X} x \, \mathbb P ( X = x ) + b \sum_{x \in \mathcal X} \mathbb P ( X = x ) = a \mathbb E [ X ] + b. \qquad \square$$

Lemma 3. For two random variables $X$ and $Y$,

$$\mathbb E [ X + Y ] = \mathbb E [ X ] + \mathbb E [ Y ].$$

Lemma 4. For two independent random variables $X$ and $Y$,

$$\mathbb E [ X Y ] = \mathbb E [ X ] \, \mathbb E [ Y ].$$

Warning! In general, we cannot multiply expectations in this way. We need independence to hold. For instance if $X=Y =1$ with probability $1/2$ and $X=Y=0$ otherwise a quick check shows that $1/2 = \mathbb E [ X Y] \neq \mathbb E[X] \mathbb E[Y] = 1/4$.
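The warning's counterexample is easy to verify numerically; a minimal Python sketch (an illustration, not from the notes):

```python
from fractions import Fraction

# X = Y: both equal 1 with probability 1/2, both equal 0 otherwise
outcomes = [(1, 1, Fraction(1, 2)), (0, 0, Fraction(1, 2))]

E_XY = sum(x * y * p for x, y, p in outcomes)
E_X = sum(x * p for x, y, p in outcomes)
E_Y = sum(y * p for x, y, p in outcomes)

print(E_XY)       # 1/2
print(E_X * E_Y)  # 1/4, so E[XY] != E[X]E[Y] without independence
```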

Example. Let $Y$ be the number of heads from $100$ coin tosses; then $\mathbb E [Y] = 50$.

Answer. The exact distribution of $Y$ is not so straightforward to calculate explicitly. However, we can use the above rules.

Note that $Y = \sum_{i=1}^{100} X_i$ where

$$X_i = \begin{cases} 1 & \text{if the } i\text{th toss is a head,} \\ 0 & \text{otherwise.} \end{cases}$$

It is easy to see that $\mathbb E[X_i ] = 1 \cdot \mathbb P(X_i =1 ) + 0 \cdot \mathbb P(X_i = 0) = 1/2$. Thus by Lemma 3,

$$\mathbb E [ Y ] = \sum_{i=1}^{100} \mathbb E [ X_i ] = 100 \times \frac{1}{2} = 50.$$
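The same linearity calculation can be written out in Python (a sketch for illustration, not part of the notes):

```python
from fractions import Fraction

# E[X_i] for a single fair coin toss indicator: 1 * P(heads) + 0 * P(tails)
E_Xi = 1 * Fraction(1, 2) + 0 * Fraction(1, 2)

# By linearity of expectation (Lemma 3), E[Y] = E[X_1] + ... + E[X_100]
E_Y = sum(E_Xi for _ in range(100))
print(E_Y)  # 50
```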

Variance

While the expectation gives an average value for a random variable, the variance determines how spread out its probability distribution is.

Definition [Variance / Standard Deviation] The variance of a random variable $X$ is defined to be

$$\mathbb V ( X ) = \mathbb E \big[ ( X - \mathbb E [ X ] )^2 \big].$$

Further, the standard deviation is the square root of the variance, $\sqrt{\mathbb V ( X )}$.

It is common to use $\mu$ to denote the expectation of a random variable and $\sigma^2$ to denote the variance (so $\sigma$ denotes the standard deviation). So

$$\mu = \mathbb E [ X ], \qquad \sigma^2 = \mathbb V ( X ).$$

Here we use $(X-\mu)^2$ to determine a squared distance about the mean. We then take the square root to rescale the distance. This is very similar to the way we think of (Euclidean) distances for vectors, e.g. for $x = (x_1,x_2)$, $| x| = \sqrt{x_1^2+x_2^2}$.

Lemma 5.

$$\mathbb V ( X ) = \mathbb E [ X^2 ] - \mathbb E [ X ]^2.$$

So the variance is the mean square minus the square of the mean.

Proof. Writing $\mu = \mathbb E[X]$ and expanding the square,

$$\mathbb V ( X ) = \mathbb E [ ( X - \mu )^2 ] = \mathbb E [ X^2 - 2 \mu X + \mu^2 ] = \mathbb E [ X^2 ] - 2 \mu \mathbb E [ X ] + \mu^2 = \mathbb E [ X^2 ] - \mu^2 .$$

$\square$
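Lemma 5 can be checked numerically for a concrete distribution (a fair die — a hypothetical example, not from the notes): computing the variance from the definition and from the mean-square formula gives the same answer.

```python
from fractions import Fraction

# Fair die: P(X = x) = 1/6 for x = 1, ..., 6
pmf = {x: Fraction(1, 6) for x in range(1, 7)}
mu = sum(x * p for x, p in pmf.items())  # E[X] = 7/2

# Definition: V(X) = E[(X - mu)^2]
var_definition = sum((x - mu) ** 2 * p for x, p in pmf.items())

# Lemma 5: V(X) = E[X^2] - E[X]^2
var_lemma = sum(x ** 2 * p for x, p in pmf.items()) - mu ** 2

print(var_definition, var_lemma)  # 35/12 35/12
```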

Lemma 6. For $a,b \in \mathbb R,$

$$\mathbb V ( a X + b ) = a^2 \mathbb V ( X ).$$

Lemma 7. If $X$ and $Y$ are independent random variables then

$$\mathbb V ( X + Y ) = \mathbb V ( X ) + \mathbb V ( Y ).$$

Notice if we have a sequence of independent identically distributed random variables (IIDRVs) $X_1,\dots,X_n$ where $\mathbb V(X_1) = \sigma^2$ then we can see from the above lemma that

$$\mathbb V \left( \sum_{i=1}^n X_i \right) = n \sigma^2 .$$

So we see that as we add up more and more IIDRVs, the typical distance of the sum from its mean grows like $\sqrt{n}$ (the standard deviation of the sum is $\sigma \sqrt{n}$). This is an important observation that will be refined later on when we discuss the normal distribution.
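The $\sqrt{n}$ growth can be seen concretely (a sketch using a fair die as the hypothetical base distribution, not part of the notes): the standard deviation of the sum of $n$ IID rolls is $\sigma \sqrt{n}$, so multiplying $n$ by $100$ multiplies the standard deviation by $10$.

```python
import math
from fractions import Fraction

# Variance of a single fair-die roll: sigma^2 = 35/12
pmf = {x: Fraction(1, 6) for x in range(1, 7)}
mu = sum(x * p for x, p in pmf.items())
sigma2 = sum((x - mu) ** 2 * p for x, p in pmf.items())

# By Lemma 7 (applied repeatedly), V(X_1 + ... + X_n) = n * sigma^2,
# so the standard deviation of the sum is sigma * sqrt(n)
for n in (1, 100, 10000):
    print(n, math.sqrt(n * sigma2))
```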