Random Variables and Expectation

Often we are interested in the magnitude of an outcome as well as its probability. For example, in a gambling game the amount you win or lose is as important as the probability of each outcome.

(This is a section in the notes here.)

Definition [Random Variable] A random variable is a function X : \Omega \rightarrow \mathbb R that gives a value for every outcome in the sample space.

For example, if we roll two dice then \Omega = \{ (d_1,d_2): d_1,d_2 \in \{1,2,3,4,5,6 \}\} is our sample space and the sum of the two dice can be given by X((d_1,d_2)) = d_1+d_2.

Random variables (RVs) can be discrete, e.g. taking values 1,2,3,..., or they can be continuous, e.g. taking any value in \mathbb R.

Typically we use capital letters to denote random variables X,Y,Z. Often we suppress the underlying sample space when writing probabilities. E.g. for our sum of two dice example, we might write

\mathbb P( X = 2 ) = \frac{1}{36},

where here by \{ X =2 \} we really mean the event \{ \omega \in \Omega : X(\omega) =2 \} .
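The two-dice example can be sketched in a few lines of Python. This is only an illustration: it assumes both dice are fair, so that all 36 outcomes are equally likely, and the names omega, X and prob are chosen here for the sketch rather than taken from the notes.

```python
from fractions import Fraction
from itertools import product

# Sample space for two dice: all ordered pairs (d1, d2).
omega = list(product(range(1, 7), repeat=2))

def X(outcome):
    """Random variable: the sum of the two dice."""
    d1, d2 = outcome
    return d1 + d2

def prob(event):
    """Probability of an event (a list of outcomes).

    Assumption: the dice are fair, so every outcome has probability 1/36.
    """
    return Fraction(len(event), len(omega))

# The event {X = 2} is the set of outcomes w in omega with X(w) = 2.
event_X_equals_2 = [w for w in omega if X(w) == 2]
print(event_X_equals_2)        # [(1, 1)]
print(prob(event_X_equals_2))  # 1/36
```

Here the random variable really is just a function on the sample space, and the event {X = 2} is the list of outcomes that the function maps to 2.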

Discrete Probability Distributions. A key way to characterize a random variable is with its distribution.

Definition [Probability Mass Function] The probability distribution of a discrete random variable X with values in the finite or countable set \mathcal X \subseteq \mathbb R is given by

p_X(x) = \mathbb P( X = x ), \qquad x \in \mathcal X.

This is sometimes called the probability mass function (PMF). Notice it satisfies the properties

  • (Positive) For all x \in \mathcal X, \mathbb P( X = x ) \geq 0.

  • (Sums to one) \sum_{x \in \mathcal X} \mathbb P( X = x ) = 1.

Another way to characterize the distribution of a random variable is through its cumulative distribution function, which simply gives the probability that the random variable is at or below a given value.

Definition [Cumulative Distribution Function] The cumulative distribution function (CDF) of a random variable X is

F_X(x) = \mathbb P( X \leq x ), \qquad x \in \mathbb R.
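As a rough check of these definitions, the following sketch builds the PMF and CDF for the sum of two dice. It assumes fair dice, and the names pmf and cdf are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 36 outcomes of two dice (assumed fair, so equally likely).
omega = list(product(range(1, 7), repeat=2))

# PMF of the sum: P(X = x) = (number of outcomes with sum x) / 36.
counts = Counter(d1 + d2 for d1, d2 in omega)
pmf = {x: Fraction(c, len(omega)) for x, c in sorted(counts.items())}

def cdf(x):
    """CDF of the sum: P(X <= x), obtained by adding up the PMF."""
    return sum((p for value, p in pmf.items() if value <= x), Fraction(0))

# The two PMF properties: non-negative, and sums to one.
assert all(p >= 0 for p in pmf.values())
assert sum(pmf.values()) == 1

print(pmf[7])   # 1/6
print(cdf(4))   # 1/6
print(cdf(12))  # 1
```

The CDF is defined for every real x, so cdf(4.5) is also valid and equals cdf(4).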

We can define more than one random variable on the same probability space. E.g. from our two dice throws, X could be the value of the first dice, Y could be the value of the second dice, and Z could be the sum of the two dice.

Definition [Joint Probability Distribution] Suppose there are two random variables defined on the same probability space, X: \Omega \rightarrow \mathcal X \subseteq \mathbb R and Y: \Omega \rightarrow \mathcal Y \subseteq \mathbb R, then the joint probability mass function is given by

p(x,y) = \mathbb P( X = x, Y = y )

for x \in \mathcal X and y \in \mathcal Y.

Definition [Independent Random Variables] We say that X and Y are independent random variables if

\mathbb P( X = x, Y = y ) = \mathbb P( X = x ) \mathbb P( Y = y )

for all x \in \mathcal X, y \in \mathcal Y (i.e. p(x,y) = p_X(x) p_Y(y)).
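The definition of independence can be checked by brute force on the two-dice sample space. The sketch below assumes fair dice and takes X as the first dice, Y as the second and Z as the sum, as in the example above. The helper names prob and independent are illustrative.

```python
from fractions import Fraction
from itertools import product

# Two dice, assumed fair: 36 equally likely outcomes.
omega = list(product(range(1, 7), repeat=2))

def prob(predicate):
    """Probability that the outcome satisfies the given predicate."""
    return Fraction(sum(1 for w in omega if predicate(w)), len(omega))

def X(w):
    return w[0]          # value of the first dice

def Y(w):
    return w[1]          # value of the second dice

def Z(w):
    return w[0] + w[1]   # sum of the two dice

def independent(U, V):
    """Check P(U=u, V=v) == P(U=u) P(V=v) for every pair of values (u, v)."""
    u_values = {U(w) for w in omega}
    v_values = {V(w) for w in omega}
    return all(
        prob(lambda w: U(w) == u and V(w) == v)
        == prob(lambda w: U(w) == u) * prob(lambda w: V(w) == v)
        for u in u_values
        for v in v_values
    )

print(independent(X, Y))  # True
print(independent(X, Z))  # False
```

X and Z fail the test because, for instance, the sum cannot be 12 when the first dice shows 1, even though each event has positive probability on its own.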

We can extend the above definition to any number of random variables. E.g. a set of random variables X_1,X_2,X_3,...,X_n are independent if

\mathbb P( X_1 = x_1, X_2 = x_2, ..., X_n = x_n ) = \mathbb P( X_1 = x_1 ) \mathbb P( X_2 = x_2 ) \cdots \mathbb P( X_n = x_n )

A common situation that we are interested in is where X_1,...,X_n are independent identically distributed random variables, or IIDRVs for short. Here, in addition to being independent, the random variables each have the same CDF. We think of IIDRVs as repeating the same random experiment n times and recording the outcome of each.

Expectation and Variance

Definition. The expectation of a discrete random variable X is

\mathbb E[ X ] = \sum_{x \in \mathcal X} x \mathbb P( X = x ).

The expectation gives an average value of the random variable. We could think of placing one unit of mass along the number line, where at point x we place a weight of \mathbb P(X=x). The expectation, \mathbb E[X], is then the point of the number line that balances the weights on the left with the right.
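As a small illustration, the expectation of the sum of two dice can be computed as a probability-weighted sum over its values. The sketch assumes fair dice, and the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# PMF of the sum of two dice (assumed fair).
omega = list(product(range(1, 7), repeat=2))
counts = Counter(d1 + d2 for d1, d2 in omega)
pmf = {x: Fraction(c, len(omega)) for x, c in counts.items()}

# Expectation: weight each value x by its probability P(X = x) and add up.
expectation = sum(x * p for x, p in pmf.items())
print(expectation)  # 7
```

The PMF of the sum is symmetric about 7, which is why the balance point lands exactly there.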

Example. Calculate the expectation for the following random variable

[Image: distribution of the random variable]


Properties of the expectation

Here are various properties of the expectation. Proofs are included and are good to know but are not essential reading for exams. (For the most part the following lemmas are really properties of summation.)

Lemma 1. For a function g: \mathcal X \rightarrow \mathbb R,

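\mathbb E[ g(X) ] = \sum_{x \in \mathcal X} g(x) \mathbb P( X = x ).

Proof. Grouping the values x according to the value y = g(x),

\mathbb E[ g(X) ] = \sum_{y} y \mathbb P( g(X) = y ) = \sum_{y} y \sum_{x : g(x) = y} \mathbb P( X = x ) = \sum_{x \in \mathcal X} g(x) \mathbb P( X = x ).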

Lemma 2. For constants a and b

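\mathbb E[ a X + b ] = a \mathbb E[ X ] + b.

Proof. Applying Lemma 1 with g(x) = a x + b,

\mathbb E[ a X + b ] = \sum_{x \in \mathcal X} (a x + b) \mathbb P( X = x ) = a \sum_{x \in \mathcal X} x \mathbb P( X = x ) + b \sum_{x \in \mathcal X} \mathbb P( X = x ) = a \mathbb E[ X ] + b.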

Lemma 3. For two random variables X and Y

\mathbb E[ X + Y ] = \mathbb E[ X ] + \mathbb E[ Y ].

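Proof. Summing over the joint probability mass function of X and Y,

\mathbb E[ X + Y ] = \sum_{x \in \mathcal X} \sum_{y \in \mathcal Y} (x + y) \mathbb P( X = x, Y = y ) = \sum_{x \in \mathcal X} x \sum_{y \in \mathcal Y} \mathbb P( X = x, Y = y ) + \sum_{y \in \mathcal Y} y \sum_{x \in \mathcal X} \mathbb P( X = x, Y = y )

= \sum_{x \in \mathcal X} x \mathbb P( X = x ) + \sum_{y \in \mathcal Y} y \mathbb P( Y = y ) = \mathbb E[ X ] + \mathbb E[ Y ].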


Lemma 4. For two independent random variables X and Y

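\mathbb E[ X Y ] = \mathbb E[ X ] \mathbb E[ Y ].

Proof. By independence, \mathbb P( X = x, Y = y ) = \mathbb P( X = x ) \mathbb P( Y = y ), so

\mathbb E[ X Y ] = \sum_{x \in \mathcal X} \sum_{y \in \mathcal Y} x y \mathbb P( X = x ) \mathbb P( Y = y ) = \Big( \sum_{x \in \mathcal X} x \mathbb P( X = x ) \Big) \Big( \sum_{y \in \mathcal Y} y \mathbb P( Y = y ) \Big) = \mathbb E[ X ] \mathbb E[ Y ].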

Warning! In general, we cannot multiply expectations in this way. We need independence to hold. For instance, if X=Y=1 with probability 1/2 and X=Y=0 otherwise, a quick check shows that 1/2 = \mathbb E [ X Y] \neq \mathbb E[X] \mathbb E[Y] = 1/4.
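The counterexample in the warning can be verified directly from its joint distribution. In this sketch the dictionary joint, mapping each pair (x, y) to its probability, is an illustrative representation.

```python
from fractions import Fraction

# Joint PMF from the warning: X = Y = 1 with probability 1/2, X = Y = 0 otherwise.
joint = {(1, 1): Fraction(1, 2), (0, 0): Fraction(1, 2)}

E_XY = sum(x * y * p for (x, y), p in joint.items())
E_X = sum(x * p for (x, y), p in joint.items())
E_Y = sum(y * p for (x, y), p in joint.items())

print(E_XY)       # 1/2
print(E_X * E_Y)  # 1/4
```

Here X and Y are as dependent as they can be (they are always equal), which is why the product rule fails.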

Example. Let Y be the number of heads from 100 fair coin tosses. Show that \mathbb E [Y] = 50.

Answer. The exact distribution of Y is not so straightforward to calculate explicitly. However, we can use the above rules.

Note that Y = \sum_{i=1}^{100} X_i where X_i = 1 if the i-th toss is a head and X_i = 0 otherwise.

It is easy to see that \mathbb E[X_i ] = 1 \mathbb P(X_i =1 ) + 0 \mathbb P(X_i = 0) = 1/2. Thus by Lemma 3

\mathbb E[ Y ] = \sum_{i=1}^{100} \mathbb E[ X_i ] = 100 \times \frac{1}{2} = 50.
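For comparison, the answer can also be checked against the exact distribution of Y. The sketch below assumes the tosses are fair and independent, so that Y has the binomial distribution \mathbb P(Y=k) = \binom{100}{k} / 2^{100}. That formula is a standard fact brought in for this check, not something derived in these notes.

```python
from fractions import Fraction
from math import comb

n = 100

# Assumed model: fair, independent tosses, so P(Y = k) = C(n, k) / 2^n.
pmf = {k: Fraction(comb(n, k), 2**n) for k in range(n + 1)}
assert sum(pmf.values()) == 1

# Direct calculation from the definition of expectation.
direct = sum(k * p for k, p in pmf.items())

# Calculation via Lemma 3: add up E[X_i] = 1/2 over the 100 tosses.
via_linearity = sum(Fraction(1, 2) for _ in range(n))

print(direct, via_linearity)  # 50 50
```

Both routes give 50, but the second needs no knowledge of the distribution of Y at all.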


While the expectation gives an average value for a random variable, the variance measures how spread out its probability distribution is.

Definition [Variance / Standard Deviation] The variance of a random variable X is defined to be

\mathbb V( X ) = \mathbb E[ ( X - \mathbb E[ X ] )^2 ].

Further the standard deviation is the square-root of the variance. That is

\sqrt{ \mathbb V( X ) } = \sqrt{ \mathbb E[ ( X - \mathbb E[ X ] )^2 ] }.

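It is common to use \mu to denote the expectation of a random variable, \sigma^2 to denote its variance and \sigma to denote its standard deviation. So

\mu = \mathbb E[ X ], \qquad \sigma^2 = \mathbb E[ ( X - \mu )^2 ], \qquad \sigma = \sqrt{ \mathbb E[ ( X - \mu )^2 ] }.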

Here we use (X-\mu)^2 to measure the squared distance from the mean. We then take the square root to rescale the distance. This is very similar to the way we think of (Euclidean) distances for vectors, e.g. for x = (x_1,x_2), | x| = \sqrt{x_1^2+x_2^2}.

Lemma 5. 

\mathbb V( X ) = \mathbb E[ X^2 ] - ( \mathbb E[ X ] )^2.

So the variance is the mean square minus the square of the mean.
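The "mean square minus square of the mean" identity can be checked on a single dice roll. This is a minimal sketch assuming a fair dice, with illustrative variable names.

```python
from fractions import Fraction

# PMF of a single dice roll (assumed fair).
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

mean = sum(x * p for x, p in pmf.items())            # E[X]   = 7/2
mean_square = sum(x**2 * p for x, p in pmf.items())  # E[X^2] = 91/6

# Variance from the definition, E[(X - mu)^2].
var_definition = sum((x - mean) ** 2 * p for x, p in pmf.items())

# Variance as the mean square minus the square of the mean.
var_shortcut = mean_square - mean**2

print(var_definition, var_shortcut)  # 35/12 35/12
```

Exact fractions are used so that the two expressions can be compared without rounding error.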



Lemma 6. For a,b \in \mathbb R,

\mathbb V( a X + b ) = a^2 \mathbb V( X ).
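The scaling rule \mathbb V(aX+b) = a^2 \mathbb V(X) can be illustrated numerically. The sketch below uses a fair dice with a = 3 and b = 2. Both the example distribution and the constants are arbitrary choices made for this check.

```python
from fractions import Fraction

def variance(pmf):
    """Variance of a discrete distribution given as {value: probability}."""
    mean = sum(x * p for x, p in pmf.items())
    return sum((x - mean) ** 2 * p for x, p in pmf.items())

# X is a single dice roll (assumed fair); a and b are arbitrary constants.
pmf_X = {x: Fraction(1, 6) for x in range(1, 7)}
a, b = 3, 2

# PMF of aX + b: the same probabilities attached to the transformed values.
pmf_aXb = {a * x + b: p for x, p in pmf_X.items()}

print(variance(pmf_aXb))       # 105/4
print(a**2 * variance(pmf_X))  # 105/4
```

Shifting by b moves the mean but not the spread, while scaling by a stretches every distance from the mean by a factor of a, hence the a^2.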

Lemma 7. If X and Y are independent random variables then

\mathbb V( X + Y ) = \mathbb V( X ) + \mathbb V( Y ).

Notice if we have a sequence of independent identically distributed random variables (IIDRVs) X_1,...,X_n where \mathbb V(X_1) = \sigma^2, then applying the above lemma repeatedly we can see that

\mathbb V( X_1 + ... + X_n ) = n \sigma^2 \qquad \text{and} \qquad \sqrt{ \mathbb V( X_1 + ... + X_n ) } = \sqrt{n} \sigma.

So we see that as we add up more and more IIDRVs, the typical distance of the sum from its mean, as measured by the standard deviation, grows like \sqrt{n}. This is an important observation that will be refined later on when we discuss the normal distribution.
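The linear growth of the variance can be checked exactly for sums of independent dice rolls. The sketch assumes fair dice. The helper convolve, which builds the PMF of a sum of two independent random variables, is an illustrative name and not something defined in these notes.

```python
from fractions import Fraction

def convolve(p, q):
    """PMF of U + V for independent U, V with PMFs p and q."""
    out = {}
    for x, px in p.items():
        for y, qy in q.items():
            out[x + y] = out.get(x + y, Fraction(0)) + px * qy
    return out

def variance(pmf):
    """Variance of a discrete distribution given as {value: probability}."""
    mean = sum(x * p for x, p in pmf.items())
    return sum((x - mean) ** 2 * p for x, p in pmf.items())

# A single dice roll (assumed fair), with variance sigma^2 = 35/12.
die = {x: Fraction(1, 6) for x in range(1, 7)}
sigma2 = variance(die)

# Build the PMF of X_1 + ... + X_n one roll at a time and compare with n * sigma^2.
total = dict(die)
for n in range(2, 5):
    total = convolve(total, die)
    assert variance(total) == n * sigma2

print(variance(total))  # 35/3
```

After four rolls the variance is 4 \times 35/12 = 35/3, so the standard deviation has only doubled relative to a single roll.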
