Often we are interested in the magnitude of an outcome as well as its probability. E.g. in a gambling game the amount you win or lose is as important as the probability of each outcome.

(This is a section in the notes here.)

**Definition** [Random Variable] A random variable $X$ is a function that gives a value $X(\omega)$ for every outcome $\omega$ in the sample space $\Omega$.

For example, if we roll two dice then $\Omega = \{(d_1, d_2) : d_1, d_2 \in \{1, \dots, 6\}\}$ is our sample space and the sum of the two dice can be given by $X((d_1, d_2)) = d_1 + d_2$.

Random variables (RVs) can be discrete, e.g. taking values $1, 2, 3, \dots$, or they can be continuous, e.g. taking any value in $\mathbb{R}$.

Typically we use capital letters to denote random variables, e.g. $X$, $Y$, $Z$. Often we suppress the underlying sample space when writing probabilities. E.g. for our sum of two dice example, we might write

$$\mathbb{P}(X = 4) = \frac{3}{36},$$

where here by $X = 4$ we really mean the event $\{\omega \in \Omega : X(\omega) = 4\}$.
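To make the shorthand concrete, here is a minimal Python sketch of the two dice example, assuming fair dice so that all 36 outcomes are equally likely. The names `omega`, `X` and `prob` are illustrative choices, not notation from the notes.

```python
from fractions import Fraction
from itertools import product

# Sample space for two dice: all ordered pairs (d1, d2).
omega = list(product(range(1, 7), repeat=2))

def X(outcome):
    """Random variable: the sum of the two dice."""
    d1, d2 = outcome
    return d1 + d2

def prob(event):
    """Probability of an event (a list of outcomes), assuming equally likely outcomes."""
    return Fraction(len(event), len(omega))

# "X = 4" is shorthand for the event {w in omega : X(w) = 4}.
event_X_is_4 = [w for w in omega if X(w) == 4]
p4 = prob(event_X_is_4)
print(event_X_is_4, p4)
```

The random variable is literally a function on outcomes, and a statement such as "X = 4" picks out a subset of the sample space.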

**Discrete Probability Distributions.** A key way to characterize a random variable is with its distribution.

**Definition** [Probability Mass Function] The probability distribution of a discrete random variable $X$ with values in the finite or countable set $\mathcal{X}$ is given by

$$p(x) = \mathbb{P}(X = x), \qquad x \in \mathcal{X}.$$

This is sometimes called the probability mass function (PMF). Notice it satisfies the properties

- (Positive) For all $x \in \mathcal{X}$, $\mathbb{P}(X = x) \geq 0$.

- (Sums to one) $\sum_{x \in \mathcal{X}} \mathbb{P}(X = x) = 1$.
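As an illustration, the following Python sketch builds the PMF of the sum of two dice and checks both properties. It assumes fair dice; the variable names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Count how many of the 36 equally likely outcomes give each value of the sum.
counts = Counter(d1 + d2 for d1, d2 in product(range(1, 7), repeat=2))

# PMF: value x -> P(X = x), stored as exact fractions.
pmf = {x: Fraction(c, 36) for x, c in sorted(counts.items())}

# The two defining properties of a PMF.
is_positive = all(p >= 0 for p in pmf.values())
total_mass = sum(pmf.values())

for x, p in pmf.items():
    print(x, p)
```

Using `Fraction` rather than floats lets the "sums to one" check hold exactly instead of up to rounding error.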

Another way to characterize the distribution of a random variable is through its cumulative distribution function, which simply gives the probability that the random variable is at or below a given value.

**Definition** [Cumulative Distribution Function] The cumulative distribution function (CDF) of a random variable $X$ is

$$F(x) = \mathbb{P}(X \leq x).$$
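For a discrete random variable the CDF is just a running sum of the PMF. A small Python sketch, assuming a single fair die as the example (the function name `cdf` is illustrative):

```python
from fractions import Fraction

# PMF of one fair die: each face has probability 1/6.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def cdf(x, pmf=pmf):
    """F(x) = P(X <= x): add up the PMF over all values at or below x."""
    return sum((p for v, p in pmf.items() if v <= x), Fraction(0))

for x in [0, 1, 3, 3.5, 6, 10]:
    print(x, cdf(x))
```

Note that the CDF is defined for every real number, not only for the values the random variable can take: it is flat between those values and jumps at each of them.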

We can define more than one random variable on the same probability space. E.g. from our two dice throws, $X_1$ could be the value of the first die, $X_2$ could be the value of the second die, and $X = X_1 + X_2$ could be the sum of the two dice.

**Definition** [Joint Probability Distribution] Suppose there are two random variables defined on the same probability space, $X$ and $Y$, then the joint probability mass function is given by

$$p(x, y) = \mathbb{P}(X = x, Y = y)$$

for $x \in \mathcal{X}$ and $y \in \mathcal{Y}$.

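**Definition** [Independent Random Variables] We say that $X$ and $Y$ are independent random variables if

$$\mathbb{P}(X = x, Y = y) = \mathbb{P}(X = x)\, \mathbb{P}(Y = y)$$

for all $x \in \mathcal{X}$, $y \in \mathcal{Y}$ (i.e. the joint probability mass function factorizes as $p(x, y) = p_X(x)\, p_Y(y)$, where $p_X$ and $p_Y$ are the PMFs of $X$ and $Y$).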
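The factorization condition can be checked by brute force on a small sample space. The sketch below, assuming two fair dice, tests it for the two individual dice and for the first die against the sum. All function names here are hypothetical helpers.

```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))  # 36 equally likely outcomes
weight = Fraction(1, len(omega))

def marginal(U):
    """PMF of a random variable U, given as a function on omega."""
    pmf = {}
    for w in omega:
        pmf[U(w)] = pmf.get(U(w), Fraction(0)) + weight
    return pmf

def joint_pmf(U, V):
    """Joint PMF of U and V as a dict keyed by (u, v)."""
    pmf = {}
    for w in omega:
        key = (U(w), V(w))
        pmf[key] = pmf.get(key, Fraction(0)) + weight
    return pmf

def independent(U, V):
    """True if P(U = u, V = v) = P(U = u) P(V = v) for every pair (u, v)."""
    joint, pu, pv = joint_pmf(U, V), marginal(U), marginal(V)
    return all(joint.get((u, v), Fraction(0)) == pu[u] * pv[v]
               for u in pu for v in pv)

def first(w):
    return w[0]

def second(w):
    return w[1]

def total(w):
    return w[0] + w[1]

print(independent(first, second))  # the two dice
print(independent(first, total))   # first die versus the sum
```

The two dice are independent, whereas the first die and the sum are not: for example, a sum of 12 is impossible when the first die shows 1.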

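We can extend the above definition to any number of random variables. E.g. a set of random variables $X_1, \dots, X_n$ are independent if

$$\mathbb{P}(X_1 = x_1, \dots, X_n = x_n) = \mathbb{P}(X_1 = x_1) \times \dots \times \mathbb{P}(X_n = x_n)$$

for all values $x_1, \dots, x_n$.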

A common situation that we are interested in is where $X_1, \dots, X_n$ are independent identically distributed random variables, or IIDRVs for short. Here, in addition to being independent, the random variables each have the same CDF. We think of IIDRVs as repeating the same random experiment $n$ times and recording the outcome of each.

# Expectation and Variance

**Definition.** The expectation of a discrete random variable $X$ is

$$\mathbb{E}[X] = \sum_{x \in \mathcal{X}} x\, \mathbb{P}(X = x).$$

The expectation gives an average value of the random variable. We could think of placing one unit of mass along the number line, where at point $x$ we place a weight of $\mathbb{P}(X = x)$. The expectation, $\mathbb{E}[X]$, is then the point of the number line that balances the weights on the left with the right.
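The balancing interpretation can be checked numerically. In this Python sketch (a fair die is assumed as the example, and the names are illustrative) the weighted signed distances from the expectation cancel exactly.

```python
from fractions import Fraction

def expectation(pmf):
    """E[X] = sum over x of x * P(X = x)."""
    return sum((x * p for x, p in pmf.items()), Fraction(0))

# Example PMF: one fair die.
die = {x: Fraction(1, 6) for x in range(1, 7)}

mu = expectation(die)
# Net "turning force" about mu: weights times signed distances from mu.
net_moment = sum((x - mu) * p for x, p in die.items())

print(mu)          # 7/2
print(net_moment)  # 0
```

The net moment is zero for any PMF, not just this one, since $\sum_x (x - \mu)\, \mathbb{P}(X = x) = \mathbb{E}[X] - \mu = 0$.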

**Example.** Calculate the expectation for the following random variable

**Answer.**

## Properties of the expectation

Here are various properties of the expectation. Proofs are included and are good to know but are not essential reading for exams. (For the most part the following lemmas are really properties of summation.)

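**Lemma 1.** For a function $g : \mathcal{X} \to \mathbb{R}$,

$$\mathbb{E}[g(X)] = \sum_{x \in \mathcal{X}} g(x)\, \mathbb{P}(X = x).$$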

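**Lemma 2.** For constants $a$ and $b$,

$$\mathbb{E}[aX + b] = a\, \mathbb{E}[X] + b.$$

**Proof.** Applying Lemma 1,

$$\mathbb{E}[aX + b] = \sum_{x \in \mathcal{X}} (ax + b)\, \mathbb{P}(X = x) = a \sum_{x \in \mathcal{X}} x\, \mathbb{P}(X = x) + b \sum_{x \in \mathcal{X}} \mathbb{P}(X = x) = a\, \mathbb{E}[X] + b.$$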
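As a sanity check, the two lemmas above can be verified on a concrete PMF. This is a sketch only: the fair die and the constants `a = 3`, `b = -2` are arbitrary choices made for illustration.

```python
from fractions import Fraction

# Example PMF: one fair die.
die = {x: Fraction(1, 6) for x in range(1, 7)}

def expect(g, pmf):
    """Lemma 1: E[g(X)] = sum over x of g(x) * P(X = x)."""
    return sum((g(x) * p for x, p in pmf.items()), Fraction(0))

mean = expect(lambda x: x, die)               # E[X]
second_moment = expect(lambda x: x * x, die)  # E[X^2], used later for the variance

a, b = 3, -2                                  # arbitrary constants
lhs = expect(lambda x: a * x + b, die)        # E[aX + b]
rhs = a * mean + b                            # a E[X] + b

print(mean, second_moment, lhs, rhs)
```

Both sides of Lemma 2 agree exactly, which is what the proof predicts since the identity is just a rearrangement of a finite sum.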

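**Lemma 3.** For two random variables $X$ and $Y$,

$$\mathbb{E}[X + Y] = \mathbb{E}[X] + \mathbb{E}[Y].$$

**Lemma 4.** For two independent random variables $X$ and $Y$,

$$\mathbb{E}[XY] = \mathbb{E}[X]\, \mathbb{E}[Y].$$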

**Warning!** In general, we cannot multiply expectations in this way. We need independence to hold. For instance, if $X = Y = 1$ with probability $1/2$ and $X = Y = -1$ otherwise, a quick check shows that $\mathbb{E}[XY] = 1 \neq 0 = \mathbb{E}[X]\, \mathbb{E}[Y]$.
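The difference between the sum rule and the product rule can be seen by comparing an independent pair with a dependent one. In the sketch below both joint distributions are assumptions chosen for illustration: two fair dice, and a pair with $Y = X$ where $X = \pm 1$ with probability $1/2$ each.

```python
from fractions import Fraction
from itertools import product

def expect(f, joint):
    """E[f(X, Y)] for a joint PMF given as a dict {(x, y): probability}."""
    return sum((f(x, y) * p for (x, y), p in joint.items()), Fraction(0))

def check(joint):
    """Return (does the sum rule hold?, does the product rule hold?)."""
    ex = expect(lambda x, y: x, joint)
    ey = expect(lambda x, y: y, joint)
    sum_rule = expect(lambda x, y: x + y, joint) == ex + ey
    product_rule = expect(lambda x, y: x * y, joint) == ex * ey
    return sum_rule, product_rule

# Independent pair: two fair dice.
indep = {(x, y): Fraction(1, 36) for x, y in product(range(1, 7), repeat=2)}

# Dependent pair: Y = X, with X = +1 or -1 with probability 1/2 each.
dep = {(1, 1): Fraction(1, 2), (-1, -1): Fraction(1, 2)}

print(check(indep))
print(check(dep))
```

The sum rule holds in both cases, while the product rule fails for the dependent pair.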

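**Example.** Let $X$ be the number of heads from $n$ fair coin tosses. Then $\mathbb{E}[X] = n/2$.

**Answer.** The exact distribution of $X$ is not so straightforward to calculate explicitly. However, we can use the above rules.

Note that $X = X_1 + X_2 + \dots + X_n$ where

$$X_i = \begin{cases} 1 & \text{if the } i\text{th toss is heads}, \\ 0 & \text{otherwise}. \end{cases}$$

It is easy to see that $\mathbb{E}[X_i] = 1 \times \tfrac{1}{2} + 0 \times \tfrac{1}{2} = \tfrac{1}{2}$. Thus, applying Lemma 3 repeatedly,

$$\mathbb{E}[X] = \mathbb{E}[X_1] + \dots + \mathbb{E}[X_n] = \frac{n}{2}.$$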
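For small numbers of tosses the expected number of heads can also be computed by listing every sequence of tosses. The Python sketch below does this, assuming a fair coin so that all $2^n$ sequences are equally likely; `expected_heads` is a hypothetical helper name.

```python
from fractions import Fraction
from itertools import product

def expected_heads(n):
    """Exact expected number of heads in n fair tosses, by enumerating all 2**n sequences."""
    outcomes = list(product([0, 1], repeat=n))  # 1 = heads, 0 = tails
    # Each sequence has probability 1 / 2**n and contributes its number of heads.
    return sum(Fraction(sum(w), len(outcomes)) for w in outcomes)

for n in range(1, 9):
    print(n, expected_heads(n))
```

The enumeration takes $2^n$ steps, whereas the indicator argument gives the answer immediately for any number of tosses.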

## Variance

While the expectation gives an average value for a random variable, the variance measures how spread out a probability distribution is.

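**Definition** [Variance / Standard Deviation] The variance of a random variable $X$ is defined to be

$$\mathrm{Var}(X) = \mathbb{E}\big[(X - \mathbb{E}[X])^2\big].$$

Further, the standard deviation is the square root of the variance. That is

$$\mathrm{sd}(X) = \sqrt{\mathrm{Var}(X)}.$$

It is common to use $\mu$ to denote the expectation of a random variable and $\sigma^2$ to denote the variance of a random variable. So

$$\mu = \mathbb{E}[X], \qquad \sigma^2 = \mathbb{E}\big[(X - \mu)^2\big], \qquad \sigma = \sqrt{\mathbb{E}\big[(X - \mu)^2\big]}.$$

Here we use $(X - \mu)^2$ to determine a squared distance about the mean. We then take the square root to rescale the distance. This is very similar to the way we think of (Euclidean) distances for vectors, e.g. for $x = (x_1, \dots, x_n)$, $|x| = \sqrt{x_1^2 + \dots + x_n^2}$.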

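**Lemma 5.**

$$\mathrm{Var}(X) = \mathbb{E}[X^2] - \mathbb{E}[X]^2.$$

So the variance is the mean square minus the square of the mean.

**Proof.** Writing $\mu = \mathbb{E}[X]$,

$$\mathrm{Var}(X) = \mathbb{E}\big[(X - \mu)^2\big] = \mathbb{E}\big[X^2 - 2\mu X + \mu^2\big] = \mathbb{E}[X^2] - 2\mu\, \mathbb{E}[X] + \mu^2 = \mathbb{E}[X^2] - \mu^2.$$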
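A quick numerical check of the two ways of computing the variance, sketched in Python with a fair die as an assumed example (the helper `expect` is illustrative):

```python
from fractions import Fraction

# Example PMF: one fair die.
die = {x: Fraction(1, 6) for x in range(1, 7)}

def expect(g, pmf):
    """E[g(X)] = sum over x of g(x) * P(X = x)."""
    return sum((g(x) * p for x, p in pmf.items()), Fraction(0))

mu = expect(lambda x: x, die)

# Definition: mean squared distance from the mean.
var_definition = expect(lambda x: (x - mu) ** 2, die)

# Mean square minus the square of the mean.
var_shortcut = expect(lambda x: x ** 2, die) - mu ** 2

print(var_definition, var_shortcut)  # 35/12 35/12
```

The second form is often more convenient by hand since it avoids subtracting the mean inside the sum.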

**Lemma 6.** For constants $a$ and $b$,

$$\mathrm{Var}(aX + b) = a^2\, \mathrm{Var}(X).$$

**Lemma 7.** If $X$ and $Y$ are independent random variables then

$$\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y).$$

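Notice if we have a sequence of independent identically distributed random variables (IIDRVs) $X_1, \dots, X_n$ where $\mathrm{Var}(X_i) = \sigma^2$ then we can see from the above lemma that

$$\mathrm{Var}(X_1 + \dots + X_n) = n \sigma^2, \qquad \text{so} \qquad \mathrm{sd}(X_1 + \dots + X_n) = \sqrt{n}\, \sigma.$$

So we see that as we add up more and more IIDRVs the distance from the mean is about $\sqrt{n}\, \sigma$, i.e. it grows like $\sqrt{n}$ rather than $n$. This is an important observation that will be refined later on where we discuss the normal distribution.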
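The square-root growth can be checked exactly for sums of dice. The sketch below assumes independent fair dice and builds the PMF of the sum by repeated convolution; `convolve`, `variance` and `variance_of_sum` are hypothetical helper names.

```python
import math
from fractions import Fraction

def convolve(p, q):
    """PMF of U + V for independent U ~ p and V ~ q."""
    out = {}
    for x, px in p.items():
        for y, qy in q.items():
            out[x + y] = out.get(x + y, Fraction(0)) + px * qy
    return out

def variance(pmf):
    """Var(X) = E[(X - E[X])^2] computed directly from the PMF."""
    mean = sum(x * p for x, p in pmf.items())
    return sum((x - mean) ** 2 * p for x, p in pmf.items())

def variance_of_sum(n, pmf):
    """Variance of the sum of n independent copies of a random variable with the given PMF."""
    total = pmf
    for _ in range(n - 1):
        total = convolve(total, pmf)
    return variance(total)

die = {x: Fraction(1, 6) for x in range(1, 7)}

for n in (1, 4, 16):
    v = variance_of_sum(n, die)
    print(n, v, math.sqrt(v))
```

Each time the number of dice is multiplied by four, the variance is multiplied by four but the standard deviation only doubles.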