Entropy and Relative Entropy occur sufficiently often in these notes to justify a (somewhat) self-contained section. We cover the discrete case, which is the most intuitive.

## Entropy – Discrete Case

**Def 1. [Entropy]** Suppose that $latex X$ is a random variable with values in the countable set $latex {\mathcal X}$ and distribution $latex {\mathbb P}=(p_x : x\in{\mathcal X})$, then

$latex \displaystyle H({\mathbb P}) = -\sum_{x\in{\mathcal X}} p_x \log p_x$

is the entropy of $latex {\mathbb P}$.
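
A minimal sketch of this definition in Python (the function name and the choice of natural logarithm are ours; the notes do not fix a base):

```python
import math

def entropy(p):
    """Entropy H(P) = -sum_x p_x log p_x, with the convention 0 log 0 = 0."""
    return -sum(p_x * math.log(p_x) for p_x in p if p_x > 0)

print(entropy([0.5, 0.5]))  # log 2 ~ 0.6931, the maximum for two outcomes
print(entropy([1.0, 0.0]))  # 0.0: a deterministic outcome has no entropy
```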

**Def 2. [Relative Entropy]** For probability distributions $latex {\mathbb P}=(p_i:i\in {\mathcal X})$ and $latex {\mathbb Q}=(q_i:i\in {\mathcal X})$,

$latex \displaystyle D({\mathbb P}||{\mathbb Q}) = \sum_{i\in{\mathcal X}} p_i \log\frac{p_i}{q_i}$

is the Relative Entropy of $latex {\mathbb P}$ with respect to $latex {\mathbb Q}$.
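
A matching sketch for relative entropy, again with illustrative names; note that $latex D({\mathbb P}||{\mathbb Q})$ requires $latex q_i > 0$ wherever $latex p_i > 0$:

```python
import math

def relative_entropy(p, q):
    """D(P||Q) = sum_i p_i log(p_i / q_i); needs q_i > 0 wherever p_i > 0."""
    return sum(p_i * math.log(p_i / q_i) for p_i, q_i in zip(p, q) if p_i > 0)

print(relative_entropy([0.5, 0.5], [0.5, 0.5]))  # 0.0: D(P||P) = 0
print(relative_entropy([0.9, 0.1], [0.5, 0.5]))  # > 0, and asymmetric in P, Q
```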

**Ex 1**. For a vector $latex {\mathbf n}=(n_i : i\in{\mathcal X})$ with $latex n=\sum_in_i$, if

$latex \displaystyle N = \frac{n!}{\prod_{i\in{\mathcal X}} n_i!}$

then

$latex \displaystyle \log N \approx n H({\mathbb P}), \qquad [1]$

where $latex {\mathbb P}=(n_i/n : i\in{\mathcal X})$.

**Ans 1.** Take logs and apply Stirling’s approximation that $latex \log n! \approx n\log n - n$. So,

$latex \displaystyle \log N \approx n\log n - n - \sum_i \left( n_i \log n_i - n_i \right) = -\sum_i n_i \log\frac{n_i}{n} = n H({\mathbb P}).$
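
A quick numerical check of [1], using `math.lgamma` to compute the exact log-multinomial without overflow (the counts below are an arbitrary illustrative choice):

```python
import math

def log_multinomial(counts):
    """log N = log( n! / prod_i n_i! ), via lgamma to avoid huge factorials."""
    n = sum(counts)
    return math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)

def entropy(p):
    # As in the sketch above.
    return -sum(x * math.log(x) for x in p if x > 0)

counts = [500, 300, 200]
n = sum(counts)
print(log_multinomial(counts))               # exact log N
print(n * entropy([c / n for c in counts]))  # n H(P): agrees to leading order in n
```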

**Ex 2.** Suppose that $latex X_1,\dots,X_n$ are IIDRVs with distribution $latex {\mathbb Q}=(q_x : x\in{\mathcal X})$. Let $latex \hat{\mathbb P}^n$ be the empirical distribution of $latex X_1,\dots,X_n$, that is

$latex \displaystyle \hat{p}^n_x = \frac{1}{n}\sum_{j=1}^n {\mathbb I}[X_j = x].$

If $latex {\mathbb P}=(p_x : x\in{\mathcal X})$ with $latex p_x = n_x/n$ for each $latex x\in{\mathcal X}$, then

$latex \displaystyle {\mathbb P}\big( \hat{\mathbb P}^n = {\mathbb P} \big) \approx e^{-n D({\mathbb P}||{\mathbb Q})}.$

**Ans 2.** Note that $latex {\mathbb P}(\hat{\mathbb P}^n = {\mathbb P}) = \frac{n!}{\prod_x n_x!}\prod_x q_x^{n_x}$, so combining with [1] gives

$latex \displaystyle \log {\mathbb P}\big(\hat{\mathbb P}^n = {\mathbb P}\big) \approx n H({\mathbb P}) + \sum_x n_x \log q_x = -n\sum_x p_x \log\frac{p_x}{q_x} = -n D({\mathbb P}||{\mathbb Q}).$
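
A numerical check of this approximation, comparing the exact multinomial log-probability with $latex -nD({\mathbb P}||{\mathbb Q})$ (the choice of $latex {\mathbb Q}$ and the counts are illustrative):

```python
import math

def log_prob_empirical(counts, q):
    """Exact log P(empirical counts) under n IID samples from Q:
    log[ n!/prod_x n_x! * prod_x q_x^{n_x} ]."""
    n = sum(counts)
    log_coeff = math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)
    return log_coeff + sum(c * math.log(qx) for c, qx in zip(counts, q))

def relative_entropy(p, q):
    return sum(px * math.log(px / qx) for px, qx in zip(p, q) if px > 0)

q = [0.25, 0.25, 0.5]
counts = [400, 100, 500]        # empirical distribution P = (0.4, 0.1, 0.5)
n = sum(counts)
p = [c / n for c in counts]
print(log_prob_empirical(counts, q))  # exact log-probability
print(-n * relative_entropy(p, q))    # the large-deviations approximation
```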

**Boltzmann’s distribution**

We now use entropy to derive Boltzmann’s distribution.

Consider a large number of particles, $latex N$. Each particle can take energy levels $latex \varepsilon_i$, $latex i\in{\mathcal X}$. Let $latex E$ be the average energy of the particles. If we let $latex p_i$ be the proportion of particles at energy level $latex \varepsilon_i$, then these constraints are

$latex \displaystyle \sum_{i\in{\mathcal X}} p_i \varepsilon_i = E, \qquad \sum_{i\in{\mathcal X}} p_i = 1, \qquad p_i \geq 0.$

As is often considered in physics, we assume that the equilibrium state of the particles is the state with the largest number of ways of occurring subject to the constraints on the system. By Ex 1, the number of ways of arranging the particles in proportions $latex {\mathbb P}=(p_i : i\in{\mathcal X})$ is approximately $latex e^{N H({\mathbb P})}$. In other words, we solve the optimization

$latex \displaystyle \text{maximize} \;\; H({\mathbb P}) \quad\text{subject to}\;\; \sum_i p_i \varepsilon_i = E, \;\; \sum_i p_i = 1, \;\; p_i \geq 0.$

**Ex 3.** [Boltzmann’s distribution] Show that the solution to this optimization is given by the distribution

$latex \displaystyle p_i = \frac{e^{-\beta \varepsilon_i}}{Z(\beta)}.$

The distribution $latex {\mathbb P}$ is called *Boltzmann’s distribution*. The scaling constant $latex Z(\beta) = \sum_i e^{-\beta\varepsilon_i}$ is called the Partition function and the constant $latex \beta$ is chosen so that

$latex \displaystyle \sum_{i\in{\mathcal X}} p_i \varepsilon_i = E.$
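
Before deriving this analytically, here is a sketch checking the claim numerically with `scipy.optimize.minimize`; the energy levels and target $latex E$ below are assumed purely for illustration:

```python
import numpy as np
from scipy.optimize import minimize

eps = np.array([0.0, 1.0, 2.0, 3.0])  # energy levels (illustrative choice)
E = 1.2                               # target average energy (illustrative)

def neg_entropy(p):
    p = np.clip(p, 1e-12, 1.0)        # guard against log(0)
    return np.sum(p * np.log(p))      # minimizing this maximizes H(P)

constraints = [
    {"type": "eq", "fun": lambda p: np.sum(p) - 1.0},      # sum_i p_i = 1
    {"type": "eq", "fun": lambda p: np.dot(p, eps) - E},   # mean energy = E
]
p0 = np.full(len(eps), 1.0 / len(eps))
res = minimize(neg_entropy, p0, method="SLSQP",
               constraints=constraints, bounds=[(0.0, 1.0)] * len(eps))
print(res.x)                    # numerical maximizer of H(P)
# For equally spaced energy levels, log p_i should be affine in eps_i,
# so successive differences of log p_i are a constant -beta:
print(np.diff(np.log(res.x)))
```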

**Ans 3.** The Lagrangian of this optimization problem is

$latex \displaystyle L({\mathbb P};\beta,\eta) = -\sum_i p_i \log p_i + \beta\Big( E - \sum_i p_i \varepsilon_i \Big) + \eta\Big( 1 - \sum_i p_i \Big).$

So, finding a stationary point,

$latex \displaystyle \frac{\partial L}{\partial p_i} = -\log p_i - 1 - \beta\varepsilon_i - \eta = 0,$

which implies

$latex \displaystyle p_i = e^{-1-\eta}e^{-\beta\varepsilon_i} \propto e^{-\beta\varepsilon_i}.$

For our constraints to be satisfied, we require that

$latex \displaystyle e^{1+\eta} = \sum_i e^{-\beta\varepsilon_i} =: Z(\beta) \qquad\text{and}\qquad \sum_i \varepsilon_i \frac{e^{-\beta\varepsilon_i}}{Z(\beta)} = E.$

Thinking of $latex Z$ as a function of $latex \beta$, we call $latex Z(\beta)$ the *partition function*, and it is easily shown that

$latex \displaystyle -\frac{d \log Z}{d \beta} = \sum_i \varepsilon_i \frac{e^{-\beta\varepsilon_i}}{Z(\beta)} = E.$

The distribution we derived is therefore Boltzmann’s distribution, with $latex \beta$ determined by the average energy $latex E$.
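
Finally, a sketch of choosing $latex \beta$ in practice: since $latex -d\log Z/d\beta$ is decreasing in $latex \beta$, bisection recovers the $latex \beta$ matching a given $latex E$ (the energy levels and $latex E$ are again assumed for illustration):

```python
import math

eps = [0.0, 1.0, 2.0, 3.0]  # energy levels (illustrative choice)
E = 1.2                     # target average energy (illustrative)

def Z(beta):
    return sum(math.exp(-beta * e) for e in eps)

def mean_energy(beta):
    """-d log Z / d beta, evaluated directly: sum_i eps_i e^{-beta eps_i} / Z."""
    return sum(e * math.exp(-beta * e) for e in eps) / Z(beta)

# mean_energy is decreasing in beta, so bisection finds the beta matching E.
lo, hi = -50.0, 50.0
for _ in range(200):
    mid = (lo + hi) / 2
    if mean_energy(mid) > E:
        lo = mid
    else:
        hi = mid
beta = (lo + hi) / 2
print(beta, mean_energy(beta))  # beta whose mean energy is ~E

# Finite-difference check of the identity -d log Z / d beta = E:
h = 1e-6
print(-(math.log(Z(beta + h)) - math.log(Z(beta - h))) / (2 * h))
```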