Entropy and Relative Entropy occur sufficiently often in these notes to justify a (somewhat) self-contained section. We cover the discrete case, which is the most intuitive.
Entropy – Discrete Case
Def 1. [Entropy] Suppose that $latex X$ is a random variable with values in the countable set $latex {\mathcal X}$ and distribution $latex {\mathbb P}=(p_x:x\in{\mathcal X})$, then

$latex \displaystyle H(X) = -\sum_{x\in{\mathcal X}} p_x \log p_x$

is the entropy of $latex X$. Since entropy depends on $latex X$ only through its distribution, we also write $latex H({\mathbb P})$.
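As a quick numerical illustration (a minimal sketch; the function name entropy and the example distributions are my own choices), the definition translates directly into code:

```python
import math

def entropy(p):
    """Entropy H = -sum_x p_x log p_x of a discrete distribution p.

    Terms with p_x = 0 contribute nothing, by the convention 0 log 0 = 0.
    """
    return -sum(px * math.log(px) for px in p if px > 0)

# A fair coin has entropy log 2; a degenerate distribution has entropy 0.
print(entropy([0.5, 0.5]))  # 0.693... = log 2
print(entropy([1.0, 0.0]))  # 0.0
```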
Def 2. [Relative Entropy] For probability distributions $latex {\mathbb P}=(p_i:i\in {\mathcal X})$ and $latex {\mathbb Q}=(q_i:i\in {\mathcal X})$, then

$latex \displaystyle D({\mathbb P}||{\mathbb Q}) = \sum_{i\in{\mathcal X}} p_i \log \frac{p_i}{q_i}$

is the Relative Entropy of $latex {\mathbb P}$ with respect to $latex {\mathbb Q}$.
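Similarly, a minimal sketch of the relative entropy, assuming $latex {\mathbb Q}$ puts positive mass wherever $latex {\mathbb P}$ does (the function name relative_entropy is my own):

```python
import math

def relative_entropy(p, q):
    """Relative entropy D(P||Q) = sum_i p_i log(p_i / q_i).

    Assumes q_i > 0 whenever p_i > 0; otherwise D(P||Q) is infinite.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

print(relative_entropy([0.5, 0.5], [0.5, 0.5]))  # 0.0: D(P||P) = 0
print(relative_entropy([0.9, 0.1], [0.5, 0.5]))  # 0.368...: positive when P != Q
```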
Ex 1. For a vector $latex (n_i:i\in{\mathcal X})$ of non-negative integers with $latex n=\sum_in_i$, if $latex {\mathbb P}=(p_i:i\in{\mathcal X})$ with $latex p_i = n_i/n$, then

$latex \displaystyle \frac{n!}{\prod_i n_i!} \approx e^{nH({\mathbb P})}. \qquad [1]$
Ans 1. Take logs and apply Stirling's approximation that $latex \log n! \approx n\log n - n$. So,

$latex \displaystyle \log \frac{n!}{\prod_i n_i!} \approx n\log n - n - \sum_i (n_i \log n_i - n_i) = -\sum_i n_i \log \frac{n_i}{n} = nH({\mathbb P}).$
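To see the approximation in action, here is a short sketch (the counts are made-up values) comparing the exact log multinomial coefficient, computed via math.lgamma, with $latex nH({\mathbb P})$; the two agree up to $latex O(\log n)$ corrections:

```python
import math

def log_multinomial(counts):
    """log( n! / prod_i n_i! ), computed exactly via log-gamma."""
    n = sum(counts)
    return math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)

counts = [300, 500, 200]
n = sum(counts)
H = -sum((c / n) * math.log(c / n) for c in counts)  # entropy of P = (n_i / n)
print(log_multinomial(counts))  # about 1022.7
print(n * H)                    # about 1029.7, within O(log n) of the above
```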
Ex 2. Suppose that $latex X_1,\dots,X_n$ are IIDRVs with distribution $latex {\mathbb Q}=(q_x:x\in{\mathcal X})$. Let $latex \hat{\mathbb P}^n = (\hat{p}_x : x\in{\mathcal X})$ be the empirical distribution of $latex X_1,\dots,X_n$, that is

$latex \displaystyle \hat{p}_x = \frac{1}{n}\sum_{j=1}^n \mathbb{I}[X_j = x].$

If $latex {\mathbb P}=(p_x:x\in{\mathcal X})$ is a distribution with $latex np_x \in \mathbb{Z}_+$ for each $latex x$, then

$latex \displaystyle \mathbb{P}(\hat{\mathbb P}^n = {\mathbb P}) \approx e^{-nD({\mathbb P}||{\mathbb Q})}.$
Ans 2. Note that

$latex \displaystyle \mathbb{P}(\hat{\mathbb P}^n = {\mathbb P}) = \frac{n!}{\prod_x (np_x)!} \prod_x q_x^{np_x},$

so combining with [1] gives

$latex \displaystyle \mathbb{P}(\hat{\mathbb P}^n = {\mathbb P}) \approx e^{nH({\mathbb P})} e^{n\sum_x p_x \log q_x} = e^{-n\sum_x p_x \log (p_x/q_x)} = e^{-nD({\mathbb P}||{\mathbb Q})}.$
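The same kind of numerical check works here: the sketch below (with a made-up $latex {\mathbb Q}$ and made-up counts) compares the exact log-probability of observing the empirical distribution $latex {\mathbb P}$ with the approximation $latex -nD({\mathbb P}||{\mathbb Q})$; again they agree up to $latex O(\log n)$ terms:

```python
import math

def log_prob_empirical(counts, q):
    """Exact log P(hat P^n = P) when counts = (n p_x) and the X_j are i.i.d. Q."""
    n = sum(counts)
    log_mult = math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)
    return log_mult + sum(c * math.log(qx) for c, qx in zip(counts, q))

q = [0.5, 0.5]
counts = [70, 30]   # empirical distribution P = (0.7, 0.3) with n = 100
n = sum(counts)
D = sum((c / n) * math.log((c / n) / qx) for c, qx in zip(counts, q))
print(log_prob_empirical(counts, q))  # about -10.7 (exact)
print(-n * D)                         # about -8.2, again within O(log n)
```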
Boltzmann’s distribution
We now use entropy to derive Boltzmann’s distribution.
Consider a large number of particles, $latex N$. Each particle can take energy levels $latex \varepsilon_i$, $latex i=1,\dots,k$. Let $latex \bar{\varepsilon}$ be the average energy of the particles. If we let $latex p_i$ be the proportion of particles at energy level $latex \varepsilon_i$, then these constraints are

$latex \displaystyle \sum_{i=1}^k p_i \varepsilon_i = \bar{\varepsilon}, \qquad \sum_{i=1}^k p_i = 1.$
As is often considered in physics, we assume that the equilibrium state of the particles is the state with the largest number of ways of occurring subject to the constraints on the system. By [1], the number of ways of assigning $latex N$ particles to energy levels in proportions $latex (p_i)_{i=1}^k$ is approximately $latex e^{NH({\mathbb P})}$, so maximizing the number of ways amounts to maximizing entropy. In other words, we solve the optimization

$latex \displaystyle \begin{array}{ll} \text{maximize} & -\sum_{i=1}^k p_i \log p_i \\ \text{subject to} & \sum_{i=1}^k p_i \varepsilon_i = \bar{\varepsilon}, \quad \sum_{i=1}^k p_i = 1, \\ \text{over} & p_i \geq 0, \quad i=1,\dots,k. \end{array}$
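Before deriving the closed form, one can solve this optimization numerically and compare with the answer in Ex 3 below. The sketch assumes scipy is available; the energy levels and average energy are made-up values:

```python
import numpy as np
from scipy.optimize import minimize

eps = np.array([0.0, 1.0, 2.0])  # hypothetical energy levels
e_bar = 0.8                      # hypothetical average energy

def neg_entropy(p):
    p = np.clip(p, 1e-12, 1.0)   # guard against log(0) at the boundary
    return float(np.sum(p * np.log(p)))

constraints = [
    {"type": "eq", "fun": lambda p: np.sum(p) - 1.0},   # probabilities sum to 1
    {"type": "eq", "fun": lambda p: p @ eps - e_bar},   # mean energy constraint
]
res = minimize(neg_entropy, x0=np.ones(3) / 3,
               bounds=[(0.0, 1.0)] * 3, constraints=constraints)
print(res.x)  # compare with the closed form derived in Ex 3
```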
Ex 3. [Boltzmann's distribution] Show that the solution to the optimization above is given by the distribution

$latex \displaystyle p_i = \frac{e^{-\beta\varepsilon_i}}{Z(\beta)}, \qquad i=1,\dots,k,$

which is called Boltzmann's distribution. The scaling constant $latex Z(\beta) = \sum_i e^{-\beta\varepsilon_i}$ is called the partition function, and the constant $latex \beta$ is chosen so that $latex \sum_i p_i \varepsilon_i = \bar{\varepsilon}$.
Ans 3. The Lagrangian of this optimization problem is

$latex \displaystyle L(p;\beta,\gamma) = -\sum_i p_i \log p_i - \beta\Big(\sum_i p_i \varepsilon_i - \bar{\varepsilon}\Big) - \gamma\Big(\sum_i p_i - 1\Big).$

So, finding a stationary point,

$latex \displaystyle \frac{\partial L}{\partial p_i} = -\log p_i - 1 - \beta\varepsilon_i - \gamma = 0,$

which implies

$latex \displaystyle p_i = e^{-\beta\varepsilon_i - 1 - \gamma}.$

For our constraints to be satisfied, we require that $latex e^{1+\gamma} = \sum_i e^{-\beta\varepsilon_i} = Z(\beta)$, so that the $latex p_i$ sum to one, and that $latex \beta$ solves $latex \sum_i \varepsilon_i e^{-\beta\varepsilon_i}/Z(\beta) = \bar{\varepsilon}$.
Thinking of $latex Z$ as a function of $latex \beta$, we call $latex Z(\beta) = \sum_i e^{-\beta\varepsilon_i}$ the partition function, and it is easily shown that

$latex \displaystyle -\frac{d}{d\beta}\log Z(\beta) = \frac{\sum_i \varepsilon_i e^{-\beta\varepsilon_i}}{Z(\beta)} = \bar{\varepsilon}.$

The distribution we derived, $latex p_i = e^{-\beta\varepsilon_i}/Z(\beta)$, is Boltzmann's distribution.
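Finally, a sketch (again with made-up energy levels and average energy) that picks $latex \beta$ by solving $latex -\frac{d}{d\beta}\log Z(\beta) = \bar{\varepsilon}$, using the fact that the mean energy is strictly decreasing in $latex \beta$, so a bracketing root finder applies:

```python
import numpy as np
from scipy.optimize import brentq

eps = np.array([0.0, 1.0, 2.0])  # hypothetical energy levels
e_bar = 0.8                      # hypothetical average energy

def mean_energy(beta):
    """Mean energy under Boltzmann's distribution, i.e. -d/d(beta) log Z(beta)."""
    w = np.exp(-beta * eps)
    return (eps @ w) / w.sum()

# mean_energy decreases from max(eps) to min(eps) as beta runs over the real
# line, so any e_bar strictly between those values gives a unique root.
beta = brentq(lambda b: mean_energy(b) - e_bar, -50.0, 50.0)
p = np.exp(-beta * eps) / np.exp(-beta * eps).sum()
print(beta, p)  # Boltzmann's distribution; p @ eps recovers e_bar
```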