Entropy and Boltzmann’s Distribution

Entropy and relative entropy occur sufficiently often in these notes to justify a (somewhat) self-contained section. We cover the discrete case, which is the most intuitive.

Entropy – Discrete Case

Def 1. [Entropy] Suppose that X is a random variable with values in the countable set {\mathcal X} and distribution P=(p_i:i\in {\mathcal X}). Then

H(X) = -\sum_{i\in {\mathcal X}} p_i \log p_i

is the entropy of X.

Def 2. [Relative Entropy] For probability distributions P=(p_i:i\in {\mathcal X}) and Q=(q_i:i\in {\mathcal X}),

D(P||Q) = \sum_{i\in {\mathcal X}} p_i \log \frac{p_i}{q_i}

is the Relative Entropy of P with respect to Q.
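As a quick numerical illustration of the two definitions, the following Python sketch computes both quantities for finite distributions given as lists. It assumes the standard formulas H(P) = -\sum_i p_i \log p_i and D(P||Q) = \sum_i p_i \log (p_i/q_i), natural logarithms, and the convention 0 \log 0 = 0. The function names and the example distributions are illustrative choices, not part of the notes.

```python
import math


def entropy(p):
    """Entropy -sum_i p_i log p_i (natural log), skipping zero entries."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def relative_entropy(p, q):
    """Relative entropy sum_i p_i log(p_i / q_i) of p with respect to q.

    Returns infinity if some p_i > 0 has q_i = 0.
    """
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue  # 0 log 0 = 0 by convention
        if qi == 0:
            return math.inf
        total += pi * math.log(pi / qi)
    return total


# Hypothetical example distributions on four points.
uniform = [0.25] * 4
skewed = [0.7, 0.1, 0.1, 0.1]

print(entropy(uniform))                   # equals log 4 for the uniform case
print(entropy(skewed))
print(relative_entropy(skewed, uniform))  # equals log 4 - entropy(skewed)
```

Against a uniform reference on m points, the relative entropy reduces to \log m - H(P), which is what the last line checks by eye.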

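Ex 1. For a vector {\mathbf n}=(n_i : i=1,...,m) with n=\sum_i n_i, if n_i/n \rightarrow p_i for each i=1,...,m as n\rightarrow\infty, then

\binom{n}{n_1,...,n_m} = \frac{n!}{n_1! \cdots n_m!} = e^{nH(p) + o(n)},

where H(p) = -\sum_{i=1}^m p_i \log p_i is the entropy of p=(p_i : i=1,...,m).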

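Ans 1. Take logs and apply Stirling’s approximation, \log(n!)=n\log n - n + o(n). Since n=\sum_i n_i,

\log \frac{n!}{n_1! \cdots n_m!} = n\log n - n - \sum_{i=1}^m (n_i \log n_i - n_i) + o(n) = -n\sum_{i=1}^m \frac{n_i}{n} \log \frac{n_i}{n} + o(n) = nH(p) + o(n),

where the last equality uses n_i/n \rightarrow p_i and H(p) = -\sum_i p_i \log p_i.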
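The Stirling estimate can be checked numerically. The sketch below compares the log of the multinomial coefficient n!/(n_1! \cdots n_m!) with n times the entropy -\sum_i p_i \log p_i of the proportions, for growing n. The proportions (0.5, 0.3, 0.2) are a hypothetical choice, and `math.lgamma` is used so that no large factorials are formed.

```python
import math


def log_multinomial(counts):
    """log( n! / prod_i n_i! ) with n = sum(counts), via log-gamma."""
    n = sum(counts)
    return math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)


def entropy(p):
    """Entropy -sum_i p_i log p_i (natural log), skipping zero entries."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


p = [0.5, 0.3, 0.2]  # hypothetical limiting proportions
ratios = []
for n in (10, 100, 1000, 10000):
    counts = [round(n * pi) for pi in p]  # exact multiples for these n
    assert sum(counts) == n
    ratio = log_multinomial(counts) / (n * entropy(p))
    ratios.append(ratio)
    print(n, ratio)
```

The printed ratio approaches 1 from below, consistent with the o(n) correction being negligible relative to nH(p).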


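Ex 2. Suppose that X_k, k=1,...,n are IID random variables with distribution P=(p_i : i \in {\mathcal X}), where {\mathcal X}=\{1,...,m\}. Let \hat{P}^n=(\hat{P}_i^n : i \in {\mathcal X} ) be the empirical distribution of X_k, k=1,...,n, that is

\hat{P}_i^n = \frac{1}{n} \sum_{k=1}^n {\mathbb I}[X_k = i].

If n_i/n \rightarrow q_i for each i=1,...,m then

{\mathbb P}( \hat{P}^n = {\mathbf n}/n ) = e^{-nD(q||p) + o(n)},

where D(q||p) = \sum_i q_i \log (q_i/p_i) is the relative entropy of q with respect to p.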

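Ans 2. The counts of each value follow a multinomial distribution, so

{\mathbb P}( \hat{P}^n = {\mathbf n}/n ) = \frac{n!}{n_1! \cdots n_m!} \prod_{i=1}^m p_i^{n_i}.

Note that p_i^{n_i} = e^{n_i \log p_i} = e^{nq_i \log p_i + o(n)}, so combining with Ex 1 (applied with q in place of p) gives

{\mathbb P}( \hat{P}^n = {\mathbf n}/n ) = e^{nH(q) + n\sum_i q_i \log p_i + o(n)} = e^{-n\sum_i q_i \log (q_i/p_i) + o(n)} = e^{-nD(q||p) + o(n)}.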
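A small numerical check of this exponential rate: the sketch below evaluates the exact multinomial probability that the empirical counts equal n q, and compares -(1/n) \log of that probability with the relative entropy \sum_i q_i \log (q_i/p_i). The distributions p and q are hypothetical, chosen so that n q_i is an integer for the values of n used.

```python
import math


def log_multinomial_pmf(counts, p):
    """Log-probability of observing `counts` in sum(counts) IID draws from p."""
    n = sum(counts)
    log_coef = math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)
    return log_coef + sum(c * math.log(pi) for c, pi in zip(counts, p))


def relative_entropy(q, p):
    """Relative entropy sum_i q_i log(q_i / p_i), assuming p_i > 0."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)


p = [0.5, 0.3, 0.2]  # hypothetical true distribution
q = [0.2, 0.3, 0.5]  # hypothetical empirical distribution
d = relative_entropy(q, p)

rates = []
for n in (10, 100, 1000, 10000):
    counts = [round(n * qi) for qi in q]
    rate = -log_multinomial_pmf(counts, p) / n  # should approach d
    rates.append(rate)
    print(n, rate, d)
```

The gap between the empirical rate and D(q||p) shrinks like (\log n)/n, which is the o(n) term in the exponent divided by n.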

Boltzmann’s distribution

We now use entropy to derive Boltzmann’s distribution.

Consider a large number of particles. Each particle can take energy levels e_i, i\in {\mathbb N}. Let \bar{e} be the average energy of the particles. If we let p_i be the proportion of particles of energy level e_i then the constraints on the system are

\sum_{i=1}^\infty p_i = 1, \qquad \sum_{i=1}^\infty p_i e_i = \bar{e}, \qquad p_i \geq 0.

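As is often considered in physics, we assume that the equilibrium state of the particles is the state p with the largest number of ways of occurring subject to the constraints on the system. By Ex 1, the number of ways of arranging n particles with proportions p is e^{nH(p)+o(n)}, where H(p) is the entropy of p, so maximizing this number amounts to maximizing the entropy. In other words, we solve the optimization

\text{maximize} \quad -\sum_{i=1}^\infty p_i \log p_i \quad \text{subject to} \quad \sum_{i=1}^\infty p_i e_i = \bar{e}, \quad \sum_{i=1}^\infty p_i = 1 \quad \text{over} \quad p_i \geq 0, \; i\in {\mathbb N}.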

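Ex 3. [Boltzmann’s distribution] Show that the solution to this optimization is given by the distribution

p_i^* = \frac{e^{-\lambda e_i}}{Z(\lambda)}, \qquad Z(\lambda) = \sum_{i=1}^\infty e^{-\lambda e_i}.

This distribution is called Boltzmann’s distribution. The scaling constant Z(\lambda) is called the partition function and the constant \lambda is chosen so that

\frac{1}{Z(\lambda)} \sum_{i=1}^\infty e_i e^{-\lambda e_i} = \bar{e}.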

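Ans 3. The Lagrangian of this optimization problem is

L(p;\lambda,\mu) = -\sum_{i=1}^\infty p_i \log p_i + \lambda \Big( \bar{e} - \sum_{i=1}^\infty p_i e_i \Big) + \mu \Big( 1 - \sum_{i=1}^\infty p_i \Big).

So, finding a stationary point,

\frac{\partial L}{\partial p_i} = -\log p_i - 1 - \lambda e_i - \mu = 0,

which implies

p_i = e^{-(1+\mu)} e^{-\lambda e_i}.

For our constraints to be satisfied, we require that

e^{-(1+\mu)} \sum_{i=1}^\infty e^{-\lambda e_i} = 1 \qquad \text{and} \qquad e^{-(1+\mu)} \sum_{i=1}^\infty e_i e^{-\lambda e_i} = \bar{e}.

Thinking of Z as a function of \lambda, we call Z(\lambda)=\sum_{i=1}^\infty e^{-\lambda e_i} the partition function. The first constraint gives e^{1+\mu} = Z(\lambda), and it is easily shown that the second is then equivalent to

-\frac{d}{d\lambda} \log Z(\lambda) = \bar{e}.

The distribution we derived is therefore

p_i = \frac{e^{-\lambda e_i}}{Z(\lambda)}.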
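To make the role of \lambda concrete, here is a minimal Python sketch that finds \lambda by bisection so that the distribution p_i proportional to e^{-\lambda e_i} has a prescribed mean energy, and then checks numerically that -\frac{d}{d\lambda} \log Z(\lambda) matches that mean. It assumes a finite set of energy levels (the notes allow countably many), and the levels, target mean, bracket and function names are all hypothetical choices made for illustration.

```python
import math


def boltzmann(energies, lam):
    """Probabilities proportional to exp(-lam * e_i) over finitely many levels."""
    e0 = min(energies)  # shifting energies does not change the distribution
    weights = [math.exp(-lam * (e - e0)) for e in energies]
    z = sum(weights)
    return [w / z for w in weights]


def mean_energy(energies, lam):
    """Average energy sum_i p_i e_i under the distribution above."""
    return sum(p * e for p, e in zip(boltzmann(energies, lam), energies))


def log_partition(energies, lam):
    """log Z(lam) with Z(lam) = sum_i exp(-lam * e_i)."""
    return math.log(sum(math.exp(-lam * e) for e in energies))


def solve_lambda(energies, e_bar, lo=-50.0, hi=50.0, tol=1e-12):
    """Bisection on lam; the mean energy is decreasing in lam."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mean_energy(energies, mid) > e_bar:
            lo = mid  # mean too high: need a larger lam
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


energies = [0.0, 1.0, 2.0, 3.0]  # hypothetical energy levels
e_bar = 1.0                      # hypothetical target mean energy
lam = solve_lambda(energies, e_bar)
p = boltzmann(energies, lam)

# Central finite difference for -d/d(lam) log Z(lam).
h = 1e-6
minus_dlogz = -(log_partition(energies, lam + h)
                - log_partition(energies, lam - h)) / (2 * h)

print(lam, p)
print(mean_energy(energies, lam), minus_dlogz)
```

Bisection is used only because the mean energy is monotone in \lambda (its derivative is minus the variance of the energy); any one-dimensional root finder would do. The target mean must lie strictly between the smallest and largest energy level for a solution to exist within the bracket.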

