Parallel tempering
Monte Carlo algorithms are often used to explore probability distributions. In particular, the Metropolis-Hastings algorithm is one of the most widely used techniques for sampling from a distribution. In this story, there are a few frog characters to familiarize ourselves with.
First up, explorer frog:
- Doesn’t give a flip
- Red
Next, we have exploiter frog (also known as greedy frog):
- Wants flies, now
- Blue
Finally, our most complicated frog will be sampler frog:
- Wants flies, most of the time (!)
- Green
In the Metropolis-Hastings algorithm, in order to sample from a distribution proportional to \(P(x)\), a proposed move from a “state” \(x\) to some new state \(x'\) is “accepted” with probability \[P_{\text{accept}}(x \rightarrow x') = \min\left(1,\frac{P(x')}{P(x)}\right).\] (Really we should also be sure our proposals are symmetric, but we won’t worry about that in this demonstration.)
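To make the acceptance rule concrete, here is a minimal random-walk Metropolis sketch in Python; the Gaussian proposal, the step size, and the function names are illustrative choices rather than anything fixed by the algorithm above:

```python
import numpy as np

def metropolis(log_p, x0, n_steps, step_size=1.0, rng=None):
    """Random-walk Metropolis: sample from a density proportional to exp(log_p(x))."""
    rng = np.random.default_rng() if rng is None else rng
    x = x0
    samples = []
    for _ in range(n_steps):
        x_new = x + step_size * rng.normal()          # symmetric Gaussian proposal
        # Accept with probability min(1, P(x')/P(x)), computed in log space.
        if np.log(rng.uniform()) < log_p(x_new) - log_p(x):
            x = x_new
        samples.append(x)
    return np.array(samples)

# Example: draw from a standard normal, where log P(x) = -x^2 / 2 up to a constant.
samples = metropolis(lambda x: -0.5 * x**2, x0=0.0, n_steps=5000)
```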
What is going on with our frogs is that there is a landscape of flies, with the density of flies at spot \(x\) proportional to \(P(x)\), and each frog hops around it as a Metropolis chain on a tempered target \(P(x)^\beta\) at its own inverse temperature \(\beta\): explorer frog at \(\beta = 0\) accepts every hop, exploiter frog at very large \(\beta\) essentially only hops toward more flies, and sampler frog at \(\beta = 1\) samples the distribution we actually care about. In parallel tempering, these chains run side by side and occasionally propose to swap positions, so the freely roaming explorer can ferry the sampler out of a local cluster of flies it would otherwise be stuck near.
So, by using the parallel tempering method we can quite generically improve the convergence of Monte Carlo methods, and therefore the efficiency of our posterior inference.
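Here is a minimal sketch of how this might look in code, assuming a one-dimensional state, a hand-picked ladder of \(\beta\) values sorted so that \(\beta = 1\) comes last, and one swap proposal per sweep; a real implementation would tune the temperature ladder, step sizes, and swap schedule:

```python
import numpy as np

def parallel_tempering(log_p, betas, x0, n_steps, step_size=1.0, rng=None):
    """Run one Metropolis chain per inverse temperature, each targeting P(x)^beta,
    and occasionally propose to swap states between adjacent temperatures."""
    rng = np.random.default_rng() if rng is None else rng
    xs = np.full(len(betas), float(x0))   # current state of each chain
    cold_samples = []                     # output of the beta = 1 ("sampler frog") chain
    for _ in range(n_steps):
        # Within-chain Metropolis update at each temperature.
        for i, beta in enumerate(betas):
            x_new = xs[i] + step_size * rng.normal()
            if np.log(rng.uniform()) < beta * (log_p(x_new) - log_p(xs[i])):
                xs[i] = x_new
        # Propose swapping the states of a random pair of neighbouring chains.
        i = rng.integers(len(betas) - 1)
        log_swap = (betas[i + 1] - betas[i]) * (log_p(xs[i]) - log_p(xs[i + 1]))
        if np.log(rng.uniform()) < log_swap:
            xs[i], xs[i + 1] = xs[i + 1], xs[i]
        cold_samples.append(xs[-1])
    return np.array(cold_samples)

# Example: a well-separated bimodal target that a single beta = 1 chain struggles with.
log_p = lambda x: np.logaddexp(-0.5 * (x - 5.0) ** 2, -0.5 * (x + 5.0) ** 2)
samples = parallel_tempering(log_p, betas=[0.1, 0.3, 1.0], x0=0.0, n_steps=20000)
```

In the frog picture, the last chain is sampler frog, the low-\(\beta\) chain plays explorer frog, and a swap is two frogs trading lily pads.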
A nice side-effect of this, however, is the ability to leverage a technique for computing the free energy of a system, known as “calorimetry” in the physics community and as “thermodynamic integration” in the statistics community.
Suppose that we have a general model where the data \(A\) are related to some parameters \(\vec{\theta}\) through a likelihood \(P(A|\vec{\theta})\), with a prior \(P(\vec{\theta})\) placed on the parameters.
The fundamental identity that will come in handy arises if we define the partition function at an arbitrary \(\beta\) as
\[ Z(\beta) = \int P(A|\vec{\theta})^\beta P(\vec{\theta})\, d\vec{\theta},\]
so that, notably, the Bayesian evidence is obtained at \(\beta = 1\), while \(\beta = 0\) just gives back the normalized prior:
\[ Z(1) = \int P(A | \vec{\theta})\, P(\vec{\theta})\, d\vec{\theta} = P(A), \qquad Z(0) = \int P(\vec{\theta})\, d\vec{\theta} = 1.\]
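The identity being alluded to (the standard thermodynamic-integration result, stated here for the definition of \(Z(\beta)\) above) follows from differentiating \(\log Z(\beta)\) with respect to \(\beta\):
\[ \frac{d \log Z(\beta)}{d\beta} = \frac{1}{Z(\beta)} \int \log P(A|\vec{\theta})\, P(A|\vec{\theta})^\beta P(\vec{\theta})\, d\vec{\theta} = \left\langle \log P(A|\vec{\theta}) \right\rangle_\beta, \]
where the average is taken over the tempered posterior at inverse temperature \(\beta\). Integrating from \(\beta = 0\) to \(\beta = 1\) then gives the log evidence:
\[ \log P(A) = \log Z(1) - \log Z(0) = \int_0^1 \left\langle \log P(A|\vec{\theta}) \right\rangle_\beta \, d\beta. \]
Conveniently, the parallel tempering chains already produce samples at each \(\beta\) on the ladder, so each average can be estimated as the mean log-likelihood within the corresponding chain and the \(\beta\) integral evaluated numerically (for example with the trapezoidal rule).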
Key terms/concepts:
- Simulated annealing is generally a useful way to improve convergence.