Review of Cluster Monte Carlo Updates with Boltzmann Machines

Motivation

Cluster Monte Carlo (cluster MC) algorithms are an important class of sampling methods in many-body physics, especially for systems near critical points.

Near a critical point, the relaxation time $\tau$ grows as a power of the correlation length $\xi$,

$$\tau \sim \xi^{z},$$

where $z$ is the dynamical critical exponent. This critical slowing down causes an extraordinary amount of time consumption for single-spin-flip Monte Carlo.

Since $\xi$ is bounded by the linear system size $L$ at criticality, $\tau \sim L^{z}$: to produce correctly decorrelated samples, a single-spin-flip Monte Carlo has to spend time growing as a power of the system size, which is inevitably inefficient.

Cluster Monte Carlo algorithms were invented to overcome this problem: by updating whole clusters of spins at once, they drastically decrease the autocorrelation time. Well-known examples include the Swendsen-Wang and Wolff algorithms.

In the Swendsen-Wang algorithm, the core step, which constructs the clusters, is a familiar conditional choice: a link between neighboring spins is activated with probability

$$p(h_\ell = 1 \mid s_i, s_j) = \begin{cases} 1 - e^{-2\beta J}, & s_i = s_j, \\ 0, & s_i \neq s_j. \end{cases}$$
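As a minimal sketch (my own illustration, not code from the paper), this conditional choice reads:

```python
import math

def sw_bond_prob(s_i: int, s_j: int, beta: float, J: float) -> float:
    """Swendsen-Wang bond activation: only aligned neighbors may be linked."""
    return 1.0 - math.exp(-2.0 * beta * J) if s_i == s_j else 0.0
```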
This conditional rule strongly reminds us of the sampling technique used in the restricted Boltzmann machine (RBM). Indeed, in the paper Exploring cluster Monte Carlo updates with Boltzmann machines, the authors explore an even more general method, using Boltzmann machines (BMs) to express cluster Monte Carlo algorithms. Their simulation results also show a great improvement in autocorrelation time compared with local update algorithms.

Introduction to Boltzmann Machines

In general, any kind of network can be viewed as a set of vertices and edges,

$$G = (V, E),$$

where the edges can be viewed as a set of pairs of vertices,

$$E \subseteq \{\, \{u, v\} : u, v \in V \,\}.$$

What makes Boltzmann machines different is that they divide the set of vertices into two classes,

$$V = V_{\mathrm{visible}} \cup V_{\mathrm{hidden}},$$

which we conventionally call the visible layer and the hidden layer; the vertices in them are correspondingly visible vertices and hidden vertices.

Of course, this is just the abstract, structural feature of BMs; the real essence lies in how they work: they are energy-based generative models, defining a joint distribution $p(\mathbf{v}, \mathbf{h}) \propto e^{-E(\mathbf{v}, \mathbf{h})}$ over the visible and hidden vertices.

RBMs, which we have learned a lot about in the machine learning course, impose an even stricter restriction on BMs: they do not allow interactions within the same layer, neither the visible layer nor the hidden layer.

To train a BM, one has to sample back and forth between the layers, based on conditional probability formulas derived from the stochastic nature of this generative model.

A typical strategy is to sample the hidden layer from the input, say the visible layer. To know how well the parameters work, one then samples the visible layer back from the hidden layer just obtained. This procedure is repeated back and forth several times before the interaction parameters are updated.
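A minimal sketch of this back-and-forth (block Gibbs) sampling for a binary RBM; the names `W`, `b`, `c` for the weights and the visible/hidden biases are my own conventions, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gibbs_step(v, W, b, c):
    """One visible -> hidden -> visible step for a binary RBM with
    p(h=1|v) = sigma(W v + c) and p(v=1|h) = sigma(W^T h + b)."""
    h = (rng.random(c.shape) < sigmoid(W @ v + c)).astype(float)
    v_new = (rng.random(b.shape) < sigmoid(W.T @ h + b)).astype(float)
    return v_new, h
```

In contrastive-divergence training, a few such steps supply the model statistics used to update $W$, $b$, and $c$.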

Design a Boltzmann Machine for Cluster MC

The authors of Exploring cluster Monte Carlo updates with Boltzmann machines proposed special Boltzmann machines that reproduce cluster Monte Carlo algorithms.

Their method uses the system configuration as the input, serving as the visible layer of the BM; samples specially designed hidden vertices, whose energy model guarantees that the marginal distribution of the visible layer remains that of the original physical system; and finally uses the sampled hidden layer to manipulate the visible layer, achieving the same effect as a cluster MC update.

Classical 2D Ising Model

The Hamiltonian of a classical 2D Ising model is

$$H(\mathbf{s}) = -J \sum_{\ell = \langle i, j \rangle} s_i s_j,$$

where $\mathbf{s}=\{s_1,s_2,\cdots,s_N\}$, $s_i\in\{+1,-1\}$ represents a specific system configuration, $J$ is the interaction coefficient, and $\ell$ runs over the links of neighboring spins.
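For concreteness, a sketch of this Hamiltonian on an $L \times L$ periodic lattice (assuming `s` is a NumPy array of $\pm 1$ entries):

```python
import numpy as np

def ising_energy(s: np.ndarray, J: float) -> float:
    """H(s) = -J * sum over nearest-neighbor links of s_i s_j,
    with periodic boundary conditions."""
    return -J * float(np.sum(s * np.roll(s, 1, axis=0)) +
                      np.sum(s * np.roll(s, 1, axis=1)))
```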

Based on statistical physics, we know the probability distribution will be

$$p(\mathbf{s}) = \frac{e^{-\beta H(\mathbf{s})}}{Z}, \qquad Z = \sum_{\mathbf{s}} e^{-\beta H(\mathbf{s})},$$

where $\beta := \frac{1}{kT}$ indicates the system temperature.

Basic BM Design

Following the essence of cluster MC, we can define the hidden layer on the links connecting neighboring spins, so that the energy model can be written as

$$E(\mathbf{s}, \mathbf{h}) = -\sum_{\ell = \langle i, j \rangle} h_\ell \left( W_\ell\, s_i s_j + b_\ell \right),$$

where $h_\ell \in \{0, 1\}$ indicates whether the two neighbors are counted into the same cluster or not. Since our purpose is to recover the Ising model, we can choose the connection parameter $W_\ell = W$ to be a global constant, serving as a tunable hyper-parameter. The same argument holds for $b_\ell = b$.

One can derive that the conditional distribution of $\mathbf{h}$ factorizes over the links, with

$$p(h_\ell = 1 \mid \mathbf{s}) = \sigma\!\left( W s_i s_j + b \right),$$

where $\sigma(z):=\frac{1}{1+e^{-z}}$ is the sigmoid activation function.

Moreover, to correctly recover the marginal distribution $p(\mathbf{s})$, the parameters have to satisfy

$$e^{2\beta J} = \frac{1 + e^{W + b}}{1 + e^{-W + b}}.$$
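This condition comes from marginalizing the joint weight over each $h_\ell$,

$$\sum_{h_\ell \in \{0,1\}} e^{h_\ell (W s_i s_j + b)} = 1 + e^{W s_i s_j + b},$$

and demanding that this factor be proportional to the Ising link weight $e^{\beta J s_i s_j}$ with the same constant for both values $s_i s_j = \pm 1$:

$$\frac{1 + e^{W + b}}{1 + e^{-W + b}} = \frac{e^{\beta J}}{e^{-\beta J}} = e^{2\beta J}.$$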
Updating Strategy

After inputting the configuration $\mathbf{s}$, we sample the hidden layer according to this conditional distribution. Then we use the hidden samples to flip the linked spins: the activated links ($h_\ell = 1$) partition the lattice into clusters, which are flipped as wholes.
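Putting the pieces together, here is a sketch of one such update in the Swendsen-Wang limit ($b \to -\infty$, derived below), where activated links only join aligned spins and every cluster may be flipped with probability $1/2$; this is my own illustration, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def find(parent, i):
    """Union-find root with path halving."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i

def bm_cluster_update(s, beta, J):
    """Sample the hidden bonds h_l, then flip the clusters they define
    (one Swendsen-Wang sweep on an L x L periodic lattice)."""
    L = s.shape[0]
    p_bond = 1.0 - np.exp(-2.0 * beta * J)  # = sigma(W s_i s_j + b) in the SW limit
    parent = list(range(L * L))
    for x in range(L):
        for y in range(L):
            for nx, ny in ((x + 1) % L, y), (x, (y + 1) % L):
                if s[x, y] == s[nx, ny] and rng.random() < p_bond:
                    ri, rj = find(parent, x * L + y), find(parent, nx * L + ny)
                    if ri != rj:
                        parent[ri] = rj  # h_l = 1: merge into one cluster
    flip = {}
    for x in range(L):
        for y in range(L):
            r = find(parent, x * L + y)
            if r not in flip:
                flip[r] = rng.random() < 0.5  # flip each cluster with prob 1/2
            if flip[r]:
                s[x, y] *= -1
    return s
```

Applied repeatedly to a random $\pm 1$ array at $\beta \approx 0.44$ (near the 2D critical point for $J = 1$), this decorrelates far faster than single-spin flips.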

Behaviors in Different Parameter Regions

In the proposal above we have two tunable parameters tied together by one recovery condition, leaving one free direction. It is valuable to look at how the entire BM behaves in different parameter regions (a numeric check follows the list below).

  • $b \to -\infty$

    In this limit, one can derive that the BM updating strategy exactly recovers the Swendsen-Wang algorithm, since the recovery condition forces

$$W + b \to \ln\!\left( e^{2\beta J} - 1 \right),$$

which gives

$$p(h_\ell = 1 \mid s_i = s_j) = \sigma(W + b) = 1 - e^{-2\beta J}, \qquad p(h_\ell = 1 \mid s_i \neq s_j) = \sigma(-W + b) \to 0,$$

and these are precisely the Swendsen-Wang algorithm's cluster updating rules.

  • $b \to +\infty$

    In this limit every link is activated, $p(h_\ell = 1 \mid \mathbf{s}) \to 1$, and the BM does nothing but pass the Ising interaction coefficient to the parameter, since the recovery condition reduces to

$$e^{2\beta J} = \frac{1 + e^{W + b}}{1 + e^{-W + b}} \;\longrightarrow\; e^{2W}, \qquad \text{i.e. } W = \beta J.$$
  • $b \in (-\infty, \infty)$

    For generic finite $b$, the corresponding algorithm is Niedermayer's cluster algorithm, which also permits links between misaligned spins and is well known as a generalized cluster updating method.
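The numeric check promised above: solving the recovery condition for $W$ at fixed $b$ (the quadratic in $e^W$ is my own rearrangement) and printing the resulting bond probabilities shows the interpolation between the two limits:

```python
import math

def solve_W(beta, J, b):
    """Solve e^{2 beta J} = (1 + e^{W+b}) / (1 + e^{-W+b}) for W.
    With x = e^W this is the quadratic e^b x^2 + (1 - r) x - r e^b = 0, r = e^{2 beta J}."""
    r, eb = math.exp(2.0 * beta * J), math.exp(b)
    x = ((r - 1.0) + math.sqrt((r - 1.0) ** 2 + 4.0 * r * eb * eb)) / (2.0 * eb)
    return math.log(x)

sigma = lambda z: 1.0 / (1.0 + math.exp(-z))
beta, J = 0.44, 1.0
for b in (-10.0, 0.0, 10.0):
    W = solve_W(beta, J, b)
    print(f"b={b:+5.0f}  W={W:6.3f}  p_aligned={sigma(W + b):.3f}  p_misaligned={sigma(-W + b):.3f}")
# b = -10: p_aligned ~ 1 - e^{-2 beta J}, p_misaligned ~ 0   (Swendsen-Wang)
# b = +10: W ~ beta J and both probabilities ~ 1             (all links on)
```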

Ising Model with Plaquette Interactions

After introducing the plaquette interactions, the new Boltzmann weight is

$$p(\mathbf{s}) \propto \exp\!\Big( \beta J \sum_{\ell = \langle i, j \rangle} s_i s_j + \beta K \sum_{\wp} \prod_{i \in \wp} s_i \Big),$$

where $\wp$ represents a plaquette: four neighboring spins which together form an elementary square.

Basic BM Design

The Hubbard-Stratonovich (HS) transformation helps express the plaquette term in another way,

$$e^{\beta K \prod_{i \in \wp} s_i} = \frac{e^{-\beta K}}{2} \sum_{h_\wp = \pm 1} e^{\frac{W}{2} h_\wp \mathcal{F}_\wp(\mathbf{s})},$$

where $W=\operatorname{arccosh}\left(e^{2 \beta K}\right)$ is the coupling strength between the binary HS field $h_\wp$ and $\mathcal{F}_{\wp}(\mathbf{s})=\prod_{i \in \ell_{\wp}} s_{i}+\prod_{i \in \bar{\ell}_{\wp}} s_{i}$, which is defined as the sum over a bipartite division of the plaquette into two links $\ell_\wp$ and $\bar{\ell}_\wp$. The bipartite division of each plaquette can be chosen randomly.
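Since such identities are easy to get wrong, here is a brute-force check over all $2^4$ configurations of one plaquette (the division into the pairs $\{s_1, s_2\}$ and $\{s_3, s_4\}$ is an arbitrary concrete choice of mine):

```python
import math
from itertools import product

beta, K = 0.3, 1.0
W = math.acosh(math.exp(2.0 * beta * K))
C = math.exp(-beta * K) / 2.0  # s-independent normalization

for s1, s2, s3, s4 in product((-1, 1), repeat=4):
    lhs = math.exp(beta * K * s1 * s2 * s3 * s4)
    F = s1 * s2 + s3 * s4  # F_p(s) for the division {s1, s2} | {s3, s4}
    rhs = C * sum(math.exp(0.5 * W * h * F) for h in (-1, 1))
    assert abs(lhs - rhs) < 1e-12
print("HS identity holds for all 16 plaquette configurations")
```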

Now we can use $h_\wp$ as the hidden layer, and the energy model is given by

$$E(\mathbf{s}, \mathbf{h}) = -\beta J \sum_{\ell = \langle i, j \rangle} s_i s_j - \frac{W}{2} \sum_{\wp} h_\wp \mathcal{F}_\wp(\mathbf{s}).$$
Updating Strategy

The calculation of the conditional probability gives

$$p(h_\wp = +1 \mid \mathbf{s}) = \sigma\!\big( W \mathcal{F}_\wp(\mathbf{s}) \big).$$
Meanwhile, since there is no interaction between different hidden vertices, the conditional probability of the hidden layer factorizes,

$$p(\mathbf{h} \mid \mathbf{s}) = \prod_{\wp} p(h_\wp \mid \mathbf{s}).$$

Using this conditional distribution, we can sample the hidden layer from the visible layer, i.e. the input spin configuration.
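A sketch of this sampling step (the all-horizontal bipartite division is a hypothetical concrete choice of mine; `s` is an $L \times L$ array of $\pm 1$):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_plaquette_fields(s, beta, K):
    """Sample h_p = +-1 with p(h_p = +1 | s) = sigma(W * F_p(s)), where the
    plaquette at (x, y) is divided into its top and bottom horizontal links."""
    W = np.arccosh(np.exp(2.0 * beta * K))
    top = s * np.roll(s, -1, axis=1)    # s[x, y] * s[x, y+1]
    F = top + np.roll(top, -1, axis=0)  # plus s[x+1, y] * s[x+1, y+1]
    p_plus = 1.0 / (1.0 + np.exp(-W * F))
    h = np.where(rng.random(s.shape) < p_plus, 1, -1)
    return h, W
```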

Once the hidden layer is given, the BM reduces to an ordinary 2D Ising model, now with link-dependent couplings. This fact enables us to design another auxiliary hidden layer of link variables on top of it.

Note that after using the hidden layer to build the new effective Ising model, the interaction coefficient is no longer the same for all pairs of neighboring spins: a link $\ell$ that belongs to the bipartite division of a plaquette $\wp$ picks up the extra coupling $\frac{W}{2} h_\wp$, so effectively $\beta \tilde{J}_{ij} = \beta J + \frac{W}{2} \sum_{\wp} h_\wp$, the sum running over the plaquettes whose division contains $\ell$. Therefore the coefficient $W$ in the energy model of the BM for the 2D Ising model has to be promoted to link-dependent parameters $W_{ij}$ that express exactly the new Ising model.

With the new auxiliary hidden layer, one can still use the same scheme mentioned before to sample the visible layer, i.e. the spin configuration, back.
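Continuing the sketch, the couplings of the reduced Ising model can be assembled like this (assuming the same hypothetical horizontal division as in `sample_plaquette_fields`):

```python
import numpy as np

def effective_couplings(h, beta, J, W):
    """Each horizontal link gains (W/2) h_p from the two plaquettes whose
    division contains it; vertical links keep the bare coupling beta * J."""
    K_horizontal = beta * J + 0.5 * W * (h + np.roll(h, 1, axis=0))
    K_vertical = np.full(h.shape, beta * J)
    return K_horizontal, K_vertical
```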

General Framework

In general, the BM used for cluster Monte Carlo updates can be written as

$$E(\mathbf{s}, \mathbf{h}) = -\sum_{\alpha} h_\alpha \big( W_\alpha \mathcal{F}_\alpha(\mathbf{s}) + b_\alpha \big),$$

where $\mathcal{F}_{\alpha}(\mathbf{s})$ represents a feature of the physical spin system of interest, i.e. the dual structure in the BM reflecting the physical essence of the visible layer; $\alpha$ can count many kinds of different features, e.g. link indices in the classical 2D Ising model and plaquette indices in the Ising model with plaquette interactions.

After manually selecting the BM for a specific physical system, one can easily sample the hidden layer, since the hidden vertices do not interact with one another and the conditional distribution factorizes.

However, sampling the visible layer back is not symmetric to what we would do for RBMs. To achieve it, one needs to utilize the sampled hidden layer to reduce the whole BM to another classical 2D Ising model with inhomogeneous link couplings $J_\ell$, and then apply cluster MC to this new Ising model to carry out the actual update.
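Schematically, one full update then chains the pieces sketched above; `cluster_update_inhomogeneous` is a hypothetical helper, namely the Swendsen-Wang sweep from earlier with per-link bond probabilities $1 - e^{-2 K_\ell}$:

```python
def bm_cluster_mc_step(s, beta, J, K):
    """One update: visible -> hidden, reduce the BM to an effective Ising
    model, then cluster-update that model to move the visible layer."""
    h, W = sample_plaquette_fields(s, beta, K)        # sample hidden layer
    K_h, K_v = effective_couplings(h, beta, J, W)     # reduce to Ising couplings
    return cluster_update_inhomogeneous(s, K_h, K_v)  # hypothetical cluster sweep
```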

Demonstration

Applying the BM scheme for cluster MC updates described above to the Ising model with plaquette interactions, the researchers found that it is much more efficient than local updating and successfully reproduces the critical Binder ratio.

Conclusions

This work is inspiring in several ways:

  • It shows the surprising power of Boltzmann machines for expressing physical models

  • The extensibility of BMs implies great potential for hierarchical structures that tackle even more complicated physical problems

  • The matrix-multiplication essence of BM sampling brings great efficiency to classical cluster MC and suggests much room for improvement on large systems

  • Rich duality emerges from the BM correspondence with different many-body Hamiltonians

  • There is much more to explore in applying actual machine learning to this interplay between BMs and many-body systems