How are non-restricted Boltzmann machines trained?

Restricted Boltzmann machines are stochastic neural networks. The neurons form a complete bipartite graph of visible units and hidden units. The "restricted" refers exactly to this bipartite property: there may be no connection between any two visible units and no connection between any two hidden units.
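To make the restriction concrete, the energy functions can be written as follows (notation is mine: a, b are biases, W the visible-hidden weights, and L, J the lateral connections that the restriction forbids):

```latex
% Restricted Boltzmann machine: only visible-hidden interactions
E(v, h) = -a^\top v - b^\top h - v^\top W h

% General Boltzmann machine: adds visible-visible (L) and hidden-hidden (J)
% couplings; the RBM is the special case L = J = 0
E(v, h) = -a^\top v - b^\top h - v^\top W h
          - \tfrac{1}{2} v^\top L v - \tfrac{1}{2} h^\top J h
```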

Restricted Boltzmann machines are trained with Contrastive Divergence (CD-k, see A Practical Guide to Training Restricted Boltzmann Machines).
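For concreteness, here is a minimal CD-1 sketch for a binary RBM in NumPy. The function and variable names are mine, and a practical implementation would add mini-batches, momentum, and weight decay as the guide describes:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(W, a, b, v0, lr=0.01, rng=None):
    """One CD-1 step for a binary RBM.

    W: (n_visible, n_hidden) weights; a: visible biases; b: hidden biases;
    v0: (n_visible,) binary training vector.
    """
    if rng is None:
        rng = np.random.default_rng()
    # Positive phase: hidden probabilities given the data.
    h0_prob = sigmoid(b + v0 @ W)
    h0 = (rng.random(h0_prob.shape) < h0_prob).astype(float)
    # One Gibbs step: reconstruct the visibles, then the hidden probabilities.
    v1_prob = sigmoid(a + h0 @ W.T)
    v1 = (rng.random(v1_prob.shape) < v1_prob).astype(float)
    h1_prob = sigmoid(b + v1 @ W)
    # CD update: data statistics minus reconstruction statistics.
    W += lr * (np.outer(v0, h0_prob) - np.outer(v1, h1_prob))
    a += lr * (v0 - v1)
    b += lr * (h0_prob - h1_prob)
    return W, a, b
```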

Now I wonder: How are non-restricted Boltzmann Machines trained?

When I google for "Boltzmann machine", I only find RBMs.



I will apologize in advance for linking to the Wikipedia article, but I'm not sure I can do a better job explaining it. https://en.wikipedia.org/wiki/Boltzmann_machine

(Will extract selected passages to prevent link rot shortly.)

One of the reasons we see so much talk about training RBMs and not much talk about general BMs is, by my recollection of a talk by Hinton, that training them is "intractable, except for very small problems or limited cases." The activation of a hidden node depends not only on the states of the visible nodes, but also on the states of the other hidden nodes. This means our clever contrastive divergence and gradient descent methods don't work!
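To see that dependence concretely, here is a hedged sketch of one sequential Gibbs sweep over a general BM. The convention of stacking visible and hidden units into one state vector with a full symmetric weight matrix is my own framing, not code from the talk:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gibbs_sweep(s, W, b, T=1.0, rng=None):
    """One sequential Gibbs sweep over all units of a general BM.

    s: (n,) binary state over *all* units, visible and hidden alike;
    W: (n, n) symmetric weights with zero diagonal; b: (n,) biases;
    T: temperature (T > 1 flattens the distribution; used for annealing).
    """
    if rng is None:
        rng = np.random.default_rng()
    for i in range(len(s)):
        # The input to unit i sums over every connected unit, including the
        # other hidden units; that coupling breaks the parallel, conditionally
        # independent updates an RBM's bipartite layout allows.
        p_on = sigmoid((W[i] @ s + b[i]) / T)
        s[i] = 1.0 if rng.random() < p_on else 0.0
    return s
```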

But you didn't ask why it's difficult, you asked how to do it.

The short, short answer is simulated annealing.

The longer short answer: we select an initial state for the network, then perform updates until the network reaches thermal equilibrium, then adjust the weights while reducing the temperature until the activations approximate a global minimum. The process repeats until we're happy with the outcome.
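A rough sketch of that loop, reusing `sigmoid()` and `gibbs_sweep()` from above. The temperature schedule and the single-sample statistics are my simplifications; real runs tune the schedule and average co-activation statistics over many annealed chains and data vectors:

```python
def boltzmann_learning_step(W, b, data, n_visible, schedule=(4.0, 2.0, 1.0),
                            sweeps=20, lr=0.01, rng=None):
    """One weight update for a general BM via annealed Gibbs sampling."""
    if rng is None:
        rng = np.random.default_rng()
    n = len(b)
    # Clamped (positive) phase: visibles fixed to a data vector,
    # hidden units annealed toward equilibrium.
    s = rng.integers(0, 2, n).astype(float)
    s[:n_visible] = data
    for T in schedule:
        for _ in range(sweeps):
            for i in range(n_visible, n):  # update hidden units only
                p_on = sigmoid((W[i] @ s + b[i]) / T)
                s[i] = 1.0 if rng.random() < p_on else 0.0
    pos = np.outer(s, s)
    # Free (negative) phase: nothing clamped, anneal the whole network.
    s = rng.integers(0, 2, n).astype(float)
    for T in schedule:
        for _ in range(sweeps):
            s = gibbs_sweep(s, W, b, T=T, rng=rng)
    neg = np.outer(s, s)
    # Hebbian/anti-Hebbian update on the co-activation statistics.
    W += lr * (pos - neg)
    np.fill_diagonal(W, 0.0)
    b += lr * (np.diag(pos) - np.diag(neg))
    return W, b
```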


General BMs were trained in much the same way as RBMs, but training was much slower, because Gibbs sampling is far less efficient for a general BM due to the presence of hidden-hidden and visible-visible connections.

That remained the state of affairs until this paper, in which the authors proposed a new algorithm that also uses a variational mean-field approach to approximate one of the terms in the gradient of the log-likelihood.
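The mean-field idea, as I understand it, can be sketched like this: in the data-dependent (clamped) phase, replace Gibbs sampling of the hidden units with deterministic fixed-point updates on their activation probabilities. A minimal sketch, reusing `sigmoid()` and the stacked-state convention from the answer above:

```python
def mean_field_hidden(W, b, data, n_visible, n_iters=30):
    """Variational mean-field estimate of the data-dependent statistics.

    Instead of sampling the clamped phase, iterate deterministic
    fixed-point updates on the hidden activation probabilities mu.
    """
    n = len(b)
    mu = np.full(n, 0.5)
    mu[:n_visible] = data  # visible units clamped to the data vector
    for _ in range(n_iters):
        for i in range(n_visible, n):
            mu[i] = sigmoid(W[i] @ mu + b[i])
    # np.outer(mu, mu) then plays the role of the positive-phase statistics.
    return mu
```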
