Node values in Boltzmann machines (0/1 vs -1/1). Are they the same?

Boltzmann machines were introduced by Hinton and Sejnowski with nodes taking values in $\{0,1\}$. The Wikipedia entry also uses this convention. However, Hopfield networks, which are the deterministic version of Boltzmann machines, are usually introduced with nodes taking values in $\{-1,1\}$. Ising models also follow this convention.

With the energy function defined identically in both models as $$ E(x) = \sum_i b_ix_i + \sum_{ij}w_{ij}x_ix_j$$ it seems that the two conventions would behave differently. For example, how would the $\{0,1\}$ model learn a preference for checkered patterns, in which neighbouring nodes take opposite values?
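To make the concern concrete, here is a minimal sketch that evaluates the energy above for an alternating and a uniform state under each convention. The 4-node chain, the shared coupling `w = 1.0`, the zero biases, and the helper name `energy` are all assumptions chosen for illustration; the sketch also assumes that the sum over $ij$ counts each pair once.

```python
import numpy as np

def energy(x, b, W):
    """E(x) = sum_i b_i x_i + sum_{i<j} w_ij x_i x_j (each pair counted once, by assumption)."""
    x = np.asarray(x, dtype=float)
    # Keep only the strict upper triangle so that every pair (i, j) appears once.
    pair_term = np.sum(np.triu(W, k=1) * np.outer(x, x))
    return float(b @ x + pair_term)

# Hypothetical 4-node chain 0-1-2-3 with one shared coupling and no biases.
w = 1.0
b = np.zeros(4)
W = np.zeros((4, 4))
for i in range(3):
    W[i, i + 1] = W[i + 1, i] = w

# Alternating ("checkered") and uniform states in each convention.
checkered_01, uniform_01 = [0, 1, 0, 1], [1, 1, 1, 1]
checkered_pm, uniform_pm = [-1, 1, -1, 1], [1, 1, 1, 1]

print("{0,1}:  checkered", energy(checkered_01, b, W), " uniform", energy(uniform_01, b, W))
print("{-1,1}: checkered", energy(checkered_pm, b, W), " uniform", energy(uniform_pm, b, W))
```

With these particular parameters, the alternating state contributes nothing to the pairwise term in the $\{0,1\}$ convention (every product $x_ix_j$ is zero), whereas in the $\{-1,1\}$ convention every pair contributes $-w$.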


More generally, in an undirected graphical model where the nodes take values in $\{a,b\}$, we can define the interaction (energy) between two neighbouring nodes $x$ and $y$ as $$E(x,y) = \begin{cases} w_{aa}, & \text{if}\ x=a,\ y=a \\ w_{ab}, & \text{if}\ x=a,\ y=b \\ w_{ba}, & \text{if}\ x=b,\ y=a \\ w_{bb}, & \text{if}\ x=b,\ y=b \end{cases}$$ or, equivalently, represent it by the matrix $ \begin{bmatrix} w_{aa} & w_{ab} \\ w_{ba} & w_{bb} \end{bmatrix} $.

Boltzmann machines restrict the interaction between two neighbouring nodes to the form $w_{ij}x_ix_j$, described by a single scalar $w_{ij}$. When the values are ordered as $(a,b) = (1,0)$, we get $$ E(x_i,x_j) = \begin{bmatrix} w_{ij} & 0 \\ 0 & 0 \end{bmatrix} $$ If we instead set $(a,b) = (1,-1)$, we get $$ E(x_i,x_j) = \begin{bmatrix} w_{ij} & -w_{ij} \\ -w_{ij} & w_{ij} \end{bmatrix} $$
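The two tables above can be reproduced numerically. The following is a small sketch, assuming the pairwise term is exactly $w_{ij}x_ix_j$ and that rows and columns are ordered as $(1,0)$ and $(1,-1)$ respectively; the helper name `pairwise_table` and the value `w_ij = 0.5` are hypothetical, chosen only for illustration.

```python
import numpy as np

def pairwise_table(values, w):
    """Return the 2x2 table T[k, l] = w * values[k] * values[l]."""
    v = np.asarray(values, dtype=float)
    return w * np.outer(v, v)

w_ij = 0.5  # hypothetical coupling, for illustration only

table_01 = pairwise_table([1, 0], w_ij)   # rows/columns ordered (a, b) = (1, 0)
table_pm = pairwise_table([1, -1], w_ij)  # rows/columns ordered (a, b) = (1, -1)

print(table_01)
print(table_pm)
```

Printing both tables side by side makes it easy to see which entries a single scalar $w_{ij}$ controls in each convention.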

Are these two formalisms really equivalent? It seems unlikely...

