The Boltzmann Machine: Necker Cube Example

A tutorial and Java demonstration by Simon Dennis

In the early 1800s, L. A. Necker noticed while examining crystals that three-dimensional objects can fluctuate in appearance.

The Necker cube provides a useful paradigm for demonstrating the properties of constraint satisfaction networks (Feldman, 1981) such as the Boltzmann machine (McClelland & Rumelhart, 1988). In the following simulation, two sets of eight units represent the alternate interpretations of the cube. For example, the top rightmost unit represents an upper right back vertex in the top interpretation and an upper right front vertex in the lower interpretation. The same vertex cannot be both a front and a back vertex, so these units represent conflicting interpretations and are joined by a negative weight in the network. To see all of the weights in the network, select the "weights visible" checkbox. The red lines represent negative weights and the blue lines positive weights. There are three sorts of constraints implemented in these connections (a Java sketch of this wiring follows Figure 1):

  1. each vertex can have only one interpretation - so there are negative weights connecting units that represent different interpretations of the same vertex
  2. the same interpretation cannot be given to more than one vertex - so units that assign the same three-dimensional label to different vertices are connected with negative weights
  3. units belonging to the same overall interpretation of the cube should be on together - so locally consistent units are connected with positive weights

Figure 1: The Necker cube example of the Boltzmann machine. [source].
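To make the wiring concrete, here is a minimal Java sketch of how such a network could be set up. It is written in the spirit of the demonstration rather than taken from the applet's source: the NEIGHBOURS adjacency table, the SAME_LABEL mapping between pools, and the weight magnitudes are all illustrative assumptions.

    class NeckerNet {
        static final int N = 16;                  // 2 interpretations x 8 vertices
        final double[][] w = new double[N][N];    // symmetric weight matrix
        final int[] a = new int[N];               // activations: 1 = black, 0 = white

        // Three cube edges meet at each vertex (illustrative adjacency).
        static final int[][] NEIGHBOURS = {
            {1, 3, 4}, {0, 2, 5}, {1, 3, 6}, {0, 2, 7},
            {0, 5, 7}, {1, 4, 6}, {2, 5, 7}, {3, 4, 6}
        };
        // For vertex v of the top interpretation, the vertex of the bottom
        // interpretation that would receive the same 3-D label (illustrative).
        static final int[] SAME_LABEL = {4, 5, 6, 7, 0, 1, 2, 3};

        NeckerNet(double excite, double inhibit) {
            for (int v = 0; v < 8; v++) {
                // Constraint 1: only one interpretation per vertex (across pools).
                setWeight(v, 8 + v, -inhibit);
                // Constraint 2: only one vertex per 3-D label (across pools).
                setWeight(v, 8 + SAME_LABEL[v], -inhibit);
                // Constraint 3: locally consistent units support each other
                // (within a pool).
                for (int u : NEIGHBOURS[v]) {
                    setWeight(v, u, excite);
                    setWeight(8 + v, 8 + u, excite);
                }
            }
        }

        void setWeight(int i, int j, double value) {
            w[i][j] = value;    // Boltzmann machine weights are symmetric,
            w[j][i] = value;    // so each connection is stored both ways
        }
    }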

On each iteration of the algorithm, a unit is selected. The weights of its incoming connections are multiplied by the corresponding activations (black = 1, white = 0) and summed. If many active units have positive connections to this unit and few active units have negative connections to it, the sum will be large. Conversely, if many of the already-active units have negative connections to this unit, the sum will be negative. The activation of the unit is then set to 1 with probability p, where p is the sigmoid function of the summed weighted activation: p = 1/(1 + e^(-net)). The sigmoid function squashes the input so that it falls between 0 and 1, ensuring it can be interpreted as a probability. A unit with a large summed activation will therefore have a high probability of being set to 1.
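Continuing the sketch above, one iteration might look as follows in Java. The uniform random choice of which unit to update is an assumption; the applet may visit units in some other order, and it may also apply a temperature parameter to the sigmoid, which is omitted here.

    // Add inside the NeckerNet class sketched above.
    void updateOneUnit() {
        int i = (int) (Math.random() * N);         // select a unit at random
        double net = 0.0;
        for (int j = 0; j < N; j++)
            net += w[i][j] * a[j];                 // weighted sum of active inputs
        double p = 1.0 / (1.0 + Math.exp(-net));   // sigmoid squashes net to (0, 1)
        a[i] = (Math.random() < p) ? 1 : 0;        // switch on with probability p
    }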

As the network is run through multiple iterations it tends to settle into one of the two interpretations. The goodness value is a measure of how well the network is satisfying all of the constraints and should increase as the network converges. Try this now. Hit the "Reset" button and then click several times on the "Do 20 Iterations" button. The network should gradually activate all of the units of one interpretation and deactivate the units of the other. To try it again, hit "Reset" and start over.
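One common way to compute such a goodness score, and the one sketched below, is to sum w_ij * a_i * a_j over every pair of units, so that satisfied positive constraints raise the score and violated negative ones lower it. The applet's actual measure may also include bias or external-input terms, and the weight magnitudes passed to the constructor here are illustrative.

    // Also inside the NeckerNet class.
    double goodness() {
        double g = 0.0;
        for (int i = 0; i < N; i++)
            for (int j = i + 1; j < N; j++)        // count each pair once
                g += w[i][j] * a[i] * a[j];
        return g;
    }

    public static void main(String[] args) {
        NeckerNet net = new NeckerNet(1.0, 1.5);   // illustrative weight sizes
        for (int k = 0; k < 20; k++)               // one press of "Do 20 Iterations"
            net.updateOneUnit();
        System.out.println("goodness = " + net.goodness());
    }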

Questions:

  1. What factors determine which of the interpretations the network settles into?
  2. On some runs, the network seems to stay in a non-optimal state for some time before eventually converging to one of the interpretations. Which states is it likely to "get stuck" in, and why? (Hint: watch the goodness value carefully when the network becomes "unstuck".)

References

Feldman, J. A. (1981). A connectionist model of visual memory. In G. E. Hinton & J. A. Anderson (Eds.), Parallel models of associative memory (pp. 49-81). Hillsdale, NJ: Erlbaum.

McClelland, J. L., & Rumelhart, D. E. (1988). Explorations in parallel distributed processing: A handbook of models, programs, and exercises. Cambridge, MA: MIT Press.

Sekuler, R., & Blake, R. Perception. New York: Knopf.