Understanding Hopfield

Posted on 12/10/2024

In this first post I will go through Hopfield's famous paper 'Neural networks and physical systems with emergent collective computational abilities', published in 1982.

Introduction

One of the main points that Hopfield makes in this paper is that simple interactions between a large number of elements can lead to very complex collective behavior. He mentions physical examples such as the stable magnetic orientations and domains in a magnetic system, or the vortex patterns in fluid flow. Other examples we might be more familiar with are ant colony behavior or Conway's Game of Life.

Figure: Working principle of the convolutional operator.

In the paper, he wonders if a similar mechanism —the simple interaction of neurons— could be the origin of cognitive processes such as the stability of memories, creation of categories and generalization, and temporal memory.

He acknowledges that detailed modeling of neuron anatomy is not possible, both because of its complexity and because of our incomplete understanding. He hopes that a model keeping only the minimal properties of neurons will still exhibit the complex collective behavior.

The model

The objective of developing an associative memory is achieved by creating a dynamical system in which the stored patterns are equilibrium points, or dynamical attractors, in state space. In other words, if the neurons are initialized at a random state, the natural dynamics of the system will lead them to one of the stored patterns.

The model that Hopfield presents is based on the McCulloch and Pitts neuron model, where each neuron can be in two states: firing ($V_i=1$) or not firing ($V_i=-1$). Note that in the original paper Hopfield uses 0 and 1 for the neuron states; however, using -1 and 1 will simplify the notation later.

The network consists of a set of densely interconnected neurons $\{1, ..., N\}$, and the strength of the connection from neuron $i$ to neuron $j$ is parameterized by the so-called synaptic weight matrix $T_{ij}$, also commonly named $w_{ij}$. The state of the system is the binary vector formed by the states of all neurons, $\{V_1, ..., V_N\}$. The values of the neurons are updated asynchronously, normally by random picking, according to this rule:

$$V_i^{t+1} = \begin{cases} 1 & \text{if } \sum_{j \neq i} T_{ij} V_j^t > U_i \\ -1 & \text{if } \sum_{j \neq i} T_{ij} V_j^t < U_i \end{cases}$$

where $\sum_{j \neq i} T_{ij} V_j$ is the sum of the input signals arriving at neuron $i$, $U_i$ is its threshold, and $j \neq i$ since neurons cannot have self-connections.
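To make the rule concrete, here is a minimal NumPy sketch of one asynchronous update step. It assumes the -1/+1 convention of this post and zero thresholds by default; the function name `async_update` and its arguments are mine, not from the paper.

```python
import numpy as np

def async_update(V, T, U=None, rng=None):
    """One asynchronous step: pick a random neuron and apply the threshold rule."""
    if rng is None:
        rng = np.random.default_rng()
    if U is None:
        U = np.zeros(len(V))          # assume zero thresholds
    i = rng.integers(len(V))          # random picking
    h = T[i] @ V                      # sum of inputs; with T_ii = 0 this skips j = i
    if h > U[i]:
        V[i] = 1
    elif h < U[i]:
        V[i] = -1                     # -1 instead of 0, per the convention of this post
    return V
```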

Hopfield also discusses the difference between perceptrons, in which connections are only feed-forward and updates are synchronous, and his model, in which there is strong back-coupling and updates are asynchronous.

Assuming we want the network to store $P$ patterns $\xi^1, ..., \xi^P$, the synaptic weight matrix $T_{ij}$ is built using the Hebbian learning rule:

$$\begin{cases} T_{ij} = \sum_{\mu=1}^{P} \xi_i^\mu \xi_j^\mu \\ T_{ii} = 0 \end{cases}$$
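As a sketch, the weight matrix can be built as a sum of outer products (my own minimal NumPy version, not Hopfield's code):

```python
import numpy as np

def hebbian_weights(patterns):
    """Hebbian rule: T is the sum of the outer products of the stored patterns.

    patterns: array of shape (P, N) with entries in {-1, +1}.
    """
    T = sum(np.outer(xi, xi) for xi in patterns).astype(float)
    np.fill_diagonal(T, 0)   # no self-connections: T_ii = 0
    return T                 # (the proofs below normalize by 1/(N-1); the sign is unchanged)
```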

The Hebbian learning rule is named after Hebb's 1949 book The Organization of Behavior, in which he states:

When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A’s efficiency, as one of the cells firing B, is increased.

which is normally phrased as

Neurons that fire together, wire together. Neurons that fire out of sync, fail to link.

Proof of convergence (single pattern)

We could ask: why does this network, with the Hebbian rule defining the synaptic weight matrix, create a dynamical system that has the learned patterns as fixed, or equilibrium, points?

First of all, recall that for a difference equation, a fixed point $x^*$ is by definition a point such that $x^*=f(x^*)$. For a Hopfield network storing one pattern, the weight matrix is given by $w_{ij}=\xi_i \xi_j$ and the dynamics are

$$V_i^{t+1} = \text{sgn}\left(\sum_j w_{ij} V_j^{t}\right) = \text{sgn}\left(\xi_i \sum_j \xi_j V_j^t\right)$$

Note that when the values $V_j$ are equal to those of the stored pattern, $V_j = \xi_j$, the summation term equals $N$:

$$V_i^{t+1} = \text{sgn}(N\xi_i) = \xi_i$$

since $N>0$. This means that $\xi$ is a fixed point of the system, since $V_i^{t+1} = \xi_i$ for all $i$. Similarly, when the values $V_j$ are the opposite of the stored pattern, $V_j = -\xi_j$, the summation term equals $-N$, so that

$$V_i^{t+1} = \text{sgn}(-N\xi_i) = -\xi_i$$

which means that $-\xi$ is also a fixed point, since $V_i^{t+1} = -\xi_i$ for all $i$. This proves that for a stored pattern $\xi$, both the pattern and its opposite are fixed points of the system. Moreover, when the term $\sum_j \xi_j V_j > 0$, updates will cause the term to increase, while if it is $<0$ updates will cause it to decrease. This means that the fixed points are attractors of the system, and that if $\xi$ and $V$ are orthogonal, then $\sum_j \xi_j V_j = 0$ and $V$ will be an unstable fixed point of the system.
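A quick numerical sanity check of this argument, as a sketch: a random pattern, its single-pattern Hebbian weights, and one full update of every neuron.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 50
xi = rng.choice([-1, 1], size=N)      # one random stored pattern

T = np.outer(xi, xi).astype(float)    # single-pattern Hebbian weights
np.fill_diagonal(T, 0)

for V in (xi, -xi):                   # the pattern and its opposite
    V_next = np.sign(T @ V)           # update every neuron once
    assert np.array_equal(V_next, V)  # both are fixed points
```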

Proof of convergence (multiple patterns)

When multiple patterns are stored, the synaptic weight matrix is the superposition of the individual weight matrices. We can check whether a pattern is a fixed point of the dynamical system by expanding the expression used to update a neuron (here with the weights normalized by $1/(N-1)$, a constant that does not change the sign of the update):

$$\sum_{j \neq i} w_{ij} V_j = \frac{1}{N-1} \sum_{j \neq i} \left( V_j \sum_\mu \xi_j^\mu \xi_i^\mu \right) = \frac{1}{N-1} \sum_{j \neq i} \sum_\mu V_j \xi_j^\mu \xi_i^\mu$$

If we substitute the state of the network by a stored pattern, $V = \xi^\upsilon$, then

$$\sum_{j \neq i} \sum_\mu V_j \xi_j^\mu \xi_i^\mu = \sum_{j \neq i} \sum_\mu \xi_j^\upsilon \xi_j^\mu \xi_i^\mu$$

When adding the contribution of each pattern ($\sum_\mu$), the term where the pattern index $\mu$ matches the selected pattern $\upsilon$ reduces to

$$\xi_j^\upsilon \xi_j^\upsilon \xi_i^\upsilon = \xi_i^\upsilon$$

because $\xi_j^\upsilon \xi_j^\upsilon$ is always equal to $1$. We can then extract these terms from the summation, rewrite the previous equation, and factor out $\xi_i^\upsilon$ (by multiplying rather than dividing, since again $(\xi_i^\upsilon)^2=1$):

$$\frac{1}{N-1} \sum_{j \neq i} \left( \xi_i^\upsilon + \sum_{\mu \neq \upsilon} \xi_j^\upsilon \xi_j^\mu \xi_i^\mu \right) = \frac{1}{N-1} \xi_i^\upsilon \left( \sum_{j \neq i} 1 + \sum_{j \neq i} \sum_{\mu \neq \upsilon} \xi_j^\upsilon \xi_i^\upsilon \xi_j^\mu \xi_i^\mu \right)$$

Since $\sum_{j \neq i} 1 = N-1$, we finally obtain

$$\xi_i^\upsilon \left( 1 + \frac{1}{N-1} \sum_{j \neq i} \sum_{\mu \neq \upsilon} \xi_j^\upsilon \xi_i^\upsilon \xi_j^\mu \xi_i^\mu \right)$$

If the term in parentheses is $>0$, the update reproduces $\xi_i^\upsilon$, which means the pattern is a fixed point; in other words, if the second term is $>-1$, the pattern $\upsilon$ is an equilibrium point.
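That second term is often called the crosstalk term. Here is a sketch of how one might compute it for a given neuron $i$ and pattern $\upsilon$; the function name `crosstalk` is mine, and the loops deliberately mirror the sums in the derivation above rather than being vectorized.

```python
import numpy as np

def crosstalk(patterns, v, i):
    """Second term in the parenthesis above: the interference that the
    other patterns exert on neuron i of pattern v."""
    P, N = patterns.shape
    c = 0.0
    for mu in range(P):
        if mu == v:                   # skip the selected pattern
            continue
        for j in range(N):
            if j == i:                # skip the self-connection
                continue
            c += patterns[v, j] * patterns[v, i] * patterns[mu, j] * patterns[mu, i]
    return c / (N - 1)

# Neuron i of pattern v is stable when 1 + crosstalk(patterns, v, i) > 0.
```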

Energy equation

Hopfield states that, under the defined dynamics and the assumption that the synaptic weight matrix is symmetric, $W_{ij}=W_{ji}$ (in other words, that the weight is the same in both directions of a connection), the following energy function is monotonically decreasing, which means that state changes will lead towards a (at least local) minimum of the energy function $E$.

$$E = -\frac{1}{2} \sum_i \sum_j W_{ij} V_i V_j$$

In order to prove this, let's compute the energy difference after an asynchronous state update. Let's assume that $E^{t+1}$ is the energy after updating $V_i^t$ to $V_i^{t+1}$. I'll work out a simple 3-neuron example, whose energy would be given by:

$$E = -\frac{1}{2} (W_{11}V_1V_1 + W_{12}V_1V_2 + W_{13}V_1V_3 + W_{21}V_2V_1 + W_{22}V_2V_2 + W_{23}V_2V_3 + W_{31}V_3V_1 + W_{32}V_3V_2 + W_{33}V_3V_3)$$

If $V_1$ changes to $V_1'$ and we assume that $W$ is symmetric, $W_{ij}=W_{ji}$, then the energy difference will be

$$\Delta E = -\frac{1}{2}\left(2(W_{12}V_1'V_2 + W_{13}V_1'V_3) - 2(W_{12}V_1V_2 + W_{13}V_1V_3)\right) = -(V_1' - V_1)(W_{12}V_2 + W_{13}V_3)$$

The diagonal terms $W_{ii}$ disappear because $V_i^2$ is always $1$, and the off-diagonal terms that involve the modified neuron appear twice due to the symmetry we assumed. In general, we can write the energy difference as

$$\Delta E = -(V_i^{t+1}-V_i^{t})\sum_{j \neq i} W_{ij} V_j$$

Now there are two cases to study. If $V_i^{t+1}=V_i^{t}$, the energy does not change. However, if $V_i$ changes, it can only mean that $V_i^{t+1}=-V_i^{t}$, since there are only two states, so $V_i^{t+1}-V_i^{t}=2V_i^{t+1}$ and

$$\Delta E = -2V_i^{t+1}\sum_{j \neq i} W_{ij} V_j$$

Since the summation is exactly the term used to update the neuron value, if it is positive then $V_i^{t+1}$ will also be positive, and if it is negative then $V_i^{t+1}$ will also be negative. Therefore $V_i^{t+1}\sum_{j \neq i} W_{ij} V_j>0$, which means that $\Delta E<0$ whenever $V_i^{t+1}\neq V_i^{t}$.

This proves that the energy function is monotonically decreasing, which means the dynamics of the system can be seen as a descent on the energy landscape.
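To see this descent in practice, here is a sketch that runs asynchronous updates and asserts that the energy never increases; it reuses the hypothetical `hebbian_weights` and `async_update` helpers from the earlier sketches.

```python
import numpy as np

def energy(V, T):
    """E = -1/2 sum_ij T_ij V_i V_j."""
    return -0.5 * V @ T @ V

rng = np.random.default_rng(1)
patterns = rng.choice([-1, 1], size=(3, 64))   # three random 64-neuron patterns
T = hebbian_weights(patterns)                  # sketch from the Hebbian rule section
V = rng.choice([-1, 1], size=64)               # random initial state

E_prev = energy(V, T)
for _ in range(500):
    V = async_update(V, T, rng=rng)            # sketch from the update rule section
    E = energy(V, T)
    assert E <= E_prev                         # the energy never increases
    E_prev = E
```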

Let’s play!

And finally, let's see how the Hopfield network performs. I have created an interactive component which allows you to create a network with $n^2$ neurons that represent a black ($-1$) and white ($+1$) square ($n \times n$) image. Feel free to modify this parameter to run larger or smaller networks.

Click the + button to add your own images, which will be translated into $n \times n$ black and white images that you will be able to preview. These images will be the memories, or patterns, that define the synaptic weight matrix of the network.
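For reference, the image preprocessing can be approximated offline with something like this sketch (using Pillow; the downscale-and-threshold strategy is my guess, not the component's actual code):

```python
import numpy as np
from PIL import Image

def image_to_pattern(path, n):
    """Downscale to n x n, convert to grayscale, and threshold to a flat {-1, +1} vector."""
    img = Image.open(path).convert("L").resize((n, n))
    arr = np.asarray(img, dtype=float)
    return np.where(arr > arr.mean(), 1, -1).ravel()  # white -> +1, black -> -1
```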

The network will be initialized in a random state that can be modified by pressing Shuffle. Once you are ready, press Try and the network will run for a predefined number of iterations that you can modify as well.

See how fast the network converges to one of the memorized patterns? Did you obtain a solution that is the opposite of an original pattern? Is the network stuck in a cyclic state?

Have fun interacting with the Hopfield network and let me know what you discover!

