Understanding Hopfield

Posted on 12/10/2024

In this first post I will go through Hopfield's famous paper 'Neural networks and physical systems with emergent collective computational abilities', published in 1982.

Introduction

One of the main points that Hopfield makes in this paper is that simple interactions between a large number of elements can lead to very complex collective behavior. He mentions physical examples such as the stable magnetic orientations and domains in a magnetic system, or the vortex patterns in fluid flow. Other examples we might be more familiar with are ant colony behavior or Conway's Game of Life.

Figure: Working principle of the convolutional operator.

In the paper, he wonders if a similar mechanism —the simple interaction of neurons— could be the origin of cognitive processes such as the stability of memories, creation of categories and generalization, and temporal memory.

He acknowledges that detailed modeling of neuron anatomy is not possible, both because of its complexity and because of our incomplete understanding. He hopes that a model keeping only the minimal properties of neurons will still exhibit the complex collective behavior.

The model

The objective of developing an associative memory is achieved by creating a dynamical system in which the stored patterns are equilibrium points, or dynamical attractors, in state space. In other words, if the neurons are initialized at a random state, the natural dynamics of the system will lead them to one of the stored patterns.

The model that Hopfield presents is based on the McCulloch and Pitts neuron model, where each neuron can be in two states: firing ($V_i=1$) or not firing ($V_i=-1$). Note that in the original paper Hopfield uses 0 and 1 for the neuron states; however, using -1 and 1 will simplify the notation later.

The network consists of a set of densely interconnected neurons $\{1, ..., N\}$, and the strength of the connection from neuron $i$ to neuron $j$ is parameterized by the so-called synaptic weight matrix $T_{ij}$, also commonly named $w_{ij}$. The state of the system is the binary vector formed by the states of all neurons, $\{V_1, ..., V_N\}$. The values of the neurons are updated asynchronously, normally by random picking, according to this rule:

$$V_i^{t+1} = \begin{cases} 1 & \text{if } \sum_{j \neq i} T_{ij} V_j^t > U_i \\ -1 & \text{if } \sum_{j \neq i} T_{ij} V_j^t < U_i \end{cases}$$

where $\sum_{j \neq i} T_{ij} V_j$ is the sum of the input signals arriving at neuron $i$, $U_i$ is its threshold, and $j \neq i$ since neurons cannot have self-connections.
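To make the rule concrete, here is a minimal NumPy sketch of one asynchronous update step. It assumes the -1/+1 convention of this post and zero thresholds by default; the function name `async_update` and its arguments are mine, not from the paper.

```python
import numpy as np

def async_update(V, T, U=None, rng=None):
    """One asynchronous step: pick a random neuron and apply the threshold rule."""
    if rng is None:
        rng = np.random.default_rng()
    if U is None:
        U = np.zeros(len(V))          # assume zero thresholds
    i = rng.integers(len(V))          # random picking
    h = T[i] @ V                      # sum of inputs; with T_ii = 0 this skips j = i
    if h > U[i]:
        V[i] = 1
    elif h < U[i]:
        V[i] = -1                     # -1 instead of 0, per the convention of this post
    return V
```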

Hopfield also discusses the difference between perceptrons, in which connections are only feed-forward and updates are synchronous, and his model, in which there is strong back-coupling and updates are asynchronous.

Assuming we want the network to store $P$ patterns $\xi^1, ..., \xi^P$, the synaptic weight matrix $T_{ij}$ is built using the Hebbian learning rule:

$$\begin{cases} T_{ij} = \sum_{\mu=1}^{P} \xi_i^\mu \xi_j^\mu \\ T_{ii} = 0 \end{cases}$$
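As a sketch, the weight matrix can be built as a sum of outer products (my own minimal NumPy version, not Hopfield's code):

```python
import numpy as np

def hebbian_weights(patterns):
    """Hebbian rule: T is the sum of the outer products of the stored patterns.

    patterns: array of shape (P, N) with entries in {-1, +1}.
    """
    T = sum(np.outer(xi, xi) for xi in patterns).astype(float)
    np.fill_diagonal(T, 0)   # no self-connections: T_ii = 0
    return T                 # (the proofs below normalize by 1/(N-1); the sign is unchanged)
```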

The Hebbian learning rule is named after Hebb's 1949 book The Organization of Behavior, in which he states:

When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A’s efficiency, as one of the cells firing B, is increased.

which is normally phrased as

Neurons that fire together, wire together. Neurons that fire out of sync, fail to link.

Proof of convergence (single pattern)

We could ask: why does this network, with the Hebbian rule defining the synaptic weight matrix, create a dynamical system that has the learned patterns as fixed, or equilibrium, points?

First of all, recall that for a difference equation, a fixed point $x^*$ is by definition a point such that $x^*=f(x^*)$. For a Hopfield network storing one pattern, the weight matrix is given by $w_{ij}=\xi_i \xi_j$ and the dynamics are

$$V_i^{t+1} = \text{sgn}\left(\sum_j w_{ij} V_j^{t}\right) = \text{sgn}\left(\xi_i \sum_j \xi_j V_j^t\right)$$

Note that when the values $V_j$ are equal to those of the stored pattern, $V_j = \xi_j$, the summation term equals $N$:

$$V_i^{t+1} = \text{sgn}(N\xi_i) = \xi_i$$

since $N>0$. This means that $\xi$ is a fixed point of the system, since $V_i^{t+1} = \xi_i$ for all $i$. Similarly, when the values $V_j$ are the opposite of the stored pattern, $V_j = -\xi_j$, the summation term equals $-N$, so that

$$V_i^{t+1} = \text{sgn}(-N\xi_i) = -\xi_i$$

which means that $-\xi$ is also a fixed point, since $V_i^{t+1} = -\xi_i$ for all $i$. This proves that for a stored pattern $\xi$, both the pattern and its opposite are fixed points of the system. Moreover, when the term $\sum_j \xi_j V_j > 0$, updates will cause the term to increase, while if it is $<0$ updates will cause it to decrease. This means that the fixed points are attractors of the system, and that if $\xi$ and $V$ are orthogonal, then $\sum_j \xi_j V_j = 0$ and $V$ will be an unstable fixed point of the system.
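A quick numerical sanity check of this argument, as a sketch: a random pattern, its single-pattern Hebbian weights, and one full update of every neuron.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 50
xi = rng.choice([-1, 1], size=N)      # one random stored pattern

T = np.outer(xi, xi).astype(float)    # single-pattern Hebbian weights
np.fill_diagonal(T, 0)

for V in (xi, -xi):                   # the pattern and its opposite
    V_next = np.sign(T @ V)           # update every neuron once
    assert np.array_equal(V_next, V)  # both are fixed points
```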

Proof of convergence (multiple patterns)

When multiple patterns are stored, the synaptic weight matrix is the superposition of the individual weight matrices. We can check whether a pattern is a fixed point of the dynamical system by expanding the expression used to update a neuron (here with the weights normalized by $1/(N-1)$, a constant that does not change the sign of the update):

$$\sum_{j \neq i} w_{ij} V_j = \frac{1}{N-1} \sum_{j \neq i} \left( V_j \sum_\mu \xi_j^\mu \xi_i^\mu \right) = \frac{1}{N-1} \sum_{j \neq i} \sum_\mu V_j \xi_j^\mu \xi_i^\mu$$

If we substitute the state of the network by a stored pattern, $V = \xi^\upsilon$, then

$$\sum_{j \neq i} \sum_\mu V_j \xi_j^\mu \xi_i^\mu = \sum_{j \neq i} \sum_\mu \xi_j^\upsilon \xi_j^\mu \xi_i^\mu$$

When adding the contribution of each pattern ($\sum_\mu$), the term where the pattern index $\mu$ matches the selected pattern $\upsilon$ reduces to

$$\xi_j^\upsilon \xi_j^\upsilon \xi_i^\upsilon = \xi_i^\upsilon$$

because $\xi_j^\upsilon \xi_j^\upsilon$ is always equal to $1$. We can then extract these terms from the summation, rewrite the previous equation, and factor out $\xi_i^\upsilon$ (by multiplying rather than dividing, since again $(\xi_i^\upsilon)^2=1$):

$$\frac{1}{N-1} \sum_{j \neq i} \left( \xi_i^\upsilon + \sum_{\mu \neq \upsilon} \xi_j^\upsilon \xi_j^\mu \xi_i^\mu \right) = \frac{1}{N-1} \xi_i^\upsilon \left( \sum_{j \neq i} 1 + \sum_{j \neq i} \sum_{\mu \neq \upsilon} \xi_j^\upsilon \xi_i^\upsilon \xi_j^\mu \xi_i^\mu \right)$$

Since $\sum_{j \neq i} 1 = N-1$, we finally obtain

$$\xi_i^\upsilon \left( 1 + \frac{1}{N-1} \sum_{j \neq i} \sum_{\mu \neq \upsilon} \xi_j^\upsilon \xi_i^\upsilon \xi_j^\mu \xi_i^\mu \right)$$

If the term in parentheses is $>0$, the update reproduces $\xi_i^\upsilon$, which means the pattern is a fixed point; in other words, if the second term is $>-1$, the pattern $\upsilon$ is an equilibrium point.
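That second term is often called the crosstalk term. Here is a sketch of how one might compute it for a given neuron $i$ and pattern $\upsilon$; the function name `crosstalk` is mine, and the loops deliberately mirror the sums in the derivation above rather than being vectorized.

```python
import numpy as np

def crosstalk(patterns, v, i):
    """Second term in the parenthesis above: the interference that the
    other patterns exert on neuron i of pattern v."""
    P, N = patterns.shape
    c = 0.0
    for mu in range(P):
        if mu == v:                   # skip the selected pattern
            continue
        for j in range(N):
            if j == i:                # skip the self-connection
                continue
            c += patterns[v, j] * patterns[v, i] * patterns[mu, j] * patterns[mu, i]
    return c / (N - 1)

# Neuron i of pattern v is stable when 1 + crosstalk(patterns, v, i) > 0.
```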

Energy equation

Hopfield states that, under the defined dynamics and the assumption that the synaptic weight matrix is symmetric, $W_{ij}=W_{ji}$ (in other words, that the weight is the same in both directions of a connection), the following energy function is monotonically decreasing, which means that state changes will lead towards a (at least local) minimum of the energy function $E$.

$$E = -\frac{1}{2} \sum_i \sum_j W_{ij} V_i V_j$$

In order to prove this, let's compute the energy difference after an asynchronous state update. Let's assume that $E^{t+1}$ is the energy after updating $V_i^t$ to $V_i^{t+1}$. I'll work out a simple 3-neuron example, whose energy would be given by:

$$E = -\frac{1}{2} (W_{11}V_1V_1 + W_{12}V_1V_2 + W_{13}V_1V_3 + W_{21}V_2V_1 + W_{22}V_2V_2 + W_{23}V_2V_3 + W_{31}V_3V_1 + W_{32}V_3V_2 + W_{33}V_3V_3)$$

If $V_1$ changes to $V_1'$ and we assume that $W$ is symmetric, $W_{ij}=W_{ji}$, then the energy difference will be

$$\Delta E = -\frac{1}{2}\left(2(W_{12}V_1'V_2 + W_{13}V_1'V_3) - 2(W_{12}V_1V_2 + W_{13}V_1V_3)\right) = -(V_1' - V_1)(W_{12}V_2 + W_{13}V_3)$$

The diagonal terms $W_{ii}$ disappear because $V_i^2$ is always $1$, and the off-diagonal terms that involve the modified neuron appear twice due to the symmetry we assumed. In general, we can write the energy difference as

$$\Delta E = -(V_i^{t+1}-V_i^{t})\sum_{j \neq i} W_{ij} V_j$$

Now there are two cases to study. If $V_i^{t+1}=V_i^{t}$, the energy does not change. However, if $V_i$ changes, it can only mean that $V_i^{t+1}=-V_i^{t}$, since there are only two states, so $V_i^{t+1}-V_i^{t}=2V_i^{t+1}$ and

$$\Delta E = -2V_i^{t+1}\sum_{j \neq i} W_{ij} V_j$$

Since the summation is exactly the term used to update the neuron value, if it is positive then $V_i^{t+1}$ will also be positive, and if it is negative then $V_i^{t+1}$ will also be negative. Therefore $V_i^{t+1}\sum_{j \neq i} W_{ij} V_j>0$, which means that $\Delta E<0$ whenever $V_i^{t+1}\neq V_i^{t}$.

This proves that the energy function is monotonically decreasing, which means the dynamics of the system can be seen as a descent on the energy landscape.
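To see this descent in practice, here is a sketch that runs asynchronous updates and asserts that the energy never increases; it reuses the hypothetical `hebbian_weights` and `async_update` helpers from the earlier sketches.

```python
import numpy as np

def energy(V, T):
    """E = -1/2 sum_ij T_ij V_i V_j."""
    return -0.5 * V @ T @ V

rng = np.random.default_rng(1)
patterns = rng.choice([-1, 1], size=(3, 64))   # three random 64-neuron patterns
T = hebbian_weights(patterns)                  # sketch from the Hebbian rule section
V = rng.choice([-1, 1], size=64)               # random initial state

E_prev = energy(V, T)
for _ in range(500):
    V = async_update(V, T, rng=rng)            # sketch from the update rule section
    E = energy(V, T)
    assert E <= E_prev                         # the energy never increases
    E_prev = E
```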

Let’s play!

And finally, let's see how the Hopfield network performs. I have created an interactive component which allows you to create a network with $n^2$ neurons that represent a black ($-1$) and white ($+1$) square ($n \times n$) image. Feel free to modify this parameter to run larger or smaller networks.

Click the + button to add your own images, which will be translated into $n \times n$ black and white images that you will be able to preview. These images will be the memories, or patterns, that define the synaptic weight matrix of the network.
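For reference, the image preprocessing can be approximated offline with something like this sketch (using Pillow; the downscale-and-threshold strategy is my guess, not the component's actual code):

```python
import numpy as np
from PIL import Image

def image_to_pattern(path, n):
    """Downscale to n x n, convert to grayscale, and threshold to a flat {-1, +1} vector."""
    img = Image.open(path).convert("L").resize((n, n))
    arr = np.asarray(img, dtype=float)
    return np.where(arr > arr.mean(), 1, -1).ravel()  # white -> +1, black -> -1
```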

The network will be initialized in a random state that can be modified by pressing Shuffle. Once you are ready, press Try and the network will run for a predefined number of iterations that you can modify as well.

See how fast the network converges to one of the memorized patterns? Did you obtain a solution that is the opposite of an original pattern? Is the network stuck in a cyclic state?

Have fun interacting with the Hopfield network and let me know what you discover!

