Hello there! In this last leg of our journey through the land of networks, we will visit the tricky, but fascinating, world of recurrent networks. To understand recurrent networks, we will get help from our favorite frenemies, the eigenvectors and eigenvalues of a matrix. But to negotiate those treacherous eigenwaters, it is best to have your eigen hat on. I have mine on, and I'll give you a couple of seconds to grab yours. Let's begin by asking: what can a linear recurrent network do? Here's the equation for a linear recurrent network. If there are N output neurons, then the output vector v is going to be N by 1, and the feedforward input to these output neurons is given by W times u, which is again an N by 1 vector. We can call this N by 1 vector h, so we don't have to write W times u each time. The feedback to the output neurons is given by M times v, where M is the recurrent connection matrix. What we want to find out is how the output of the network, v(t), behaves for different values of the recurrent connection matrix M. This is where eigenvectors come to our rescue. Here is the differential equation that we are trying to solve to understand how v(t) behaves. This equation, as you can see, contains a mix of vectors and a matrix times a vector, so it's a pretty complicated equation to solve. Fortunately, we can use eigenvectors to solve this particular differential equation. How do we do that? Well, suppose the N by N recurrent connection matrix M is symmetric. What does that mean? It means that for any particular pair of output neurons, say neuron number one and neuron number two, if one connects to two with some particular strength A, then two connects to one with the same strength A. In other words, M(1,2) is equal to M(2,1), which is equal to the value A.
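As a concrete illustration, here is a minimal numerical sketch of the network equation tau dv/dt = -v + h + M v, using forward-Euler integration. The two-neuron example, parameter values, and function names are my own illustrative choices, not from the lecture:

```python
import numpy as np

def simulate_linear_recurrent(M, h, tau=0.01, dt=0.001, T=0.5):
    """Euler-integrate tau * dv/dt = -v + h + M @ v, starting from v = 0."""
    v = np.zeros(len(h))
    for _ in range(int(T / dt)):
        v = v + (dt / tau) * (-v + h + M @ v)
    return v

# Tiny example: 2 output neurons with weak symmetric recurrent coupling.
M = np.array([[0.0, 0.3],
              [0.3, 0.0]])
h = np.array([1.0, 0.5])          # h = W u, the feedforward input
v_ss = simulate_linear_recurrent(M, h)

# At steady state dv/dt = 0, so v = h + M v, i.e. v = (I - M)^(-1) h.
v_exact = np.linalg.solve(np.eye(2) - M, h)
```

Since both eigenvalues of this M (plus and minus 0.3) are below 1, the simulated rates settle onto the exact steady state.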
And that's what it means for this matrix M to be symmetric. Now why is it useful to have the connection matrix M symmetric? Well, it turns out that if M is symmetric, then M has N orthogonal eigenvectors, and the corresponding eigenvalues satisfy the standard eigenvector-eigenvalue equation shown here. Now what does it mean for these eigenvectors to be orthogonal? Well, if you take any two of these eigenvectors, ei and ej, with i not equal to j, orthogonality just means that the dot product of the two eigenvectors is going to be, you guessed it, 0. We can further make these eigenvectors orthonormal. Orthonormal means that the eigenvectors are not only orthogonal but also have a length of 1, and we can arrange that by dividing each eigenvector by its length. Then we have that ei dot ei is equal to 1, and if that's satisfied, we say that this set of eigenvectors is orthonormal. Why is it useful to have orthonormal eigenvectors? Well, it turns out that we can now write any N-dimensional vector, including our output vector v(t), as simply a linear combination of our orthonormal eigenvectors. So these eigenvectors form a new basis, or a new coordinate system, for expressing N-dimensional vectors such as v(t). To drive home the point, let's look at the special case of a three-dimensional space. So here's x, y, and z, and let's suppose that this is our vector v(t). All we're doing now is expressing this vector v(t) in a new coordinate system given by our orthonormal eigenvectors e1, e2, and e3. In the x-y-z system we were writing v(t) as simply a linear combination: the first component of v, v1, times (1,0,0), which was our vector for x; plus v2 times (0,1,0), the y component; and finally, for the z component, v3 times (0,0,1).
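To see this concretely, here is a small numerical check (the random 5-by-5 example is my own illustration) that a symmetric matrix has orthonormal eigenvectors, and that any vector can be rebuilt from its coefficients ci = v dot ei in that basis:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
M = (A + A.T) / 2            # symmetrize, so M[i, j] == M[j, i]

lam, E = np.linalg.eigh(M)   # columns of E are orthonormal eigenvectors of M

# Orthonormality: ei . ej = 1 if i == j, else 0, i.e. E^T E = I.
orthonormal = np.allclose(E.T @ E, np.eye(5))

# Any 5-dimensional vector v = sum_i c_i e_i, with coefficients c_i = v . e_i.
v = rng.standard_normal(5)
c = E.T @ v                  # coefficients in the eigenvector basis
v_rebuilt = E @ c            # linear combination of the eigenvectors
```

The same expansion is what lets us trade the coupled vector equation for independent equations in the coefficients ci.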
So all we are doing now is, instead of expressing v(t) in the coordinate system given by the x, y, and z vectors, writing v as a different linear combination: c1 times e1, plus c2 times e2, plus c3 times e3. Now why go through all this trouble? Well, it turns out that if you substitute the expression for v(t) in terms of the ei's into the differential equation for v, and then use the eigenvector equation as well as the orthonormality of the ei's, you can solve for ci as a function of time. So here is the equation for ci as a function of time, and once you have a closed-form expression for ci(t), you can substitute that value for ci into our equation for v. And therefore we have solved the differential equation, and we now have a complete expression that characterizes how v changes as a function of time. If you want to get into all the mathematical detail of how we derived this expression for ci(t), I would encourage you to go to the supplementary materials on the course website. We can now show that the eigenvalues of the recurrent connection matrix determine whether the network is stable or not. To see this, suppose one of the lambda i is bigger than 1. What happens to the output of the network, v(t), which is a linear combination of the eigenvectors weighted by the coefficients ci? Well, if one of the lambda i's is bigger than 1, let's say this lambda i here is equal to 2, then this term ends up being a growing exponential function of time. So as time goes on, this term becomes larger and larger, and therefore ci(t) also becomes larger and larger. The output of the network then grows without bound, which means that v(t) explodes, and what you end up with is an unstable network.
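Projecting the differential equation onto eigenvector ei gives tau dci/dt = -(1 - lambda_i) ci + h.ei, whose closed-form solution is ci(t) = (h.ei / (1 - lambda_i)) (1 - exp(-t (1 - lambda_i) / tau)) + ci(0) exp(-t (1 - lambda_i) / tau). A quick numerical sketch, with illustrative parameter values of my own, shows the stable and unstable cases side by side:

```python
import numpy as np

def c_i(t, h_dot_e, lam, tau=0.01, c0=0.0):
    """Closed-form solution of tau * dc/dt = -(1 - lam) * c + h_dot_e."""
    decay = np.exp(-(1.0 - lam) * t / tau)
    return (h_dot_e / (1.0 - lam)) * (1.0 - decay) + c0 * decay

t = np.linspace(0.0, 0.2, 201)            # seconds
stable = c_i(t, h_dot_e=1.0, lam=0.5)     # converges to 1 / (1 - 0.5) = 2
unstable = c_i(t, h_dot_e=1.0, lam=2.0)   # (1 - lam) < 0: exponential blow-up
```

With lambda_i = 0.5 the coefficient relaxes to the steady value h.ei / (1 - lambda_i); with lambda_i = 2 the exponential grows without bound, which is exactly the instability described above.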
On the other hand, if all the eigenvalues are less than 1, then you should be able to convince yourself, by plugging values of lambda i less than 1 into our equation for ci(t), that the network is stable, because v(t) is going to converge to a steady-state value. That value is given simply by the linear combination of all of these coefficients, which have now converged to this particular value, multiplied by each of the corresponding eigenvectors. Now we can answer the question that we posed earlier in the lecture: what can a recurrent network do? One thing that a linear recurrent network can do is amplify its inputs. To see this, suppose that all the eigenvalues lambda i are less than 1. We showed in the previous slide that the output of the network in the steady state is going to look like this. If one of these eigenvalues, let's say lambda 1, is very close to 1, and all the other eigenvalues are much, much smaller, then the lambda 1 term is going to dominate the sum, and the steady-state output of the network is going to be basically the projection of the input onto the first eigenvector, divided by 1 minus lambda 1, multiplied by e1. So what we have is a network that amplifies the projection of its input. If lambda 1, for example, is equal to 0.9, which is close to 1, then 1 over 1 minus lambda 1 is going to be 10, and so we have an amplification factor of 10 for the projection of the input onto e1. Now let's look at an example of a linear recurrent network. Let's assume that each of these output neurons codes for some angle between minus 180 degrees and plus 180 degrees. So instead of labeling these neurons 1, 2, 3, 4, and 5, we can label them with angles. For example, this could be minus 180 degrees, this neuron could be minus 90, this neuron could be labeled 0, this one plus 90, and this one 180. Now, why are we labeling neurons with angles?
It's because we can now define the connection matrix M as a cosine function, for example, of the relative angle labeling the neurons. In other words, M(theta, theta prime) could be proportional to cosine of (theta minus theta prime). What does this type of connectivity look like? Well, it results in neurons exciting other neurons that are nearby, and inhibiting neurons that are farther away. Here's a graphical depiction of the cosine-based connectivity function: for neurons that are close to any given neuron, you have excitation, and for neurons that are farther away, you have inhibition. Now let's ask the question: is M, defined by such a connectivity function, symmetric? In other words, is M(theta, theta prime) equal to M(theta prime, theta)? Well, that's the same as asking whether cosine of x is equal to cosine of minus x, which we know is true. So yes, the connectivity matrix is indeed symmetric. This type of connectivity function is interesting because there is some evidence that such connectivity is also found in the cerebral cortex: neurons in the cerebral cortex tend to excite other neurons that are near them, and inhibit neurons that are farther away. Now suppose we choose the connectivity matrix of a linear recurrent network to be proportional to the cosine function, such that all the eigenvalues are 0 except one eigenvalue, which is equal to 0.9. Then, as we showed earlier, we would expect to see amplification of the input by a factor of 10. So let's see if that really happens when we simulate such a network. And, not surprisingly, the answer is yes. When we present the network with a noisy input, we do get an output that is an amplified version of the input, where the peak of the noisy input has been amplified and the smaller peaks have been suppressed. So what else can a linear recurrent network do?
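Here is a minimal sketch of this kind of simulation (my own reconstruction, not the lecture's actual code; the network size and scaling are assumptions). We build a cosine connectivity matrix scaled so that its largest eigenvalue is 0.9, then check the factor-of-10 amplification at steady state:

```python
import numpy as np

N = 100
theta = np.linspace(-np.pi, np.pi, N, endpoint=False)   # neuron labels (radians)

# M[i, j] proportional to cos(theta_i - theta_j). The scale factor 2 * 0.9 / N
# puts the largest eigenvalue at 0.9; note that with this connectivity the
# nonzero eigenvalues actually come as a cosine/sine pair, the rest are 0.
lam1 = 0.9
M = (2.0 * lam1 / N) * np.cos(theta[:, None] - theta[None, :])

# Steady state of tau dv/dt = -v + h + M v  is  v = (I - M)^(-1) h.
h = np.cos(theta)                        # input aligned with the top eigenvector
v_ss = np.linalg.solve(np.eye(N) - M, h)
gain = v_ss.max() / h.max()              # amplification = 1 / (1 - 0.9) = 10
```

Because this h lies exactly along the leading eigenvector, the steady-state output is the input scaled by 1 / (1 - 0.9); a noisy input would have its cosine-shaped component amplified by the same factor while the other components pass through unamplified.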
We remarked earlier that if all the eigenvalues are less than 1, then the network is stable. Now suppose one of these eigenvalues, let's say lambda 1, is exactly equal to 1. In that case, we can show that we get a different kind of equation for how the coefficient c1 evolves; it's given by this differential equation. And here's something interesting that happens. Suppose that the input was initially 0, then it was turned on, and then it was turned off. So we have the input h, which was initially 0, was then turned on to some value, and was then turned off again. Here's what happens: even after the input has been turned off, so even after h is equal to 0, the network maintains an output. The network now maintains a memory of the integral of the past inputs, as given by this integral shown here. Interestingly, there is evidence for integrator neurons in the brain. In particular, in the medial vestibular nucleus, there are neurons that maintain a memory of eye position. When the input to these neurons comes in the form of bursts of spikes, so here's one burst of spikes that changes the eye position, and here's another burst of spikes from a different neuron that decreases the eye position, we see that the integrator neuron maintains persistent activity, a memory of the eye position, by changing its firing rate. This is very similar to what we had in the previous slide, where the neuron maintained a memory of the integral of past inputs. So what this goes to show, once again, is that the brain can do calculus. In this case, we've shown that it can do integration, and we already showed that it can do differentiation in the previous lecture. So once again, sorry Newton and Leibniz, it looks like the brain has beaten you to the punch. Let's conclude our tour of recurrent networks by looking at nonlinear recurrent networks. We can make the network nonlinear by applying a nonlinear function F to the sum of the input and the recurrent feedback.
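Here is a minimal sketch of the integrator case (the pulse timing and parameter values are my own illustrative choices). When lambda 1 equals 1, the leak term is exactly cancelled by the recurrent feedback, so along that mode tau dc1/dt = h.e1; for a one-dimensional network the firing rate simply integrates the input:

```python
import numpy as np

tau, dt = 0.01, 1e-4                      # time constant, Euler step (seconds)
v = 0.0
trace = []
for k in range(int(0.3 / dt)):
    h = 1.0 if 1000 <= k < 2000 else 0.0  # pulse: off, on for t in [0.1, 0.2), off
    # With lambda_1 = 1 the -v leak cancels the +v feedback: tau * dv/dt = h.
    v += (dt / tau) * h
    trace.append(v)

# After the pulse turns off, v holds its value: the integral of the past
# input divided by tau, i.e. pulse area 0.1 / 0.01 = 10.
```

The persistent value after the input is gone is the memory: the rate stays at the integral of everything that came before, just like the eye-position neurons.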
Perhaps the simplest kind of nonlinearity is the rectification nonlinearity, which takes any input x and sets it equal to x if x is greater than 0, and sets it equal to 0 otherwise. This nonlinearity is quite useful because, if you recall, the vector v represents the firing rates of neurons, and the rectification nonlinearity makes sure that the firing rates never go below 0. So what can nonlinear recurrent networks do? They can perform amplification, similar to linear recurrent networks. Here is the input to the nonlinear network, a noisy input with a peak near 0, and here is the output of the nonlinear network. You can see how the network has amplified the input, but it has also cleaned up the input and suppressed some of the other peaks. Now the interesting thing here is that although the recurrent connections were again the cosine-type connections, with excitation nearby and inhibition farther away, and the eigenvalues were all 0 except one, that one eigenvalue was actually bigger than 1: lambda 1 was 1.9. In the linear recurrent network, this would have led to an unstable network. But the rectification nonlinearity saves the day, and the network is in fact stable and gives us this kind of amplification. Now here's something else that the nonlinear recurrent network can do: it can perform selective attention, that is, it can select one part of the input and suppress the other part. Here's an input that contains two peaks. If you look at the output of the nonlinear network, it has essentially focused only on the peak at minus 90 degrees and suppressed the other peak. So the network is performing a type of winner-take-all input selection. Some might say that the network is implementing the capitalist credo of the rich get richer and the poor get poorer.
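Here is a rough sketch of such a simulation (entirely my own illustrative reconstruction: the bump-shaped inputs, parameter values, and the scaling of M are assumptions, not the lecture's). With cosine connectivity whose largest eigenvalue is 1.9, the rectified network stays bounded, keeps rates non-negative, and the stronger of two input bumps wins:

```python
import numpy as np

def rectify(x):
    return np.maximum(x, 0.0)            # F(x) = x if x > 0, else 0

N = 100
theta = np.linspace(-np.pi, np.pi, N, endpoint=False)
lam1 = 1.9                               # > 1: unstable without rectification
M = (2.0 * lam1 / N) * np.cos(theta[:, None] - theta[None, :])

# Two input bumps: a stronger one at -90 degrees, a weaker one at +90 degrees.
h = 0.8 * np.exp(-(theta + np.pi / 2) ** 2 / 0.2) \
  + 0.5 * np.exp(-(theta - np.pi / 2) ** 2 / 0.2)

tau, dt = 0.01, 5e-4
v = np.zeros(N)
for _ in range(int(1.0 / dt)):           # tau * dv/dt = -v + F(h + M v)
    v = v + (dt / tau) * (-v + rectify(h + M @ v))
```

The rates remain finite because, once the activity profile is rectified to a half-wave bump, its overlap with the cosine eigenvector is roughly halved, so the effective loop gain drops below 1; and the output peak ends up at the stronger bump, the winner-take-all behavior described above.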
And some people might even say that the moral of the story here is that you have to be nonlinear to be a capitalist. But I think we digress. The same nonlinear network can also perform something called gain modulation. What does that mean? Well, suppose the inputs look like this, where you're adding a constant amount to a particular input, which basically means you're shifting the input additively from one level to another. The effect on the output is multiplicative: the change in the input multiplies the output, and so you get this type of modulation, also called gain modulation, of the output firing rate of the neuron. This is interesting because this type of gain modulation of neural responses has also been observed in the brain, specifically in area 7a of the parietal cortex. Finally, the same nonlinear network also maintains a memory of past inputs, just like the linear recurrent network that we considered a while ago. Here is the input for the nonlinear network: it's basically a bump centered around 0, that's the local input, along with some background input, which is about 0. The output of the network, as you might expect, is just an amplified version of the input, with the background suppressed. What happens to this output when we turn off the local input? Here's what we get. When the local input is turned off, you still have an output in this network, and the output has a peak at 0, which is exactly where the peak of the local input was. So this memory of the input is being maintained in this network by recurrent activity. What we have here, then, is a network that maintains a memory of past activity after the input has been turned off. This is quite similar to the short-term memory, or working memory, of past inputs that is maintained by neurons in the prefrontal cortex of the brain. We have so far been looking at networks with symmetric recurrent connections. What about non-symmetric recurrent networks?
Well, the simplest form of non-symmetric recurrent network would be a network of excitatory and inhibitory neurons. For example, if you had one excitatory neuron and one inhibitory neuron, you could have the excitatory neuron exciting the inhibitory neuron, and the inhibitory neuron in turn inhibiting the excitatory neuron. And perhaps there is also a connection from each neuron onto itself; these are called autapses, and so this one will again be excitatory, and this one inhibitory. You can see why the connections cannot be symmetric: you cannot have the excitatory connection be positive and the inhibitory connection also be positive; it has to be a negative, or inhibitory, connection. Here are the differential equations for our two neurons. Here is the differential equation for the firing rate of the excitatory neuron, and here is the one for the firing rate of the inhibitory neuron. And these are all the different parameters: the excitatory connection from the neuron onto itself, the connection from the inhibitory neuron onto the excitatory neuron, and so on. You also see that we've added parameters for the thresholds that we apply, and the result in turn is passed through a nonlinearity, which is the rectification nonlinearity. Just to make things concrete, let's assign some values. So these are some values for each of these parameters, for the connections and the thresholds. Finally, we will leave one particular parameter unassigned, which we're calling tau I: the time constant of the inhibitory neuron. We will vary this parameter to study the behavior of this nonlinear and non-symmetric recurrent network. So how do we analyze the dynamics of such a nonlinear and non-symmetric network? Well, hold on to your eigenhats, because we're going to need to use eigenvectors and eigenvalues again. To understand the dynamic behavior of this network, we can perform linear stability analysis.
What does that mean? It means we can determine how stable the network is near a fixed point. A fixed point is obtained by finding values of vE and vI that make dvE/dt and dvI/dt go to 0. When both of these are 0, we have values for vE and vI which are fixed, and which do not change as a function of time, and that gives you a fixed point of this network. So how do we perform linear stability analysis? We take the derivatives of the right-hand sides of both of these equations with respect to vE and vI. What we get is a matrix called the stability matrix, or, if you want to be cool, you can call it the Jacobian matrix. Since the Jacobian matrix is not symmetric, the eigenvalues of the matrix can have both real and imaginary parts. So the eigenvalues can be complex, and the real and imaginary parts of these eigenvalues in turn determine the dynamics of the nonlinear network near the fixed point; in particular, they determine whether the network is stable or not. Now, we've assigned values for all of the parameters except tau I. What we can do is choose different values for tau I, which will in turn produce different eigenvalues for J, and then we can look at the effect of those different eigenvalues on the stability and behavior of this nonlinear network. First, let's look at what happens when we set tau I equal to 30 milliseconds. This makes the real parts of the two eigenvalues of the stability matrix negative. And, as we show in the supplementary materials for this lecture on the course website, the real parts being negative cause the network to be stable near the fixed point. So here's a pictorial depiction of what happens when we set tau I equal to 30 milliseconds. The x axis is vE, the y axis is vI, and we start out at some particular location, that is, at some particular value for vE and vI.
Then the network essentially converges to the fixed point, which is the point at which dvE/dt equals 0 and dvI/dt equals 0, so both vE and vI stop changing at this location in this particular plot. If we look at what's happening as a function of time, you can see that both vE and vI oscillate. The oscillations are damped, and eventually they die out and the network converges to a specific value for vE and a specific value for vI; that is the stable fixed point of the network. This stable fixed point is also called a point attractor in the terminology of dynamical systems. Now look at what happens when you choose tau I to be 50 milliseconds. That makes the real parts of the eigenvalues of the stability matrix positive. And as we show in the supplementary materials for this lecture, when the real parts of the eigenvalues are positive, the network is unstable. So if you start out in this plot of vE and vI at some location near the fixed point, so here is the fixed point, and if you start out here with some value for vE and vI, then the network moves away from the fixed point; the network is unstable and diverges away from the fixed point. But luckily, the rectification nonlinearity comes to the rescue. How is that? Well, as the value of vE tends to go negative, the rectification nonlinearity stops it from going negative and puts it back on track, and so we have the network looping around on this limit cycle. Here's another way to look at this limit cycle. If you plot vE and vI as a function of time, you'll observe that initially the vE and vI values start to increase, but then, once you hit the rectification nonlinearity, you get a stable oscillation. So both vE and vI oscillate in a stable manner, and that corresponds to going around on this limit cycle. So let's summarize what we saw in the previous slide and in this slide.
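To make the two cases concrete, here is a sketch of the stability computation. The synaptic weights, thresholds, and excitatory time constant are illustrative values of my own choosing, selected so that they reproduce the two behaviors described; they are not necessarily the values used in the lecture. The Jacobian of the E-I rate equations at the fixed point, assuming both rectification arguments are positive there (so F acts as the identity), is evaluated for tau I = 30 and 50 ms:

```python
import numpy as np

# Illustrative (assumed) parameters: excitatory time constant and weights.
tau_E = 10.0                                # ms
M_EE, M_EI, M_IE, M_II = 1.25, -1.0, 1.0, 0.0

def jacobian(tau_I):
    """Stability (Jacobian) matrix J = d(dv/dt)/dv at the fixed point, valid
    where both rectification arguments are positive."""
    return np.array([[(M_EE - 1.0) / tau_E, M_EI / tau_E],
                     [M_IE / tau_I,         (M_II - 1.0) / tau_I]])

eig_30 = np.linalg.eigvals(jacobian(30.0))
eig_50 = np.linalg.eigvals(jacobian(50.0))
# tau_I = 30 ms: complex pair with negative real parts -> damped oscillation
#                into the stable fixed point (point attractor).
# tau_I = 50 ms: complex pair with positive real parts -> unstable spiral,
#                which rectification turns into a limit cycle.
```

The imaginary parts being nonzero is what makes both trajectories oscillate; the sign of the real parts is what flips between damping and growth.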
So when you change the parameter tau I from 30 to 50 milliseconds, the nonlinear network made a transition from having a stable fixed point to becoming unstable and settling into a limit cycle. In dynamical systems theory, such a transition is known as a Hopf bifurcation. Well, I think it's time now for our own Hopf bifurcation. That wraps up our journey into the land of networks. Next week, we learn about how the brain learns, by changing the connections between the neurons in its networks. Until then, adios and goodbye.