In this video, let's get into the mathematical details behind recurrent neural nets. Starting out, we have our inputs w_i, where i represents the i-th position in our sequence; as we've discussed so far, that would be the i-th word within our sentence. Along with that we have s_i, the state at position i, which holds all of the past information that should be passed through our network, and o_i, the output at position i. To calculate s_i, as mentioned in the previous video, we take a linear combination of our input, add it to a linear combination of our prior state, and pass the sum through some nonlinear activation function: s_i = f(U w_i + W s_{i-1}). Then to get our output, we take a linear combination of our current state and pass that through an activation function; if we're trying to predict classes, that may be a softmax, for example: o_i = softmax(V s_i).

So what are we doing here? We get our current state as a function of the old state and the current input. We then get our current output as a function of that current state. And we learn the appropriate weights for these functions by training our network.

So what are the different weights we need in order to get that current state? To think through the matrix multiplications, imagine passing in just a single input (in reality, we would pass through a batch at a time). We start with an input of dimension r; in our example that represents a single word, so it's a vector of dimension r. We then have s, the dimension of our hidden state, and t, the dimension of our output vector after passing through that final dense layer.

To get the transformations we need, and thinking back to the visualization we saw earlier: U will be an s by r matrix, so it takes our r-dimensional word vector input and returns something of dimension s, the same shape as our state. W is then an s by s matrix, so it takes the prior state of dimension s and keeps it in dimension s, still an s-dimensional vector. Finally, V is a t by s matrix, and it transforms our s-dimensional hidden state into something of size t that fits the dimensions of our output.

With that, we should note that the learned weights U, V, and W are the same across all positions. In the unrolled version of the RNN we saw, that U showed up repeatedly, and that same U, V, and W are shared throughout. As mentioned as well, we will often ignore the intermediate outputs and only care about the final output, which has seen all inputs from our sequence. So thinking about that unrolled RNN, the output at the last position is the final output, and that is the one we really care about.

In order to train recurrent neural nets, there's a slight variation on our normal backpropagation method, called backpropagation through time, that allows us to update the weights within our recurrent neural network. We're not going to get into too much detail about this, but one can imagine that recurrent neural nets must learn weights by updating across the entire sequence, and thus if the sequence is very long, we are even more prone to the vanishing and exploding gradient problem than we are with our regular feed-forward neural nets.
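To make these shapes concrete, here is a minimal NumPy sketch of a single forward pass over one sequence. It is not the course's exact implementation; the dimension values, the tanh activation, and the names r_dim, s_dim, and t_dim are illustrative assumptions.

import numpy as np

# Dimensions (illustrative): r = input size, s = hidden state size, t = output size
r_dim, s_dim, t_dim = 8, 16, 4
seq_len = 5

rng = np.random.default_rng(0)

# Learned weights, shared across every position in the sequence
U = rng.normal(scale=0.1, size=(s_dim, r_dim))   # input -> state   (s by r)
W = rng.normal(scale=0.1, size=(s_dim, s_dim))   # state -> state   (s by s)
V = rng.normal(scale=0.1, size=(t_dim, s_dim))   # state -> output  (t by s)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# A toy sequence of word vectors, one r-dimensional vector per position
inputs = [rng.normal(size=r_dim) for _ in range(seq_len)]

state = np.zeros(s_dim)              # s_0: initial state with no past information
for w_i in inputs:
    # s_i = f(U w_i + W s_{i-1}); tanh is the nonlinear activation here
    state = np.tanh(U @ w_i + W @ state)

# Ignore the intermediate outputs and only read the final one,
# after the state has seen the whole sequence
final_output = softmax(V @ state)    # o_n = softmax(V s_n), a t-dimensional vector
print(final_output.shape)            # (4,)

Note how the same U, W, and V are reused at every step of the loop; only the state changes from position to position.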
In practice, we're going to set a maximum length for our sequences to ensure that they don't get too long. With that in mind, if an input is shorter than that maximum, we just pad the sequence, and if it's too long, we truncate it; this ensures uniform input lengths for all of our sequences (a short sketch of this step follows at the end of this section).

Now, we touched on this briefly earlier: although RNNs are often used for text applications like the examples we've seen, there are many other uses for such a framework. They can be used for all types of sequential data, including customer sales, loss rates, or network traffic over time; speech recognition, working with audio input for call center automation and voice applications; manufacturing sensor data, to tell where along a production chain a failure may occur; and they have even been extremely powerful for genome sequencing.

So we've talked all about RNNs and some of their strengths, but one of the major weaknesses of RNNs is that the nature of the state transition, as it's currently constructed, makes it hard to leverage information from the distant past, or in other words, from early on in our sequence. With that in mind, in our next lecture we're going to introduce LSTMs, or long short-term memory networks, which use similar concepts to what we just learned but have a more complex mechanism for updating the state that allows for longer-term memory.

That closes out our section here on recurrent neural nets. In this section we discussed recurrent neural networks and the motivation for applying neural networks to sequences. We discussed the practical and mathematical details of how they provide context for our sequential data. Finally, we touched on the limitations of recurrent neural nets in accounting for information throughout the entire sequence, especially longer sequences, and how in the next video we're going to introduce LSTMs to help account for such issues. All right, I'll see you there.
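As referenced above, here is a minimal sketch of the padding and truncation step that enforces a uniform sequence length. This is an illustrative assumption rather than the course's preprocessing code; the function name pad_or_truncate and the choice of 0 as the padding value are made up for the example.

# Sequences here are lists of token ids; 0 is assumed to be the padding id.
def pad_or_truncate(sequence, max_len, pad_value=0):
    if len(sequence) >= max_len:
        # Too long: keep only the first max_len elements
        return sequence[:max_len]
    # Too short: pad on the right until the sequence reaches max_len
    return sequence + [pad_value] * (max_len - len(sequence))

# Example: every sequence comes out with length 5
print(pad_or_truncate([4, 7, 2], 5))              # [4, 7, 2, 0, 0]
print(pad_or_truncate([4, 7, 2, 9, 1, 3, 8], 5))  # [4, 7, 2, 9, 1]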