In this set of videos, we're going to motivate and gain an understanding of how recurrent neural networks work. Let's go over the learning goals for this set of videos. We're going to cover what recurrent neural networks are, as well as the motivation behind them. We'll discuss both the practical and mathematical details that allow you to understand how recurrent neural networks work. And then finally, we'll touch on some limitations of these recurrent neural networks, which will lead into our next set of videos on how to adjust for those limitations.

So we discussed how processing images forces them into a specific input dimension; with our grayscale images we can imagine the two dimensions of pixels, 28 by 28. And we saw why something like the convolutional operation from past videos, which operates on surrounding cells, makes sense for that type of input data. But with text, it may not be immediately obvious what kind of data we want to input, or what kind of operations we want to use. For example, suppose our problem statement was to classify tweets as positive, negative, or neutral. Different tweets can have different numbers of words, and we want to know how we can account for this variable length in each one of our input sequences.

Now, we want to do better than just the bag-of-words implementation, which would essentially take every word and just state how many times that word appeared in the document. Ideally, when working with text data, each word can be processed and understood in the appropriate context, and by context we can think of the prior words surrounding that word, prior sentences, and so on. Those words should then be handled differently depending on that context. You can think about, say, the word "bat" being either the animal or a baseball bat. Also, as we get more words, we should be able to update the context that we are currently working with.

So the solution will be to use this idea of recurrence, where we input the words into our network one by one. This means we can deal with variable lengths by just continuing to feed words until the end of the sentence, or until the end of the document. And because we have the information from prior words, the response to any particular word can depend on the words that preceded it, since we're feeding them in one by one. Our network would then output two things as each new word comes in: first, a prediction, that is, if the sequence were to end at that particular word, what would the prediction be; and second, a state, which contains a summary of everything that happened in the past leading up to that point, or again, that context that we're looking for.

Now, this picture of how the recurrent network is used, or how it's built, is often a bit confusing to digest. What we're looking at here is that the input comes into our network one word, or one time step, at a time. As those values come in, we update the state with all the past context, so we can keep track of the inputs that have come in, as well as outputting a value, so that we can have a prediction at each word or at each time step. Now, as I said, this may be a bit unclear, so let's unroll this recurrent neural net and take a deeper dive inside.
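Before we unroll it, here is a minimal conceptual sketch of that recurrence idea in Python. The function names `update_state` and `predict` are hypothetical placeholders standing in for the learned parts of the network, not anything from the lecture; the point is only to show how a sequence of any length is consumed one word at a time while a running state carries the context forward.

```python
def run_rnn(words, update_state, predict, initial_state):
    """Conceptual recurrence over a variable-length sequence.

    `update_state` and `predict` are hypothetical placeholders for the
    learned parts of the network: one folds the new word into the running
    context, the other produces a prediction from that context.
    """
    state = initial_state                    # context so far (starts empty)
    outputs = []
    for word in words:                       # works for any sequence length
        state = update_state(state, word)    # new context = old context + new word
        outputs.append(predict(state))       # prediction if the sequence ended here
    return outputs, state
```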
Now, looking at the quote-unquote unrolled version of what we just saw: we have our words coming in as input, one at a time. Starting with W1, we have a linear transformation denoted here by this matrix U. Note that W1 will be some vector representing that single word, and in general an RNN does not only take in words; it can take in any information at one point in time. So you can imagine this being an input vector with sales data, inventory levels, and promotional spend, all for one time period, and then W2 being that same information for time period two, and so on. But in this instance we're going to continue to think of this in terms of words, where each word has its vector representation for word one, word two, and so on within the sentence.

Now, how are we able to store and pass that information from one cell to the next? The way we do that is that at each step, along with the dot product of the input W1 and U, which we just computed, we also receive as input the state from the prior cell. So starting at S1, or even before S1, we can initialize that state with a zero vector, and then we pass that information from S1, our state one, to S2, and so on. The way we do this is we take that prior state and compute its dot product with our matrix W. We then combine the value from the input, W1 dotted with U, with S1 dotted with that W matrix, and pass all of that, combined, through an activation function to get our new state.

Now, the output from this activation can be used as actual output at each step, and that is often the case. We can also do, as we see here with these V matrices, another transformation of this vector, pass that through another activation, and either pass that new value on as output or even create another layer on top. You can think of this as just the first layer in our neural network, with some number of nodes; we can add another depth and create another layer taking these output values as input. And once all that is done, often we may only use that final output we have here to create the ultimate prediction that we're trying to make.

So as an example, if the network is predicting sentiment at each one of these different outputs, the first two words here have an unknown sentiment, so you see the question marks, while the last two outputs predict positive sentiment.

Now, each of these cells can have an output dimension greater than one. We can imagine that if we set our matrix to have an output with, say, five values, so that O1 is an array with five different values, we would be getting five different outputs at each one of these steps. This is the same idea as having more than one node in the first layer of a feed-forward neural network; those are one and the same. So if we have something like five or more nodes, or five or more outputs, and ultimately we want to predict something like a class that takes only two or three values, we'll need a dense matrix that gives a linear combination of each one of those nodes, as well as an activation function that results in either multiple values or only a single value, depending on what we're looking at. Now, if this is a bit confusing, an important note is that usually we're only looking at that final output, here O4, output four.
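To make those transformations concrete, here is a small NumPy sketch of the forward pass using the same names as the slide: U for the input transformation, W for the state-to-state matrix, and V for the output transformation. The specific sizes, the random weights, and the tanh activation are assumptions chosen just for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy sizes for illustration: word vectors of length 8,
# a state of 5 values, and 5 output values per step.
input_dim, state_dim, output_dim = 8, 5, 5
U = rng.normal(size=(state_dim, input_dim))   # input transformation
W = rng.normal(size=(state_dim, state_dim))   # state-to-state (recurrent) transformation
V = rng.normal(size=(output_dim, state_dim))  # output transformation

# Four random word vectors standing in for W1..W4.
words = [rng.normal(size=input_dim) for _ in range(4)]

s = np.zeros(state_dim)                       # initialize the state with a zero vector
outputs = []
for w_t in words:
    s = np.tanh(U @ w_t + W @ s)              # combine input and prior state, then activate
    outputs.append(np.tanh(V @ s))            # optional per-step output O_t

o4 = outputs[-1]                              # often only this final output is used
```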
And since that final output is the only output with information from all the other inputs, it's going to be the most important. That single output, O4, can have those five values we just talked about, or 32 values, whatever number of values you want, corresponding to the number of nodes you want in that first layer. And if we have something like five or 32 nodes, then we need to pass just that O4 through a dense layer in order to come up with the prediction, whether that's an output of just one value or three values, whatever it is you're trying to classify.

Now, what we have circled here is really the crux of our recurrent neural net: it passes along that saved state from all the prior inputs within our sequence. In Keras, we call the part that handles the input the kernel, and the kernel refers to the matrices used for that input transformation, those Us; we can initialize those weights using our kernel initializer, and we'll see that in the notebook. We also have weights within the recurrent portion of our network that need to be initialized, and those are the Ws that we see here; there's a short code sketch of these pieces below.

Now, that closes out this video. In the next video, we'll start to walk through, at a high level, the actual math of how this all works. All right, I'll see you there.
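As a supplementary sketch of the Keras pieces mentioned above: the layer sizes, the three-class sentiment output, and the specific initializer strings here are assumptions for illustration, not values from the video; the notebook has the actual setup.

```python
from tensorflow.keras import layers, models

# Assumed toy dimensions: word vectors of length 8, an RNN state with
# 32 nodes, and 3 sentiment classes (positive / negative / neutral).
model = models.Sequential([
    layers.Input(shape=(None, 8)),               # variable-length sequences of word vectors
    layers.SimpleRNN(
        32,
        kernel_initializer="glorot_uniform",     # initializes the input weights (the Us)
        recurrent_initializer="orthogonal",      # initializes the recurrent weights (the Ws)
        return_sequences=False,                  # keep only the final output, O4
    ),
    layers.Dense(3, activation="softmax"),       # dense layer on that final output
])

model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()
```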