Let's first study the convolution operation in the one dimensional case. In one dimension the input is a sequence of numbers; in this example it has five values: 1, 2, -1, 1, -3. Then we have a filter of size three, which consists of the weights 1, 0, -1. The convolution operation slides this filter over the input and generates an output by doing element wise multiplication and then summing the results. For example, say we start with the first position as the center and multiply element wise with the filter. The filter has three values, and since this is the first value in the input it sits at the boundary, so we have to handle that condition somehow. There are different ways to do it; one way is to pad zeros to the left so that the first value can sit at the center of the filter. If we pad a zero to the left, the element wise multiplication is 0 times 1, plus 1 times 0, plus 2 times -1, which equals -2. That is how you generate the first output of the convolution. Then you repeat this operation, sliding the filter step by step through the whole input, and that generates the full set of convolution outputs: -2, 2, 1, 2, 1. So that's the convolution operation in the one dimensional case.

There is another important concept here: stride. In this case we slid the filter one step at a time over the entire input; that is what we call stride equal to one. We can have a larger stride. For example, with stride equal to two we apply the filter at every other element: we first center it at the first element, 1, then skip the second element and center it at the third element, -1, to generate another output, then skip another element and center it at -3 to generate another output. You can see that with a larger stride the output has lower dimensionality than the input, so stride is a very important choice when you design CNN architectures.

So that's the one dimensional convolution operation. Mathematically, if we revisit this operation in this particular case, we are essentially doing element wise multiplication between the filter, which is w1, w2, w3, and the input values x1, x2, x3, and summing them up; that gives you the first output. The second output applies the same set of weights, doing a weighted sum at a different location, in this particular case x3, x4, and x5. So this is how 1D convolution works.
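As a concrete sketch, here is this 1D operation in NumPy. The function name conv1d, the zero padding on both sides, and the no-flip convention common in deep learning are assumptions made for illustration rather than anything stated in the lecture; with stride 1 it reproduces the output -2, 2, 1, 2, 1 from the example above.

```python
import numpy as np

def conv1d(x, w, stride=1):
    """Slide the filter w over the input x, zero-padded at the boundary so the
    filter can be centered on every stride-th element; no filter flip (as in CNNs)."""
    k = len(w)
    pad = k // 2
    xp = np.concatenate([np.zeros(pad), x, np.zeros(pad)])   # zero padding at the boundary
    out = []
    for center in range(0, len(x), stride):                  # filter centers on the original input
        patch = xp[center:center + k]                        # the k input values around this center
        out.append(np.sum(patch * w))                        # element-wise multiply, then sum
    return np.array(out)

x = np.array([1, 2, -1, 1, -3], dtype=float)
w = np.array([1, 0, -1], dtype=float)
print(conv1d(x, w, stride=1))   # [-2.  2.  1.  2.  1.]
print(conv1d(x, w, stride=2))   # [-2.  1.  1.]  -- filter applied at every other position
```

The stride-2 call also shows the point about dimensionality: the same 5-element input now produces only 3 outputs.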
Now let's look at the two dimensional convolution operation. In many neural network applications we are dealing with images, and a black and white image is a two dimensional input. For example, in this case we have a five by five input and a three by three filter. A filter is sometimes also called a kernel; it is the same thing, and in this class we will just call it a filter. The idea is the same as before: instead of sliding over one dimension, we now slide the filter over this two dimensional space, again doing element wise multiplication and summing over all the results. For example, we take the first three by three patch from the input, multiply each element with the corresponding filter weight, and sum them up; that gives us 4. If we slide the filter forward and take another patch from the input, the same element wise multiplication and summation gives us 3, and so on.

Mathematically, this is how 2D convolution works: I is the two dimensional input, K is the two dimensional filter, and the asterisk indicates the convolution operation. The (i, j) element of the output is just a double summation over the filter: we do element wise multiplication between the filter and the corresponding patch of the input and sum the results, and that gives us the (i, j) element of the output. The output of a convolution is called a feature map.

That is 2D convolution; 3D convolution is also very common. Just imagine we have a colored input image: a colored image is usually represented with three different channels, R, G, and B, one for each color, and the image also has a specific spatial size, for example five by five. So essentially we have a five by five matrix three times, once for each color; in this illustration we are only showing two slices. Again, it is very similar to the 1D and 2D cases, except that the filter is also three dimensional; here the filter is of size three by three by two. To do the convolution we take a patch from the input, do the element wise multiplication in both slices, and sum everything up, and that whole thing gives us a single number: the first element of the output feature map. The mathematical operation is very similar: the (i, j) element of the output feature map is just a triple summation over the filter, doing element wise multiplication between the patch and the corresponding locations in the filter, except that now we also sum across the different slices, and that gives us the final output. In many cases we call these slices channels; for example, RGB gives three different channels corresponding to the three colors.

Now let's look at the complete convolution operation. In the one dimensional case, the convolution applies a particular filter, here represented by W at layer l, over the input, which is X at layer l minus one; we slide this set of weights over the entire input, and that generates the output of the convolution operation, indicated as X at layer l. In most cases the convolution operation is followed by a nonlinear transformation, in this case a ReLU operation, which just applies another nonlinear transformation to each output of the convolution. The two dimensional case is very similar. Say you have a six by six input image and two different filters. So far we have shown only one filter, in both the one dimensional and two dimensional cases, but we can have multiple filters; it does not have to be just one. Intuitively, each filter may be trying to detect a certain type of pattern, and if we want to detect multiple patterns we can have multiple filters, one corresponding to each pattern. For example, here we have two different filters applied to the same input, which generates two different output feature maps, in this case 3 by 3 each. That is the convolution output, and we can stack them together, which gives us a three dimensional tensor as the output feature map, 3 by 3 by 2.
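Here is a minimal NumPy sketch of these two steps, assuming a "valid" convolution with no padding and the usual no-flip convention; the names conv2d and conv_multichannel and the example shapes (a 5 by 5 input with 2 channels, matching the two slices shown) are illustrative choices, not the lecture's exact setup.

```python
import numpy as np

def conv2d(image, kernel):
    """'Valid' 2D convolution as used in CNNs (no kernel flip):
    out[i, j] = sum over m, n of image[i + m, j + n] * kernel[m, n]."""
    H, W = image.shape
    kH, kW = kernel.shape
    out = np.zeros((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + kH, j:j + kW]        # take a kH x kW patch of the input
            out[i, j] = np.sum(patch * kernel)       # element-wise multiply, then sum
    return out

def conv_multichannel(image, filters):
    """Multi-channel convolution: each 3D filter is applied to the H x W x C input
    (the triple sum runs over filter positions and channels), and the resulting
    feature maps, one per filter, are stacked into a 3D output tensor."""
    maps = []
    for w in filters:
        fmap = sum(conv2d(image[:, :, c], w[:, :, c]) for c in range(image.shape[2]))
        maps.append(fmap)
    return np.stack(maps, axis=-1)                    # H' x W' x num_filters feature map

# e.g. a 5x5 input with 2 channels and two 3x3x2 filters -> a 3x3x2 output feature map
x = np.random.randn(5, 5, 2)
filters = [np.random.randn(3, 3, 2) for _ in range(2)]
print(conv_multichannel(x, filters).shape)            # (3, 3, 2)
```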
And then we pass this through the ReLU operation, which gives us another 3 by 3 by 2 output feature map; in this case we just pass it through the ReLU operation slice by slice. So if you look at this complete convolution operation, what are the parameters? The parameters here are all those filter weights: all the filters, and the elements in those filters, are the things we have to learn from data. And the bias vector that goes into the ReLU operation is also a parameter. It is usually much smaller than the filter weights, but it is still there, so we have to keep that in mind as well. Those are the parameters of a convolution layer.
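Continuing the sketch above, a complete layer just adds one bias per filter and a ReLU applied to each output slice; counting its learnable parameters makes the last point concrete. The shapes here are again only illustrative assumptions.

```python
import numpy as np

# Reusing conv_multichannel from the sketch above: a complete layer adds a bias per
# filter and applies ReLU element-wise to each slice of the output feature map.
def conv_layer(image, filters, biases):
    fmap = conv_multichannel(image, filters)            # H' x W' x num_filters
    return np.maximum(fmap + np.asarray(biases), 0.0)   # per-slice bias, then ReLU

# Learnable parameters of this layer: all filter weights plus one bias per filter.
# For two 3x3x2 filters: 2 * (3*3*2) = 36 weights, plus 2 biases = 38 parameters,
# independent of the input image size.
print(2 * (3 * 3 * 2) + 2)   # 38
```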