What about pooling layers? A pooling layer is much simpler than a convolution layer; think of pooling as a down-sampling operation. For example, in this 1D case we have a four-dimensional input, and after a pooling layer with a filter of size 2 and stride 2, we apply the filter to two elements at a time, which gives us two outputs. So the dimensionality is reduced from four to two. We can also do the pooling operation in two-dimensional space in a very similar way. For example, a pooling layer with a 2x2 filter takes a patch of a 2x2 window and turns it into a single value. There are different pooling operations, and the most popular one is max pooling: it takes a patch and keeps only the maximum value in that patch. For example, the first 2x2 patch here gives us 4. In the second patch the maximum value is 5, so we have 5 here. The third patch gives the value 6, and the last one the value 4; that's max pooling. We can also do sum pooling, which just sums all those values, or mean pooling, which takes their average. Those are the popular pooling operations.

Now, let's understand the relationship between the input to a convolution layer and the output of the convolution layer. Say we have an input that is a three-dimensional tensor of size W1 x H1 x D1; think of that as a color image. For example, we have an image of size 227x227x3, and that's our input. Then we have the hyperparameters specifying the convolution operation: K is the number of filters, F is the spatial extent (so each filter has size F x F), S is the stride, and P is the padding size. In that case, the output is another feature map of size W2 x H2 x D2, where W2 = (W1 - F + 2P)/S + 1, H2 = (H1 - F + 2P)/S + 1, and D2 = K, the number of filters.
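The output-size relationship above can be sketched as a small helper (the function name is my own, not from the lecture):

```python
def conv_output_size(w1, h1, d1, k, f, s, p):
    """Output dims for K filters of size FxF, stride S, padding P on a W1 x H1 x D1 input."""
    w2 = (w1 - f + 2 * p) // s + 1
    h2 = (h1 - f + 2 * p) // s + 1
    d2 = k  # one output channel per filter
    return w2, h2, d2

# The 227x227x3 example: 96 filters of size 11x11, stride 4, no padding.
print(conv_output_size(227, 227, 3, k=96, f=11, s=4, p=0))  # (55, 55, 96)
```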
So here's a quiz question: calculate the number of parameters for this convolution operation. Here the input is an image of size 227x227x3, and the hyperparameters are number of filters K = 96, spatial extent F = 11, stride S = 4, and no padding, P = 0. So what's the number of parameters for this operation? By the way, we can calculate the output feature map based on what we learned in the previous slide, the relationship between W2, W1, and all those hyperparameters: the output feature map has size 55x55x96. But the number of parameters for this operation is really mostly in the filters. We have 96 filters, and each filter spans the full input depth, so each one has size 11x11x3. Every element in those filters is a parameter, so the weights number 11x11x3x96, plus another K = 96 bias terms, one per filter; these are the bias factors added before the ReLU nonlinearity is applied to the convolution output. So that's the answer: 11x11x3x96 + 96 parameters for the convolution operation.

Now, what about the pooling operation? What's the output dimension for pooling? Again, here the input has size W1 x H1 x D1. For the pooling operation we only care about the spatial extent F, meaning the filter has size F x F, and the stride S; there's no padding involved for pooling. The output is W2 x H2 x D2, where W2 = (W1 - F)/S + 1, H2 = (H1 - F)/S + 1, and D2 = D1. So however many filters or channels you have in the input, the same number of channels will be in the output of the pooling operation. So let's apply it here in this example. We have a 55x55x96 feature map. If we apply max pooling with a 3x3 filter and stride 2, what will that give us?
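To check the quiz answer, the parameter count can be sketched as follows (the helper name is my own; note that each filter spans the full input depth):

```python
def conv_params(f, d_in, k):
    """Each of the K filters has F*F*D_in weights, plus one bias per filter."""
    return f * f * d_in * k + k

# 96 filters of size 11x11 over a depth-3 input: 11*11*3*96 + 96 = 34,944.
print(conv_params(11, 3, 96))  # 34944
```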
Another feature map of size 27x27x96. D2 = D1, so the 96 doesn't change. Then W2 = (55 - 3)/2 + 1, that is, 52 divided by the stride of 2, plus 1, which gives us 27. So that's how you calculate the output dimension for a pooling operation.

Here is another quiz: let's find the number of parameters for a fully connected layer. A convolutional network goes through convolution layers and some pooling operations, and eventually the very last few layers are again fully connected layers. In this particular case we have an input of size W1 x H1 x D1, where W1 = 13, H1 = 13, and D1 = 256. It goes through a max pooling operation with a 3x3 filter and stride 2, then through a fully connected ReLU layer with a 4,096-dimensional output. So the question is: what is the number of parameters in this fully connected layer? For a fully connected layer, it's really the input dimension times the output dimension, right? The output dimension is 4,096, so that part we already know, and we also need to know the input; that's all we need. The input is the output from the max pooling layer, so we just need to calculate the size of that output, which in this case is 6x6x256. That input size times the output dimension of 4,096 is almost the number of parameters. Keep in mind that since this layer also has a bias term b for each output unit, we also add 1 to the input size: (6x6x256 + 1) x 4,096 is the total number of parameters for this fully connected layer. With this knowledge, you can now take an arbitrary convolutional neural network, or CNN architecture, and figure out the number of parameters involved. So in this particular case, if you go through all this calculation for this particular architecture, with multiple convolution and max pooling operations followed by two fully connected layers,
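Both steps of this quiz, the pooling output size and then the fully connected parameter count, can be sketched together (the function names are mine):

```python
def pool_output_size(w1, h1, d1, f, s):
    """Pooling keeps the channel count; each spatial dim becomes (dim - F)/S + 1."""
    return (w1 - f) // s + 1, (h1 - f) // s + 1, d1

def fc_params(n_in, n_out):
    """One weight per input-output pair, plus one bias per output unit."""
    return (n_in + 1) * n_out

w2, h2, d2 = pool_output_size(13, 13, 256, f=3, s=2)
print((w2, h2, d2))                    # (6, 6, 256)
print(fc_params(w2 * h2 * d2, 4096))   # (6*6*256 + 1) * 4096 = 37,752,832
```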
you will see that the number of parameters in the convolution plus pooling layers is about 3.7 million, while the fully connected layers, although there are only a few of them, actually have many more: 58.6 million parameters. That's a general pattern for convolutional networks: because the convolution layers use weight sharing and local connections, they have far fewer parameters than the fully connected layers at the end. So that's the number of parameters.

We also want to understand the number of calculations, or FLOPs, involved in applying the feed-forward operation from input to output for each layer. If we look at a convolution layer, given an input of W1 x H1 x D1 and all the hyperparameters for the filters (number of filters K, spatial extent F, stride S, and padding size P), the number of calculations we have to make is really the output size times the filter size, because every element in the output feature map involves an element-wise multiplication with a filter, right? We know the filter size is F x F x D1, and we know the output feature map size is W2 x H2 x D2. Multiplying these two together gives you the number of calculations needed to generate this particular output feature map. But keep in mind we already figured out the relationship between W2, W1, and those hyperparameters, so just plug them in and you can find the total number of FLOPs.

Next, we also want to figure out the number of calculations for pooling layers. A pooling layer has an input of size W1 x H1 x D1, and keep in mind that the pooling operation is done for each channel: we have D1 channels, so we do the pooling separately on each channel. The output has size W2 x H2 x D2, where in this case D2 = D1 because it's a pooling operation.
And the number of calculations is this: each element in the output feature map requires an element-wise comparison for max pooling, or a pass over the values for averaging or summing; either way you go through the filter, which has size F x F. So the total number of calculations is the output size times the filter size. What about the fully connected layer? For a fully connected layer with an input of size W1 x H1 x D1, the calculation is really just the input size times the output size. Say we have a 4,096-dimensional output and the input is 6x6x256; then the total number of calculations for this fully connected layer is the product of those two terms. Okay, now that we understand how to compute the number of FLOPs, or forward calculations, for a CNN architecture, we can figure out the number of operations we need. If you carry out all those calculations for this particular CNN architecture, you'll find that, in terms of the number of operations, the convolution and max pooling layers lead to 1.08 billion operations, while the fully connected layers lead to only 58.6 million operations, which is exactly the same as the number of parameters in those fully connected layers. So if you remember, in terms of the number of parameters the fully connected layers have more parameters, while in terms of operations, the convolution and max pooling layers actually require a lot more.
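The three per-layer operation counts can be sketched together as follows (the helper names are my own; as in the lecture, I count one multiply per filter element and ignore the additions):

```python
def conv_flops(w1, h1, d1, k, f, s, p):
    """Each of the W2*H2*K output elements needs F*F*D1 multiplies."""
    w2 = (w1 - f + 2 * p) // s + 1
    h2 = (h1 - f + 2 * p) // s + 1
    return (w2 * h2 * k) * (f * f * d1)

def pool_flops(w1, h1, d1, f, s):
    """Pooling is per-channel: each output element scans an F x F window."""
    w2 = (w1 - f) // s + 1
    h2 = (h1 - f) // s + 1
    return (w2 * h2 * d1) * (f * f)

def fc_flops(n_in, n_out):
    """One multiply per weight: input size times output size."""
    return n_in * n_out

print(conv_flops(227, 227, 3, k=96, f=11, s=4, p=0))  # 55*55*96 * 11*11*3 = 105,415,200
print(pool_flops(55, 55, 96, f=3, s=2))               # 27*27*96 * 3*3    = 629,856
print(fc_flops(6 * 6 * 256, 4096))                    # 9216 * 4096       = 37,748,736
```

Note how, for a fully connected layer, the operation count equals the weight count, which is why the 58.6 million figure shows up in both tallies above.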