[MUSIC] The first place where neural networks made a tremendous amount of difference is in an area called computer vision, that is, analyzing images and videos. So let's see a few examples of how deep learning, these big neural networks, can be applied to computer vision. To do that, it's good to understand what image features are. In computer vision, image features are kind of like local detectors that get combined to make a prediction. Let's say we take this particular image, and suppose that I want to predict whether it is a face image or not. I run a few detectors, say a nose detector, an eye detector, another eye detector, and a mouth detector, and if all of these fire, then using a little neural network you can say this is a face, and that's our prediction.

Now, this is a simple example of how you can build a classifier for images, but in reality we don't explicitly have a nose detector or an eye detector. Instead we have what are called image features, or interest points; there are various names for them. They try to find local image segments, patches, that are really distinctive. So maybe they'll find the corner around the eye, or the corner around the nose. If you have lots of these corner detectors, a face is comprised of corners: corner detectors firing at places around the mouth and both eyes. And if enough of them fire in a particular pattern, you discover that you have a face. This is how classification in computer vision typically works. Of course, there are more general and more complex models, but this is the basic idea.

For years, these detectors of local features were built by hand. A very popular one was called SIFT features, and they really transformed computer vision, because they were widely applicable and worked quite well. And then there were many other kinds of features that improved on their accuracy.

We've talked about these hand-created image features, like SIFT features, so let's talk about how they are typically used for classification. What we do is run the SIFT detectors over the image, and they fire in various places, for example at the corners of the eyes and the mouth. Then we create a vector that describes the image based on those firings, the locations where the SIFT features fired. You might have firings in some locations and none in others, and this can be viewed much like the words in a document: does the word Messi appear? Does the word football appear? Similarly, does a corner appear in a particular place in the image?

Once we have that description of the image, we feed it to a classifier, for example a simple linear classifier like we talked about earlier in the quarter. Well, it's not a quarter, we're teaching this online, it was earlier in the module. [LAUGH] So, as we discussed earlier in the module, you can feed it to a simple linear classifier such as logistic regression or a support vector machine, and from there we get a prediction as to whether this image is a face or not.

Now, that sounds pretty exciting, and it had a really significant impact on computer vision. The challenge, though, is that creating these hand-built image features was a really complicated process, and doing it well took several PhD theses.
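To make the earlier "combine the detector firings" idea concrete, here is a toy sketch of that little neural network: four part detectors each report whether they fired, and a single linear unit with a threshold combines them. Everything in it, the detector inputs, weights, and threshold, is made up purely for illustration.

```python
# Toy version of the "little neural network" that combines local detector
# firings into a face / not-face prediction. Weights and bias are invented
# for illustration; real networks learn these values from data.
import numpy as np

def little_face_network(firings):
    """firings: [nose, left_eye, right_eye, mouth], each 0 or 1."""
    weights = np.array([1.0, 1.0, 1.0, 1.0])  # every part counts equally
    bias = -3.5                                # require all four parts to fire
    return float(weights @ firings + bias > 0)

print(little_face_network(np.array([1, 1, 1, 1])))  # 1.0 -> "face"
print(little_face_network(np.array([1, 0, 1, 1])))  # 0.0 -> "not a face"
```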
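Since this pipeline, run local detectors, record their firings as a vector, and feed that vector to a linear classifier, is the technical core of the pre-deep-learning approach, here is a minimal sketch of it in Python. It assumes OpenCV's SIFT implementation and scikit-learn; the function names, the vocabulary size of 100, and the use of a "bag of visual words" construction are my own illustrative choices, not anything specified in the lecture.

```python
# A minimal sketch of the hand-built-features pipeline described above:
# run SIFT over each image, summarize the firings as a fixed-length
# "bag of visual words" vector, and feed that vector to a linear classifier.
import cv2
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

def sift_descriptors(gray_image):
    """Fire the SIFT detector over one grayscale image; each 128-d descriptor
    summarizes a distinctive local patch (a corner, a blob, an edge junction)."""
    sift = cv2.SIFT_create()
    _keypoints, descriptors = sift.detectAndCompute(gray_image, None)
    return descriptors if descriptors is not None else np.empty((0, 128), np.float32)

def firing_vector(descriptors, vocabulary):
    """Describe an image by how often each 'visual word' fires in it,
    much like counting which words appear in a document."""
    counts = np.zeros(vocabulary.n_clusters)
    if len(descriptors) > 0:
        for word in vocabulary.predict(descriptors):
            counts[word] += 1
    return counts / max(counts.sum(), 1.0)  # normalize away image size

def train_face_classifier(images, labels, vocab_size=100):
    """images: list of grayscale uint8 arrays; labels: 1 = face, 0 = not."""
    # 1. Cluster all training descriptors into a vocabulary of visual words.
    all_descriptors = np.vstack([sift_descriptors(img) for img in images])
    vocabulary = KMeans(n_clusters=vocab_size, random_state=0).fit(all_descriptors)
    # 2. Turn every image into a fixed-length firing vector.
    X = np.array([firing_vector(sift_descriptors(img), vocabulary) for img in images])
    # 3. Feed the vectors to a simple linear classifier (logistic regression
    #    here; a support vector machine would slot in the same way).
    classifier = LogisticRegression(max_iter=1000).fit(X, labels)
    return vocabulary, classifier
```

One simplification to note: this vector records which visual words fire, not where they fire, whereas the description above keeps locations. Spatial variants exist (for example, pooling counts over a grid of image regions), but the final linear-classifier step is the same.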
Neural networks, on the other hand, are going to discover and learn those features automatically. Let me give you an example of that. Suppose I give you this input image and run it through a three-layer neural network before making a prediction. Typically what happens is that you learn local feature detectors, a little like SIFT, but at different levels, in different layers. And the detectors you learn detect different things, different properties of the image, at each level.

In the first layer, you might learn detectors that look kind of like these little patches, which react to things like diagonal edges. So this first detector here is all about capturing diagonal edges, the center one captures diagonal edges in the other direction, and the last one here captures transitions in color, from dark to green.

Now, if we look at the next layer, you're combining these diagonal edge detectors into more complex detectors. For example, we discover wiggly-line and pattern detectors in this layer, and we also discover detectors that react to corners in the images. And at the final layers, you come up with detectors that are even more complicated. Across a variety of images you might end up with detectors that react to torsos and faces, or maybe, if you have a bigger data set, even ones like these here, which fire on images of corals. So neural networks capture different types of image features at different layers, and those features get learned automatically.
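The lecture shows these learned detectors as pictures of filters; as a rough companion, here is a minimal sketch of a three-layer convolutional network in PyTorch. The framework choice, layer widths, and filter sizes are all my own illustrative assumptions; the lecture specifies none of them.

```python
# A minimal sketch of a three-layer convolutional network of the kind
# described above. Layer sizes are illustrative, not taken from the lecture.
import torch
import torch.nn as nn

class ThreeLayerNet(nn.Module):
    def __init__(self, num_classes=2):
        super().__init__()
        # Layer 1: small filters that, after training, tend to react to
        # edges and color transitions.
        self.layer1 = nn.Sequential(nn.Conv2d(3, 16, kernel_size=3, padding=1),
                                    nn.ReLU(), nn.MaxPool2d(2))
        # Layer 2: combines edge responses into corner and pattern detectors.
        self.layer2 = nn.Sequential(nn.Conv2d(16, 32, kernel_size=3, padding=1),
                                    nn.ReLU(), nn.MaxPool2d(2))
        # Layer 3: combines those into detectors for larger structures
        # (object parts like faces or torsos, given enough data).
        self.layer3 = nn.Sequential(nn.Conv2d(32, 64, kernel_size=3, padding=1),
                                    nn.ReLU(), nn.AdaptiveAvgPool2d(1))
        self.classify = nn.Linear(64, num_classes)

    def forward(self, x):
        x = self.layer3(self.layer2(self.layer1(x)))
        return self.classify(x.flatten(1))

net = ThreeLayerNet()
scores = net(torch.randn(1, 3, 64, 64))  # one random 64x64 RGB image -> class scores
```

Nothing in this sketch hand-specifies an edge, corner, or face detector; those roles emerge in the learned filters only once the network is trained on labeled images. [MUSIC]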