Image classification refers to a group of methods that can be used to extract information from an image in an automated way. If we look at the ground from the point of view of the sensor, we're looking down, and the sensor is dividing the ground up into individual squares or cells. For each of those cells, it records a number that represents the amount of light being reflected off that patch of ground in our study area. That's converted into an image: we have grayscale values associated with each of those cells that we can look at visually, but what does that cell really represent? A remote sensor measures the amount of light that's reflected off the ground and converts that into a number, but it doesn't really tell you what that number represents, whether it's grass or pavement or water or whatever. So you have to turn that data into information, and that can either be done manually through visual interpretation, or automatically through classification. Visual interpretation is something you have to do yourself, using your brain. Classification is a way of trying to quantify and automate that using software and methods, where you try to identify patterns in the data that allow you to extract information in a more automated way. In this section we're going to focus on the classification side of things. The goal with image classification is to automatically group cells into land cover classes. What we're hoping is that different land cover types will have different values, or different combinations or patterns of values, that we can identify as a spectral pattern in a quantifiable way, and what we want to do is then create a thematic map from that original data.
In other words, we want to take the image we get from the sensor and convert it into a new image with new values where, instead of just having numbers that represent the amount of light reflected, we can say that number one means water, number two means corn, number three means pavement, and so on; that's the thematic part of it. That gives us a way of analyzing the data in a much more useful way: if I want to measure distances from water, I can isolate all the cells that have a value of one. So this is a way of extracting that information and turning it into thematic data. In order to understand how image classification works, we have to make sure we're clear on this idea of spectral profiles and spectral signatures. Consider the amount of light reflected from different types of materials over different parts of the spectrum, for example lawn grass versus a maple leaf, versus a fir or spruce, dry grass, a certain type of rock like dolomite, or clear water versus turbid water with sediments in it. The whole idea here is that different types of materials absorb, transmit, and reflect different parts of the spectrum in different ways. What we're going to try to do with image classification is find where those differences are most apparent, and use that to mathematically isolate cells that we can then use to identify things. So let's start with a natural color image. This is for an area near Toronto called Jokers Hill; it's a scientific reserve affiliated with the University of Toronto. I've purposefully zoomed in quite a bit so you can see the individual pixels. I know that can be a little hard to look at, but I've done it on purpose so you can actually see the different types of land cover and the individual cells.
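To make the thematic idea concrete, here's a minimal sketch in Python with NumPy. The tiny raster and the class numbers (1 = water, 2 = corn, 3 = pavement) are made up for illustration, not taken from a real dataset; the point is just that once cells hold class numbers instead of reflectance values, isolating a class is a one-line mask:

```python
import numpy as np

# Hypothetical classified raster: 1 = water, 2 = corn, 3 = pavement.
# (Class numbers and labels are illustrative only.)
classified = np.array([
    [1, 1, 2, 2],
    [1, 3, 2, 2],
    [3, 3, 3, 2],
])

# Isolate all cells classified as water (value 1) as a boolean mask;
# this is the starting point for analyses like distance-from-water.
water_mask = classified == 1

print(water_mask.sum())         # number of water cells
print((classified == 2).sum())  # number of corn cells
```

The same mask works for area calculations, overlays with other layers, or any other per-class analysis.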
So this is a natural color image; in other words, I've assigned blue light to blue on the screen, green light to green, and red to red. From a combination of interpreting this visually, and because I've been there before and worked in this area, I can tell you that I know that this is water, this is forest, this is what I'm just calling meadow, this is bare soil (a farmer's field that's been turned over), and this is a crop. We can look at the same data with a different color combination. This is a false color infrared image, and we can see that we're able to extract different information visually just by using these different color combinations. Here I'm using near-infrared, green, and red light instead of red, green, and blue. This is a way for us to think about the fact that we're seeing these different color combinations, but can we somehow classify them that way? So let's look at this image again, and now I actually have cell values for these different land cover types, for different bands from Landsat 7; these are real numbers I've extracted using the software. For example, our crop has a value here, so this is the crop area here. In band one it's relatively low, band two a little lower again, band three, then band four is really high, band five is a bit lower, and band six is fairly low. The fact that it's high in band four and shows up as bright red makes sense, because I've assigned the color red to band four, so I can see that it has a high amount of reflectance in that band. I'm trying to get you to see how you can start to interpret this: what colors am I seeing on the image, and how does that relate to the amount of reflectance in the different bands? Okay, so let's keep going.
So here's our meadow, which has a different spectral profile; forest has a different one again, and so do bare soil and water. You can see that band four in particular is quite good at separating out the different types of materials: there's water, and crop is good, though I should say that bare soil, forest, and meadow have fairly similar values in band four. Bare soil, however, is quite different from the others in band three, so band three might be good for separating bare soil from the rest. I won't go through all of this, but that's the idea: you're trying to find these spectral signatures, what's different, in what band, and how can I use that to isolate things? So let's use this to do our spectral classification. I'm only going to look at bands three and four, and this is a very simplified version of how classification can be done, but really a lot of methods are based on this same idea; they're just more sophisticated statistically and mathematically. All right, so we're just going to look at two bands from our image: a red band and a near-infrared band. These are the images here, our red band here and our near-infrared band there. I'm going to make a scatter plot, which would also be referred to as feature space, that's the remote sensing lingo for it, and let's look at our different land cover types. If we took one cell of water in our red band, and the same cell in the near-infrared band, and put it on our scatter plot, this is where it would end up. That makes sense: if we look at water in bands three and four here, they're fairly low values, so it's a low value in the near-infrared and a low value in the red band. That's how I'm charting, or graphing, this.
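The feature-space idea can be sketched in a few lines of NumPy. The two toy single-band rasters and the sample cell locations below are invented for illustration; each sampled cell simply becomes a (red, near-infrared) point, which is exactly what the scatter plot shows:

```python
import numpy as np

# Two toy single-band rasters (digital numbers), same shape:
# red = band 3, nir = band 4. Values are illustrative only.
red = np.array([
    [12, 13, 45, 44],
    [11, 14, 46, 43],
])
nir = np.array([
    [ 8,  9, 120, 118],
    [ 7, 10, 122, 119],
])

# Known sample locations (row, col) for two cover types.
water_cells  = [(0, 0), (1, 0), (1, 1)]
forest_cells = [(0, 2), (0, 3), (1, 3)]

# Each cell becomes one point (red value, nir value) in feature space.
water_points  = [(red[r, c], nir[r, c]) for r, c in water_cells]
forest_points = [(red[r, c], nir[r, c]) for r, c in forest_cells]

print(water_points)   # low red, low nir: a cluster near the origin
print(forest_points)  # low-ish red, high nir: a separate cluster
```

Plotting those two lists of points would reproduce the clusters described above: water near the origin, forest higher up the near-infrared axis.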
We could do this for a bunch of different cells that we know are water, and we'll notice that they all cluster together; they all have similar values, which is what we're hoping for, that the same type of material will have the same kind of spectral response over and over again, consistently, so that we can use it for mapping. Let's try the same thing with our forest area. There's our forest cell, and this is a little bit different: you'll notice that it's low in the red and relatively high in the near-infrared. If we go over here, forest is low in the red, which is band three, and higher in band four. That's exactly what we're doing; we're just seeing it in a different way by putting it on the scatter plot. If we do the same thing, taking a bunch of different cells and plotting those individual values, we see that they're all similar. We can do the same thing for meadow, for bare soil, and for crop. So I hope what you're seeing here is that we have these patterns, or clusters, emerging for the different land cover types. We can draw a box around each of them. For example, for water, what we're getting at here is that if any cell in our image has a near-infrared value in a range between there and there, and a red value in a range between there and there, then we can probably guess that it's going to be water. Then for forest, if we have cells with values between here and here in the near-infrared, and here and here in the red, then that's going to be forest. Even from this alone, you can see that the red values are actually fairly similar between water and forest, right? But what saves us, and what allows us to distinguish them, is the near-infrared, because we're getting quite different values in the near-infrared between water and forest.
So it's the same thing for meadow, crop, and bare soil: what these boxes represent are ranges of values that you can use to essentially just reclassify the image, to say that if a cell is between this value and this value in this band, then assign all of those cells the same number, and we'll call that land cover class bare soil, water, and so on. Not to blow your minds, but you can actually do this with three bands or four bands; all you're doing is coming up with these ranges of values for each of the bands. The more bands you have, the more likely you are to be able to isolate those individual land cover types. So the result of this is that we have our input image here. This is obviously a hypothetical version, but if we look for those patterns of similar values, we can use them to classify the cells. So what I've done is take cell values that are all in a similar range, and the software has recognized that; this can be done in an automated or semi-automated way, and I'm not going to get into the different algorithms here. The result is that you end up with cells that are all assigned the same number, a more simplified version of our data that we can then use for mapping purposes. We can then assign each class a different color. Maybe all of the cells that are now ones represent water, maybe all the twos represent vegetation, or some type of crop, or whatever level of detail we're able to get. So now this is our thematic map; this is our way of being able to say, I want to analyze this in some way: how much of our land is in class one, how much is in class two, do we want to measure distances, or is that class one land zoned for a particular purpose on another map layer we're looking at? The whole idea here is to extract information.
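The box idea described above (often called a parallelepiped classifier in remote sensing) can be sketched directly. The two tiny rasters and the per-band ranges below are made-up numbers chosen to match each other, not real training statistics:

```python
import numpy as np

# Toy red (band 3) and near-infrared (band 4) rasters; values illustrative.
red = np.array([[12, 45],
                [11, 44]])
nir = np.array([[ 8, 120],
                [ 7, 119]])

# One (min, max) range per band for each class:
# class_id -> ((red_lo, red_hi), (nir_lo, nir_hi)). Hypothetical ranges.
boxes = {
    1: ((0, 20), (0, 20)),      # water: low red, low nir
    2: ((30, 60), (100, 140)),  # forest: moderate red, high nir
}

classified = np.zeros(red.shape, dtype=int)  # 0 = unclassified
for class_id, ((r_lo, r_hi), (n_lo, n_hi)) in boxes.items():
    # A cell belongs to the class only if it falls inside the box in BOTH bands.
    inside = (red >= r_lo) & (red <= r_hi) & (nir >= n_lo) & (nir <= n_hi)
    classified[inside] = class_id

print(classified)  # left column falls in the water box, right column in the forest box
```

Cells that fall in no box stay 0, which is one reason real classifiers use more sophisticated statistical rules than simple ranges.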
If we just leave our original image as it is, then all we can do is interpret it visually. But once it's classified, once we have our output here, now we have something we can work with in a more GIS way; that's data we can actually use to do analysis. So here's our natural color image of the larger area, here's a false color version of the same image, and this is a section that I've classified. I've made it semi-transparent so you can see that there is a pattern between what's been classified and the original image. So here's our study area again with our different land cover types, and here's the classified version of it. What I'm hoping you're seeing is a couple of things. One is that it's ugly looking. That's okay, I can take it, I understand that; I'll get to that in a second. But it's a simplified version of that image. It's literally been classified, or interpreted, for us. The next step from that is for us to say, okay, I think I know what those classes represent, but is that really what they represent? There are different ways to verify that: you could compare it to, say, an air photo, or you could go and do field work there. There are lots of ways to do it, but the idea is that we now have these classes and we have to make sure they are what we think they are. Part of the way to do that, and what I've done here, is, at least to begin with, to give the classes really high-contrast, bright colors that are different from one another, not because I think it looks pretty but because functionally it works better: I want to be able to easily tell what's class one, what's class two, what's class three, where those things are, and tell them apart from one another. Once I've gone through that process and done that identification, then I might give it a more visually appealing color scheme.
For now, I'm just trying to find something I can work with in order to identify what those classes might be. So here's our natural color image and our classified image. The classification process is not limited to two bands; as I showed in my example, you can use three bands, four bands, five. For example, here I might use three: a green band, a red band, and a near-infrared band. These are all images taken at the same time at different wavelengths, and we put them through a classification algorithm to identify those patterns; that's what gives us the unique combinations we can identify and use to come up with a classified image. So this is an aerial photo for this same area, and this is band two for it. I've zoomed in a little, so it's a bit more pixelated, but I want you to be able to see the differences here. This is band two, which is the green band, then the red band and the near-infrared band; here's the natural color image for that area, a false color image for that area, and here's the classified image for that area. Again, you may look at that and say, wow, that seems noisy or complicated or pixelated, what am I looking at? Remember, the legend on the lower right here, from 0 to 10, is all I have to start with. All I know is that we have one group of cells that have been identified as being similar to one another; those are in class 0, and the next ones are in class 1. As I said, there are different ways of doing this, but then we have to decide what those individual classes are. It turns out that classes 2 and 3, if I isolate them, seem to correspond fairly well to this open woodland, as I'd call it at least to begin with. You can get more specific with it, but you can see that there's a fairly good correspondence here between this open area with some trees and the more forested area there.
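Extending the box idea beyond two bands is mechanical: stack the bands into one array and require a cell to fall inside its class's range in every band. As before, the band values and ranges below are invented for illustration:

```python
import numpy as np

# Toy green, red, and near-infrared bands; values illustrative only.
green = np.array([[15, 40],
                  [14, 41]])
red   = np.array([[12, 45],
                  [11, 44]])
nir   = np.array([[ 8, 120],
                  [ 7, 119]])
stack = np.stack([green, red, nir])  # shape (n_bands, rows, cols) = (3, 2, 2)

# One (lo, hi) range per band for each class (hypothetical numbers).
boxes = {
    1: [(0, 20), (0, 20), (0, 20)],       # water: low in all three bands
    2: [(30, 60), (30, 60), (100, 140)],  # forest: high only in nir
}

classified = np.zeros(stack.shape[1:], dtype=int)  # 0 = unclassified
for class_id, ranges in boxes.items():
    inside = np.ones(stack.shape[1:], dtype=bool)
    for band, (lo, hi) in zip(stack, ranges):
        inside &= (band >= lo) & (band <= hi)  # must match in EVERY band
    classified[inside] = class_id

print(classified)  # left column classified as water, right column as forest
```

Adding a fourth or fifth band just means one more raster in the stack and one more range per class, which is why more bands tend to separate land cover types more cleanly.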
So this I could just refer to as, say, a class of forest. This is where you have to decide: is that good enough for what I want to do, do I want to go back and try to classify it again, do I want to try to separate things out in more detail? For example, would I try to get one class for this area and another for that one based on how many trees there are, the density of them? Am I happy just having one class that's crop, or do I want to try to pick out different types of crops, soybean versus corn, something like that? That can be a fairly time-consuming, complex process. So you can either go with much more general classes, like vegetation versus water, which may be more accurate because you can say, well, I know for a fact that that's all vegetation, or you can try to get more specific and more detailed: is it coniferous forest versus deciduous forest, a maple tree versus a spruce tree? The more specific you try to get, the more difficult it can be, but if you can do it, the more information you end up with at the end. So that's just an overview of image classification. I just want you to understand conceptually how it works and how it relates to things like band combinations and spectral signatures, so that in the future, when you're working with this data, you have some appreciation of what you might be able to do with it, or how you might extract information using this automated or semi-automated process of image classification.