Welcome back. This section is going to concentrate on supervised learning. In particular, if you remember, supervised learning has to do with labels: we've identified an object and we've told the computer what the object is. We're going to take that idea and use it in some applications. Here, we see an example of a person walking in front of a car. In fact, that's my graduate student Sinem, who has been helping us with some of our machine learning applications. The question we're trying to answer is: is this a person? Is it an animal? What is it that's in front of you? Machine learning is very good at these kinds of classification problems. It can use lots of data, thousands or even hundreds of thousands of labeled pictures of individuals, and from those it learns the characteristics of those individuals.

So in this context, what we do is look for the features that identify the object in front of you as a person rather than, say, a deer. What are its characteristics? We've found that in machine learning, as opposed to human learning, we need many different features, and those features are the essence of this exercise: what are the features whose characteristics tend to identify what this object is? The labels can be binary (cancer or non-cancer), they can be a spectrum (say, the risk of default on a loan), or they can be counts, integer-valued but not binary. So again, remember that we have this notion of labels: we're going to have to identify the labels, and we're going to need enough data to do this exercise.

Let's go back a bit and look at the traditional statistical approach, and let's compare it against a machine learning exercise of a similar type. Here we see a chart of two variables: your systolic blood pressure and your diastolic blood pressure. The systolic blood pressure is the high number and the diastolic is the low number. This is the older approach, which basically says that if these numbers are low enough, you're in the normal category; if they're elevated a bit, you're at slightly higher risk; and as they get higher and higher, you move into the red zone, which is the highest risk. This is traditional statistics: it takes a small amount of data and tries to use a small, interpretable set of variables for classification. (A small rule-based sketch of this chart appears just after this passage.)

Now, let's compare that against a technique that was developed by a colleague of mine about 25 years ago, John Tukey. He looked at breast cancer X-rays and tried to understand the characteristics that distinguish women who have breast cancer from those who don't. Here we have a very simple, stylized example with two features. It turns out that computers can use many different features, so instead of a graph with simply two characteristics, as we had before with low, medium, and high, we have in effect a series of multiple features, and we're looking at them in multiple dimensions. What we want to do is find separators: a line that goes between the groups. If you look at this chart, the red dots would be, let's say, the women who have cancer, and the green dots are the ones who don't. So by looking at many millions of X-rays, we can identify the characteristics of the features that lend themselves to separation.
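To make the contrast concrete, here is the rule-based sketch promised above. It mimics the traditional blood-pressure chart with hand-set thresholds; the cutoffs are illustrative placeholders, not clinical guidance.

```python
# A rule-based sketch of the traditional blood-pressure chart.
# The thresholds are illustrative placeholders, not clinical guidance.

def classify_blood_pressure(systolic: float, diastolic: float) -> str:
    """Map a (systolic, diastolic) reading to a hand-set risk category."""
    if systolic < 120 and diastolic < 80:
        return "normal"
    elif systolic < 140 and diastolic < 90:
        return "elevated"
    else:
        return "red zone"

print(classify_blood_pressure(115, 75))  # normal
print(classify_blood_pressure(155, 95))  # red zone
```

Notice that a person, not the data, chose every threshold; that is the defining trait of the traditional approach.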
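The machine-learning counterpart looks quite different: a minimal Support Vector Machine sketch that learns the separating line from labeled examples instead of hand-set thresholds. This assumes scikit-learn, with synthetic points standing in for the two stylized X-ray features.

```python
# A minimal SVM sketch: learn a separating line from labeled data.
# Assumes scikit-learn; the two features and the data are synthetic stand-ins.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Two clusters in feature space: "red" cases (label 1) and "green" (label 0).
X = np.vstack([
    rng.normal(loc=[2.0, 2.0], scale=0.5, size=(50, 2)),  # red dots
    rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2)),  # green dots
])
y = np.array([1] * 50 + [0] * 50)

# A linear kernel searches for the separating line with the widest margin.
clf = SVC(kernel="linear")
clf.fit(X, y)

# The learned separator: w . x + b = 0
w, b = clf.coef_[0], clf.intercept_[0]
print(f"separator: {w[0]:.2f}*x1 + {w[1]:.2f}*x2 + {b:.2f} = 0")
print("prediction for a new point (1.8, 1.9):", clf.predict([[1.8, 1.9]])[0])
```

Here the line comes from the data itself: feed in different labeled points and the separator moves, with nothing to re-tune by hand.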
The goal of this technique, which is called a Support Vector Machine, is not to make assumptions about what the characteristics are, but simply to learn the features that separate people with one characteristic from those with another. Of course, we need adequate data for this type of exercise.

Another example is to look not just at the features of a heart attack, but to trace who is having a heart attack and under what conditions. With modern phones, or even watches like the Apple Watch or Fitbit, we can identify who is having a heart attack and who is not. Indeed, we now see apps that transmit this information to an emergency responder, who can come out immediately and try to help the person. We also see networks where individuals can see who is having a heart attack: if you sign into the app, you can see who's on the street and, in particular, where the nearest defibrillator is. So here's a picture of the defibrillators relative to a person who recently had a heart attack. This is the kind of setting where we can use this information in context.

When we look at other examples, like ZIP code digits, we see ten possible outcomes, zero through nine, and we want to identify the characteristics of this style of handwriting. Now, humans would identify a digit immediately, right? What's the number between three and five? We'd say, well, that's a four. We identify that very easily. But computers think a different way; remember, they don't think the way humans do. In this context, there are 256 features, one per pixel in a 16-by-16 image, that identify a ZIP code digit. The classifier uses those 256 features and can identify what the digit is. The most reliable methods in the world work very differently from what we've seen in the context of humans. (A small sketch of this kind of classifier appears at the end of this section.)

So let's now summarize supervised learning, and let's say something about traditional statistics. In the old days we had limited data, so we had to make assumptions about the cause and effect between the explanatory variables and the variable of interest we were trying to forecast, and we'd then take the results and try to use them for understanding. Contrast that with machine learning, where we now have large amounts of data: we don't make the cause-and-effect assumptions, we look at the data and try to find what predicts a particular characteristic without making large numbers of assumptions. Now, we do have to worry about the issue of sunspots, variables that appear to move with the results but aren't causal and don't really drive them. We're going to have to worry about that later in the course, and we're going to spend a lot of time on it. To summarize: feature selection is critical. Later, we're going to worry about how to assess reliability and robustness, which will be an important aspect of this exercise. Again, we want to evaluate our techniques, and we'll see that we'll need specialized methods to make sure that the techniques we develop are reliable and robust.
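To close, here is a minimal sketch of that digit-classification idea. scikit-learn's bundled digits are 8-by-8 images (64 pixel features) rather than the 16-by-16 ZIP-code scans (256 features) described above, but the mechanics are the same: the model sees only a flat vector of pixel intensities, never a "four" the way a human does.

```python
# A minimal digit-classification sketch on pixel features.
# scikit-learn's bundled digits are 8x8 (64 features); the ZIP-code scans
# discussed above are 16x16 (256 features), but the idea is identical.
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

digits = load_digits()  # ten labels: the digits 0 through 9

X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0
)

# One weight per pixel feature; max_iter raised so the solver converges.
clf = LogisticRegression(max_iter=5000)
clf.fit(X_train, y_train)

print(f"{X_train.shape[1]} pixel features per image")
print(f"held-out accuracy: {clf.score(X_test, y_test):.3f}")
```

Scoring on held-out images, rather than on the training data, is a first taste of the reliability and robustness checks we'll develop later in the course.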