[MUSIC] Okay, so how do we set this up? Imagine you had five documents that were just represented by these simple phrases here. So document 1 is just the phrase, Romeo and Juliet. Document 2 is, Juliet, O happy dagger! Document 3 is, Romeo died by the dagger, and so on. Well, with some stemming and some normalization of the terms, we could associate a column of data with each one of these terms. And each document is a row in this table, or matrix, and we put a count in each position for the number of times that term appears in the document. Again, you've seen this in a couple of different assignments now. Okay, so given this basic setup, you can think about a supervised learning version of the problem, or an unsupervised learning version, in different contexts. So the supervised version is, you have a corpus of documents, and a human has assigned a label to each document in the corpus. And so you want to come up with some sort of model that can predict that label given the text in the document. The unsupervised learning version of this problem is that no labels are given, but you want to discover groups of similar documents. So you want to say, look, all of these things have something in common. Maybe we don't understand that it's about sports, but we know that they all have something in common. And these over here, we don't know it's about science, but we know they all have something in common. Okay, and so that would be an unsupervised learning problem; in particular, it would be sort of a clustering problem. All right, so in all these cases there are going to be three core components to think about, and this again comes from Pedro's article. They are representation, evaluation, and optimization. All right, and so every problem you come to is going to have something to say about all three of these, okay. Representation is, what exactly is your classifier? Is it a rule like the ones we saw with the golf example?
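The setup above can be sketched in a few lines of Python. This is a minimal, hypothetical illustration using the first three example phrases: normalization here is just lowercasing and stripping punctuation, and real stemming (e.g., "died" → "die") would need a stemmer such as NLTK's PorterStemmer.

```python
from collections import Counter

# Three tiny "documents" from the example above.
docs = [
    "Romeo and Juliet",
    "Juliet O happy dagger!",
    "Romeo died by the dagger",
]

def tokenize(text):
    # Crude normalization: split on whitespace, strip punctuation, lowercase.
    return [w.strip(".,!?").lower() for w in text.split()]

# One column per distinct term, one row per document, counts in each cell.
vocab = sorted({term for d in docs for term in tokenize(d)})
matrix = [[Counter(tokenize(d))[term] for term in vocab] for d in docs]

for row in matrix:
    print(row)
```

Each row is the term-count vector for one document, which is exactly the table the lecture describes.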
A simple hypothesis: maybe when it's sunny, we always play golf, right? The space of all possible such hypotheses, all possible such rules, is the representation of this classifier, okay, or each one of those rules is perhaps an instance of the representation. Or is it the space of all possible neural networks that map inputs to outputs, right? Or is it a decision tree, which we'll talk about in a little bit? Or, if it's more numerical data, if you've got a 2D scatterplot and you try to draw a line that precisely separates the data, such that all the positive examples are on one side of the line and all the negative examples are on the other side, the space of all possible such lines is the representation here, okay. All right, so what are the nuts and bolts, what are the units that you're talking about? Okay, fine. So, once you have that figured out, you can think about, well, how do I judge whether one particular instance of this representation is effective or not? How do I know the good ones from the bad ones? So, for example, with the rules about when you're trying to play golf: I've got one rule that says when it's sunny we play golf, and another that says when it's rainy and windy we don't. Those are two different rules. How do I judge which one's better? You have to make some sort of a choice about how you're going to evaluate this thing. And it could be just the number of errors it makes on some given test set. Or you could define some notion of precision and recall. Or, if it's more of a regression, numerical case, you could talk about the absolute error or the squared error or other variants. Right, but you need to make some sort of a decision here. So now we have a representation, a space of possible units, and now we have an evaluation method selected, where we can take a given instance and determine whether it's any good or not, fine. The third piece here is optimization.
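The evaluation choices just mentioned can be made concrete. Here is a small sketch that computes the raw error count and precision/recall for a classifier's predictions against true labels; the label vectors are made up purely for illustration.

```python
def evaluate(y_true, y_pred):
    # Number of errors: positions where the prediction disagrees with the label.
    errors = sum(t != p for t, p in zip(y_true, y_pred))
    # True positives: predicted 1 and actually 1.
    tp = sum(t == p == 1 for t, p in zip(y_true, y_pred))
    predicted_pos = sum(y_pred)
    actual_pos = sum(y_true)
    # Precision: of everything we predicted positive, how much was right?
    precision = tp / predicted_pos if predicted_pos else 0.0
    # Recall: of everything actually positive, how much did we find?
    recall = tp / actual_pos if actual_pos else 0.0
    return errors, precision, recall

# Hypothetical labels and predictions.
y_true = [1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 1]
print(evaluate(y_true, y_pred))
```

Swapping this function for a squared-error loss is what you'd do in the regression case the lecture mentions; the point is that you must pick one such evaluation before you can compare hypotheses.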
So how do you search among this potentially massive, or even infinite, space of possible hypotheses? Okay, so in pretty much no case is it going to be feasible to enumerate all possible instances and apply your evaluation technique to find the best one. So you're going to have to have some method of searching through this efficiently. Okay. Now, what's useful, I think, about this breakdown is that it helps make sense of the various terms and methods you'll hear if you sort of scan the machine learning literature. Sometimes what's being presented is just an optimization technique that can apply to a variety of different representations and even evaluations, and you can plug in your own evaluation and plug in your own representation and it'll still work. Other times you're talking about a representation that is amenable to a variety of different optimization strategies or particular algorithms, okay. So it helps to understand which one of these three things you're talking about, or whether you're talking about something that really denotes a choice about all three. Okay. And as we talk about methods, we'll try to refer back to these three components and describe what's going on in each case. [MUSIC]
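The "optimization as search" idea can be sketched on a toy case where enumeration actually is feasible. Here the hypothesis space is all thresholds t for a 1D rule "predict 1 if x > t"; for realistic representations this space is far too large, which is why smarter search (greedy splitting, gradient descent, and so on) is needed. The data points are invented for illustration.

```python
# Toy 1D data: small x values are negative, large ones positive.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0, 0, 1, 1, 1]

def errors(t):
    # Evaluation: count misclassifications of the rule "predict 1 if x > t".
    return sum((x > t) != bool(y) for x, y in zip(xs, ys))

# Optimization by brute force: try each data value as a candidate threshold
# and keep the one with the fewest errors.
candidates = sorted(xs)
best_t = min(candidates, key=errors)
print(best_t, errors(best_t))
```

The representation (threshold rules), the evaluation (error count), and the optimization (exhaustive search) are each a separate, swappable choice, which is exactly the three-way breakdown the lecture describes.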