Okay, in this video we're going to teach you about the method of maximum likelihood for estimating a statistical parameter. Why is this important? Because in the next video we're going to teach you how to rate sports teams. You could also rate tennis players or chess players, which is where the idea originally came from: rating competitors based on wins and losses, not the scores. So this would be good for tennis matches, if you had the data. It began with chess players, I believe, with the work of the great Arpad Elo. You can look up Arpad Elo on Wikipedia, or anywhere, and find his books on how to rate chess players. Okay, but let's concentrate on what the method of maximum likelihood is. Suppose you watched Dwight Howard shoot free throws (a painful experience). He shoots one hundred free throws and makes forty-eight. Based on this information alone, what would you estimate his free throw shooting percentage to be? You'd probably say 48%. Well, what is the statistical justification for this? You estimate the parameter here, P, the chance Dwight Howard makes a free throw, as whatever value of P would maximize the likelihood of what you saw. Now, if you let P be the chance Dwight Howard makes a free throw, then by the binomial distribution, which you've probably had, you take the number of ways you could pick 48 successes (made free throws) out of 100 attempts. There's a function for that, which counts the ways to choose 48 out of 100, but that part's not important. Then you need the probability. Let's let P be that probability; to start, let's try 52%. Then we want the chance that he would make 48 out of 100; there's a function, BINOM.DIST, that we'll talk about later. Basically, you would take the probability of one way of making 48 out of 100. I'm going to name this cell PROB. One way of making 48 out of 100 would be to make the first 48 and miss the next 52.
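The binomial probability described here can be sketched outside the spreadsheet as well. Excel's COMBIN and BINOM.DIST are the functions the video refers to; the Python version below is just an illustration of the same calculation:

```python
from math import comb

# Numbers from the example: 48 makes in 100 attempts,
# evaluated at a trial value of p = 0.52.
n, k, p = 100, 48, 0.52

# One particular sequence: make the first 48, miss the last 52.
one_way = p**k * (1 - p)**(n - k)

# Multiply by the number of ways to choose which 48 were made
# (this is what Excel's COMBIN(100, 48) counts).
likelihood = comb(n, k) * one_way
print(likelihood)  # probability of exactly 48 makes out of 100
```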
And the probability of that happening would be the probability to the 48th power, times the probability he would miss to the 52nd power. Then you'd multiply that by the number of ways you could pick the 48 free throws he made out of 100, and that is combinations. Don't worry too much if you don't know this; it's not important. So you'd multiply the combinations times the probability to the 48th, times one minus the probability to the 52nd, and you'd like to pick the probability that maximizes that. Well, this is sort of a hard thing to maximize. So if I put FORMULATEXT in here: okay, this part is the constant, the combinations, which we don't care about. But if you want to maximize a probability like this, which is the product of something to the 48th power and something to the 52nd power, you should maximize the log of the probability, what's called the log likelihood. That's because maximizing the log of something is the same as maximizing that something, if it's a positive number. Okay? I think you remember some rules of logarithms: the logarithm of a product is the sum of the logarithms, and the logarithm of something raised to a power is that power times the logarithm of the base. So, ignoring the constant, we want to maximize the probability to the 48th times one minus the probability to the 52nd. The log of that is the log of the probability to the 48th, plus the log of one minus the probability to the 52nd, and that is 48*LN(prob) + 52*LN(1-prob). Okay, so that's what we want to maximize. We want to pick a probability to maximize that, and we can use the Solver. So the log likelihood would be 48 times the logarithm of the probability, plus 52 times the logarithm of one minus the probability, and we should pick the probability to maximize that. See, that's a much more tractable number. If he shot a thousand free throws, the raw likelihood would be very close to zero, so maximizing the log likelihood turns out to be much easier for the computer.
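The log-likelihood function just derived, 48*LN(prob) + 52*LN(1-prob), is easy to check outside of Excel too. Here's a minimal Python sketch, where a brute-force grid search stands in for Excel's Solver (the grid approach is my simplification, not what the course uses):

```python
from math import log

def log_likelihood(p, makes=48, misses=52):
    # 48*ln(p) + 52*ln(1 - p); the constant combinations term is
    # dropped because it doesn't affect where the maximum is.
    return makes * log(p) + misses * log(1 - p)

# Search p over a fine grid between 0.01 and 0.99,
# a crude stand-in for Excel's Solver.
candidates = [i / 1000 for i in range(10, 991)]
best = max(candidates, key=log_likelihood)
print(best)  # 0.48
```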
So with the Solver, all I would say is: pick a probability that maximizes the log likelihood. And I should add the constraint that the probability is between, let's say, 0.01 and 0.99, because the log at zero is undefined, and you can run into problems with that. So we'll say greater than or equal to 0.01. And we get 0.48. You could actually prove this mathematically if you know calculus: take the derivative, which is 48/prob - 52/(1-prob), set it to zero, and if you solve for prob, you get 0.48. If you don't know calculus, don't worry; I'm certainly not expecting you to. But that principle of maximum likelihood is going to be key for us to figure out, for instance, the chance of making a field goal based on the length of the field goal, and for rating teams based on wins and losses, which we'll get to in the next video. What we'll do is pick ratings of teams that basically maximize the probability of the sequence of wins and losses that we observe in the NFL. And if there's a tie, we can count it as one win and one loss. So we'll show you how to do that in the next video. Basically, that's a form of what's called logistic regression. We've done ordinary regression, but when what you're trying to predict, the dependent variable, is binary, meaning it has two outcomes like win or loss, or make a free throw or not, you cannot use ordinary multiple regression. I'll refer you, if you're interested, to the marketing analytics book for more discussion of this, but I don't want to get bogged down. Whenever you're trying to predict something where the dependent variable has two outcomes, like success or failure (they subscribe to ESPN Insider or they don't, they make a field goal or they don't, they win or lose a game), then basically you need to use logistic regression.
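The calculus argument mentioned here can be verified numerically. This small check (my own addition, not from the video) confirms that the derivative 48/prob - 52/(1-prob) vanishes exactly at the observed fraction of makes:

```python
# Setting the derivative of the log likelihood to zero:
#   48/p - 52/(1 - p) = 0  =>  48*(1 - p) = 52*p  =>  p = 48/100
makes, misses = 48, 52
p_hat = makes / (makes + misses)
print(p_hat)  # 0.48

# The derivative really is (numerically) zero at p_hat.
derivative = makes / p_hat - misses / (1 - p_hat)
print(abs(derivative) < 1e-9)  # True
```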
It's an important tool in any data scientist's analytics toolkit, and we can introduce it to you in terms of trying to figure out team ratings based on wins and losses. In fact, the BCS forced the computers to rate teams this way, not letting them use the scores of the games, which is sort of stupid.
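To preview the team-rating idea, here's a tiny sketch of maximum-likelihood ratings from wins and losses. Everything here is my own illustration under assumptions: the logistic win-probability model, the made-up games, and the gradient-ascent loop (a stand-in for Solver) are not from the video, which builds the real version in Excel.

```python
from math import exp

# Toy data: each tuple is (winner, loser). Teams and games are made up.
games = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "B")]
teams = {"A": 0.0, "B": 0.0, "C": 0.0}

def win_prob(r_winner, r_loser):
    # Logistic model: probability the higher-rated side wins
    return 1 / (1 + exp(-(r_winner - r_loser)))

# Gradient ascent on the log likelihood of the observed results:
# each game nudges the winner's rating up and the loser's down.
for _ in range(2000):
    grad = {t: 0.0 for t in teams}
    for w, l in games:
        p = win_prob(teams[w], teams[l])
        grad[w] += 1 - p   # winner gains more when the win was surprising
        grad[l] -= 1 - p
    for t in teams:
        teams[t] += 0.01 * grad[t]

print(sorted(teams, key=teams.get, reverse=True))  # 'A' comes out on top
```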