So, welcome back. The final thing we're going to do in today's lecture is to talk about the profile likelihood, which is a method for creating univariate likelihoods from multivariate likelihoods. In this case, we're going to look at the normal distribution, which has two parameters, mu and sigma, and we're going to figure out how to get a likelihood for mu alone; you could equivalently do it to get a likelihood for sigma alone. And here's the idea. The joint likelihood is a bivariate surface: it has mu on one axis, sigma on the other axis, and the surface above them. To obtain a likelihood for mu, profiling is basically like imagining we took a lamp, shined it along the sigma direction, and looked at the shadow that the likelihood casts on the plane defined by the mu direction. That's exactly what it gives, so its name is exactly indicative of the technique. Now let's go through how you actually execute the mathematics to do that. In other words, we want to shine the light on this bivariate likelihood and get the function that the shadow traces out on, say, the wall. So let's pick a particular value, mu 0, and ask: what's the value of the curve in the shadow at mu 0? Well, the light will pass through all values above the likelihood and get stopped anywhere at or below it, up to the maximum value. So what we basically do is maximize the joint likelihood over sigma with mu fixed at mu 0, and then repeat this process for lots of values of mu 0. So let's actually go through it. The joint likelihood with mu fixed at mu 0 is just the Gaussian density, and since we have independent data, we take a product out front. So it's sigma squared to the minus one half, e to the minus xi minus mu 0 squared divided by 2 sigma squared, and if you collect all the terms you get the next line.
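Written out, the "collect all the terms" step described above is (dropping the 2 pi constant, which doesn't involve the parameters):

$$
\mathcal{L}(\sigma^2 \mid \mu_0) \;=\; \prod_{i=1}^{n} (\sigma^2)^{-1/2}\, e^{-(x_i - \mu_0)^2 / (2\sigma^2)} \;\propto\; (\sigma^2)^{-n/2} \exp\!\left(-\frac{\sum_{i=1}^{n}(x_i - \mu_0)^2}{2\sigma^2}\right)
$$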
With mu 0 fixed, then, what's the maximum likelihood estimator for sigma squared? You can go through this: log the likelihood, take derivatives, solve for sigma squared. Maybe, just so you don't accidentally take the derivative with respect to sigma, replace sigma squared by, say, theta, so that you remember you're treating sigma squared as the parameter, not sigma. If you accidentally take derivatives with respect to sigma, you'll get the square root of this answer, of course. So you wind up with the summation from i equals 1 to n of xi minus mu 0 squared, divided by n. That's actually a nice result, right? If you fix mu at a particular value, then your MLE for sigma squared is like the sample variance, but instead of plugging in the sample mean and taking deviations around the sample mean, you're taking deviations around that specific value of the mean. So it's a nice little result. Anyway, with mu 0 fixed, the maximum likelihood estimator for sigma squared is this generalization of the variance. And that's the peak of our likelihood, all right? That's the point where the light switches from being blocked by the likelihood to passing just over it, and that's the point that gets shadowed onto the wall at mu 0. So we want to plug this back into the likelihood, and we get this function right here: the summation of xi minus mu 0 squared over n, raised to the minus n over 2 power, times e to the minus n over 2. And this e to the minus n over 2 is irrelevant because it doesn't involve mu 0. So that's for one mu 0, and if we did that for every mu 0, we would get a function. And here it is: our profile likelihood is the summation of xi minus mu squared, raised to the minus n over 2 power. That function is our profile likelihood. And again, this function is clearly maximized at mu equals x bar; you can, of course, solve for it.
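As a quick numerical check of this derivation, here is a minimal Python sketch (the lecture's own code is in R). The data values are hand-typed here so the sketch is self-contained; I believe they are the group-2 values of R's built-in sleep data, but treat them as illustrative. The closed-form estimate of sigma squared at a fixed mu 0 should beat every other value on a grid:

```python
import math

# Illustrative data (hand-typed; believed to be group 2 of R's sleep data).
x = [1.9, 0.8, 1.1, 0.1, -0.1, 4.4, 5.5, 1.6, 4.6, 3.4]
n = len(x)
mu0 = 1.0  # the fixed value of mu we are profiling at

def log_lik(sigma2):
    """Gaussian log likelihood in sigma squared, with mu fixed at mu0."""
    return -n / 2 * math.log(sigma2) - sum((xi - mu0) ** 2 for xi in x) / (2 * sigma2)

# Closed-form maximizer: deviations around mu0 rather than around x bar.
sigma2_hat = sum((xi - mu0) ** 2 for xi in x) / n
print(sigma2_hat)  # about 5.38 for this data and mu0

# The closed form should dominate every value on a crude grid.
grid = [0.1 * k for k in range(1, 200)]
assert all(log_lik(sigma2_hat) >= log_lik(s) for s in grid)
```

The same check with a different mu0 gives a different sigma2_hat, which is exactly the profiling step: one maximization over sigma for each fixed value of mu.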
But in general, one nice property of the profile likelihood is that its maximizer — the maximum profile likelihood estimate — is also your MLE for that parameter. So in this case, the maximizer of the profile likelihood for mu is going to be x bar, the same as the MLE from the full bivariate likelihood. If we wanted to normalize this function so it tops out at 1, we would simply divide it by its peak value — the same expression with x bar plugged in for mu. So let's actually go through the R code to generate this function for mu for the sleep data. For muVals, we're going to go from, say, zero to three and take a thousand of them, and for each of those thousand mu 0 values we compute a likelihood value. That value is just the sum of xi minus mu squared, raised to the minus n over 2 power — that's this term right here. But I want it maxed out at one, so normally I would create the likelihood and then divide by its maximum value. In this case, though, I know exactly where the maximum occurs: it's when you replace mu by the sample mean. So instead I divide by the same expression with the mean plugged in, right here. The sapply is just a loop: it says loop over the mu values and apply this function. Then I plot them, connecting the points with type equals "l", I add horizontal reference lines at 1/8 and 1/16, and I get this plot. So that is my profile likelihood for mu. That is the function I get if I take the bivariate likelihood for mu and sigma, place a light along the direction of the sigma axis, and look at the shadow on the wall — this is the outline of that shadow. And that's called the profile likelihood. There are many theoretical properties of the profile likelihood, but most importantly, you can kind of treat it as if it were a standard univariate likelihood.
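The computation described above can be sketched in Python as follows (the lecture uses R's sapply and plot; this sketch just computes the normalized profile likelihood values, and reuses the same hand-typed sleep-style data as before):

```python
# Profile likelihood for mu, normalized so it tops out at 1 at mu = x bar.
x = [1.9, 0.8, 1.1, 0.1, -0.1, 4.4, 5.5, 1.6, 4.6, 3.4]  # illustrative data
n = len(x)
xbar = sum(x) / n

def profile_lik(mu):
    # (sum (xi - mu)^2)^(-n/2), divided by the same expression at x bar,
    # which is the known peak value, so the curve tops out at exactly 1.
    num = sum((xi - mu) ** 2 for xi in x) ** (-n / 2)
    den = sum((xi - xbar) ** 2 for xi in x) ** (-n / 2)
    return num / den

mu_vals = [i * 3 / 1000 for i in range(1001)]   # grid from 0 to 3
lik_vals = [profile_lik(mu) for mu in mu_vals]  # the curve to plot

# The peak is at x bar, where the normalized likelihood equals 1.
assert abs(profile_lik(xbar) - 1.0) < 1e-12
assert max(lik_vals) <= 1.0 + 1e-12
```

Plotting mu_vals against lik_vals and drawing horizontal lines at 1/8 and 1/16 reproduces the picture described in the lecture.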
So, you would treat this just like a regular likelihood for mu: higher values are better supported, and the peak is where the maximum likelihood estimate occurs. And you could draw horizontal lines to get likelihood-based intervals for mu. Well, that's the end of today's lecture. We gave you many ways to create confidence intervals. We gave you methods for creating t confidence intervals. We gave you a method for creating a confidence interval for a variance — maybe not the most useful one, but we did it. We also showed you several neat techniques, well known in statistics circles but not generally well known elsewhere, for generating likelihoods when you have Gaussian data. All of these techniques you could use in practice: if you have data and you're willing to assume it's Gaussian, all of them apply. The t confidence interval in particular is a very robust interval; as long as your data looks roughly mound-shaped, you're going to be okay. The last thing I'll mention is a question I always get about the t confidence interval. The t confidence interval and the standard normal confidence interval look the same, except with the t quantile in place of the standard normal quantile. And people always ask me: at what sample size do I switch from the t confidence interval to a standard normal confidence interval? But the point is that the t confidence interval limits to the standard normal confidence interval. So my answer is: always do a t confidence interval, and just never do a standard normal confidence interval. Then you don't even have to worry about it, because if your sample size is big enough, that t quantile looks like a normal quantile anyway. So hopefully that answers the question. And I look forward to seeing you next time, when we'll expand on confidence intervals to more general settings where we have multiple groups.
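That last point — the t quantile converging to the normal quantile as the sample size grows — is easy to see numerically. A small Python sketch, assuming SciPy is available:

```python
# As the degrees of freedom grow, the t quantile shrinks toward the
# standard normal quantile, which is why "always use the t interval"
# costs essentially nothing for large samples.
from scipy.stats import norm, t

z = norm.ppf(0.975)  # about 1.96, the familiar normal quantile
for df in (5, 30, 100, 1000):
    print(df, t.ppf(0.975, df))  # decreases toward z as df increases

assert abs(t.ppf(0.975, 1000) - z) < 0.005
```

At small degrees of freedom the t quantile is noticeably wider, which is exactly the extra caution the t interval buys you.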