Okay, let's talk about residuals.

Recall that we're talking about linear regression so we have some data that

we can show on a scatter plot and we fit a line through it.

Remember, last lecture we talked about the fitted values: the y value on the line associated with a particular observed x value.


Okay, so another useful concept is the vertical distance between the observed point and the fitted value on the line.

So if we take y and subtract off y hat, which we can just write as y − β̂₀ − β̂₁x, we get what are called the residuals. And while that formula is specific to linear regression, the general definition, y − ŷ, will be applicable throughout the rest of the semester.
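As a minimal sketch of this definition, here is how the residuals fall out of a fitted line in NumPy; the data, seed, and coefficients below are made up purely for illustration.

```python
import numpy as np

# Hypothetical data: x values and noisy observed y values.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 + 1.5 * x + rng.normal(scale=1.0, size=x.size)

# Fit a line by least squares; polyfit returns [slope, intercept].
b1_hat, b0_hat = np.polyfit(x, y, deg=1)

# Fitted values and residuals: e = y - y_hat = y - (b0_hat + b1_hat * x).
y_hat = b0_hat + b1_hat * x
residuals = y - y_hat

# With an intercept in the model, least-squares residuals sum to
# (essentially) zero, a useful sanity check.
print(round(residuals.sum(), 6))
```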

So just like we can think of our least squares criterion as trying to minimize the sum of the squared vertical distances between the observed points and the fitted values, we can also think of least squares as trying to minimize the sum of the squared residuals.
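To see the minimization claim concretely, here is a small check, again on made-up data, that the least-squares coefficients give a smaller sum of squared residuals than nearby perturbed coefficients do.

```python
import numpy as np

# Hypothetical data for illustration.
rng = np.random.default_rng(1)
x = np.linspace(0, 10, 40)
y = 1.0 + 0.8 * x + rng.normal(scale=0.5, size=x.size)

def ssr(b0, b1):
    """Sum of squared residuals for a candidate intercept b0 and slope b1."""
    return np.sum((y - (b0 + b1 * x)) ** 2)

# Least-squares estimates from polyfit ([slope, intercept]).
b1_hat, b0_hat = np.polyfit(x, y, deg=1)

# Perturbing either coefficient away from the least-squares estimate
# strictly increases the sum of squared residuals.
print(ssr(b0_hat, b1_hat) < ssr(b0_hat + 0.1, b1_hat))
print(ssr(b0_hat, b1_hat) < ssr(b0_hat, b1_hat + 0.05))
```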

So residuals will tell us a lot about the fit, and

in fact there are many reasons why we would want to look at residuals.

And I think probably most of you have seen these sorts of things before: if there is some obvious failure of model fit when we fit the model, looking at the residuals will really home in on it.

So if we look at the scatter plot by itself, we include a lot of space that's sort of irrelevant. When you plot the residuals instead, you can focus in just on the most important part of the plot: the deviations from the fit.
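One way this shows up numerically: if we fit a straight line to data that actually follow a curve, the leftover curvature survives in the residuals. The quadratic data below are invented for illustration; with an intercept fit, the residuals are uncorrelated with x, yet they still track x squared.

```python
import numpy as np

# Hypothetical data generated from a curved (quadratic) relationship.
rng = np.random.default_rng(2)
x = np.linspace(-3, 3, 60)
y = 1.0 + 0.5 * x + 0.7 * x**2 + rng.normal(scale=0.3, size=x.size)

# Fit a straight line anyway, then form the residuals.
b1_hat, b0_hat = np.polyfit(x, y, deg=1)
residuals = y - (b0_hat + b1_hat * x)

# The residuals still carry the quadratic shape the line missed:
# they correlate strongly with x squared.
corr_with_x2 = np.corrcoef(residuals, x**2)[0, 1]
print(round(corr_with_x2, 2))
```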

Residuals will also be what we use to define and

obtain an estimate of our variance.
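Anticipating that later material, a common variance estimate built from the residuals is the sum of squared residuals divided by n minus 2; the simulated data and true sigma here are made up so we can compare against the estimate.

```python
import numpy as np

# Hypothetical data with known error standard deviation.
rng = np.random.default_rng(3)
x = np.linspace(0, 5, 100)
sigma_true = 0.8
y = 2.0 + 1.2 * x + rng.normal(scale=sigma_true, size=x.size)

# Fit the line and form the residuals.
b1_hat, b0_hat = np.polyfit(x, y, deg=1)
residuals = y - (b0_hat + b1_hat * x)

# Variance estimate: SSR / (n - 2), since the simple linear model
# uses up two degrees of freedom (intercept and slope).
n = x.size
sigma2_hat = np.sum(residuals**2) / (n - 2)
print(round(sigma2_hat, 3))
```

With enough data, sigma2_hat should land near the true error variance (0.64 here), which is the kind of inferential use of residuals covered later in the course.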

We're gonna cover more of this as we get to the more inferential components of

regression, but we're gonna first cover multi-variable regression before we do it.