0:01

I'd like to talk about a different way to think about these squares

that's very useful for conceptualizing what it's doing.

So, imagine our Y is a 3 x 1 vector.

And our x is a 3 x 2 matrix, so we only

have three data points and we're going to try to explain them with two predictors.

I agree, not a terribly useful scenario, but

because I can only draw this in three dimensions, that's the best we can do.

But hopefully you can use it to reorganize how you think about least squares.

And we want to minimize ||y - x beta||^2.

So let me define Gamma to be the space of all x beta such that beta is in R^2.

So what we want to do is minimize ||y - y-hat||^2 over the collection of y-hats that live in Gamma.

So let's draw a picture.

Here's our three dimensions for our three data points,

and our point y is a vector in this space.

1:44

Gamma is indexed by a two-dimensional parameter.

So it is a two-dimensional space, we're drawing it in three-dimensional space, and it's linear, so it's going to be a plane.

So this looks like a plane.

2:00

And that's gamma.

So, what is y-hat?

So, actually, let me rewrite this earlier statement a little bit as minimizing ||y - z||^2 for z in Gamma.

And let y-hat be the actual minimizer of that equation.

Okay, so what is y-hat?

Well, it's the point that minimizes the distance between y and

points in this plane.

So it's the point in this plane that's minimally distant to y.

So it's the orthogonal projection of y onto the plane.

So let's draw it like that.

Here's our orthogonal projection and there's y-hat.

So y-hat is the point in gamma that is minimum distance to y.

2:54

So, thinking about it this way really helps us think about different aspects of how we apply multivariable regression.

For one thing, notice what happens if we include a redundant column in x. Say x had the columns x1 and x2, and then the completely redundant column x1 + x2, okay?

We'd see that the rank of that matrix is of course still 2.

But notice the space for this new matrix. Let's call it x tilde, since x prime would look like a transpose, okay.

So the space Gamma tilde is the set of x tilde beta such that beta is in R^3.

3:51

That space, as you can prove to yourself (though I think it's pretty easy to see), is identically equal to Gamma.

Because the collection of all possible linear combinations of x1 and x2 is the same thing as the collection of all possible linear combinations of x1, x2, and x1 + x2.

So what we see is that we don't actually need an invertible matrix to do regression; we just need the space defined by a linearly independent subset of the collection of columns of x.

That's an important point.
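To make that concrete, here's a small NumPy sketch (not from the lecture itself): appending the redundant column x1 + x2 leaves the rank at 2, but a least-squares solver that tolerates rank deficiency still recovers the same fitted values, because the column space is unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 2))   # three data points, two predictors
y = rng.normal(size=3)

# x tilde appends the redundant column x1 + x2; the rank is still 2
X_tilde = np.column_stack([X, X[:, 0] + X[:, 1]])

# lstsq handles the rank deficiency; the fitted values agree
# because the two column spaces are identical
yhat = X @ np.linalg.lstsq(X, y, rcond=None)[0]
yhat_tilde = X_tilde @ np.linalg.lstsq(X_tilde, y, rcond=None)[0]
print(np.allclose(yhat, yhat_tilde))  # True
```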

4:32

So, at any rate, let me give you an example. Suppose you have an x that includes an intercept and a vector that's ones for a while, and then zeroes for a while. Then consider another x that includes the intercept and a vector that's zeroes for a while, and then ones for a while.

5:01

And then consider a third x whose two columns are those indicator vectors themselves: the ones-then-zeroes vector and the zeroes-then-ones vector.

Okay?

Notice, these two indicator vectors add up to the vector of ones, so all three of these cases are identical: they all have the same column space.

5:27

Okay?

So, in every one of these cases the Gamma, the space defined by linear combinations of the columns of these design matrices, is identical, so the y-hats defined by these design matrices will all be the same.

Okay so that's an important point.
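A quick NumPy check of this claim (my own sketch, not from the lecture): three designs built from an intercept and the two complementary indicator vectors span the same column space, so the fitted values agree.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
y = rng.normal(size=n)

ones = np.ones(n)
g1 = np.r_[np.ones(3), np.zeros(3)]   # ones for a while, then zeroes
g2 = ones - g1                        # zeroes for a while, then ones

# three design matrices with the same column space
designs = [
    np.column_stack([ones, g1]),
    np.column_stack([ones, g2]),
    np.column_stack([g1, g2]),
]
yhats = [X @ np.linalg.lstsq(X, y, rcond=None)[0] for X in designs]
print(np.allclose(yhats[0], yhats[1]) and np.allclose(yhats[0], yhats[2]))  # True
```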

5:55

That also means something, because we know what the solution is: beta-hat, which is (x transpose x) inverse x transpose y. That is the particular vector in R^2 that converts x into the correct linear combination of its columns to form the projection.

So our y-hat, our projection, is x (x transpose x) inverse x transpose y.
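Here's a short NumPy illustration of that formula (a sketch I've added, not part of the lecture). Solving the normal equations X'X beta = X'y and then forming X beta-hat gives exactly the fitted values a generic least-squares solver produces.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(3, 2))
y = rng.normal(size=3)

# beta-hat = (X'X)^{-1} X'y, solved without forming the inverse explicitly
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
yhat = X @ beta_hat   # the projection of y onto the column space of X

# the generic least-squares solver gives the same fitted values
print(np.allclose(yhat, X @ np.linalg.lstsq(X, y, rcond=None)[0]))  # True
```

Using `solve` on the normal equations rather than inverting X'X is the standard numerically safer choice.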

6:28

So, it is interesting to note that the linear operator that takes a vector z and moves it to Hx z, where Hx = x (x transpose x) inverse x transpose, is the operation that projects a vector in R^n onto the two-dimensional space spanned by the columns of x. So Hx is exactly a projection operator.

So it's often called the projection matrix.

The reason I give it the letter H is that it's often called the hat matrix as well.
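To verify numerically that Hx behaves like a projection (my own NumPy sketch, not from the lecture): it is idempotent, symmetric, and leaves the columns of X fixed.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(5, 2))

# hat matrix Hx = X (X'X)^{-1} X'
H = X @ np.linalg.solve(X.T @ X, X.T)

print(np.allclose(H @ H, H))   # idempotent: projecting twice changes nothing
print(np.allclose(H, H.T))     # symmetric
print(np.allclose(H @ X, X))   # leaves the columns of X fixed
```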

7:01

The final thing I'd like to point out is, if we consider our residuals, that's e = y - y-hat. So that's y - x (x transpose x) inverse x transpose y, which we can also write as I minus the hat matrix, times y.

So notice our residual is exactly the difference between y and

y-hat, so that's this vector right here.

That's e.

So if we write it over there, that's e.

But notice that e is going to be orthogonal to any point in gamma, right?

So e is going to be orthogonal to any point in gamma.

And what does that mean?

That means e transpose times z equals 0 for every z in Gamma.

8:08

And in particular our residuals are orthogonal to any column of x,

and we'll elaborate on that point here in a minute.

But let's actually prove this mathematically.

We can see it geometrically pretty easily but let's prove it mathematically.

Well, we've mentioned in some previous lectures that if I take I - Hx and multiply it times x, I get 0.

So certainly if I multiply it times x beta, for any beta, that is also going to give 0.

So you can go through, if you didn't see this in a previous lecture,

just actually go through it, it's a very easy thing to prove.

And so that's showing again,

that the residuals are orthogonal to every point in the space gamma.

So this means that if our x contains the vector Jn of ones, an intercept, and then other columns, then e transpose times Jn has to equal 0, which means that the sum of our residuals has to be zero.

But it's not just that the sum of the residuals has to be zero; e transpose times xk, for any column xk of x, also has to be zero, as does e transpose times any linear combination of the columns of x.
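A small NumPy sketch of this orthogonality (added for illustration, not from the lecture): with an intercept column of ones, the residuals are orthogonal to every column of X, and in particular they sum to zero.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # first column is J_n
y = rng.normal(size=n)

e = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]   # residuals

print(np.allclose(X.T @ e, 0))  # orthogonal to every column, so e'Jn = 0
print(np.isclose(e.sum(), 0))   # hence the residuals sum to zero
```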

Okay, so the point I wanted to get across in this lecture is that it's

quite useful to think about geometry and

geometrically what's occurring when you think about least squares.

Often this will help you reason your way through actual applied statistical problems if you can think about them geometrically.