So, let's start with the covariance matrix.

First of all, covariance,

if you're not familiar with this measure,

is actually a measure of how two variables vary together.

So, if I have two attributes in my data set,

let's say X1 and X2, all right,

the covariance between these two attributes is given by the sum,

with i going from one to n,

where n is the number of observations in my data set,

of X1i, minus the mean of X1,

times X2i, minus the mean of X2,

and this is all divided by n minus one.

Okay, and this is the covariance.
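
Written out, the formula just described is the sample covariance. The symbols for the two means, \bar{x}_1 and \bar{x}_2, are notation assumed here, not taken from the slides:

```latex
\operatorname{cov}(X_1, X_2) = \frac{1}{n-1} \sum_{i=1}^{n} \left(x_{1i} - \bar{x}_1\right)\left(x_{2i} - \bar{x}_2\right)
```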

The covariance matrix, sigma, is a matrix

that contains values such that if I pick one of these guys, let's say this one,

and it has indices i and j,

this guy gives me the covariance between

the ith and the jth component of a random vector,

and when I say a random vector,

I mean a random variable with multiple dimensions, right.

So, I can compute the covariance matrix using this formula,

but what I can do is actually, if I think about this,

realize that first my data is centered,

that was our first step in the PCA process.

So, I don't actually need to subtract this,

this has already been done as part of centering the data,

and instead of also looping over the data,

I can just vectorize this and I can say that my sigma,

my covariance matrix is simply one over n minus one,

times X transpose X,

where X is my data set, a matrix that contains these two guys,

X1 and X2, and this is how I compute my covariance matrix.
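
As a rough sketch of the step above, the looped formula and the vectorized product can be compared side by side. NumPy, the toy data, and all variable names here are assumptions made for illustration; none of them come from the lecture notebook.

```python
import numpy as np

# Hypothetical toy data: 50 observations of two correlated attributes.
rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = 2.0 * x1 + rng.normal(scale=0.5, size=50)
X = np.column_stack([x1, x2])
n = X.shape[0]

# Loop version: the covariance formula applied to every pair of columns.
means = X.mean(axis=0)
sigma_loop = np.zeros((2, 2))
for j in range(2):
    for k in range(2):
        total = 0.0
        for i in range(n):
            total += (X[i, j] - means[j]) * (X[i, k] - means[k])
        sigma_loop[j, k] = total / (n - 1)

# Vectorized version: center once, then a single matrix product.
Xc = X - means
sigma_vec = Xc.T @ Xc / (n - 1)

# Both should agree with each other and with NumPy's own estimator.
print(np.allclose(sigma_loop, sigma_vec))
print(np.allclose(sigma_vec, np.cov(X, rowvar=False)))
```

The vectorized form only equals the covariance matrix because the data is centered first; on uncentered data, X transpose X divided by n minus one gives a different matrix.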

Now this covariance matrix has one extremely useful property.

If you take this matrix and multiply any vector by it,

just a random vector,

this vector will be turned towards the direction of greatest variance.

This is a very useful property and I would like to

do a quick demo to show you how this actually happens.

So, let's go quickly to Watson Studio. All right.

So, here is my notebook,

let me actually edit it.

And again, I'm using the same method that I created to generate some data.

This is this function, corr vars,

and I also define a second function. This second function,

called plot arrow, is purely for visualization purposes,

you'll see how I plot tiny arrows on my plots.

So, I will just define these two,

and then I will generate some artificial data

using again the same function, two times the sine of X,

plus some random gaussian noise,

and I end up with my data like this.

The data is again generated in this interval, two to four,

step 0.2 and so on,

and my first task is to center the data.

Here you see that I take the matrix A,

which is my matrix with data,

and I subtract the mean for each column from the matrix,

and then I will plot this for you so you can actually see

the original data and how I center the data using this transformation,

so, let me plot this.
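
A minimal sketch of the generate-and-center step, assuming NumPy. The notebook's own generator function is not shown in the lecture, so the code below is a stand-in: the noise scale, the seed, and the names x, y and A_centered are all made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)  # fixed seed, only for reproducibility

# Stand-in for the notebook's data generator (its real signature is not shown):
# y = 2 * sin(x) + Gaussian noise, on a regular grid from 2 to 4 in steps of 0.2.
x = np.arange(2.0, 4.0, 0.2)
y = 2.0 * np.sin(x) + rng.normal(scale=0.2, size=x.shape)
A = np.column_stack([x, y])

# Center the data: subtract each column's mean from that column.
A_centered = A - A.mean(axis=0)

# Column means of the centered data are zero up to rounding error.
print(A_centered.mean(axis=0))
```

Centering is only a shift, so the differences between observations are the same before and after.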

Okay. And here, you can see the original data set on

the left hand side and these blue lines

essentially represent zero on the two axes,

and then after centering the data you see that we

retain the distances, everything is the same,

we just kind of shift the data to zero-zero,

because we want this property so we can compute the covariance matrix more easily.

And then, my sigma,

which I define here as S,

is simply A transposed times A,

that's the dot product of A transposed and A,

divided by the number of observations minus one,

which is the first dimension

of the shape of the matrix A, minus one.

And then I will print it for you.

So, the covariance matrix

is this one, right?

And now, what I will do is I will pick a random vector, just absolutely random,

this guy here with coordinates minus one and zero,

and I will plot the vector together with the data.

So, this is what the plot looks like, right?

This is my data set, and this vector with coordinates minus one, zero, starting from the zero-zero

of the coordinate system, points to the left.

And now, let's multiply this vector by

the covariance matrix and see what happens and plot it again.

So, I will multiply it,

and you see that this vector turns clockwise, right?

It turns towards the direction of

greatest variance which I already know for this data set, right?

We already saw what the line is.

And also, you see that that's the new slope of the vector,

minus one point five or six and so on.

So, the slope changes because the vector starts to turn, right?

So, let's keep multiplying and see what happens.

I will multiply the vector again by the covariance matrix,

and you see that this vector turns some more and it also grows, right?

I keep multiplying.

Okay.

And now, I think, it has already reached the direction of greatest variance,

and it also grows, so,

the arrow is not visible on the plot anymore.

But you can take a note here of the slope which is minus 6.9,

and if I continue doing this,

I won't plot the vector anymore because it doesn't make sense,

the plot will be identical,

it would just stay oriented in the same direction,

it will just grow in magnitude.
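
The turning and growing can be reproduced with a few lines, sketched here under the assumption of NumPy and a made-up centered data set. The data, the seed, and the variable names are hypothetical, so the slopes will not match the numbers on the lecture's plots.

```python
import numpy as np

# Hypothetical centered data with one dominant direction of variance.
rng = np.random.default_rng(1)
t = rng.normal(size=200)
A = np.column_stack([t, 2.0 * t + rng.normal(scale=0.3, size=200)])
A = A - A.mean(axis=0)

S = A.T @ A / (A.shape[0] - 1)  # covariance matrix of the centered data

v = np.array([-1.0, 0.0])  # starting vector, pointing to the left
slopes, lengths = [], []
for step in range(6):
    v = S @ v                          # multiply by the covariance matrix
    slopes.append(v[1] / v[0])         # slope of the vector after this step
    lengths.append(np.linalg.norm(v))  # magnitude of the vector after this step
    print(step + 1, slopes[-1], lengths[-1])
```

In this toy setup the slope settles after a few multiplications while the length keeps increasing, which is why the arrow eventually leaves the plot.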

But if I keep doing this and I print the changes in the slope,

you will see that the vector converges onto this slope minus seven point,

or five-eight, and doesn't change much

afterwards if I keep multiplying, because

it has already reached the direction of greatest variance.

So, this is something we could potentially use to find the principal component.

Just grab a random vector,

keep multiplying until it stops changing,

this is your direction of greatest variance, right?

That's your first principal component.
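
The recipe just described is usually called power iteration. Below is a minimal sketch of it, assuming NumPy. The function name first_principal_direction, the tolerance, and the small example matrix are all hypothetical. Unlike the demo, this version rescales the vector to unit length after every multiplication, so it does not grow without bound.

```python
import numpy as np

def first_principal_direction(S, v0, tol=1e-10, max_iter=1000):
    """Repeatedly multiply v0 by S, rescaling to unit length each time."""
    v = v0 / np.linalg.norm(v0)
    for _ in range(max_iter):
        w = S @ v
        w = w / np.linalg.norm(w)
        if np.linalg.norm(w - v) < tol:  # the direction has stopped changing
            return w
        v = w
    return v

# Hypothetical covariance matrix, chosen so its eigenvalues are 2.8 and 0.2.
S = np.array([[2.0, 1.2],
              [1.2, 1.0]])

v = first_principal_direction(S, np.array([-1.0, 0.0]))
variance_along_v = v @ S @ v  # variance of the data along the direction v
print(v, variance_along_v)
```

How fast this converges depends on the gap between the two largest eigenvalues, and a starting vector exactly orthogonal to the dominant direction would never turn towards it.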

However, this is not very efficient,

and there is an analytical solution that will give us

all eigenvalues and eigenvectors by just solving some equations,

so, that's the preferred method and that's what we will do.
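
The lecture does not say here which solver it will use. As one possible illustration, NumPy's np.linalg.eigh returns all eigenvalues and eigenvectors of a symmetric matrix in one call. The small matrix below is a hypothetical covariance matrix, not the one from the notebook.

```python
import numpy as np

# Hypothetical covariance matrix (symmetric, positive definite).
S = np.array([[2.0, 1.2],
              [1.2, 1.0]])

# eigh is meant for symmetric matrices; eigenvalues come back in ascending order,
# and the eigenvectors are the columns of the second return value.
eigenvalues, eigenvectors = np.linalg.eigh(S)

# Reorder so the direction of greatest variance comes first.
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

first_pc = eigenvectors[:, 0]
print(eigenvalues)
print(first_pc)
```

The sign of each eigenvector is arbitrary, so first_pc may point in the opposite direction from the vector that repeated multiplication produces; both describe the same axis.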