So basically, you want to find the direction that preserves most of the variance,

most of the distance between the individual points, because when we're doing

classification we are interested in how far apart the points are.

Once you have found this direction,

in my case the direction is this one,

the precise line that preserves most of the variance,

then what you can do is you can project

your points onto this line and use this line as your new coordinate system.

So, if I project my points onto this line then this line becomes my 1D attribute.

If I have zero, zero here then I can re-index

my points based on how far away from the zero they are on the line.

So, this point here, for example, will be something like -3,

then -2, -1.9, maybe -0.5, and so on.

So, I will project my original data onto this line and this gives me my new dimension.
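A rough sketch of this projection step, with made-up point coordinates and an assumed direction of greatest variance, just for illustration:

```python
import numpy as np

# Toy 2D points (hypothetical values, purely for illustration).
X = np.array([[-3.0, -2.8],
              [-2.0, -1.9],
              [-1.9, -2.1],
              [ 0.5,  0.4],
              [ 2.5,  2.6]])

# Suppose the direction that preserves most of the variance
# is the unit vector along (1, 1).
d = np.array([1.0, 1.0])
d = d / np.linalg.norm(d)

# Projecting each point onto the line gives its signed distance
# from zero along that line -- the new 1D coordinate.
new_coords = X @ d
print(new_coords)  # one number per point: the 1D representation
```

Each point is now indexed by a single number, exactly the re-indexing by distance from zero described above.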

Effectively this operation has transformed my data from 2D to 1D,

and you can use the same approach on

a 3D dataset when you're working in three-dimensional space.

So, you will find

the first direction that preserves most of the variance, in this case something like this.

Then the next thing you do is you find a line

that's perpendicular to this one that preserves most of what's left of the variance.

So, this will be a line probably like this one,

perpendicular to this one.

Okay, I didn't draw this very perpendicular,

maybe something like this.

Then keep going until you reach the number of dimensions.

Then you'll find the line which is perpendicular to the second line and

preserves most of the variance that's left unexplained in the data,

so maybe it goes like this then.
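One way to see where these mutually perpendicular directions come from is the eigendecomposition of the data's covariance matrix; this is a sketch with synthetic 3D data, not the exact numbers from the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical 3D data with most variance along the first axis.
X = rng.normal(size=(200, 3)) * np.array([5.0, 2.0, 0.5])
X = X - X.mean(axis=0)  # center the data first

# Eigenvectors of the covariance matrix are the variance-preserving
# directions; eigenvalues measure how much variance each one explains.
cov = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)

# Sort the directions by explained variance, largest first.
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# The three directions are mutually perpendicular: dot products are ~0.
print(eigvecs[:, 0] @ eigvecs[:, 1])
```

The first column is the line explaining the most variance, the second is perpendicular to it and explains most of what's left, and so on until the number of dimensions is reached.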

You'll end up with three new lines and again,

you don't have to use all of them,

you can just say, "Well,

I will project all my data points on the blue line," in

which case you would have reduced your data, say, to

one dimension, or you can say, "Actually,

I will use both the first line and the second line."

So, you'll project onto those two coordinates and then you have reduced

your dimensionality to two.
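That choice of how many lines to keep can be sketched with scikit-learn's PCA, where `n_components` is the number of lines you project onto (the data here is synthetic, just to show the shapes):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
# Hypothetical 3D dataset.
X = rng.normal(size=(100, 3)) * np.array([4.0, 2.0, 0.3])

# Keep the first two directions -> project down to 2D.
pca2 = PCA(n_components=2)
X2 = pca2.fit_transform(X)

# Keep only the first direction -> project down to 1D.
pca1 = PCA(n_components=1)
X1 = pca1.fit_transform(X)

print(X2.shape, X1.shape)  # (100, 2) (100, 1)
```

Either way, the rows are the same points, just re-expressed in the coordinates of the chosen lines.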

So, that's the idea of principal component analysis:

it gives us

dimensionality reduction using the directions of greatest variance.

You find the first line that explains most of

the variance then you keep looking for perpendicular lines to

the previous one that explain what's left of the variance and you keep

going until you reach the number of dimensions in the data set.

Then you will select a number of lines which

ideally will be less than the total dimensionality of your original data,

you'll project the data and then you end up with a new data set with new features,

that retains most of the information in

the original data set with lower dimensionality.

I would like to give you an example of how this technique works,

and the important thing to mention is that these lines,

these directions that preserve

the variance are essentially given by the principal component analysis technique.

So, we will do this now and I have

prepared a notebook in Watson Studio to show you how we

can take an arbitrary data set, apply

principal component analysis, and reduce the dimensionality.