And when we look at the absolute values of each of these different terms,
we can see how much weight each feature represents.
So it looks like flavanoids and
non-flavanoid phenols carry the most weight here, so they contribute the most
to the variance this eigenvector captures.
Now the question is how many eigenvectors should we keep?
Which eigenvectors should we keep to retain as much information as possible from our
original data set?
Because that is the whole point of PCA, right?
We want to reduce dimensionality of our data.
But we have to figure out how much we want to reduce it to.
So to do this, we're going to look at our eigenvalues.
And we're going to quantify how much variance each vector represents.
So we're going to go ahead and sum all the eigenvalues up, and
then calculate each eigenvalue's percentage of that total.
And then we'll use the cumulative sum function to progressively add up each of
these percentages.
And they should obviously add up to 100% at the end.
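Here is a minimal sketch of that calculation, assuming NumPy and that the cumulative sum function is `np.cumsum`. The eigenvalues below are made up for illustration, not the ones from the wine data, and the variable names are hypothetical.

```python
import numpy as np

# Hypothetical eigenvalues, sorted from largest to smallest
# (illustrative only, not the values from the wine data set).
eigenvalues = np.array([4.0, 3.0, 2.0, 1.0])

# Each eigenvalue as a percentage of the total variance.
explained_pct = eigenvalues / eigenvalues.sum() * 100

# Running total of those percentages; the last entry should be 100.
cumulative_pct = np.cumsum(explained_pct)

print(explained_pct)
print(cumulative_pct)
```

Sorting the eigenvalues largest first matters here, because the running total is only useful if the most important components are counted first.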
And then real quick, we're going to go ahead and
create a variable that stores how many dimensions our data has.
This will be useful for a bunch of plotting functions.
And now we can just very simply go ahead and plot our cumulative sum array.
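A plot like that could look roughly like the sketch below, assuming Matplotlib. The eigenvalues are again made up, and `n_dims` and `components` are hypothetical names for the dimension count and the x-axis positions.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical eigenvalues, sorted largest first (illustrative only).
eigenvalues = np.array([4.0, 3.0, 2.0, 1.0])
cumulative_pct = np.cumsum(eigenvalues / eigenvalues.sum() * 100)

# Number of dimensions in the data, reused for the x-axis.
n_dims = len(eigenvalues)
components = np.arange(1, n_dims + 1)

# Cumulative variance explained versus number of components kept.
fig, ax = plt.subplots()
ax.plot(components, cumulative_pct, marker="o")
ax.set_xticks(components)
ax.set_xlabel("Number of principal components")
ax.set_ylabel("Cumulative variance explained (%)")
ax.set_ylim(0, 105)

if __name__ == "__main__":
    plt.show()
```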
And now we can use this graph to tell us how many principal components
we should keep.
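If you want to read that number off in code rather than by eye, one common approach is to pick a cutoff and keep the smallest number of components that reaches it. This is only a sketch: the 90% threshold is an assumption chosen for illustration, not something taken from this walkthrough, and the cumulative percentages are hypothetical.

```python
import numpy as np

# Hypothetical cumulative percentages (illustrative only).
cumulative_pct = np.array([40.0, 70.0, 90.0, 100.0])

# Assumed cutoff: keep enough components to explain at least 90% of the variance.
threshold = 90.0

# np.argmax on a boolean array gives the index of the first True;
# adding 1 turns that zero-based index into a component count.
n_keep = int(np.argmax(cumulative_pct >= threshold)) + 1

print(n_keep)  # prints 3
```

The right cutoff depends on the problem, so treat the threshold as something to tune rather than a fixed rule.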