And then, you should be able to apply the filtering command to it, so now,
I've got a smaller data set that just consists of
all the genes measured on the Y chromosome.
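As a rough illustration of that filtering step, here is a minimal R sketch. The count matrix `edata`, the chromosome vector `chr`, and every value in them are made up for the example; the lecture's actual data and annotation source are not shown in this transcript.

```r
# Toy count matrix: 6 genes (rows) x 4 samples (columns). All values are made up.
edata <- matrix(
  c(10, 12,  9, 11,
     0,  0,  7,  5,
    20, 18, 22, 19,
     0,  1,  9,  6,
    15, 14, 16, 13,
     0,  0,  4,  8),
  nrow = 6, byrow = TRUE,
  dimnames = list(paste0("gene", 1:6), paste0("sample", 1:4))
)

# Hypothetical annotation: one chromosome label per gene, in the same row order.
chr <- c("1", "Y", "X", "Y", "2", "Y")

# Keep only the rows (genes) annotated to the Y chromosome.
# which() drops any NA labels instead of turning them into NA rows.
edata_y <- edata[which(chr == "Y"), , drop = FALSE]
dim(edata_y)
```

Using `drop = FALSE` keeps the result a matrix even if only one gene matches, so later column-wise operations still work.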
And so, the first thing that I want to do is a quick check:
I'm going to take the sum in each column.
So take out the genes from the Y chromosome.
I'm going to take the sums for
each sample, that's the total count for the Y chromosome genes for each sample.
I'm going to plot that versus gender.
When I do that, I can see that the males
have way more counts than the females, so that's kind of a good independent check.
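The check described here can be sketched in R roughly as follows. The matrix `edata_y` and the factor `gender` are toy stand-ins with invented values, not the lecture's data.

```r
# Made-up Y-chromosome counts: 3 genes (rows) x 6 samples (columns).
edata_y <- matrix(
  c(0, 0, 1, 7, 5, 9,
    0, 1, 0, 9, 6, 8,
    0, 0, 0, 4, 8, 6),
  nrow = 3, byrow = TRUE
)

# Hypothetical gender labels, one per sample (column).
gender <- factor(c("F", "F", "F", "M", "M", "M"))

# Total Y-chromosome count per sample.
y_totals <- colSums(edata_y)

# One box per gender; the male box should sit well above the female box.
boxplot(y_totals ~ gender, ylab = "Total Y-chromosome counts")
```

If the labels and the data agree, the samples labeled female should have totals near zero and the samples labeled male should be clearly higher.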
And then, I can overlay the data points if I want to, actually.
If I make that boxplot without coloring it, I can then overlay the data points.
So here, the points command, like lines from the previous lecture,
basically just overlays points on top of the current plot.
I'm going to, again, plot the column sums, the total counts for the Y chromosome genes.
Here, I'm using a jitter command,
you'll see that again in a lot of plots that I make like this, with box plots.
So jitter basically adds a little random noise to the value for gender,
so that the points don't all land on top of each other.
And then I'm going to color it by the gender, and so you can see that
the females have counts mostly of zero, and the males have some positive counts.
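A minimal R sketch of the overlay, assuming toy per-sample totals in `y_totals` and made-up labels in `gender` (both are illustrative names and values, not the lecture's objects):

```r
set.seed(1)  # only so the random jitter is reproducible in this example

# Made-up per-sample Y-chromosome totals and gender labels.
gender   <- factor(c("F", "F", "F", "M", "M", "M"))
y_totals <- c(0, 1, 1, 20, 19, 23)

# Uncolored boxplot first; boxes are drawn at x = 1, 2 in factor-level order.
boxplot(y_totals ~ gender, ylab = "Total Y-chromosome counts")

# jitter() adds a small amount of uniform noise to the numeric group codes,
# so points that share a group do not all sit at exactly the same x position.
x_pos <- jitter(as.numeric(gender))

# points() draws on top of the existing plot; color by group code, filled circles.
points(x_pos, y_totals, col = as.numeric(gender), pch = 19)
```

The noise only moves points horizontally within their group, so the counts on the y axis are unchanged.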
So that gives us
sort of an external validation that the data is looking okay.