And I can say, only keep the ones that are not low genes.

So the bang, or exclamation point, here means "not low genes".

It's going to flip this around, so it'll say TRUE for

all the genes that are above the threshold.

And so now I've got a new, filtered data set, which actually

has only the 11,000 genes or so that have an average count greater than some value.
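
As a rough sketch of the filtering step just described, the following uses a small made-up count matrix. The names `edata`, `low_genes`, `filt_edata`, and the cutoff of 5 are assumptions chosen for illustration, not values confirmed by the lecture.

```r
# Toy count matrix: rows are genes, columns are samples (made-up numbers).
edata <- matrix(
  c( 0,  1,  0,  2,
    10, 12,  9, 11,
     0,  0, 40,  0,
    50, 60, 55, 45),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("sample", 1:4))
)

threshold <- 5  # assumed cutoff, for illustration only

# TRUE for genes whose average count across samples is below the cutoff
low_genes <- rowMeans(edata) < threshold

# `!` flips TRUE and FALSE, so this keeps only the genes that are NOT low
filt_edata <- edata[!low_genes, , drop = FALSE]

print(low_genes)
print(dim(filt_edata))
```

Here only `gene1` falls below the cutoff, so the filtered matrix keeps the other three rows.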

The average isn't always a great summary for

count data, because you often have a few really high values.

So let's look at a summary of the edata.

So, for example, for this sample almost all the values are equal to zero,

but the mean is relatively high because you have this gigantic maximum value,

but you can see the median isn't as affected.

So the median is just the 50th percentile, so even if you have one

gigantic value and all the rest are zeros, it will be more robust to that, and

so people often do this filtering based on the median.
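
To make the mean-versus-median point concrete, here is a minimal sketch with one invented sample. The vector `counts` and its values are hypothetical and only meant to mimic the "almost all zeros, one gigantic value" situation.

```r
# One made-up sample: almost every gene has a count of zero,
# but a single gene has a gigantic count.
counts <- c(rep(0, 99), 100000)

sample_mean   <- mean(counts)    # pulled up by the single large value
sample_median <- median(counts)  # the 50th percentile, unaffected here

print(sample_mean)
print(sample_median)
print(summary(counts))
```

The mean comes out at 1000 even though 99 of the 100 values are zero, while the median stays at 0.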

So you can actually use the rowMedians command to find all those genes that

have a row median count less than five.

And then if I make a table of those low genes versus the low genes for the mean,

you can see that most of the time they're both true at the same time.

But sometimes, when the mean is still high,

the median is low, and so we still filter some more of those out.
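
The comparison of the two filters can be sketched as follows, again on made-up data. The variable names and the cutoff of 5 are assumptions. The row medians command mentioned above is presumably `rowMedians` from a package such as matrixStats; base R's `apply` is used here instead so the sketch runs without extra packages.

```r
# Toy count matrix: rows are genes, columns are samples (made-up numbers).
edata <- matrix(
  c( 0,  1,  0,  2,
    10, 12,  9, 11,
     0,  0, 40,  0,
    50, 60, 55, 45),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("sample", 1:4))
)

threshold <- 5  # assumed cutoff, for illustration only

# Low-expression flags based on the row mean and on the row median
low_mean   <- rowMeans(edata) < threshold
low_median <- apply(edata, 1, median) < threshold

# Cross-tabulate the two filters to see where they agree and disagree
tab <- table(low_median, low_mean)
print(tab)

# Genes flagged by the median but not by the mean
print(names(which(low_median & !low_mean)))
```

In this toy example `gene3` has a mean of 10 but a median of 0, so only the median-based filter flags it.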

And so then if I basically make a second filtered data set,