So the variance of a random variable is another expected value property of a distribution. Recall that the mean measures the center of a distribution; the variance measures how spread out it is. If X is a random variable with mean mu, that is, the expected value of X equals mu, then the variance of X is defined as the expected value of the quantity (X minus mu) squared. So what does that mean? The expected value is in essence an average, the typical value the random variable takes, the center of the distribution. The variance, on the other hand, is the average squared distance the random variable is from its mean. So random variables with higher variances come from distributions that are more spread out than ones with lower variances. >> That makes sense, and I'm just kind of thinking of that fulcrum point still. >> Yeah. >> How things are more spread out [inaudible]. >> Yeah, exactly, exactly, great. >> Alright. >> So let me just remind you what this variance formula means. If you were to take the random variable X and subtract off its population mean, you'd get the exact same distribution, just with all the possible values of X shifted by mu so that it has mean zero. Then if you took that shifted random variable, figured out the distribution of its square, and took the expected value of the result, you'd have the variance. That's hard, so we don't ever calculate the variance that way.
We typically calculate the variance by a convenient shortcut: the variance of a random variable is the expected value of X squared minus the expected value of X, quantity squared, and this expected value of X quantity squared is just mu squared. This shortcut formula requires you to calculate the expected value of X squared. Typically the convenient way to do that is, if X is discrete, the summation of t squared times p(t), where p is the probability mass function, or, if X is continuous, the integral of t squared times f(t), where f is the density function. It would be a nice exercise for you to show that the original variance definition equals this shortcut formula, just by expanding the square and using the expected value rules. It would be convenient if the variance operator were also linear. It's not. As an example, if you pull a constant out of the variance, it gets squared: the variance of a times X, where a is a constant and not a random variable, is a squared times the variance of X. The square root of the variance is called the standard deviation, and the reason we often use the standard deviation instead of the variance is that the standard deviation has the same units as the random variable. So if X is a random variable with units in inches, the variance has units inches squared, whereas the standard deviation has units inches. It's often quite convenient to talk about the spread in the same units as the random variable itself, so the standard deviation is a common summary of the spread. Well, let's calculate a variance. What's the variance from a toss of a die? In this case, the expected value of X is 3.5; we've covered that already. Now let's calculate the expected value of X squared.
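The exercise suggested above, that the definition and the shortcut agree, can also be checked numerically. Here is a small Python sketch using the fair die as the example distribution (the variable names are my own, not from the lecture):

```python
# Verify that E[(X - mu)^2] equals E[X^2] - (E[X])^2
# for a discrete distribution, using a fair six-sided die.

# Probability mass function: each face 1..6 with probability 1/6.
pmf = {t: 1 / 6 for t in range(1, 7)}

mu = sum(t * p for t, p in pmf.items())                   # E[X]
var_def = sum((t - mu) ** 2 * p for t, p in pmf.items())  # E[(X - mu)^2]
ex2 = sum(t ** 2 * p for t, p in pmf.items())             # E[X^2]
var_shortcut = ex2 - mu ** 2                              # E[X^2] - mu^2

print(mu, var_def, var_shortcut)  # the two variance formulas agree
```

Both calculations give 35/12, about 2.92, which is the number worked out below for the die.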
Well, we have one squared times a sixth, plus two squared times a sixth, plus three squared times a sixth, plus four squared times a sixth, plus five squared times a sixth, plus six squared times a sixth. That works out to be about 15.17. Then you take 15.17 minus 3.5 squared, and that works out to be about 2.92. Let's go through a very important formula. Let's suppose we flip a coin, but let's make it slightly more interesting: instead of the coin having probability one-half of a head, let's say it has probability p of a head. Here the expected value of X equals zero times the probability of a tail, which is one minus p, plus one times the probability of a head, which is p, so it works out to be p. And of course this agrees with our earlier calculation when p happens to be one-half for a fair coin. Now let's calculate the expected value of X squared. Actually, it's pretty easy in this case, because X only takes the values zero and one, and if you square zero you get zero, and if you square one you get one. So X squared is in fact exactly X. Therefore the expected value of X squared equals the expected value of X, which we already calculated as p. So the variance of X is the expected value of X squared minus the expected value of X quantity squared, which is p minus p squared, which works out to be p times one minus p, a formula you may have encountered before. It's interesting to note that this variance formula is maximized when p is 0.5; just plot the function p times one minus p between zero and one, and you'll see that it maximizes at 0.5. So the most variable a coin flip can be is if it is exactly a fair coin. It's interesting to note that, in general, the most variable a random variable can be is if you shift all its mass to two endpoints, equally distributed between those two endpoints.
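In place of the suggested plot of p times one minus p, here is a quick numerical sketch (a grid search of my own devising, not from the lecture) confirming that the Bernoulli variance is maximized at p equals one-half:

```python
# Variance of a Bernoulli(p) random variable: E[X] = p and E[X^2] = p,
# so Var(X) = p - p^2 = p * (1 - p).
def bernoulli_var(p):
    return p * (1 - p)

# Scan p over a fine grid on [0, 1] and find where the variance is largest.
grid = [i / 1000 for i in range(1001)]
p_max = max(grid, key=bernoulli_var)

print(p_max, bernoulli_var(p_max))  # maximized at p = 0.5, value 0.25
```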
So, if you have a continuous random variable and you want to make it more variable, you kind of chop out the middle and spread the mass equally between the two ends. Let's talk about this in greater detail. Suppose you have any random variable that takes values between zero and one, like a uniform random variable, and its expected value is p. Since the variable takes values between zero and one, p has to be a number between zero and one. Notice that if X is a random variable between zero and one, X squared has to be less than or equal to X, because if you take any number between zero and one and square it, you get a smaller number. So the expected value of X squared has to be less than or equal to the expected value of X, which is p. Therefore the variance of X, which is the expected value of X squared minus the expected value of X quantity squared, has to be less than or equal to the expected value of X minus the expected value of X quantity squared, which is p times one minus p. This is a proof that the Bernoulli variance, the variance of a binary random variable that can only take the values zero and one, is the largest possible for a random variable on [0, 1] with expected value p. And we noted earlier that the maximum value you can get is when p is in fact 0.5. So this is a simple little proof that the largest variance you can get for such a random variable comes from shoving its mass to the two endpoints, and the closer you get to equal mass at both endpoints, the larger the variance is. I'm not sure if I mentioned this previously, but a coin flip that comes up heads with probability p is called a Bernoulli random variable.
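This bound is easy to see empirically. The following sketch (my own simulation; the particular distributions chosen are arbitrary examples of variables confined to [0, 1]) checks that the sample variance never exceeds p times one minus p:

```python
import random

random.seed(0)

# Empirical check: for random variables confined to [0, 1] with mean p,
# the variance never exceeds p * (1 - p), the Bernoulli variance.
def mean_var(samples):
    m = sum(samples) / len(samples)
    v = sum((x - m) ** 2 for x in samples) / len(samples)
    return m, v

n = 100_000
for draw in (random.random,                      # uniform on [0, 1]
             lambda: random.random() ** 2,       # skewed toward 0
             lambda: random.betavariate(2, 2)):  # hump in the middle
    samples = [draw() for _ in range(n)]
    p, v = mean_var(samples)
    assert v <= p * (1 - p)  # the Bernoulli bound holds
    print(round(p, 3), round(v, 3), round(p * (1 - p), 3))
```

In each case the variance comes out well below the bound, because none of these distributions put their mass at the endpoints.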
This is named after the mathematician Jacob Bernoulli, who was one of the fathers of probability. Jacob Bernoulli is an interesting character; you should read up on him. The Bernoullis were a very famous mathematical family who came up with lots of discoveries. Jacob was a particularly influential member of the Bernoulli family, and he discovered quite a bit of probability theory very early on. At any rate, when you have a random variable that takes the value one with probability p and zero otherwise, we call that a Bernoulli random variable. So here we are, back talking about variances, and equivalently standard deviations, which are kind of difficult things to interpret. I prefer to interpret standard deviations. Intuitively, we know that bigger variances mean distributions are more spread out, but we need some way to actually interpret what bigger means. In the context of a specific distribution, we might learn the quantities associated with that distribution to know what one standard deviation means, or two standard deviations, or three. That's particularly true of the Gaussian, or bell-shaped, density; we tend to know the values associated with those distances by heart. But there is a general rule that applies to all distributions, the so-called Chebyshev inequality, named after the Russian mathematician Chebyshev. Chebyshev gave a really useful inequality for interpreting variances. Basically, the inequality says that the probability that a random variable is K or more standard deviations from its mean is less than or equal to one over K squared. Let me repeat that, because it's so important: the probability that a random variable is more than K standard deviations from its mean is less than or equal to one over K squared. Let's just look at some simple benchmarks for K.
The probability that a random variable is more than two standard deviations from its mean is 25 percent or less. The probability that it's more than three standard deviations from its mean is about eleven percent or less, and the probability that it's more than four standard deviations from its mean is about six percent or less. Again, note that this is a bound on the probability, not an equality. It's the worst it could possibly be: for lots of distributions, the probability of being four or more standard deviations beyond the mean is far lower than six percent, but six percent is the worst it can be. So it's unlikely, for example, that if you observe a random variable, you will see it fall six standard deviations from the mean; that has probability less than one over 36, regardless of the distribution. What's interesting about Chebyshev's inequality is that it's quite easy to prove, so let's go through the proof really quickly. Look at the probability statement: the probability that a random variable is more than K standard deviations from its mean. Let's do it in the continuous case; you can prove it more generally, but this gives you the intuition behind the proof. That probability is the integral of f(x) dx over the set of x where x is more than K standard deviations from the mean. Here the little x in the integrand and the domain of integration is a dummy variable of integration; we could replace it by another letter on the right-hand side, but on the left-hand side it has to be capital X. Now notice that, on this domain, the absolute value of (x minus mu) over (K sigma) has to be bigger than one, so if we square it, the square has to be bigger than one as well.
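Those benchmarks can be checked against a simulation. Here is a sketch of my own, using an exponential distribution purely as an arbitrary test case; any distribution would do, since Chebyshev's inequality applies to all of them:

```python
import random

random.seed(1)

# Chebyshev: P(|X - mu| >= k * sigma) <= 1/k^2 for ANY distribution.
# Check empirically with an exponential distribution (rate 1).
n = 200_000
samples = [random.expovariate(1.0) for _ in range(n)]
mu = sum(samples) / n
sigma = (sum((x - mu) ** 2 for x in samples) / n) ** 0.5

for k in (2, 3, 4):
    tail = sum(abs(x - mu) >= k * sigma for x in samples) / n
    bound = 1 / k ** 2
    assert tail <= bound  # the empirical tail never exceeds 1/k^2
    print(k, round(tail, 4), round(bound, 4))
```

As the lecture notes, the empirical tail probabilities come out far below the bounds, which illustrates how conservative Chebyshev's inequality is.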
So you take a number that's bigger than one and square it, and it's still bigger than one. So we can multiply the integrand by (x minus mu) squared over (K squared sigma squared), and we've only made the integral bigger. Right? So we can replace the equality with an inequality, where the alligator's chomping the bigger part. [laugh] Okay? So now we have this quantity, and we only make it bigger yet if, instead of integrating over this restricted domain, we integrate over the whole real line, from minus infinity to plus infinity, because (x minus mu) squared is nonnegative, so adding the rest of the domain only makes it bigger. Then notice that the one over K squared sigma squared part is a scalar we can factor out, and we're left with the integral from minus infinity to plus infinity of (x minus mu) squared times f(x) dx, which is exactly the definition of the variance, so that equals sigma squared. The sigma squareds cancel, and you get one over K squared. So we see that the probability that X is more than K standard deviations from its mean, where we started out with an equals sign, then got bigger, got bigger again, and ended with a final equality, is less than or equal to one over K squared. I find it remarkable that Chebyshev's inequality, this powerful result that applies to all distributions, has such a simple little proof. Let's go through some numerical examples to show why this result is useful. So, intelligence quotients. Actually, I would recommend that you look up intelligence quotients, which are historically tied to the Binet scales. They have a very rich and interesting history that intersects with statistics, psychology, and several other fields; it's really quite an interesting literature.
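The proof just narrated can be written compactly as a chain of one equality, two inequalities, and a final equality (a LaTeX rendering of the same steps, with f the density of X):

```latex
P(|X - \mu| \ge k\sigma)
  = \int_{\{x \,:\, |x - \mu| \ge k\sigma\}} f(x)\,dx
  \le \int_{\{x \,:\, |x - \mu| \ge k\sigma\}} \frac{(x - \mu)^2}{k^2\sigma^2}\, f(x)\,dx
  \le \int_{-\infty}^{\infty} \frac{(x - \mu)^2}{k^2\sigma^2}\, f(x)\,dx
  = \frac{\sigma^2}{k^2\sigma^2}
  = \frac{1}{k^2}
```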
So I would highly recommend you look it up, just because it's quite fun. But let's skirt that discussion and just suppose intelligence quotients really are distributed with a mean of 100 and a standard deviation of 15. What's the probability that a randomly drawn person from this population has an IQ higher than 160 or below 40? Of course I picked 160 and 40 specifically: 160 is four standard deviations above the mean and 40 is four standard deviations below the mean, so Chebyshev's inequality says this probability will be no larger than about six percent. If in fact the IQ distribution is bell-shaped, or Gaussian, this bound is very, very conservative. Just to give you a sense of how conservative: the probability that a random draw from a bell curve is four or more standard deviations from the mean is not six percent, but on the order of ten to the minus fifth, about a thousandth of one percent. This doesn't violate the Chebyshev inequality, since ten to the minus fifth is less than 0.06, but it's quite a bit less, which gives you a sense of how conservative the Chebyshev inequality can be. Let me go through another example. A buzz phrase in industrial quality control is Motorola's so-called Six Sigma, and I have to admit to being largely ignorant of exactly what the Six Sigma industrial protocol is. But the gist of it, as far as I understand, is that businesses are encouraged to control extreme events, or rare defective parts, and the idea is that you go out six standard deviations. Maybe on your own, as an exercise, you can go look up what exactly the Six Sigma protocol is.
Let's, as an intellectual exercise, talk about what the probability of a six sigma event is: the probability that a random variable lies six standard deviations above or below the mean. By Chebyshev's inequality, the probability of such an occurrence is less than one over six squared, which is about three percent. So it's highly unlikely. But again, remember, Chebyshev's is a bound that applies to all distributions. If you know something about the distribution, for example if you know it's a bell curve, then the probability of a six sigma event is on the order of ten to the minus ninth, which, as I calculated, is about one ten-millionth of a percent. Again, that doesn't violate Chebyshev's inequality; ten to the minus ninth is less than 0.03. But at any rate, that's what a six sigma event is discussing.
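The Gaussian tail probabilities quoted above for four and six standard deviations can be computed from the complementary error function; for a standard normal Z, the two-sided tail P(|Z| >= k) equals erfc(k divided by the square root of two). A short sketch comparing them to the Chebyshev bounds:

```python
import math

# Compare the Chebyshev bound 1/k^2 with the exact Gaussian two-sided
# tail P(|Z| >= k) = erfc(k / sqrt(2)) for a standard normal Z.
for k in (4, 6):
    chebyshev = 1 / k ** 2
    gaussian = math.erfc(k / math.sqrt(2))
    print(k, chebyshev, gaussian)

# For k = 4 the Gaussian tail is on the order of 1e-5 (vs the ~6% bound);
# for k = 6 it is on the order of 1e-9 (vs the ~3% bound).
```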