Okay, so welcome back, troops. We're going to talk about covariance and correlation now, and what happens when random variables are independent. So, if we have two random variables, X and Y, then their covariance is defined as Cov(X, Y) = E[(X - mu_X)(Y - mu_Y)], the expected value of X minus its mean, times Y minus its mean. And just like the variance, there's a shortcut formula for covariances, and it works out to be Cov(X, Y) = E[XY] - E[X] E[Y], the expected value of X times Y, minus the expected value of X times the expected value of Y.

So, some very useful facts about covariance. First of all, you can interchange the variables and you get the same number, so Cov(X, Y) = Cov(Y, X). The covariance can be negative or positive. But an application of the so-called Cauchy-Schwarz inequality will tell you that the absolute value of Cov(X, Y) is less than or equal to the square root of Var(X) times Var(Y). So we could just write this right-hand side as the standard deviation of X times the standard deviation of Y.

This final property is very useful because, as we go on to the next slide, we use it to define the correlation. The correlation of X and Y is nothing other than the covariance divided by the product of the standard deviations, Cor(X, Y) = Cov(X, Y) / (SD(X) SD(Y)). So there's a couple of reasons why we might want to do this. From the previous property, notice that this normalizes the covariance so that it's between minus one and plus one. That's a very useful thing to do, so that we can compare covariances across different kinds of random variables. Now, another rationale for doing this is if we look back at our covariance formula: it's the expected value of X minus the mean of X, times Y minus the mean of Y. Well, that has units of X times units of Y. So the top part of the calculation has units of X times units of Y, the standard deviation of X has units of X, and the standard deviation of Y has units of Y.
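Just to make the shortcut formula concrete, here's a small Python sketch that computes the covariance both ways, from the definition and from the shortcut, for a made-up joint probability mass function (the distribution itself is hypothetical, purely for illustration):

```python
# A minimal numeric check that E[(X - mu_X)(Y - mu_Y)] = E[XY] - E[X]E[Y],
# using a small hypothetical joint pmf over (x, y) pairs.

joint_pmf = {  # hypothetical joint distribution: (x, y) -> probability
    (0, 0): 0.2, (0, 1): 0.1,
    (1, 0): 0.1, (1, 1): 0.6,
}

def expectation(f):
    """E[f(X, Y)] under the joint pmf."""
    return sum(p * f(x, y) for (x, y), p in joint_pmf.items())

mu_x = expectation(lambda x, y: x)
mu_y = expectation(lambda x, y: y)

# Definition: E[(X - mu_X)(Y - mu_Y)]
cov_def = expectation(lambda x, y: (x - mu_x) * (y - mu_y))

# Shortcut: E[XY] - E[X] E[Y]
cov_shortcut = expectation(lambda x, y: x * y) - mu_x * mu_y

print(cov_def, cov_shortcut)  # the two calculations agree
```

Both routes give the same number (here 0.11, up to floating-point rounding), which is exactly the point of the shortcut formula.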
So the bottom part also has units of X times units of Y, so the correlation is a unit-free measurement, and that's a useful property to have. So if X is in inches and Y is in pounds, then the covariance is in inches times pounds, but the correlation is unit free, which is useful.

Correlations have some nice properties. One is that the correlation is plus or minus 1 if and only if the random variables are linearly related, Y = a + bX for some constants a and b (plus 1 when b is positive, minus 1 when b is negative). Correlation is unitless, as we discussed already. And we say X and Y are uncorrelated if Cor(X, Y) = 0. And, sort of, the more positively associated they are, the closer Cor(X, Y) gets to one, and the more negatively associated they are, the closer Cor(X, Y) gets to minus one.

And this is, again, a description of a population quantity, not a sample quantity. So if you're using a joint probability mass function or a joint probability density function to model the population behavior of X and Y, then we want ways to summarize that joint mass function or joint density function, and the correlation is a summary of how related jointly distributed random variables from that distribution are. So it's a population quantity. And of course, if something is a population quantity, we want sample quantities that are able to estimate it. So probably what you've heard of, if you've never had a mathematical statistics class before, is the sample correlation. The goal of the sample correlation is to estimate the population correlation: if you've ever computed a sample correlation and you had a probability model in mind, the estimand is the population correlation. So it follows the same rule we have so far for everything. The sample variance estimates the population variance.
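To see the unit-free point in action, here's a little Python sketch with made-up height and weight numbers (the data are hypothetical): rescaling X from inches to centimeters rescales the covariance by the same factor, but leaves the correlation untouched.

```python
# Sketch: correlation is unit free. Converting heights from inches to
# centimeters multiplies the covariance by 2.54 but leaves the
# correlation exactly the same. All data values are made up.

import math

x = [60.0, 62.0, 65.0, 70.0, 72.0]       # heights in inches (hypothetical)
y = [110.0, 120.0, 150.0, 160.0, 180.0]  # weights in pounds (hypothetical)

def mean(v):
    return sum(v) / len(v)

def cov(u, v):
    """Population-style covariance of two equal-length samples."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)

def corr(u, v):
    """Covariance divided by the product of the standard deviations."""
    return cov(u, v) / math.sqrt(cov(u, u) * cov(v, v))

x_cm = [2.54 * xi for xi in x]  # same heights, now in centimeters

print(cov(x, y), cov(x_cm, y))    # covariances differ by the factor 2.54
print(corr(x, y), corr(x_cm, y))  # correlations are identical
```

This is exactly the normalization at work: the 2.54 shows up in both the numerator and the denominator of the correlation and cancels.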
The sample standard deviation estimates the population standard deviation. The sample median estimates the population median. So all these sample quantities have analogous population quantities.

So, if two random variables X and Y are independent, then their correlation is zero. The reverse is not true: things can be uncorrelated but not be independent. So if they're independent, then they're uncorrelated, but if they're uncorrelated, they're not necessarily independent. In the case of jointly Gaussian random variables, by the way, the two things always agree. But in general, that's not the case.

So let's talk about some useful results that rely on correlations and covariances, and probably the most useful one we'll talk about is this variance result. If we have a collection of random variables X1 to Xn, then when the X's are uncorrelated, and here I wrote out a very general form, the variance of the sum of the X's, each maybe times some constant a_i, plus a constant b, works out to be the sum of a_i squared times the variance of the individual X_i: Var(a_1 X_1 + ... + a_n X_n + b) = a_1^2 Var(X_1) + ... + a_n^2 Var(X_n). And let's think about the specific case where b is zero and the a's are all one: that just means that the variance of the sum is the sum of the variances. Here we just wrote out a slightly more general form. We know in general that constants pull out of variances and get squared; that's why you have the a_i squared. And we also know that if you have a random variable and you shift it by a constant b, it doesn't change its variance, it just moves the density to the left or the right. So the a's and b are kind of the fluff on top of this equation. The core of this equation is the instance when b is zero and the a's are one: when the X's are uncorrelated, and they don't have to be independent, they just have to be uncorrelated for this result to hold, the variance of the sum is the sum of the variances.
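As a quick sketch of the uncorrelated-but-not-independent point, here's a standard textbook-style example (not one from the slides): take X uniform on {-1, 0, 1} and Y = X squared. Y is completely determined by X, yet the two are uncorrelated.

```python
# Uncorrelated but not independent: X uniform on {-1, 0, 1}, Y = X^2.
# Cov(X, Y) = E[X^3] - E[X] E[X^2] = 0 - 0 = 0, yet knowing X tells you
# Y exactly, so they are certainly not independent.

support = [-1, 0, 1]                 # X takes each value with probability 1/3
pairs = [(x, x * x) for x in support]  # the induced (X, Y) pairs

e_x  = sum(x for x, _ in pairs) / 3      # E[X]  = 0
e_y  = sum(y for _, y in pairs) / 3      # E[Y]  = E[X^2] = 2/3
e_xy = sum(x * y for x, y in pairs) / 3  # E[XY] = E[X^3] = 0

cov_xy = e_xy - e_x * e_y
print(cov_xy)  # zero covariance, so zero correlation

# But not independent: P(Y = 0 | X = 0) = 1, while P(Y = 0) = 1/3.
```

So zero correlation only rules out a *linear* relationship; here the relationship is perfectly deterministic but symmetric, and the positive and negative parts cancel in the covariance.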
If they're not uncorrelated, then you can still calculate the variance, in a way that depends on the covariances. It works out to be the sum of the variances with the a_i squared out front, plus twice the sum over all pairs of covariances: Var(a_1 X_1 + ... + a_n X_n + b) = sum of a_i^2 Var(X_i), plus 2 times the sum over i < j of a_i a_j Cov(X_i, X_j). So this is a very useful formula. We won't use it in this class, but I thought I'd give it to you, and notice that if the X's are all uncorrelated, all these covariance terms are zero, and we get the top formula. The top formula is what we're really going to use in this class, and it basically says that the variance of the sum is the sum of the variances if you have uncorrelated random variables. The other important thing this says is that you shouldn't be adding standard deviations; you should be adding variances, is another way to think about it.

So, this leads to an interesting proof of the useful property that the variance of X bar, the sample mean, is sigma squared over n. And it also leads to the result that the expected value of the sample variance is sigma squared. These are two very important properties that we'll go on and on about.

Okay, so I don't want to prove the general facts from the previous slide; let's just go through a special case, because it's pretty easy to do. Let's prove that the variance of X plus Y is the variance of X, plus the variance of Y, plus twice the covariance of X and Y. So at the top line, let's start with the variance of X plus Y. Well, just by the shortcut variance formula, that's the expected value of (X + Y) squared, minus the expected value of X plus Y, quantity squared. If you're confused by that, just replace X plus Y with a random variable Z: it's the expected value of Z squared minus the expected value of Z, quantity squared, directly plugging into the shortcut variance calculation formula. Well, for the right-hand term here, expected values always distribute across sums, so we have mu_X plus mu_Y, quantity squared.
And for the left-hand term, let's just expand out (X + Y) squared to get X squared plus 2XY plus Y squared, and then move the expected value across the three elements of this expression. Then let's just organize terms, and we get the expected value of X squared minus mu_X squared, plus the expected value of Y squared minus mu_Y squared, plus twice the expected value of XY minus mu_X mu_Y. Well, the first one, the expected value of X squared minus mu_X squared, that's the variance of X. The second one, the expected value of Y squared minus mu_Y squared, that's the variance of Y. And the latter part, twice the expected value of XY minus mu_X mu_Y, where mu_X mu_Y is the expected value of X times the expected value of Y, well, that's just twice the covariance of X and Y, completing the proof. So you can see that it only requires the basic rules for expected values and the definition of covariance to perform this calculation.

So, just to reiterate some things we discussed earlier: if a collection of random variables are uncorrelated, then the variance of the sum is the sum of the variances. What this basically means is that sums of variances tend to be useful, not sums of standard deviations, and the issue I'm trying to raise is: don't sum standard deviations. In other words, the standard deviation of the sum of a bunch of independent random variables is the square root of the sum of the variances, not the sum of the standard deviations. It's just a common little mistake, so maybe try and avoid it from the start.
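And here's a small Python check of the identity we just proved, Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y), over a made-up joint probability mass function (the distribution is hypothetical, just for illustration):

```python
# Numeric sketch of Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y),
# using a small hypothetical joint pmf.

joint_pmf = {(0, 0): 0.2, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.6}

def E(f):
    """E[f(X, Y)] under the joint pmf."""
    return sum(p * f(x, y) for (x, y), p in joint_pmf.items())

# Shortcut formulas for the individual variances and the covariance.
var_x  = E(lambda x, y: x * x) - E(lambda x, y: x) ** 2
var_y  = E(lambda x, y: y * y) - E(lambda x, y: y) ** 2
cov_xy = E(lambda x, y: x * y) - E(lambda x, y: x) * E(lambda x, y: y)

# Treat Z = X + Y as one random variable and apply the same shortcut.
var_sum = E(lambda x, y: (x + y) ** 2) - E(lambda x, y: x + y) ** 2

print(var_sum, var_x + var_y + 2 * cov_xy)  # the two sides agree
```

Note that sqrt(var_x) + sqrt(var_y) would not match sqrt(var_sum) here, which is the "don't sum standard deviations" point in miniature.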