Hi, my name is Brian Caffo. I'm in the Department of Biostatistics at the Johns Hopkins Bloomberg School of Public Health, and this is Mathematical Biostatistics Boot Camp, lecture four. Today we're going to talk about random vectors and independence. Independence is a key ingredient in simplifying statistical models. Independence is a useful assumption, and we frequently use it in statistics to get a handle on complex phenomena. In addition, we'll find that independent and identically distributed random variables are going to be our canonical model for what we might think of as a random sample. So let's just briefly talk about what we're going to cover today. Random vectors, which are simple collections of random variables. Then we'll talk about independence; you probably have a rough idea of what is meant by independence to begin with, but we're going to talk about the mathematical formalism a little bit. We'll talk about correlation, and then go over various mathematical properties of the correlation and covariance operators. Then we'll use our facts about independence, variance, and correlation to talk about properties of the sample mean. And then we'll cover the sample variance and end with some discussion. This lecture is actually one of the hardest things in all of statistics. I think if you can understand this lecture, you've understood what the goal of probability modeling and this kind of population modeling really is. You might want to consider listening to it over and over again. These are incredibly difficult concepts until you internalize them, and once you internalize them they seem simple. So what I'm hoping to do in this lecture is to help you internalize them. Okay. So a random vector is nothing other than an ordinary vector with random variables as its entries. So if X and Y are random variables, then the ordered collection X comma Y is simply a random vector.
Just as individual random variables have densities and mass functions, or distribution functions, that govern their probabilistic behavior, random vectors have joint densities, joint mass functions, and joint distribution functions that govern their probabilistic behavior. Let's just talk about densities and mass functions to begin with. A joint density f of x, y, first of all, has to be nonnegative everywhere. We have a two-dimensional random vector, so the surface f lives over the two-dimensional plane, and the fact that f is greater than or equal to zero means that its height everywhere is at or above the horizontal plane. And it has to integrate to one when you integrate over the whole xy-plane; so the height in the z direction has to be nonnegative, and the integral over the xy-plane has to be one. So it's a direct extension of the ordinary one-dimensional probability density function, and I think from this definition you should be able to guess what the definition of a joint density is for n random variables. Then for discrete random variables, let's say f is now a joint probability mass function. The joint probability mass function f maps possible values of X and Y, so lowercase x and lowercase y are the possible combined values of X and Y, to probabilities. So, to satisfy the definition of a joint probability mass function, f has to be nonnegative for all possible combinations of x and y, and the sum over all possible combinations has to equal one. By the way, the joint density function works exactly like a univariate density function in that, in this case, volumes under it, so integrals under it, correspond to probabilities, and so the total volume is one. In the same way, with the joint mass function, sums over collections of possible values of x and y yield the probability of that collection.
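To make the two conditions concrete, here's a small numeric check in Python. This is just a sketch using the standard textbook density f(x, y) = x + y on the unit square, which is my own choice of example, not one from the lecture: it checks nonnegativity on a grid and that a midpoint Riemann sum over the unit square comes out to one.

```python
# Sketch: check that a candidate joint density is nonnegative and that
# its integral over the plane is one. The density f(x, y) = x + y on
# [0, 1] x [0, 1] (zero outside) is an assumed textbook example.
def f(x, y):
    return x + y

n = 200                      # grid resolution
h = 1.0 / n                  # width of each grid cell
total = 0.0
for i in range(n):
    for j in range(n):
        x, y = (i + 0.5) * h, (j + 0.5) * h  # midpoint of the cell
        assert f(x, y) >= 0                  # condition 1: nonnegative
        total += f(x, y) * h * h             # condition 2: volume adds up

print(round(total, 6))  # should be very close to 1.0
```

The double loop is just a midpoint Riemann sum standing in for the double integral; for a real analysis you'd integrate analytically or use a numerical integration library.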
For this class, a general discussion of random vectors is probably too much, so we're only going to focus on one specific, particularly manageable kind of joint density. And that's when the random variables X and Y are independent. What we'll see is that, for, say, a joint density or joint mass function, if the random variables X and Y are independent, then the joint density just factors into the product of the two individual densities, say f of x times g of y. Basically, this is a lot of what independence does for us mathematically: it turns complicated multivariate structures into products. We're going to use this fact a lot, and we'll explain some of the intuition behind it. Think back to our early definitions of probability, when we were discussing the sample space and events: two events A and B are independent if the probability of their intersection is equal to the product of their probabilities, so the probability of A intersect B is the probability of A times the probability of B. Incidentally, if this is true, then A is independent of B complement, B is independent of A complement, and A complement is independent of B complement. And the mathematical definition of independence is equivalent to our intuition of what it means to be independent: A is unrelated to B. That's what the mathematical definition implies, and we'll get a better sense of that. For two random variables X and Y, we define them to be independent if, for any two sets A and B, the probability that X lies in A and Y lies in B is the product of the probability that X lies in A, regardless of what Y is doing, and the probability that Y lies in B, regardless of what X is doing. And that's simply a direct extension of the definition of independence above, which I think probably everyone is a little bit familiar with.
We automatically think of independence all the time already. If you were to ask nearly anyone with a basic amount of mathematical training, what's the probability of getting heads on two consecutive coin flips, they would probably say, well, the probability of getting a head on the first one is a half, and the probability of getting a head on the second one is a half, so it's a quarter, right? Well, that's just an exact execution of the independence rule. Let A be the event that you get a head on flip one and B be the event that you get a head on flip two. Basically, what you're saying is you want the probability of the intersection, a head on flip one and flip two, and the probability of that intersection is exactly the product of the probabilities because we have independence: the probability of A times the probability of B, so .5 times .5, which is .25, or a quarter. So we use independence all the time, and the main consequence of independence is that probabilities of independent things multiply to obtain the probability of both occurring. But this creates a problem, in that people have then gone on to extend this rule to where they just multiply probabilities regardless of whether the events are independent, and this can lead to tragic consequences. Here's a great example. In Science, Volume 309, they report on a physician who gave expert testimony in a criminal trial on sudden infant death syndrome, SIDS, which is the tragic phenomenon where a baby dies, for example, in the middle of the night, and no one exactly knows why. A woman was on trial because she had two consecutive children who died of SIDS, and the court case considered whether or not this was too unlikely to have happened by chance; that it wasn't really SIDS, but something malicious on the part of the mother. So the person who was testifying did the following calculation. He said, well, the probability of SIDS is one out of 8,543.
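The two-coin calculation is easy to check by simulation. Here's a minimal Python sketch, where the number of trials and the seed are arbitrary choices of mine: simulate many pairs of independent fair flips and watch the fraction with two heads settle near .5 times .5 = .25.

```python
import random

random.seed(2023)            # arbitrary seed, just for reproducibility
trials = 100_000
both_heads = 0
for _ in range(trials):
    flip1 = random.random() < 0.5   # independent fair flip one
    flip2 = random.random() < 0.5   # independent fair flip two
    if flip1 and flip2:
        both_heads += 1

frac = both_heads / trials
print(frac)  # should be close to 0.25
```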
I'm not 100 percent clear where they got that number, but let's assume for the sake of argument that it's correct. Then the person giving the testimony said, well, the probability that you have two SIDS deaths would be the product of that number with itself, or the square of that number: one over 8,543 squared. Based on this evidence, the mother was convicted of murder. So, what was the physician's mistake in this case? There is actually quite a bit of discussion you could have over ethics, probability, evidence, and culpability based on this case; quite a collection of complicated issues intersect when you're discussing a case like this. For example, where does this probability of SIDS come from? What's the evidence for it? How do you balance medical evidence when convicting or not convicting a person in a trial? For the purpose of this class, let's just simplify the discussion down to: is this calculation of simply squaring this number warranted, given that the number itself is correct? Well, if A1 is the event that the first child died of SIDS and A2 is the event that the second child did, then the inherent assumption being made is that A1 is independent of A2, so that you can multiply the probability of A1 times the probability of A2. But this logic fails immediately. There's no reason to believe that the event of the second SIDS death is independent of the event of the first. In this case, and in many cases in biology, biological processes that have a genetic or familial component tend to be dependent within families. So you couldn't multiply the marginal probabilities to obtain the probability of the intersection. And there are other problems with this estimate; I outline an example of one here: the prevalence was obtained from an unpublished report on single cases, and quite a bit of the discussion surrounding this case revolved around these and other issues.
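A small sketch can show how familial dependence breaks the squaring argument. Every number below is hypothetical and chosen only for illustration, not taken from the case: suppose some families carry a higher shared risk, so the two children's outcomes are marginally dependent even though they are conditionally independent within a family.

```python
# Hypothetical numbers for illustration only -- not from the actual case.
p_high = 0.1            # assumed fraction of "high-risk" families
p_low_risk = 1 / 10000  # per-child SIDS probability, low-risk family
p_high_risk = 1 / 1000  # per-child SIDS probability, high-risk family

# Marginal probability that any one child dies of SIDS
p_one = p_high * p_high_risk + (1 - p_high) * p_low_risk

# Probability both children die: the events are conditionally
# independent given the family's risk status, but marginally dependent
# because the risk is shared within the family
p_both = p_high * p_high_risk**2 + (1 - p_high) * p_low_risk**2

print(p_one**2)           # the naive "just multiply" answer
print(p_both)             # the answer under this dependent model
print(p_both > p_one**2)  # dependence makes both deaths more likely
```

Under these made-up numbers the true probability of two deaths is roughly three times what the naive squaring gives; the direction of the error is exactly the one that mattered in the trial.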
But the point I'm trying to make for the purposes of this class is: you can't just go around multiplying probabilities willy-nilly. The random variables or events that you're discussing have to actually be independent. Okay, so we'll use the following fact extensively in this class as a basic simplifying principle. If we have a collection of independent random variables X1 up to Xn, then the joint distribution of X1 to Xn, or the joint density function, is the product of the individual densities or mass functions. In other words, the density f of x1 up to xn is the product of the individual densities, and here I write fi of xi, indicating that every Xi could potentially have a different density. The most common model that we'll be dealing with is the instance where X1, X2, all the way up to Xn are from the same distribution. In this particular case, we say that the Xi's are independent and identically distributed: independent in that X1 is independent of X2, and so on, and identically distributed in that f1 is equal to f2, all the way up to fn. IID samples are very important in the subject of statistics, and the reason is that IID random variables are the basic default model for random samples. If we have a collection of things that we believe are, in essence, exchangeable, then we treat them as if they are IID. Many of the important theories of statistics are founded on the assumption that variables are IID. To give you an example of an IID random sample, imagine simply rolling a die: each roll is a draw from the uniform distribution on the numbers one to six. So when we model a process as if it's IID, we're saying it's like we're rolling a die for each variable that we're modeling from some population-level distribution. I just want to comment, as part of the broader discussion of probability modeling, that this is never actually the case, right?
It's probably a very good model for rolling a die, but we use IID to model things where surely the variables themselves are not IID. We can rarely guarantee that our sample is actually a random draw from some population distribution f over and over again. The point is that it's a statistical model used to simplify calculations and simplify our discussion. But whenever we use this statistical model, we have to be cognizant of the fact that it is a model, and an enormously simplifying assumption. Let's go to a very important example: flipping a coin. Imagine we have a biased coin; say the probability of a head, the success probability, is p, and we flip it n times. What is the joint density of the collection of possible outcomes? Recall, each coin flip here is a Bernoulli random variable with success probability p, and recall we wrote out the density in the form p to the x times one minus p to the one minus x. Notice that's a very convenient form: if you plug in x equal to one, you get the probability p of a head, or of a one, and if you plug in x equal to zero, you get one minus p for the probability of a tail, or a zero. So this density is a nice way to represent it, and you'll see why we present it specifically this way in the next line. The joint density, or here the joint mass function, f of x1 to xn, if they're independent coin flips, is simply the product of the individual densities, and you'll see from this formula we get p raised to the sum of the xi's, times one minus p raised to n minus the sum of the xi's. So if the x's are all zeros and ones, this works out to be p to the number of heads times one minus p to the number of tails. And that's basically why we write out the density this way: if we have a bunch of independent coin flips, then it's convenient that the mass functions multiply, and we wind up with this nice form for the joint mass function.
So if, for example, I have a biased coin with success probability p, and I have four coin flips, and I want to know the probability of getting a one, then a zero, then a one, and then another one, so one, zero, one, one, you would simply plug into this formula. In the order one, zero, one, one, we got three heads and one tail, so we would get p to the three times one minus p to the one as the probability of that occurrence. And notice it's the same probability regardless of the order in which the ones and zeros occurred. So this formula makes it easy to calculate the joint probability for a collection of ones and zeros from a potentially biased coin. I just want to mention again that this model is tremendously important. Imagine, for example, we want to model the prevalence of hypertension in a population. One way we might go about doing that is to say that our sample is IID, and again that's often a big assumption, that people are IID draws: individuals are coin flips, and what we would like to know is their success probability of having hypertension. That success probability is the prevalence of hypertension in the population, and we would use this joint mass function to model that process for our collection of data. That's the idea behind where we're going with this. But notice there are a lot of assumptions that go into that, right? I just want to emphasize this fact quite a bit. We're assuming that we're randomly drawing people from the population we're interested in, or, not even that we're randomly drawing them, but that we can model the collection of people, their hypertension status, as if they were a bunch of independent coin flips with the prevalence being the success probability. That's ultimately what our model is stating, so it's important to always keep that in mind.
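The joint mass function for independent Bernoulli flips is easy to code up. Here's a minimal sketch, where the function name and the choice p = 0.3 are mine, purely for illustration: it evaluates p cubed times one minus p for the sequence one, zero, one, one, and confirms the probability is the same for every ordering of three heads and one tail.

```python
from itertools import permutations

def joint_pmf(flips, p):
    """Joint mass function of independent Bernoulli(p) flips:
    p^(number of ones) * (1 - p)^(number of zeros)."""
    heads = sum(flips)
    return p**heads * (1 - p)**(len(flips) - heads)

p = 0.3  # an arbitrary success probability, just for illustration
prob = joint_pmf([1, 0, 1, 1], p)
print(prob)  # p^3 * (1 - p)^1, about 0.0189

# Every ordering of three ones and one zero has the same probability
all_orderings = {joint_pmf(list(seq), p) for seq in permutations([1, 0, 1, 1])}
print(len(all_orderings))  # 1: the order doesn't matter
```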
So let's stop here, and next we'll talk about some of the mathematical properties associated with random variables, covariances, and correlations, and their consequences when variables are