Let me elaborate on that point. In the discrete case, where x can only take so many values (one, two, three, four), this definition of conditional probability is exactly the definition that we used for events, where A is the event that X = x and B is the event that Y = y. So there's no confusion: it exactly agrees with our definition of conditional probability. The continuous one is a little bit harder, in that it's harder to motivate why this is the definition.
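The discrete-case agreement can be checked directly. Here's a minimal sketch in Python; the joint mass function is made up purely for illustration. It shows that the event-based formula P(A | B) = P(A and B) / P(B) and the conditional pmf formula p(x | y) = p(x, y) / p(y) give the same answer.

```python
# Hypothetical joint pmf, just for illustration: X in {1, 2}, Y in {1, 2}.
from fractions import Fraction

joint = {
    (1, 1): Fraction(1, 8), (1, 2): Fraction(3, 8),
    (2, 1): Fraction(1, 4), (2, 2): Fraction(1, 4),
}

# Event-based definition: P(A | B) = P(A and B) / P(B),
# with A = {X = 1} and B = {Y = 2}.
p_B = sum(p for (x, y), p in joint.items() if y == 2)   # P(Y = 2)
p_A_and_B = joint[(1, 2)]                               # P(X = 1, Y = 2)
event_based = p_A_and_B / p_B

# Conditional pmf definition: p(x | y) = p(x, y) / p(y).
pmf_based = joint[(1, 2)] / sum(joint[(x, 2)] for x in (1, 2))

print(event_based, pmf_based)  # both are 3/5
```

The two computations are literally the same arithmetic, which is the point: in the discrete case the conditional mass function is just conditional probability for events.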

The event that X takes on a specific value, or that Y takes on a specific value, has probability zero for continuous random variables. So that kind of fails our basic premise from conditional probability for events: the event that we're conditioning on has to have probability greater than zero. Now, note that we're not talking about

conditional probabilities, we're talking about the construction of the conditional

densities which govern the behavior of conditional probabilities.

So we haven't violated that rule from earlier, but it still kind of seems to break the spirit of the rule. So how do we get at this idea? How can we have a meaningful definition of the probabilistic behavior of a random variable, given that another random variable takes on a specific value?

Well, here's the motivation that I like. Imagine you define the event A that the random variable X is less than or equal to a specific value little x, and the event B that the random variable Y lies in the interval from y to y plus some small amount, say epsilon. Now A and B are events that have

positive probability. And we can apply our standard definition

of conditional probability to talk about the probability of the event A given that

the event B has occurred, right? That would just follow from our standard

definition. So let's actually formulate this. The probability of A given B is the probability that X is less than or equal to little x, given that Y is in the interval from y to y + epsilon. Now in this case, nothing has probability zero, so we can just directly apply the conditional probability formula.

And I don't think this is terribly important for this class; I just wanted this argument to be here for those who want to see it. But you can follow through the arithmetic (it's not really calculus here) and get that this construction yields the conditional distribution function associated with X, given that Y = y, as we let epsilon get smaller and smaller. So as the conditioning event gets closer and closer to conditioning on Y being the specific value y, we limit to the conditional distribution function associated with X. And then remember that density functions are derivatives of distribution functions, so if we just take the derivative of this, then we get the conditional density function.
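That derivative step can be sketched numerically. Here's a small Python check using the example density that comes up later in the lecture, f(x, y) = ye^(-xy - y), whose conditional distribution function works out to F(x | Y = y) = 1 - e^(-xy) (that closed form is my computation, stated here as an assumption): differentiating the conditional CDF in x recovers the conditional density.

```python
import math

# Conditional CDF from the lecture's example joint density f(x, y) = y*exp(-x*y - y):
# F(x | Y = y) = 1 - exp(-x*y)   (closed form assumed here)
def F_cond(x, y):
    return 1 - math.exp(-x * y)

# The conditional density f(x | y) = y*exp(-x*y) should be its derivative in x.
def f_cond(x, y):
    return y * math.exp(-x * y)

# Central-difference numerical derivative of F in x.
x, y, h = 0.7, 2.0, 1e-6
deriv = (F_cond(x + h, y) - F_cond(x - h, y)) / (2 * h)

print(deriv, f_cond(x, y))  # the two agree to high precision
```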

So we can see right here that if we differentiate this conditional

distribution function, we get exactly the definition of the conditional density that

we gave you before, f(x, y) / f(y). So if you're interested in this at this

level, then you can go through those arguments carefully. And to be fair, these only cover the definition in the continuous case, where we have differentiable distribution functions.
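For those who do want to see the limiting argument in action, here is a numeric sketch in Python. It uses the example density from later in the lecture, f(x, y) = ye^(-xy - y); the closed-form expressions for the two probabilities are my own computation, stated as assumptions. The ratio P(X <= x, Y in [y, y + eps]) / P(Y in [y, y + eps]) approaches the conditional CDF 1 - e^(-xy) as epsilon shrinks.

```python
import math

# Joint density from the lecture's example: f(x, y) = y*exp(-x*y - y), x, y > 0.
# Marginal: f(y) = exp(-y).  The integrals below are done in closed form (assumed).

def p_B(y0, eps):
    """P(Y in [y0, y0 + eps]) = integral of exp(-t) dt over [y0, y0 + eps]."""
    return math.exp(-y0) - math.exp(-(y0 + eps))

def p_A_and_B(x, y0, eps):
    """P(X <= x, Y in [y0, y0 + eps]) = integral of (1 - exp(-x*t)) * exp(-t) dt."""
    return p_B(y0, eps) - (math.exp(-y0 * (1 + x))
                           - math.exp(-(y0 + eps) * (1 + x))) / (1 + x)

x, y0 = 1.0, 2.0
limit = 1 - math.exp(-x * y0)  # the conditional CDF F(x | Y = y0)
for eps in (1.0, 0.1, 0.01, 0.001):
    ratio = p_A_and_B(x, y0, eps) / p_B(y0, eps)
    print(eps, ratio)  # approaches the limit as eps shrinks
print(limit)
```

Nothing in this computation ever conditions on a probability-zero event: for every positive epsilon, both events have positive probability, and the conditional CDF appears only as the limit.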

But this is more than enough for our case. If you're interested in it at a deeper

level even than this, where you have mixed continuous and discrete densities, then

you can take an advanced probability course somewhere; but, for our purposes,

this is enough. And so, just to summarize: we have the conditional probability definition associated with events, which governs all of our thinking about conditional probabilities. That's the probability of A given B equals the probability of A intersect B divided by the probability of B. And then, when you're talking about random variables, what we want is the probability behavior of a random variable X, given that the random variable Y has taken on a specific value: it's the joint density or mass function divided by the marginal. It has a nice parallel with the probability associated with events, and here we've gone through the arguments to show how we get from those statements about events to this definition for mass functions and density functions. So conditional densities actually have a

very nice geometric interpretation. If you have a joint density f(x, y), that's a surface: f yields the z value over the xy-plane. So f(x, y) is a surface, and the volume under the surface has to be one for it to be a joint density.

Well, what does it mean to get the conditional density of X given that Y takes a particular value? The event that Y takes a particular value, let's say y equals five, corresponds to a plane. That plane slices through this surface and yields a function. That function is just f(x, y) evaluated at the point five, f(x, 5), okay? So we have this surface.

We have this plane, the y = 5 plane, that cuts through the surface, and then we have the function sitting on that plane, f(x, 5). And that is exactly the conditional density, except that now it doesn't integrate to one. So we have to normalize it to something that integrates to one. Well, that's exactly what we divide by there: the marginal f(5). Let's go through a specific example.

We have f(x, y) = ye^(-xy - y), for x and y both greater than zero. Now, for the marginal density associated with y, let's just perform the integral. We integrate the joint density function over x from zero to infinity, because we want the marginal associated with y. You can perform the integral; it works out to be e^(-y). And then our conditional density, f(x) given y, is the joint density divided by the marginal, f(x, y) / f(y).

So just churn through the calculations and you get ye^(-xy). And so if you wanted to know the conditional density governing the behavior of the random variable X, given that Y is, say, three, then that density would be 3e^(-3x). Okay, so you just plug in y = three.

So now, if you plug in any possible value of y, this function will give you the associated density function for the random variable X, conditioned on the information that Y takes on that specific value.
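The whole example, including the slicing picture, can be checked numerically. Here's a short Python sketch (the Riemann-sum grid and the upper cutoff are my choices, not part of the lecture): the area under the y = 5 slice of the joint density matches the marginal e^(-5), and dividing the slice by that marginal reproduces the conditional density 5e^(-5x).

```python
import math

# The example's joint density: f(x, y) = y*exp(-x*y - y) for x, y > 0.
def f_joint(x, y):
    return y * math.exp(-x * y - y)

y0 = 5.0
dx = 1e-4  # crude Riemann-sum grid, integrating x over (0, 20]

# 1. The area under the y = 5 slice should equal the marginal f(5) = exp(-5).
marginal = sum(f_joint(i * dx, y0) for i in range(1, 200_000)) * dx
print(marginal, math.exp(-y0))  # the two are close

# 2. Dividing the slice by the marginal gives the conditional density
#    f(x | y = 5) = 5*exp(-5x); check one point.
x = 0.3
cond = f_joint(x, y0) / math.exp(-y0)
print(cond, 5 * math.exp(-5 * x))  # the two agree
```

The first check is exactly the geometric statement from before: the slice by itself is not a density, but its area is the marginal, so dividing by the marginal renormalizes it to integrate to one.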