Hi, my name is Brian Caffo and this is Lecture Two of Mathematical Biostatistics Boot Camp. In the last lecture, we covered probability and the basics of biostatistics at a very conceptual level. In this lecture, we are going to get much more down to specifics. First, we are going to cover probability measures, which are mathematical functions, so we will talk about the specifics of those kinds of mathematical functions. Then in section two, we will talk about random variables. Random variables are just like any other variables that you may have encountered before, say in calculus, with the exception that they are random. They can take a lot of different values. In section three, we're going to talk about probability mass functions and probability density functions. These are the mathematical functions that attach probabilities to the values a random variable can take. In section four, we're going to talk about so-called cumulative distribution functions, CDFs, closely associated things called survival functions, and quantiles. And then we'll wrap up with a brief summary.

So a probability measure is the function that's going to govern the rules of probability for us. And there are basically three rules that a probability measure has to follow. Every probability textbook will give you these three rules, or things that are equivalent. There is interesting history behind these three rules. The Russian mathematician Kolmogorov, who's generally considered the father of modern probability, basically distilled everything that we thought a probability should have to follow. He reduced it down to the minimal set of rules that you could possibly have. If you delete any of these rules, you wind up with something that fails in some fundamental way to be probability. And if you add any other rules, they turn out to be excessive. So it's really kind of an interesting collection of research he did.
It's also interesting to note that [inaudible] tried to do something else, which is to figure out what exactly it is we mean by probability. So he found that problem to be very hard, and I think if you look into it, the theory of exactly what is randomness, and exactly what is a probability measure, is a very deep problem. Philosophers are still debating this, and I question whether or not it'll ever reach a resolution. However, one thing that's much less controversial is what rules probability has to follow, and Kolmogorov just nailed it. It's done.

So let's go over these three rules. A probability measure P, the letter P here in italics, is a function that maps events, which are subsets of the sample space, to numbers between zero and one. That's item one here. So events E have to be mapped to numbers between zero and one. So probability is a function that operates on sets. The second item here says that the probability of the whole sample space has to be one. Basically what this means is that something has to happen. The sample space has to enumerate everything possible that can happen. So for example, if you are flipping a coin, the coin can either be heads or tails. The sample space is heads or tails. When you flip the coin, one of those two things has to happen. The probability of one of them happening is one. The coin can't land on its side. If you want to allow the coin to land on its side, then the sample space has to be heads, tails, and lands on its side. The third statement, and we will talk a lot about the third statement because we are giving you an incorrect version of it, says that if two events are mutually exclusive (and recall that events are mutually exclusive if they have no intersection), then the probability of their union is the sum of their probabilities. So if two events E1 and E2 are mutually exclusive, the probability of E1 union E2 is the probability of E1 plus the probability of E2. So as an example, we just talked about coin flipping, and we said the probability of a head or tail has to be one.
The probability of either getting a head or tail has to be one. So let's talk about that in the context of rule three. If E1 is the event that you get a head and E2 is the event that you get a tail, then the probability of E1 union E2, the probability that you get a head or a tail, winds up being the probability of getting a head, let's say it is 0.5, plus the probability of getting a tail, which is 0.5, which adds up to one, exactly what we know it has to be.

So for part three, the third rule that we talked about in the previous slide, I said that there was some concern over it not being complete, so I'm going to elaborate on what I mean by that in this slide. First of all, let's note the following fact. Part three of the previous slide says that if you have two mutually exclusive events, the probability of their union is the sum of their probabilities. That pretty easily extends to so-called finite additivity: instead of having two, if you had three, or four, or five, or let's just say n events, the probability of their union equals the sum of their probabilities. So in this case I have that the probability of the union of a collection of mutually exclusive events Ai equals the sum of their probabilities. That pretty directly follows from the previous definition. Just to give you a sense of how it works, suppose you had three events, say A1, A2 and A3, and they are all mutually exclusive. Then the probability of A1 union A2 union A3 is the probability of A1 plus the probability of A2 union A3, right, because A1 is mutually exclusive from the union of A2 and A3. And then that second probability, the probability of A2 union A3, is again the probability of the union of two mutually exclusive events. So it is the probability of A2 plus the probability of A3. And you can formalize this with mathematical induction if you want. So at any rate, the rule that I gave you implies so-called finite additivity. And it seems like maybe that should be enough to cover everything.
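Written out in symbols, the three-event argument just described might look like the following sketch. The notation (A_1, A_2, A_3 for mutually exclusive events, P for the probability measure) follows the spoken description; the layout is an assumption and is not copied from the slide.

```latex
\begin{align*}
P(A_1 \cup A_2 \cup A_3)
  &= P\bigl(A_1 \cup (A_2 \cup A_3)\bigr) \\
  &= P(A_1) + P(A_2 \cup A_3)
     && \text{since } A_1 \cap (A_2 \cup A_3) = \emptyset \\
  &= P(A_1) + P(A_2) + P(A_3)
     && \text{since } A_2 \cap A_3 = \emptyset
\end{align*}
% Repeating the same step n - 1 times (formally, by induction) gives
% finite additivity for mutually exclusive A_1, \dots, A_n:
\[
P\Bigl(\bigcup_{i=1}^{n} A_i\Bigr) = \sum_{i=1}^{n} P(A_i).
\]
```

Each equality uses only the two-event rule, which is why the two-event version is enough to get any finite number of events.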
Well, the probabilists have thought very hard about this, and they said, well, maybe we think it should be countable additivity: instead of n, it should go up to infinity. And it's not the case that the definition that we gave implies countable additivity, that is, that if you take an infinite collection of mutually exclusive events, the probability of the union is the sum of the probabilities. That requires ideas of limits and other things that we're not going to cover so much in this class. So at any rate, it's the case that finite additivity does not imply countable additivity, but of course countable additivity implies finite additivity. So in standard probability classes, in the more theoretical probability classes, they make quite a bit of hay out of this distinction. They discuss it a lot. And the general definition gives countable additivity rather than finite additivity. If you take a more advanced measure-theoretic probability class, they will deal with this issue at length. In this class, this will be the last time we discuss this. In general, finite additivity will work just fine for us.

In the next slide, we are going to talk in more detail about what the probability function operates on. And again, it's going to be a rather important but maybe unnecessary detail for this class, so it's going to be another thing that we cover very briefly and then tend not to think about for the remainder of the lectures. Recall that our probability function operates on events, which are subsets of the sample space, and maps them to numbers between zero and one. So we need an appropriate domain for our function. Our domain is not an event, it's a collection of events. So let me go through an example to make this idea a little bit more clear. So let's suppose the sample space is simply the numbers one, two, or three. Imagine somehow that you had a three-sided die that you were rolling. Then the
probability function operates on all possible events that are subsets of that sample space. So in this case: the null event; the event that you get a one; a two; a three; a one or two; a one or three; a two or three; or the whole sample space, a one, two or three. And this is fine. Pretty much whenever you have a finite set, the probability function will operate on all possible subsets of the sample space. In this case we're using the letter script F to denote this so-called domain.

When the sample space is a continuous set, it actually gets a lot harder. You can no longer say things like the probability operates on the set of all possible subsets of a continuous set. And it turns out that that is an incredibly deep mathematical problem. The mathematician Cantor thought about measure and sets in a very deep way, and he is an interesting character in the history of mathematics, so if you want to read about it, you should read about Cantor. He came up with interesting sets that, for example, you can't reasonably include in the definition of a probability. So in this class we're not going to think about this at all. But I wanted to raise it just for those students that go on to take some of these more advanced classes, so that you'll be prepared for some of these admittedly kind of strange ideas that come up when you try to talk about the set of sets that probabilities operate on. For our purposes, when our sample space is a continuous set, we are mostly going to be concerned with things like intervals or unions of intervals. And in that case, definitions are very easy. So for our definition of the domain that the probability operates on, we are just going to assume that anything that we can think of is just fine, and since none of us are Cantor, we probably won't think of anything too crazy. And that definition works very well for this class.
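To make the three-sided die example concrete, here is a minimal Python sketch that lists every event in the domain and checks the three rules on it. The equally likely outcomes and the names `all_events` and `prob` are assumptions made for illustration; the lecture does not say how the die is weighted.

```python
from fractions import Fraction
from itertools import chain, combinations

# Sample space for the hypothetical three-sided die from the lecture.
omega = frozenset({1, 2, 3})

def all_events(sample_space):
    """Return every subset of a finite sample space (the domain, script F)."""
    items = sorted(sample_space)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return [frozenset(s) for s in subsets]

def prob(event):
    """Assumed measure: each outcome is equally likely (not stated in the lecture)."""
    return Fraction(len(event), len(omega))

events = all_events(omega)

# Rule 1: every event maps to a number between zero and one.
rule_one = all(0 <= prob(e) <= 1 for e in events)
# Rule 2: the whole sample space has probability one.
rule_two = prob(omega) == 1
# Rule 3: for mutually exclusive events, P(union) is the sum of the probabilities.
rule_three = all(
    prob(a | b) == prob(a) + prob(b)
    for a in events
    for b in events
    if not (a & b)
)

print(len(events), rule_one, rule_two, rule_three)
```

Running this prints `8 True True True`: the eight events are exactly the ones listed above, from the null event up to the whole sample space. Exact fractions are used instead of floating point so the equality checks in rule three are not affected by rounding.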
In this slide, we're going to give a laundry list of properties that a probability function has to have by virtue of its three defining rules. You should find it kind of interesting that the three rules imply all these things that we know probabilities have to have. So take this first bullet here. The probability of the null set is zero; basically, the probability that nothing happens is zero. So if you say you're going to roll a die, you actually roll a die; if you say you're going to flip a coin, you actually flip a coin. That's basically what the probability of the null set being zero means. The second bullet says the probability of an event is one minus the probability of its complement. In other words, for example, if E is the event that you get a head when you flip a coin, the probability of getting a head is one minus the probability of getting a tail. And that's of course true for a fair coin, where the probability of a head is 0.5 and the probability of a tail is 0.5. But let's suppose you have an unfair coin. Maybe you glued together a nickel and a US dime and made a funny-shaped coin, so that you didn't know whether or not the probability of a head was 0.5. Let's suppose the probability of a head in that case was 0.3. Well, this would say that if the probability of a head is 0.3, then the probability of a tail has to be 0.7.

The next bullet says that the probability of the union of two events is the sum of their probabilities minus the probability of their intersection. The sum is all we would need if the events were mutually exclusive, but we have to subtract off the intersection if they are not mutually exclusive. And the intuition behind this statement is something like this. When you add the probability of A, you've added the probability of the part of A that intersects B and the part of A that does not intersect B. And then you've added the probability of B, which includes the part of B that intersects A and the part of B that does not intersect A.
So you have then added the part of A that intersects B twice: once when you added the probability of A, and once when you added the probability of B. You've added it twice, you only want to add it once, so subtract it out. That's how the rule works. The next bullet point is a pretty simple point: if A is a subset of B, then the probability of A is less than or equal to the probability of B. So this is analogous to saying, if I am rolling a die and A is, say, the event that I get a one and B is the event that I get a one or two, then the probability of getting a one is less than the probability of getting a one or a two. And so this rule, I think, makes a lot of sense. From De Morgan's laws we get that the probability of A union B is one minus the probability of A complement intersect B complement. The next bullet point is kind of along the lines of subtraction. So A intersect B complement, that set is sort of like subtracting B out of A; it's the component of A that has nothing to do with B. So the probability of A removing B is the probability of A minus the probability of A intersect B. So that works out to be a nice rule: that sort of set-level subtraction works out to be equivalent to subtracting the probabilities.

The next bullet talks about the probability of the union of events again. This says the probability of the union of a collection of events is less than or equal to the sum of the probabilities of the events. Now again, if the events are mutually exclusive, then the probability of the union has to equal the sum of the probabilities. So this rule doesn't violate that rule whatsoever, but it also accounts for the times when the events are not mutually exclusive. The final rule talks, again, about unions of events. In this case, the probability of the union of events is greater than or equal to the maximum of the collection of probabilities. Again, this rule holds whether the events are mutually exclusive or not.
But there's intuition behind this that's very easy. The union is everything that's in any of the events E1 to En, so it contains every one of them. The probability of the union has to be at least as big as that of any of its component events. I think that makes quite a bit of sense. So let me give you an example. Go back to our die roll. If E1 is the event that you get a one, E2 is the event that you get a two, and E3 is the event that you get a three, the probability on the left-hand side of the equation is the probability that you get a one, two or three, and the right-hand side is the maximum probability. If you are talking about a standard die, the probability of a one is 1/6, the probability of a two is 1/6, and the probability of a three is 1/6. So the maximum of them is 1/6. On the left-hand side, the probability of the union is the probability of a one, two or three, which is one half. So one half is definitely bigger than 1/6.

So let me give you an example of one of these proofs. Let's take a simple one: the probability of an event is one minus the probability of its complement. So consider line one. Recall that the probability of the whole sample space is one. But, again, for any event, the sample space is equal to the union of that event and its complement. So omega equals E union E complement. Then consider the next line. An event is always mutually exclusive with its complement. Something cannot simultaneously occur and not occur, so events are always mutually exclusive with their complements. So E and E complement are mutually exclusive events. So we can take the probability of the union and turn it into the sum of the probabilities, the probability of E plus the probability of E complement, and then that's simply a restatement of what we want to prove: one equals the probability of E plus the probability of E complement. Let's do a more complex example of the consequences of the probability rules.
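Before the longer proof, the die-roll numbers and the complement rule from the discussion above can be checked numerically. This is a small Python sketch assuming a fair six-sided die, as in the lecture's example; the helper name `prob` and the overlapping events `a` and `b` are illustrative choices and do not come from the slides.

```python
from fractions import Fraction

# Assumption: a fair six-sided die, so every face has probability 1/6.
omega = frozenset(range(1, 7))

def prob(event):
    """Probability of an event (a subset of omega) under equally likely faces."""
    return Fraction(len(event), len(omega))

# The three events from the example: rolling a one, a two, or a three.
e1, e2, e3 = frozenset({1}), frozenset({2}), frozenset({3})
union = e1 | e2 | e3

p_union = prob(union)                      # probability of a one, two, or three
p_max = max(prob(e1), prob(e2), prob(e3))  # largest single-event probability
p_sum = prob(e1) + prob(e2) + prob(e3)     # sum of the three probabilities

# Complement rule: P(E) = 1 - P(E complement).
complement_ok = p_union == 1 - prob(omega - union)

# Overlapping events (hypothetical): a = {1, 2}, b = {2, 3}.
# Union rule: P(A or B) = P(A) + P(B) - P(A and B).
a, b = frozenset({1, 2}), frozenset({2, 3})
union_rule_ok = prob(a | b) == prob(a) + prob(b) - prob(a & b)

print(p_union, p_max, p_sum, complement_ok, union_rule_ok)
```

This prints `1/2 1/6 1/2 True True`. The union probability (1/2) sits between the maximum (1/6) and the sum (1/2), and it equals the sum here only because the three events are mutually exclusive.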
So recall that we discussed that the probability of the union of a collection of events is less than or equal to the sum of the probabilities. And recall that the less than or equal to is an equality if the events are mutually exclusive. So let's prove this using mathematical induction. The way mathematical induction works is you prove the statement for some small case, n equal to one or two, then you assume that it's true for, say, n minus one, and then prove that it's true for n. That's how mathematical induction works. So let's consider just two events, the probability of E1 union E2. Well, by one of the other consequences of the probability rules that we investigated, that's equal to the probability of E1 plus the probability of E2 minus the probability of E1 intersect E2, and here I'm assuming that we've gone ahead and proved that one as well. So look at this final term here that's subtracted off, minus the probability of E1 intersect E2. We are subtracting off a number that has to be non-negative. Remember, probabilities have to be between zero and one, so they have to be non-negative at least. So if we throw away that final term, what's left can only get bigger, right? If we're subtracting off a non-negative number and we throw it away, then the result can't get any smaller. So then we've established the result for the case when we have two events.

Now let's assume the result is true when we have n minus one events, and let's consider n events. So we want to demonstrate that the probability of the union of the Ei is less than or equal to the sum of the probabilities. So let's write out the union of the Ei's as En union with the union of the rest of them. The union of the rest of them, for i equal one to n minus one, is a single set. So that's just two sets: En and the union of the remainder are two separate sets, and we already worked it out for two sets. So we can say that the probability of the union of E1 to En is less than or equal to the probability of En plus the probability of the union of the remainder.
Now consider the next line. In the next line, we have the probability of En from the previous line. And then we can say that we have only gotten bigger by our induction hypothesis, by the fact that we assume that this statement is true for n minus one events. So if we switch this term from the probability of the union to the sum of the probabilities, we've only made it bigger. So we can maintain that inequality. Then, just collecting the terms, we have that this is the sum of the probabilities. And just to give you a sense of the notation I use, when I write equals on this last line, I mean it equals the previous line, not that it's equal to the first line. So the chain reads less than or equal to, less than or equal to, and then equal to, implying that the first statement is less than or equal to the final statement, while the final line is equal to the line before it. So that's notation that I commonly use. So you should be able to prove all of these probability statements that we outlined on the previous slide.

For this particular one, let's go ahead and take a step back from the mathematics and try to put some of this within a context. So the National Sleep Foundation reports that around three percent of the American population has sleep apnea. This is a sleep disorder where the upper airway collapses. They also report that around ten percent of the North American and European population has restless leg syndrome. For the purpose of our discussion, let's just assume that ten percent of the American population has restless leg syndrome. Similarly, they report that 58 percent of adults in the US experience insomnia. So imagine you were a sleep physician and you wanted to know the probability that a random American has any of these three sleep disorders. Can you simply add these probabilities, three percent, ten percent, and 58 percent, and get that 71 percent of people have at least one of these sleep problems?
So this question is nothing other than a restatement of the probability relationship that we just proved. So here I am using A instead of E, but maybe that's a good thing to do, just so you get used to not using the same letter for everything. So let A1 be the event that the person has sleep apnea, A2 be the event that the person has restless leg syndrome, and A3 be the event that the person has insomnia. And I'm going to gloss over the details, but the probability that a person has at least one of these diseases is the probability of the union, A1 union A2 union A3. So we want to know the probability of the union. Well, that's only equal to the sum of the probabilities when A1, A2, and A3 are mutually exclusive, right? Otherwise it's the probability of A1 plus the probability of A2 plus the probability of A3, and we have to subtract out other things. And in this case I give you the exact equation relating the probability of the union of three events to the probabilities of A1, A2 and A3. The sum of the three works out to be 0.71, but then there's all the other stuff: you have to subtract the probabilities of A1 intersect A2, A1 intersect A3, and A2 intersect A3, and then you have to add back in the probability of the triple intersection, A1 intersect A2 intersect A3. I would suggest you go through and figure out why exactly it is this formula works out. But the point is that the other stuff is non-trivial, and it's always there unless A1, A2, and A3 are mutually exclusive. And so you can't simply add these things. And in fact, in this case, from a scientific perspective, I mean we're talking about it from a mathematical perspective, but from a scientific perspective it's probably the case that there's a non-trivial intersection of people with sleep apnea and restless leg syndrome, and a non-trivial intersection of people with restless leg syndrome and insomnia, and so on. So this 0.71 is not close at all.

So that ends our whirlwind tour of the basics of probability mathematics. Next, we're gonna talk about random