Hi. My name is Brian Caffo. I'm in the Department of Biostatistics at the Johns Hopkins Bloomberg School of Public Health, and this is Mathematical Biostatistics Boot Camp, lecture seven. In this lecture I'm going to talk about some common distributions, basically three distributions, and really two, if you think about it. The Bernoulli distribution is one; the binomial distribution, which is built on the Bernoulli distribution; and then the Gaussian or normal distribution, which is a very important one as well. In fact, I think if all the distributions were to get together and nominate a king, it would definitely be the normal distribution. So the Bernoulli distribution is named after Jacob Bernoulli. We've already talked about it before, but let's formalize it a little bit and remind ourselves of some notation. A Bernoulli random variable is just a fancy name for a coin flip: it takes the values one and zero with probabilities p and one minus p, where p is some number between zero and one. The probability mass function for a Bernoulli random variable we've seen before. It's the probability that X takes the specific value x, for x equal to zero or one, and it's p to the x times one minus p to the one minus x. The mean of a Bernoulli random variable is simply p, and the variance is p times one minus p, facts that we've proven before but are just restating now. In general, for Bernoulli random variables, if you code a coin flip as a one, or a head, people often call that a success, and a zero a failure. And they tend to do this regardless of whether the classification is actually something successful. So we might call, you know, a person getting side effects from a medication a "success" in terms of the Bernoulli coin flip. It's just a little bit of odd nomenclature, I guess. We've also already talked about the Bernoulli likelihood function.
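The mass function, mean, and variance just described can be sketched in a few lines of Python. This is a minimal illustration, not anything from the lecture itself, and the function name `bernoulli_pmf` is my own invention.

```python
# Sketch of the Bernoulli mass function P(X = x) = p^x (1 - p)^(1 - x),
# with mean p and variance p(1 - p), as described above.
def bernoulli_pmf(x, p):
    """Probability that a Bernoulli(p) variable equals x, for x in {0, 1}."""
    assert x in (0, 1), "a Bernoulli variable only takes the values 0 and 1"
    return p**x * (1 - p)**(1 - x)

p = 0.3
mean = p                 # E[X] = p
variance = p * (1 - p)   # Var(X) = p(1 - p)

print(bernoulli_pmf(1, p))  # probability of a "success": 0.3
print(bernoulli_pmf(0, p))  # probability of a "failure": 0.7
print(mean, variance)
```

Note that with x restricted to zero and one, the formula just selects p when x is one and one minus p when x is zero.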
So suppose x1 to xn are our observed data points; the x1 to xn here are numbers that we recorded, a collection of zeros and ones. Then the likelihood, because we're going to model these as independent coin flips, is the product of p to the xi times one minus p to the one minus xi, and we've already seen that this equals p to the summation xi times one minus p to the n minus summation xi. So notice again that the likelihood only depends on the sum of the x's. Each x is zero or one, so the sum of the x's is just the total number of successes, and n minus the sum of the x's is the total number of failures. So if you know n and you know the total number of successes, then you know the Bernoulli likelihood. It doesn't matter what order the heads and tails occurred in, as far as the information contained in the data about the parameter p is concerned. Because n is fixed and assumed known, this implies that the sample proportion contains all the relevant information you need about p, insofar as the likelihood codifies it, simply because summation xi and summation xi over n are one to one: you can get from one to the other easily by either multiplying or dividing by n. So the sufficiency result is basically that the proportion of successes is all the relevant information you need to make inference about the Bernoulli parameter p. Now again, this all depends on the model being correct, that you have correctly modeled the data as iid Bernoulli. When we talked about maximum likelihood, we also showed that if you maximize the Bernoulli likelihood over p, then you obtain that p hat, which is summation xi over n, is the maximum likelihood estimator. So here, just to give you an example of Bernoulli likelihoods and what they look like, imagine we flipped a coin four times. If we flipped a coin four times, there are only five possible values of the sufficient statistic that we could obtain.
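The sufficiency point above, that the likelihood depends on the data only through the number of successes, can be checked numerically. This is a sketch in Python with hypothetical data; the helper name `likelihood` is mine, not the lecture's.

```python
from math import prod

# The Bernoulli likelihood: product over the data of p^x (1 - p)^(1 - x).
def likelihood(xs, p):
    return prod(p**x * (1 - p)**(1 - x) for x in xs)

# Two datasets with the same number of successes, in different orders,
# give the identical likelihood, as the sufficiency argument says.
data_a = [1, 1, 1, 0, 0]
data_b = [0, 1, 0, 1, 1]
p = 0.25
print(likelihood(data_a, p) == likelihood(data_b, p))

# The maximum likelihood estimator is the sample proportion.
p_hat = sum(data_a) / len(data_a)
print(p_hat)  # 0.6
```

Only the totals matter: both datasets produce p cubed times one minus p squared, no matter how the zeros and ones are arranged.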
We could get zero heads, one head out of the four coin flips, two heads, three heads, or four heads, and here are the five possible likelihoods that you could obtain from this experiment, all normalized to have height one. The leftmost one is the likelihood if you had no heads. The second one, you see, is shifted a little bit to the right, and its peak is going to be right at p hat, at a quarter. Then if you get two heads, the peak, the MLE, is right at 0.5; with three heads the likelihood shifts a little more to the right, and with four heads it shifts closer to one. Notice that even in the event that you get all tails or all heads, the likelihood is not entirely at zero, or entirely at one. Right? There is substantial uncertainty. The likelihood is correctly codifying the information that it's possible, even if the coin is fair, to get four consecutive tails; it's just much less likely than if the coin is unfair towards some value closer to zero. So the likelihood is not entirely shoved up against the vertical line at zero, and it just gets closer and closer to that vertical line as you continue to flip and get tail after tail after tail. Binomial random variables are nothing other than the sum of iid Bernoulli trials. We've seen that the key quantity from a Bernoulli experiment is the number of heads, so why don't we just create a random variable that is the number of heads? Specifically, if X1 to Xn are iid Bernoulli, then the random variable X defined as the sum of the individual Xi's is the so-called binomial random variable. And the binomial mass function, the probability that X takes any specific value x, is n choose x, times p to the x, times one minus p
to the n minus x, where p is the success probability from each of the Bernoulli coin flips. Here, the values that x can take are zero, if every single coin flip was a tail, all the way up to n, where every single coin flip was a head. Just to remind everyone of the notation: n choose x, written as n over x inside parentheses, is n factorial over x factorial times n minus x factorial, where zero factorial we treat as one. This formula counts the number of ways of selecting x items out of n without replacement, disregarding the order of the items. Okay, let's consider an example. Imagine I have ten neckties and I pick out three and put them on a bed, let's say, and I'm not caring about the order on my bed, from left to right or something like that. Then ten choose three is the number of different configurations of ties that I could have obtained by picking three ties out of my ten. So that's an example. I can't think of any reason why you would want to do this with your neckties, but whatever. Certain special cases are very easy. Imagine I was only picking one necktie: how many different combinations can I get? Well, intuitively we know that answer has to be ten, right? Because there are only ten possibilities. And if you plug into the formula, you have ten factorial divided by one factorial, which is just one, times n minus x factorial, which is ten minus one, or nine factorial. So we have ten factorial divided by nine factorial, which is just ten. Okay. Another quite useful one is n choose two: how many ways can you pick two things out of n objects? That one seems to come up a lot in my daily life for some reason. In this case it would be ten factorial over two factorial, which is just two, two times one, then divided by eight factorial. So that's ten times nine divided by two.
So, at any rate, the general rule is that for n choose two, you just take n times n minus one over two, and it's one of the ones where it's worthwhile just to memorize the formula for that special case, because it seems to pop up a lot. So why is n choose x the factor out in front of the binomial mass function? Let's now consider the possibility of getting six heads out of ten coin flips, from a coin with success probability p. Well, if you account for the order, you ask: what's the probability of getting tail, tail, tail, tail, that's four tails, and the remaining six flips heads, in that specific order? We know what that probability is; we just plug into the joint Bernoulli mass function, obtained by multiplying the Bernoulli mass function for each coin flip in order, and we get p to the sixth times one minus p to the fourth. And as we know, because it's only the total number of heads that's sufficient, if it wasn't the first four coin flips that were tails, if instead the last four coin flips were tails and the first six were heads, then we would get the same number: p to the sixth times one minus p to the fourth. And if we had the four tails sprinkled into any configuration among the six heads, we would still get the same answer, p to the sixth times one minus p to the fourth. The result is that, basically, for any ordering in which you get six heads and four tails, no matter what the order is, the probability is going to be p to the sixth times one minus p to the fourth. So what we need is to count how many such configurations there are. Well, in this case there are ten flips, so ten positions that could be heads, and we want to know how many different collections of six positions we can obtain, and that's just ten choose six. It's exactly the necktie problem, but now we're picking the positions of the coin flips that are heads, right?
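The necktie counts worked out above can be verified directly with Python's built-in `math.comb`; this is just a check of the arithmetic, using the same numbers as the lecture.

```python
from math import comb

# The necktie examples: n choose x counts unordered selections.
print(comb(10, 3))  # ways to pick 3 ties out of 10: 120
print(comb(10, 1))  # picking one tie: 10
print(comb(10, 2))  # picking two: 10 * 9 / 2 = 45

# The handy special case: n choose 2 equals n(n - 1)/2.
n = 10
print(comb(n, 2) == n * (n - 1) // 2)  # True

# Counting the heads positions: six heads among ten flips.
print(comb(10, 6))  # 210 configurations
```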
And so in this case there are ten choose six possible orderings of six heads and four tails. We don't actually have to go through the specific construction of the binomial distribution, because I think that's a pretty clear demonstration of how you would wind up with the probability of the sum of a collection of ten Bernoulli random variables being six. You would sum over all the possible ways you could get six heads, and it turns out that they all have the same probability, p to the sixth times one minus p to the fourth, so we just need the number of terms that go into that sum, and it's pretty clear that it's ten choose six. We only did this for a specific instance here, but I hope you can see that if ten instead was n, and six instead was x, you would wind up with the same answer: n choose x, p to the x, one minus p to the n minus x. So that's the motivation. And you can actually mathematically check that the binomial mass function sums to one. In fact, it follows from a relatively famous formula, the binomial theorem, and if you, for example, look up the binomial theorem on Wikipedia, you'll see that that formula by itself tells you exactly that the binomial mass function sums to one. But right now we're just going to trust that we did our calculation right. So let's just go through an example of using the binomial mass function. Suppose you have an intrepid friend who has eight children, and it turns out that seven of them are girls and none of them are twins. If you're very persnickety, let's just forget about all the little persnickety things that you could possibly think about related to this problem, like having twins, and think about the problem in a conceptual way: we're going to think of every child from this family, its gender, as a coin flip.
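The construction just described, n choose x orderings each with probability p to the x times one minus p to the n minus x, can be assembled and checked numerically. A sketch in Python, with the function name `binom_pmf` being my own label:

```python
from math import comb

# The binomial mass function built from the counting argument above:
# comb(n, x) orderings, each with probability p^x (1 - p)^(n - x).
def binom_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Check numerically that the mass function sums to one over x = 0..n,
# which is what the binomial theorem guarantees algebraically.
n, p = 10, 0.3
total = sum(binom_pmf(x, n, p) for x in range(n + 1))
print(total)  # very close to 1.0
```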
And the question is: what's the chance of getting seven out of eight children that are girls, if it really is the case that every child's gender is independent of the other children's and there's a 50 percent probability at each birth of having a girl? So what's the probability of getting seven or more? Well, the probability of seven or more is the probability of getting seven girls out of eight plus the probability of getting eight girls out of eight, and in this case, for the fair coin, it would be eight choose seven, times 0.5 to the seventh, times one minus 0.5 to the one, plus eight choose eight, times 0.5 to the eighth, times one minus 0.5 to the zero. And, you know, check with your calculator, or even better with R, that you can get this number; it's about 0.04, or four percent. This is an example of using the binomial formula. I wanted to mention this particular example because this calculation is an example of a so-called P value, and a P value is always the probability, under a null hypothesis, of getting a result as extreme or more extreme than the one actually obtained. The logic behind the P value in this specific instance is that you have this evidence that makes you think, wow, seven girls out of eight children, that seems pretty odd; maybe the 50 percent chance of girls versus boys for this particular family is off, for whatever reason. So the P value says: okay, why don't we calculate the probability, if the null hypothesis were true, of getting an event this extreme, and if that probability is very low, then maybe that's an indication that our hypothesis, that the 50 percent is correct, is not right. So, at any rate, we'll talk about P values later, or I hope we'll get to talking about P values later, but I just wanted to mention where the intuition behind this calculation comes from. Right now, we're only using it as an illustration of plugging into the binomial formula.
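The lecture suggests checking the number with a calculator or R; the same calculation in Python, with the sketch function name `binom_pmf` as an assumption of mine, looks like this:

```python
from math import comb

# P(X >= 7) for X ~ Binomial(8, 0.5): the seven-or-more-girls calculation.
def binom_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p)**(n - x)

p_value = binom_pmf(7, 8, 0.5) + binom_pmf(8, 8, 0.5)
print(p_value)  # 9/256 = 0.03515625, roughly 0.04
```

The exact value is nine over two hundred fifty-six, about 3.5 percent, which is the number the lecture rounds to roughly four percent.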
But I wanted to foreshadow kind of an important statistical technique. And then here on this page is the likelihood associated with p for this particular binomial experiment, if we're willing to model births as if they were binomial. And so here you can see that 0.5 falls inside the one-sixteenth reference likelihood interval but not the one-eighth interval. And then you can see that this likelihood is far more peaked around seven eighths, where it reaches its maximum, and the curvature of it sort of gives you a sense of the relative evidence for the collection of possible values of p.
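The claim about 0.5 and the one-sixteenth and one-eighth reference levels can be checked numerically. A sketch, assuming as the lecture does that the n choose x constant is dropped since it cancels when the likelihood is normalized to have maximum one:

```python
# Binomial likelihood for 7 girls in 8 births, up to a constant:
# L(p) = p^7 (1 - p)^1, maximized at p_hat = 7/8.
def likelihood(p):
    return p**7 * (1 - p)

p_hat = 7 / 8  # the maximum likelihood estimate

# Normalized likelihood at the fair-coin value p = 0.5.
rel = likelihood(0.5) / likelihood(p_hat)
print(rel)  # roughly 0.08: above 1/16 but below 1/8
```

So 0.5 sits above the one-sixteenth cutoff but below the one-eighth cutoff, matching the description of the plot.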