So now that we've shown how to compute confidence intervals for population means, proportions, and incidence rates relatively easily, let's revisit the idea of what a confidence interval is, and how it should be interpreted. And this is not the only time we'll do this; we'll keep reflecting on it throughout our discussion of confidence intervals in this lecture set and the next ones. But for this particular section, we're going to try to gain a conceptual and practical understanding of how to interpret a confidence interval for a single population parameter, think critically about when a confidence interval is necessary versus not, and get some basic insight as to why the 95% confidence interval became the standard for research. So a confidence interval for a population parameter, whether it be a mean, proportion, or incidence rate, is an interval that factors in the uncertainty in our estimate of the parameter. The uncertainty comes from the fact that we're using data from an imperfect sample. A confidence interval can be interpreted as a range of plausible values for the unknown truth that we can only estimate. Confidence intervals can allow for different levels of uncertainty: 90%, 95%, 99%, etc. However, the standard is 95%, so this is what we will use from here on in the class. When a confidence interval is created from a sample of data, the resulting interval either includes the value of the unknown true parameter or it does not. So we'll never know, for any single confidence interval we've computed from a single sample, whether or not that interval contains the truth. It's either in there or not, and we don't know which is the case. So what about the 95% part when we say 95% confidence interval? The 95% (or another percentage, if another level is used) refers to how often this approach to creating a confidence interval works in general.
In other words, for 95% of samples randomly selected from a population, the 95% confidence interval created from the sample will contain the true value of interest, whether it be a population mean, proportion, or incidence rate. So let me show you an example via simulation to illustrate what I mean. I took a population of US hospitals, or a group I'm treating as a population, in a given year, 2016. For each hospital in the United States in 2016, we have the number of discharges for urinary conditions, based on ICD-9 code classification. The true average discharge count across all hospitals in this population was 69.2 patients in 2016. Then what I did was repeatedly take random samples of 250 hospitals at a time from this population, 100 samples in all, and compute the average number of discharges across those 250 hospitals, graphed as a little line in the middle of each of these graphics here, along with the confidence interval based on the results from that sample. So what's shown in the graph is this horizontal line here at 69.2, the true population mean, and, for each sample of 250, the estimated sample mean number of discharges and the computed confidence interval. In this first interval, the estimated sample mean is larger than the true population mean, but the confidence interval includes that true value; it crosses the horizontal line here. In the second sample, the estimate is slightly smaller than the truth, but again, the confidence interval crosses the line. And if you look across the 100 sample results I got, most of these confidence intervals include the truth. In some cases the estimate is larger than the truth, in some cases it's smaller; it's closer in some situations than others.
But most of these confidence intervals include that true value of 69.2 within their endpoints, except for the four that I've highlighted here. So of the 100 intervals in this simulation, we did a little better than 95%: 96% of the confidence intervals included the truth. But if we were to repeat this process over an infinite number of samples, that coverage would converge to 95%. So here's the problem: in practice, we're going to get just one of these samples and one of these confidence intervals, and we don't know which one it is. We certainly know the resulting mean and confidence interval, but we don't know whether we've got an interval that includes the truth or not. So this is the tricky concept of 95% confidence. We know the method for creating 95% confidence intervals works 95% of the time; 95% of the time, applying this method will create an interval that includes the unknown truth. The problem, though, is that we never know whether any one of our intervals is one for which the method worked. So we'll never actually know whether we're one of the lucky 95% of the times this process worked or not. Contrast this with the idea of going to a mechanic who can fix a very specialized problem on a particular kind of car, such that he or she fixes it 95% of the time. You might choose them because that's a pretty good success record, but you will ultimately know whether their procedure worked or not on your car, right? It's the same idea here, except you'll never know whether it actually worked. So again, in this example, the population data were specified; I used that set of all US hospitals in the year 2016 as my starting population. In real-life research, however, we don't know the truth. We can't simulate based on the truth, because we don't know it. We can only estimate the truth about a population from an imperfect data sample.
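The coverage experiment described above can be sketched in a few lines of Python. This is a minimal re-creation, not the lecture's actual code: the real hospital discharge data aren't available here, so a synthetic population with true mean near 69.2 stands in for it, and we take many samples rather than 100 so the observed coverage sits close to 95%.

```python
import random
import statistics

random.seed(1)

# Hypothetical stand-in population: the lecture used 2016 US hospital
# discharge counts with a true mean of 69.2 patients.
population = [random.gauss(69.2, 40) for _ in range(5000)]
true_mean = statistics.mean(population)

n_samples, n = 1000, 250
covered = 0
for _ in range(n_samples):
    sample = random.sample(population, n)          # one random sample of 250
    xbar = statistics.mean(sample)
    se = statistics.stdev(sample) / n ** 0.5       # estimated standard error
    lo, hi = xbar - 2 * se, xbar + 2 * se          # estimate +/- 2 SEs
    if lo <= true_mean <= hi:
        covered += 1                               # interval caught the truth

print(f"coverage: {covered / n_samples:.1%}")      # close to 95%
```

Any single run of the loop mimics what a researcher actually gets: one interval, with no way to tell whether it was one of the roughly 95% that contain the truth.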
And so the confidence interval provides a method for combining the best sample-based estimate of a population-level quantity with an estimate of the uncertainty in using that quantity to estimate the true population-level value. So let's think about our confidence interval for the true mean systolic blood pressure in a clinical population of men, based on the results of a random sample of 113 men. Our summary sample statistics included a sample mean of 123.6 millimeters of mercury and a sample standard deviation of 12.9, and the 95% confidence interval for the underlying true mean was 121.2 millimeters of mercury to 126.0. So now we have a sense that the blood pressure in this population is relatively healthy. Our average was maybe a little high, but generally healthy for systolic blood pressure, and all possibilities for the truth in this population fall within a healthy range. So even though this interval doesn't narrow things down to a very tight range, it suggests that all possibilities for the true mean among these men are healthy systolic blood pressures. So we've got this confidence interval, and I'll give you the bad news about it first. The bad news is that the researcher who did this study, and indeed all of us, will never know if the true mean systolic blood pressure in this population of men falls between 121.2 millimeters of mercury and 126.0. We'll never know if we're one of the lucky 95% of samples whose confidence interval contains the truth, or one of the unlucky 5% whose confidence interval does not. The good news, however, is that the method we used to create this interval works most of the time. And so we can go in and use this method, and while we'll never know whether the result encapsulates the truth or not, the process by which we create the interval has a very good chance of doing so. Similarly with response to treatment among HIV-positive individuals.
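As a check on the arithmetic, the blood pressure interval can be recomputed from the summary statistics quoted above. This is just a sketch of the estimate-plus-or-minus-2-standard-errors convention the lecture uses, not code from the course:

```python
import math

n = 113          # sample size
xbar = 123.6     # sample mean systolic blood pressure (mmHg)
s = 12.9         # sample standard deviation (mmHg)

se = s / math.sqrt(n)                    # estimated standard error of the mean
lo, hi = xbar - 2 * se, xbar + 2 * se    # 95% CI: estimate +/- 2 SEs

print(f"95% CI: {lo:.1f} to {hi:.1f} mmHg")   # 121.2 to 126.0 mmHg
```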
When we had the sample of 1,000 persons, a 95% confidence interval for the true proportion responding in the population from which the sample was taken was based on the observed sample proportion of 20.6% plus or minus 2 estimated standard errors. With rounding, it came out to be 18% to 23%. So the bad news is, the researcher will never know if the true population proportion of responders is in the given interval. But the good news is that the method used to create this confidence interval works most of the time. So you're going into this study, taking a sample, and using this method, and the success rate is very high: 95% of the time. It's just unfortunate that we'll never be able to know whether we've achieved that success or not. You might be thinking, and we've alluded to this before, that in some situations we don't really need confidence intervals. For example, consider our Heritage Health length-of-stay study. These data covered the group of all patients in the Heritage Health System who had at least one inpatient stay in 2011. There were 12,928 individuals who met that criterion, and that was everybody who met it. So you might say, well, don't we have the population of interest? Can't we just take the mean in this population to be the true mean among all patients in that year? And the answer is yes. If we just consider this year to be of interest, we could certainly take that 4.3 days to be the true mean for those 12,928 persons who make up the population in 2011, and be done with it. But in a lot of situations, we might be interested in learning more about the process that generated these data. We might say, well, certainly this is all patients in one given year, so it is the population of patients in that year. But maybe we want to think of that group as being a sample from a process that occurs yearly.
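The proportion interval can be recomputed the same way. A small sketch (not from the lecture) using the quoted sample proportion and the usual estimated standard error for a proportion, sqrt(p(1-p)/n):

```python
import math

p_hat = 0.206   # observed proportion of responders (20.6%)
n = 1000        # sample size

se = math.sqrt(p_hat * (1 - p_hat) / n)   # estimated SE of a sample proportion
lo, hi = p_hat - 2 * se, p_hat + 2 * se   # 95% CI: estimate +/- 2 SEs

print(f"95% CI: {lo:.0%} to {hi:.0%}")    # with rounding, 18% to 23%
```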
So we have one manifestation of a group of patients having inpatient stays in one year, and this could vary; the group we get on a yearly basis could be different just by chance. And we want to understand the uncertainty in the process of estimating the underlying true mean length of stay across multiple years using the observations from one year. So there's a philosophical question at hand when you have situations like this. How did 95% become the industry standard? Well, it's mostly out of mathematical convenience. I showed you back in lecture set three that really complicated formula for computing areas under the normal curve, where you'd have to integrate a complex exponential function. And so back in the pre-computer age, tables were made with only certain cutoffs under the normal curve, because of the difficulty of computation. So initially, the best coverage the tables gave you under the curve was at plus or minus 1.96, which we round up to 2 standard errors, or out to 3; there was no information about the values in between. And so Pearson decided that we get 95% confidence by going plus or minus 2 standard errors from our mean, or, since this is really generic, I could just say our estimate. So we get 95% confidence for that. If we wanted 99%, we'd have to go out plus or minus 3 standard errors. So there's an extreme law of diminishing returns here; it's really about 2.58, but I like to round up. That's maybe intense rounding on my part, but nevertheless, the point is that we'd have to go a lot farther, our interval would be a lot wider, just to get 4% more certainty in the process. So at some point the culture said, well, we can either do 2 or 3 based on the tables that exist. And then the research culture decided that 95% confidence was fine, and it wasn't worth spending the extra amount in terms of interval width to get a slightly greater level of confidence.
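The diminishing returns described above can be checked numerically. The area under the standard normal curve within plus or minus z standard errors is erf(z/sqrt(2)); this sketch (not from the lecture) compares the cutoffs mentioned:

```python
import math

def coverage(z):
    """Area under the standard normal curve between -z and +z."""
    return math.erf(z / math.sqrt(2))

for z in (1.96, 2.0, 2.58, 3.0):
    print(f"+/- {z} SEs -> {coverage(z):.1%} coverage")
# +/- 1.96 SEs -> 95.0% coverage
# +/- 2.0  SEs -> 95.4% coverage
# +/- 2.58 SEs -> 99.0% coverage
# +/- 3.0  SEs -> 99.7% coverage
```

Going from 2 to 3 standard errors makes the interval 50% wider but buys only about 4 percentage points of additional coverage, which is the trade-off the lecture says the research culture declined to pay.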
And that's pretty much how it came about that 95% is now the industry standard. So again, the process by which we construct 95% confidence intervals yields an interval that contains the unknown truth 95% of the time. Any single 95% confidence interval that's been created either contains the truth or it does not, and we will never know whether the method actually worked for us. But what we're banking on is the fact that it works 95% of the time, and so the likelihood of ending up with an interval that contains the truth is very high.