In this section we'll look at confidence intervals for binary comparisons. This first part will look very similar to what we did with differences in means, because we'll be looking at differences in proportions and computing their confidence intervals, and the process is similar. Upon completion of this lecture section, you will be able to estimate and interpret a 95 percent confidence interval for a difference in proportions between two independent populations. So this is an unpaired design as well. There is a paired type of study design for binary outcomes, but it's rarely used, so we're not going to consider it in this course. The vast majority, almost 100 percent, of what you'll see in public health, medicine, and related research will be of the unpaired variety, for continuous outcomes as well but especially for binary.
So, let's start with a study we're well familiar with: response to therapy in a random sample of 1,000 HIV-positive patients from a citywide clinical population. We stratified them before by their baseline CD4 count at the time of treatment, to see if that was predictive of differential response rates, and at least in the estimates we saw some differences. The percentage responding among those with lower CD4 counts at the start of the study was 25 percent, as compared to 16 percent among the group with higher CD4 counts at the beginning of the study. The summary measure we're going to focus on now, because there are several we can use when comparing two proportions, is the difference in proportions. Just to remind you of what that was in this study and how to interpret it: when we compare the group with fewer than 250 CD4 cells at the beginning to the group with greater CD4 counts, the absolute difference in response was nine percent.
So, the group with the lower CD4 counts had a nine percent greater response to therapy, on the absolute scale, than the group with CD4 counts of greater than or equal to 250 at the beginning of the study. Put another way, there was a nine percent greater absolute risk of response to therapy in the lower CD4 count group as compared to the other. For these data, the approach to the confidence interval will come as no surprise, since it's what we've been doing for everything: we take this estimate, this nine percent, and add and subtract two standard errors, where we estimate the standard error from our sample results as we've done previously.
So how do we get the standard error for this difference in proportions? It's going to look thematically very similar to the standard error estimate for a difference in means; the mechanics are different but the theme is the same. The estimated standard error, based on our sample results, is built as follows: take the proportion for the first group times one minus itself, divided by the sample size for the first group; add to that the same quantity for the second group; and take the square root of the sum. This may look familiar, because the first piece is just what we called the standard error of the first proportion estimate, squared, and we add to it the squared standard error of the second proportion. Again, we're taking a difference here, but the uncertainties are additive; think about that, and we can discuss it further in the discussion forum or live talks. Essentially, the uncertainty in the difference in proportions is the sum of the uncertainties in the two proportions making up the difference, except that we square them both first, add them, and then take the square root of that sum.
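Written out, the recipe just described looks like the following. The symbols are notation assumed here for illustration (the lecture does not name them): \hat{p}_1 and \hat{p}_2 are the two sample proportions, and n_1 and n_2 are the two sample sizes.

```latex
\widehat{SE}(\hat{p}_1 - \hat{p}_2)
  = \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}
  = \sqrt{\widehat{SE}(\hat{p}_1)^2 + \widehat{SE}(\hat{p}_2)^2}

\text{95\% CI:}\quad (\hat{p}_1 - \hat{p}_2) \;\pm\; 2\,\widehat{SE}(\hat{p}_1 - \hat{p}_2)
```

The second form makes the point about additivity explicit: the squared standard errors of the two proportions are added, even though the estimate itself is a difference.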
For these data, we know the proportions are 25 percent and 16 percent respectively; there were 503 people in the group with the 25 percent response and 497 in the group with the 16 percent response. If we plug in and do the calculations, the estimated standard error for this difference in proportions turns out to be 0.025, or 2.5 percent. The resulting confidence interval is found by taking that difference of nine percent and adding and subtracting two standard errors, so we add and subtract 0.05, or five percent. That's our margin of error for the difference in proportions in this study: we can estimate the true difference in proportions to within plus or minus five percent.
So how can we interpret this in context? We could say that we estimate a nine percent greater response to therapy in the CD4 count less than 250 group as compared to the CD4 count greater than or equal to 250 group. After accounting for sampling variability, this increased response could be anywhere between four percent on the low end and 14 percent on the high end in the population. So while there's some uncertainty in how much greater the response would be, all possibilities point to a greater response in the population of persons with lower CD4 counts, because the confidence interval does not include zero.
Let's look at another one of our favorite studies, the seminal study on giving AZT to HIV-positive pregnant women to see whether that reduced maternal-infant transmission. In that study we saw pretty striking results in the estimates: the proportion of children who contracted HIV within 18 months after birth was 22 percent if their mothers were not treated, compared to only seven percent in the group whose mothers were given AZT.
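Before moving on to the AZT numbers, the CD4 arithmetic above can be checked with a short sketch. This is only an illustration: the function name `risk_difference_ci` and its `multiplier` parameter are hypothetical, and the multiplier of 2 follows the "plus or minus two standard errors" rule used in this lecture rather than the more exact 1.96.

```python
import math

def risk_difference_ci(p1, n1, p2, n2, multiplier=2.0):
    """Return (difference, standard error, lower, upper) for p1 - p2.

    Hypothetical helper: estimate +/- multiplier * estimated standard error.
    """
    diff = p1 - p2
    # Sum of the two squared standard errors, then the square root
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    margin = multiplier * se
    return diff, se, diff - margin, diff + margin

# CD4 < 250 group: 25% response, n = 503
# CD4 >= 250 group: 16% response, n = 497
diff, se, lower, upper = risk_difference_ci(0.25, 503, 0.16, 497)
print(f"difference = {diff:.3f}, SE = {se:.3f}")
print(f"95% CI: {lower:.3f} to {upper:.3f}")
```

The standard error comes out at about 0.025 and the interval at roughly 0.04 to 0.14, in line with the four to 14 percent range quoted for the CD4 comparison.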
These are certainly striking differences, but we have to account for the fact that these estimates are based on imperfect samples from the populations of treated and untreated mothers, and we only have 180 in one group and 183 in the other. For the observed difference in proportions, let's take the direction AZT minus placebo; we could certainly do it in the other direction as well, and the end results would be comparable. In the AZT-minus-placebo direction, the difference is negative 15 percent: a 15 percent absolute reduction in HIV transmission to children born to mothers given AZT as compared to children born to mothers given placebo, or a 15 percent lower absolute risk of HIV transmission for children born to mothers given AZT. To get the confidence interval for this difference, we take our estimated difference of negative 15 percent and add and subtract two estimated standard errors. The estimated standard error turns out to be 0.036, or 3.6 percent, so our margin of error for this study, with 180 women in one group and 183 in the other, is plus or minus 0.072, or 7.2 percent. We can estimate this difference to within plus or minus 7.2 percent. The confidence interval runs from negative 0.222, a reduction of 22.2 percent, up to negative 0.078, a reduction of 7.8 percent on the absolute scale.
So how would we interpret all this? If we were writing this up in an abstract or a report, we could say that the proportion of infants who tested positive for HIV within 18 months of birth was seven percent in the AZT group, with a 95 percent confidence interval of four to 12 percent, and 22 percent in the placebo group, with a 95 percent confidence interval of 16 to 29 percent. This is an absolute decrease of 15 percent associated with AZT.
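The AZT interval, and the remark that the comparison can be run in either direction, can be illustrated with a minimal sketch. The helper name `diff_in_proportions_ci` is hypothetical, and the assignment of 180 mothers to the AZT group and 183 to placebo is an assumption made here for the example, since the lecture only says there were 180 in one group and 183 in the other.

```python
import math

def diff_in_proportions_ci(p_a, n_a, p_b, n_b):
    """Hypothetical helper: CI for p_a - p_b as estimate +/- 2 standard errors."""
    diff = p_a - p_b
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    return diff - 2 * se, diff + 2 * se

# Assumption: 180 mothers in the AZT group, 183 in the placebo group.
azt = (0.07, 180)      # 7% transmission
placebo = (0.22, 183)  # 22% transmission

azt_minus_placebo = diff_in_proportions_ci(*azt, *placebo)
placebo_minus_azt = diff_in_proportions_ci(*placebo, *azt)

print("AZT - placebo:", [round(x, 3) for x in azt_minus_placebo])
print("placebo - AZT:", [round(x, 3) for x in placebo_minus_azt])
```

Reversing the direction only flips the signs and swaps the endpoints; the standard error, the margin of error, and the conclusion are unchanged.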
The study results estimate that the absolute decrease in the proportion of HIV-positive infants born to HIV-positive mothers associated with AZT could be as low as eight percent ("low" in quotes, because that's still a very sizable reduction on the absolute scale) and as large as 22 percent. One way to translate this into practice is to say something like: if AZT were given to 1,000 HIV-positive pregnant women, we would expect to see, based on the results of this study, 150 fewer mother-to-infant transmissions of HIV; but at the population level, after accounting for sampling error, this reduction could be as large as 220 fewer transmissions or as "small" as 80 fewer transmissions. We can make this claim because this was a randomized trial, so the results pretty clearly show that the reduction is because of the treatment and not because of other differences between the children or mothers in these groups.
Let's look at one more example: the hormone replacement therapy study, which was shut down because the investigators observed an increase in the proportion of women developing coronary heart disease. Let's see whether this increase was statistically significant. Again, this was a study in which a large number of postmenopausal women, 16,608 to be exact, were recruited and randomized to receive either hormone replacement therapy (in other words, estrogen) or placebo. The main outcome measure of interest was coronary heart disease. What they found at the end of the study, as we saw before, is that the observed proportion of women developing coronary heart disease was 1.9 percent in the hormone replacement therapy group versus 1.5 percent in the placebo group.
The risk difference here looks small numerically, but it could certainly have an impact on a large number of women. It was 0.004: a 0.4 percent greater proportion of women developing coronary heart disease in the group that was given hormone replacement therapy. The confidence interval goes from 0.0006, or 0.06 percent, up to 0.008, or 0.8 percent. These are certainly low values relative to what we've seen before, but notice that the interval does not include zero, and this is ostensibly why the trial was halted: there was a statistically significant increase in coronary heart disease in the women who were given hormone replacement therapy. Again, although these numbers look small, if we were looking at a group of 100,000 postmenopausal women who were given hormone replacement therapy, we would expect to see 400 more cases of coronary heart disease developing, and this excess could range from 60 to 800 more cases. So even on the low end there is a non-trivial number of cases in these 100,000 women, and if we look at the total number of postmenopausal women worldwide, an impact that looks relatively small on the absolute scale could be large in terms of absolute numbers when applied to a very large number of women.
So, in summary, computing confidence intervals for risk differences comparing two unpaired populations is very similar to computing confidence intervals for mean differences comparing two unpaired populations. The resulting confidence interval gives a range of possible values for the risk difference, or attributable risk, between the two populations from which the two samples being compared were taken. With randomized studies, the resulting confidence interval can estimate a range for the absolute impact of an intervention or treatment on a group of known size, as we saw with the AZT study and the hormone replacement study.
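As a rough illustration of that last point, translating a risk difference and its confidence interval into expected numbers of cases is a single multiplication by the group size. The function names `expected_cases` and `excludes_zero` below are hypothetical, and the inputs are the rounded figures quoted in this section.

```python
def expected_cases(risk_difference, ci_lower, ci_upper, group_size):
    """Scale a risk difference and its CI endpoints to case counts in a group."""
    values = (risk_difference, ci_lower, ci_upper)
    return tuple(round(v * group_size) for v in values)

def excludes_zero(ci_lower, ci_upper):
    """True when the interval lies entirely above or entirely below zero."""
    return ci_lower > 0 or ci_upper < 0

# Hormone replacement therapy: difference 0.004, CI 0.0006 to 0.008,
# applied to 100,000 postmenopausal women
hrt = expected_cases(0.004, 0.0006, 0.008, 100_000)

# AZT: difference -0.15, CI -0.222 to -0.078, applied to 1,000 pregnant women
azt = expected_cases(-0.15, -0.222, -0.078, 1_000)

print("HRT cases per 100,000:", hrt, "significant:", excludes_zero(0.0006, 0.008))
print("AZT cases per 1,000:", azt, "significant:", excludes_zero(-0.222, -0.078))
```

The hormone replacement figures reproduce the 400, 60, and 800 additional cases quoted above. For AZT the unrounded interval endpoints give 222 and 78 fewer transmissions, which the lecture rounds to 220 and 80.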