A conceptual and interpretive public health approach to some of the most commonly used methods from basic statistics.


From the course by Johns Hopkins University

Statistical Reasoning for Public Health 1: Estimation, Inference, & Interpretation

From the lesson

Module 2B: Summarization and Measurement

Module 2B includes a single lecture set on summarizing binary outcomes. While at first summarization of binary outcomes may seem simpler than that of continuous outcomes, things get more complicated with group comparisons. Included in the module are examples of and comparisons between risk differences, relative risks and odds ratios. Please see the posted learning objectives for this module for more details.

- John McGready, PhD, MS, Associate Scientist, Biostatistics

Bloomberg School of Public Health

So in this section we're going to talk about one more way to quantify the association between a binary outcome and two or more populations using sample results. And we'll be talking about a number that's often reported in journal articles and used in epidemiology and other sciences, called the odds ratio.

So, in this lecture section what we're going to be able to do is quantify the association between a binary variable outcome between two or more groups as this thing called an odds ratio. And we're going to compare it to the relative risk. Both are ratios of some function of the sample proportions, and we'll compare and contrast them both conceptually and numerically in this section.

So again, we will start this section by looking at our data set on the thousand HIV positive patients from a city wide clinical population. And recall we had broken these into subgroups: the sample portion whose CD4 counts at the start of therapy were less than 250, as compared to the group whose CD4 counts were greater than or equal to 250 at the start of therapy. And what we were doing is summarizing the proportion who responded in each of these CD4 count groups. And you'll probably recall from the previous section that 25% of those whose starting CD4 counts were less than 250 responded to the therapy, as compared to 16% in the group whose CD4 counts were greater than or equal to 250 at the start of therapy.

So we have already shown how to summarize this in two ways. We're going to go to a third in this section, but just to refresh your memory: if we took the difference in these proportions, the risk difference or attributable risk, the 25% responding in the first group, the group with CD4 counts of less than 250, minus the 16% who responded in the group with CD4 counts of greater than or equal to 250, was a positive 0.09, or 9%. That's a 9% greater response on the absolute scale for the lower CD4 count group. We also talked about another measure using the exact same two numbers, which would be the ratio of proportions, or the relative risk, or the risk ratio. Those are synonyms for each other. Here we take this 25% and, instead of subtracting the 16%, the response proportion in the group with CD4 counts greater than or equal to 250, we divide by it. So the ratio is the 25% responding in the lower CD4 count group, divided by the 16% responding in the higher CD4 count group. That gives us a relative risk of 1.56, a 56% greater chance of responding for the group with the lower CD4 count at the start of therapy. That's a relative comparison. The third measure we're going to look at is something called the odds ratio or the relative odds. So what is odds?
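As a quick check on those two refresher numbers, here is a minimal sketch in Python. The variable names are just illustrative choices; the only inputs are the two response proportions quoted above.

```python
# Response proportions quoted in the lecture (variable names are illustrative).
p_low_cd4 = 0.25   # CD4 count < 250 at start of therapy
p_high_cd4 = 0.16  # CD4 count >= 250 at start of therapy

# Absolute comparison: difference in proportions (risk difference).
risk_difference = p_low_cd4 - p_high_cd4

# Relative comparison: ratio of proportions (relative risk).
relative_risk = p_low_cd4 / p_high_cd4

print(f"risk difference: {risk_difference:.2f}")
print(f"relative risk:   {relative_risk:.2f}")
```

Both measures use exactly the same two inputs; only the arithmetic (subtract versus divide) changes.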

What does it mean? We've all heard it used colloquially, probably as a synonym for proportion or probability, but it actually has an explicit definition that we're going to look at here. The odds of an event, and we're going to talk about the estimated odds because that's all we can get at from sample data, is a function of the risk or probability of the event occurring, but it's not exactly the risk. It's the risk of the event occurring, divided by the risk or probability of it not occurring. So, it's our sample proportion, the percentage with the outcome, divided by one minus that sample proportion, the percentage without the outcome. And there is a relationship between risk and odds: they track in the same direction. So as the risk, the sample proportion, increases, so does the odds.

So, for example, let's make a little hand-drawn table here of risk, or proportion, versus odds.

So suppose your risk is zero: in the sample, nobody has the outcome we're looking at, for example. What would the odds be? Well, the proportion who have the outcome is zero. So p hat is 0. 1 minus p hat is 1. The odds is 0 over 1, or 0. So they're equal in that case. And how about, suppose the percentage having the outcome is 25%? So, p hat is 0.25. Well, in that case, what is the odds? It's 0.25, the percentage with the outcome, divided by 1 - 0.25 or 0.75, the percentage without the outcome; 75% do not have the outcome. And so the odds is 0.33, or one in three. So it's not exactly equal to the risk, or probability or proportion, of 25%. If the risk is 0.5, or 50%, then the odds is 0.5 divided by 1 minus 0.5, which is also 0.5, so the odds is 1. And you've heard the expression 50:50 odds; well, that just means that the risk of having the event, or outcome, is the same as the risk of not, 50% for both. If we get into a risk or proportion of over 0.5, 0.75 for example, then the odds that corresponds to this is 0.75 over 0.25, or 3:1 odds; it's 3. And we could keep going with this: if we up the probability, or risk, of an outcome to 0.95, the corresponding odds is 0.95 over 0.05, which is equal to 19, odds of 19:1 of having the outcome. As the risk gets closer to one, the odds gets larger numerically, and as our risk approaches one, the odds approaches infinity.
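The hand-drawn table can be reproduced with a few lines of Python. This is only a sketch: the helper name `odds` is an illustrative choice, and the risks are the ones walked through above.

```python
def odds(p):
    """Convert a risk (proportion) p into odds: p / (1 - p)."""
    if not 0 <= p < 1:
        # At p = 1 the odds are undefined (they grow without bound as p -> 1).
        raise ValueError("p must satisfy 0 <= p < 1")
    return p / (1 - p)

# Risks used in the lecture's table.
for p in [0.0, 0.25, 0.5, 0.75, 0.95]:
    print(f"risk = {p:.2f}  ->  odds = {odds(p):.2f}")
```

Trying values such as 0.99 or 0.999 shows how quickly the odds grow as the risk approaches one.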

So now, let's look at the odds of responding to therapy for each of our two CD4 count groups. So, for the first group, the group whose CD4 count at the start of their therapy was less than 250, the proportion responding in our sample of 503 was 25%. So, the odds is the 25%, the chance or probability or proportion of responding, divided by the chance, or proportion, not responding, 75%. And this is equal to 0.33, or that 1 in 3 we saw on the previous slide. For the other group, the group with the CD4 count greater than or equal to 250 at the start of therapy, the proportion responding in our study was 16%. So the odds of response is 16%, the proportion or probability responding, divided by the proportion who did not respond, or the probability of not responding, 84%. So now that we have the odds for the two groups, the odds ratio takes these odds for the two respective groups and compares them in a ratio format. So for our numerator, if you want to do the odds ratio and the comparison in the same direction we've been doing, of the group with the lesser CD4 count at the start of therapy compared to the group with the greater CD4 count, we'd take the odds of response for the group with CD4 count less than 250, which is that one in three we talked about.

That would be our numerator for this comparison, and the odds for the group who had the greater CD4 count, that 0.16 divided by 0.84, would be the denominator for this comparison, this ratio, not of the proportion or probability responding directly, but of this function, the odds. And what we get here is a number, 1.75. So, this is a ratio that compares the relative odds of response for the first group compared to the second. So, the group with the lower starting CD4 counts had 75% greater odds of responding as compared to the group with the greater than or equal to 250 CD4 counts. So how can we interpret this odds ratio? Well, as I said before, one way to say this would be to say the group with CD4 counts of less than 250 at the start of therapy has 1.75 times the odds of responding to therapy, as compared to the group whose CD4 counts are greater than or equal to 250 at the start of therapy. Or rephrased, the group with the lower CD4 count has 75% greater odds of responding to therapy than the group with the higher CD4 counts at the start of therapy.
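To see the odds ratio come out of the two proportions step by step, here is a small Python sketch. The function names are illustrative; the inputs are the 25% and 16% response proportions from the example.

```python
def odds(p):
    """Odds corresponding to a risk p (assumes 0 <= p < 1)."""
    return p / (1 - p)

def odds_ratio(p1, p2):
    """Odds of group 1 (numerator) divided by odds of group 2 (denominator)."""
    return odds(p1) / odds(p2)

odds_low_cd4 = odds(0.25)    # CD4 < 250: 0.25 / 0.75
odds_high_cd4 = odds(0.16)   # CD4 >= 250: 0.16 / 0.84
or_cd4 = odds_ratio(0.25, 0.16)

print(f"odds, CD4 < 250:  {odds_low_cd4:.2f}")
print(f"odds, CD4 >= 250: {odds_high_cd4:.2f}")
print(f"odds ratio:       {or_cd4:.2f}")
```

Swapping the two arguments to `odds_ratio` reverses the direction of the comparison and gives the reciprocal.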

Now this odds ratio sounds suspiciously like a relative risk. But it's actually not. It's a little more obfuscating than a relative risk. It's not a direct comparison of risks but a comparison of this function of risks, the odds for each of the two groups.

So let's just talk about the difference between these two, and we'll reinforce this throughout the rest of the course. But here are some commonalities. Relative risk and odds ratio will always agree in terms of the direction of comparison. So, if one group has a higher risk than the other, it will also have a higher odds, and vice-versa. But the odds ratio and relative risk will not always be the same value, and we'll look at some more examples and talk more about under what conditions these are similar versus not. But just to start, in this example, we can see that both the relative risk of 1.56 and the odds ratio of 1.75 were greater than one, indicating a greater response to therapy for the group with the lesser CD4 counts at the start of therapy. But by one metric the risk is 56% higher, and by the other metric the odds, not the risk, but the odds, which is a function of risk, is 75% higher. And these do not fully agree numerically.

So now let's look at our maternal-infant HIV transmission example, and add the odds ratio to our measures of association. So, again, this is the two by two table characterizing the outcome of maternal-infant HIV transmission amongst pregnant mothers with HIV who were given AZT during pregnancy, compared to the mothers who were given a placebo and were not getting treatment. We've seen this before, and we already talked about the risk difference, or difference in proportions. Again, the underlying proportion of children who contracted HIV within 18 months, passed on from the mother, was 7% for the mothers who were given AZT, compared to 22% for children born to mothers given the placebo. We saw that that was a risk difference, or attributable risk, of -15%. And then we saw that the relative risk, that 7% divided by the 22%, was 0.32. So let's compute the odds ratio. What we need to compute is the odds of maternal-infant HIV transmission for children born to mothers who were given AZT. So that would be the 7%, the proportion who actually contracted HIV within 18 months, divided by the 93% that didn't. This is the risk of contracting HIV divided by one minus that risk, or the risk of not contracting HIV. That's our odds. And then we need to do the same thing, compute the odds for the placebo group, children born to mothers in the placebo group, and that was that 22% who contracted HIV, divided by the 78% who didn't. And we take this ratio of the two odds, and it turns out to be 0.27. So it is also less than one, but it's not exactly the same value as the relative risk here. So how could we interpret this? We could say that the AZT group has 0.27 times the odds of mother-to-child HIV transmission of the placebo group. The odds for the mothers given AZT of transmitting to their child is 0.27 times the odds for the other group. Another way to say this is that the AZT group has 73% lower odds of mother-to-child HIV transmission relative to the placebo group.

I'll let you confirm that with the raw odds, taking that percent difference, or we could say that 0.27 is really the ratio of 0.27 to 1, and 0.27 is 73% lower than 1, the starting point.
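For anyone who wants to confirm those numbers, here is a hedged Python sketch that computes all three measures from the two quoted proportions (7% with AZT, 22% with placebo). The function name and dictionary keys are illustrative, not anything from the course materials.

```python
def compare_proportions(p1, p2):
    """Risk difference, relative risk, and odds ratio for group 1 vs group 2."""
    odds1 = p1 / (1 - p1)
    odds2 = p2 / (1 - p2)
    return {
        "risk_difference": p1 - p2,
        "relative_risk": p1 / p2,
        "odds_ratio": odds1 / odds2,
    }

# AZT group in the numerator, placebo group in the denominator.
azt = compare_proportions(0.07, 0.22)
for name, value in azt.items():
    print(f"{name}: {value:.2f}")

# Percent reduction in odds relative to the placebo group.
percent_lower_odds = (1 - azt["odds_ratio"]) * 100
print(f"odds are about {percent_lower_odds:.0f}% lower in the AZT group")
```

The percent reduction is computed from the unrounded odds ratio, which is why it lines up with the "0.27 is 73% lower than 1" reading.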

So let's compare and contrast the relative risk and odds ratio. Again, in this example the estimated relative risk is 0.32 and the estimated odds ratio is 0.27. They are numerically slightly different, but they both indicate a lower numerator than denominator.

Okay. So, how do we interpret this odds ratio substantively? Well, as with the relative risk, the odds ratio can be interpreted as the impact, assuming causation of the exposure, or the treatment in this case, at the individual level.

An HIV positive pregnant woman can reduce her individual odds of passing or transmitting HIV to her child by 73% if she takes AZT during pregnancy compared to if she didn't. Again, though, this odds ratio does not directly compare the probabilities or risks or proportions of an outcome, but instead compares this function of risk, the odds.

Both measures use the exact same information again. So we're taking these two proportions, and we've seen now that we can compare them in three different ways: the risk difference and then, in terms of ratios, the relative risk or the relative odds.

So if the relative risk estimate, P1 over P2 generically, P1 hat over P2 hat, is greater than 1, then the relative odds for the two groups will be greater than 1. In other words, if the relative risk estimate is greater than 1, so will the relative odds estimate be greater than 1. Similarly, if the relative risk estimate is less than 1, the resulting odds ratio will be less than 1. And if the relative risk estimate is equal to 1, then the odds ratio will be equal to 1.

So they will concur in terms of the direction, and on whether or not there is equality, but they won't necessarily be the same numerically.

The smaller the estimated proportions are in the two samples we're comparing, the closer in numerical value the relative risk and odds ratio will be. So the rarer the outcome in the two groups we're comparing, the closer in value these two things will be. Why is that? Why do you think that is? Well, recall the odds ratio is equal to the risk in the first group divided by one minus that risk, that's the odds for the first group, divided by the same quantity for the second group.

So think about it: suppose P1 hat and P2 hat are both very small, close to 0, and I'm going to put "close" in quotes here. The closer they are to zero, the closer 1 - p hat 1 and 1 - p hat 2 are to one. So, when P1 hat and P2 hat are close to zero, the odds for both groups are close to the risks in those two groups, and the odds ratio is close to the relative risk.
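The argument above can be written out as a one-line identity. This is just algebra on the definitions already given, using p hat 1 and p hat 2 for the two sample proportions:

```latex
\widehat{OR}
  = \frac{\hat{p}_1 / (1 - \hat{p}_1)}{\hat{p}_2 / (1 - \hat{p}_2)}
  = \frac{\hat{p}_1}{\hat{p}_2} \cdot \frac{1 - \hat{p}_2}{1 - \hat{p}_1}
  = \widehat{RR} \cdot \frac{1 - \hat{p}_2}{1 - \hat{p}_1}
```

When both proportions are near zero, the factor on the right is near one, so the odds ratio is near the relative risk. The same identity shows why the odds ratio is always at least as far from 1 as the relative risk, in the same direction.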

So we can have equivalence or near equivalence with smaller underlying proportions in the groups we're comparing.

So let me give you an example of that. Let's go back to our example with Aspirin and cardiovascular disease development in women. This is that randomized trial reported on women 45 years of age or older who received either 100 milligrams of Aspirin on alternate days or a placebo. And then they were followed for ten years.

So again, if we look at the proportion of women who developed cardiovascular disease in the two groups over the ten-year follow-up period, we saw that it was 2.4% in the Aspirin group as compared to 2.6% in the placebo group. And so when we computed the risk difference, it was low numerically; it was a negative 0.002, or negative 0.2%. And the relative risk of that 2.4% in the Aspirin group compared to the 2.6% in the placebo group was 0.92. Well, in this situation, if we compute the odds ratio for CVD, cardiovascular disease, within ten years, the odds for the Aspirin group relative to the placebo group, well, these proportions are quote unquote small, 2.4% and 2.6%. And if we look at the odds ratio, and I'll let you verify this, it's very close to the relative risk of 0.92 in this example. And compare this to the previous two examples. The underlying proportions in the two groups we're comparing were smaller here, and so the odds ratio and relative risk were closer in numerical value than what we saw in the previous two examples. So how would we interpret this? Well, we could say the Aspirin group has 0.92 times the odds of developing cardiovascular disease as compared to the placebo group, or the Aspirin group has 8% lower odds of developing cardiovascular disease than the placebo group.
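One way to verify this, and to see all three examples side by side, is the short Python sketch below. The labels and function names are illustrative; the proportions are the ones quoted in this lecture for each study.

```python
def relative_risk(p1, p2):
    """Ratio of the two proportions."""
    return p1 / p2

def odds_ratio(p1, p2):
    """Ratio of the two odds, each odds being p / (1 - p)."""
    return (p1 / (1 - p1)) / (p2 / (1 - p2))

# (numerator group proportion, denominator group proportion) as quoted above.
examples = {
    "CD4 < 250 vs >= 250 (response)": (0.25, 0.16),
    "AZT vs placebo (HIV transmission)": (0.07, 0.22),
    "Aspirin vs placebo (CVD)": (0.024, 0.026),
}

for label, (p1, p2) in examples.items():
    rr = relative_risk(p1, p2)
    or_ = odds_ratio(p1, p2)
    print(f"{label}: RR = {rr:.2f}, OR = {or_:.2f}, |OR - RR| = {abs(or_ - rr):.3f}")
```

The gap between the two ratios shrinks as the underlying proportions get smaller, which is the pattern described above.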

So, the relative risk versus the odds ratio in this example: they're essentially identical in value (both round to 0.92), unlike the previous two examples. And that has to do with the fact that the proportions who have the outcome were smaller in both groups. So, a question you've probably been thinking about since the beginning of this lecture is, why do we even bother with the odds ratio? You know, it seems sort of out of left field. It's not a direct comparison of the probabilities or proportions. It seems less intuitive than the other two. In many ways, especially on the ratio scale, the odds ratio is less intuitive, and a less direct measure of association, than the relative risk.

So, why do we even deal with this odds ratio? Why am I bringing it up here? Why do we talk about it in public health? Well, there's two reasons.

In some types of studies, something called a case control study, which I mentioned in the beginning of this course but which we'll spend a little more time on in term two, the odds ratio is the only measure of association that can be estimated, and we'll talk about why. That's a little marketing for term two of the course, but we'll talk about why when we get there. In logistic regression, which is a method to extend what we're doing in these sections here, and which is also coming in term two, the results we get are initially presented as odds ratios and hence are frequently presented as such in publications. So, we want to be familiar with what this is, but also with how it differs from the other measures of association.

With more than two categories, how could we compare the odds? This is an extension of what we did with the risks, as well. A common practice is to designate one of the categories as the reference group and present comparisons of all other categories to this reference.

And while the choice of reference group is arbitrary, in many cases it, again, is purposely chosen to highlight the substantive emphasis of the manuscript or the presentation.

And so from the abstract of this paper that was published in the American Journal of Epidemiology, they say data from the third National Health and Nutrition Examination Survey (NHANES III), conducted between 1988 and 1994, were used to examine the relation between obesity and depression. Past-month depression was defined using criteria from the Diagnostic and Statistical Manual of Mental Disorders, third edition, and was measured with the Diagnostic Interview Schedule. And then they used body mass index to define obesity, using the cutoff of 30 or higher. And they compared the risks of depression in obese and normal weight persons as characterized by BMI.

So here's a table that they use to compare depression-related outcomes measured over different time frames relative to the survey: the past month, the past year, lifetime, and recurring. And they did this for all respondents, comparing obesity categories, and then they did it separately for females and males in this large table. I'm going to zoom in on one section of this table and relate it to what we're doing here.

And what they did, I'm going to focus on the outcome, the binary outcome of past-month major depression. They asked each of the responders did they have depression in the previous month.

And the answer was yes or no. It was a binary outcome. And it's a little hard to see from this table, but I'm going to cordon off the area where they classified people in terms of different obesity categories. What they did here is they actually classified people as being normal weight, underweight, overweight or obese, defined by their BMI. So they had four overall categories of weight. And you see this column here, where they give an odds ratio, and then they give a confidence interval with that, and we'll get to confidence intervals shortly in the course. With this odds ratio column, we have to pay attention to what is going on. A lot of times, there is only so much information in the table, and a lot of it is relegated to the footnote. So this area here is separate from the previous part; this looks at the obesity categories. And you notice that the odds ratio they give for the normal weight category is a one, and they put this symbol next to it. And the symbol is actually a footnote, and I'm showing you the piece of the footnotes here where we see the symbol. They say this designates the reference category. So, what they're getting at is that the odds ratio of 1.0 is the normal weight category compared to the reference category, which is itself. And of course the odds are equal for normal weight and normal weight. And that is sometimes how they designate the reference, as 1.0. This is the category whose odds of past-month major depression we're going to compare the other categories of weight to.

So now we've got the underweight category, the BMI less than 18.5, and the reported odds ratio here is 1.17. Well, this 1.17 is the odds of depression in the previous month for the underweight category divided by the odds of depression in the previous month for the reference group, the normal weight. So they suggest that those who were underweight had 1.17 times the odds of being depressed in the previous month compared to those of normal weight. They had 17% higher odds.

The next ratio here compares the relative odds of depression in the previous month for the overweight group to the same reference group of normal weight. So this 0.86 here is the relative odds for the overweight group: the odds of previous month depression for the overweight group divided by the odds for the normal weight group, the same reference group as the previous comparison. So that suggests that the relative odds, and this is on the odds ratio scale, this is not a risk ratio, the relative odds of being depressed in the previous month were 14% lower for the overweight group compared to the normal weight group. And then when they get to the obese group, they present this odds ratio, this 1.88. This is comparing the odds of depression in the previous month for the obese group to the same reference that's been used in the other comparisons, the normal weight group.

So the relative odds of past-month depression for the obese group is 88% higher than, or 1.88 times, that of the normal weight group. So, now I could ask you, and I will probably do so in the extra exercises section: given that all these odds ratios are comparing the respective groups to the same reference, you could back into, for example, the relative odds of depression in the previous month for those who were obese compared to those who were overweight. Just think about that for now.
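As a hint at how "backing into" a different comparison works, here is a small Python sketch using a pair of groups other than the one posed in the exercise. Because every reported odds ratio shares the normal weight odds in its denominator, dividing two of them cancels that denominator. The function name `rebase` is illustrative, and the values are the point estimates quoted above.

```python
# Odds ratios versus the normal weight reference, as quoted in the lecture.
or_vs_normal = {"underweight": 1.17, "normal": 1.00, "overweight": 0.86}

def rebase(or_table, new_reference):
    """Re-express odds ratios relative to a different reference group.

    OR(A vs B) = OR(A vs ref) / OR(B vs ref), since the reference odds cancel.
    """
    ref_value = or_table[new_reference]
    return {group: value / ref_value for group, value in or_table.items()}

or_vs_overweight = rebase(or_vs_normal, "overweight")
for group, value in or_vs_overweight.items():
    print(f"{group} vs overweight: OR = {value:.2f}")
```

This works for the point estimates only; the confidence intervals in the table cannot be converted by simple division.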

So, in summary, the odds ratio, the estimated odds ratio, or OR hat provides an alternative to the relative risk estimate, RR hat, for quantifying the association between a binary outcome between groups.

The odds ratio is a ratio of odds between two groups. Odds is related to the risk or the probability or proportion of an outcome, but it's not exactly the same thing. It's a function of it. The odds ratio and relative risk both estimate the association between a binary outcome and groups at the individual level. And these will agree in terms of direction, but not always magnitude.

The smaller the risk or proportion of the outcome in the groups being compared, the more similar these two quantities will be.

So something to think about: what is the odds ratio? How does it differ from the relative risk? And like I said, we're introducing it here because it is a legitimate measure of association for binary outcomes across groups.

And it will have more relevance when we get further in the course and we'll see that there are certain types of situations where it becomes the only thing we can estimate correctly.

So in the next section we'll actually talk about an interesting property of ratios that we'll want to pause and think about. One of the quirky things about ratios is the range of possibilities for associations that are negative, meaning that the group on the top of the ratio, the numerator, has lower risk or value or odds than the denominator. The range of possibilities for that type of association is very different from the range of possibilities where the group on top is larger than the bottom. And that can make for difficulty interpreting things depending on the direction that we've reported in. So in the next section we'll talk about this property and one of the ways to deal with it.
