A conceptual and interpretive public health approach to some of the most commonly used methods from basic statistics.


From the course by Johns Hopkins University

Statistical Reasoning for Public Health 1: Estimation, Inference, & Interpretation

From the lesson

Module 2C: Summarization and Measurement

This module consists of a single lecture set on time-to-event outcomes. Time-to-event data come primarily from prospective cohort studies with subjects who have not had the outcome of interest at their time of enrollment. These subjects are followed for a pre-established period of time until they either have the outcome, drop out during the active study period, or make it to the end of the study without having the outcome. The challenge with these data is that the time to the outcome is fully observed on some subjects, but not on those who do not have the outcome during their tenure in the study. Please see the posted learning objectives for each lecture set in this module for more details.

- John McGready, PhD, MS, Associate Scientist, Biostatistics

Bloomberg School of Public Health

Okay, in this section let's take a look at some review exercises regarding what we've learned about incidence rates and Kaplan-Meier curves. And like I usually do, I am going to lay out the exercises to start and advise you to pause the tape.

Work on them at your own leisure and come back and compare. The remainder of the video will be my take on the solution and you can compare yours to mine.

So, the first thing we're going to look at is a 2011 article from the Archives of Pediatrics and Adolescent Medicine called The Effectiveness of an Early Intervention on Infant Feeding Practices and Tummy Time. But we're going to focus on the infant feeding practices part, and in particular breastfeeding. And so, as per the authors, what they did was randomize expectant mothers either to be in an intervention group or not to receive the intervention. And let me describe the intervention as per the authors here. The intervention consisted of five or six home visits from a specially trained research nurse delivering a staged home-based intervention in the antenatal period and at one, three, five, nine, and 12 months.

And for the study sample, they recruited 667 first-time mothers and their infants in 2007 and 2008. Mothers were randomized either to the intervention or to a control group.

So the authors report in the results section that, compared with the control group, the hazard ratio for stopping breastfeeding in the intervention group was 0.82. Recall that hazard ratio is a synonym for incidence rate ratio.

The authors include the following Kaplan-Meier curve where the event being tracked is stopping breastfeeding.

So this is tracking ostensibly the time to stopping breast feeding, and it's the traditional Kaplan-Meier presentation starting at one.

So does this presentation agree with the reported incidence rate ratio in terms of reduced incidence of stopping breast feeding in the intervention group?

And what is the utility of having the Kaplan-Meier estimates above and beyond having the initial incidence rate ratio estimate?

For the second exercise, I'm going to have you look at a 2002 article which details the difference in HIV risks for male and female intravenous drug users in Vancouver, British Columbia. And from the abstract, it says: in 1997, we found a higher prevalence of HIV among female than among male injection drug users in Vancouver.

Factors associated with HIV incidence among women in this setting were unknown, so in the present study, we sought to compare HIV incidence rates among male and female injection drug users in Vancouver and to compare factors associated with HIV seroconversion.

And this analysis was based on over 900 participants recruited between May 1996 and December 2000 who were seronegative, that is, HIV negative, at enrollment, with at least one follow-up visit completed, and who were studied prospectively until March of 2001. And they say incidence rates were computed using the Kaplan-Meier method.

So, the authors report an incidence rate ratio of contracting HIV in the follow-up period of 1.4 for women compared to men. So first I'd like you to think about what type of study design this is.

And how would the numerator for this incidence rate ratio be computed? You don't have to put in numbers, but just generally talk about the idea. Which group, males or females, had a higher incidence of contracting HIV in the follow-up period?

The authors also include the following Kaplan-Meier curve with the 1 minus survival curve presentation, so that it tracks the proportion who have the event, not the proportion remaining event free, where the event being tracked is HIV. And you can see they compare the cumulative incidence, the proportion having the event over time, for females and males separately.

So does this presentation that the authors show agree with the reported incidence rate ratio in terms of increased incidence of HIV among the females?

And based on these curves, what is the tenth percentile of time to contracting HIV for males in the study and for females in the study?

Finally, here is something I brought up in the lecture and said we would look at in the review problems. I want you to think about what would happen to the estimated Kaplan-Meier curve if we had a sample with some complete and some censored data, and we ignored the censored data, just left them out of the analysis. What would happen to the curve?

Okay, so now I'm going to ask you to turn off the tape and enjoy these exercises at your own leisure. And when you're interested in reviewing your solutions compared to mine, come on back and start right here.

Okay, welcome back. Hopefully you enjoyed doing these exercises, so let's just go through some of these answers. So this is the first study we looked at, on breastfeeding with the intervention program versus the control group. So compared with the control group, the hazard ratio for stopping breastfeeding in the intervention group was 0.82. So what is being compared between the intervention and control groups with the value of 0.82? And this is just to remind you what an incidence rate ratio is. It's the incidence rate of stopping breastfeeding in the intervention group divided by the incidence rate in the control group.

So the numerator would be the number of mothers stopping breastfeeding in the intervention group divided by the total follow-up time in the intervention group.

And then the same thing for the control group in the denominator: the number of mothers who stopped in the control group divided by the total follow-up time in the control group.

So which group had a higher incidence of stopping breastfeeding? Well, this ratio is less than one, indicating that the group on top of the ratio had a lower incidence than the group on the bottom. And the group on top was the intervention group, so this means that the control group had a higher incidence than the intervention group, because the incidence rate is smaller for the intervention group.
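As a minimal sketch of the calculation just described, each group's incidence rate is its number of events divided by its total follow-up time, and the ratio puts the intervention group on top. The counts and person-months below are made up purely so that the ratio lands at about 0.82; they are not the study's data, and the function names are illustrative.

```python
def incidence_rate(n_events, person_time):
    """Events per unit of follow-up time (here, per person-month)."""
    if person_time <= 0:
        raise ValueError("person_time must be positive")
    return n_events / person_time


def incidence_rate_ratio(events_top, time_top, events_bottom, time_bottom):
    """Rate in the 'top' (numerator) group divided by the rate in the 'bottom' group."""
    return incidence_rate(events_top, time_top) / incidence_rate(events_bottom, time_bottom)


# HYPOTHETICAL numbers, not taken from the breastfeeding study.
irr = incidence_rate_ratio(
    events_top=164, time_top=2000,        # intervention: mothers who stopped, person-months
    events_bottom=200, time_bottom=2000,  # control: mothers who stopped, person-months
)
print(f"IRR = {irr:.2f}")
if irr < 1:
    print("The group on top (intervention) has the lower incidence rate.")
else:
    print("The group on top (intervention) does not have the lower incidence rate.")
```

A ratio below one always means the numerator group has the lower rate, whatever the underlying counts happen to be.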

And then how will we interpret this ratio in a sentence? I mean, they've already done it, but they didn't really explain what it was. But if we want to actually impress upon people that this incidence rate ratio indicated a reduced incidence of stopping breastfeeding among those in the intervention group, we could rephrase this and say:

the incidence rate for ceasing breastfeeding was 18% lower for the intervention group. That's just one way we could explain it. You could also repeat the sentence they gave. Or we could say that the hazard, or the incidence rate, for stopping breastfeeding in the intervention group was 0.82 times the incidence rate for stopping breastfeeding in the control group.

So there are several ways to do this. The authors include the following Kaplan-Meier curve where the event being tracked is stopping breastfeeding.

And at this point, everybody was still breast feeding at the start of the study in both groups. But then what do you see here? Which curve is on top here? This is the intervention group.

And then the dotted line below is the control group. So let's think about this for a minute. This curve for the intervention group rides higher than the curve for the control group. Which means what? That the proportion of people in the intervention group who are event free by a certain time tends to be higher than in the control group at each of those times. Remaining event free, where the event is stopping breastfeeding, would mean that they're still breastfeeding.

So, the higher curve for the intervention group means that the proportion still breastfeeding over time is higher. And hence, the incidence of stopping breastfeeding is lower for the intervention compared to the control.

So does this presentation agree with the reported incidence rate ratio in terms of the reduced incidence of stopping breastfeeding in the intervention group? Well, yes, it does, as we've just laid out.

What is the utility of having the Kaplan-Meier estimates above and beyond having the incidence rate ratio estimate? Well, the incidence rate ratio is an excellent relative comparison. But it really doesn't tell us much about the cumulative proportion who have stopped breastfeeding at any given point in the follow-up period. We don't know if it's a small percentage at any given time or if it's larger. And so having the Kaplan-Meier curve allows us to look at what the percentage still breastfeeding is in each of the two groups at any given time, to give us an underlying sense of the values that go into that overall incidence rate ratio, which only gives us the relative comparison.
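To make that point concrete, here is a rough sketch showing that the same rate ratio of 0.82 is compatible with very different percentages still breastfeeding. It relies on a simplifying assumption that the lecture does not make: a constant incidence rate over time, so that the proportion event free at time t is exp(-rate × t). The two control-group rates are hypothetical.

```python
import math


def proportion_event_free(rate_per_month, months):
    """Proportion still event free at a given time, ASSUMING a constant rate."""
    return math.exp(-rate_per_month * months)


IRR = 0.82          # the reported ratio, intervention vs. control
MONTHS = 6          # an arbitrary time point chosen for illustration

# HYPOTHETICAL control-group rates (events per person-month): one low, one high.
for control_rate in (0.02, 0.20):
    intervention_rate = IRR * control_rate
    control_free = proportion_event_free(control_rate, MONTHS)
    intervention_free = proportion_event_free(intervention_rate, MONTHS)
    print(
        f"control rate {control_rate:.2f}/month: "
        f"{control_free:.0%} of control vs. {intervention_free:.0%} of intervention "
        f"still breastfeeding at {MONTHS} months"
    )
```

Both scenarios share the same ratio, yet in one most mothers are still breastfeeding at six months and in the other most have stopped. The ratio alone cannot distinguish them, which is what the Kaplan-Meier curves add.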

Now, let's look at the 2002 article detailing the difference in HIV risks for male and female intravenous drug users in Vancouver, British Columbia. And we've already gone through the abstract.

So the authors report an incidence rate ratio of contracting HIV of 1.4 for women compared to men. So first of all, and this doesn't have anything to do with that incidence rate ratio, but it will filter our interpretation of it ultimately: what type of study design is this?

Well, the main exposure of interest here is the person's sex in this drug-using group. They took a sample of drug users in Vancouver and followed them over time, where the exposure of interest is male or female, which cannot be randomized. So this is what we call an observational cohort study.

And so how is the numerator for this incidence rate ratio computed? Well, the ratio is the incidence rate of HIV for females compared to the incidence rate for males. So the numerator would be the number of cases of HIV among females divided by the total follow-up time for the females.

And so the denominator would just be the same thing, but for males. And the ratio compares those incidence rates.

So, which group had a higher incidence of contracting HIV? Well, I think it's pretty clear: this ratio that compares women to men is above one, indicating that the group on top had a higher incidence rate than the group in the denominator. So the females had a higher incidence of contracting HIV, at least in this data set, since this is only the sample-based estimate.

And then how will we interpret this in a sentence? Well, we could do this in several ways. We could say the study results report that females had 1.4 times the incidence of HIV compared to males.

Or we could say that women had a 40% higher incidence of contracting HIV in the follow-up period compared to men. And those are just two ways we could state that.
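The two styles of sentence used in these solutions, a multiplicative statement and a percent difference, follow mechanically from the ratio. A small sketch, with a hypothetical helper name and wording of my own rather than the authors':

```python
def describe_rate_ratio(ratio, top_group, bottom_group, event):
    """Return two plain-language readings of an incidence rate ratio."""
    multiplicative = (
        f"The incidence rate of {event} for {top_group} was {ratio:g} times "
        f"the rate for {bottom_group}."
    )
    # Percent difference relative to the denominator group.
    percent = round(abs(ratio - 1) * 100)
    if ratio > 1:
        direction = f"{percent}% higher"
    elif ratio < 1:
        direction = f"{percent}% lower"
    else:
        direction = "the same"
    if direction == "the same":
        relative = f"The incidence rate of {event} was the same for {top_group} and {bottom_group}."
    else:
        relative = (
            f"The incidence rate of {event} was {direction} for {top_group} "
            f"than for {bottom_group}."
        )
    return multiplicative, relative


# The two ratios reported in the exercises.
for sentence in describe_rate_ratio(0.82, "the intervention group", "the control group", "stopping breastfeeding"):
    print(sentence)
for sentence in describe_rate_ratio(1.4, "females", "males", "contracting HIV"):
    print(sentence)
```

Either reading is acceptable; the percent version tends to make the direction of the difference more obvious to a general audience.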

So the authors include the following Kaplan-Meier curve where the event being tracked is HIV. But this is one that actually tracks the proportion who have the event, as opposed to the proportion who haven't yet had the event.

So does this presentation agree with the reported incidence rate ratio in terms of increased incidence of HIV among the females? And I'll just go back to the slide for a second here. So this is the female group on top, and this is the male group on the bottom, and we can see that the curve rides higher for females. Remember, this is tracking the proportion that actually have the event at a given time.

And we can see that after the first couple of months, the curves diverge, and the female curve is higher than the male curve across the rest of the follow-up period. So that does corroborate the reported incidence rate ratio we saw of 1.4 for females compared to males.

Then based on the curves, what is the 10th percentile of time to contracting HIV for males and females? So that would be the time at which 10% of the sample had had the outcome of contracting HIV and the remaining 90% had not. So we can estimate it, and I'm going to be the first to admit my estimation won't be that good given my ability to draw straight lines on this tablet. But if we wanted to do this for males and females, we'd go to where we have 10% of the sample having contracted HIV. And we trace that across to both the male and female curves, and then down to where each intersects the time axis. So very roughly speaking for females, this 10th percentile is on the order of 15 months after the start of the study.

So this is another way of expressing what we saw with that incidence rate ratio of 1.4, in the sense that if we did this for various times or various percentiles working backwards, we'd see the percentile values are lower for the females than for the males, indicating that the females contracted HIV at a faster rate than the males in the follow-up period.
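Reading a percentile off a cumulative incidence curve amounts to finding the first time at which the curve reaches the target proportion. A minimal sketch follows. The curve values are invented for illustration: the female curve is built to reach 10% at 15 months to match the rough reading above, while the male curve is entirely hypothetical because no male value is stated here.

```python
def percentile_time(curve, proportion):
    """First time at which a step-shaped cumulative incidence curve reaches `proportion`.

    `curve` is a list of (time, cumulative_incidence) pairs sorted by time.
    Returns None if the curve never reaches the target within follow-up.
    """
    for time, cumulative_incidence in curve:
        if cumulative_incidence >= proportion:
            return time
    return None


# HYPOTHETICAL (months, proportion with HIV) points, not read from the published figure.
females = [(3, 0.02), (6, 0.04), (9, 0.06), (12, 0.08), (15, 0.10), (24, 0.14)]
males = [(3, 0.02), (6, 0.03), (12, 0.05), (18, 0.07), (24, 0.10), (36, 0.13)]

print("female 10th percentile (months):", percentile_time(females, 0.10))
print("male 10th percentile (months):", percentile_time(males, 0.10))
print("female 25th percentile (months):", percentile_time(females, 0.25))
```

A result of None means the percentile is not estimable from the curve, which happens whenever fewer than that proportion have had the event by the end of follow-up.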

So finally, I asked you, what would happen to the Kaplan-Meier curve estimate if the censored observations were not included? So generally, we have this Kaplan-Meier curve that, say, looks like this.

Just a very rough drawing here. What would happen if we threw out the censored observations, just totally ignored them? Did not include them in our analysis. Well, think about this for a minute, but our curve would tend to drop more quickly than it should.

Because we've lost information about people who were still in the study at a given time but hadn't had the event. So our number of people at risk over time is smaller, which decreases our estimated probabilities of making it beyond a certain time given that you got there in the first place, because we're not recognizing those who were censored as being there, or potentially being there, in the first place. And that would ultimately impact the cumulative survival and cause our curve to drop more quickly than it should, because we're throwing out information that really helps inform us about the time-to-event process.
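This effect can be checked on a toy data set. The sketch below is a bare-bones product-limit (Kaplan-Meier) calculation written for illustration, not a replacement for a statistics package, and the eight follow-up times are invented. It estimates the curve once with the censored subjects counted in the risk sets and once with them thrown away.

```python
def kaplan_meier(times, events):
    """Product-limit estimate: list of (time, proportion event free) at each event time.

    times  : follow-up time for each subject
    events : 1 if the subject had the event at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surviving = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Gather everyone whose follow-up ends at time t (events and censorings).
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            # Conditional probability of getting past t, given at risk at t.
            surviving *= 1 - n_events / n_at_risk
            curve.append((t, surviving))
        n_at_risk -= n_leaving
    return curve


# HYPOTHETICAL follow-up times (months); 1 = event, 0 = censored.
times = [2, 3, 4, 5, 6, 8, 9, 10]
events = [1, 0, 1, 0, 1, 0, 1, 0]

with_censored = kaplan_meier(times, events)

# Ignore the censored subjects entirely, as in the exercise.
kept = [(t, e) for t, e in zip(times, events) if e == 1]
without_censored = kaplan_meier([t for t, _ in kept], [e for _, e in kept])

for (t, s_full), (_, s_naive) in zip(with_censored, without_censored):
    print(f"month {t}: {s_full:.3f} using censored data, {s_naive:.3f} ignoring it")
```

At every event time the curve that ignores censoring sits below the proper estimate, because each drop is computed against a smaller risk set.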
