A conceptual and interpretive public health approach to some of the most commonly used methods from basic statistics.

From the course by Johns Hopkins University

Statistical Reasoning for Public Health 1: Estimation, Inference, & Interpretation

From the lesson

Module 2C: Summarization and Measurement

This module consists of a single lecture set on time-to-event outcomes. Time-to-event data come primarily from prospective cohort studies with subjects who have not had the outcome of interest at their time of enrollment. These subjects are followed for a pre-established period of time until they either have the outcome, drop out during the active study period, or make it to the end of the study without having the outcome. The challenge with these data is that the time to the outcome is fully observed on some subjects, but not on those who do not have the outcome during their tenure in the study. Please see the posted learning objectives for each lecture set in this module for more details.

- John McGready, PhD, MS, Associate Scientist, Biostatistics

Bloomberg School of Public Health

Okay in this section let's take a look at some review

exercises regarding what we've learned about

incidence rates and Kaplan-Meier curves.

And like I usually do, I am going to lay out

the exercises to start and advise you to pause the tape.

Work on them at your own leisure and come back and compare.

The remainder of the video will be my take

on the solution and you can compare yours to mine.

So, the first thing we're going to look at is a

2011 article from the Archives of Pediatrics and Adolescent Medicine,

called The Effectiveness of an Early Intervention

on Infant Feeding Practices and Tummy Time.

But we're going to focus on the infant

feeding practices part and in particular breast feeding.

And so, as per the authors,

what they did was they randomized expectant

mothers to either be in an intervention

group or to not receive the intervention.

And let me describe the intervention as per the authors here.

The intervention consisted of five or six home

visits from a specially trained research nurse delivering

a staged home-based intervention in the antenatal period

and at one, three, five, nine, and 12 months.

And for the study sample, they recruited 667 first-time mothers and

their infants in 2007 and 2008.

So mothers were either randomized to the intervention or to a control group.

So the authors report in the results section: compared with the control group,

the hazard ratio for stopping breastfeeding

in the intervention group was 0.82.

Recall that hazard ratio is a synonym for incidence rate ratio.

So what is being compared between the intervention

and the control group with that value 0.82?

Which group had a higher incidence of

stopping breastfeeding based on that reported ratio?

And then interpret the ratio in a sentence.

The authors include the following Kaplan-Meier curve

where the event being tracked is stopping breastfeeding.

So this is tracking ostensibly the time to stopping breast

feeding, and it's the traditional Kaplan-Meier presentation starting at one.

So does this presentation agree with the reported incidence rate ratio in

terms of reduced incidence of stopping breast feeding in the intervention group?

And what is the utility of having the Kaplan-Meier estimates

above and beyond having the initial incidence rate ratio estimate?

For the second exercise,

I'm going to have you look at a 2002 article which details the difference

in HIV risks for male and female

intravenous drug users in Vancouver, British Columbia.

And from the abstract, it says: in 1997, we found a higher

prevalence of HIV among female than among male injection drug users in Vancouver.

Factors associated with HIV incidence among women

in this setting were unknown. So in this present study,

we sought to compare HIV incidence rates among male and female

injection drug users in Vancouver, and to

compare factors associated with HIV seroconversion.

And this analysis was based on over 900 participants recruited between May 1996

and December 2000 who were seronegative, that is, HIV negative, at enrollment, with at

least one follow-up visit completed, and who

were studied prospectively until March of 2001.

And they say incidence rates were computed using the Kaplan-Meier method.

So, the authors report an incidence rate ratio of contracting HIV in

the follow-up period of 1.4 for women compared to men.

So first I'd like you to think about what type of study design this is.

And how would the numerator for this incidence rate ratio be computed?

You don't have to put in numbers but just generally talk about the idea.

Which group, males or females had a higher incidence of contracting HIV in

the follow up period?

And I'd like you to interpret the ratio in a sentence.

The authors also include the following Kaplan-Meier

curve with the 1 minus survival curve presentation,

so that it tracks the proportion who have the event,

not the proportion remaining event free, where the event being tracked is HIV.

And you can see they compare the

cumulative incidence, the proportion of having the

event over time, for women and males separately.

So does this presentation that the authors show agree with the reported

incidence rate ratio in terms of increased incidence of HIV among the females?

And based on these curves, what is the tenth percentile

time to contracting HIV for males in the study and for

females in the study.

Finally, here's something I brought up in the

lecture and told you we'd see in the review problems.

I want you to think about what would happen to the estimated Kaplan-Meier

curve if we had a sample with some complete and some censored data,

and we ignored the censored data, just left them

out of the analysis. What would happen to the curve?

Okay, so now, I'm going to ask you to turn

off the tape, enjoy these exercises at your own leisure.

And when you're interested in reviewing your solutions compared

to mine, come on back and start right here.

Okay, welcome back, hopefully you enjoyed doing these exercises,

so let's just go through some of these answers.

So this is the first study we looked at in

terms of breastfeeding with the

intervention program versus the control group.

So compared with the control group, the hazard ratio

for stopping breastfeeding in the intervention group was 0.82.

So what is being compared between the intervention and control groups

with the value of 0.82?

And this is just to remind you what an incidence rate ratio is.

It's the incidence rate of stopping breastfeeding

in the intervention group,

divided by the incidence rate in the control group.

So it would be the number of mothers stopping breastfeeding

in the intervention group divided by the total

follow-up time in the intervention group.

And then the same thing for the control group:

the number of mothers who stopped breastfeeding in the control group,

divided by the total follow-up time in the control group.
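
The ratio just described can be sketched in a few lines of Python. The event counts and follow-up totals below are invented purely for illustration (the lecture does not give the article's underlying counts), and the function names are hypothetical helpers, not anything from the article.

```python
def incidence_rate(events, person_time):
    """Events per unit of follow-up time (for example, per person-month)."""
    if person_time <= 0:
        raise ValueError("person_time must be positive")
    return events / person_time


def incidence_rate_ratio(events_top, time_top, events_bottom, time_bottom):
    """Rate in the numerator group divided by rate in the denominator group."""
    return incidence_rate(events_top, time_top) / incidence_rate(events_bottom, time_bottom)


# Hypothetical numbers, NOT taken from the article:
# intervention: 164 mothers stopped over 2000 person-months of follow-up
# control:      200 mothers stopped over 2000 person-months of follow-up
irr = incidence_rate_ratio(164, 2000, 200, 2000)
print(round(irr, 2))
```

With these made-up inputs the intervention group sits in the numerator and the control group in the denominator, which is the same orientation as the reported hazard ratio.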

So which group had a higher incidence of stopping breastfeeding?

Well this ratio is less than one, indicating that the group on top

of the ratio had a lower incidence than the group on the bottom.

And the group on top was the intervention group, so this

means that the control group had higher incidence than the intervention group,

because the relative incidence rates were such

that it's smaller for the intervention group.

And then how will we interpret this ratio in a sentence?

Well, they've already done it,

but they didn't really explain what it was.

But if we want to actually impress

upon people that this incidence

rate ratio indicates a reduced incidence of stopping

breastfeeding among those in the intervention group,

we could rephrase this and say:

Compared with the control group,

the incidence rate for ceasing breastfeeding was

18% lower for the intervention group. That's just one way we could explain it.

You could also repeat the sentence they gave.

Or we could say that the hazard, or the incidence

rate for stopping breastfeeding in the intervention group was 0.82 times

the incidence rate for stopping breastfeeding in the control group.

So there's several ways to do this.
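
As a rough sketch of the arithmetic behind these phrasings, the percentage comes from the distance between the ratio and one. The function name and sentence template here are hypothetical, just one way of wording it.

```python
def describe_rate_ratio(ratio, numerator_group, denominator_group):
    """Turn an incidence rate ratio into a percent-difference sentence."""
    if ratio <= 0:
        raise ValueError("a rate ratio must be positive")
    # Distance from 1, expressed as a whole-number percentage
    percent = round(abs(ratio - 1) * 100)
    if ratio < 1:
        direction = "lower"
    elif ratio > 1:
        direction = "higher"
    else:
        return (f"The incidence rate is the same in the {numerator_group} "
                f"and {denominator_group} groups.")
    return (f"The incidence rate in the {numerator_group} group is "
            f"{percent}% {direction} than in the {denominator_group} group.")


print(describe_rate_ratio(0.82, "intervention", "control"))
```

A ratio below one reads as a percent reduction for the group on top, and a ratio above one reads as a percent increase.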

The authors include the following Kaplan-Meier curve

where the event being tracked is stopping breastfeeding.

And at this point, everybody was still breast feeding

at the start of the study in both groups.

But then what do you see here? Which curve is on top here?

This is the intervention group.

And then the dotted line below is the control group.

So let's think about this for a minute.

This curve for the intervention group rides

higher than the curve for the control group.

Which means what?

That the proportion of people in the intervention group who are event free by a

certain time tends to be higher than the proportion

in the control group at each of those times.

Remaining event free, where the event

is stopping breastfeeding, means that they're still breastfeeding.

So, the higher curve for the intervention group means

that the proportion still breastfeeding over time is higher.

And hence, the incidence of stopping breastfeeding

is lower for the intervention group compared to the control group.

So does this presentation agree with the reported incidence rate ratio

in terms of the reduced incidence

of stopping breastfeeding in the intervention group?

Well, yes, it does, as we've just laid out.

What is the utility of having the Kaplan-Meier estimates

above and beyond having the incidence rate ratio estimate?

Well, the incidence rate ratio is an excellent relative comparison.

But it really doesn't tell us much about the cumulative

rate of stopping breastfeeding at any given point in the follow-up period.

We don't know if it's on the order of a

small rate, hence a small percentage at any given time.

Or if it's larger.

And so having the Kaplan-Meier curve would allow us to look at what the percentage

who are still breast feeding is in each of the two groups at any given time.

To give us an underlying sense of the values that go into

that overall incidence rate ratio, which only gives us the relative comparison.

Now, let's look at the 2002 article detailing the difference in HIV

risks for male and female intravenous drug users in Vancouver, British Columbia.

And we've already gone through the abstract.

So the authors report an incidence rate ratio of contracting HIV

of 1.4 for women compared to men.

So first of all, and this doesn't have anything to do with that incidence

rate ratio.

But it will filter our interpretation of it, ultimately.

What type of study design is this?

Well, the main exposure of interest here is

the person's sex in this drug using group.

And they took a sample of drug users in Vancouver, and followed them

over time, where the exposure of interest is male or

female, which cannot be randomized. So this is what we call an observational

cohort study.

And so how is the numerator for this incidence rate ratio computed?

Well, it's the incidence rate of

HIV for females, compared to the incidence rate for males.

So the numerator would be

the number of cases of HIV among females,

divided by the total follow-up

time for the females.

And so the denominator would just be the same thing, but for males.

And the ratio compares those incidence rates.

So, which group had a higher incidence of contracting HIV?

Well, I think it's pretty clear this ratio that compares women to men is above one.

So the females had, at least as estimated (this is only the sample-based estimate),

a higher incidence of contracting HIV in

this data set.

Because the ratio comparing females to males is greater than one.

Indicating that the group on top had a higher

value, or incidence rate, than the group in the denominator.

And then how will we interpret this in a sentence?

Well we could do this in several ways.

We could say, the study results report that females had 1.4 times the

incidence of HIV compared to males.

We could say that females have an estimated 40% greater

incidence of contracting HIV in the follow-up period compared to men.

And those are just two ways we could state that.

So the authors include the following Kaplan-Meier

curve where the event being tracked is HIV.

But this is one that actually tracks the proportion who have the

event as opposed to the proportion who haven't yet had the event.

So does this presentation agree with the reported incidence rate

ratio in terms of increased incidence of HIV among the females?

And I'll just go back to the slide for a

second here, so this is the female group on top.

This is the male group on the bottom, and

we can see that the curve rides higher for females.

Remember this is tracking the proportion that

actually have the event at a given time.

And

we can see that after the first couple months, the curves diverge.

And the female curve is higher than the

males across the rest of the follow up period.

So that does corroborate the reported incidence rate

ratio we saw of 1.4 for females compared to males.

Then based on the curves, what is the 10th percentile of time to contracting

HIV for males and females?

So that would be the time at which 10% of the

sample had the outcome of contracting HIV and the remaining 90% had not.

So we can estimate it, and I'm going to be the first to admit my

estimation won't be that good given my

ability to draw straight lines on this tablet.

But if we wanted to do this for males and females, we'd go to where we have

10% of the sample having contracted HIV.

And we trace that to both the male and female curves.

And then for each we draw down to where this

intersects the time axis. So very roughly speaking for females,

this 10th percentile is on the order of 15 months after the start of the study.

And for males, this is on the order

of 28 months.

So this is another way of expressing what we saw with

that incidence rate ratio of 1.4, in the sense that,

If we did this for various times or various percentiles working backwards,

we'd see the percentile values are lower for the females than males.

Indicating that the females have contracted HIV at

a faster rate than males in the followup period.
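
The "trace across, then draw down" reading of a percentile can be sketched in code as finding the first time at which the cumulative incidence reaches the target proportion. The curve values below are hypothetical, chosen only to loosely echo the rough readings of about 15 and 28 months from the slide; they are not the article's data, and `percentile_time` is an illustrative helper name.

```python
def percentile_time(times, cumulative_incidence, proportion):
    """First time at which cumulative incidence reaches `proportion`.

    Returns None if the curve never gets that high during follow-up.
    """
    for t, c in zip(times, cumulative_incidence):
        if c >= proportion:
            return t
    return None


# Hypothetical step-function values (months since enrollment, proportion with HIV)
months = [0, 6, 12, 15, 24, 28, 36]
females = [0.00, 0.03, 0.07, 0.10, 0.13, 0.15, 0.17]
males = [0.00, 0.02, 0.04, 0.06, 0.08, 0.10, 0.12]

print(percentile_time(months, females, 0.10))  # 15
print(percentile_time(months, males, 0.10))    # 28
print(percentile_time(months, males, 0.50))    # None: the median is never reached
```

The last call illustrates why a low percentile is asked for here: when only a small share of the sample has the event, higher percentiles such as the median cannot be read off the curve.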

So finally, I ask you, what would happen to the

Kaplan-Meier curve estimate if the censored observations were not included?

So generally, we have this Kaplan-Meier curve that say, looks like this.

Just a very rough drawing here.

What would happen if we threw out

the censored observations, just totally ignored them?

Did not include them in our analysis.

Well, think about this for a minute, but our

curve would tend to drop more quickly than it should.

Because we've lost information about people who were still in

the study at a given time, but hadn't had the event.

So our

number of people at risk over time is

smaller, which decreases our estimated probabilities of making it beyond a certain time,

given that you got there in the first place, because we're not recognizing those

who were censored as being there, or

potentially being there, in the first place.

And that would ultimately impact the cumulative survival and

cause our curve to drop more quickly than it should.

Because we're throwing out information that really helps

inform us about the time-to-event process.
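
To make this concrete, here is a minimal sketch of the Kaplan-Meier product-limit calculation, run once on a small sample with its censored observations and once with them thrown out. The eight follow-up times are made up for illustration, and `kaplan_meier` is a bare-bones helper written for this example rather than a library routine.

```python
def kaplan_meier(times, observed):
    """Return [(event_time, survival_estimate)] for right-censored data.

    times:    follow-up time for each subject
    observed: True if the event occurred at that time, False if censored
    """
    event_times = sorted({t for t, e in zip(times, observed) if e})
    survival = 1.0
    curve = []
    for t in event_times:
        # Everyone still under follow-up at time t, censored or not, is at risk
        at_risk = sum(1 for u in times if u >= t)
        events = sum(1 for u, e in zip(times, observed) if e and u == t)
        survival *= 1 - events / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical follow-up times in months; False marks a censored subject
times = [2, 3, 4, 5, 6, 8, 9, 10]
observed = [True, False, True, False, True, False, True, False]

full = kaplan_meier(times, observed)

# Same sample with the censored subjects simply dropped
kept = [(t, e) for t, e in zip(times, observed) if e]
naive = kaplan_meier([t for t, _ in kept], [e for _, e in kept])

for (t, s_full), (_, s_naive) in zip(full, naive):
    print(t, round(s_full, 3), round(s_naive, 3))
```

In this toy sample the curve without the censored subjects sits below the full curve at every event time and reaches zero by the last event, because each drop is computed against a smaller at-risk group.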

Alright, so hopefully you found these exercises helpful and

fun, and onward and upward to the next unit.
