A conceptual and interpretive public health approach to some of the most commonly used methods from basic statistics.

From the course by Johns Hopkins University

Statistical Reasoning for Public Health 1: Estimation, Inference, & Interpretation


From the lesson

Module 4B: Making Group Comparisons: The Hypothesis Testing Approach

Module 4B extends the hypothesis tests for two-population comparisons to "omnibus" tests for comparing means, proportions, or incidence rates across more than two populations with a single test.

- John McGready, PhD, MS, Associate Scientist, Biostatistics

Bloomberg School of Public Health

So in this section we'll look at sample size computations, the inputs and the results, for studies comparing two or more proportions or incidence rates. We'll actually look at examples comparing two groups for both measures, but you can extend the idea we set up for means to more than two groups for proportions or incidence rates as well: compute the necessary sample size for each unique two-group comparison and take the maximum across all of those.

So upon completion of this lecture section, you will be able to describe the relationship between power and sample size with regard to the size of the minimum detectable difference in proportions or incidence rates between two groups, and understand the impact of designing studies with equal versus unequal group sizes on the total sample size necessary to achieve a certain power. It will be the same situation as we saw with means.

So, for comparing two proportions, the inputs we need for the software are actually simpler than with means. It's the same idea as with comparing means, except that we don't need a standard deviation estimate for the values in each of our groups, because, as you may recall, the standard deviation of a proportion is a function of the proportion itself. So once we specify the expected proportions in the groups we're comparing, that's taken care of.

So we can find the necessary sample sizes for a study if we specify: the alpha level of the test, which will again almost uniformly be 0.05; specific values for the two proportions, and hence the difference in proportions between the two groups we're comparing, which usually represents the minimum scientific difference of interest; and the desired power, which is generally 80% or sometimes 90%.

So let's go back to our peptic ulcer example, where we have two drugs for the treatment of peptic ulcer. This was the situation where a small study showed a very large difference in the percentage of people healed between the two drug groups: 77% in the first group compared to 58% in the second, for a difference of 19%. But we saw that this study had low power: the resulting p-value was 0.17, not statistically significant, the margin of error was large, and the resulting confidence interval for the difference in proportions was very wide. The power to detect a difference as large as the sample result, a 19% risk difference, with samples of size 30 and 31, respectively, is only 25%. So this study had a large margin of error and low power.

So perhaps as a clinician you find these sample results intriguing, and you want to do a larger study to better quantify the difference in proportions healed. We already showed how to approach that through the margin of error; now let's look at the route through power. You design a new trial using the aforementioned study results to estimate the population characteristics. You might start with the observed sample results and say: suppose the truth really were what this study estimated, a risk difference of 19%.

This corresponds to a relative risk of 1.33. How many people would we need in each group to have 80% power with a rejection level of 0.05? As this is a randomized trial, to start let's assume equal sample sizes in the two groups. Based on statistical software, in order to detect a difference of 19%, which is rather large, we'd need 105 people in each sample, for a total of 210 persons. This actually corresponds to a margin of error on the estimate that's still relatively large, plus or minus 12%, but because our study is powered at 80%, we'd have a good chance of finding a significant difference if it really exists.

Suppose our funding agency says: look, this is pretty optimistic, and it would be very helpful even if this drug were effective at a lower level. For example, if drug A improved healing by an absolute difference of 10%, alternatively presented as a relative risk of 1.15, it would increase an individual's chances of being healed by 15%. That would be very notable, so could you rerun the numbers for that? Well, of course, now we're making the minimum detectable difference smaller. It's going to be harder to see, so the necessary sample size to have the same 80% power will be larger. If we run the numbers, we need 335 people in each group, as opposed to the 105 we needed for the very large difference of 19%.

Suppose we thought even a 5% risk difference would be notable clinically and for treatment purposes. That would correspond to a relative risk of 1.07, a 7% increase at the individual level, but since peptic ulcers are so pervasive, having this kind of impact would be very helpful from a public health and personal health perspective. Well, if we make our difference even smaller, down to 5%, we really ramp up the number of people we need in this study: 1,232 persons in each of the two groups, much larger than what we initially saw with the large minimum detectable difference of 19%. So we could make a table in a grant proposal. We might not make our scenarios this varied, from a difference of 19% down to 5%, but maybe we'd do something like 10%, 7%, and 5%, show the necessary sample size for each, and ask for the biggest one if we thought a detectable difference of 5% were still clinically relevant.
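
The per-group sample sizes quoted above can be reproduced with a short calculation. The sketch below is my own, not the lecture's software: it assumes the standard normal-approximation formula for comparing two proportions with a Fleiss-style continuity correction, which happens to reproduce the quoted figures when the 10% and 5% scenarios are taken as 77% vs. 67% and 77% vs. 72% (matching the stated relative risks of 1.15 and 1.07).

```python
from math import ceil, sqrt
from statistics import NormalDist

def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Per-group sample size for a two-sided test of two proportions,
    using the normal approximation with a Fleiss-style continuity
    correction.  Assumes equal sample sizes in the two groups."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)   # rejection-level quantile
    z_b = NormalDist().inv_cdf(power)           # power quantile
    pbar = (p1 + p2) / 2                        # pooled proportion
    d = abs(p1 - p2)                            # minimum detectable difference
    n = (z_a * sqrt(2 * pbar * (1 - pbar))
         + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2 / d ** 2
    # continuity correction inflates the uncorrected n
    n = n / 4 * (1 + sqrt(1 + 4 / (n * d))) ** 2
    return ceil(n)

print(n_per_group(0.77, 0.58))  # 105 per group (19% difference)
print(n_per_group(0.77, 0.67))  # ~335 per group (10% difference)
print(n_per_group(0.77, 0.72))  # 1232 per group (5% difference)
```

Different packages apply slightly different corrections, so results can differ by a few subjects from one program to another.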

Suppose you wanted to design a randomized clinical trial with twice as many people on drug B as on drug A, since drug A is ostensibly the new drug being tested. Maybe it's more expensive at this stage of the game, and so you want a smaller number of participants in the drug A group.

But you still want 80% power to detect a difference of 19%. How would this affect our overall sample size compared to when we had equal sample sizes? Well, if we did this, we'd need 80 people in the drug A group and 160 people in the drug B group, for a total of 240 subjects, which is greater than the 210 we needed with equal sample sizes. And if we ran this scenario, with twice as many people on drug B as on drug A, for the other minimal detectable differences we looked at, 10% and 5%, the total number we'd need with unequal sample sizes would again surpass the numbers we got under the equal-sample-sizes assumption.
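
The 2:1 allocation can be sketched the same way. The generalization below, an allocation-weighted pooled proportion with ratio k so that n2 = k · n1, is my own assumption about how such software handles unequal groups, but it reproduces the 80/160 split quoted here.

```python
from math import ceil, sqrt
from statistics import NormalDist

def n_two_proportions(p1, p2, alpha=0.05, power=0.80, ratio=1.0):
    """Group sizes (n1, n2), with n2 = ratio * n1, for a two-sided test
    of two proportions (normal approximation, continuity correction)."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    k = ratio
    pbar = (p1 + k * p2) / (1 + k)   # allocation-weighted pooled proportion
    d = abs(p1 - p2)
    n1 = (z_a * sqrt((1 + 1 / k) * pbar * (1 - pbar))
          + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2) / k)) ** 2 / d ** 2
    # continuity correction (reduces to the equal-allocation form when k=1)
    n1 = n1 / 4 * (1 + sqrt(1 + 2 * (k + 1) / (k * n1 * d))) ** 2
    n1 = ceil(n1)
    return n1, ceil(k * n1)

print(n_two_proportions(0.77, 0.58))           # (105, 105): 210 total
print(n_two_proportions(0.77, 0.58, ratio=2))  # (80, 160): 240 total
```

Note the unequal design needs 240 subjects overall versus 210 for the balanced design, the same pattern the lecture describes.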

Now let's look at an example comparing incidence rates between two populations. It's going to be a much larger study than the ones before. Suppose a randomized trial is being designed to determine whether vitamin A supplementation can reduce the risk of breast cancer, and the study will follow women between the ages of 45 and 65 for one year.

Women will be randomized between the vitamin A and placebo groups. So what sample sizes are recommended? Well, to get started, we have to get some estimate of the incidence rate of breast cancer over the year of follow-up in the two groups of interest. Suppose we want to design this study to have 80% power to detect a 50% relative reduction in the risk, or incidence rate, of breast cancer with vitamin A compared to placebo. In other words, the study is designed to find an incidence rate ratio of 0.5.

And we want to do this with a significance level of 0.05. So how are we going to get estimates of the incidence rates of interest in the two groups being compared?

Well, perhaps using other studies on breast cancer, the rate in controls can be assumed to be 150 cases per 100,000 women per year. If that's our starting point, and we want to design a study to find an incidence rate ratio of 0.5 or smaller, then the incidence rate expected in the vitamin A group would be half the incidence rate expected in the placebo group. We can get it by taking the expected rate in the placebo group, 150 cases per 100,000 women per year, and multiplying it by 0.5, which gives, under this scenario, an expected rate in the vitamin A group of 75 cases per 100,000 women per year.

So as this is a randomized trial, to start let's assume equal sample sizes in the two groups. If we ran the numbers using statistical software, we would need 33,974 people in each sample, the vitamin A sample and the placebo sample, for a total of nearly 68,000 persons, to have 80% power to detect an incidence rate ratio of 0.5 or smaller. So we'd need about 34,000 individuals per group. Why so many? Well, the difference between the two hypothesized incidence rates is very small: 150 per 100,000 women, minus the 75 per 100,000 women anticipated in the vitamin A group, is a difference of 75 cases per 100,000 women, which as a number is 0.00075. So the difference we're looking for is very small numerically. And given these anticipated incidence rates, if we did sample 34,000 women for each of the two groups, 68,000 women total randomized to one of the two groups, we'd only expect to see, in the year of follow-up, about 50 cancer cases among the controls and 25 cancer cases in the vitamin A group.
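
With one year of follow-up per woman, the incidence rate per person-year can be plugged into the same corrected two-proportion formula as a very small "proportion" of cases. That's an assumption on my part about how the quoted figure was obtained, but the arithmetic lands essentially on the lecture's number.

```python
from math import ceil, sqrt
from statistics import NormalDist

def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Per-group n for a two-sided two-proportion test (normal
    approximation with continuity correction, equal group sizes)."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    pbar, d = (p1 + p2) / 2, abs(p1 - p2)
    n = (z_a * sqrt(2 * pbar * (1 - pbar))
         + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2 / d ** 2
    return ceil(n / 4 * (1 + sqrt(1 + 4 / (n * d))) ** 2)

# 150 vs 75 cases per 100,000 women per year (IRR = 0.5):
n = n_per_group(0.00150, 0.00075)
print(n)                   # ~34,000 per group (lecture: 33,974)
print(round(n * 0.00150))  # expected cases among controls in one year (~50)
print(round(n * 0.00075))  # expected cases in the vitamin A group (~25)
```

The last two lines show why the study must be so large: even with 68,000 women enrolled, only about 75 cancer cases would be observed in total.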

Suppose the Cancer Association came back and said: this is a great idea. Vitamin A is easily given out, it's inexpensive, and it's not harmful, so we'd actually be very interested in vitamin A as a prophylactic for breast cancer even if it had less of an impact, say only a 20% relative reduction in risk, because that would have a huge impact at the population level given how many women are in this age group. A 20% relative reduction implies that the incidence rate ratio we're trying to detect is on the order of 0.8; remember, that corresponds to a 20% reduction in the vitamin A group compared to placebo. So again, if we start with the same estimate for the placebo group, an incidence rate of 150 cases per 100,000 women per year, and multiply it by 0.8, our desired incidence rate ratio, we'd expect an incidence rate of 120 cases per 100,000 women per year in the vitamin A group.

And again, as this is a randomized trial, let's assume equal sample sizes in the two groups. Based on statistical software, we would need 241,889 women in each sample, for a total of over 480,000 persons. We really need a lot more because the detectable difference here is a lot smaller. And given the large number of women necessary under the equal-sample-sizes scenario, and given that the treatment of interest can be randomized, there's really no reason to consider sample size computations with unequal sample sizes, because that would make the number of persons even larger than this already very large amount.

And so, under this scenario, to detect an incidence rate ratio of 0.8, we need about 242,000 women per group. That's because the underlying incidence rates are very small numerically, hence the difference is very small numerically and very hard to see, so we need a lot of magnification, or power, to detect it. And even with 242,000 women per group, we'd only expect to see about 360 cases in the placebo group and 290 in the vitamin A group.

That's under our assumptions about the incidence rates in each group: we have to follow a large number of people just to see enough cases to detect a difference.

An alternative sometimes used for studies with short follow-up periods, though it comes with difficulties of its own, such as an increased likelihood of dropout, is to design a longer study. Instead of one year of follow-up, maybe we could propose five years of follow-up on the women we sample and randomize, in which case the expected incidence rates over the five-year period would be five times what they are per year.

So in this scenario, where we want to detect an incidence rate ratio of 0.8, if we extended our study period to five years and prorated the one-year incidence rates to five years, the underlying incidence rates over the five-year period would be larger in both groups, and hence the detectable difference would be larger. If we did this, we'd need about 48,000 women per group, but we'd need to follow them not for one year but for five. And under the anticipated incidence rates in both groups, we'd expect to see 290 cases develop in the vitamin A group over the five-year period, compared to 360 cases among the placebo group over the five years of follow-up.
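
The one-year versus five-year trade-off can be sketched with the same (assumed) corrected two-proportion formula, with the rates prorated to the five-year window. Note the one-year figure comes out slightly different from the lecture's 241,889 under this particular correction, the kind of small discrepancy you see between software packages.

```python
from math import ceil, sqrt
from statistics import NormalDist

def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Per-group n for a two-sided two-proportion test (normal
    approximation with continuity correction, equal group sizes)."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    pbar, d = (p1 + p2) / 2, abs(p1 - p2)
    n = (z_a * sqrt(2 * pbar * (1 - pbar))
         + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2 / d ** 2
    return ceil(n / 4 * (1 + sqrt(1 + 4 / (n * d))) ** 2)

# One year of follow-up: 150 vs 120 cases per 100,000 women per year
print(n_per_group(0.00150, 0.00120))  # ~242,000 per group
# Five years of follow-up: rates prorated to 750 vs 600 per 100,000
print(n_per_group(0.00750, 0.00600))  # ~48,000 per group
```

Multiplying both rates by five shrinks the required enrollment roughly fivefold, at the cost of a much longer follow-up period.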

So in summary, when designing a study to compare proportions or incidence rates from two or more populations, a researcher must have some estimate of the expected proportion with the outcome, or the incidence rate of the outcome, in each population being compared. The sample size necessary to achieve a desired power to detect a minimum detectable difference in proportions or incidence rates is a function of that difference and the desired power.

And as I said at the beginning of this video, we didn't look at any examples of comparing proportions or incidence rates between three or more populations, but you could extend the approach we gave for means: look at all possible two-population comparisons for the desired power, then take the maximum sample size necessary across the comparisons.
