0:50

We can view the study results in a contingency table.

One of the variables is the obesity status, and that only has two levels,

obese or not obese, and the other variable is the relationship status, and

that has three levels, dating, cohabiting, and married.

Using these results, we want to answer the question,

does there appear to be a relationship between weight and relationship status?

1:15

As usual, we set two hypotheses.

The first one is our null hypothesis that says that there's nothing going on.

In this case that means that weight and relationship status are independent and

obesity rates do not vary by relationship status.

The alternative hypothesis, as usual, says that there's something going on,

and in this case, that means weight and relationship status are dependent and

obesity rates do vary by relationship status.

2:19

The mechanics of the chi-square test of independence are

very similar to the chi-square goodness of fit test.

In fact, we calculate the chi-square test statistic in exactly the same way.

For each cell we take the observed count minus the expected count, square it,

divide by the expected count, and add these up over all of the cells.
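
As a rough sketch of that calculation in R: the helper name chisq_stat and the counts below are made up purely for illustration and are not from the study.

```r
# Chi-square statistic: for each cell, (observed - expected)^2 / expected,
# summed over all of the cells.
# chisq_stat is a hypothetical helper name, not a built-in R function.
chisq_stat <- function(observed, expected) {
  sum((observed - expected)^2 / expected)
}

# Made-up counts, for illustration only (not the study data).
observed <- c(20, 30, 50)
expected <- c(25, 25, 50)

chisq_stat(observed, expected)  # 25/25 + 25/25 + 0/50 = 2
```

The same function works for a one-way or a two-way table, as long as the observed and expected counts are lined up cell by cell.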

What is different however, is how we calculate the degrees of freedom.

Remember with the chi square goodness of fit test,

the degrees of freedom was simply k - 1, k being the number of cells.

In this case because we have a two-way table,

we need to consider the number of levels for both of the categorical variables.

So the degrees of freedom is calculated as the number of rows minus one

times the number of columns minus one.

We're denoting that as (r - 1) x (c - 1) here.
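
A small sketch of the two degrees-of-freedom rules in R. The function names are hypothetical, chosen only for this illustration; the 2 and 3 are the numbers of levels of obesity status and relationship status described earlier.

```r
# Goodness of fit: k cells, so k - 1 degrees of freedom.
df_goodness_of_fit <- function(k) k - 1

# Independence: (rows - 1) times (columns - 1).
df_independence <- function(n_rows, n_cols) (n_rows - 1) * (n_cols - 1)

# Obesity status has 2 levels, relationship status has 3 levels.
df_independence(2, 3)  # (2 - 1) * (3 - 1) = 2
```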

The conditions are exactly the same between the chi-square test of

independence and the chi-square goodness of fit test.

The first condition is independence,

where we think about sampled observations having to be independent of each other.

And remember that we can ensure this by random sampling or

assignment, depending on the type of study we're working with.

And if sampling is happening without replacement,

we want to make sure that our sample size is less than 10% of our population.

And we also want to make sure that each case only contributes to

one cell in the table.

The second condition, as usual, is about sample size.

We want to make sure that each particular scenario, or

cell, has an expected count of at least five.

3:55

So we established that the chi-square statistic is calculated similarly for

the chi-square goodness of fit test and the chi-square test of independence.

But before we can get to that calculation, we need to define

how to calculate the expected counts for a two-way table.

It's slightly different from when we work with a one-way table and

only one categorical variable, as in the chi-square goodness of fit test.

This is what our data looked like.

The first question we want to answer is what is the overall obesity rate in

the sample?

And to calculate that, we simply need to take everyone who's obese in the sample

and divide by the overall sample size.

So that's 331 divided by 1293.

That gives us an overall obesity rate of 25.6%.
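
That arithmetic can be checked with a couple of lines of R. The variable names are just for this sketch; the two counts are the ones quoted above.

```r
n_obese <- 331    # everyone who is obese in the sample
n_total <- 1293   # overall sample size

obesity_rate <- n_obese / n_total
round(100 * obesity_rate, 1)  # 25.6 (percent)
```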

The second question asks: if in fact weight and

relationship status are independent, in other words, if the null

hypothesis is true, how many of the dating people would we expect to be obese?

Likewise, how many of the cohabiting and

married people would we expect to be obese?

5:08

If we're assuming the null hypothesis is true,

that means that we're assuming that the rate of obesity does not vary

by relationship status.

So the overall rate of obesity that we calculated,

the 25.6% should apply to each one of the relationship statuses.

In this case, to calculate the number of people who are dating and

expected to be obese, we simply take the overall number of people who are dating,

440, and multiply it by the overall obesity rate,

0.256, and that yields roughly 113.

We can see that the observed number of people who are dating and obese,

81, is actually much lower than what's expected under the assumption that

the null hypothesis is true.
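
Here is a sketch of that expected count in R, using only the numbers quoted so far. The variable names are assumptions made for this example. Multiplying the group size by the overall rate is the same as taking row total times column total divided by the table total.

```r
n_dating <- 440   # overall number of people who are dating
n_obese  <- 331   # overall number of obese people
n_total  <- 1293  # overall sample size

# Expected count under the null hypothesis: group size times overall rate,
# which equals (row total * column total) / table total.
expected_dating_obese <- n_dating * n_obese / n_total
round(expected_dating_obese)  # 113

observed_dating_obese <- 81
observed_dating_obese - round(expected_dating_obese)  # -32, well below expected
```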

7:45

We want to use these data to test the hypothesis that relationship status and

obesity are associated at the 5% significance level.

To do so, we need a chi-square statistic.

And remember that for each cell,

we take the observed minus the expected, squared, divided by the expected.

For dating and obese people, that's (81 - 113) squared, divided by 113.

Similarly, for cohabiting and obese people, that's

(103 - 110) squared, divided by 110, and for married and

obese people, that's (147 - 108) squared, divided by 108.

We can go through the same calculation for each one of the cells in our table.
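
As a sketch, here are the contributions from the three obese cells in R, using the observed and rounded expected counts read out in the lecture. The counts for the not-obese cells are not quoted in this part of the transcript, so they are left out; the sum below is therefore only part of the full statistic.

```r
# Observed and (rounded) expected counts for the obese cells only.
observed <- c(dating = 81,  cohabiting = 103, married = 147)
expected <- c(dating = 113, cohabiting = 110, married = 108)

# Per-cell contribution: (observed - expected)^2 / expected.
contributions <- (observed - expected)^2 / expected
round(contributions, 2)  # dating 9.06, cohabiting 0.45, married 14.08

# Partial sum over the obese cells; the not-obese cells add the rest.
sum(contributions)
```

The married and dating cells contribute the most, which matches where the observed counts sit furthest from what independence would predict.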

You can see the value of computation here: as the table gets larger,

the calculations by hand become more and more tedious, and more

tedious always means more error-prone.

The chi-square statistic here comes out to be 31.68.

We also need the degrees of freedom to calculate the p-value associated

with this hypothesis test.

And remember that in a chi-square test of independence, the degrees

of freedom is the number of rows minus one times the number of columns minus one.

So that's (2 - 1) x (3 - 1), which gives two degrees of freedom.

We can calculate the p-value using R, and the function we're going to use is pchisq.

Remember that it takes as inputs the observed chi-square statistic and

the degrees of freedom, and we also usually specify whether we want

the tail area below the observed or above the observed.

And for chi-square tests we always want the tail area above the observed

chi-square statistic.

So that's pchisq of 31.68,

with two for the degrees of freedom, and we don't want the lower tail.

And that's a pretty small p value we have there.
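
Typed out in R, that call would look roughly like this. The variable names are only for this sketch; pchisq and its df and lower.tail arguments are part of base R.

```r
chi_sq   <- 31.68  # observed chi-square statistic
deg_free <- 2      # (2 - 1) x (3 - 1)

# lower.tail = FALSE asks for the tail area above the observed statistic.
p_value <- pchisq(chi_sq, df = deg_free, lower.tail = FALSE)
p_value

# Compare against the 5% significance level.
p_value < 0.05  # TRUE
```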

With a small p value, we reject the null hypothesis in favor of the alternative.

Which means that these data provide convincing evidence

that relationship status and obesity are associated.

So based on the significant p-value,

can we conclude from these data that living with someone is making some

people obese, and marrying someone is making people even more obese?

The answer is no, we definitely cannot.

Remember that this is an observational study, so

what we could be seeing the effect of here could also be age.

People tend to date when they're younger, then they start to live together,

and then at some point they get married.

It is possible that there is a causal relationship between obesity status and

relationship status but the type of analysis that we conducted here is

simply not sufficient to deduce a causal relationship.

And in these cases we always want to consider the effect of possible confounders,

like age or other life factors that one might think about, that change along with

the different life periods where people tend to be dating, cohabiting, or

married.

To recap, we saw two types of chi-square tests:

the chi-square test of independence and the chi-square goodness of fit test.

In a chi-square goodness of fit test, we compare the distribution of one

categorical variable with more than two levels to a hypothesized distribution.

In a chi-square test of independence, we evaluate the relationship between

two categorical variables, at least one of which has more than two levels,