0:01

In this video, we introduce the chi-square goodness of fit test, where we evaluate

the distribution of one categorical variable that has more than two levels.

And by evaluate, we mean we are going to be comparing the distribution of that

categorical variable to a hypothetical distribution.

Let's give an example.

In a county where jury selection is supposed to be random, a civil rights

group sues the county, claiming racial disparities in jury selection.

The distribution of ethnicities of people in the county who are eligible for

jury duty, based on census results, is given in this table.

So we can see that in this population we have 80.29% white, 12.06% black,

0.79% Native American, 2.92% Asian and

Pacific Islander, and 3.94% other ethnicities.

We are also given the distribution of 2500 people who were selected for

jury duty in the previous year.

And we can see that of the 2,500 people, 1,920 were white,

347 were black, 19 were Native American, 84 were Asian and

Pacific Islander, and 130 were categorized as other race or ethnicity.

The court retains you as an independent expert to assess

the statistical evidence that there was discrimination.

You propose to formulate this issue as a hypothesis test.

In this case,

your null hypothesis, remember, always says there's nothing going on:

people selected for

jury duty are a simple random sample from the population of potential jurors.

The observed counts of jurors from various race ethnicities

follow the same ethnicity distribution in the population.

Your alternative hypothesis says there is indeed something going on.

In this case, you're hypothesizing that people selected for

jury duty are not a simple random sample from the population of potential jurors.

The observed counts of jurors from various ethnicities

do not follow the same race ethnicity distribution in the population.

So we're evaluating the distribution of one categorical variable

against a hypothetical distribution,

which here is the true distribution of potential jurors in our population.

Our null hypothesis says that the observed

data follow the hypothesized distribution, and

the alternative says the observed data do not follow the hypothesized distribution.

2:39

So how do we evaluate these hypotheses?

We want to quantify how different the observed counts

are from the expected counts.

And if the observed counts are very different from the expected counts,

in other words, if the deviations are larger than what we would expect

from sampling variation, or simply chance, alone,

that would provide strong evidence for the alternative hypothesis.

This is called a goodness of fit test since we're evaluating how

well the observed data fit the expected distribution.

If the jury selection is random, then we would expect the observed

counts to follow the percentage distribution in the population.

Meaning that we would expect, for example, 80.29% of the 2,500 people to be white,

3:31

that means we would expect about 2,007

white jurors to be selected if in fact the jury selection is random.

Now, this doesn't mean that that's exactly what has to happen.

If the jury selection is random,

we would of course expect some sampling variation or chance around this.

But what we're going to be evaluating at the end of the day is:

are the observed counts so

different from the expected counts that we might suspect something is going on here?

Or are they only slightly off from the expected counts,

so that we wouldn't suspect anything is going on?

Similarly, for the black jurors, we would expect 12.06%

of the 2,500 people to be black, so that gives us 302.

And we can actually go through the entire list and calculate the expected number of

Native Americans: that would be 2,500 times 0.79%, so only about 20.

And it would be 73 for the Asian and Pacific Islanders, and

finally, 98 for other ethnicities.

Once you do these expected count calculations, you should always check that

the counts you calculated actually add up to your total sample size.

Especially if you need to do a little bit of rounding to get these counts right,

you want to make sure that your counts at the end of the day add up to the total

sample size, and in this case they in fact do.
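The expected count calculation and the sanity check above can be sketched in a few lines; the category labels and percentages are taken from the census table:

```python
# Expected counts under H0: the sample size times each hypothesized proportion.
# Percentages are from the census table above.
total = 2500
proportions = {
    "White": 0.8029,
    "Black": 0.1206,
    "Native American": 0.0079,
    "Asian / Pacific Islander": 0.0292,
    "Other": 0.0394,
}
expected = {race: total * p for race, p in proportions.items()}

# Sanity check: the proportions sum to 1, so the (unrounded) expected
# counts must add up to the total sample size.
assert abs(sum(expected.values()) - total) < 1e-6
```

Rounding the results gives the 2,007, 302, 20, 73, and 98 from the video.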

5:03

We are starting to introduce a new technique here so

let's go through the conditions required for this technique.

The first one is independence.

We want to make sure that our sampled observations are independent of

each other, for which we need random sampling or random assignment.

And if we're sampling without replacement,

we want our sample size to be less than 10% of the population.

And we also want to make sure that each case only contributes to

one cell in the table.

So we don't want to, for example, identify a potential juror as both white and black.

That is a possibility, but for the purposes of the chi-square test,

we want to make sure each case can only go to one cell in the table.

This is also another way of thinking about independence,

because if our cases showed up in multiple cells in the table,

then the observations wouldn't exactly be independent of each other.

The second condition is sample size: each cell must have at least five expected cases.

6:07

Earlier we calculated the expected counts for our table, and

it appears that we do indeed have at least five expected cases in each cell.

We also have no reason to believe

that these observations are not independent of each other.

So it appears that we have met the conditions for this hypothesis test.

What we need to do next is to develop a new test statistic for count data.

But let's take a look back at what we've been working with so far.

The general form of a test statistic is a point estimate

minus a null value, divided by the standard error of the point estimate.

There are two things a test statistic tries to accomplish.

One, it identifies the difference between a point estimate and an

expected value, assuming that the null hypothesis is true.

Two, it standardizes that difference using the standard error of the point estimate.

So these two goals are going to be useful when we start

thinking about developing a new test statistic for count data.

7:21

That is called the chi-square statistic.

When dealing with counts and investigating how far the observed counts are from

the expected counts, we use this new test statistic called the chi-square statistic.

It's calculated as the observed minus the expected count for each cell, squared,

divided by the expected count, and we sum this over all of the cells.

Remember, when we say cell, we're basically referring

to the levels of the categorical variable.

And we're introducing a new term here, not to confuse you but

because when we get to the other chi-square test that we're going to talk

about when we have more than one categorical variable we're dealing with,

it's going to help to make the distinction between a level and a cell.
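In code, the formula reads almost verbatim; a minimal sketch:

```python
# The chi-square statistic, written directly from the formula:
# sum over all cells of (observed - expected)^2 / expected.
def chi_square_stat(observed, expected):
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))
```

For example, with made-up counts, chi_square_stat([10, 20], [15, 15]) gives about 3.33, and identical observed and expected counts give exactly 0.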

8:12

We saw in the formula that we square the differences between the observed and

expected counts, and the name chi-square itself makes it

obvious that we're doing some squaring here. But why do we do that?

One, we want to make sure that our standardized differences are positive,

because otherwise, if you add some positives and negatives to each other,

they're going to cancel each other out.

Another way of getting rid of negative signs would have been to use absolute

values, but that's not what we do here,

because by squaring, we accomplish one more thing:

highly unusual differences between the observed and

expected counts appear even more unusual.
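A hypothetical toy comparison (the deviations are made up) shows why squaring is preferred over absolute values here:

```python
# Made-up deviations from expected counts: two small ones and one large one.
deviations = [2, -2, 10]

abs_contributions = [abs(d) for d in deviations]  # [2, 2, 10]
sq_contributions = [d ** 2 for d in deviations]   # [4, 4, 100]

# With absolute values the large deviation weighs 5 times a small one;
# with squares it weighs 25 times, so unusual deviations stand out more.
```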

8:55

In order to determine whether the calculated chi-square statistic is considered

unusually high or not, we first need to describe its distribution.

The chi-squared distribution has only one parameter, the degrees of freedom.

It influences the shape, the center and the spread of the chi-square distribution.

And for a goodness of fit test, the degrees of freedom can be calculated as k - 1,

where k stands for the number of cells.

9:22

Here we can see a bunch of chi-square distributions starting from the blue

thick line that has only 2 degrees of freedom,

going up to the pink dotted line that has 9 degrees of freedom.

Take a look over here and think about how the shape

of the chi-square distribution changes as well as the center and

the spread as the degrees of freedom increases.

9:52

Our null hypothesis was that the observed counts of jurors from various

race ethnicities follow the same ethnicity distribution in the population.

Our alternative hypothesis was that they do not

follow the same race ethnicity distribution in the population.

To calculate the chi-square statistic,

we're going to need those individual components.

So for the chi-square statistic for this hypothesis test, for the first cell, white,

we take our observed count, subtract from it the expected count, square that,

and then divide by the expected count. Then we add to that

the same quantity for the blacks.

So that's going to be (347 - 302) squared

divided by 302, again, the expected count.

And we can go through this for each one of the other cells as well.

10:54

The chi-square statistic then comes out to be 22.63.
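That calculation, carried out over all five cells (a sketch using the rounded expected counts from earlier, which reproduces the 22.63 in the video):

```python
# Order: White, Black, Native American, Asian / Pacific Islander, Other.
# The observed counts total 2,500; the expected counts are the rounded
# values calculated earlier.
observed = [1920, 347, 19, 84, 130]
expected = [2007, 302, 20, 73, 98]

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(round(chi_sq, 2))  # 22.63
```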

In order to find our p-value, we also need to know something about the distribution

of the chi-square statistic, and for that we need our degrees of freedom.

We have five levels here.

White, black, Native American, Asian and Pacific Islander, and other.

So 5- 1 = 4 degrees of freedom for this test statistic.

Then the only thing left is the calculation of the p-value.

The p-value for

a chi-square test is defined as the tail area above the calculated test statistic.

This is because the test statistic is always positive, and

a higher test statistic means a higher deviation from the null hypothesis.

So just like with F tests, in chi-square tests

the p-value is always defined as the tail area

above the observed test statistic.

So our chi-square distribution looks something like this.

It's right skewed, and remember that the statistic is a sum of squared terms, so

it always needs to be a positive number, and we shade the tail area beyond the

observed chi-square statistic that we calculated.

Well, how do we find that tail area?

One option would be to use R.

For that we can use the pchisq function, where

we feed in our observed chi-square statistic and the degrees of freedom, and

I've also specified that we don't want the lower tail.

Because, as we just discussed, in a chi-square test we always want the upper tail,

and that p-value comes out to be pretty small: 0.0002.
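The R call is pchisq(22.63, df = 4, lower.tail = FALSE). As a sketch of what that computes: for even degrees of freedom the chi-square upper-tail probability has a simple closed form, exp(-x/2) times a short sum, so we can reproduce the number without any statistics library:

```python
import math

def chisq_upper_tail(x, df):
    # Upper-tail chi-square probability; this closed form only holds
    # for even degrees of freedom.
    assert df > 0 and df % 2 == 0
    half = x / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))

p_value = chisq_upper_tail(22.63, 4)
print(round(p_value, 4))  # 0.0002
```

The unrounded value is about 0.00015, which reports as 0.0002 at four decimal places.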

Another possibility is to use the applet.

12:39

So in the applet we first pick the chi-square distribution,

then we select our degrees of freedom, which was 4.

And then we want to make sure we're getting the upper tail, and

we're going to look for the tail area that's beyond 22.63.

It appears that in this particular applet, the maximum we can enter is 14.9.

But even the tail area of

the chi-square distribution with four degrees of freedom

beyond 14.9 is only roughly 0.5%.

Remember that our test statistic was much larger than 14.9, meaning that the tail

area left beyond it is going to be much smaller than 0.5%.

13:26

And lastly, we can also use a table to find this p-value.

So a chi-square table looks something like this.

It works a lot like the t table: instead of probabilities inside the table,

we actually have critical values.

First, we want to locate the row that's associated with our degrees of freedom, so

that's 4.

And then within this row,

we want to locate our observed chi-square statistic, which was 22.63.

It seems like that value would be off the table, past the right edge,

meaning that our p-value, the tail area left beyond it,

is going to be off in the same direction as well.

We can see that as we move to the right on this table,

the p-values are getting smaller and smaller.

Therefore, we can say that based on this table,

our p-value is a number that's less than 0.001,

which agrees with the exact p-value calculation from R.
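That table lookup can be mimicked numerically; a sketch assuming the commonly tabulated critical value of about 18.47 for the df = 4 row, 0.001 column:

```python
import math

def chisq_upper_tail_df4(x):
    # Closed-form upper-tail probability for a chi-square with 4 df.
    return math.exp(-x / 2) * (1 + x / 2)

# The tail area beyond the tabulated critical value 18.47 is essentially 0.001.
assert 0.0009 < chisq_upper_tail_df4(18.47) < 0.0011
# Our statistic, 22.63, sits even further right, so its p-value is below 0.001.
assert chisq_upper_tail_df4(22.63) < 0.001
```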

With such a small p-value, we would reject the null hypothesis,

which in this context means that the data provide convincing evidence that

the observed distribution of the counts of race ethnicities of jurors does

not follow the distribution in the population.