We thus need the within-stratum variances in order to pull this off.

So let's assume that we've done simple random sampling within each of the strata.

And we go back to our display, and we've done the calculation.

We've taken the 23 sample cases for the assistant professors, and

we've computed an s sub h squared of 125.

We did the same thing for the 15 sample associate professors, 250.

And for the 42 full professors in the sample,

the average squared deviation by that formula, the estimated variance, is 500.

Notice that as the salary level, the y bar sub h, increases,

so do those variances.

And there is oftentimes a relationship between

the sampling variance and the mean.

Actually, it's better to look at the sampling standard deviation,

the square root of those variances, alongside the means.

There is a relationship that does show up in the real world in that particular case.

So, this is not an unusual case.

So, here what we need to do is combine everything and

it's a little bit beyond the scope of what we are doing.

But I'm going to go through this anyway.

We need to know a W sub h squared, well we know W sub h,

the W sub h is 0.2875 or 0.1875.

Whatever it happens to be.

We also need to know one minus the sampling fractions of each of the strata.

That's 0.8, one minus 0.2 in each of them.

We need the s sub h squared and we need the sample sizes.

So, if we put all of this together, I know this is kind of busy.

But I think you know what the elements are now.

What I am really concerned with is that you understand the logic of this

kind of thing.

That it flows from how the sample was selected.

The variance, our estimated sampling variance of the mean now,

which turns out in the end to be 3.453,

has three components, one from each of the strata.

Where for each stratum there is a W sub h squared,

a one minus f sub h, an s sub h squared, and a sample size in the denominator.

And there you've got the expressions for each of them, all added together,

all combined to give us the overall sampling variance, 3.453.
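A minimal sketch of that combination, using the figures quoted above; the full-professor weight, 0.525, is inferred from the three weights summing to one, and under proportionate allocation every sampling fraction is 0.2:

```python
# Estimated sampling variance of the stratified mean:
#   sum over strata of W_h^2 * (1 - f_h) * s_h^2 / n_h
W = [0.2875, 0.1875, 0.525]   # stratum weights N_h / N (0.525 inferred)
f = [0.2, 0.2, 0.2]           # sampling fractions n_h / N_h
s2 = [125.0, 250.0, 500.0]    # within-stratum element variances
n = [23, 15, 42]              # stratum sample sizes

var_ybar_st = sum(
    W[h] ** 2 * (1 - f[h]) * s2[h] / n[h]
    for h in range(3)
)
print(round(var_ybar_st, 3))  # 3.453
```

Each stratum contributes one term, and the weights enter squared because the stratified mean is a weighted sum of independent stratum means.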

We'll take the square root of that to get a standard error.

That standard error, we're going to use in a confidence interval.

Let's recognize that what we've just done are steps 6a and 6b.

6a was to compute the element variances within each of the strata, and 6b was to

combine them with the W sub h squared factors, adding them up across the strata.

All right, there's one more step to this, right?

There's a seventh step about the confidence interval.

This is our way of expressing our uncertainty about estimates.

Taking into account both the mean and that standard error, and

the distributional properties of that mean.

In this case, that mean is going to be normally distributed and

we're going to use that in forming a confidence interval.

So here, our last step 7, the confidence interval,

we're going to do by the same process we did before.

We're going to use the mean plus or minus a margin of error.

But that margin of error is driven by two factors as you recall.

The t-value, we're going to use the t-value here.

Where we're going to count up the number of random events and subtract one, well,

in this case we're going to subtract one in each of the strata, and

then get that t value and use it times the standard error to form that margin of

error before forming the confidence interval.

Now, degrees of freedom.

How many random events are there in our sample?

There are 80.

But, there are 23 in stratum one, and in that particular stratum,

we also computed a mean.

And that mean alters the randomization that's going on there in terms of how many

degrees of freedom we have, how many random events we have.

So, we actually have n sub 1, 23, minus 1, or 22 degrees of freedom from stratum one.

n sub 2 minus 1.

Let's see, that was 15 minus 1, or 14 from stratum 2.

And n sub 3, that was our 42.

And that was 42 minus 1, or 41 degrees of freedom.

Overall, adding those up, we have n minus H.

n, 80, minus H, 3, or 77 degrees of freedom.
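That bookkeeping is easy to check; one line under the stratum sample sizes above:

```python
# Degrees of freedom: lose one per stratum mean, so sum of (n_h - 1),
# equivalently n - H = 80 - 3.
n_h = [23, 15, 42]
df = sum(nh - 1 for nh in n_h)
print(df)  # 77
```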

So, we will look up the t value which we've done here which happens to be 1.991.

Not 1.96, that's why we're using the t.

It's a little bit larger because we have some uncertainty about the quality of

each of the s sub h squared.

What we are doing is counting up the stability factors for each of the s sub h

squared, adding them together, and using that to pick out the t value.

So, the 95% confidence interval take that t value, times the standard error,

and adds and subtracts it to the mean, and you can see the final result.

Our 95% confidence interval goes from 71.05 to 78.45.
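A quick sketch of step 7 with the numbers above; the stratified mean, 74.75, is the midpoint implied by those interval endpoints, and 1.991 is the t-value just quoted:

```python
import math

var_ybar = 3.453   # estimated sampling variance of the stratified mean
se = math.sqrt(var_ybar)
t = 1.991          # t-value for n - H = 80 - 3 = 77 degrees of freedom
ybar_st = 74.75    # stratified mean (midpoint of the quoted interval)

margin = t * se
lower, upper = ybar_st - margin, ybar_st + margin
print(round(lower, 2), round(upper, 2))  # 71.05 78.45
```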

All right, well, that's it.

No, not quite, because we're wondering as in cluster sampling,

how does this relate to simple random sampling?

So let's wrap it up here by talking about design effects and effective sample size.

When we're talking about sampling variances,

what is that variance that we got, that 3.453?

And how does it relate to what we would've had for

a simple random sample of the same size, i.e., what's the design effect, or deff.

For a simple random sample then,

the denominator is what we're lacking right now.

We're going to take that same data and

treat it as though it's a simple random sample.

It's not; it's 80 cases drawn from a stratified random sample.

We're going to treat it incorrectly to calculate an estimate of the variance of the mean.

And from a separate calculation, we have to calculate s squared.

We know f, we know the sampling fraction, but we don't know the s squared; we're

going to take the 80 values and compute an s squared, and here you see it's 647.8.

And so now what we've got for a sample of size 80 from a population of 400.

That's a sampling fraction of 0.2.

Our sampling variance of the mean under simple random sampling is 6.478.

That's what we're going to then compare.

It's going to end up being in the denominator of our comparison,

where we're taking the actual sampling variance, 3.453, divided by 6.478.

Now, this is the opposite of cluster sampling.

We now have a design effect less than one.

Here, with proportionate allocation and

the particular circumstances we have in terms of the differences of the means

across the groups, we have a reduction in variance,

by the simple expedient of using auxiliary information in our sample design.

As a matter of fact, it's a pretty big one.

It's a 47% reduction in sampling variance.
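Putting that comparison in code, under the same figures quoted above (s squared of 647.8, sampling fraction 0.2, n of 80):

```python
# Design effect: stratified sampling variance over the SRS variance
# for the same sample size n.
var_srs = (1 - 0.2) * 647.8 / 80   # (1 - f) * s^2 / n = 6.478
var_st = 3.453                     # stratified sampling variance from above

deff = var_st / var_srs
print(round(deff, 3))              # 0.533
print(round(1 - deff, 2))          # 0.47, i.e., a 47% reduction
```

A deff below one says the stratified design beat simple random sampling here; in cluster sampling the same ratio typically comes out above one.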