Let's take one more pass through this, just to fix these ideas in our minds.
Let's consider a simpler example to see how allocation
can affect sampling variance, one more time before we wrap up
and go on to our last lecture.
In particular, suppose that we had this distribution.
I'm going to go back to Qatar.
I know I've introduced Qatar before.
But it has a very interesting set of properties.
In its population distribution,
there is a very large share of the population that is expatriate.
They are not native to the country.
There are white and blue collar expatriates who are working there.
And let's just suppose,
and this is a little bit of a departure from the actual distribution,
that there are a million people in Qatar.
Actually there are more than 2 million there.
But say a million people, and 20% of them, 200,000, are native Qataris. The other 80%,
800,000 of them, are white or blue collar expatriates.
And we happen to know, from past data, the distributions for
a characteristic we're measuring, say income.
We've got very different variances between the strata.
You'll notice it in row two there.
The S squared, the overall S squared, and then the S1 squared and S2 squared for
each of the two strata, are very different.
And the means are very different.
Here's a good case for doing stratified sampling.
We know the means are different.
A proportionate allocation ought to get us gains in precision.
So will other allocations, potentially.
So why don't we do the following exercise?
Rather than trying to come up with a minimum variance allocation by some
formula, which does exist,
suppose we just start with our proportionate allocation.
We're going to do a sample of 1,200.
1,200 from the 1 million that are there.
And we're going to do it proportionately.
20% of our sample of 1,200 should come from stratum 1.
20% of 1,200 is 240.
And the remainder from stratum 2.
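As a quick check on that allocation arithmetic, here is a minimal Python sketch. The function name `proportionate_allocation` is only an illustrative name, not something from the lecture; the stratum sizes and the sample size of 1,200 are the ones we just stated.

```python
def proportionate_allocation(stratum_sizes, n):
    """Split a total sample of n across strata in proportion to stratum size."""
    total = sum(stratum_sizes)
    # Simple rounding: it works out exactly in this example, but in general
    # the rounded pieces may not sum to n and would need adjusting.
    return [round(n * size / total) for size in stratum_sizes]


# Stratum 1: native Qataris; stratum 2: expatriates (figures from the example).
allocation = proportionate_allocation([200_000, 800_000], 1_200)
print(allocation)  # [240, 960]
```

So the remainder going to stratum 2 is 960.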
What will the variance of the mean be?
Well, there's our formula for the variance of the mean,
where, for each stratum, we take the square of the fraction of the population in the group, Wh squared,
times 1 minus the sampling fraction, divided
by the sample size nh, times sh squared, and then add that up across the strata.
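Written out in symbols, the formula just described looks roughly like this. The notation is an assumption on my part, since the slide itself is not reproduced here: h indexes the strata, W_h is the population share, f_h is the sampling fraction, n_h is the stratum sample size, and S_h squared is the within-stratum variance.

```latex
\operatorname{var}(\bar{y}_{st})
  = \sum_{h=1}^{H} W_h^2 \,(1 - f_h)\, \frac{S_h^2}{n_h},
\qquad W_h = \frac{N_h}{N}, \quad f_h = \frac{n_h}{N_h}
```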
And we have those elements, at least approximately in this case, because
the sampling fractions are fairly small.
1,200 from a million is so small
that we can just treat 1 minus the sampling fraction as 1.
We're not going to worry about it in this particular case.
It just complicates the calculation.
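To see how little we give up by dropping that term, here is a small Python sketch that computes 1 minus the sampling fraction for each stratum. The variable names are illustrative; the counts are the ones from the example with the proportionate allocation of 240 and 960.

```python
# Stratum sizes and proportionate sample sizes from the example.
N_h = [200_000, 800_000]
n_h = [240, 960]

# Finite population correction, 1 - f_h, for each stratum.
fpc = [1 - n / N for n, N in zip(n_h, N_h)]
print(fpc)
```

Both values come out at roughly 0.9988, so treating them as 1 changes the variance by only about a tenth of a percent.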
But here let's just take the essence of it.
Wh squared times sh squared divided by nh, for each of the two strata.
W1 squared S1 squared over n1 for stratum one, plus W2 squared S2 squared over n2 for stratum two.
Put in the numbers and do the calculation, and
we see we get a variance of the mean, in this particular case, of about 1,333.
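Here is a minimal Python sketch of that calculation, ignoring the finite population correction as we just agreed. One loud caveat: the within-stratum variances on the slide are not given in this transcript, so the `S2` values below are hypothetical placeholders, chosen only so that the arithmetic lands near the 1,333 quoted above. The function name `stratified_variance` is likewise just illustrative.

```python
def stratified_variance(W, S2, n):
    """Variance of the stratified mean, ignoring the finite population
    correction: the sum over strata of W_h^2 * S_h^2 / n_h."""
    return sum(w**2 * s2 / nh for w, s2, nh in zip(W, S2, n))


W = [0.2, 0.8]   # population shares from the example
n = [240, 960]   # proportionate allocation of the 1,200

# HYPOTHETICAL within-stratum variances: placeholders only, NOT the slide's values.
S2 = [4_000_000, 1_000_000]

variance = stratified_variance(W, S2, n)
print(round(variance))  # 1333 with these placeholder variances
```

With other variances the number changes, of course, but the structure of the calculation stays the same: one term per stratum, added together.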