0:02

Just like biology has a central dogma, statistics, and

in particular statistical inference, has a central dogma as well.

The central dogma is sort of the central idea that

explains what you're trying to do in the field.

And so, the central dogma of statistics has to do with this specific problem.

Suppose you have a huge population, like you see in the top left-hand corner, and

you might want to know something about that population.

In this case, it's an idealized example, so we might want to know how many pink and

how many gray symbols there are.

So in general, the problem might be that measuring the whole population, or

taking measurements on the whole population, might be really expensive, or

it might be very hard to do for a number of different reasons.

And so, what we want to do is take advantage of, basically, probability to be

able to say something about the population without measuring the whole population.

So what we do is

use probability to take a small sample from that population.

You may have heard of a randomized sample.

There are a number of different ways you can use probability to get this sample, but

the idea is that you would like it to somehow represent the larger population.

So once you've taken that sample, we can maybe make measurements on the smaller

number of objects that we've collected here.

So we have these symbols in the lower right-hand corner.

There are only three of them, so it might be relatively cheap

or relatively easy to take measurements on them.

So we see that there are two pink symbols and one gray symbol, and

so then what we use is statistical inference

to make a guess about what the population looks like.

So we might say that there are probably going to be more pink symbols

than there are gray symbols in the whole population,

because that's what happened in our sample.

And if we did the probability sampling right,

then that best guess might be pretty good.
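
To make that concrete, here is a minimal sketch in Python of the sample-then-guess step. The population sizes (6,000 pink and 4,000 gray), the sample size of 200, and the helper names `make_population` and `estimate_pink_fraction` are all made up for illustration; they are not from the slide. It draws a simple random sample, which is only one of the ways you could use probability to pick the sample.

```python
import random

def make_population(n_pink, n_gray):
    """Build a hypothetical population as a list of color labels."""
    return ["pink"] * n_pink + ["gray"] * n_gray

def estimate_pink_fraction(population, sample_size, rng):
    """Draw a simple random sample (without replacement) and
    return the fraction of pink symbols observed in it."""
    sample = rng.sample(population, sample_size)
    return sample.count("pink") / sample_size

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Assumed population: in real problems we would NOT know these counts.
population = make_population(n_pink=6000, n_gray=4000)
true_fraction = population.count("pink") / len(population)

# Measure only 200 symbols instead of all 10,000.
estimate = estimate_pink_fraction(population, sample_size=200, rng=rng)

print(f"true pink fraction:   {true_fraction:.2f}")
print(f"estimate from sample: {estimate:.2f}")
```

The true fraction is only available here because the population is simulated; in practice the estimate from the sample is all you get.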

Another important component of the central dogma of statistics is that

this best guess isn't quite enough.

So, we took a sample, we didn't actually measure everything in the population,

we only measured a subset.

So, it turns out that our best guess is, potentially,

quite variable.

And so, it could be that the best guess is off in one direction,

and we might actually have more gray symbols in the population.

Or, it could be off in the other direction,

and there might be more pink symbols in the population.

2:04

So the question is, how do we quantify that variability?

Given that

we took this sample, how do we say what's actually going on in the population?

That's the central idea behind statistical inference.

It's also really important to know what the population is. That's maybe one of the most

fundamental ideas in statistics, and it's central to the central dogma.
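
One way to see that variability is to simulate it. The sketch below, in Python, assumes a made-up population of 6,000 pink and 4,000 gray symbols and a sample size of 50. It first repeats the sampling many times to show how much the best guess moves around, and then uses the usual normal-approximation standard error, sqrt(p(1 - p)/n), to attach a rough 95% interval to a single sample. That interval is one common choice, not the only way to quantify variability.

```python
import math
import random
import statistics

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Hypothetical population; the counts are assumptions for illustration.
population = ["pink"] * 6000 + ["gray"] * 4000
sample_size = 50

def pink_fraction(sample):
    """Fraction of pink symbols in a sample."""
    return sample.count("pink") / len(sample)

# 1) Repeat the sampling many times to see how much the guess varies.
estimates = [
    pink_fraction(rng.sample(population, sample_size))
    for _ in range(2000)
]
spread = statistics.stdev(estimates)

# 2) In practice we only get ONE sample, so we estimate the
#    variability from that sample alone (normal approximation).
one_sample = rng.sample(population, sample_size)
p_hat = pink_fraction(one_sample)
std_error = math.sqrt(p_hat * (1 - p_hat) / sample_size)
low, high = p_hat - 1.96 * std_error, p_hat + 1.96 * std_error

print(f"spread of estimates over repeated samples: {spread:.3f}")
print(f"single-sample estimate: {p_hat:.2f}")
print(f"approximate 95% interval: ({low:.2f}, {high:.2f})")
```

The spread from step 1 and the standard error from step 2 should come out close to each other, which is why the single-sample formula is useful when you cannot repeat the sampling.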

So, in this same example, suppose we have a population that consists of pink and

gray symbols and we take a sample from that population.

And then it turns out that between the time we took that sample and

the time we actually want to do the inference, the population changes.

So now, all of a sudden, we've introduced some purple symbols.

2:42

Now, if we want to do that same inference,

we end up in trouble, because the sample no longer represents the population.
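
Here is a small Python sketch of that failure, with made-up numbers: the sample is drawn from a population of pink and gray symbols, and afterwards 5,000 purple symbols are added. The helper name `fraction` and all of the counts are assumptions for illustration only.

```python
import random

rng = random.Random(2)  # fixed seed so the sketch is repeatable

def fraction(items, color):
    """Fraction of items that have the given color."""
    return items.count(color) / len(items)

# Population at the time the sample was taken (hypothetical counts).
old_population = ["pink"] * 6000 + ["gray"] * 4000
sample = rng.sample(old_population, 200)

# Population at the time of inference: purple symbols have appeared.
new_population = old_population + ["purple"] * 5000

sample_pink = fraction(sample, "pink")
sample_purple = fraction(sample, "purple")
true_pink = fraction(new_population, "pink")
true_purple = fraction(new_population, "purple")

print(f"pink:   sample says {sample_pink:.2f}, population is now {true_pink:.2f}")
print(f"purple: sample says {sample_purple:.2f}, population is now {true_purple:.2f}")
```

No amount of careful inference on the old sample can recover the purple symbols, because the sample never had a chance to contain one.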

This is actually a very common problem, and one of the most underappreciated

problems in statistical inference: knowing what the population is.

So here's an example of that.

You may have heard about Google Flu Trends.

Google Flu Trends tries to use search terms to predict

flu activity in the United States and in other places.

And it got a lot of press because it's a cool

and very efficient way of trying to predict the flu:

you just need the search terms and you can create the prediction.

But it turns out that Google Flu Trends, despite being very good when it

was first released,

ended up being pretty bad at predicting when flu outbreaks would occur.

And the reason was that the population changed: the way people searched for

symptoms of the flu changed over time.

And so, that was one of the major reasons why the prediction algorithm

they originally developed no longer worked, because the population changed.

So the central idea of statistical inference, and

the central dogma of statistics is, we have a population,

we want to take a smaller sample from that population using probability, and

then use statistical inference to say something about the population, and

in particular, the variability of our estimate for that population.