Hello, everyone, and welcome back.
In this next set of lectures,
we're going to concern ourselves with something that I've been alluding
to several times in previous lecture sets.
Instead of focusing on the variability of individual values in any single sample,
what we can now start to build is the idea of
understanding variation in a sample-based estimate,
like a sample mean or sample proportion or
sample incidence rate across theoretically multiple random samples of the same size.
Well, in research, we only get to observe one single sample from a population of
interest. In order to understand
the potential uncertainty in the estimate we get from that sample,
like a sample mean or proportion,
it might help to understand what the variation of this estimate would be
across other samples we could have gotten just by random chance from the same population.
So, we're going to develop and define the idea of something
called the sampling distribution of a sample statistic,
like the sampling distribution of a sample mean
or the sampling distribution of a sample proportion.
Then we'll demonstrate some examples of these empirically with computer simulations,
and we'll show how the characteristics of the sampling distribution depend on
the size of the samples that the statistics we're computing over and over again are based on.
We'll see some consistent results empirically when we do this,
whether we look at the behavior of means of
continuous data across multiple samples of the same size,
means or proportions for binary data, or incidence rates for time-to-event data.
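
To make the idea of "computing a statistic over and over again" concrete, here is a minimal sketch of such a simulation for sample means. The population (an exponential distribution with mean 1), the sample sizes, the number of repeated samples, and the function names are all assumptions chosen for illustration, not the simulations used in the lectures.

```python
import random
import statistics


def simulate_sample_means(population_draw, n, num_samples, rng):
    """Return the means of num_samples random samples, each of size n."""
    return [
        statistics.fmean(population_draw(rng) for _ in range(n))
        for _ in range(num_samples)
    ]


def skewed_draw(rng):
    # Hypothetical population: right-skewed exponential with true mean 1.
    return rng.expovariate(1.0)


rng = random.Random(0)  # fixed seed so the run is repeatable
results = {}
for n in (10, 40, 160):
    means = simulate_sample_means(skewed_draw, n, 2000, rng)
    # Center and spread of the simulated sampling distribution for this n.
    results[n] = (statistics.fmean(means), statistics.stdev(means))
    print(f"n={n:4d}  mean of sample means={results[n][0]:.3f}  "
          f"SD of sample means={results[n][1]:.3f}")
```

In a run like this, the sample means should center near the population mean for every sample size, while their spread should shrink as the sample size grows.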
Then, we'll take what we've seen demonstrated empirically,
sort of anecdotally,
and talk about a mathematical result that tells us, "Well,
I could've told you what you would have gotten
before you did those simulations, because what
you saw repeatedly happening empirically is a characteristic of sampling distributions."
Then, we'll start to think about
how we can use this mathematical result, called
the central limit theorem, to help us build and
quantify a sampling distribution,
that is, the distribution we would get by taking
an infinite number of random samples of the same size
and looking at the sample summary statistics,
like the means or the proportions, across those samples.
How can we characterize that when we only have the information from one sample?
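
As a small preview, and only a sketch under stated assumptions: a standard consequence of the central limit theorem is that the spread of the sampling distribution of a sample mean can be estimated from a single sample as the sample standard deviation divided by the square root of the sample size. The data below are simulated, hypothetical measurements, and the variable names are illustrative.

```python
import math
import random
import statistics

rng = random.Random(1)  # fixed seed so the run is repeatable
# Hypothetical single sample of n = 50 continuous measurements.
sample = [rng.gauss(120.0, 15.0) for _ in range(50)]

n = len(sample)
sample_mean = statistics.fmean(sample)
sample_sd = statistics.stdev(sample)  # variability of individual values

# Estimated standard error: variability of the sample mean across
# hypothetical repeated samples of the same size, s / sqrt(n).
standard_error = sample_sd / math.sqrt(n)

print(f"n={n}  mean={sample_mean:.2f}  SD={sample_sd:.2f}  SE={standard_error:.2f}")
```

Note the two different kinds of variability here: the standard deviation describes individual values within the one sample, while the standard error describes the estimate across samples we could have gotten.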
So, now, I hope I've created a little suspense that will
get you interested in proceeding with this set of lectures.