Hi, my name is Brian Caffo. I'm in the Department of Biostatistics at

The Johns Hopkins Bloomberg School of Public Health, and this is Mathematical

Biostatistics Bootcamp--lecture eight, on asymptotics.

In this lecture, we're going to take a trip to Asymptopia.

Asymptopia is a land where we have an infinite amount of data, so it should be

fun. We're going to talk about limits, but

limits of random variables. And so there are intricacies that you have

to account for when you consider random variables instead of standard mathematical

limits. And it's quite a difficult subject, but

we're going to show that we have basically two tools, the Law of Large Numbers and

the Central Limit Theorem, that are going to be our primary methods for looking at

random variables. So let me just review numerical limits

first. And I'm not gonna go into too much detail;

I'm gonna treat it heuristically here,

just to illustrate, in case your asymptotics are a little bit rusty.

Suppose I had a sequence where a_1 was .9, a_2 was .99, and a_3 was .999.

And I hope you can get the pattern at this point.

So clearly, this sequence, in some sense of the word, converges to one.

Each term, .9, .99, .999, and so on, gets closer and closer to one

as the index of the sequence gets larger and larger.

Well, we can formalize this. The formal definition of a limit is: for

any fixed distance, we can find a point in the sequence

such that the sequence stays closer to the limit than that distance from that point on.

So take, for example, this sequence.

The distance between a_n and one is ten to the minus n, where n is the position in the

sequence. And so, if we pick any distance, say,

epsilon, we can find an n so that ten to the minus n is smaller than epsilon.

And then, because ten to the minus n just keeps getting smaller as n gets larger,

then it's smaller than epsilon from that point onward.

So that satisfies our definition of limit. So clearly, this converges to one.
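
To make this concrete, here's a small sketch in code (my own illustration, not from the lecture's slides) that checks the example sequence against the definition of a limit:

```python
# For the sequence a_n = 1 - 10^(-n) above, the distance from the limit 1 is
# 10^(-n). Given any fixed distance epsilon, find the point in the sequence
# after which every term is closer to 1 than epsilon.
epsilon = 1e-6  # an arbitrary fixed distance

n = 1
while 10 ** (-n) >= epsilon:
    n += 1

# From that point onward, the distance to the limit never exceeds epsilon again.
assert all(abs((1 - 10 ** (-m)) - 1) < epsilon for m in range(n, n + 100))
print(n)  # the point in the sequence past which |a_n - 1| < epsilon
```

For epsilon of one in a million, the loop stops at n = 7, and every later term stays within epsilon of one, which is exactly what the definition requires.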

And it's kind of an interesting fact that an infinite sequence of .9s and one are

the same number. If you ever take a class on real

analysis, they'll discuss that sort

of thing at length. But anyway, I hope you get this basic sense

of a limit: as n goes to infinity, the sequence converges; it looks more and more

like its limit, so the distance just gets smaller and smaller and

never gets bigger again. The problem is that this only

works for a sequence of numbers, right? Now we wanna talk about, say, limits of

averages of coin flips. And then it gets much harder.

So take, for example, the average. And now we're gonna talk about

an average comprised of n observations. So let's say Xn bar: now, instead of

writing X bar like we typically do for our average, we're going to annotate it with

a subscript n to show that it's an average of the first n of a collection of IID

observations. So, for example, Xn bar could be the average

of the result of n coin flips, which is the sample proportion of heads.

Well, there is a limit theorem for averages, and we would say that Xn bar

converges in probability to a limit. And we relate this back to the ordinary

definition of a limit by saying, well, the probability that Xn bar is closer to the limit than any

specific distance converges to one. So in this case, take the probability that Xn bar minus

the limit, in absolute value, is less than any quantity epsilon that you fix.

That probability is a number, pn. Right?

And the definition of convergence in probability is that the sequence of numbers,

pn, converges to one, right? So we've converted the problem of what

does it mean for a random variable to converge, we've converted that back to the

definition of convergence of a sequence of numbers.

We've said convergence in probability means convergence of the sequence of

probability numbers in the standard sense of the definition of convergence.

So now we have a way of talking about how random variables converge.
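
As a concrete sketch of this definition (my own illustration, not the lecture's code; the epsilon, sample sizes, and simulation count are arbitrary choices), we can estimate the numbers pn by simulation for averages of fair coin flips, where mu is one half:

```python
# Estimate p_n = P(|Xbar_n - mu| < epsilon) for averages of n fair coin flips,
# and watch the sequence of probabilities head toward one as n grows.
import random

random.seed(1)
mu, epsilon, sims = 0.5, 0.05, 2000

def p_n(n):
    """Monte Carlo estimate of the probability that Xbar_n is within epsilon of mu."""
    hits = 0
    for _ in range(sims):
        xbar = sum(random.random() < 0.5 for _ in range(n)) / n
        if abs(xbar - mu) < epsilon:
            hits += 1
    return hits / sims

probs = [p_n(n) for n in (10, 100, 1000)]
print(probs)  # each estimated p_n is larger than the last
```

The estimated probabilities climb toward one as n increases, which is exactly what convergence in probability says they must do.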

So, establishing that a sequence of random variables actually converges is, as

you can imagine, hard. If you look back at the previous

definition, it's not the easiest thing in the world to think about.

So we have something that makes it a lot easier for us, and that's called the law

of large numbers. So, if you've heard people talk about the

law of averages, typically they don't know what they're talking about.

But probably they are referring to the law of large numbers.

There is no law of averages; there is the law of large numbers.

It basically says the following. Suppose X1 through Xn are IID from a population with mean mu and

variance sigma squared; so, in this case, we are going to assume that the random variables have a

variance. It's interesting to note that there are

distributions with no variance, where you try to calculate the variance and you

get infinity or something like that. Here we're gonna assume that there

is a variance and it's finite. Then the sample average of IID

observations always converges in probability to mu.

And that's called the law of large numbers.

Probabilists make a lot of distinction over various kinds of the laws of large

numbers. And they've worked very hard to get kind

of minimal assumptions for the law of large numbers to work.

And in fact, we're using a very lazy version of the law of large numbers.

They would probably be upset at us for teaching this one, but it's okay; we don't

care. The basic idea I want you to get is that

averages converge to mu. Averages of IID observations converge to

mu, the population mean of the distribution from which the observations were drawn.

This is a good thing, right? This basically says, if we go to the

trouble of collecting an infinite amount of data,

then we get the number that we wanna estimate, mu, exactly.

Right? Which is good, because collecting an

infinite amount of data takes a lot of time.

Actually, an infinite amount of time. Now suppose you're willing to assume

that the Xs all have a finite variance.

It's pretty easy to prove the law of large numbers using Chebyshev's inequality

which, if you remember, itself had a pretty simple little proof.

So for this kind of very complicated idea, it's amazing that it has a fairly simple

little proof. So remember that Chebyshev's inequality

states that the probability that a random variable is more than k standard

deviations from its mean is less than or equal to one over k squared.
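
Here's a quick numerical sanity check of Chebyshev's inequality (my own sketch, not from the slides; the exponential distribution and sample size are arbitrary choices):

```python
# For any distribution with a finite variance, the fraction of draws more than
# k standard deviations from the mean should not exceed 1/k**2.
import random

random.seed(2)
draws = [random.expovariate(1.0) for _ in range(100_000)]  # mean 1, sd 1

mean = sum(draws) / len(draws)
sd = (sum((x - mean) ** 2 for x in draws) / len(draws)) ** 0.5

for k in (2, 3, 4):
    tail = sum(abs(x - mean) >= k * sd for x in draws) / len(draws)
    print(k, tail, 1 / k ** 2)   # empirical tail vs. the Chebyshev bound
    assert tail <= 1 / k ** 2    # the bound holds for every k
```

Note how loose the bound is for this particular distribution; Chebyshev's inequality holds for every distribution with a variance, so it cannot be sharp for most of them.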

So therefore the probability that Xn bar minus mu, in absolute value, is bigger

than or equal to k standard deviations of Xn bar is less than or equal to one

over k squared. Now, let's pick an epsilon.

Pick any distance, epsilon. Because, remember, to establish the

convergence of a limit of numbers, we have to pick an epsilon.

And now, to establish convergence in probability, we have to show that the

probability of Xn bar minus mu being bigger than epsilon goes to zero, or of it being

less than epsilon goes to one; those two statements are equivalent.

And so let's let K, from our previous definition, be epsilon divided by the

standard deviation of Xn bar. K is not a random variable, right?

Epsilon is a number that we pick. And standard deviation of Xn bar, Xn bar

is a random variable, but the standard deviation of it is sigma over square root

N. Okay?

So this is just a number, there's nothing random in our definition of K.

So if you plug that back in for K, right, you get that the probability of Xn bar minus

mu being bigger than epsilon is less than or equal to the standard deviation

of Xn bar, squared, which is the variance of Xn bar, divided by epsilon squared.

And we, from a previous lecture, already calculated what the variance of Xn bar is.

It's sigma squared over n. So this probability is less than or equal

to sigma squared over n epsilon squared. Now, as n goes to infinity, sigma squared

isn't changing and epsilon squared isn't changing and the n's in the denominator,

so this whole thing goes to zero, okay? So the probability that the random average

Xn bar is more than epsilon away from the mean goes to zero as n goes to infinity; or, as

we stated it, the probability that Xn bar is less than

epsilon from the mean goes to one as n goes to infinity.

So either of those statements equivalently say that Xn bar converges in probability

to mu. I think it's kind of staggering that

basically two lines is all you need to establish this fairly complicated

result. Now, on the next page, I just have a

simple example where I simulated random normals with a mean of zero, and I show the

cumulative sample mean. So I took one random normal; then I

generated a second random normal and averaged the two;

and then generated a third random normal and just averaged it in with the rest.

And the iteration number at the bottom is the number of observations that goes into that

mean, right? And then on the vertical axis it shows the

value of the average. And you can see, at first, there's quite a

bit of variability, right? Remember, the variance of the average is

sigma squared over n. The variability is going to zero, and this

dashed line is the asymptote. Right?

And as you can see this average, as we include more observations in it, is going

to converge to this. You can see it already converging a little

bit by 100 iterations to this dashed line. That's simply the law of large numbers.
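
A sketch of that simulation in code (the standard normal draws and the number of iterations are assumed details on my part; the exact setup in the slides may differ):

```python
# Track the cumulative sample mean of IID standard normal draws as n grows.
import random

random.seed(3)
draws = [random.gauss(0, 1) for _ in range(1000)]

# cumulative_means[n - 1] is the average of the first n observations
cumulative_means = []
running_sum = 0.0
for i, x in enumerate(draws, start=1):
    running_sum += x
    cumulative_means.append(running_sum / i)

# Early averages bounce around; later ones settle near the true mean of zero.
print(cumulative_means[9], cumulative_means[-1])
```

Plotting `cumulative_means` against the iteration number reproduces the picture described above: high variability early on, then a steady march toward the horizontal asymptote at zero.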

So let's cover some useful facts about the law of large numbers.

One interesting fact is that functions of convergent random sequences

converge to the function evaluated at the limit.

So this includes sums, products, and differences.

So, for example Xn bar, squared, converges to mu squared, right?

Because Xn bar converges to mu, and this is just a function of Xn bar.

So Xn bar squared converges to mu squared. Something different is that the average of the

squared observations converges to a different entity, okay?

Let's go through this a little bit carefully, just because it's kind of an

odd little point. So Xn bar squared converges to mu

squared, but if we sum up the Xi squareds, the individual observations squared, and divide

by n, that no longer converges to mu squared.

Well why not? Well it's the difference between the

square of the average and the average of the squares.

So, in this case it's the average of the squares.

Here, each Xi squared is a random variable, so we could just call it Yi

instead of Xi squared. And then the average of these Xi

squareds, or average of these Yis, is gonna converge to the population mean of those

Yis. Well, we can calculate that.

We know what the expected value of Xi squared is because we can use the shortcut

formula for the variance, which, recall, was the expected value of Xi

squared minus the expected value of Xi, quantity squared.

We can just work that formula to solve for expected value of Xi squared.

And show that that's equal to sigma squared plus mu squared.

So, the average of the squared observations converges to sigma squared

plus mu squared, whereas the square of the average converges to mu squared.
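
A quick numerical check of this distinction (my own illustration; the mean, variance, and sample size are arbitrary choices):

```python
# The square of the average and the average of the squares converge to
# different limits: mu**2 versus sigma**2 + mu**2.
import random

random.seed(4)
mu, sigma, n = 2.0, 3.0, 200_000
xs = [random.gauss(mu, sigma) for _ in range(n)]

square_of_average = (sum(xs) / n) ** 2        # should land near mu**2 = 4
average_of_squares = sum(x * x for x in xs) / n  # near sigma**2 + mu**2 = 13

print(square_of_average, average_of_squares)
```

With a large n, the two quantities sit near 4 and 13 respectively, so they really are converging to different things.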

So it's kind of an interesting little point, but just remember that those things

are different. And by the way, this little fact that we

just showed, we can use to prove that the sample variance converges to

sigma squared, and we'll do that on the next slide.

So let's actually go through this proof that the sample variance converges

to sigma squared. And I think you'll see in the process of

the proof that it doesn't matter whether we divide by n or n - one, it's going to

converge to sigma squared. So here we have the definition of the

sample variance, summation of Xi minus Xn bar, quantity squared, all divided by n - one; we're

going to use the unbiased version of the sample variance.

Well, recall there was a shortcut formula for the sample variance and it worked out

to be that the numerator had a shortcut formula: summation Xi squared - n

xbar squared. And so we're going to use that formula.

And then we get summation Xi squared over n minus one, minus n times Xn bar squared over n

minus one. And let's just rearrange terms, and

multiply and divide by some Ns because that N minus one is a little annoying.

So we have n over n - one, times summation Xi squared over n, minus n over n - one, times Xn bar

squared. Let's look at each of these things in turn,

and remember, from the previous slide, I told you, I didn't prove this, you just

have to take it as true, that if you multiply convergent sequences,

they converge to the product of the limits.

If you add and subtract convergent sequences they converge to the difference

of the limits and so on. So let's look at each of these terms one

at a time. N over n minus one, clearly that converges

to one. If you don't believe me, plug in n over n minus

one for a very big value of n in your computer and you'll see that it gets

closer and closer to one. Probably the easiest way to see this is

it's one over one minus one over n, and that one over n clearly goes to zero.

Okay. We just on the previous slide talked about

how, summation Xi squared over n converges to sigma squared plus mu squared.

So, the second term converges to sigma squared plus mu squared.

And we have minus n over n minus one, which again converges to one.

And then Xn bar squared converges to mu squared.

We talked about that on the previous slide.

So we have this expression right here, sigma squared plus mu squared minus mu

squared, which is just sigma squared. So that proves that the sample variance

converges to sigma squared. And then, of course, the biased sample variance,

where we happen to divide by n instead of n

minus one, also converges to sigma squared.

And then we can square root the sample variance and get the sample standard

deviation and see that it converges to sigma as well, just by the rule from the

previous page where we said that functions, in this case the square root

function, of convergent random variables converge to the function of the limit.
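
The whole chain of consistency results can be checked numerically (a sketch of my own, under assumed parameters):

```python
# With many IID draws, the sample variance -- whether divided by n or n - 1 --
# lands near sigma**2, and the sample standard deviation lands near sigma.
import random

random.seed(5)
mu, sigma, n = 5.0, 2.0, 200_000
xs = [random.gauss(mu, sigma) for _ in range(n)]

xbar = sum(xs) / n
ss = sum((x - xbar) ** 2 for x in xs)  # sum of squared deviations

var_unbiased = ss / (n - 1)   # divide by n - 1
var_biased = ss / n           # divide by n; same limit
sd_sample = var_unbiased ** 0.5

print(var_unbiased, var_biased, sd_sample)
```

For a large n, the two variance estimates are nearly identical and both sit near sigma squared, which is exactly the point that the choice of divisor doesn't matter for consistency.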

So what we've found is, we have our law of large numbers, and with a couple of rules

that we just stipulated, we've got that the sample mean of IID random variables

converges to the population mean that it's trying to estimate.

The sample variance converges to the population variance that it's trying to

estimate. The sample standard deviation converges to

the population standard deviation that it's trying to estimate.

And in all these cases you see the pattern that the sample entity converges to the

population quantity that it's trying to estimate.

Basically saying that, if you go to the trouble of collecting an infinite amount

of data, then you actually get the value that you

want to estimate. You don't get it with noise, you get the

actual value. We give this property a name, and we say

that an estimator is consistent if it converges to what you want to estimate.

And the Law of Large Numbers is basically saying that the sample mean is consistent,

and we now know that the sample variance and the sample standard deviation

are consistent, and it doesn't matter whether you're dividing by n or n - one,

they're all consistent. But also remember the sample mean and the

sample variance are unbiased as well. And by the way the sample standard

deviation is not unbiased; it is biased, unlike the sample mean and the

sample variance. Consistency is a very weak property.

Saying that an estimator is consistent is not even really a necessary property.

It seems like it should be necessary, but if something converges to mu plus

epsilon, where epsilon is a minuscule number of no importance, then that

estimator is not consistent, yet it is still perfectly useful. So it's fair to say that consistency is

sort of a weakly necessary but definitely not sufficient property for an

estimator to be useful. We have also seen that being unbiased is

neither necessary nor sufficient for an estimator to be useful either.

For example, we've talked about the bias-variance trade-off: estimators can be

slightly biased and you can want that, because you improve on the variance.

So, what we're winding up with is a collection of properties that describe

estimators. And you really need to think about the

collection of properties as a whole to evaluate an estimator.

And these various mathematical concepts are useful, but they never, in

isolation, tell the full story on the utility of an estimator.

They might be useful for eliminating really dumb things, if something's

definitely not consistent, in the sense that it doesn't converge anywhere near the

thing you're trying to estimate, then probably that's not something that you

wanna use. But apart from those kind of stark

circumstances, these properties you need to take as a collection to try and decide

which estimators are the right ones to use.

So let me give an example of an estimator that's consistent but not very good.

So take the data, and only use the first half of the collected observations.

So we have, instead of Xn bar, we have Xn over two bar, right?

That estimator is, of course, consistent, as n goes to infinity.

You know, it just has fewer observations than if you took all of them.

But still the n is going to infinity in this case.

It's just n over two. So that estimator's consistent, but there's

an obviously better estimator right in front of you.

Basically, the estimator using all of the data.

So there's a particular example of a consistent estimator where

there's a better estimator that comes to mind.

Here you have to actually account for the fact that the estimate with all of the

data has a lower variance than the estimate with half of the data.
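
That trade can be seen in a small simulation (my own sketch, with arbitrary parameters): both estimators center on mu, but the half-data one is noisier:

```python
# Compare the full-data average with the average of only the first half:
# both are consistent, but the full-data estimator has about half the variance.
import random

random.seed(6)
mu, sigma, n, sims = 0.0, 1.0, 400, 2000

full, half = [], []
for _ in range(sims):
    xs = [random.gauss(mu, sigma) for _ in range(n)]
    full.append(sum(xs) / n)                   # Xn bar
    half.append(sum(xs[: n // 2]) / (n // 2))  # Xn-over-two bar

def sample_var(vals):
    m = sum(vals) / len(vals)
    return sum((v - m) ** 2 for v in vals) / len(vals)

var_full = sample_var(full)
var_half = sample_var(half)
print(var_full, var_half)  # roughly sigma**2/n versus 2*sigma**2/n
```

The half-data estimator's variance comes out at about twice the full-data estimator's, which is why you'd never prefer it even though both are consistent.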

So that's enough discussion of limits and the law of large numbers.

Next we're gonna go on to the central limit theorem.

A very important theorem in statistics.