In this video, I'm going to show you the concept of confidence intervals.

So we have over 26,000 data points for New York, covering

over 25 years of average daily temperatures.

And I'm going to use this as a way of illustrating what it means to take

a sample, and then use that sample to come up with a confidence interval.

I have already gone ahead and

calculated my average based on the entire data set that I have.

So, if you look at this one you will see that it's the average of

where the data for New York sits.

And that gives me the average of 55.2, and

it gives me the standard deviation of 17.38, roughly.

So, I have taken one sample, and that sample has 200 points in it.

I used the same procedure that I showed you in my earlier

video: I went to Data Analysis,

then went to Sampling, and then I selected a sample size of 200.

So here's my sample size of 200.

So first I need to know what is the mean of this sample.

So the way I find that is by taking its average,

the average of the values that sit right here.

So I'm going to click on the first value, hold Ctrl+Shift and press the down arrow, and

I will pick the entire 200 points.

Close the parenthesis, Return.

And you need to scroll up just a tad to see it again.

So this sample gives me a mean of 56.36.

Next, I need to calculate the standard deviation of the sample.

And I will do that by using STDEV.S; the .S is for sample.

Pick the first value, again Ctrl+Shift+Down, close the parenthesis, Return.

And it will give me the standard deviation of 17.99 for this.
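
For anyone following along outside Excel, the same three steps (draw a sample of 200, take its mean, take its sample standard deviation) could be sketched in Python. This is only an illustration under stated assumptions: the real New York temperature data is not reproduced here, so a simulated population with the mean and standard deviation quoted above stands in for it, and the numbers it prints will not match the 56.36 and 17.99 from the spreadsheet.

```python
import random
import statistics

# Assumption: simulated stand-in for the temperature data, built from the
# population mean (55.2) and standard deviation (17.38) quoted in the video.
rng = random.Random(42)  # fixed seed so the draw is repeatable
population = [rng.gauss(55.2, 17.38) for _ in range(26000)]

# Draw one random sample of 200 points (random.sample draws without replacement).
sample = rng.sample(population, 200)

sample_mean = statistics.mean(sample)  # counterpart of AVERAGE(range)
sample_sd = statistics.stdev(sample)   # counterpart of STDEV.S(range), divides by n - 1

print(f"n = {len(sample)}, mean = {sample_mean:.2f}, sd = {sample_sd:.2f}")
```

Note that `statistics.stdev` is the sample version (dividing by n - 1), which is the one that corresponds to STDEV.S; `statistics.pstdev` would be the population version.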

Now based on this I need to calculate the standard deviation of

the sampling means.

That is the spread I would see in the sample means if I were taking samples over and over again.

The standard deviation of the sampling means

is known as the standard error, and

to get it we take the sample standard deviation and divide it by the square root of n.

So this is what I need to do.

I'm going to write that here.

It's going to be my standard deviation divided

by the square root of my sample size.

So that's exactly what that equation is.

So I would press return and this would be the standard

error which is the standard deviation of the sampling means.
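
As a quick check on that cell, here is the same arithmetic sketched in Python, using the sample standard deviation and sample size quoted above. The variable names are just illustrative.

```python
import math

sample_sd = 17.99  # sample standard deviation from the spreadsheet
n = 200            # sample size

# Standard error = sample standard deviation / square root of n
standard_error = sample_sd / math.sqrt(n)

print(f"standard error = {standard_error:.3f}")  # standard error = 1.272
```

So with this sample, the means of repeated samples of 200 would be expected to vary with a standard deviation of roughly 1.27 degrees.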

Then the confidence interval.

Let's say here my confidence level is .95.

So then what are these two values?

In the PowerPoints, I have said to you that when we don't have access to the t distribution,

we can go ahead and use a z value.

And for 95%, I pretty much know that's 1.96.

So remember what the confidence interval of 95% will be.

To be exactly right, we should be using a t-distribution.

But in the PowerPoints I've been telling you that if your sample size is

large enough, we can use a Z-distribution, because as the sample size gets larger and

larger, the t distribution

and the Z-distribution start to become very similar.

Let me just in this video show you a simulation where

it shows the difference between a t distribution and a normal distribution.

Look at this animation that's happening right here.

The black curve is the normal distribution.

The red curve represents a t distribution, and its

degrees of freedom keep going up; degrees of freedom is sample size minus one.

What you see is that the red curve gets closer and closer to the black one as the degrees of freedom approach 50.

At 50, they're almost identical.
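
If you cannot see the animation, a rough numerical version of the same comparison can be sketched in Python. The helper names `t_pdf` and `max_gap` are made up for this illustration, and the grid from -4 to 4 is an arbitrary choice; the t density itself is the standard textbook formula. The sketch measures the largest vertical gap between the t curve and the normal curve for a few degrees of freedom.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    # Work in logs so the gamma functions do not overflow for large df.
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def max_gap(df):
    """Largest absolute gap between the t and standard normal densities on [-4, 4]."""
    normal = NormalDist()  # mean 0, standard deviation 1
    grid = [i / 100 for i in range(-400, 401)]
    return max(abs(t_pdf(x, df) - normal.pdf(x)) for x in grid)

for df in (2, 10, 50):
    print(f"df = {df:>2}: largest gap = {max_gap(df):.4f}")
```

The gap shrinks as the degrees of freedom go up, which is the same thing the red curve closing in on the black curve shows.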

So what I have said in my PowerPoints is that it's easier for

you to just use an estimation when the sample size is large enough.

One of the things that we know is that 1.96 represents a 95%

confidence interval when it comes to the normal distribution.

And how do I know this?

Remember what a normal distribution looks like.

The normal distribution is a symmetrical curve that looks like this.

And if I say I'm looking for a confidence interval of 95%,

I am saying that the area here in the middle is 95%.

So then I want to know what this z value is, and

this is what we call z of alpha over 2 and negative z of alpha over 2.

One is positive and one is negative.

Outside this 95%, the remaining 5% is split: 2.5% of it is going to be on this side of the curve,

and 2.5% of it is going to be on the other side of the curve.

So the area to the left of the positive z is actually .975.

So that's what I'm going to put in order for

you to see what that value is going to be.

So first I'm going to show you the z value,

then I'm going to show you the t value.

So to do that I'm going to use NORM.S.INV, and

I'm going to put in the area of everything to the left of that value.

So it's .975, and this is going to be close to 1.96.

And that's one of the things that I have said to you,

that a 95% confidence interval is very common.

And you want to remember that, that it's 1.96.
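
The same lookup can be sketched in Python, where `NormalDist().inv_cdf` from the standard library plays the role of NORM.S.INV. As an illustration of where this is heading, the sketch also combines the z value with the sample mean, sample standard deviation, and sample size quoted earlier to form a z-based interval; treat that last part as an assumption about how the pieces fit together, not as the spreadsheet's own output.

```python
from statistics import NormalDist

confidence = 0.95
alpha = 1 - confidence       # the 5% left over for the two tails
left_area = 1 - alpha / 2    # .975: everything to the left of the positive z

# Inverse of the standard normal CDF, the counterpart of NORM.S.INV
z = NormalDist().inv_cdf(left_area)

# Sample figures quoted earlier in the video
sample_mean, sample_sd, n = 56.36, 17.99, 200
standard_error = sample_sd / n ** 0.5
margin = z * standard_error

lower = sample_mean - margin
upper = sample_mean + margin

print(f"z = {z:.2f}")
print(f"95% interval: {lower:.2f} to {upper:.2f}")
```

With these figures the interval comes out a little under 2.5 degrees on either side of the sample mean, and it contains the full-data average of 55.2.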

The t distribution looks exactly the same way.

So let me get rid of this drawing.

The t distribution looks exactly the same here, except its tail is a little longer.

So again, let me go back to my simulation so you can see that visually.

Look at the red line versus the black line.

The red line is the t distribution, and it becomes more and

more like a normal distribution as the sample size increases, but

look at its tail, it's just longer, slightly longer.

Going back, the t-distribution also has a similar function