In this video, I'm going to show you the concept of confidence intervals. We have over 26,000 data points of average daily temperatures for New York, covering more than 25 years. I'm going to use this data to illustrate what it means to take a sample and then use that sample to come up with a confidence interval. I have already gone ahead and calculated the average based on the entire data set. If you look at this one, you will see that it's the average of all the New York data. That gives me a mean of 55.2 and a standard deviation of roughly 17.38. Then I took one sample with 200 points in it, using the same approach I used in my earlier video: I went to Data Analysis, then Sampling, and selected a sample size of 200. So here's my sample of 200. First I need to know the mean of this sample. I find that by taking the AVERAGE of the values that sit right here. I'm going to click on the first value, hold Ctrl+Shift and press the down arrow to pick all 200 points, close the parenthesis, and press Return. You need to scroll up just a tad to see it again. This sample gives me a mean of 56.36. Next, I need to calculate the standard deviation of the sample. I will do that with STDEV.S; the .S stands for sample. Pick the first value, again Ctrl+Shift+Down, close the parenthesis, and press Return. That gives me a standard deviation of 17.99 for this sample. Now, based on this, I need to calculate the standard deviation of the sampling means, that is, the spread I would see in the sample means if I took samples over and over again. The standard deviation of the sampling means is known as the standard error, and we estimate it by dividing the sample standard deviation by the square root of n. So this is what I need to do. I'm going to write that here.
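The same three calculations can be sketched in Python, assuming the standard `statistics` module as a stand-in for Excel: `statistics.mean` plays the role of AVERAGE, `statistics.stdev` (which divides by n - 1) plays the role of STDEV.S, and the standard error is the sample standard deviation divided by the square root of n. The helper name `sample_summary` and the four-value sample are hypothetical, used only for illustration.

```python
import math
import statistics


def sample_summary(values):
    """Return (mean, sample SD, standard error) for a list of numbers.

    Hypothetical helper mirroring the spreadsheet steps:
    AVERAGE -> statistics.mean, STDEV.S -> statistics.stdev (n - 1 denominator),
    standard error -> s / sqrt(n).
    """
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD, like STDEV.S
    se = sd / math.sqrt(n)         # standard error of the mean
    return mean, sd, se


# Hypothetical toy sample, not the New York data
mean, sd, se = sample_summary([50, 55, 60, 65])
print(f"mean={mean:.2f}, sd={sd:.3f}, se={se:.3f}")  # mean=57.50, sd=6.455, se=3.227
```

On the real sample you would pass in the 200 sampled temperatures instead of the toy list.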
It's going to be my standard deviation divided by the square root of my sample size, which is exactly that equation. I press Return, and this is the standard error, the standard deviation of the sampling means. Next, the confidence interval. Let's say my confidence level is 0.95. So what are the critical values that go with it? In the PowerPoints, I have said to you that when we don't have access to the t distribution, we can go ahead and use a z value, and for 95% I pretty much know that's 1.96. To be exactly right, we should be using a t distribution. But in the PowerPoints I've been telling you that if your sample size is large enough, we can use the z distribution, because as the sample size gets larger and larger, the t distribution and the z distribution become very similar. Let me show you a simulation of the difference between a t distribution and a normal distribution. Look at the animation that's happening right here. The black curve is the normal distribution. The red curve represents a t distribution, whose degrees of freedom are the sample size minus one. As its degrees of freedom go up toward 50, you see it get closer and closer to the normal curve, and at 50 they're almost identical. That's why I have said in my PowerPoints that it's easier for you to just use the normal approximation when the sample size is large enough. One of the things we know is that 1.96 is the critical value for a 95% confidence interval under the normal distribution. How do I know this? Remember what a normal distribution looks like: it's the symmetrical curve that looks like this. If I say I'm looking for a confidence interval of 95%, I am saying that the area in the middle here is 95%. Then I want to know this z value, which we call z of alpha over 2. There are two of them: one is positive and one is negative.
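To see numerically what the animation shows, here is a small Python sketch. It assumes the standard textbook formula for the Student's t density (computed with `math.lgamma`) and uses `statistics.NormalDist` for the normal curve; `t_pdf` is a hypothetical helper name. It compares the height of the two curves at the center (x = 0) and out in the tail (x = 3) for several degrees of freedom.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Student's t density with df degrees of freedom (hypothetical helper)."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


normal = NormalDist()  # standard normal: mean 0, SD 1
for df in (1, 5, 10, 30, 50):
    center_gap = normal.pdf(0) - t_pdf(0, df)  # t curve is lower in the middle...
    tail_ratio = t_pdf(3, df) / normal.pdf(3)  # ...and higher out in the tail
    print(f"df={df:>2}: center gap={center_gap:.4f}, tail ratio at x=3: {tail_ratio:.2f}")
```

As df grows, the center gap shrinks toward 0 and the tail ratio shrinks toward 1, yet the ratio stays above 1: that is the t distribution's slightly longer tail.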
Of the remaining 5%, 2.5% is going to be on this side of the curve and 2.5% is going to be on the other side. So the area to the left of the positive z is actually 0.975, and that's what I'm going to enter so you can see what that value is. First I'm going to show you the z value, then the t value. For the z value I'm going to use NORM.S.INV and give it everything to the left of that value, so it's 0.975, and the result is very close to 1.96. As I have said to you, the 95% confidence interval is very common, so you want to remember that its critical value is 1.96. The t distribution looks much the same way. Let me get rid of this drawing. The t distribution looks just like this, except its tails are a little longer. Let me go back to my simulation so you can see that visually. Look at the red line versus the black line. The red line is the t distribution, and it becomes more and more like a normal distribution as the sample size increases, but look at its tails: they're slightly longer. Going back, the t distribution also has a function similar to the one for our normal distribution, and it's called T.INV. It asks for the probability, again 0.975, and the degrees of freedom, which is always n - 1, so here it is 200 - 1 = 199. I press Return, and you will see that these numbers are pretty close: z would have given me 1.96, and the t distribution gives me 1.97. That's why in my slides I have told you that when the sample size is large enough, you can go ahead and just use 1.96; the difference is minor. But since we're in Excel and can be exact, I am going to use the correct one, which is the t distribution. I'm going to highlight it so you remember that this is the value you will use. To calculate the lower and upper values of my confidence interval, I need to know my margin of error, and the margin of error is simply this:
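Python's standard library has a direct counterpart to NORM.S.INV, `statistics.NormalDist().inv_cdf`, but no t distribution. The sketch below therefore assumes a hand-rolled stand-in for Excel's left-tailed T.INV: it integrates the t density with Simpson's rule to get the CDF, then inverts the CDF by bisection. The helpers `t_pdf`, `t_cdf`, and `t_inv` are hypothetical names, not a library API; in practice a statistics library (for example SciPy's `scipy.stats.t.ppf`) would do this for you.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Student's t density (hypothetical helper)."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """Left-tail area P(T <= x), via Simpson's rule on [0, x] (steps must be even)."""
    if x < 0:
        return 1.0 - t_cdf(-x, df, steps)  # the t density is symmetric about 0
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_inv(p, df, lo=-100.0, hi=100.0, tol=1e-9):
    """Value with left-tail area p, like Excel's T.INV(p, df), found by bisection."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


z = NormalDist().inv_cdf(0.975)  # Excel: =NORM.S.INV(0.975)
t = t_inv(0.975, 200 - 1)        # Excel: =T.INV(0.975, 199)
print(f"z = {z:.4f}, t = {t:.4f}")
```

With 199 degrees of freedom the two values should come out close to the 1.96 and 1.97 shown in the spreadsheet, with t slightly larger because of its longer tails.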
Your critical value, meaning how far you are from the mean in that distribution, times your standard error, which is right here. That's 1.97 multiplied by 1.266: this is my t value and this is my standard error. If I multiply them, this is the value I get. Now that I have my margin of error, I can build the interval. The equation for the confidence interval is X bar ± margin of error, so the lower bound is my sample mean minus the margin of error. In this case, that's 56.36, the mean of my sample, minus the margin of error, and the upper value is 56.36 plus the margin of error. We are 95% confident that the population parameter, the average temperature for New York, falls somewhere between these two values. And what was our actual average temperature? It was 55.2. So in this case, we got a sample whose interval contains the true value. If we repeated this process many times, about 5% of the intervals we built would miss the true mean. Also, every value in this interval is a plausible value for the population mean; the interval doesn't tell us that any one of them is more likely than the others.
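The margin-of-error and interval arithmetic can be sketched in Python as below, plugging in the numbers read off the video (sample mean 56.36, standard error 1.266, t value 1.97). The second part is an assumption-laden illustration of what "95% confident" means: it repeatedly draws samples of 200 from a hypothetical normal population with mean 55.2 and SD 17.38, builds an interval from each, and counts how often the interval contains 55.2. `confidence_interval` and `coverage` are hypothetical helper names.

```python
import math
import random


def confidence_interval(sample_mean, standard_error, critical_value):
    """X bar +/- margin of error, where margin = critical value * standard error."""
    margin = critical_value * standard_error
    return sample_mean - margin, sample_mean + margin


# Values read off the video: sample mean, standard error, t critical value
lower, upper = confidence_interval(56.36, 1.266, 1.97)
print(f"95% CI: ({lower:.2f}, {upper:.2f})")  # 95% CI: (53.87, 58.85)


def coverage(pop_mean=55.2, pop_sd=17.38, n=200, critical=1.972, reps=1000, seed=1):
    """Fraction of simulated intervals that contain pop_mean.

    Assumes, for illustration only, a normal population with the video's mean
    and SD; the real daily temperatures need not be normally distributed.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(pop_mean, pop_sd) for _ in range(n)]
        m = math.fsum(sample) / n
        s = math.sqrt(math.fsum((x - m) ** 2 for x in sample) / (n - 1))  # like STDEV.S
        lo, hi = confidence_interval(m, s / math.sqrt(n), critical)
        hits += lo <= pop_mean <= hi
    return hits / reps


rate = coverage()
print(f"share of intervals containing 55.2: {rate:.3f}")
```

The simulated share should land near 0.95, meaning roughly 5% of intervals miss the true mean; the exact figure depends on the random seed.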