[SOUND] Once we have formulated the hypotheses, we can gather data, and after that we can start analyzing. Conducting an experiment goes back to the principles we discussed in the earlier course about how to produce data and conduct sample studies. Here, we're assuming that we have followed the proper steps in obtaining our data, so we are focusing on how to proceed with the data analysis. The first thing is to set the significance level, referred to as alpha. This will be used for testing the hypothesis. I hope you remember that we heard about alpha when we learned about confidence intervals in the previous course. The confidence level is 1 minus the level of significance. So, for instance, when alpha is .05, the confidence level is 95%. Take an alpha of .05 and a two-tailed test, where we are testing equal to versus not equal to. Alpha is the probability of taking a sample from the blue shaded area. Such a sample doesn't represent the population, and it could lead us to making the wrong conclusion. So before we can talk about the analysis, let's look at where our analysis could lead us. I'm going to go back to one of the examples we had in the previous lesson, where a chocolate manufacturer wants to see if the bags weigh 300 grams as they are supposed to. When we do our analysis, we can make two mistakes: type I and type II errors. A type I error, whose probability is alpha, is when we reject the null hypothesis incorrectly, and a type II error is when we retain, that is, fail to reject, the null hypothesis incorrectly. Let's look at the example where we were checking the weights of chocolate bags. We believe that the bags weigh 300 grams, thus we are taking samples to check on this. The possible conclusions we can reach are summarized in this table. There are two possibilities about our production process. One is that, in reality, the bags do weigh 300 grams. The other possibility is that they don't.
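As a small sketch outside the lecture itself, the relationship between alpha, the confidence level, and the two blue shaded tails can be computed directly in Python. The alpha of .05 is the example value from above; the critical values assume a standard normal sampling distribution.

```python
from statistics import NormalDist

alpha = 0.05                      # significance level, as in the lecture example
confidence = 1 - alpha            # confidence level, here 0.95

# In a two-tailed test, alpha is split evenly between the two tails,
# so each shaded tail holds alpha / 2 of the probability.
z = NormalDist()                  # standard normal distribution
lower = z.inv_cdf(alpha / 2)      # left critical value, about -1.96
upper = z.inv_cdf(1 - alpha / 2)  # right critical value, about +1.96
```

A sample mean whose z statistic falls below `lower` or above `upper` lands in a blue shaded area and would lead us to reject the null hypothesis.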
If the production process is working correctly but our sample shows otherwise, then we will end up rejecting the null hypothesis. This is a mistake, and it's known as a type I error. We can also have a production system that is not working properly, so the bags really aren't 300 grams. Now, if we take a sample that leads us to believe the system is working properly, then we have made a type II error; the notation here is beta. The other two possibilities represent correct decisions: we correctly reject a false null hypothesis, or we don't reject a null hypothesis that is true. In different settings, the probability that we can make a mistake is called different things. For example, in manufacturing and quality control, type I error is referred to as the manufacturer's risk. This is when a production line is shut down unnecessarily. Type II error is known as the consumer's risk. This is when they fail to stop a production line that really needs adjustments, and defective products are passed on to the consumers. We also refer to these errors differently in a medical setting. In a medical setting, a type I error is called a false positive, which is when lab results falsely call a patient sick when, in fact, they are healthy. And a type II error is called a false negative, when the lab results lead a physician to think a patient is disease free when, in fact, they have a disease. You can think of Lance Armstrong being declared drug free when in fact he was using performance-enhancing drugs. The lab results were giving false negative results. So now that you know we can make mistakes in our analysis, we can go back to alpha. Before any analysis begins, we need to set this value. Statistical tests rely on the sampling distribution of the statistic that estimates the parameter specified in the null and alternative hypotheses.
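The idea that alpha is literally the probability of a type I error can be checked with a small simulation, a sketch under assumed numbers that are not from the lecture: bags that really do average 300 grams with a made-up standard deviation of 5 grams, samples of 30 bags, and a known sigma so a simple z test applies. Because the null hypothesis is true here, every rejection is a type I error, and the rejection rate should come out near alpha.

```python
import random
from statistics import NormalDist, mean

random.seed(1)
alpha = 0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for a two-tailed test

true_mean, sigma, n = 300.0, 5.0, 30   # hypothetical values; H0 (mu = 300) is true
trials, rejections = 2000, 0
for _ in range(trials):
    sample = [random.gauss(true_mean, sigma) for _ in range(n)]
    z = (mean(sample) - 300.0) / (sigma / n ** 0.5)   # z statistic against H0
    if abs(z) > z_crit:
        rejections += 1        # rejecting a true null hypothesis: a type I error

rate = rejections / trials     # should land in the neighborhood of alpha = 0.05
```

Running this, the simulated type I error rate hovers around .05, which is exactly what setting alpha at .05 promises.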
Alpha represents the chance of getting a sample that differs from the null hypothesis by as much as the sample we have, if the null hypothesis is in reality true. In the case of the chocolate manufacturer, the production line will be stopped to be fixed if the mean of the sample we have taken falls, in either direction, into one of the blue shaded areas. We will want to stop the production line and make adjustments if the bags are too heavy or too light. We can, by chance, take a sample that is that far from the mean, leading us to stop the production when in reality the production line is just fine. When we set alpha at .05, the chance is evenly split between the two tails. That is what is called a two-tailed test. Alpha is usually set to a low value to minimize the probability of rejecting a true null hypothesis. In business, alpha is typically set to .05, and in this setting it takes strong evidence to reject the null hypothesis. Settings of .01 and .10 are also sometimes used for alpha. An alpha of .01 would be considered a low value, which requires strong evidence to reject the null hypothesis, and .10 would be considered a high value. There is a relationship between alpha and beta, the probabilities of type I and type II errors. For a fixed sample size, the smaller we make alpha, the larger beta becomes. That is, the lower the probability of rejecting a true null hypothesis, the higher the probability of failing to reject a false null hypothesis. As I mentioned before, we don't set the value of beta, but we can calculate it. Here again, as a general rule, the test performs better for larger sample sizes, and as the sample size grows, the probability of making either mistake, alpha or beta, is reduced. But here again is the question of cost. After stating the null hypothesis and specifying the level of significance, we can move on to the next step, which is calculating the p-value for the sample data.
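The alpha-beta tradeoff can also be made concrete. The sketch below calculates beta for a two-tailed z test under assumed illustration values, none of which come from the lecture: sigma of 5 grams, a true mean that is off by 2 grams, and a sample of 30 bags. It uses the standard formula for beta when sigma is known: the probability that the z statistic lands between the critical values even though the null hypothesis is false.

```python
from statistics import NormalDist

def beta_two_tailed(alpha, shift, sigma, n):
    """Probability of failing to reject H0 (a type II error) when the true
    mean differs from the hypothesized mean by `shift`, with known sigma."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    d = shift / (sigma / n ** 0.5)        # true shift in standard-error units
    return z.cdf(z_crit - d) - z.cdf(-z_crit - d)

# Hypothetical numbers: sigma = 5 g, true mean off by 2 g, n = 30 bags.
betas = {a: beta_two_tailed(a, 2.0, 5.0, 30) for a in (0.10, 0.05, 0.01)}
```

For a fixed sample size, shrinking alpha from .10 to .05 to .01 makes beta climb, just as the lecture says, while doubling the sample size to 60 bags pushes beta back down.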
Once we have the p-value, the only thing remaining is to make a decision about our null hypothesis, that is, to reject or not reject. We will explore these two steps in the next lesson.