We've discussed probability, the relationship between samples and populations, and the central limit theorem. Now it's time to begin talking about inferential statistics by describing the steps involved in hypothesis testing. Hypothesis testing is one of the most important inferential tools for applying statistics to real-life problems. It is used when we need to make decisions about populations on the basis of sample information alone. >> A variety of statistical tests are used to help arrive at these decisions, for example the analysis of variance (ANOVA) test and the Chi-Square Test of Independence, to name a couple. But they all include the same basic steps. The steps involved in hypothesis testing are specifying the null hypothesis, H subscript 0, and the alternate hypothesis, H subscript a; choosing a sample; assessing the evidence; and drawing conclusions. Statistical hypothesis testing is defined as assessing evidence provided by the data in favor of or against each hypothesis about the population. To provide an example of hypothesis testing, we're going to use the NESARC data set, a representative sample of 43,093 adults in the United States, to evaluate whether or not there's an association between a diagnosis of major depression and how much a person smokes. We're going to work through the example using the four steps: specifying the null and alternate hypotheses, choosing a sample, assessing the evidence, and drawing conclusions. First, there are two opposing hypotheses for our question. The null hypothesis, commonly shown as H subscript 0, is that there's no difference in smoking quantity between people with and without depression. The alternate hypothesis, shown as H subscript a or sometimes as H subscript 1, is that there is a difference in smoking quantity between people with and without depression. The null hypothesis basically says nothing special is going on between depression and smoking.
In other words, that they're unrelated to one another. The alternate hypothesis says that there is a relationship, and it allows the difference in smoking between individuals with and without depression to be positive or negative. That is, individuals with depression may smoke more than individuals without depression, or they may smoke less. After stating the null and alternate hypotheses, we need to choose a sample. We're going to use the NESARC data set, and we're only going to evaluate these hypotheses among individuals who are smokers and who are younger, rather than older, adults. We subset the NESARC data to individuals who, 1, are current daily smokers, that is, they smoked every day in the month prior to the survey, and, 2, are aged 18 to 25. This sample, N = 1,320, showed the following. Young adult daily smokers with depression smoked an average of 13.9 cigarettes per day, with a standard deviation of 9.2 cigarettes. Young adult daily smokers without depression smoked an average of 13.2 cigarettes per day, with a standard deviation of 8.5 cigarettes. >> While it is true that 13.9 cigarettes per day is more than 13.2 cigarettes per day, it's not at all clear that this is a large enough difference to reject the null hypothesis, or to say that smokers with depression smoke significantly more than smokers without depression. So now we need to assess the evidence in order to determine whether the data provide strong enough evidence against the null hypothesis, that is, against the claim that there is no relationship between smoking and depression. We really need to ask ourselves, how surprising or rare is it to get a difference of 0.7 cigarettes smoked per day between our two groups?
That is, those with depression and those without, assuming that the null hypothesis is true and there is no relationship between smoking and depression. >> This is the step where we calculate how likely it is to get data like this when the null hypothesis is true. In a sense, this is really the heart of the process, since we draw our conclusions based on this probability estimate. The null hypothesis is generally assumed to be true until evidence indicates otherwise. >> If the null hypothesis is true, the probability that we'll get a difference at least this large in the mean number of cigarettes smoked in a random sample of 1,320 participants is roughly 0.17, or 17%. We'll talk about how this gets calculated for the different statistical tests later. The important point at this stage is that it's this kind of evidence that we will be considering every time we decide whether or not to reject the null hypothesis. So how exactly do we use this probability to come to a conclusion about the null hypothesis? Remember, if the null hypothesis is true, there is no association, and there's a probability of 0.17, or 17%, of observing a difference of this size between smokers with and without depression. The translation of this 17% probability is that if there were truly no association and we took 100 random samples from our population, about 17 of them would show a difference this large by chance alone. So if we rejected the null hypothesis on evidence like this and said that there was a difference in smoking quantity for smokers with and without depression, we would be wrong about 17 out of 100 times. >> Now we have to decide whether or not this is something that we feel comfortable with. Do we mind making a mistake and saying that there's a difference in smoking quantity 17 out of 100 times? Does this probability of 0.17 make what we're observing rare enough to make us feel confident about rejecting the null hypothesis? We can all probably agree that a probability of 0.50 would certainly not give us enough confidence to reject the null hypothesis, because 0.50, or 50%, means that we would be right 50 out of 100 times and wrong 50 out of 100 times.
No better than making decisions based on the flip of a coin. >> Being wrong 17 out of 100 times would make us far less likely to be wrong in rejecting the null hypothesis than a coin flip would, but we would still be less certain than if the probability were even smaller, say 10% or even 5%. Basically, this is our decision when testing hypotheses. In order to make this decision, it would be nice to have some kind of guideline or standard. What probability would give us confidence in rejecting a null hypothesis?
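
To make the "assessing the evidence" step a little more concrete, here is a minimal Python sketch of how a probability like the one above could be computed from the summary numbers alone. It is an illustration under stated assumptions, not the calculation used in the lecture, which is covered later for each specific test. The means and standard deviations are the ones quoted above. The group sizes are hypothetical, because only the total, N = 1,320, was given. The function name `two_sided_p_from_summary` is made up for this example, and it uses a large-sample normal approximation for the difference between two means.

```python
import math


def two_sided_p_from_summary(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Large-sample (normal approximation) two-sided p-value for a
    difference between two group means, using only summary statistics."""
    # Standard error of the difference between the two sample means.
    se = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    # How many standard errors the observed difference is from zero.
    z = (mean_a - mean_b) / se
    # P(|Z| >= |z|) for a standard normal Z, i.e. the chance of a
    # difference at least this large in either direction under the null.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Means and standard deviations as quoted in the lecture.
# ASSUMPTION: the group sizes below are hypothetical; the lecture gives
# only the total sample size (1,320), not the split between groups.
n_dep, n_nodep = 500, 820

z, p = two_sided_p_from_summary(13.9, 9.2, n_dep, 13.2, 8.5, n_nodep)
print(f"z = {z:.2f}, two-sided p = {p:.2f}")
```

Because the split between the two groups is assumed, the printed value should be read as being of the same kind as the 17% discussed above, not as a reproduction of it. A different split changes the standard error and therefore the probability.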