In this unit we're going to be talking about doing inference for categorical variables, where the parameter of interest is a proportion, as opposed to the mean that we've been talking about in the previous units. Take a look at these poll results from Gallup, where the American public was asked about their opinion on marriages between same-sex couples. Or another one, where data were collected from a variety of countries trying to answer the question: do most children in this country have the opportunity to learn and grow every day? Or another, a published study on antihypertensive medications and serious falls, where researchers looked at nearly 5,000 Americans over age 70 during a three-year period and found that those who were taking antihypertensive medications had a 30 to 40% greater likelihood of experiencing severe fall-related injuries like hip fractures and head trauma.

What is common between these studies is that they deal with categorical variables: opinion on marriages between same-sex couples, whether children in a country have the opportunity to learn and grow every day, and whether patients taking a certain type of medication are more likely to have fall-related injuries.

We're going to start our discussion by talking about one categorical variable at a time, and we're going to first consider the simple case where the categorical variable has only two levels, which we can categorize as a success or a failure. Remember, when we say success, we don't necessarily mean something positive. You might, for example, define success as a patient dying, or a patient suffering from a certain type of disease, or somebody graduating from high school. It doesn't matter what the context is. The important thing is that these are categorical variables that are binary. In other words, their levels can be categorized as either one thing or the other. And in this case the parameter of interest is going to be defined as the proportion of successes.
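To make the idea of a proportion of successes concrete, here is a minimal sketch in Python. The data are made up purely for illustration, and the names `outcomes`, `sample_proportion`, and `p_hat` are assumptions, not something from the studies mentioned above. Each observation is coded as 1 for a success and 0 for a failure, and the sample proportion is just the number of successes divided by the number of observations.

```python
# Hypothetical binary outcomes, made up for illustration: 1 = success, 0 = failure.
# "Success" is whatever level we chose to track; it need not be a good outcome.
outcomes = [1, 0, 0, 1, 1, 0, 1, 0, 0, 1]


def sample_proportion(values):
    """Return the fraction of observations coded as success (1)."""
    if not values:
        raise ValueError("need at least one observation")
    return sum(values) / len(values)


# The sample proportion estimates the population proportion of successes.
p_hat = sample_proportion(outcomes)
print(f"successes: {sum(outcomes)} of {len(outcomes)}, p_hat = {p_hat:.2f}")
```

The sample proportion computed this way is only an estimate from one sample; the population proportion is the parameter we want to do inference on.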
Then we're also going to talk about situations where you have only one categorical variable, but it has more than two levels. For example, if you think about socioeconomic status, that tends to be categorized as low, medium, or high. And we're going to learn to do inference on the distribution of these types of variables as well.

Then we're going to move on to working with two categorical variables, first talking about two categorical variables that both have only two levels. So, for example, one could look at the relationship between gender, male or female, and whether somebody decides to pursue a major in the science fields or not.

Then we're also going to extend our discussion to two categorical variables where either one or both of the variables have more than two levels. So one might consider, for example, socioeconomic status as one of the variables, where you have low, medium, and high. And then the other variable could be educational attainment, which could range from, let's say, finished high school, to junior college, college, or a graduate degree, and we could look at the relationship between these two variables as well. In this case, we're going to learn to evaluate whether these variables appear to be dependent or independent.

We've talked about this type of thing before, but we've always been cautious about making certain statements too concretely, because we hadn't yet really talked about statistical inference and statistical significance when it comes to categorical variables. That's what this whole unit is going to be about.
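As a rough sketch of what looking at the relationship between two categorical variables can mean in practice, the Python below tabulates two binary variables and compares the proportion of one outcome across the groups of the other. The records are made up for illustration only, and the names `records`, `counts`, and `conditional_proportion` are assumptions introduced for this example.

```python
from collections import Counter

# Hypothetical (gender, science_major) records, made up for illustration only.
records = (
    [("female", "yes")] * 40 + [("female", "no")] * 60
    + [("male", "yes")] * 35 + [("male", "no")] * 65
)

# Cell counts of the two-way table, keyed by (gender, science_major).
counts = Counter(records)


def conditional_proportion(table, group, outcome="yes"):
    """Proportion of `outcome` among the records that belong to `group`."""
    group_total = sum(n for (g, _), n in table.items() if g == group)
    return table[(group, outcome)] / group_total


p_female = conditional_proportion(counts, "female")
p_male = conditional_proportion(counts, "male")
print(f"female: {p_female:.2f}, male: {p_male:.2f}, "
      f"difference: {p_female - p_male:.2f}")
```

If the two variables were independent, we would expect these conditional proportions to be about the same. An observed difference in a sample does not settle the question on its own, since some difference can arise just by chance, and judging that is where statistical inference comes in.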