Now we come to the final type of analysis, causal analysis, which can tell you whether one thing, like spending on advertising, is affecting another thing, like your sales numbers. By the end of this lesson, you should be able to explain how to run a regression analysis and assess the causal relationship between two variables using regression analysis. Assessing a causal relationship basically means that you want to assess how variable X affects variable Y. Marketing applications for this type of analysis are numerous. For example, what is the impact of advertising on sales? What is the impact of price on sales? And so on.
From there, you need to be able to build a model, which is a relationship describing how X affects Y. The idea here is that the variation of the dependent variable can be explained by variations in the independent variable and by other factors that are not captured in the model. And so in regression analysis, it formally means that you will assume that Y, sales, is going to be equal to a, an intercept, + b times X, + an error term. Here X is going to be advertising, for example, and b is going to be the effectiveness of advertising on sales. And the error term is going to capture all the other factors that are not captured in the model used to predict Y. To summarize, a + bX is going to be my model, and the errors are going to be everything else.
Now there are two important questions for marketing researchers here. One question is, what are the values of a and b, and are they statistically significant or not, which we'll see later. The other question that is very important is, how much of the variation in sales can I explain with my model? So later on, we are going to use something called the adjusted R-squared, which tells you how much of the variation in sales can be explained by a + bX compared to the variation in the errors. And so I come back to my initial example of a marketing researcher or a chief marketing officer who wants to understand the effectiveness of advertising on sales.
He or she collects data on historical advertising expenses and unit sales, and wants to run this regression. So we need to build the model, and the first question is, what is the dependent variable? Here, the dependent variable is going to be the key performance indicator that we want to understand, in this case, unit sales. The independent variable is going to be advertising, and so we write the model, Y = a + bX + errors. Which means that Unit Sales = Sales without Advertising, a, + the combined Effect of Advertising spending, X, and its effectiveness on sales, b, + Everything Else. So visually, it means that the relationship is like this, where X is going to be advertising and Y is going to be sales. The red line represents the model Y = a + bX, and b is the slope of this regression, which in our case, again, measures the effectiveness of advertising on sales.
So imagine that you've collected these data for the last few quarters, and on the left-hand side, you have a measure of advertising expenditures. And on the right-hand side, you have a measure of unit sales. So you run the regression, and this regression can be run in Excel, or SPSS, or Qualtrics. I ran this regression here in Google Sheets and got the following information. So the first question here is, do I have a good model? Does my model explain the variation in sales properly or not? To answer this question, I need to look at the adjusted R-squared, which here is about 98%, which tells me that about 98% of the variation in my sales can be explained by my model, with my model being a + bX. That means that I have a pretty good model, because all of the other factors account for only about 2% of the variation, so I'm very happy about that. The other thing I want to look at is the parameter estimates and their t-values and standard errors. So the estimates tell me the values of a and b. In this case, a is going to be around 1,000, and b is going to be around 0.22.
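To make the fitting step concrete, here is a minimal sketch in plain Python of how a, b, and the adjusted R-squared could be computed by ordinary least squares. The quarterly numbers are made up for illustration (they are not the data shown in the lecture), and the function names `fit_simple_regression` and `adjusted_r_squared` are hypothetical helpers, not part of any spreadsheet or statistics package.

```python
# Hypothetical quarterly data, invented for illustration only
# (not the data set used in the lecture).
ad_spend = [1000.0, 2000.0, 3000.0, 4000.0, 5000.0, 6000.0]    # X
unit_sales = [1210.0, 1450.0, 1640.0, 1900.0, 2080.0, 2330.0]  # Y


def fit_simple_regression(x, y):
    """Return the least-squares intercept a and slope b for Y = a + bX."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx              # slope: effectiveness of advertising
    a = mean_y - b * mean_x    # intercept: sales without advertising
    return a, b


def adjusted_r_squared(x, y, a, b, n_predictors=1):
    """Share of the variation in Y explained by a + bX, adjusted for model size."""
    n = len(y)
    mean_y = sum(y) / n
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))  # errors
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)                    # total variation
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)


a, b = fit_simple_regression(ad_spend, unit_sales)
adj_r2 = adjusted_r_squared(ad_spend, unit_sales, a, b)
print(f"a = {a:.2f}, b = {b:.4f}, adjusted R-squared = {adj_r2:.4f}")
```

A spreadsheet or statistics package reports the same quantities directly. The point of the sketch is only to show that the model part, a + bX, and the error part are computed separately and then compared.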
Now the question is, what do these numbers mean? Are they statistically significant or not? And for that, I need to run a test, a t-test, where basically the null hypothesis is going to be that those parameter estimates, 0.22 and 1,000, are not, statistically speaking, different from 0. And the alternative hypothesis is that those numbers are, statistically speaking, different from 0. And so here are the two tests that I need to do. Luckily, most software packages give you the t-values. And so we can see that in both cases, the t-values are above 1.96, which means that at the 95% confidence level, again in both cases, we reject the null hypothesis that those numbers are actually equal to 0. It means that the data support the alternative hypothesis that those numbers are not equal to 0. So if I'm a CMO, a Chief Marketing Officer, I'm very happy because it means that advertising worked in driving sales.
So that's all for this module. The next module will expand on the simple linear regression, and we're first going to look at cases where you have more than one independent variable, like advertising and price together, to explain sales. We're also going to go deeper into the assumptions behind the linear regression. Those assumptions are important to assess the validity of the results. Then we look at regression models that are useful when the dependent variable is not continuous but discrete. In marketing, that could be, for example, choosing brand one over brand two, instead of looking at how many units of brand one I bought compared to how many units of brand two I bought. Finally, we look at three other advanced techniques that are commonly used by marketing researchers, called factor analysis, cluster analysis, and conjoint analysis.
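For the t-test on the regression coefficients covered in this lesson, here is a minimal sketch in plain Python of where the t-values come from: each estimate is divided by its standard error and compared with the 1.96 cutoff. The data are made up for illustration (not the lecture's numbers), and `coefficient_t_values` is a hypothetical helper name.

```python
import math

# Hypothetical quarterly data, invented for illustration only
# (not the data set used in the lecture).
ad_spend = [1000.0, 2000.0, 3000.0, 4000.0, 5000.0, 6000.0]    # X
unit_sales = [1210.0, 1450.0, 1640.0, 1900.0, 2080.0, 2330.0]  # Y


def coefficient_t_values(x, y):
    """Return t-values for the intercept a and slope b of Y = a + bX."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x

    # Residual variance: squared errors divided by degrees of freedom (n - 2).
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    residual_variance = ss_res / (n - 2)

    # Standard errors of the two estimates.
    se_b = math.sqrt(residual_variance / sxx)
    se_a = math.sqrt(residual_variance * (1.0 / n + mean_x ** 2 / sxx))

    # t-value = estimate / standard error (null hypothesis: parameter equals 0).
    return {"a": a / se_a, "b": b / se_b}


t_values = coefficient_t_values(ad_spend, unit_sales)
for name, t in t_values.items():
    verdict = "reject H0" if abs(t) > 1.96 else "fail to reject H0"
    print(f"{name}: t = {t:.2f} -> {verdict}")
```

One caveat on the cutoff: 1.96 is the large-sample value. With only a handful of observations, as in this toy data set, the exact critical value from the t distribution is larger (about 2.78 for 4 degrees of freedom), so software-reported p-values are the safer guide.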