In this lecture we are going to look at the gold standard for obtaining causal effects, that is, the research design that in theory is guaranteed to yield the causal effect of an independent variable. This gold standard is the randomized controlled trial, and it works through what is known as randomization. Randomization, and how it relates to the linear regression model, is the topic of this lecture. First, we have to go through some tedious notation. The independent variable is going to be very simple: it takes only two values, treated and non-treated. Hence, in the case of randomized controlled trials, x is often denoted T for treatment. T is one if the person is treated and zero if the person is not treated. Note that treatment can be all kinds of things: medical treatment, a training program for the unemployed, or getting a college degree. For each person in the dataset we observe an outcome Y. This can be any relevant outcome: whether the person has been cured of an illness, has found a job, or the person's labor market earnings. The observed outcome is either the outcome if treated, when T is equal to one, or the outcome if not treated, when T is equal to zero. The treatment effect at the individual level is then the difference between the outcome if treated and the outcome if not treated. Both are never observed at the same time for the same person, and therefore this notation is known as the counterfactual approach. Either the outcome when treated is observed or the outcome when not treated is observed, never both. Hence, one of the outcomes is counterfactual. So we will never know the treatment effect for a particular individual. But we may hope to learn the average treatment effect by comparing the average outcome for the treated with the average outcome for the non-treated. If we knew the counterfactual outcomes, this would be straightforward to calculate. However, as we do not, we need the treated and the non-treated to be comparable on average. We return to this issue.
But we define the average treatment effect, the ATE, as the average over all persons of the difference between the outcome if treated and the outcome if not treated, one of which is always counterfactual. We further define two other types of outcomes. The first is the average outcome when treated for those who actually got the treatment. This would correspond to the average earnings with a college degree for those who actually have a degree, as opposed to the counterfactual earnings, the earnings with a degree for those who did not get one. Finally, we also need the average outcome without treatment for those who are not treated, that is, earnings without a degree for those who actually do not have one. Note that the average outcome with treatment for those treated and the average outcome without treatment for those not treated are both observed in the data. With this new notation, we can now decompose the observed difference in outcome between those treated and those not treated. First, we write the average outcome with treatment for those treated minus the average outcome without treatment for those not treated. This is observable from data. Next, we introduce counterfactuals. We add and subtract the average outcome if not treated for those who were actually treated. This is not observed, so it is an unknown number. Nevertheless, as we both add it and subtract it, this does not change the calculation. Then we rearrange, so that we first write the difference between the average outcome with and without treatment for those actually treated. This is the average treatment effect for those who were treated, the ATT. Next, we add the difference between the average outcome for the treated and the untreated when neither receives treatment. This would be the average difference in earnings between those with and without a college degree if neither group had a degree: a baseline difference that occurs if the two groups are not comparable on background characteristics.
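The decomposition just described can be written out in symbols. The notation here is an assumption, not taken from the slides: Y_1 is the outcome if treated, Y_0 the outcome if not treated, and E[ . | T = t] is the average within the group with treatment status t.

```latex
\begin{align*}
E[Y \mid T=1] - E[Y \mid T=0]
  &= E[Y_1 \mid T=1] - E[Y_0 \mid T=0] \\
  &= E[Y_1 \mid T=1] - E[Y_0 \mid T=1] + E[Y_0 \mid T=1] - E[Y_0 \mid T=0] \\
  &= \underbrace{E[Y_1 - Y_0 \mid T=1]}_{\text{ATT}}
   + \underbrace{E[Y_0 \mid T=1] - E[Y_0 \mid T=0]}_{\text{baseline bias}}
\end{align*}
```

The first line contains only observed quantities; the term E[Y_0 | T=1] that is added and subtracted in the second line is the counterfactual one.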
Hence, the difference in outcome between the treated and the non-treated can be decomposed into a true treatment effect and a bias term that arises because the treated and the non-treated are not comparable in the absence of treatment. How does the new notation relate to the notation of our simple linear regression model? It turns out that there is a simple correspondence. We now write our independent variable as T for treatment. It is now a so-called dummy variable that takes the value one when treated and zero when not treated. Then we can write the average outcome for the treated with treatment as the constant term plus the coefficient of the treatment variable, and we can write the average outcome for the non-treated as the constant term. The average difference between the two groups is then just the regression coefficient. In addition, we know from the previous lecture that unobserved confounders may bias it. Bias from confounders is in fact the same as the baseline bias we just described. This slide tries to elaborate on the duality of baseline bias and bias from confounders. The average outcome for the treated when they receive treatment is, in regression terminology, the constant term plus the regression coefficient plus the average value of the effect of unobserved variables (confounders) for the treated. The same holds for the untreated, except that they do not get the effect from the regression coefficient (the treatment effect). The average counterfactual outcome for the treated is the same as the observed outcome for the treated, except that now they do not get the effect from the regression coefficient. Combining terms, we can once again write the difference between the average outcome for the treated and the untreated, that is, the average treatment effect on the treated plus bias, in terms of the regression coefficients. Eventually this becomes the causal effect, b, plus differences in the effects of the confounders, i.e.
the differences in the error terms for the treated and the non-treated. This last difference is zero if the effect from unobserved variables is the same among the treated and the non-treated or, put differently, if the error term is independent of treatment status. In most cases, it would be unwarranted to assume that the error term is independent of treatment status. It is easy to think of variables that affect both the chance of getting a college degree and earnings. For instance, IQ and conscientiousness are both important for educational achievement and for getting a good job with a high wage. Therefore, if both these variables are unobserved, it is very likely that the observed differences between those with and without a college degree reflect not only a potential causal effect of a college degree but also confounding, or baseline differences, in IQ and conscientiousness. A convenient way out of this pickle is randomization. That is, we set up a lottery that determines whether people are allowed to get a college degree. Such a lottery would not be considered ethical, but for the sake of our case we nevertheless imagine that we can hold it. Thanks to the lottery, the average outcome when treated is now the same for those who were treated and for those who were not. The first is observed, the latter is counterfactual, but we know they are the same thanks to the lottery. The same goes for the outcome when not treated: the average outcome is the same for the treated and the non-treated, because on average the two groups are comparable, having been allocated to treatment by the lottery. Therefore, when we calculate the difference between the average outcome for those treated and those not treated, we get the average treatment effect, for the treated and the untreated alike, with no baseline bias. Until now we have assumed that the treatment effect is uniform: everybody gets the same causal effect of the treatment, b.
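The argument so far can be illustrated with a small simulation. This is only a sketch under made-up assumptions: the variable `ability` stands in for an unobserved confounder such as IQ, and the numbers (a constant of 1.0, an effect b of 2.0, a confounder effect of 1.5) are hypothetical and not taken from any data. With self-selection the difference in means should land well above b; with a lottery it should land close to b. The sketch also checks that the regression coefficient on a treatment dummy is the same number as the difference in group means.

```python
import random
from statistics import mean

def simulate(randomized, n=20000, b=2.0, seed=0):
    """Simulate (t, y) pairs with a uniform effect b and one unobserved confounder."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        ability = rng.gauss(0, 1)  # unobserved confounder (hypothetical)
        if randomized:
            t = 1 if rng.random() < 0.5 else 0  # lottery decides treatment
        else:
            # self-selection: high-ability people are more likely to be treated
            t = 1 if ability + rng.gauss(0, 1) > 0 else 0
        # outcome = constant + b * T + error term, where the error contains ability
        y = 1.0 + b * t + 1.5 * ability + rng.gauss(0, 1)
        data.append((t, y))
    return data

def diff_in_means(data):
    """Average outcome of the treated minus average outcome of the non-treated."""
    treated = [y for t, y in data if t == 1]
    untreated = [y for t, y in data if t == 0]
    return mean(treated) - mean(untreated)

def ols_slope(data):
    """Coefficient on T in a simple regression of y on a constant and T."""
    t_bar = mean(t for t, _ in data)
    y_bar = mean(y for _, y in data)
    cov = sum((t - t_bar) * (y - y_bar) for t, y in data)
    var = sum((t - t_bar) ** 2 for t, _ in data)
    return cov / var

if __name__ == "__main__":
    print("self-selected:", diff_in_means(simulate(randomized=False)))
    print("randomized:   ", diff_in_means(simulate(randomized=True)))
```

Note that the sketch gives everybody the same causal effect b, as in the lecture so far.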
This is not realistic. However, it turns out that as long as we are only concerned with the average treatment effect, the assumption of a homogeneous treatment effect has limited importance. We will nevertheless make a slight digression to show you how heterogeneous treatment effects work. Consider again the simple linear regression model with the treatment dummy as the independent variable. Now the regression coefficient has a subscript i, indicating potentially individual treatment effects. That is, both the size and the sign of the treatment effect may vary across individuals. We also allow for the possibility that the effect and the error term are correlated, that is, those who are treated might experience different treatment effects compared to those who are not treated. In the health sciences, this makes very good sense. You do not treat healthy people because you do not expect them to be (positively) affected by the treatment. You do not take aspirin if you do not have a headache. In addition, as before, we allow for confounders in the sense that the error term might be correlated with treatment status. In order to proceed, we decompose the individual treatment effect into a common effect and an individual part with mean zero. This is innocuous, as the decomposition is tautological. We now study what happens when we calculate the difference between the treated and the non-treated using data from a randomized trial, when the data are generated from a model with heterogeneous treatment effects. Again, treated and controls are on average equal in terms of the effect of the unobserved confounders due to randomization. Therefore, the difference in average outcomes is the common treatment effect plus the average individual part of the treatment effect among the treated plus the difference in baseline. The difference in baseline is zero by randomization, and the average individual part is zero because it has mean zero by assumption and, under randomization, the treated are representative of everyone.
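In symbols, the steps above can be sketched as follows. The notation is assumed for illustration: a is the constant term, b_i the individual treatment effect, u_i its individual part, and e_i the error term.

```latex
\begin{align*}
Y_i &= a + b_i T_i + e_i, \qquad b_i = b + u_i, \qquad E[u_i] = 0 \\
E[Y_i \mid T_i=1] - E[Y_i \mid T_i=0]
  &= b + E[u_i \mid T_i=1] + \bigl( E[e_i \mid T_i=1] - E[e_i \mid T_i=0] \bigr) \\
  &= b \qquad \text{if } T_i \text{ is independent of } (u_i, e_i)
\end{align*}
```

Under randomization, E[u_i | T_i=1] equals E[u_i], which is zero, and the two error-term averages cancel, leaving the common effect b.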
Therefore, the difference between the average outcome of the treated and the non-treated from a randomized controlled trial is the average treatment effect. Randomization is therefore guaranteed to yield an estimate of the average causal treatment effect, even if treatment effects are heterogeneous and even if individuals would, in the absence of randomization, select themselves into treatment on the basis of the size of their treatment effect. Randomization is thus a very powerful tool when trying to estimate causal effects. Even though randomized trials are such a powerful tool for establishing causal relations empirically, they are not always easy or feasible to carry out in practice. Therefore, there is not an abundance of randomized controlled trials in the social sciences. One notable example, though, is Project STAR. This project randomized more than 11,000 students into two treatment groups and one control group. When entering school in kindergarten, students were randomized into a small class (fewer than 17 students), an ordinary class (more than 22 students), or an ordinary class with a teacher's aide. Students were followed throughout school and into high school, and achievement scores were recorded at the end of each school year. Here we show the math achievement score at the end of kindergarten for all students. In addition, we show the distribution of students across treatment arms. Approximately a third of the students are in each of the three groups. The reason that it is not exactly a third in each group is that, even though students were randomized into the three groups, some students subsequently decided to leave the class to which they had been allocated by the randomization procedure. Such non-random dropout threatens the complete randomization of the final data set. We return to this issue later.
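To see why non-random movement after the lottery is a concern, here is a minimal sketch on simulated data. Everything in it is a made-up assumption for illustration and none of it comes from Project STAR: the `background` variable, the effect of 10 points, and the rule that only students with a favourable background switch from the ordinary to the small class. When nobody switches, the gap between class types should be close to the assumed effect; when switching depends on background, the gap should overstate it.

```python
import random
from statistics import mean

def class_gap(switch_share, n=20000, effect=10.0, seed=1):
    """Mean score in small classes minus mean score in ordinary classes.

    switch_share is the (hypothetical) probability that a student with a
    favourable background who drew an ordinary class moves to a small one.
    """
    rng = random.Random(seed)
    small, ordinary = [], []
    for _ in range(n):
        background = rng.gauss(0, 1)    # unobserved background characteristic
        in_small = rng.random() < 0.5   # class type drawn by lottery
        # Non-random switching: only favourable backgrounds leave the ordinary class.
        if not in_small and background > 1.0 and rng.random() < switch_share:
            in_small = True
        score = 50.0 + effect * in_small + 5.0 * background + rng.gauss(0, 5)
        (small if in_small else ordinary).append(score)
    return mean(small) - mean(ordinary)

if __name__ == "__main__":
    print("no switching:        ", class_gap(0.0))
    print("non-random switching:", class_gap(1.0))
```

The comparison is by the class actually attended, so switching that depends on background makes the small-class group better off at baseline and reinstates the baseline bias.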
We now run a regression of math achievement on dummy variables for whether the student is in an ordinary class (control group) or an ordinary class with a teacher's aide. The reference group is then students in a small class. The top regression uses math achievement after kindergarten as the outcome. Here we find that students in the control group and students with a teacher's aide score approximately 10 points lower on the math scale than students in small classes. Assuming that randomization was successful, this is an average causal effect. Even though individual students may experience very different outcomes when taught in a small class compared to an ordinary class, ON AVERAGE students benefit from small classes. Running the same regression using math achievement after 1st grade as the outcome yields a similar result. To interpret this result some bookkeeping is needed, because students can now potentially switch between treatment groups and thus may only be 'partially' treated. To avoid this, the regression using math in 1st grade as the outcome uses only students who have attended the same class type in both kindergarten and 1st grade. Obviously, if randomization is to neutralize any baseline differences, individuals must not circumvent the randomization procedure and 'self-select' into different treatment groups, as this would reinstate a correlation between the error term and the treatment indicator. In practice it is unavoidable that a few individuals for some reason escape the randomization procedure. To assess how important this is in practice, one can compare the treatment groups on observable characteristics. If the data are balanced on observables, it is credible that they are also balanced on unobservables, indicating that randomization was successful despite some reallocation of individuals after randomization. To assess this in our example, we show the distribution of whether the student is eligible for free lunch (due to low-income parents), ethnicity and gender.
In no case can we reject the null hypothesis of independence between background characteristics and allocation to treatment groups. Thus, it seems that randomization in Project STAR was indeed successful. Another way to see that randomization was successful is to look at the estimated effect of class type with and without background characteristics as controls. Background characteristics are potential confounders, so if randomization had failed we should find a different effect of class type when we control for them. However, as we see from the two regressions reported in the tables, there are practically no differences between the estimated effects of class type across the two regression models, again indicating that randomization was successful. In this lecture we learned about the gold standard in terms of causal inference, the randomized controlled trial. However, for many reasons it is often not feasible to conduct a randomized controlled trial. Therefore, in the coming lectures we will deal with less demanding research designs that nevertheless address causality. Thank you for your attention. Looking forward to seeing you next time.