How does the new notation relate to the notation of our simple linear regression model?
It turns out that there is a simple correspondence. We now write our independent variable as t, for
treatment. It is now a so-called dummy variable that
takes the value one when treated and zero when not treated.
Then we can write the average outcome for the treated with treatment as the constant
term plus the coefficient of the treatment variable and we can write the average outcome
for the non-treated as the constant term. The average difference between the two groups
is then just the regression coefficient. In addition, we know from the previous lecture
that unobserved confounders may bias this coefficient. Therefore, bias from confounders is the same
as the baseline bias we just described. This slide tries to elaborate on the duality
of baseline bias and bias from confounders. The average outcome for the treated when they
receive treatment is in the regression terminology the constant term plus the regression coefficient
plus the average value of the effect of unobserved variables (confounders) for the treated.
Likewise for the untreated, except that they do not get the effect from the regression
coefficient (the treatment effect). The average counterfactual outcome for the
treated is the same as the observed outcome for the treated, except that now they do not get
the effect from the regression coefficient. Combining terms, we can once again write the
difference between the average outcomes of the treated and the untreated (the average
treatment effect on the treated plus bias) in terms of the regression coefficients.
This reduces to the causal effect, b, plus the difference in the effects of the
confounders, i.e. the difference in the average error terms of the treated and the non-treated.
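The steps just described can be written out compactly. This is only a sketch in notation assumed here rather than taken from the slides: Y is the outcome, t the treatment dummy, a the constant term, b the regression coefficient, e the error term that collects the unobserved confounders, and Y^1 and Y^0 the outcomes with and without treatment.

```latex
\begin{align*}
Y &= a + b\,t + e \\
E[Y^1 \mid t=1] &= a + b + E[e \mid t=1] \\
E[Y^0 \mid t=0] &= a + E[e \mid t=0] \\
E[Y^0 \mid t=1] &= a + E[e \mid t=1] \\
E[Y^1 \mid t=1] - E[Y^0 \mid t=0]
  &= \underbrace{E[Y^1 \mid t=1] - E[Y^0 \mid t=1]}_{\text{effect on the treated} \,=\, b}
   + \underbrace{E[Y^0 \mid t=1] - E[Y^0 \mid t=0]}_{\text{baseline bias}} \\
  &= b + \bigl(E[e \mid t=1] - E[e \mid t=0]\bigr)
\end{align*}
```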
This last difference is zero if the average effect of unobserved variables is the same among
the treated and the non-treated or, put differently, if the error term is independent
of treatment status. In most cases, it would be unwarranted to
assume that the error term is independent of treatment status.
It is easy to think of variables that affect both the event of getting a college degree
and earnings. For instance, IQ and conscientiousness are both important in terms of educational
achievement and in getting a good job with a high wage.
Therefore, if both these variables are unobserved, it is very likely that the observed earnings differences
between those with and without a college degree reflect both a potential causal effect of
a college degree and confounding, that is, baseline differences in IQ and conscientiousness.
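A small simulation may make the college example concrete. It is a sketch on made-up data, not an estimate of anything real: the variable `ability` stands in for unobserved traits such as IQ, and the effect size, the coefficient on ability and the sample size are all assumptions chosen for illustration. It regresses earnings on the degree dummy and compares the coefficient with the difference in group means and with the true effect.

```python
import random
from statistics import fmean

def ols_dummy(t, y):
    """OLS of y on a constant and a single regressor t; returns (intercept, slope)."""
    t_bar, y_bar = fmean(t), fmean(y)
    cov_ty = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y))
    var_t = sum((ti - t_bar) ** 2 for ti in t)
    slope = cov_ty / var_t
    return y_bar - slope * t_bar, slope

rng = random.Random(0)
TRUE_EFFECT = 2.0  # assumed causal effect b of a degree (illustrative value)
n = 20_000

t, y = [], []
for _ in range(n):
    ability = rng.gauss(0, 1)                            # unobserved confounder
    degree = 1 if ability + rng.gauss(0, 1) > 0 else 0   # ability raises the chance of a degree
    earnings = 10 + TRUE_EFFECT * degree + 3 * ability + rng.gauss(0, 1)
    t.append(degree)
    y.append(earnings)

intercept, slope = ols_dummy(t, y)
mean_treated = fmean([yi for ti, yi in zip(t, y) if ti == 1])
mean_untreated = fmean([yi for ti, yi in zip(t, y) if ti == 0])
diff_in_means = mean_treated - mean_untreated

print(f"OLS coefficient on the dummy: {slope:.2f}")
print(f"difference in group means:    {diff_in_means:.2f}")
print(f"true effect b:                {TRUE_EFFECT:.2f}")
```

The coefficient on the dummy coincides with the difference in group means, and both come out well above the true effect because people with a degree also have higher ability on average.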
A convenient way out of this pickle is randomization. That is, we run a lottery that determines
whether people are allowed to get a college degree.
Such a lottery would not be considered ethical, but for the sake of our case we
nevertheless imagine that we can run it.
Thanks to the lottery, the average outcome of the treated and the average outcome that the
non-treated would have had if treated are now the same. The former is observed, the latter is counterfactual,
but we know they are the same thanks to the lottery.
The same goes for the outcome when not treated: the average outcome is the same for
the treated and the non-treated, because the lottery that allocated them to treatment makes
the two groups comparable on average. Therefore, when we calculate the difference
between the average outcomes of the treated and the non-treated, we get the average
treatment effect for both the treated and the untreated, with no baseline bias.
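The lottery argument can be checked in a simulation. Again this is a sketch on made-up data with assumed parameter values: `ability` is a hypothetical unobserved confounder that still affects earnings, but treatment is now assigned by a coin flip that ignores it.

```python
import random
from statistics import fmean

rng = random.Random(1)
TRUE_EFFECT = 2.0  # assumed uniform causal effect b (illustrative value)
n = 20_000

treated, untreated = [], []
for _ in range(n):
    ability = rng.gauss(0, 1)                    # unobserved, still affects earnings
    wins_lottery = 1 if rng.random() < 0.5 else 0  # assignment ignores ability
    earnings = 10 + TRUE_EFFECT * wins_lottery + 3 * ability + rng.gauss(0, 1)
    (treated if wins_lottery else untreated).append((earnings, ability))

# Difference in average outcomes between lottery winners and losers.
diff_in_means = fmean([e for e, _ in treated]) - fmean([e for e, _ in untreated])
# Difference in the average confounder, which is the source of baseline bias.
baseline_gap = fmean([a for _, a in treated]) - fmean([a for _, a in untreated])

print(f"difference in means: {diff_in_means:.2f}")
print(f"gap in ability:      {baseline_gap:.2f}")
```

Because the two groups have nearly the same average ability, the difference in means lands close to the assumed effect, up to sampling noise.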
Until now we have assumed that the treatment effect is uniform: everybody gets the same
causal effect of the treatment, b. This is not realistic.
However, it turns out that as long as we are only concerned with the average treatment
effect, the assumption of homogeneous treatment effects has limited importance.
Nevertheless, we will make a slight digression to show you how heterogeneous treatment effects
work. Consider again the simple linear regression
model with the independent treatment dummy variable.
Now the regression coefficient has a subscript i indicating potentially individual treatment
effects. That is, both the size and the sign of the
treatment effect may vary across individuals. We also allow for the possibility that the
effect and the error are correlated, that is, those who are treated might experience different
treatment effects compared to those who are not treated.
In the health sciences, this makes a lot of sense.
You do not treat healthy people because you do not expect them to be (positively) affected
by the treatment. You do not take aspirin if you do not have
a headache. In addition, as before, we allow for confounders
in the sense that the error term might be correlated with treatment status.
In order to proceed we decompose the individual treatment effect into a common effect and
an individual part with mean zero. This is innocuous as the decomposition is
tautological. We now study what happens when we calculate
the difference between the treated and the non-treated using data from a randomized trial
when the data are generated by a model with heterogeneous treatment effects.
Again, treated and controls are on average equal in terms of the effect of the unobserved
confounders due to randomization. The difference in average outcomes is
the common treatment effect plus the average individual treatment effect among the treated plus the difference
in baseline. The latter is zero by randomization, and the
average individual treatment effect is zero because the individual part has mean zero by assumption and randomization makes the treated representative of everyone.
Therefore, the difference between the average outcome of the treated and the non-treated
from a randomized controlled trial is the average treatment effect.
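In symbols, the argument can be sketched as follows. The notation is assumed here, not taken from the slides: individuals are indexed by i, a is the constant term, b the common effect, u_i the individual part of the effect with mean zero, and e_i the error term.

```latex
\begin{align*}
Y_i &= a + b_i\,t_i + e_i, \qquad b_i = b + u_i, \quad E[u_i] = 0 \\
E[Y_i \mid t_i = 1] - E[Y_i \mid t_i = 0]
  &= b + E[u_i \mid t_i = 1] + \bigl(E[e_i \mid t_i = 1] - E[e_i \mid t_i = 0]\bigr) \\
  &= b + 0 + 0 = b \qquad \text{(when $t_i$ is randomized, so it is independent of $u_i$ and $e_i$)}
\end{align*}
```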
Randomization therefore yields an estimate of the average causal treatment
effect even if treatment effects are heterogeneous and even if individuals would have selected themselves
into treatment on the basis of the size of their treatment effect in the absence of randomization.
Therefore, randomization is a very powerful tool when trying to estimate causal effects.
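To see what selection on the size of the effect does, and how randomization undoes it, here is a simulation sketch with heterogeneous effects. Everything in it is assumed for illustration: the common effect, the spread of the individual part, and the self-selection rule under which only people who gain from treatment take it.

```python
import random
from statistics import fmean

rng = random.Random(2)
COMMON_EFFECT = 1.0  # assumed common part b of the treatment effect
n = 40_000

def difference_in_means(assign):
    """Simulate outcomes under an assignment rule; return treated minus untreated mean."""
    treated, untreated = [], []
    for _ in range(n):
        u = rng.gauss(0, 2)           # individual part of the effect, mean zero
        effect = COMMON_EFFECT + u    # individual effect, negative for some people
        t = assign(u)
        outcome = 5 + effect * t + rng.gauss(0, 1)
        (treated if t else untreated).append(outcome)
    return fmean(treated) - fmean(untreated)

# Self-selection: only people with a positive individual effect take the treatment.
self_selected = difference_in_means(lambda u: 1 if COMMON_EFFECT + u > 0 else 0)
# Randomization: a coin flip decides, regardless of the individual effect.
randomized = difference_in_means(lambda u: 1 if rng.random() < 0.5 else 0)

print(f"self-selected: {self_selected:.2f}")
print(f"randomized:    {randomized:.2f}")
print(f"average effect: {COMMON_EFFECT:.2f}")
```

Under self-selection the comparison overstates the average effect, because the treated are exactly those with large individual effects. Under the coin flip it recovers the common effect up to sampling noise.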
Even though randomized trials are such a powerful tool when trying to establish causal relations
empirically, they are not always easy or feasible to carry out in practice.
Therefore, there is not an abundance of randomized controlled trials in the social sciences.
One notable example, though, is Project STAR.
This project randomized more than 11,000 students into two treatments and one control group.
When entering school in kindergarten, students were randomized into either a small class
(fewer than 17 students), an ordinary class (more than 22 students), or an ordinary class
with a teacher's aide. Students were followed throughout school and
into high school, and achievement scores were recorded at the end of each school year.
Here we show the math achievement score at the end of kindergarten for all students.
In addition, we show the distribution of students across treatment arms.
Approximately a third of the students are in each of the three groups.
The reason that it is not exactly a third in each group is that even though students were
randomized into the three groups, some students subsequently decided to leave the class to
which they were allocated by the randomization procedure.
Such non-random dropout threatens the complete randomization of the final data set.
We return to this issue later. We now run a regression on the math achievement
using dummy variables for whether the student is in an ordinary class (control group) or
an ordinary class with a teacher's aide. The reference group is then students in a
small class. The top regression is using math achievement
after kindergarten as outcome. Here we find that students in the control group
and students with a teacher's aide score approximately 10 points lower on the math scale compared
to students in small classes. Assuming that randomization was successful,
this is an average causal effect. Even though students may experience very different
outcomes when taught in a small rather than an ordinary class, ON AVERAGE students benefit
from small classes. Running the same regression using math achievement
after 1st grade as outcome yields a similar result.
To interpret this result some bookkeeping is needed, because students can now potentially
switch between treatment groups, and thus may only be 'partially' treated.
To avoid this, the regression using math in 1st grade as the outcome uses only students
who have attended the same class type in both kindergarten and 1st grade.
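The mechanics of the dummy-variable regression with a reference group can be sketched on synthetic data. To be clear, these are not the STAR data: the score level, the spread and the group labels are made up, and the 10-point gap is built in by construction to mirror the size mentioned above. The small helper `ols` is a hypothetical stand-in for whatever regression routine one would normally use.

```python
import random
from statistics import fmean

def ols(X, y):
    """Solve the normal equations (X'X) beta = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

rng = random.Random(3)
# Synthetic scores, NOT the STAR data: small classes score 10 points higher by construction.
GROUP_MEANS = {"small": 500.0, "regular": 490.0, "regular_aide": 490.0}

X, y, groups = [], [], []
for _ in range(6_000):
    g = rng.choice(list(GROUP_MEANS))
    # Constant plus two dummies; "small" has no dummy and is the reference group.
    X.append([1.0, float(g == "regular"), float(g == "regular_aide")])
    y.append(GROUP_MEANS[g] + rng.gauss(0, 40))
    groups.append(g)

const, b_regular, b_aide = ols(X, y)
mean = {g: fmean([yi for yi, gi in zip(y, groups) if gi == g]) for g in GROUP_MEANS}

print(f"constant (small-class mean): {const:.1f}")
print(f"regular vs small:            {b_regular:.1f}")
print(f"regular + aide vs small:     {b_aide:.1f}")
```

The constant equals the average score in the reference group, and each dummy coefficient equals the difference between that group's average and the reference group's average.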
Obviously, if randomization is to neutralize any baseline differences, individuals must not
circumvent the randomization procedure and 'self-select' into different treatment groups,
which would reinstate a correlation between the error term and the treatment indicator.
In practice it is unavoidable that a few individuals for some reason escape the randomization procedure.
To assess how important this is in practice one can compare the treatment groups based
on observable characteristics. If the data are balanced on observables, it
is credible that they are also balanced on unobservables, indicating that randomization was
successful despite some reallocation of individuals after randomization.
To assess this in our example we show the distribution across treatment groups of eligibility for free
lunch (due to low-income parents), ethnicity and gender.
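A balance check of this kind can be sketched as a test of independence in a contingency table. The lecture does not say which test was used, so the Pearson chi-square test below is an assumption, and the counts are synthetic rather than the STAR data: free-lunch eligibility is drawn independently of class type with a made-up eligibility rate.

```python
import math
import random

def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

rng = random.Random(4)
# Synthetic counts, NOT the STAR data. Rows: three class types.
# Columns: (eligible for free lunch, not eligible).
table = [[0, 0] for _ in range(3)]
for _ in range(6_000):
    class_type = rng.randrange(3)      # random allocation to a class type
    eligible = rng.random() < 0.48     # hypothetical eligibility rate, independent of class type
    table[class_type][0 if eligible else 1] += 1

stat, dof = chi_square_independence(table)
# With 2 degrees of freedom the chi-square upper tail has the closed form exp(-x / 2).
p_value = math.exp(-stat / 2)

print(f"chi-square = {stat:.2f}, dof = {dof}, p = {p_value:.3f}")
```

A large p-value means that independence between the background characteristic and the allocation is not rejected. The closed-form p-value only applies to tables with two degrees of freedom, such as this three-by-two table.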
In no case can we reject the null hypothesis of independence between background characteristics
and allocation to treatment groups. Thus, it seems that randomization in Project
STAR was indeed successful. Another way to see that randomization was
successful is to look at the estimated effect of class type with and without using
background characteristics as controls. Background characteristics
are potential confounders, so if randomization had failed we should find a different effect of class type
when we control for them. However, as we see from the two regressions
reported in the tables, there are practically no differences between the estimated effects
of class type across the two regression models, again indicating that randomization was successful.