0:10

This video is on causal effects.

Here we're going to more formally define what we mean by causal effects.

And in particular, we'll discuss two types of causal effects.

Average causal effects, and the causal effect of treatment on the treated.

So we're going to formally define them using the statistical notation of

potential outcomes.

So this video is going to be a little more technical.

And then we're also going to spend some time focusing on the difference between

conditioning on a variable or variables versus manipulating or setting variables.

So first we're going to talk about hypothetical worlds and

average causal effects.

So, let's begin by thinking of some population of interest,

which we're depicting just with a circle.

So this circle is everybody you're interested in.

It's representing a whole population of people that you're interested in.

It's sort of your target of interest.

So if you are interested in people who have diabetes and what

treatment is better, then your population would be the population of diabetics.

This circle represents the whole population that you're interested in.

And now, we're going to think about hypothetical worlds.

So remember with potential outcomes, we were thinking about hypothetical worlds

and hypothetical interventions, so we are still thinking about that now.

So we're not really thinking about data yet.

This is, we're just imagining what we would ideally like to see.

So what we would ideally like to see is two worlds, two hypothetical worlds.

So World 1, which is depicted by this grayish circle,

is that everyone in our population gets treatment A=0.

So treatment A=0 could be, it actually could be no treatment,

it could be a placebo, whatever you imagine.

So we're picturing now, this is a world where our entire population,

every single person, got treatment A=0.

Versus, some other hypothetical world where everyone received

the other treatment, A=1, so depicting that with this light blue circle.

But the most important thing here is that World 1 and

World 2 have the exact same people, it's the same population of people.

But in one case, we do one thing to them, and in another case, we do another.

And then, if we were able to observe both of these worlds simultaneously,

we could collect the outcome data from everyone in the populations, and

then we could take the average value.

So I say mean of Y in World 1.

And mean of Y in World 2.

And then that difference would be the average causal effect.

And so this is what we mean by an average causal effect.

So, it's an average in the sense that it's a mean and

it's sort of a population-level average causal effect.

It's over the whole population.

We're saying, what would the average outcome be if everybody got one treatment,

versus if everybody got another treatment?

So, of course, in reality, we're not going to see both of these worlds.

But this is what we want, this is what we define as the average causal effect.

This is what we would like to see, and this is what we're hoping to estimate.

And we can define that more formally using statistical notation.

So here, the E refers to expected value, which is just the mean.

And here we're then taking the average of

the difference of these two potential outcomes.

So, remember, Y^1 is a potential outcome if treated with A=1,

and Y^0 is the potential outcome if treated with A=0.

And so then,

we could take the average difference of that to get an average causal effect.

So this quantity,

this average causal effect is the average value of Y if everybody was treated with

A=1 minus the average value of Y if everyone was treated with A=0.
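
Written out, the quantity being described would look something like the following. The slide itself is not reproduced in this transcript, so the exact layout is an assumption; the symbols E, Y^1, and Y^0 are the ones defined above.

```latex
\[
  E\left(Y^1 - Y^0\right) = E\left(Y^1\right) - E\left(Y^0\right)
\]
```

The right-hand side is the mean of Y in the world where everyone gets A=1 minus the mean of Y in the world where everyone gets A=0.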

And that's exactly what I showed you on the previous slide.

Mean of Y for World 1 versus mean of Y for World 2.

So in the case where Y is binary for example,

this would just be a risk difference.

In fact, it would be a causal risk difference.

6:01

But it's a little bit difficult to think about,

what a difference in probabilities means or a difference in risk means,

so one thing that's helpful is to quantify that in terms of the number of people.

So we could imagine if 1,000 people were going to have hip fracture surgery.

Then if the causal effect is -0.1 we would expect about

100 fewer people to have pulmonary complications under

regional anesthesia compared with general anesthesia.
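
As a rough sketch of that arithmetic, here is the same calculation in Python. The numbers come from the example above; the variable names are made up for illustration, and treating regional anesthesia as A=1 and general anesthesia as A=0 is an assumption read off from the wording.

```python
# Numbers from the spoken example; variable names are illustrative only.
n_patients = 1000              # people having hip fracture surgery
causal_risk_difference = -0.1  # E(Y^1) - E(Y^0), assuming A=1 is regional anesthesia

# Expected change in the number of people with pulmonary complications
# if everyone got regional rather than general anesthesia.
expected_change = n_patients * causal_risk_difference

print(f"Expected change in complications: {expected_change:+.0f} people")
```

A negative result means fewer complications, so this corresponds to about 100 fewer people.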

Now we can look at another example.

Suppose treatment is thiazide diuretics versus no

treatment among hypertensive patients.

And our outcome Y here is systolic blood pressure.

So our population, that circle that we're interested in is hypertensive patients,

so patients with hypertension.

And we're interested in, is it effective to treat with thiazide diuretics?

7:56

So first, what we mean by conditioning is in the usual statistical sense,

where it's given, or conditional on, a particular variable.

In this case, we're focused on treatment.

So, the main idea is that this average causal effect,

which is the mean of this difference of potential outcomes,

which is what we see on the left side of this inequality.

The expected value of Y^1 - Y^0, that's not in general going

to be the same thing as the expected value of your observed Y,

given A=1 minus the expected value of Y, given A=0.

And this idea is really fundamental to causal inference.
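
The inequality being read aloud here would presumably look like the following. The slide is not part of this transcript, so this is a reconstruction from the spoken description, using the same symbols as above.

```latex
\[
  E\left(Y^1 - Y^0\right) \neq E\left(Y \mid A = 1\right) - E\left(Y \mid A = 0\right)
\]
```

The left side sets treatment for the whole population; the right side compares observed outcomes across the two groups that actually received each treatment. The two sides are not equal in general, though they can coincide in special cases.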

8:49

So let's think about what these different types of quantities mean.

So whenever you see this vertical line in the expected value,

one way you could read it is, the expected value of Y given A=1.

But I usually like to think of it as, it's defining a subpopulation.

So rather than say, given A=1, you could read it as

the expected value of Y among people who have A=1.

Or among the subpopulation of people who have A=1.

So this vertical line, this conditioning,

is really restricting to a subpopulation of people.

The expected value of Y given A=1 means what is the average value of

Y in the subpopulation of people defined by those who actually

receive treatment equal to 1?

But they might differ from the population as a whole in important ways.

So for example, people at high risk for

flu might be more likely to choose to get a flu shot.

So imagine that that's the case that people at higher risk for

the flu might be more likely to get the flu shot.

Then if we take the expected value of Y among people who actually got the flu

shot, we're taking expected value of Y among sort of a higher risk population.

And that's different than the expected value of the potential outcome Y^1

because Y^1 is the outcome if everyone in the whole population got treatment.

It's not restricting to a subpopulation.

So when I say setting treatment, I mean manipulating treatment,

as in the potential outcome situation.

And when I say conditioning on, I really mean restricting to subpopulations.

We can see this pictorially.

So, again, imagine our population of interest is a circle, but in reality,

some people get A=0, so that's this gray partial circle on the left.

And you'll notice that it's not the original population anymore.

It's a population of people who actually got A=0.

So, we're not hypothetically manipulating anything at this point.

This is just reality.

Some people actually get A=0, and that's them.

On the right hand side, blue partial circle,

that's the population of people who actually got A=1.

And you'll notice that these are not the same population of people.

So we could take the mean of both of those subpopulations.

And that would be an average difference in the outcome between subpopulations that

are defined by treatment group.

So it would be the difference in outcome among two subpopulations.

But the people who get treatment A=0 might differ in

fundamental ways from people who get treatment A=1.

So we haven't isolated a treatment effect, because these are different people, and

they might have different characteristics in general.

So that's why this distinction is very important,

between the thing we're targeting, which is this average causal effect,

this causal effect where we're manipulating treatment on the same

group of people, versus this thing that we actually observe, which

is the difference in means among subpopulations that are defined by treatment.
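
To make the distinction concrete, here is a small sketch in Python using a made-up population of eight people. Everything in it is hypothetical: the data, the column layout, and the helper name `mean`. It also assumes we can see both potential outcomes for every person, which never happens with real data. Y = 1 means the person gets the flu.

```python
# Entirely made-up toy population in the spirit of the flu shot example.
# Each row: (high_risk, a, y0, y1)
#   a  = treatment actually received (1 = flu shot, 0 = no shot)
#   y0 = potential outcome without the shot, y1 = potential outcome with it
population = [
    (True, 1, 1, 1),
    (True, 1, 1, 0),
    (True, 1, 1, 0),
    (True, 0, 0, 0),
    (False, 1, 1, 0),
    (False, 0, 0, 0),
    (False, 0, 0, 0),
    (False, 0, 0, 0),
]


def mean(values):
    """Plain arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


# Setting treatment: the same eight people in both hypothetical worlds.
ace = mean(y1 for _, _, _, y1 in population) - mean(y0 for _, _, y0, _ in population)

# Conditioning on treatment: two different subpopulations, observed outcomes only.
treated = [row for row in population if row[1] == 1]
untreated = [row for row in population if row[1] == 0]
observed_diff = mean(y1 for _, _, _, y1 in treated) - mean(y0 for _, _, y0, _ in untreated)

# How different are the treated from the population as a whole?
share_high_risk_treated = mean(hr for hr, _, _, _ in treated)
share_high_risk_overall = mean(hr for hr, _, _, _ in population)

print("E(Y^1) - E(Y^0)          =", ace)
print("E(Y|A=1) - E(Y|A=0)      =", observed_diff)
print("share high risk, treated =", share_high_risk_treated)
print("share high risk, overall =", share_high_risk_overall)
```

In this toy population the shot lowers the average risk (the first number is negative), but the observed difference between groups is positive, because the people who chose the shot were mostly the higher risk people. The numbers were picked to make the gap obvious; they are not estimates of anything real.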

13:51

So there's a lot of other causal effects that you might be interested in.

And what you choose might depend on the particulars of your study,

of your research question, or even what data you have available to you.

So some of this will be covered later on in the course, but

I'll give a few examples of causal effects.

So here, now instead of a difference in

potential outcomes, we're taking a ratio.

So this is the expected value of Y^1 divided by the expected value of Y^0.

And so, if we have binary outcomes, that would be a causal relative risk.

So we're taking the risk if everyone was

treated versus the risk if no one was treated.

So that is a causal relative risk.
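
In notation, a causal relative risk of this kind would be written roughly as follows. This is a reconstruction from the spoken description, since the slide is not included in the transcript.

```latex
\[
  \frac{E\left(Y^1\right)}{E\left(Y^0\right)}
\]
```

For a binary outcome, the numerator is the risk if everyone were treated and the denominator is the risk if no one were treated.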

Another somewhat common example that people are interested in is the causal

effect of treatment on the treated.

And so you'll notice here that we condition on A=1.

And that's what we mean by, on the treated.

We actually are restricting to a subpopulation, so

the subgroup of people who have actually received treatment, but

we're contrasting potential outcomes for that group.

So we're still in causal effect territory.
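
Based on that description, the causal effect of treatment on the treated would be written something like this. The layout is an assumption, because the slide is not shown here.

```latex
\[
  E\left(Y^1 - Y^0 \mid A = 1\right)
\]
```

The conditioning on A=1 picks out the subpopulation, and the contrast inside the expectation is still between two potential outcomes for those same people.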

And if this one's confusing, I'll show on the next slide

what this represents in a picture, and I think it will be more clear.

16:01

And we might want to know, well, what is the causal effect in this subpopulation?

So maybe among people who have a biomarker that's high or low, or

what's the causal effect among men, or among women, or by age, or by race.

We're still isolating a treatment effect, but maybe in certain subpopulations.

So let's look at the real world versus another example of a causal effect.

So we saw this before.

In the real world, some people get treated, some people don't.

These are different populations.

But what we actually might be interested in, in the case of causal effect of

treatment on the treated is focusing on the subpopulation who are treated.

So that's this little semicircle, that's the treated population.

But you'll notice then when we get to World 1 versus World 2,

we're restricting to the same group of people.

16:56

So we have this gray semicircle and this blue semicircle, but

they're the same shape, they represent the same people.

And now we're giving everyone treatment A=0, versus treatment A=1.

This is a causal effect of treatment on the treated.

It's a causal effect because it's the same group of people,

it's this population of people who in reality were treated.

But now we want to imagine in World 1 what would have happened if we had not

treated them.

And World 2 is sort of what actually happened, where we actually did treat them.

So, that will be the causal effect of treatment on the treated.
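
Here is a minimal sketch of that comparison in Python, again with an entirely made-up population of eight people in which both potential outcomes are visible (which real data never gives us). The data and variable names are hypothetical.

```python
# Made-up toy population. Each row: (a, y0, y1)
#   a  = treatment actually received
#   y0 = potential outcome if untreated, y1 = potential outcome if treated
population = [
    (1, 1, 1),
    (1, 1, 0),
    (1, 1, 0),
    (1, 1, 0),
    (0, 0, 0),
    (0, 0, 0),
    (0, 0, 0),
    (0, 0, 0),
]

# Restrict to the people who were treated in reality.
treated = [(y0, y1) for a, y0, y1 in population if a == 1]

# World 1: those same treated people, had we not treated them.
mean_world1 = sum(y0 for y0, _ in treated) / len(treated)
# World 2: those same treated people, treated (what actually happened).
mean_world2 = sum(y1 for _, y1 in treated) / len(treated)

# Causal effect of treatment on the treated: E(Y^1 - Y^0 | A=1).
att = mean_world2 - mean_world1

# For comparison, the average causal effect over the whole population.
ace = (sum(y1 for _, _, y1 in population) - sum(y0 for _, y0, _ in population)) / len(population)

print("effect of treatment on the treated =", att)
print("average causal effect              =", ace)
```

In this toy example the two effects differ (-0.75 among the treated versus -0.375 overall), because the treated people are not representative of the whole population. Both are still causal effects, since each one compares the same group of people under two treatments.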

The important distinction here is that we're still

comparing the same group of people.

So we can define a subpopulation, and that's fine as long as on the treatment

side of things, we are doing the same thing to everybody.

So in World 1, we might not be treating anybody.

In World 2, we're treating everybody, and then we're contrasting the mean.

But, again, we're going back now to this fundamental problem of causal inference.

We only observe one treatment and one outcome for each person.

Again, that's the fundamental problem of causal inference.

And so, what we're moving into is how do we then use observed data

to link observed outcomes to potential outcomes?

So in reality, for each person we're going to see one treatment,

we're going to see one outcome.

But we want to infer something about what would have happened.

And how are we going to do that?