1:36

if we hadn't factored in x, this would be the mean for the group that, say, received

the treatment, and this would be the mean for the group that got the control.

Now there's a pretty clear linear relationship between

the outcome and the regressor.

So what we could do is fit a model that looks like y = beta0 + beta1 * treatment + beta2 * x + epsilon, where the treatment indicator is 0/1.

This would fit two parallel lines.

Beta one would represent the change in intercepts between the groups,

whereas beta two would be the common slope that exists across the two groups.
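(A quick sketch of that fit, not from the lecture itself: the course works in R, but the same parallel-lines model can be fit with plain least squares in Python on made-up data.)

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data (not the lecture's): the treated group has higher x on
# average, and y depends linearly on x with a common slope plus an
# intercept shift between groups.
n = 100
t = np.repeat([0, 1], n)                       # 0/1 treatment indicator
x = rng.normal(0, 1, 2 * n) + 1.5 * t          # regressor differs by group
y = 1.0 + 0.5 * t + 2.0 * x + rng.normal(0, 0.2, 2 * n)

# Fit y = beta0 + beta1*t + beta2*x by least squares.
X = np.column_stack([np.ones_like(x), t, x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta)  # beta[1]: change in intercepts; beta[2]: common slope
```

Here beta[1] estimates the vertical gap between the two parallel lines, and beta[2] the slope shared by both groups.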

5:06

We clearly didn't randomize, and which model here is the right one to consider is not really up for discussion today.

What we're talking about today is just how the inclusion of x can change the estimate.

Now which is the right model to consider?

That is a different thing.

An example of when this might occur: let's say the treatment is whether or not you're taking some blood pressure medication, okay?

And y, your outcome, is your blood pressure.

But suppose that the x variable was

5:47

cholesterol or something highly related to whether or

not you would've gotten prescribed this medication to begin with, okay?

Then you could see that adjusting for x is really just adjusting for

the same sort of thing that would lead you to have treatment.

So again, this is what makes observational data analysis very hard

as opposed to instances where what you're interested in has been randomized.

Okay, so just to reiterate this point, this is an example where we had a strong marginal effect when we disregarded x, and a very subtle or non-existent effect when we accounted for x.

Let's try some different scenarios.

7:09

There is some direct evidence right there comparing the two groups.

But, it also is kind of a hard

case because if you look, here's the marginal mean for the red group.

Here's the marginal mean for the blue group.

There's probably a significant effect here that says the red is higher than the blue.

However, when we fit our model and look at the change in the intercepts, what we see is that the blue is higher than the red.

So our adjusted estimate is significant and

the exact opposite of our unadjusted estimate, okay?

And again, during this lecture we're not gonna talk about which one's the right

one, we're just gonna talk about how this can obviously occur.

Here is a picture where you can see exactly what's happening.

This phenomenon here is often called Simpson's Paradox. The idea is that you look at a variable as it relates to an outcome, and that effect reverses itself with the inclusion of another variable. Simpson's Paradox basically just says that things can change to the exact opposite when you perform adjustment, which, actually, looking at this picture, is not that surprising. It's not a paradox whatsoever.
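(To see the reversal numerically, here's a hedged sketch with hypothetical data, not the lecture's simulation: the data are built so the group at higher x looks better marginally, but is worse at any fixed x.)

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data constructed so the marginal comparison reverses after
# adjustment: the treated (red, t=1) group sits at high x, the control
# (blue, t=0) at low x, y rises steeply with x, but at any fixed x the
# treated group is actually lower.
n = 200
t = np.repeat([0, 1], n)
x = rng.normal(0, 0.5, 2 * n) + 3.0 * t
y = 2.0 - 1.0 * t + 1.0 * x + rng.normal(0, 0.3, 2 * n)

# Unadjusted (marginal) comparison: ignores x, red looks higher.
marginal = y[t == 1].mean() - y[t == 0].mean()

# Adjusted comparison: change in intercepts after including x, red is lower.
X = np.column_stack([np.ones_like(x), t, x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
adjusted = beta[1]

print(marginal, adjusted)  # opposite signs
```

The two estimates have opposite signs, which is exactly the flip in the picture.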

8:28

All right, let's try some other examples.

Again, the next slide's just gonna reiterate these points.

In this example, there's basically no marginal effect.

However, there's a huge effect when we adjust for x. Earlier we saw a case where we went from a significant effect to a non-significant effect when we adjusted for x. Well, here's an instance where we had a non-significant effect if we ignore x, and then a significant effect when we include x, okay?

So there's no simple rule that says this is always what will happen with

adjustment.

Pretty much any permutation, going from significant to non-significant, both staying significant, both staying non-significant, flipping signs, all of them can occur.

9:24

Here's the final example like this I'd like to show, and

this just considers an instance where we would surely get this wrong

if we were to assume the slopes were common across the two groups.

Obviously, the slopes are different.

And we know how to fit a model like this.

If we were to fit a model that said y = beta0 + beta1 * treatment + beta2 * x + beta3 * treatment * x + epsilon, that would fit two lines with different intercepts and different slopes, and we could get a fit like this to this data set.
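(Here's a sketch of that interaction fit on made-up data, again in Python rather than the lecture's R: the estimated "treatment effect" at a given x is beta1 + beta3 * x, and it changes sign across the range of x.)

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical data with different slopes per group: the gap between the
# two lines, beta1 + beta3*x, shrinks and flips sign as x grows, so there
# is no single treatment effect.
n = 200
t = np.repeat([0, 1], n)
x = rng.uniform(0, 2, 2 * n)
y = 1.0 + 2.0 * t + 1.0 * x - 2.0 * t * x + rng.normal(0, 0.2, 2 * n)

# Fit y = beta0 + beta1*t + beta2*x + beta3*t*x.
X = np.column_stack([np.ones_like(x), t, x, t * x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

def treatment_effect(x0):
    """Difference between the two fitted lines at x = x0."""
    return beta[1] + beta[3] * x0

print(treatment_effect(0.0), treatment_effect(1.0), treatment_effect(2.0))
```

Evaluating the gap at a few values of x makes the point: the effect is positive at low x, near zero in the middle, and negative at high x, so beta1 alone is not "the" treatment effect.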

Another important thing to ascertain for

this data set is there is no such thing as a treatment effect.

If you look right here, the red and the blue,

there's no evidence of a treatment effect.

If you look right here, there's a lot of evidence that blue has a higher outcome than red, and if you look over here, there's a lot of evidence that red has a higher outcome than blue.

And the interaction is the reason that this main treatment effect doesn't have a lot of meaning.

The end result is that this coefficient, the coefficient in front of the treatment indicator, which the regression of course just spits out, is not interpreted as the treatment effect by itself.

You can't interpret that number as a treatment effect.

As we can see from this picture,

there is no such thing as a treatment effect for this data.

The treatment effect depends on what level of x you're at.

So you can't just read the term from the regression

output associated with the treatment and act as if it's a treatment effect,

if in fact, you have an interaction term in your model.

So that's an important point, but this also just goes to show how adjustment can

really change things if you have a setting like this where you have not just

adjustment, but so-called modification.

Okay, so again that was a crazy simulation.

This just summarizes some of the points.

You often see interactions, but you rarely see interactions this stark; still, nonetheless, they can occur.

And then I want to reiterate that nothing we've talked about is specific to having

a binary treatment and a continuous x.

In this case here, we have our same outcome, y.

But now our x1 variable is continuous and our x2 is continuous, and because it's kinda hard to show three variables at the same time, x2 is color coded. So lighter values mean higher x2, and darker, redder values mean lower x2, okay?

So in this case, if you look at this plot, you would say there isn't much of a relationship between y and x1. However, let's look at this in three dimensions, and for that I need a different setting, something where I can rotate the plot around.

So I'm going to use RGL, and it's pretty easy. This doesn't show how I generated the x1, x2, and y, that's in the markdown document, but here I'll show you how to get the plot. So there's a data set that's equivalent, because I reran the simulation.

And here's our plot.

So here's exactly that plot recreated,
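(The lecture explores this with a rotatable 3D plot in RGL. As a non-graphical check, a Python sketch with made-up data, not the lecture's simulation, shows the same thing numerically: y has almost no marginal relationship with x1, but a strong one once x2 enters the model.)

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical continuous-regressor data: x2 nearly cancels x1, so y looks
# flat against x1 alone, yet both coefficients are strong in the joint fit.
n = 500
x1 = rng.normal(0, 1, n)
x2 = -x1 + rng.normal(0, 0.2, n)
y = 1.0 * x1 + 1.0 * x2 + rng.normal(0, 0.1, n)

# Marginal fit: y on x1 alone; the slope is essentially zero.
Xm = np.column_stack([np.ones(n), x1])
marginal_slope = np.linalg.lstsq(Xm, y, rcond=None)[0][1]

# Joint fit: y on x1 and x2; both coefficients come back near 1.
Xj = np.column_stack([np.ones(n), x1, x2])
beta = np.linalg.lstsq(Xj, y, rcond=None)[0]

print(marginal_slope, beta[1], beta[2])
```

So exactly as with the binary treatment, adjusting for the second continuous variable can turn no apparent relationship into a strong one.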

16:12

Okay, and then for automated model selection, which is another process that

you'll talk more about in your machine learning class, that's a different thing.

And I don't think it's particularly well suited for obtaining interpretable coefficients. It's geared toward obtaining good predictions with respect to a loss function.

So that's a different process.

I think, if you're doing model building with a regular data set where you want interpretable coefficients, there's no substitute for getting your hands dirty: getting the team of people that are working on it together, some with the right scientific expertise, some with the statistical expertise, some with the computing expertise, and so on, all together to fit the models.

And if there's a big change in coefficients after different adjustment strategies, well, then those need to be discussed and vetted, and the benefits, side effects, and downsides of each of the models should be weighed, okay.