So that's what we were calling a, so that's our treatment is education.

Outcome y is lwage, and the instrumental variable was nearc4.

And so

anytime you do a data analysis you'll probably want to look at the data first.

And so one thing I'll do right away is just I'll take a look at how many,

what proportion of people got encouragement?

So that's this mean of the variable nearc4 and we can see that over here,

so about roughly 68% were encouraged lived near four year college.

And I'm also going to look at histograms of the outcome variable, and

the education variable.

So these are two continuous variables.

So I'll just look at histograms of those.

Just the main reason to do this is just, so

you have a better feel of how our data are.

So this is a histogram of wage.

Well Rob of wages and you'd see that,

that's relatively symmetric there with no obvious outliers.

And then here's the histogram for education.

And you'll see there's quite a bit of variability there.

This is number of years of education, and you'll see there's a big spike here,

which is at 12.

So a lot of people had exactly 12 years of education or

in other words finish high school.

And then you'll see beyond 12 there is, either some college or

these are people here who have probably finished a four year degree.

And so

now we have some sense of the variability in our treatment if you will variable.

This number of years of education.

So we do see that there's a fair amount of variability in how much

education people had.

Next, we'll look at estimating the proportion of compliers here and

we're going to do this to estimate the strength of our instrumental variable.

And we could actually leave education as the continuous variable, but

for the purpose of estimating the proportion of suppliers, and

sort of making this analogous to a randomized trial.

What I'm going to do here is just create this variable which is education 12 here,

which means it's an indicator variable

that you have more than 12 years of education.

And so I'm imagining right now that treatment is, we're just comparing greater

than 12 years of education versus less than or equal to 12 years of education.

And I'm imagining that's what we're interested in.

You could leave treatment as a continuous variable to analyze it that way.

I'm mostly dichotomizing it to just illustrate the idea of compliers.

So what were the complier mean here?

Will compiler here would mean that, if you lived close to a four year college

you would end up having more than 12 years of schooling

whereas if you didn't then you wouldn't and that would make you a compiler.

And so we can estimate the proportion of compilers by taking the mean