In this video, we'll see how to actually fit

a linear model in R, accounting for a complex sample design.

So, the data set I'm going to use is one that's bundled with the R survey package.

It's called the Academic Performance Index File.

This API is computed for all California schools based on standardized testing of students,

and there are several data sets in the API collection that's bundled with survey.

It gives information for all schools with

at least 100 students and for various probability samples of the data,

just for illustration, and we've got one record per school in all these data sets.

So, how do we set up things for R?

It's very similar to what we did for descriptive statistics,

where you start with a survey design object to

specify things like cluster, strata and weights.

We will fit the model with a function called svyglm,

for survey-weighted generalized linear model,

and it will take certain parameters that I'll illustrate for you in a bit.

The variables that we'll look at just for this illustration are:

the Academic Performance Index, api00,

which is the API in the year 2000;

the percent of English language learners in the school, ell,

that means the percentage of students for whom English is not their first language.

Then we've got the percentage of students who are eligible for subsidized meals.

Poor or low-income students in California are eligible to

get a contribution toward lunches, or maybe free lunches and sometimes breakfast;

it depends on the school.

And then we've also got mobility,

which is the percentage of students for whom this is the first year at the school.

Now, to fit the model,

first we call require(survey) so we can get at svyglm and the data set.

Then I say data(api);

that gives me access to a number of

these probability samples associated with the academic performance index file.
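
As a rough sketch of those two steps, assuming the survey package is installed: library(survey) is used here in place of require(survey), and either one attaches the package. The vars vector is just a convenience name for this illustration.

```r
# Attach the survey package, which provides svydesign(), svyglm(), and the api data
library(survey)

# Load the Academic Performance Index data sets (apipop, apisrs, apistrat, ...)
data(api)

# One record per school; look only at the fields used in this illustration
vars <- c("stype", "pw", "fpc", "api00", "ell", "meals", "mobility")
head(apistrat[, vars])
```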

In the svydesign call, id equals ~1.

This means I've got no clusters.

I've got strata defined by ~stype,

which is school type.

It has to do with the grade levels in the school.

The weights are the ~pw field in this file.

I tell it that the data set is apistrat;

that's a stratified simple random sample without replacement.

And then there is an FPC, a finite population correction, on the file,

which I'll use because we want to get credit for drawing

a substantial proportion of the schools in some of the strata.
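
Put together, the design specification just described might look like the sketch below. The object name dstrat is an assumption for illustration; the arguments follow the description above.

```r
library(survey)
data(api)

# dstrat is an assumed name for the design object
dstrat <- svydesign(
  id      = ~1,       # no clusters: schools are sampled directly
  strata  = ~stype,   # school type (E, M, H)
  weights = ~pw,      # sampling weights
  fpc     = ~fpc,     # finite population correction: schools per stratum
  data    = apistrat  # stratified simple random sample without replacement
)

# Print a short description of the design
dstrat
```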

And now we fit the model with svyglm.

So, what I'm doing is regressing the academic performance index in

the year 2000 on English language learners, meals, and mobility.

And I tell it here's my design object.

So this function,

svyglm, knows what the strata, clusters, weights and FPC are,

based on this definition right here in svydesign.
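
A minimal sketch of the fit, assuming the design object is called dstrat (an illustrative name). With no family argument, svyglm fits an ordinary linear model.

```r
library(survey)
data(api)

# Design object as described above; dstrat is an assumed name
dstrat <- svydesign(id = ~1, strata = ~stype, weights = ~pw,
                    fpc = ~fpc, data = apistrat)

# Regress API in 2000 on English language learners, meals, and mobility
m1 <- svyglm(api00 ~ ell + meals + mobility, design = dstrat)

# Point estimates of the four coefficients
coef(m1)
```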

So, here are my estimates,

the output from svyglm.

I called summary on m1, my fitted model.

So, here are the estimates, and you can see the intercept is by far the biggest.

We get the standard errors, t values,

on each of those and then p values associated with each parameter estimate.

So, the first and the third, the intercept and meals, are quite significant;

English language learners and mobility are not so much.
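
To reproduce that table, one possible sketch is below. The names dstrat, s1, and ctab are assumptions for illustration; the coefficient table has one row per parameter and columns for the estimate, standard error, t value, and p-value.

```r
library(survey)
data(api)

dstrat <- svydesign(id = ~1, strata = ~stype, weights = ~pw,
                    fpc = ~fpc, data = apistrat)
m1 <- svyglm(api00 ~ ell + meals + mobility, design = dstrat)

# Full summary: estimates with design-based standard errors and tests
s1 <- summary(m1)
s1

# Just the coefficient table, rounded for easier reading
ctab <- s1$coefficients
round(ctab, 4)
```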

Now, another nice feature is that

you can test subsets of the coefficient estimates.

The function for doing that is called

regTermTest, and I send it my model object.

So, you have to have saved your

fitted model into some object; here that's m1,

and then I want to test those two coefficients

that were non-significant, just for illustration.

So, English language learners and mobility. I'm going to use

a Wald test, and here's the output:

It gives me an F statistic of 1.046 on 2 and 194 degrees of freedom.

The p-value is 0.35.

So, what does that say?

It says that you can't reject the null hypothesis and

the null hypothesis is that simultaneously,

both these coefficients are zero.
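
The joint test just described can be sketched as follows. The names dstrat and tt are assumptions for illustration; the one-sided formula lists the terms being tested together.

```r
library(survey)
data(api)

dstrat <- svydesign(id = ~1, strata = ~stype, weights = ~pw,
                    fpc = ~fpc, data = apistrat)
m1 <- svyglm(api00 ~ ell + meals + mobility, design = dstrat)

# Wald test of H0: the ell and mobility coefficients are both zero
tt <- regTermTest(m1, ~ ell + mobility, method = "Wald")

# Prints the F statistic, its degrees of freedom, and the p-value
tt
```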

So, if you're just eyeballing t statistics and doing this joint test,

it looks like those two things, the percentage of

English language learners and of students for whom this is their first year at the school,

are not too important in predicting the Academic Performance Index.

So, we will follow this up with some diagnostics in the next video.