Okay. So, welcome back to our discussion of

fitting multilevel statistical models to dependent data.

In this lecture we're going to be focusing on

multilevel models for binary dependent variables.

So, we'll be talking about multilevel logistic regression models.

So let's think about the way we write these kinds of models,

when we have binary dependent variables.

Last week we learned how to write the model for a binary dependent variable,

using this logit link.

This is the natural log of the odds,

that the dependent variable is equal to one.

So we see that we write the natural log,

of the probability that a binary dependent variable Y,

measured on person i within cluster j is equal to one,

divided by one minus that probability,

recall, we call that the logit function.

So, we write this as the logit of

the probability that that dependent variable Y is equal to one,

and in the multilevel specification we again have this combination of fixed effects,

these are the fixed unknown constant parameters that

we want to estimate to describe the relationships,

of the predictors with the log odds,

of the dependent variable being equal to one.

Then we also have these random effects,

that are capturing these dependencies within the same higher level cluster,

in this case denoted by j.

So, this is an example of a random coefficient model where we have the random effect,

u zero j, which allows each cluster to have a unique intercept in the logistic model.

Then we have the random effect u one j,

which allows each cluster j to have a unique relationship of x,

with the log odds,

that the dependent variable is equal to one.

We could also rewrite this using the multilevel specification if we so desired,

this is just the single level equation incorporating the random effects.

So, we make the same distributional assumptions that we did in

multilevel linear regression models about these random cluster effects.

We assume that they're normally distributed,

that they have a mean vector of zero,

that means the mean of each random effect is zero,

and the random effects have unique variances, and covariances.

So, same distributional assumptions as we would have

made in the multilevel linear regression model.
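To make those assumptions concrete, here's a minimal Python sketch (all parameter values and sizes are made up for illustration) that simulates data from this kind of random coefficient logistic model, drawing the random intercepts and slopes from a normal distribution with mean vector zero and a covariance matrix G:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative (made-up) parameters for the random coefficient model:
# logit(P(Y_ij = 1)) = (b0 + u0j) + (b1 + u1j) * x_ij
b0, b1 = -1.0, 0.5

# Distributional assumption: (u0j, u1j) ~ Normal(0, G), mean vector zero,
# with unique variances on the diagonal of G and a covariance off-diagonal.
G = np.array([[0.30, 0.05],
              [0.05, 0.10]])

n_clusters, n_per_cluster = 20, 50
u = rng.multivariate_normal([0.0, 0.0], G, size=n_clusters)  # one row per cluster j

x = rng.normal(size=(n_clusters, n_per_cluster))
logit = (b0 + u[:, [0]]) + (b1 + u[:, [1]]) * x  # log odds for person i in cluster j
p = 1.0 / (1.0 + np.exp(-logit))                 # inverse logit -> probability
y = rng.binomial(1, p)                           # binary dependent variable
```

Each cluster gets its own intercept (b0 + u0j) and its own slope (b1 + u1j), which is exactly the unique-intercept, unique-relationship idea described above.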

Recall, we're fitting a multilevel model,

because we have explicit interest in

estimating the variance of the random cluster effects.

So, part of our research question involves

estimating the amount of between cluster variance,

in the dependent variable of interest, in this case in

the log odds that the dependent variable is equal to one.

Remember that has to be an explicit part of our research question.

When we fit these kinds of

generalized linear regression models to

non-normal outcomes and we include random effects,

estimation becomes a lot more difficult mathematically.

We're not going to dive into the math in this particular lecture,

but it's a much more difficult computational problem,

to estimate these models from non-normal dependent variables.

So again, the clear motivation for fitting

these multilevel models that explicit interests,

in estimating the variance of random cluster effects becomes very important,

because it does take longer to fit these models computationally.

So, in estimating the model parameters,

when we fit multilevel models to non-normal outcomes,

it's difficult to write down the likelihood function.

We talked a little bit about the likelihood function

previously when introducing these kinds of models.

In the case of non-normal outcomes it becomes much more difficult to

write down that likelihood, and in some cases we may not even be able to write it down.

There may not be what's called a closed form solution for that likelihood function.

So, what we have to do in practice in

many cases is first of all approximate that likelihood function.

So, we use mathematical methods to come up with an approximation,

of what the likelihood of the observed data would be under the given model specification.

Then, once we approximate that likelihood function,

we find the estimates of the parameters,

the fixed effect parameters, the variances,

the covariances that maximize that approximate likelihood.

So that's a key aspect of fitting multilevel models

to non-normal dependent variables like binary variables.

Oftentimes we can't write down the likelihood explicitly,

and part of the computational process involves first approximating the likelihood,

and then maximizing it,

finding the parameter estimates that maximize it.

So long story short,

it just takes longer to fit these models from a computational perspective.

One possible approach to this process is called Adaptive Gaussian Quadrature.

This is just an estimation method that involves approximating the likelihood,

and then maximizing that likelihood.
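As a rough illustration of the quadrature idea (the ordinary, non-adaptive version, with made-up parameter values), one can approximate the marginal likelihood contribution of a single cluster by numerically integrating the random intercept out with Gauss-Hermite quadrature:

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss

def cluster_likelihood(y, x, b0, b1, sigma2, n_points=20):
    """Approximate the marginal likelihood of one cluster's binary data by
    integrating out its random intercept u_j ~ N(0, sigma2) with ordinary
    (non-adaptive) Gauss-Hermite quadrature."""
    nodes, weights = hermgauss(n_points)      # nodes/weights for weight exp(-t^2)
    total = 0.0
    for t, w in zip(nodes, weights):
        u = np.sqrt(2.0 * sigma2) * t         # change of variables to N(0, sigma2)
        eta = b0 + b1 * x + u                 # linear predictor given u
        p = 1.0 / (1.0 + np.exp(-eta))
        total += w * np.prod(p**y * (1.0 - p)**(1.0 - y))  # Bernoulli likelihood
    return total / np.sqrt(np.pi)
```

Adaptive Gaussian Quadrature refines this by recentering and rescaling the quadrature nodes around each cluster's posterior mode, which gives a much better approximation with fewer points.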

We've included a deep dive reading for this week by Kim and colleagues,

where they perform some simulation studies,

to evaluate alternative estimation approaches for these types of multilevel models,

and they found that Adaptive Gaussian Quadrature generally works well,

in a variety of scenarios especially for smaller samples.

What about testing the model parameters?

So, using the methods that we've talked about in a previous course, in previous weeks,

we can again compute confidence intervals,

or test hypotheses for the parameters that we're interested in estimating.

We would test null hypotheses about the parameters of interest,

that is, that a fixed effect is zero,

or that a variance component is zero, meaning that the random effects don't vary;

we can test these null hypotheses again using Likelihood Ratio testing.

So, we would use the same likelihood ratio testing approach

for multilevel logistic models,

that we discussed for multilevel linear models,

assuming that we have large enough samples of clusters,

and observations per cluster.
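For instance, the test statistic is just twice the difference in maximized log-likelihoods; the numbers below are invented purely for illustration. One wrinkle worth noting: testing whether a variance component is zero puts the null value on the boundary of the parameter space, and a common correction is to halve the chi-square p-value:

```python
from scipy.stats import chi2

# Hypothetical maximized log-likelihoods: ll0 from the model without the
# random intercept, ll1 from the model with it (values made up).
ll0, ll1 = -1204.3, -1198.7
lrt = 2.0 * (ll1 - ll0)            # likelihood ratio test statistic

p_naive = chi2.sf(lrt, df=1)       # naive chi-square(1) p-value
p_boundary = 0.5 * p_naive         # boundary-corrected p-value for a variance
```

A small p-value leads us to reject the null hypothesis that the variance component is zero, i.e., to conclude that the random effects do vary.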

So again, there's a reading this week that provides specific details

on how to perform these types of likelihood

ratio tests for the parameters in multilevel models,

and that'll be part of our materials for this week.

So, let's revisit that NHANES example,

where we introduced logistic regression,

and recall that we fitted a logistic regression model to model

the probability of ever smoking 100 cigarettes in your lifetime,

as a function of selected predictor variables.

So if you think back to week two,

we talked about fitting this kind of logistic model to the smoking data within NHANES.

In that analysis, in week two we assumed that

all NHANES observations were independent of each other.

In reality this is not true,

because of the study design that was used for the NHANES.

In the NHANES, multistage probability sampling was used,

where there were several stages of random selection of sampling clusters,

or geographic areas more generally.

So the observations, on this indicator of ever smoking 100 cigarettes in your lifetime,

they come from these randomly sampled clusters as a part of the NHANES sample design.

Because we have many people nested within the same cluster in that sample design,

their observations on this indicator may in

fact be correlated with each other for one reason or another.

So, we can't make the assumption that

all NHANES observations are truly independent of each other,

when fitting models to the NHANES data.

If in fact, the smoking observations are correlated within areas,

the standard errors in the kind of

naive logistic regression analysis that we performed in week two are likely understated.

So, what does that mean? That means our estimates,

the regression parameters describing the relationships of these predictors,

with the probability of ever smoking 100 cigarettes,

or the mean of that binary dependent variable,

these estimated coefficients will have standard errors that are too small.

So, we're basically saying that

the sampling variability is smaller for those estimates than it really should be.

Why should it be larger?

Because those observations on the dependent variable,

are correlated within areas,

and that increases the sampling variance of our estimates.

So, we need to make sure that our model accounts for that aspect of the study design,

and including random cluster effects is one possible way to account for that fact,

that observations are correlated within the areas.

That's generally going to increase the standard errors of our estimates,

accurately reflecting the study design.

In addition to the modeling aspect of this, where we again

want to make sure that we're accounting for that between-cluster

variability, or in other words that within-cluster correlation,

in the values of the binary dependent variable,

we may also have explicit interest in estimating the variance

between the NHANES sampling clusters in terms of the probability of smoking.

So, it's a combination of making sure that we get the model right

and that our standard errors accurately reflect the sample design,

but we also have this ability to make inference about the variance

between sampling clusters in terms of the probability of smoking,

as a part of the multilevel modeling approach.

So, here's a graph where we can visualize the amount of variability

between the NHANES sampling clusters in

terms of this probability of ever smoking 100 cigarettes.

So, each of the bars in this graph corresponds to one of

the unique NHANES sampling clusters, reflecting the complex sample design that was

used, and the height of each bar represents

the proportion of individuals in that cluster who have ever smoked 100 cigarettes.

So you can see that these bars bounce

around a whole lot across the different sampling clusters.

That's a visualization of that between cluster variance in the mean of

the dependent variable of interest that we're worried

about estimating when we decide to fit multilevel models.

So again, we may have explicit interest in estimating the amount of this between

cluster variance and we use random cluster effects to capture that variance.
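With individual-level data in hand, the cluster-specific proportions behind a graph like this are easy to compute; here's a sketch on simulated stand-in data (the column names and sizes are hypothetical, not the actual NHANES variables):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the analysis file: 'cluster' marks the sampling
# cluster and 'smoked100' is the 0/1 ever-smoked-100-cigarettes indicator.
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "cluster": rng.integers(1, 11, size=500),
    "smoked100": rng.binomial(1, 0.4, size=500),
})

# Bar heights in the graph described above: the proportion of ever-smokers
# within each sampling cluster.
props = df.groupby("cluster")["smoked100"].mean()
# props.plot.bar() would draw the bar chart (matplotlib assumed)
```

The spread of these proportions across clusters is exactly the between-cluster variability that the random intercepts are meant to capture.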

So, let's think about fitting our multilevel logistic model.

In the model that we're going to consider,

we include random effects of those randomly sampled NHANES clusters.

What this means is that the intercepts in the model that we're

fitting are allowed to randomly vary across those sampling clusters.

For this example, we're not considering the case of random slopes.

So, the coefficients for all of our predictor variables,

we're going to assume that those are constant across the sampling clusters.

We're only interested in estimating variability in the intercepts,

allowing each cluster to have a different proportion in expectation.
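In Python, one available route is statsmodels' BinomialBayesMixedGLM, which fits a random-intercept logistic model using a variational Bayes approximation rather than adaptive Gaussian quadrature; this is a sketch on simulated stand-in data (column names and parameter values are made up, since the NHANES file isn't reproduced here):

```python
import numpy as np
import pandas as pd
from statsmodels.genmod.bayes_mixed_glm import BinomialBayesMixedGLM

# Simulated stand-in data with a true random-intercept structure.
rng = np.random.default_rng(2)
n_clusters, n_per = 30, 20
cluster = np.repeat(np.arange(n_clusters), n_per)
u = rng.normal(0.0, 0.5, size=n_clusters)         # random cluster intercepts
x = rng.normal(size=n_clusters * n_per)
eta = -0.5 + 0.8 * x + u[cluster]
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta)))
df = pd.DataFrame({"y": y, "x": x, "cluster": cluster})

# Random-intercept logistic model: fixed effect of x, with intercepts
# allowed to vary randomly across clusters.
model = BinomialBayesMixedGLM.from_formula(
    "y ~ x", {"cluster": "0 + C(cluster)"}, df)
result = model.fit_vb()                           # variational Bayes fit
```

After fitting, result.fe_mean holds the estimated fixed effects and result.vcp_mean the (log-scale) variance parameter for the random intercepts.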

So, when fitting this model to the NHANES data,

what we end up with is very similar inferences regarding which predictors are

significant compared to week two when we weren't accounting for the random effects.

We see slight changes in the estimated fixed effects,

but for the most part we see the same coefficients in terms of

the relationships of these predictors with

the probability of ever smoking a 100 cigarettes.

But one key difference is that

the standard errors of the estimated fixed effects are now larger because

again the sampling variance is reflecting that between

cluster variability that's being captured by the random effects.

In addition, the estimated variance of the random cluster intercepts was 0.046.

Now, that doesn't seem very large,

but to truly evaluate that variance component,

we have to perform a likelihood ratio test in order to make inference.

So when performing the likelihood ratio tests,

we find that we would reject

the null hypothesis that the variance of those random cluster intercepts is zero.

We definitely have strong evidence that there is between cluster variability in

those intercepts after including all of

the various covariates or predictor variables in this particular model.

So, it seems like including

those random cluster effects is an important contribution in terms of our model fit.

So, even after adjusting for all those other predictors of smoking,

the randomly sampled clusters still vary in terms of their smoking prevalence,

and again, we're capturing this via those random effects.

So, let's think about model diagnostics.

We see that including random cluster effects in the logistic regression model

improved the fit of the model based on the likelihood ratio test; that's a good thing.

But let's look at the distribution of

the predicted values of these random effects or the EBLUPs.

Are there potential outliers?

Remember, there are no residuals to worry about in the simple logistic regression model;

the variance of the dependent variable is

defined by the mean, like we discussed in week two.

Another key consideration is that we might center

continuous predictor variables so that the intercept is interpretable.

So, depending on whether our variables are continuous,

we might center those variables at the mean so that we can interpret

the intercept as representing the expected log odds

of smoking when the predictor variables are set to their means.
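Centering is just a subtraction of the sample mean; a tiny sketch with a hypothetical continuous predictor:

```python
import pandas as pd

# Hypothetical continuous predictor 'age'; centering it at its mean makes
# the model intercept the expected log odds at the mean age.
df = pd.DataFrame({"age": [34.0, 52.0, 47.0, 29.0, 61.0]})
df["age_c"] = df["age"] - df["age"].mean()
```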

So, here's a graphic of the predicted random effects,

or the EBLUPs, for the random intercepts in this particular logistic regression model.

This normal QQ plot suggests that the random effects on

the intercept due to the NHANES sampling clusters are normally distributed.

You see that all the points lie on that 45 degree line.

In addition, we don't see any evidence of

outliers like we saw when analyzing the European Social Survey data.

We saw that some interviewers were outliers.

In this particular graphic,

we see that everybody tends to follow the same distribution and there are

no very unusual outliers in terms of the randomly sampled NHANES clusters.
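A normal QQ plot like this one can be built by pairing the ordered EBLUPs with theoretical normal quantiles; here's a sketch using simulated stand-in values (in practice the EBLUPs would come from the fitted model):

```python
import numpy as np
from scipy import stats

# Simulated stand-in for the predicted random intercepts (EBLUPs).
eblups = np.random.default_rng(3).normal(0.0, 0.2, size=60)

# probplot returns the theoretical normal quantiles paired with the ordered
# sample values, plus a least-squares reference line; points falling near
# that line support the normality assumption on the random effects.
(theoretical, ordered), (slope, intercept, r) = stats.probplot(eblups, dist="norm")
```

Plotting `ordered` against `theoretical` with the fitted line (matplotlib assumed) reproduces the kind of QQ plot described above.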

So what conclusions can we draw from this example?

Compared to the model where we did not account for the random cluster effects,

we find the same predictors of smoking.

So the same predictors of smoking that we found in week two are still important.

But given the significant unexplained variance in

these random cluster effects that we included in the multilevel model,

we might take a next step of trying to explain variance

by including fixed effects of cluster level predictors.

So for example, we might include an indicator of socioeconomic status of

a given NHANES sampling cluster in an effort to

try and explain that variability in the random intercepts.

However, when we compare variance components between

multilevel models that have different cluster level fixed effects,

so, we're trying to explain that variance,

both of the models fitted must include the same respondent-level fixed effects.

So we can't change the predictors that are being measured at

the respondent level when trying to evaluate these changes in variance components.

We have to make sure that both models are fitted using the same cases,

the exact same number of observations,

and they include the same respondent-level fixed effects.

For a deeper dive reading on this issue,

in terms of the magnitude of the variance component,

and why we need to keep this set of predictors at level one fixed,

we would refer you to the textbook

Multilevel Analysis: Techniques and Applications by Joop Hox and colleagues.

This is now in its 3rd edition,

and section 6.5 in that textbook explicitly talks about

this issue of comparing variance components between different multilevel models.

So, for all practical purposes,

we just want to make sure that the level-one predictors stay the same when

comparing these variance components. So, what's next?

Now that we've talked about multilevel modeling and different approaches to using

multilevel models to account for this kind of dependency based on the study design,

we're going to look at a full example of fitting multilevel models to

longitudinal data with Python and making inference based on those fitted models.

We're also going to do an exploration of a web application

that allows us to visualize the fits of these kinds of models.

Then we're going to turn our focus to marginal models

for dependent data and alternatives for modeling

clustered and longitudinal datasets that don't

rely on these random effects of higher level clusters.