Hello, everyone. Today, we're going to be talking about the first of

the three sections of the case study on Bayesian approaches to statistics and modeling.

So, we're going to be walking through

a multi-level regression problem using one of these Bayesian frameworks.

Bayesian frameworks are an incredibly flexible way of doing things such as variable selection and regularization, akin to the lasso and ridge models that we've seen earlier.

We can fit models where the number of parameters exceeds the number of observations.

We can model dependence,

hierarchical structures, and a lot more,

all within the same model because we have

these ideas of posterior distributions on all of our parameters.

We're going to barely scratch the surface of methods,

but we want to explore the Bayesian workflow for modeling,

and most importantly, we want to get

an intuition as to what's going on with these methods.

The one thing that I do want to say is that if you are familiar with these methods,

I'm going to gloss over some of the math.

I'm going to gloss over some of maybe

even the best practices in order to try to get at the heart of what we're trying to do.

So, please be patient if you end up noticing something like that.

It's something that we would normally emphasize, but there was a trade-off because this is a special topics part of the course.

So, before we get started,

if you are interested in a lot of this analysis,

I recommend two books.

The first one is Doing Bayesian Data Analysis by John Kruschke.

It's also called "the puppies book."

This is a very approachable, great introduction to Bayesian statistics,

and it is by far my personal favorite on the subject.

Then, the second one is Bayesian Data Analysis by Andrew Gelman et al.

It's a beautiful book that's typically used at the graduate level.

Gelman is one of the authors of Stan.

He's overall just a great person to follow in the statistics community,

especially when it comes to Bayesian data analysis.

Before we end up proceeding, we need to get back to

this idea of what Bayesian data analysis is.

In order to do this, we need to talk about the three steps

that we typically go through whenever we build one of these models.

The first thing is we need to establish a belief about the world.

This includes a prior and likelihood functions.

You can think of this as setting up

the model and making sure that all of the working parts are in place.

The second thing that we need to do is we need to use data and probability.

We need to update our beliefs.

We need to check that the model agrees with the data.

This is checking fits.

This is making sure that our model is actually

trying to capture as much of reality as we can.

Then, the third thing that we need to do is we need to update

our view of the world based on the results from our model.

Given our data and given our model,

how should we change our beliefs so that we can go about

conducting this process all over again if we explore new data?
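To make those three steps concrete, here's a minimal sketch of a Bayesian update in the simplest conjugate setting: a normal model with known variance, where the posterior for the mean can be written down in closed form. The specific numbers (prior mean 100, prior sd 10, and the four observations) are made up for illustration; they are not from the course dataset.

```python
import numpy as np

# Known-variance normal model: prior mu ~ Normal(prior_mean, prior_sd^2),
# data y_i ~ Normal(mu, sigma^2). The posterior for mu is also normal,
# with a precision-weighted mean -- the "update our beliefs" step.
def posterior_for_mean(y, sigma, prior_mean, prior_sd):
    n = len(y)
    prior_prec = 1.0 / prior_sd**2   # precision = 1 / variance
    data_prec = n / sigma**2
    post_var = 1.0 / (prior_prec + data_prec)
    post_mean = post_var * (prior_prec * prior_mean + data_prec * np.mean(y))
    return post_mean, np.sqrt(post_var)

# Step 1: belief -- mu is probably near 100 (an IQ-like scale).
# Step 2: data -- a handful of observations pull that belief around.
y = np.array([108.0, 112.0, 101.0, 115.0])
post_mean, post_sd = posterior_for_mean(y, sigma=15.0, prior_mean=100.0, prior_sd=10.0)
print(post_mean, post_sd)  # posterior mean sits between prior mean 100 and data mean 109
```

Step 3 is then reading off the posterior as our revised belief, ready to serve as the starting point if new data arrives.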

So, for this case study,

we could have used a lot of different data.

The one that I ended up choosing was based on the National Longitudinal Survey of Youth.

It has 434 observations,

and the goal is to predict a child's IQ score given information about the mother.

So, imagine your mom in this first example has an IQ score of 121.

Your mom went to high school.

Let's highlight some of these variables here.

So, your mom went to high school.

Your mom has an IQ score of 121.

Your mom is 27.

I'm going to try to see if I can predict

your child's IQ score based on these three variables.

That's going to be at the heart of this analysis.

This dataset is pretty large but it's still small enough that we can start

seeing the benefits of Bayesian data analysis.
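Here's a tiny stand-in table showing the shape of the data we're working with. The column names are assumptions chosen for illustration (not necessarily the survey's exact names), and only the first row is taken from the lecture's example; the other rows are invented to show the structure.

```python
import pandas as pd

# A small stand-in for the NLSY kids' IQ data (the real dataset has 434 rows).
kids = pd.DataFrame({
    "kid_score": [102, 85, 120],        # child's IQ score (the response)
    "mom_hs":    [1, 0, 1],             # did the mother finish high school?
    "mom_iq":    [121.0, 89.4, 110.2],  # mother's IQ score
    "mom_age":   [27, 22, 31],          # mother's age
})
# The first row matches the lecture's example: a mom with an IQ of 121,
# a high-school education, and age 27.
print(kids.head())
```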

For this, we're going to be using a linear model.

For better or worse, I ended up choosing a linear regression model to start out with.

In frequentist statistics, we see this all the time.

We express the form of our model.

In this form, we say that a child's IQ is equal to some intercept

term, plus some slope term times the mother's IQ,

plus some slope term times the mother's age:

kid_iq = β0 + β1 · mom_iq + β2 · mom_age

I'm going to start out with this very basic regression model.

We're not going to include the high-school variable,

and we're going to do this for a few reasons.

The first one is to keep it simple at first,

and we'll build out a more advanced model in a second.

The second thing is we want to really see what's going on.

How does this compare to traditional models?

If I just fit this in a frequentist way, what advantages

and disadvantages do I have compared with doing this in a more Bayesian way?
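For comparison, here's a minimal sketch of the traditional frequentist fit of that same regression. The data below is synthetic stand-in data (the coefficients, ranges, and noise level are assumptions for illustration, not the real NLSY values); the point is that ordinary least squares hands back a single point estimate per coefficient, with no distribution attached.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data so the sketch is runnable (not the real survey values).
n = 434
mom_iq = rng.normal(100, 15, n)
mom_age = rng.uniform(18, 35, n)
kid_score = 20 + 0.6 * mom_iq + 0.3 * mom_age + rng.normal(0, 18, n)

# Ordinary least squares for: kid_score ~ b0 + b1*mom_iq + b2*mom_age
X = np.column_stack([np.ones(n), mom_iq, mom_age])
beta, *_ = np.linalg.lstsq(X, kid_score, rcond=None)
print(beta)  # three point estimates -- no distribution over the coefficients
```

The Bayesian version of this same model will instead give us a full posterior distribution over each of those three coefficients.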

So, up until now,

we haven't done anything different from what we've done in the past.

But in the Bayesian framework,

we need to specify prior distributions on our beliefs,

and this is something that's new.

So, a key point,

and this is the first key point:

every parameter must begin with a distribution that captures our belief.

These distributions that we place on these parameters are called priors.

Just as we did in the intro case,

imagine that I have a belief in

this model about my intercept. Let's go back to this line:

imagine that I believe my intercept should be centered on zero and look like this.

I may express this by using a normal prior.
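As a quick sketch of what that prior looks like in code: a normal distribution centered on zero for the intercept. The standard deviation of 10 here is an assumed value for illustration; choosing that spread is part of expressing how strongly we hold the belief.

```python
import numpy as np
from scipy import stats

# A Normal(0, 10^2) prior for the intercept: centered on zero, as described.
# The sd of 10 is an assumption for illustration.
intercept_prior = stats.norm(loc=0.0, scale=10.0)

# The prior density peaks at zero and treats positive and negative
# intercept values symmetrically.
print(intercept_prior.pdf(0.0) > intercept_prior.pdf(15.0))  # True

# Drawing from the prior is sampling our belief before seeing any data.
draws = intercept_prior.rvs(size=10_000, random_state=42)
print(np.mean(draws))  # close to 0
```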