Sometimes, we'll use a straightforward linear regression model, but

we certainly are going to be willing to transform if we see curvature.

And sometimes, if our outcome variable is not a continuous variable,

we might consider using a logistic regression.

So we discussed regression models.

We talked about the questions regression models can answer.

In particular, regression modeling is your go-to tool for predictive analytics.

But the important thing about a regression model

is that, because it explicitly incorporates the uncertainty in the underlying data,

it's going to provide you with ranges for those forecasts,

not just best guesses, which we sometimes call point estimates.

It will give you a range of feasible values.

We've discussed correlation and linear association.

Correlation is a one-number summary that is useful for basically saying,

how close are my points to a line?

How well does a line fit the data?
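As a quick sketch of that one-number summary, here's how you might compute a correlation in Python (the data points are made up for illustration):

```python
import numpy as np

# Toy data for illustration: points that lie close to an upward-sloping line.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# np.corrcoef returns the 2x2 correlation matrix; the off-diagonal
# entry is the one-number summary of linear association.
r = np.corrcoef(x, y)[0, 1]
print(round(r, 3))  # very close to 1: the points hug a line
```

A value near +1 or -1 says a line fits the points well; a value near 0 says it doesn't.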

We've discussed the methodology that we used to fit the best line,

that's known as the method of least squares.

Find the line that minimizes the sum of the squares of the vertical distances from

the points to the line.
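A minimal sketch of the least squares idea in Python, with made-up data. The slope and intercept formulas below are the standard closed-form solution to that minimization, and `np.polyfit` carries out the same computation:

```python
import numpy as np

# Hypothetical data for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])

# Least squares: slope = cov(x, y) / var(x), and the intercept makes the
# line pass through the point of means. This minimizes the sum of squared
# vertical distances from each point to the line.
slope = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
intercept = y.mean() - slope * x.mean()

# np.polyfit with degree 1 performs the same minimization.
b1, b0 = np.polyfit(x, y, 1)
print(round(slope, 3), round(intercept, 3))
```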

We've discussed interpreting regression coefficients.

Remember that interpreting your quantitative models is a very,

very important step.

If you can interpret your model,

you have a chance of discussing your model with other people.

And if you can discuss and convince people that your model makes sense and

is useful, then you have a chance of implementation,

which is typically our ultimate goal with the modeling process.

So we talked about interpreting regression coefficients.

We saw prediction from these regression models, and in particular the idea

of using the root mean squared error, the standard deviation of the residuals from

the regression, to create approximate 95% prediction intervals.
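That plus-or-minus two RMSE recipe can be sketched like this (the numbers are hypothetical, and the factor of 2 comes from the normal approximation):

```python
import numpy as np

# Hypothetical data for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.3, 3.8, 6.1, 7.9, 10.2, 11.7])

b1, b0 = np.polyfit(x, y, 1)
residuals = y - (b0 + b1 * x)

# RMSE: the standard deviation of the residuals around the line.
rmse = np.sqrt(np.mean(residuals ** 2))

# Approximate 95% prediction interval at a new x value:
# point estimate +/- 2 * RMSE (relies on roughly normal residuals).
x_new = 4.5
y_hat = b0 + b1 * x_new
lo, hi = y_hat - 2 * rmse, y_hat + 2 * rmse
print(f"{y_hat:.2f} in [{lo:.2f}, {hi:.2f}]")
```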

Now the sort of interval that I presented to you in these slides relied

upon a normality assumption for the spread of the points around the regression line.

Now that's an example of an assumption, and if we're good modelers,

we're going to check our assumptions.

And the way that I thought about doing that was saving the residuals from

the regression model, and then plotting them.

Plotting a histogram of the residuals and asking myself the question,

does it look like these residuals are approximately normally distributed?

In the example that I showed you, the answer to that question was yes, and so

we had some support for our underlying assumption.

So there's a bigger story here: if you're making assumptions,

you need to check them.

We call those model diagnostics.
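A rough sketch of that diagnostic in Python. Here the data are simulated so the normality assumption holds by construction, and a text histogram stands in for the plot:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data where the spread around the line is normal by construction.
x = rng.uniform(0, 10, 200)
y = 3.0 + 2.0 * x + rng.normal(0, 1.5, 200)

b1, b0 = np.polyfit(x, y, 1)
residuals = y - (b0 + b1 * x)

# Bin the residuals as a text stand-in for a histogram; a roughly
# symmetric, single-peaked shape supports the normality assumption.
counts, edges = np.histogram(residuals, bins=7)
for c, left in zip(counts, edges[:-1]):
    print(f"{left:6.2f} | {'#' * c}")
```

With real data you would plot the saved residuals the same way and ask whether the shape looks approximately normal.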

We also talked about multiple regression,

which is how we start to make more realistic models of the world.

If you don't think a simple regression is doing enough for you (for

example, the prediction intervals are too wide for your use), you might look for

some additional predictor variables.

So we talked about putting horsepower into the fuel economy model.

And of course, there are many, many other variables that one could incorporate.

This is why these regression models are always implemented in software.

Nobody does these calculations by hand anymore.

With your spreadsheet program, you're going to be able to fit these regression models.
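For instance, a multiple regression in the spirit of the fuel economy model can be fit in a few lines. The numbers below are invented for illustration, and `np.linalg.lstsq` does the same least squares computation a spreadsheet or stats package would:

```python
import numpy as np

# Hypothetical fuel economy data (mpg) with two predictors:
# weight (in 1000s of lbs) and horsepower. Numbers are made up.
weight = np.array([2.5, 3.0, 3.5, 4.0, 4.5, 2.8, 3.2])
hp     = np.array([100., 130., 160., 200., 220., 110., 150.])
mpg    = np.array([30., 26., 22., 18., 15., 29., 24.])

# Design matrix: a column of 1s (intercept) plus one column per predictor.
X = np.column_stack([np.ones_like(weight), weight, hp])

# Multiple regression by least squares.
coefs, *_ = np.linalg.lstsq(X, mpg, rcond=None)
b0, b_weight, b_hp = coefs
print(f"intercept={b0:.2f}, weight={b_weight:.2f}, hp={b_hp:.3f}")
```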