But we know that this equation does not estimate the observed response variable

for that observation perfectly.

In fact, urban rate and Internet use rate together explain only about 18% of

the variability in female employment rate.

So there's clearly some error in estimating the response value with

this model.

The residual is the difference between the expected or predicted female

employment rate, and the actual observed female employment rate for each country.
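To make that definition concrete, here is a minimal sketch of how we might compute residuals by hand. The numbers are hypothetical values made up for illustration, not values from our data set, and the sign convention (observed minus predicted) is the usual one but is an assumption here.

```python
# Hypothetical observed and predicted female employment rates for four
# countries. These values are for illustration only.
observed = [47.5, 55.1, 38.2, 61.0]
predicted = [50.0, 52.3, 41.0, 58.4]

# Residual = observed value minus the value predicted by the model.
residuals = [obs - pred for obs, pred in zip(observed, predicted)]

for obs, pred, res in zip(observed, predicted, residuals):
    print(f"observed={obs:5.1f}  predicted={pred:5.1f}  residual={res:+.1f}")
```

A negative residual means the model predicted too high for that country, and a positive residual means it predicted too low.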

We can take a look at this residual variability,

which not only helps us to see how large the residuals are,

but also allows us to see whether our regression assumptions are met and

whether there are any outlying observations that might be unduly

influencing the estimation of the regression coefficients.

First, we can use a Q-Q Plot to evaluate the assumption that the residuals from our

regression model are normally distributed.

A Q-Q Plot plots the quantiles of the residuals that we would theoretically see

if the residuals followed a normal distribution against

the quantiles of the residuals estimated from our regression model.

What we are looking for is to see if the points follow a straight line,

meaning that the model estimated residuals are what we would expect if

the residuals were normally distributed.
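As a rough sketch of what a Q-Q Plot is built from, the code below pairs each sorted, standardized residual with the quantile we would expect from a standard normal distribution. It uses only the Python standard library. The function name `qq_points`, the example residuals, and the plotting positions (i - 0.5) / n are assumptions chosen for illustration; statistical packages may use slightly different conventions.

```python
from statistics import NormalDist, mean, stdev


def qq_points(residuals):
    """Return (theoretical, sample) quantile pairs for a normal Q-Q plot."""
    n = len(residuals)
    mu, sigma = mean(residuals), stdev(residuals)
    # Standardize and sort the residuals to get the sample quantiles.
    sample = sorted((r - mu) / sigma for r in residuals)
    # Theoretical standard normal quantiles at positions (i - 0.5) / n.
    # This choice of plotting position is one common convention (assumed here).
    theoretical = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, sample))


# Hypothetical residuals, for illustration only.
example_residuals = [-2.5, 2.8, -2.8, 2.6, 0.4, -0.9, 1.1, -0.7]
points = qq_points(example_residuals)

for theo, samp in points:
    print(f"theoretical={theo:+.3f}  sample={samp:+.3f}")
```

If we plotted these pairs, points lying close to a straight line would suggest that the residuals are approximately normal, while points bending away at either end would suggest departures in the tails.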

If we scroll down to the Q-Q Plot, we can see that the residuals generally follow

a straight line, but deviate somewhat at the lower and higher quantiles.

This indicates that our residuals do not follow a perfectly normal distribution.

This could mean that the curvilinear association that we observed in our

scatter plot may not be fully estimated by the quadratic urban rate term.

There may be other explanatory variables that we might consider including in our

model that could improve estimation of the observed curvilinearity.