Before we check the inferences from our model, we should perform convergence diagnostics for the Markov chains. Let's start another section here and call it Convergence.

We'll start with trace plots, so we'll plot mod1_sim. This produces the trace plots, where the different colors are the three different chains that we ran. We see no long-term trends in these trace plots, so they look pretty good.

Next, let's take a look at the Gelman and Rubin diagnostic, gelman.diag(mod1_sim). We see that the potential scale reduction factors for the three parameters are all very close to 1, indicating that the chains have probably converged.

Let's also look at the autocorrelation. We see very high autocorrelation at the initial lags for the intercept term, as well as for our second beta term, the slope. This prompts us especially to check the effective sample size. Recall that we ran 3 chains for 5,000 iterations each, so we should have about 15,000 samples. Out of those 15,000 samples, the effective sample size for beta 1, the intercept, is only about 350; for beta 2, the slope, it is about 360; and for sigma it is about 14,000. So the sigma parameter mixed very well, but our coefficients did not. These effective sample sizes are still okay if all we're interested in is the posterior mean for these two parameters, but if we want to create something like a 95% posterior probability interval, we want more effective samples, so we would want to run the chains for longer. (These diagnostic calls are sketched in code below.)

Next, let's look at a posterior summary from this model. We can see that we used the first 1,000 samples as a burn-in and then saved the next 5,000 samples. We ran three chains, and here are the resulting estimates of the parameters. They are very similar to what we got from the reference analysis with a flat prior, because the priors for the coefficients in the JAGS model had a very large variance.

The reference analysis was called lmod, so let's compare (also sketched below). The intercept was estimated at about 7.1 or 7.2 in this model and at about 7.1 in the reference analysis. The slope between log income and log infant mortality was estimated to be about -0.52 in this model and about -0.51 in the reference analysis. The estimates are very close. The residual standard error, which is the estimate of sigma in the reference analysis, was about 0.69; in our JAGS model, however, we estimated it to be about 0.97. This is because we used a prior guess that the variance was 10, in other words that the standard deviation was a little higher than 3, with a prior effective sample size of 5. If that is our true belief about the variance ahead of time, then this is our posterior answer. But if we're less confident in our prior, it might make sense to lower the prior effective sample size. Remember, when we interpret the results from these models, we have to keep in mind that we modeled the log of infant mortality against the log of income, so this negative relationship is on the log scale.

The last thing we'll do in this segment is check the residuals. Residuals are defined as the difference between the response, the actual observation, and the model's prediction for each value. Residuals are extremely important in linear modeling, since they can reveal violations of the assumptions we made to specify the model.
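As a minimal sketch, the convergence diagnostics above would look something like this in R, assuming mod1_sim is the mcmc.list produced by coda.samples() earlier in the lesson:

```r
library("coda")

# Trace plots for the three chains; we are looking for no long-term trends
plot(mod1_sim)

# Gelman-Rubin diagnostic; scale reduction factors near 1 suggest convergence
gelman.diag(mod1_sim)

# Autocorrelation within the chains at increasing lags
autocorr.diag(mod1_sim)

# Effective sample size out of the roughly 15,000 total samples
effectiveSize(mod1_sim)
```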
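The posterior summary and the comparison with the flat-prior reference fit are then just two summaries side by side; here lmod is assumed to be the lm() fit from the earlier reference analysis:

```r
summary(mod1_sim)  # JAGS model: intercept ~ 7.1, slope ~ -0.52, sigma ~ 0.97
summary(lmod)      # reference fit: intercept ~ 7.1, slope ~ -0.51, residual SE ~ 0.69
```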
When we check residuals, we are looking in particular for any sign that the model is not linear, that the errors are not normally distributed, or that the observations are not independent of each other, at least conditionally on the explanatory variables.

First, let's look at what would have happened if we had fit the reference linear model to the untransformed variables, that is, if we had fit the linear model to infant mortality versus income, not on the log scale. Remember, when we plotted these relationships, it didn't look like a linear model was a good idea.

Let's plot the residuals for lmod0, which gives us residuals on the y-axis and index on the x-axis. The index refers to which row of the dataset each observation came from. If the data points were not independent of each other, we might see a pattern in this plot. There doesn't appear to be much of a pattern here, so it is at least somewhat reasonable to assume that these data are independent.

Another very important look at the residuals is to plot the model's predictions, which we can create with the predict function on lmod0, on the x-axis against the residuals on the y-axis. Again, in this plot we don't want to see any patterns; we want to see essentially randomness. That is not the case here. First of all, we can see a downward trend: as the predicted values from the model get higher, the residuals get smaller. And our assumption that the residual variance is the same for all observations appears to be wildly incorrect in this case. For the small predicted values the variance in the residuals is quite small, but for large predicted values we have large variance.

Finally, we can check the assumption of normality using the qqnorm plot on the residuals. This plot shows the theoretical quantiles, or percentiles, of an actual normal distribution on the x-axis against the sample quantiles of the residuals on the y-axis. If the residuals actually came from a normal distribution, the points on this plot would essentially follow a straight line. In this case, we have a curvature that gets more extreme at the high values, indicating that the residuals have a right-skewed, non-normal distribution. The plots we've looked at so far are a bad example: they show us what residual plots should not look like. (These checks are sketched in code below.)

Now let's return to our model fit to the log-transformed variables. This time we'll use the residuals from the Bayesian model that we fit in JAGS. Predictions from Bayesian models come as posterior predictive distributions, so in reality each residual from the Bayesian model would have its own distribution. We're going to simplify things a little and look only at the residuals that come from predictions based on the posterior means.

First, we need to create the design matrix, the matrix that contains the explanatory variables. We'll do that with cbind, that is, by combining columns. The first column is the term that goes with the intercept: we repeat the number 1 n times, where n comes from data1_jags, a variable we saved earlier. There are 101 observations, so this rep call creates a vector of 101 ones. The second column of this X matrix is data1_jags$log_income. We can look at head(X), and it looks correct (see the second sketch below). We're going to base our predictions on the posterior means of the parameters.
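A sketch of those residual checks for the untransformed model, assuming lmod0 is the lm() fit of infant mortality against income on the original scale:

```r
# Residuals vs. index: a pattern here would cast doubt on independence
plot(resid(lmod0))

# Residuals vs. predicted values: we want pure randomness, not the
# downward trend and unequal variance we found in this plot
plot(predict(lmod0), resid(lmod0))

# Normal Q-Q plot: points along a straight line support normality
qqnorm(resid(lmod0))
```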
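And the design matrix construction, assuming data1_jags holds n and log_income as saved earlier:

```r
# Column of 1s for the intercept, column of log income for the slope
X <- cbind(rep(1.0, data1_jags$n), data1_jags$log_income)
head(X)
```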
Now let's save the posterior means in a variable called pm_params1. These are the column means of our combined simulation, that is, of all three chains combined into one matrix, which we named earlier csim. We'll run that and take a look at the values of pm_params1; these are the posterior mean estimates of the parameters.

Next, we need a predicted value for each of the observations from the model; we'll call it yhat1. Each prediction is the intercept times 1 plus the slope times the value of log income. In matrix notation, we can compute this by matrix-multiplying our X matrix with the posterior means of the two coefficients, giving it just the first two values of pm_params1. If we run that, we get the predicted values in the form of a matrix. Let's turn it into a vector by rerunning this inside the drop function, which gets rid of the matrix structure. Now we have a vector of predicted values from the model.

The residuals are defined as the actual data, the y values from data1_jags, minus our predicted values based on the posterior means. We'll run that, and let's plot these against the index again. This already looks much better than before: we don't see any patterns or trends here, so this plot looks good.

Let's also plot the predicted values on the x-axis against the residuals. This one looks pretty good as well. We don't see any trends or patterns in the mean of these residuals; they have a mean of approximately 0. There are two slight concerns here. One is that the variance appears to increase as we go from small predicted values to large predicted values, and the other is that we see two very strong outliers. We'll address these in a moment.

First, let's look at the qqnorm plot for the residuals. Remember, if the residuals come approximately from a normal distribution, the points in this plot will follow a roughly straight line. That seems reasonable in this case, except for the two outliers we identified earlier.

Let's find out which countries those outliers are associated with. We'll get the row names from our dataset, dat. If we look at the row names of dat, we get all of the names of the countries. We want to order these country names by the size of their residual, so we index them with the order function, ordering by the values in resid1 in decreasing order. We don't want to see all of the countries, so we apply the head function. If we run this, we see the first few countries with the largest residuals; the two biggest appear to be Saudi Arabia and Libya. (These steps are sketched in code at the end of this segment.)

When outliers appear in your analysis, it is a good idea to double-check that they are not simply errors in your data entry. If the values are correct, you may reconsider whether these data points really are representative of the data you are trying to model. If you conclude that they are not, for example because they were reported in different years or simply don't belong, you may be able to justify dropping them from the dataset. If you conclude that the outliers are part of the data and should not be removed, we have several modeling options to accommodate them, and that's what we'll address in the next segment.
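For reference, here is a minimal sketch of the prediction-and-residual steps from this segment, assuming csim is the matrix of the three combined chains named earlier, with columns for the two coefficients followed by sigma:

```r
# Posterior means of the parameters from the combined chains
pm_params1 <- colMeans(csim)
pm_params1

# Predicted value for each observation: intercept * 1 + slope * log income.
# drop() removes the matrix structure, leaving a plain vector
yhat1 <- drop(X %*% pm_params1[1:2])

# Residuals: observed response minus prediction from the posterior means
resid1 <- data1_jags$y - yhat1

plot(resid1)          # residuals vs. index
plot(yhat1, resid1)   # residuals vs. predicted values
qqnorm(resid1)        # normality check
```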
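And the outlier lookup, ordering countries from largest to smallest residual:

```r
# Country names sorted by residual size; the top two here were
# Saudi Arabia and Libya
head(rownames(dat)[order(resid1, decreasing = TRUE)])
```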