In the last video, we explored Bayesian model selection using the Bayesian information criterion, or BIC, with the kids' cognitive score example, where some of the models had similar values of BIC. If we proceed by reporting just the best model, then we're ignoring the presence of other models that may be equally good. Credible intervals or confidence intervals may be in error, as the uncertainty is being ignored and not propagated through to the reported results. Narrow intervals are not always better if they miss the truth. We will talk about how to convert BIC to a Bayes factor and find posterior probabilities of all possible models using R.

To help think about model uncertainty, let's consider the forecaster's job of predicting where a hurricane may make landfall. The figure shows the predicted path of Hurricane Joaquin based on several different computer models. Having an accurate prediction and measures of uncertainty is important for early warning. In this case there is substantial uncertainty about where the hurricane will make landfall, as you can see from the many colored paths depicted in the figure.

To represent model uncertainty, we will construct a probability distribution over all possible models, where the probabilities provide a measure of how likely the different models are. We will start by assigning a prior probability to each model. Bayes' theorem provides the solution for how to update our prior probabilities, by multiplying the prior probability of each model by the likelihood of the model. This serves to reweight our prior probabilities, so that models with high likelihood have higher weights while models with low likelihood receive lower weights. We renormalize so that the reweighted prior probabilities add up to 1, by dividing by the term in the denominator that sums over all possible models. If we have p predictors, there are 2 to the p possible models in the sum. We can also calculate the posterior probabilities using Bayes factors and prior odds.
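As a minimal sketch of this reweighting step, the renormalization can be carried out directly in base R. The BIC values below are made up purely for illustration:

```r
# Hypothetical BIC values for three competing models (made-up numbers)
bic <- c(2317.2, 2319.2, 2325.2)

# BIC approximates -2 times the log marginal likelihood, so exp(-BIC/2)
# is proportional to each model's likelihood; subtracting the minimum
# first keeps the exponentials numerically stable (constants cancel)
lik <- exp(-(bic - min(bic)) / 2)

# Uniform prior over models: each gets probability 1/3
prior <- rep(1 / length(bic), length(bic))

# Bayes' theorem: reweight the priors by the likelihoods, then
# renormalize so the posterior probabilities sum to 1
posterior <- prior * lik / sum(prior * lik)
round(posterior, 3)  # 0.721 0.265 0.013
```

With a uniform prior the prior terms cancel, so the posterior probabilities depend only on the differences in BIC between models.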
Here the Bayes factor is the ratio of the marginal likelihood of model M to a base model B. Any model can be used as the base model; it could be the best model, or maybe the model with just the intercept. The other ingredient is the prior odds of model M to the base model. We can see that the evidence from the data in the Bayes factor serves to up- or down-weight the prior odds.

Returning to the kids' cognitive score data from before, I'm going to walk through how to fit all possible Bayesian regressions in R. First, to represent model uncertainty, we will construct a probability distribution over all possible models, where the probabilities provide a measure of how likely the different models are. We will start by assigning a prior probability to each model. You need to specify a model formula to describe the full model with all predictors; this is the first argument. Here we have the kids' cognitive score as the response and four explanatory variables: mom's high school status, mom's IQ score, whether or not the mom worked during the first three years of the kid's life, and mom's age. By default, an intercept is always included. You need to specify which prior distribution you're going to use for the regression coefficients in each model to compute the marginal likelihood of each model. Here, I used prior = BIC, corresponding to the Bayesian information criterion that we defined in the last video. This is used to approximate the likelihood of each model. Taking the base model to be the model with only the intercept, the Bayes factor for comparing any model to the model with just the intercept, or model 1, is proportional to exp(-BIC/2). This is a function of the R-squared for model M and the number of parameters in model M, denoted p sub m. The next line in the code, modelprior = uniform(), corresponds to assigning equal prior probabilities to each of the 16 models, where each model will have probability 1 over 16, or 0.0625.
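Putting the pieces described above together, the call might look like the following sketch. It assumes the BAS package is installed, and the data frame and column names (cognitive, kid_score, hs, IQ, work, age) are assumptions about your local copy of the data:

```r
library(BAS)  # provides bas.lm; assumed to be installed

# Fit all 2^4 = 16 possible regressions of the kid's cognitive score on
# the four predictors; an intercept is included in every model by default
cog_bas <- bas.lm(kid_score ~ hs + IQ + work + age,
                  prior = "BIC",          # BIC approximation to the marginal likelihood
                  modelprior = uniform(), # equal prior probability 1/16 for each model
                  data = cognitive)

summary(cog_bas)  # top five models with Bayes factors and posterior probabilities
```

The formula, prior, modelprior, and data arguments correspond one-to-one to the four ingredients discussed in this section.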
Finally, we need to specify which data frame we are using with the data argument. This is exactly the same as in the lm function. We'll assign the results to the variable cog_bas, but you may choose any name for the output object. The summary function in R can provide a summary of the bas.lm object that we've created. It shows information about the top five models out of the 16 possible models, where each model is represented by its variable inclusion indicators in that row. The first row is the best model: the ones mean that the model includes the intercept, the indicator for mom's high school status, and mom's IQ, while the zeros indicate that mom's age and mom's work status were excluded. All models always include the intercept. The models in the table are ordered from best to worst using their posterior probabilities. Under our uniform prior distribution, we treat each model as being equally likely, with a prior probability of 0.0625. The prior odds of any model compared to the model with no variables are one; we've updated our prior odds by reweighting them by the Bayes factors to compute the posterior probabilities. The logmarg column is -1/2 times BIC, and the column for the Bayes factor uses the best BIC model as the base model. So the top model has a Bayes factor of 1, while the likelihood of the second-best model is just over half that of the top model. The posterior probabilities are functions of the ordinary R-squared, but penalize model complexity through the dimension of the model. We can see that the full model with an intercept and four predictors has the highest R-squared but is not the highest probability model. Under this prior on the possible models, we believe that there is a 53% chance that the model with mom's high school status and mom's IQ is the true model, while there is a non-negligible probability of approximately 0.3 on the model that includes mom's IQ only.
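To see how the Bayes factor column relates to the logmarg column, here is a small base-R sketch; the logmarg values (-BIC/2) below are made up for illustration, not taken from the actual summary output:

```r
# Hypothetical logmarg values (-BIC/2) for the top three models (made up)
logmarg <- c(-2583.1, -2583.7, -2585.0)

# Bayes factor of each model against the best (base) model: a ratio of
# marginal likelihoods is the exponential of the difference in log marginals
bf <- exp(logmarg - max(logmarg))
round(bf, 3)  # 1.000 0.549 0.150
```

Working on the log scale and subtracting the maximum before exponentiating avoids overflow, which is why software reports logmarg rather than the marginal likelihood itself.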
These two models add up to 83% of the probability, with the remaining 17% distributed across the other 14 models. In summary, we've shown how to calculate posterior probabilities for models using expressions of the marginal likelihood or Bayes factors. We've used BIC as a way to approximate the log of the marginal likelihood. You should be able to use R to find posterior probabilities of all possible models using BIC for data sets with 20 or fewer predictors, and examine uncertainty through the posterior probabilities on the top five models and the remaining models. Next, we will look at a way to visualize model uncertainty beyond the top five models, and then introduce Bayesian model averaging, or BMA, as an approach that uses the ensemble of models for posterior inference.