What about after we've observed data? What's our posterior predictive distribution? In this case, we can think about flipping a coin where we don't know the probability it comes up heads. Suppose that on the first flip we observe a head. We want to ask: what's our predictive distribution for the second flip, given that we saw a head on the first flip? So we can think about a posterior predictive distribution, f of y2 given y1, and we can write that as the integral of f of y2 given theta and y1, times f of theta given y1, d theta. Again, we're integrating out theta, but this time we use the posterior distribution for theta instead of the prior distribution. In most cases, we treat y2 as conditionally independent of y1 given theta, which lets us simplify this expression to the integral of f of y2 given theta, times f of theta given y1, d theta. So you can see this looks very much like the prior predictive, except we're using the posterior distribution for theta instead of the prior distribution.

As a particular example, again suppose we put a uniform prior on theta, and we observe the first flip, y1 = 1, a head. What do we predict for the second flip? This is no longer going to be a uniform distribution like it was in the prior predictive case, because now we have some data: one head came up on the first flip. That gives us some information about the coin. We now think a second head is more likely, because it's more likely that theta is at least one half, and possibly larger.

So let's compute this. f of y2 given y1 = 1 is equal to the integral, in this case from 0 to 1, of theta to the y2, times 1 minus theta to the 1 minus y2, times the posterior, which is 2 theta, d theta. We can simplify this a bit to the integral from 0 to 1 of 2 times theta to the y2 plus 1, times 1 minus theta to the 1 minus y2, d theta. We could work this out in a more general form, but y2 can only take the value 0 or 1: the next flip is either going to be heads or tails, so it's easier to just plug in a particular value. We can compute the probability that y2 = 1 given y1 = 1, and here we plug in 1 for y2 and get the integral from 0 to 1 of 2 theta squared d theta. Doing that integration gives two-thirds theta cubed, evaluated from 0 to 1, which works out to be two-thirds. Thus, the complement, the probability of getting a tail on the second flip given we saw a head on the first flip, is one-third.

So we can see here that the posterior is a combination of the information in the prior and the information in the data. In this case, our prior is like having two data points: one head and one tail. Saying we have a uniform prior for theta is equivalent, in an information sense, to saying we have already observed one head and one tail. Then, when we actually observe one head, it's as if we have now seen two heads and one tail, and so our posterior predictive distribution for the second flip says: with two heads and one tail, we have a probability of two-thirds of getting another head, and a probability of one-third of getting a tail.
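As a quick numerical check, here is a minimal sketch (not part of the lecture itself) that reproduces these numbers by integrating the posterior predictive directly. It assumes a uniform prior on theta and a single observed head (y1 = 1), so the posterior density is 2 theta, and it uses scipy's quad routine for the integration; the function names are just illustrative choices.

```python
# Minimal sketch: numerically verify the posterior predictive for the coin example.
# Assumptions: uniform prior on theta, one observed head (y1 = 1),
# so the posterior density is f(theta | y1 = 1) = 2 * theta on [0, 1].

from scipy.integrate import quad

def posterior_density(theta):
    # Posterior after observing one head under a uniform prior.
    return 2.0 * theta

def posterior_predictive(y2):
    # P(Y2 = y2 | Y1 = 1) = integral over theta of
    #   theta^y2 * (1 - theta)^(1 - y2) * f(theta | y1 = 1) d theta
    integrand = lambda theta: theta**y2 * (1.0 - theta)**(1 - y2) * posterior_density(theta)
    value, _ = quad(integrand, 0.0, 1.0)
    return value

print(posterior_predictive(1))  # ~0.6667: two-thirds chance of a second head
print(posterior_predictive(0))  # ~0.3333: one-third chance of a tail
```

Running this prints approximately 0.667 and 0.333, matching the two-thirds and one-third worked out above.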