As a motivating example, consider a slightly simplified version of a real clinical trial that was undertaken in Scotland. It concerned RU-486, a morning-after pill that was being studied to determine whether it was effective at preventing unwanted pregnancies. This example was studied in a previous lecture. The trial enrolled 800 women, each of whom had had intercourse no more than 72 hours before reporting to a family planning clinic to seek contraception. Half of these women were randomly assigned to the standard contraceptive, a large dose of estrogen and progesterone, and half were assigned RU-486. Among the RU-486 group, there were no pregnancies. Among those receiving the standard therapy, four became pregnant.

Statistically, one can model these data as coming from a binomial distribution. Imagine a coin with two sides: one side is labeled "standard therapy" and the other is labeled "RU-486." The coin was tossed four times, and each time it landed with the standard therapy side face up.

A frequentist would analyze the problem in the following way. Let p be the probability that a pregnancy comes from the RU-486 arm. He would set up the null hypothesis that p is greater than or equal to one-half, and the alternative hypothesis that p is less than one-half. The significance probability is the chance of obtaining no RU-486 outcomes in four tosses when p equals one-half, which is just one-half raised to the fourth power, or 0.0625. Since the significance probability is greater than 0.05, a commonly used threshold, the frequentist would conclude that there was no reason to reject the null hypothesis, and he would not conclude that RU-486 is superior to standard therapy.

Now suppose a Bayesian performed the analysis. She may elicit her own beliefs about the drug and decide that she has no prior knowledge about the efficacy of RU-486 at all. This would be reasonable if, for example, it were the first clinical trial of the drug.
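The frequentist calculation just described can be sketched in a few lines of Python. This is only an illustration of the arithmetic in the lecture; the variable names are mine, not part of the original example:

```python
# Frequentist analysis of the RU-486 trial, as described above.
# Under H0 (p >= 1/2), the boundary case p = 1/2 is the most
# favorable to the null, so the significance probability is the
# chance of zero RU-486 outcomes in four tosses of a fair coin.
n_pregnancies = 4
p_null = 0.5
p_value = p_null ** n_pregnancies  # one-half raised to the fourth power
print(p_value)  # 0.0625

# 0.0625 > 0.05, so at the conventional 5% level the frequentist
# fails to reject the null hypothesis.
```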
In that case, she would use the uniform distribution on the interval from 0 to 1, which corresponds to the beta(1, 1) density. From conjugacy, we know that since there were no successes and four failures for RU-486, her posterior distribution for p, the probability of an RU-486 child, is a beta with parameters (1 + 0) and (1 + 4), that is, beta(1, 5). This is a beta that has much more area near p equal to 0.

The mean of a beta with parameters alpha and beta is alpha over (alpha plus beta). So this Bayesian now believes that the unknown p, the probability of an RU-486 child, is about 1 over 6. The standard deviation of a beta distribution with parameters alpha and beta also has a nice closed form. Before she saw the data, the Bayesian's uncertainty, expressed by her standard deviation, was about 0.29. After seeing the data, it was much reduced: her posterior standard deviation is just about 0.14. We promised not to do much calculus, so I hope you will trust me when I tell you that this Bayesian's posterior probability that p is less than one-half is 0.96875. She thought there was a 50-50 chance that RU-486 is better; now she thinks there is about a 97% chance that RU-486 is better.

Suppose a fifth child were born, also to a mother who received the standard therapy. Now the Bayesian's prior is beta(1, 5), and the additional data point updates her to a new posterior, beta(1, 6). As data come in, the Bayesian's previous posterior becomes her new prior, so learning is self-consistent.

This example has taught us several things. First, we saw how to build a statistical model for an applied problem. Second, we could compare the frequentist and Bayesian approaches to inference and see large differences in the conclusions. Third, we saw how the data changed the Bayesian's opinion, giving a new mean for p and less uncertainty. Finally, we learned that Bayesians continually update as new data arrive. Yesterday's posterior is today's prior.
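The whole Bayesian update can also be sketched in Python, using only the standard library. The helper functions beta_mean and beta_sd are my own names for the standard beta-distribution formulas, not part of the lecture:

```python
import math

def beta_mean(a, b):
    # Mean of a beta(a, b) distribution: a / (a + b).
    return a / (a + b)

def beta_sd(a, b):
    # Standard deviation of beta(a, b): sqrt(ab / ((a+b)^2 (a+b+1))).
    return math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))

# Prior: beta(1, 1), i.e. the uniform distribution on [0, 1].
a, b = 1, 1
print(beta_mean(a, b), beta_sd(a, b))  # 0.5 and about 0.289

# Data: 0 RU-486 pregnancies ("successes"), 4 standard-therapy
# pregnancies ("failures"). Conjugacy gives posterior beta(1, 5).
a_post, b_post = a + 0, b + 4
print(beta_mean(a_post, b_post))  # 1/6, about 0.167
print(beta_sd(a_post, b_post))    # about 0.141

# For beta(1, b) the CDF has the closed form 1 - (1 - p)**b, so the
# posterior probability that p < 1/2 needs no calculus:
prob_better = 1 - (1 - 0.5) ** b_post
print(prob_better)  # 0.96875

# Sequential updating: a fifth standard-therapy pregnancy turns
# yesterday's posterior beta(1, 5) into today's posterior beta(1, 6).
a_post, b_post = a_post + 0, b_post + 1
```

Note how the final two lines make the self-consistency of Bayesian learning concrete: the same update rule is applied again, with the old posterior playing the role of the new prior.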