Hello, everyone. Today we're going to talk about Bayesian approaches to statistics and modeling, and about how they differ from some of the methods we've covered earlier in the class. I want to illustrate this by means of a case example. So far, we've been using maximum likelihood: a method that fits models to data by picking the single best parameter values. When fitting a mean, we say our best estimate is, say, five, but we don't say much about the uncertainty until we bring in standard errors. Probability in this frequentist setting is a long-run frequency: it describes the long-run expectation if I repeat the procedure many times. Bayesian methods, on the other hand, think about these problems in a fundamentally different way. So let's illustrate this with an example. Imagine someone comes up to me and says, "Hey Mark, I want you to determine the average IQ of students at the University of Michigan. What do you think the average IQ of University of Michigan students might be?" The first thing I do is draw on whatever views I already have on the topic, but because I don't know much about IQs, I just go online and look up the overall distribution for the US population: mean 100, standard deviation 10. I'm going to take that population distribution as my starting belief about what's going on: "Yep, my mean is at 100, my standard deviation is 10." This may not be the best belief, but it's the one I'm going to start with; it's what I hold in my head. Someone else can hold a different belief, and at the end of the day that's totally okay. So what does this starting belief look like? Let's plot it out. This is going to be my starting belief.
Again, all I did was go online and find that the overall population has a mean of 100 and a standard deviation of 10, so that's what I'm going to start out with. Anybody else could do the same and arrive at some other belief. Maybe they believe the distribution of IQ scores for U of M students sits a little bit higher; that belief is totally okay too. Someone else comes to the analysis and says, "No, I actually think it's skewed." Say they believe it's skewed right: they expect to find some really intelligent people out on that tail. Someone else says, "No, it's multimodal. I think we'll have a bunch of average people and a bunch of really, really high-IQ people." Each of these is just a belief, but the one I'm going to hold in my head is the Normal(100, 10) belief. These beliefs can also have smaller variance, and this is something I want to point out. I'm pretty uncertain about where the mean IQ score of U of M students lies: my belief is centered at 100, but it has quite a large variance. Whenever I decrease the variance of my belief, I am implicitly saying that I am more certain about where that mean IQ might be. But at the end of the day, these are just beliefs that I hold before I observe any data. So let's go observe some data and see what changes about my belief. I go out and find someone, and they have an IQ of 125. I have my belief, and I see someone with an IQ of 125. How should my belief change, given that I observed this data point? Should it shift left, shift right, or stay the same?
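To make these candidate beliefs concrete, here is a small Python sketch (not part of the lecture) of a few of them as density functions; the narrow standard deviation and the mixture weights are made up purely for illustration.

```python
from statistics import NormalDist

# The lecture's starting belief: wide, centered at the population mean.
wide_prior = NormalDist(mu=100, sigma=10)

# A smaller variance implicitly claims more certainty about the mean.
confident_prior = NormalDist(mu=100, sigma=3)

def multimodal_prior_pdf(x):
    """A mixture belief: mostly average students, plus a high-IQ cluster.
    The 0.7/0.3 weights and component parameters are hypothetical."""
    return 0.7 * NormalDist(100, 8).pdf(x) + 0.3 * NormalDist(125, 5).pdf(x)

# The narrower belief packs more probability density near its center:
print(wide_prior.pdf(100), confident_prior.pdf(100))
```

Any of these could serve as a prior; the analysis that follows just happens to start from the wide Normal(100, 10) one.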
The answer: I observed someone well above what I originally believed the mean to be, so I should expect my belief about the mean to be pulled in that direction. This is how my belief should update. I had my original belief, and this is my updated belief now that I've observed someone with an IQ of 125: it shifted right. I originally thought the mean was at 100; I now think it's probably around 124. So this indicates that the mean is actually higher than I originally thought. Next, I observe someone with an IQ of 115. So now I have three curves: my original belief, the belief after observing the 125, and the belief after observing both the 125 and the 115. The first question is, why did it shift left? After observing the 125, my belief about the mean had moved up; 115 is less than that, so I expect the distribution to shift back a little. But we can also see that, now that I've observed two data points, the variance of my belief goes down. I originally had a lot of uncertainty; I now have far less. That's what we hope for: whenever we observe more data, we get to be more certain about our estimates. I'm going to continue doing this around campus: I observe a 125, a 115, another 115, a 120, another 125, and a 117. Each time I take my current belief and update it, until eventually I arrive at this belief right here: my belief about the mean IQ after seeing all the data.
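The updating just described can be sketched with the standard conjugate Normal-Normal update. This is a minimal Python sketch under one assumption the lecture never states: that each observed IQ carries known measurement noise with standard deviation 10. The exact numbers the belief moves through depend entirely on that assumed noise level, so they will not match the lecture's plotted curves.

```python
def update_normal(prior_mean, prior_sd, x, sigma=10.0):
    """One conjugate update of a Normal belief after observing x.
    sigma is the assumed (hypothetical) observation noise sd."""
    prior_prec = 1.0 / prior_sd**2   # precision of the current belief
    like_prec = 1.0 / sigma**2       # precision contributed by one datum
    post_prec = prior_prec + like_prec
    post_mean = (prior_mean * prior_prec + x * like_prec) / post_prec
    return post_mean, (1.0 / post_prec) ** 0.5

mean, sd = 100.0, 10.0               # the starting belief from the lecture
for iq in [125, 115, 115, 120, 125, 117]:
    mean, sd = update_normal(mean, sd, iq)
    print(f"after observing {iq}: mean={mean:.2f}, sd={sd:.2f}")
```

Notice that the standard deviation of the belief shrinks with every observation, exactly the narrowing of the curves described above, while the mean gets pulled toward the cluster of observed values.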
Because all these points are bigger than 100, I expect the mean to increase a bit. And because all of these points are relatively clustered, let's say around the 117-118 area, that's where I expect my mean to land. So my belief about the average changes quite a bit, and we can see that more data allows us to better focus our belief about the mean IQ of students at U of M. The process we just went through actually has a name: this idea of starting out with a prior, my original belief, and updating it with data is called Bayesian updating. This is how Bayesians, one school of statistical thought, think about the world. The new distribution I get is called a posterior. If I were doing a frequentist analysis, I would get a point estimate: a single number, let's call it 118 for argument's sake. In a Bayesian analysis, I instead get an entire distribution over the belief, and this distribution is called the posterior. The posterior allows us to answer questions about our quantity of interest. So let me plot my posterior. I originally had a Normal(100, 10) distribution; having observed all this data, I now have a posterior distribution centered roughly at about 119. I can ask a bunch of questions about this distribution, using the other tools from this course, to see what I'm actually working with. What can I now say about my belief about the mean? I can ask a lot of questions.
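The contrast between the two outputs can be sketched side by side. This assumes the same hypothetical Normal(100, 10) prior and assumed observation noise sd of 10 as before; the point is only that one approach returns a number and the other returns a distribution.

```python
data = [125, 115, 115, 120, 125, 117]

# Frequentist output: a single point estimate, the sample mean.
x_bar = sum(data) / len(data)

# Bayesian output: a whole distribution, summarized by its parameters.
# (Conjugate Normal-Normal update, batch form; sigma=10 is an assumption.)
n, sigma, mu0, tau0 = len(data), 10.0, 100.0, 10.0
post_prec = 1 / tau0**2 + n / sigma**2
post_mean = (mu0 / tau0**2 + sum(data) / sigma**2) / post_prec
post_sd = post_prec ** -0.5

print(f"frequentist point estimate: {x_bar}")
print(f"Bayesian posterior: Normal({post_mean:.2f}, {post_sd:.2f})")
```

The batch update here gives exactly the same posterior as updating one observation at a time, which is a convenient property of the conjugate Normal model.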
My friend who asked the original question comes back and says, "What is your expected belief? What do you think your best guess of the mean IQ is?" I can say, "Well, my belief has a mean of 119.55. The mean of this distribution is right here, at 119.55." What is the most likely value? If you had to put all of your money on one point, I'd choose the peak; as an aside, this is called the MAP (maximum a posteriori) estimate. Because this posterior is a normal distribution, the MAP estimate and the mean are equal. What is the median? I can calculate that too, and since, as I just said, this is a normal distribution, the median is also the same as the mean. Now imagine someone who started out with a skewed prior, so their posterior is skewed as well. They would have a different expected belief than me because they had a different prior, and that's totally okay. We can also ask about ranges of values. What's the 95 percent credible interval? A credible interval is the Bayesian analogue of a frequentist confidence interval, but it talks about probability instead of confidence. In a frequentist setting, I would say I'm 95 percent confident that the true mean IQ of U of M students falls between 116 and 122, meaning that if I repeated the study many times, 95 percent of the intervals constructed this way would contain the true mean. In a Bayesian setting, I can say directly that there is a 95 percent probability that the mean lies between those two values. We can say different things because we think about probability in different ways, and we can get all of this from the posterior distribution. Being able to make direct probability statements is very powerful, but nothing comes for free, and at the end of this lecture we'll talk about some of the downsides of having this improved interpretation.
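All of these posterior summaries fall out of the distribution itself. A small sketch using only Python's standard library; the posterior standard deviation of 1.5 is an illustrative assumption, chosen so the 95 percent interval lands near the 116-122 range quoted above.

```python
from statistics import NormalDist

# Hypothetical posterior: mean from the lecture, sd assumed for illustration.
posterior = NormalDist(mu=119.55, sigma=1.5)

post_mean = posterior.mean      # expected belief
post_median = posterior.median  # equals the mean for a Normal
post_mode = posterior.mode      # the MAP estimate; also equals the mean here

lo = posterior.inv_cdf(0.025)   # central 95% credible interval
hi = posterior.inv_cdf(0.975)
print(f"mean={post_mean}, median={post_median}, MAP={post_mode}")
print(f"95% credible interval: ({lo:.2f}, {hi:.2f})")
```

For a skewed posterior, the mean, median, and MAP would all differ, which is exactly why it matters which summary you report.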
So, this distribution over the quantity of interest is called the posterior. From this standpoint, we can talk about the Bayesian way of thinking. We start out with a belief about the world: this is our prior. We then collect data and run it through a model; that's the modeling step, which we'll cover in the case study. Then we use the model to update our belief about the world, taking the posterior and using it as the prior for a new analysis. This cycle is the basic loop of Bayesian statistics: take a belief, update it with data, make the posterior the new prior, and keep going around the circle until we converge on what we think. So the steps of a Bayesian analysis are: one, establish a belief, which for us was that IQs are distributed normally with a mean of 100 and a standard deviation of 10; two, collect data, which for us meant going up to a bunch of people and asking for their IQ scores; three, update the belief to get a posterior; and four, repeat steps two and three, collecting more data, until one of two things happens: either I reach the resolution I want, or I run out of people to ask. Then I can use my posterior to answer all kinds of questions about my belief about the average IQ of U of M students. So, Bayesian methods and posteriors: let's take a step back and ask why someone would want to do this. The first reason is that every question about our belief in the quantity of interest can be answered via the posterior. Combine that with a loss function for optimal decision-making, and we can really use this to build a better model of the world. This is a very powerful idea.
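The repeat-until-resolved loop in the steps above can be sketched directly, again using the conjugate Normal update with an assumed observation noise sd of 10 and a made-up resolution target (a belief sd of 4 IQ points):

```python
mu, tau = 100.0, 10.0                      # step 1: establish a belief
people = iter([125, 115, 115, 120, 125, 117])  # hypothetical campus sample

while tau > 4.0:                           # stop at the resolution we want...
    x = next(people, None)
    if x is None:                          # ...or when we run out of people
        break
    # steps 2-3: collect one datum and update the belief (sigma=10 assumed)
    prec = 1 / tau**2 + 1 / 10.0**2
    mu = (mu / tau**2 + x / 10.0**2) / prec
    tau = prec ** -0.5

print(f"final belief: Normal({mu:.2f}, {tau:.2f})")
```

With this particular sample, the loop happens to exhaust all six people just as the belief crosses the resolution threshold; a tighter target would have left us waiting for more data.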
Now that we have distributions and not just point estimates, we can combine them with ways of optimizing loss to really get at this idea of decision-making under uncertainty. However, this richer interpretation of probability has some downsides. We have to rewrite our entire definition of probability to make it work, and mathematically the process can be quite difficult. It can sometimes even be intractable, and at the very least it is usually pretty computationally expensive, depending on the model. So, as I said, we need to change our view of probability; it requires us to flip probability on its head. Frequentists, following what we'd been doing originally, view probability as a long-run limiting frequency. Parameters are fixed, unknowable constants, and the data itself is random. There is some mu; in a frequentist scenario, I have a bunch of data and I try to estimate mu as best I can, and my best estimate is called x-bar. Bayesians, in contrast, view probability as a degree of belief. They treat the data as fixed and posit that the parameters are random variables. So what does this look like? Given the same data, in a Bayesian setting I don't believe there is a single mu; I believe there is an entire distribution over it, and we try to infer that distribution using the data. In a frequentist analysis, mu is an unknown constant that we try to estimate. In a Bayesian analysis, mu is a belief that we keep updating with new data. This is a very different way of thinking about the world. Nothing is free, though: these enhanced interpretations are sometimes very difficult mathematically. Unless you work with very specific priors, the math can get very hard very quickly, involving very high-dimensional integrals and serious computational problems.
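When those integrals have no closed form, sampling methods approximate the posterior instead. As a toy illustration (not from the lecture), here is a minimal random-walk Metropolis sampler applied to the same IQ model, a case that does not actually need sampling, which makes it easy to check the answer against the conjugate result.

```python
import math
import random

random.seed(0)
data = [125, 115, 115, 120, 125, 117]   # hypothetical observations

def log_post(mu):
    """Log posterior up to a constant: Normal(100, 10) prior plus a
    Normal likelihood with assumed noise sd 10."""
    lp = -0.5 * ((mu - 100.0) / 10.0) ** 2
    lp += sum(-0.5 * ((x - mu) / 10.0) ** 2 for x in data)
    return lp

samples, mu = [], 100.0                 # start the chain at the prior mean
for _ in range(20000):
    prop = mu + random.gauss(0, 5)      # symmetric random-walk proposal
    # Accept with probability min(1, posterior ratio).
    if random.random() < math.exp(min(0.0, log_post(prop) - log_post(mu))):
        mu = prop
    samples.append(mu)

kept = samples[5000:]                   # discard burn-in
est = sum(kept) / len(kept)
print(f"sampled posterior mean: {est:.1f} (analytic answer is about 116.7)")
```

Real models use far more sophisticated samplers (and diagnostics), but the shape of the idea is the same: trade exact integration for a computationally expensive approximation.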
Some of these models are so complex that we have no choice but to use sampling methods to estimate the results, and as a consequence we sometimes have to dip our feet into computational methods just to get the fits at all. To illustrate why we would want to use these methods, and what this idea of flipping probability on its head actually means, we're going to walk through a case study. The focus will be on application, interpretation, and modeling, not on the theoretical side of Bayesian computation. The idea is to see what this looks like from a modeling standpoint, along with some of the advantages of a Bayesian model, while also acknowledging the downsides: these models can be quite difficult to fit, and fitting them can take a lot of time. Thank you so much for your time.