As we said, there are many different kinds of queries that one can use a graphical model to answer. Conditional probability queries are very commonly used, but another commonly used type of inference is called MAP inference. So what is MAP inference? MAP is shorthand for maximum a posteriori, and it's defined as follows. Here, we have evidence, an observation E = e over some set of variables E, and we have a query Y. And it turns out that, for computational reasons we're not going to discuss for the moment, it's important that the set Y is everything, all of the variables other than E.

So now, our task is to compute what's called the MAP assignment, which is the assignment, little y, to the variables Y that maximizes the conditional probability of the variables Y given the evidence. Now, for those of you who haven't seen the arg max notation: arg max is the y that gives the maximal value to this expression over here. Note that in some cases this maximizing value might not be unique. That is, there might be several different assignments, say y1 and y2, that give the exact same probability. And so the MAP assignment is not necessarily unique.

This has many applications, some of which we've discussed before. In the context of message decoding, for example, where we have a set of noisy bits that are passed over a noisy communication channel, what we often want is the most likely message that was transmitted. That is, an assignment to the transmitted bits that is the most likely given our evidence. In the context of image segmentation, we would like to take the pixels and figure out the most likely assignment of pixels to semantic categories. Both of these can be viewed as MAP assignment problems.

An important thing to understand about MAP is that it really is a different problem than conditional probability queries. To see that, let's look at the following very simple example over just two random variables. Here, we have a Bayesian network over the variables A and B, and if you multiply the two CPDs, you get the joint distribution shown over here. It's fairly immediate to see which assignment is the MAP assignment, because it has the highest probability. But can we get at the MAP assignment by looking separately at the variable A and at the variable B? If we look separately at the variable A, the most likely assignment to A is a1, whereas in the MAP assignment, the variable A takes the value a0. And so one can't look separately at the marginal over A and the marginal over B, and use those to infer the MAP assignment. The reason is that we're looking for a single assignment over all of the variables that together has the highest probability.

Unfortunately, just like conditional probability inference, this problem too is NP-hard. So again, let's formalize exactly which problem is NP-hard in this context. One example of an NP-hard problem in the context of MAP is simply finding the joint assignment with the highest probability. It's not the only NP-hard problem; here is another one: figuring out, for a given probabilistic graphical model and some threshold little p, whether there exists an assignment, little x, whose probability is greater than p. That problem, too, is NP-hard. So should we give up?
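To see this concretely, here is a minimal sketch in Python. The CPD numbers are hypothetical (the exact values on the lecture slide aren't reproduced in the transcript); they're chosen so that the joint MAP assignment disagrees with what you would get by maximizing each variable's marginal separately.

```python
import itertools

# A Bayesian network over A and B; these CPD values are hypothetical but
# reproduce the phenomenon from the lecture: the joint MAP assignment
# disagrees with the per-variable maxima.
P_A = {"a0": 0.4, "a1": 0.6}
P_B_given_A = {
    "a0": {"b0": 0.1, "b1": 0.9},
    "a1": {"b0": 0.5, "b1": 0.5},
}

# Joint distribution P(A, B) = P(A) * P(B | A).
joint = {
    (a, b): P_A[a] * P_B_given_A[a][b]
    for a, b in itertools.product(P_A, ("b0", "b1"))
}

# MAP assignment: the single (a, b) pair with the highest joint probability.
map_assignment = max(joint, key=joint.get)
print(map_assignment, joint[map_assignment])      # ('a0', 'b1'), ~0.36

# Maximizing each variable's marginal separately gives a different,
# lower-probability assignment.
best_a = max(P_A, key=P_A.get)                                # 'a1'
best_b = max(("b0", "b1"),
             key=lambda b: sum(joint[(a, b)] for a in P_A))   # 'b1'
print((best_a, best_b), joint[(best_a, best_b)])  # ('a1', 'b1'), 0.3
```

Here P(a1) = 0.6 beats P(a0) = 0.4, and P(b1) = 0.66 beats P(b0) = 0.34, yet the assignment (a1, b1) has probability 0.3, which is lower than the true MAP assignment (a0, b1) at 0.36.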
Well, just like in the context of conditional probability queries, the answer is no. There are algorithms that can solve this problem very efficiently, in fact, for an even broader set of problems than for conditional probability queries. So let's again look deeper into this problem and understand the foundations of what might make it tractable.

Going back to our example of a Bayesian network, once again we're going to view CPDs as factors. So here, P(C) again translates into a factor over C, just like before. Whereas in the case of conditional probability queries we wanted to sum out some of the variables, to marginalize them, now we're going to want to find the arg max, which is the assignment to these variables that maximizes the product. And so here we have a max of a product, which is why this is often called a max-product problem.

Let's break down the max-product problem. Imagine that, in the more general case, we have the probability of Y given E = e, and let's remind ourselves that Y is the set of all variables other than the ones in E. By the definition of conditional probability, this is the ratio P(Y, e) / P(e), and we're trying to find the y that maximizes this ratio. Notice that the denominator, P(e), is constant with respect to Y. Which means that, for the purpose of finding the maximizing y, we don't really care about the denominator, only the numerator, which is the probability of Y, E = e, the unnormalized measure. In the general case, this numerator is a product of factors normalized by the partition function, where the factors here are the reduced factors, reduced relative to the evidence. Once again, we notice that the partition function is constant relative to Y, which means that we can ignore it. And so, once again, this expression is proportional to a product of the same reduced factors. So what we're trying to do in this more general case, as well, is maximize a product of factors, which gives us the following optimization problem.

There are many algorithms that can solve the MAP problem. The first is analogous to algorithms for sum-product: these algorithms take the maximization operation and push it into the factor product, giving rise to a max-product variable elimination algorithm. We also have message-passing algorithms, which again are a direct analog of algorithms for sum-product, and they give rise to a class of algorithms called max-product belief propagation. However, because the MAP problem is intrinsically an optimization problem, one can also call in techniques that are specific to optimization. This class includes methods based on integer programming, which is a general framework for optimization over discrete spaces. It turns out that the class of methods that build on integer programming techniques has become one of the most popular in the last few years, and has given rise to a whole new range of MAP algorithms that are considerably better than many of the algorithms developed before that point, especially for the approximate case. It also turns out that for certain types of networks, some of which we'll discuss, there are specific algorithms that are very efficient for that particular class of graphical model.
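To make the max-product idea concrete, here is a minimal sketch of max-product variable elimination on a three-variable chain A - B - C with two binary factors. The factor values are made up for illustration; the point is how the max over A is pushed inside the factor product, and how remembering each arg max lets us trace back the full joint assignment.

```python
import itertools

# Two hypothetical factors over a chain A - B - C, all variables binary.
phi1 = {(0, 0): 5.0, (0, 1): 1.0, (1, 0): 2.0, (1, 1): 4.0}  # phi1(A, B)
phi2 = {(0, 0): 3.0, (0, 1): 2.0, (1, 0): 1.0, (1, 1): 6.0}  # phi2(B, C)

# Eliminate A by maximization: tau(b) = max_a phi1(a, b), keeping the
# arg max for each b so we can trace back the full assignment later.
tau, best_a = {}, {}
for b in (0, 1):
    best_a[b] = max((0, 1), key=lambda a: phi1[(a, b)])
    tau[b] = phi1[(best_a[b], b)]

# Maximize the remaining product tau(b) * phi2(b, c) over (b, c),
# then trace back the maximizing value of A.
b_star, c_star = max(itertools.product((0, 1), repeat=2),
                     key=lambda bc: tau[bc[0]] * phi2[bc])
a_star = best_a[b_star]
print((a_star, b_star, c_star))  # (1, 1, 1): the joint MAP assignment
```

Note that the max over A was computed once per value of B, rather than once per joint assignment (a, b, c); this is exactly the savings that max-product variable elimination exploits, in direct analogy to pushing sums into products for sum-product.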
One of those network-specific methods, perhaps the most commonly used, is based on a class of algorithms called graph cuts. And finally, because this is an optimization problem, one can also use standard search techniques over combinatorial search spaces, and there are problems for which this, too, is a very useful and successful solution.

So, to summarize: the MAP problem aims to find a single coherent assignment of highest probability. That means it is not the same as maximizing individual marginal probabilities, as we saw in the example. One can reformulate this problem as one of finding the max over a factor product. And this is a combinatorial optimization problem which admits a whole range of different solutions, of which some are exact and others approximate.
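To close with a concrete instance of the graph-cut approach mentioned above, here is a minimal sketch for one special case where it applies exactly: a binary pairwise MRF with attractive (submodular) potentials, P(x) proportional to exp(-E(x)). The numbers are hypothetical, and the construction assumes the networkx library; the MAP assignment corresponds to a minimum s-t cut in the constructed graph.

```python
import networkx as nx  # assumes networkx is installed

# Hypothetical binary pairwise MRF: unary costs per node, plus a cost
# lam paid whenever two neighboring nodes disagree (attractive potentials).
unary = {                  # unary[i] = (cost of x_i = 0, cost of x_i = 1)
    "x1": (0.0, 2.0),
    "x2": (1.0, 0.5),
    "x3": (3.0, 0.0),
}
pairwise = {("x1", "x2"): 1.5, ("x2", "x3"): 1.5}

G = nx.DiGraph()
for i, (cost0, cost1) in unary.items():
    G.add_edge("s", i, capacity=cost1)  # cut iff i ends up with label 1
    G.add_edge(i, "t", capacity=cost0)  # cut iff i ends up with label 0
for (i, j), lam in pairwise.items():
    G.add_edge(i, j, capacity=lam)      # cut iff i and j disagree
    G.add_edge(j, i, capacity=lam)

cut_value, (source_side, _) = nx.minimum_cut(G, "s", "t")
# Nodes on the source side get label 0; the rest get label 1.
assignment = {i: (0 if i in source_side else 1) for i in unary}
print(assignment, cut_value)  # {'x1': 0, 'x2': 1, 'x3': 1}, energy 2.0
```

The minimum cut value equals the minimum energy, so this recovers the exact MAP assignment in polynomial time for this class of model, which is why graph-cut methods are so popular for tasks like the image segmentation example earlier.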