So before we get into the details of probabilistic graphical models, we need to talk a little bit about what a probability distribution is, just so we have a shared vocabulary. Let's start with a very simple example of a joint distribution, one that is going to be extended in examples in other parts of the course.

And let's start with an example that involves just three random variables. This is what I call the student example: you have a student who can be described, in this case, by a variable I representing his intelligence, which could be high or low. The student is taking a class, and the class might be difficult or not, so that's the random variable D. So the random variable I has two values, and the difficulty variable also has two values,

and then there is the grade that the student gets in the course, and that has three values; in this case, we're going to assume A, B, and C. Now, here's an example of a joint distribution over this set of three random variables. So this is an example of P(I, D, G). It's a joint distribution.

And let's think about how many entries are in such a joint distribution.

Well, since we have three variables, we need to represent the probability of every combination of values for these three variables, and so we have 2 * 2 * 3 = 12 possible combinations, for a total of twelve possible values that we need to assign a probability to. So there are twelve total parameters in this probability distribution. And I'm going to introduce the notion of independent parameters, which we're going to talk about later as well. Independent parameters are parameters whose value is not completely determined by the values of the other parameters.

So in this case, because this is a probability distribution, we know that all of the numbers on the right have to sum to one. And therefore, if you tell me eleven out of the twelve, I know what the twelfth is, and so the number of independent parameters is eleven. We'll see that this is a useful notion later on, when we start evaluating the relative expressive power of different probability distributions.
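As a quick sketch of this counting argument, we can store a joint distribution like P(I, D, G) as a three-dimensional array. The slide's actual probabilities aren't in the transcript, so the numbers here are arbitrary values normalized to sum to one; only the shape and the parameter counts match the lecture.

```python
import numpy as np

# Hypothetical joint distribution P(I, D, G) for the student example.
# I has 2 values, D has 2 values, G has 3 values.
rng = np.random.default_rng(0)
joint = rng.random((2, 2, 3))     # arbitrary stand-in numbers
joint /= joint.sum()              # enforce the sum-to-one constraint

print(joint.size)                 # 12 total parameters (2 * 2 * 3)
print(joint.size - 1)             # 11 independent parameters
```

The last entry of the table is determined by the other eleven plus the sum-to-one constraint, which is exactly why the count of independent parameters is one less than the total.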

What are things that we can do with probability distributions?

Well, one important thing that we can do is condition the probability distribution on a particular observation. So, for example, assume that we observe that the student got an A, and so we now have an assignment to the variable G, which is g1. That immediately eliminates all possible assignments that are not consistent with my observation; so everything but the g1 entries goes away, okay?

And so that gives me a reduced distribution. This is an operation that's called reduction: I've taken the probability distribution and reduced away the stuff that is not consistent with what I've observed. Now, that by itself doesn't give me a probability distribution, because notice that these numbers no longer sum to one; they summed to one before I threw out a bunch of stuff. So what I have at this point is an unnormalized measure.

The word "measure" indicates that it's a form of distribution, but the fact that it's unnormalized means that it doesn't sum to one. So if we want to turn this unnormalized measure into a probability distribution, the obvious thing to do is to normalize it. And so what we're going to do is take all of these entries and sum them up.

That's going to give us a number, which in this case is 0.447, and we can now divide each of these entries by 0.447. That gives us a normalized distribution, which in this case corresponds to the probability of I, D given g1.

So that's a way of taking an unnormalized measure and turning it into a normalized probability distribution.
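The two steps just described, reduction followed by normalization, can be sketched as array operations. The numbers below are arbitrary stand-ins; the 0.447 in the lecture comes from the slide's specific table, which isn't reproduced in the transcript.

```python
import numpy as np

# Hypothetical joint distribution P(I, D, G), arbitrary numbers.
rng = np.random.default_rng(0)
joint = rng.random((2, 2, 3))
joint /= joint.sum()

g1 = 0                            # observe the grade A, i.e. G = g1
reduced = joint[:, :, g1]         # reduction: keep only entries consistent with g1

z = reduced.sum()                 # normalizing constant (plays the role of 0.447)
conditional = reduced / z         # normalization: P(I, D | g1)

print(round(conditional.sum(), 6))   # sums to 1.0 after normalizing
```

Note that `reduced` alone is the unnormalized measure: its entries are a subset of the joint's entries and do not sum to one until we divide by their total.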

We'll see that this operation is one of the more important ones that we will be using throughout the course. Okay, the final operation I'm going to

talk about regarding probability distributions is the operation of marginalization, and that is an operation that takes a probability distribution over a larger set of variables and produces a probability distribution over a subset of those. So in this case, we have a probability distribution over I and D, and say that we want to marginalize I, which means we're basically going to sum it out, restricting attention to D. And so what that does is, for example, if I want to compute the probability of d0, I'm going to add up both of the entries that have d0 associated with them: the one corresponding to i0, and the one corresponding to i1. And that's the marginalization of this probability distribution.
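Marginalization is likewise just a sum along one axis of the table. As before, the numbers here are arbitrary; only the operation itself matches the lecture.

```python
import numpy as np

# Hypothetical distribution P(I, D): I indexes axis 0, D indexes axis 1.
rng = np.random.default_rng(0)
p_id = rng.random((2, 2))
p_id /= p_id.sum()                # normalize to a proper distribution

p_d = p_id.sum(axis=0)            # marginalize I: for each d, add the i0 and i1 entries

# P(d0) is exactly the i0 entry plus the i1 entry for d0:
print(np.isclose(p_d[0], p_id[0, 0] + p_id[1, 0]))   # prints True
```

The result `p_d` is itself a probability distribution over D alone: summing out a variable preserves the sum-to-one property.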