One of the most generally useful classes of sampling methods, and one that's very commonly used in practice, is the class of Markov chain Monte Carlo methods. These are methods that allow us to design an intuitive sampling process that, through a sequence of steps, generates a sample from a desired target distribution that might be intractable to sample from directly.

So what are these Markov chains, and how do we use them?

So, a Markov chain is a method for sampling from a distribution p that is intractable to sample from directly. And we've seen examples of such distributions p: if you'd like to sample from, say, a Bayesian network where we have some evidence, we don't really know how to do that; if you'd like to sample from a Markov network, we don't really know how to do that either, in general. So here we have examples of distributions p, and we'd like to come up with a way of generating even one sample from such a distribution p. A Markov chain gives us a general mechanism for doing that. What the Markov chain does is define an iterative process by which the first sample you generate is not going to be from the distribution p, but ultimately, as you move along, you get closer and closer to generating a sample from p.

So let's understand what that means. A Markov chain is defined over a state space, which we're going to use x's to denote. And here is an example of such a state space. It's a very simple one: it has the point zero in the middle, four negative numbers to the left, and four positive numbers to the right.

And a Markov chain defines a probabilistic transition model which, given that I'm at a given state x, tells me how likely I am to transition to a different state x'. That is a probability distribution, as indicated in this formula here: for any x, the sum over all states x' of the probability of transitioning to x' is exactly one.

So, for example, in this case we have our little grasshopper that starts out at state zero. We can see that it has a probability of 0.25 of transitioning to the right, a probability of 0.25 of transitioning to the left, and a probability of 0.5 of not making any progress and staying exactly where it is. And in fact, that same general transition distribution is replicated across the different states in the chain, with the exception of the states at the ends: if the poor grasshopper tries to go to the left when it's at negative four, it's going to hit the wall and end up staying where it is regardless. And so the probability of staying at negative four is actually 0.75, corresponding to the two different cases that we just talked about. But anyway, this is a Markov chain, and you can imagine simulating a random process by which a grasshopper traverses this chain. Initially, it starts out at, say, state zero. Then, as it moves, it chooses to move left with probability one quarter, to move right with probability one quarter, and to stay in the same place with probability 0.5. Once it has moved to its new state, it does exactly the same thing from there.
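The grasshopper chain just described can be sketched in code. This is a minimal illustration, not part of the lecture; the helper names (`transition_probs`, `step`) and the explicit state range -4..4 are assumptions made here for concreteness.

```python
import random

# States are -4..4. Interior states: move left with 0.25, right with 0.25,
# stay with 0.5. At the two walls, the blocked move collapses into staying,
# so the stay probability there becomes 0.75.
def transition_probs(x):
    probs = {}
    left = x - 1 if x > -4 else x    # bouncing off the wall means staying
    right = x + 1 if x < 4 else x
    for s, p in ((left, 0.25), (right, 0.25), (x, 0.5)):
        probs[s] = probs.get(s, 0.0) + p
    return probs

# Sample one move of the grasshopper from state x.
def step(x, rng=random):
    r = rng.random()
    total = 0.0
    for x_next, p in transition_probs(x).items():
        total += p
        if r < total:
            return x_next
    return x

# Every row of the transition model is a distribution: it sums to 1.
assert all(abs(sum(transition_probs(x).values()) - 1.0) < 1e-12
           for x in range(-4, 5))
```

Simulating a trajectory is then just repeated calls to `step`, starting from state 0.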

So let's think about the temporal dynamics of this type of process. We can ask: what is the probability that at time step t+1 the state is equal to some value x'? We can get that by a recurrence relation that looks at the state at time t. If we had previously computed the probability distribution over where the grasshopper might be at time t, we can sum over all possible states x where the grasshopper might have been and ask: if it was at state x at time t, what is the probability that it ends up going to x'? Together, this gives me a distribution over pairs (x, x'), which measures the probability that at time t the grasshopper is at state x and at time t+1 it is at x'. And since I'm only interested now in asking about time t+1, I sum out, or marginalize, the time-t state x. So, going back to our grasshopper example, we can simulate this process. Here are the first few steps of it.
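The recurrence just described — the probability of being at x' at time t+1 is the sum over x of the probability of being at x at time t, times the transition probability from x to x' — can be sketched as follows. A minimal sketch assuming the grasshopper chain over states -4..4; `transition_probs` and `propagate` are hypothetical helper names, not from the lecture.

```python
# Grasshopper chain: interior states move left 0.25, right 0.25, stay 0.5;
# at the two walls the blocked move collapses into staying (0.75).
def transition_probs(x):
    probs = {}
    left = x - 1 if x > -4 else x
    right = x + 1 if x < 4 else x
    for s, p in ((left, 0.25), (right, 0.25), (x, 0.5)):
        probs[s] = probs.get(s, 0.0) + p
    return probs

# One step of the recurrence:
#   P_{t+1}(x') = sum_x P_t(x) * T(x -> x')
def propagate(dist):
    nxt = {s: 0.0 for s in range(-4, 5)}
    for x, p in dist.items():
        for x_next, t in transition_probs(x).items():
            nxt[x_next] += p * t
    return nxt

p0 = {s: 1.0 if s == 0 else 0.0 for s in range(-4, 5)}  # start at state 0
p1 = propagate(p0)   # quarter at -1, quarter at +1, half still at 0
p2 = propagate(p1)
```

Propagating twice reproduces the numbers worked out by hand next: for instance, `p2[-2]` comes out to 0.25 squared.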

So at time zero, the grasshopper is at state zero with probability one. At time one, it's at negative one with probability a quarter, at positive one with probability a quarter, and with probability a half it's stuck at the same state. And now we can simulate the next step. At the next time step, the probability that it manages to move all the way to -2 is 0.25 squared, because that requires two successful moves to the left; similarly, two successful moves to the right get it to +2. At state zero you have a sum of different events: the probability that it stayed in the same state twice, which is 0.5 squared, plus the two events that correspond to moving to the left once and then back to the right, or moving to the right and then back to the left, each of which happens with probability 0.25 squared. So this is an example of how you would do this. Now it turns out that for many of these chains, and we'll describe the conditions in a moment, as the process evolves, the probability distribution equalizes, which means that the probability that at time t you're at state x' is almost the same as the probability that at time t+1 you're at x'. So the distribution over where you might be tends to equalize. And so we can then consider what's called the limiting process: the limiting distribution that you would get as you simulate the process for more and more steps. That is typically denoted by pi, which is called the stationary distribution. It also has a bunch of other names.

But stationary is the most common. And if you plug in, basically taking that approximate equality to be exact, you can see that we get a condition on pi: the probability of x' under the stationary distribution needs to be equal to this summation over here, pi(x') = sum over x of pi(x) times the transition probability from x to x', where pi now appears both on the left-hand side and within the summation on the right-hand side.
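One way to see the stationary distribution concretely is simply to iterate the process until it stops changing. A minimal sketch, again assuming the grasshopper chain over states -4..4 with hypothetical helper names; since this particular chain's transition model is symmetric, its stationary distribution works out to be uniform, 1/9 per state.

```python
# Grasshopper chain: left 0.25, right 0.25, stay 0.5 (stay 0.75 at the walls).
def transition_probs(x):
    probs = {}
    left = x - 1 if x > -4 else x
    right = x + 1 if x < 4 else x
    for s, p in ((left, 0.25), (right, 0.25), (x, 0.5)):
        probs[s] = probs.get(s, 0.0) + p
    return probs

# P_{t+1}(x') = sum_x P_t(x) * T(x -> x')
def propagate(dist):
    nxt = {s: 0.0 for s in range(-4, 5)}
    for x, p in dist.items():
        for x_next, t in transition_probs(x).items():
            nxt[x_next] += p * t
    return nxt

# Iterate until pi(x') = sum_x pi(x) T(x -> x') holds to numerical precision.
pi = {s: 1.0 if s == 0 else 0.0 for s in range(-4, 5)}
for _ in range(2000):
    pi = propagate(pi)
# pi is now (numerically) stationary: propagating once more leaves it unchanged.
```

Checking `propagate(pi)` against `pi` after the loop confirms the stationarity condition numerically.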