Hey. Let us understand how to train the PLSA model.

So, just to recap,

this is a topic model that predicts words in documents by a mixture of topics.

So we have some parameters in this model.

We have two kinds of probability distributions,

phi parameters stand for probabilities of words in topics,

and theta parameters stand for probabilities of topics in documents.
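
In symbols (using the standard PLSA notation, which may differ slightly from the slides; w is a word, d a document, t a topic), the model says:

```latex
p(w \mid d) \;=\; \sum_{t} p(w \mid t)\, p(t \mid d) \;=\; \sum_{t} \phi_{wt}\, \theta_{td}
```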

Now, you have your probabilistic model of data,

and you have your data.

How do you train your models?

So, how do you estimate the parameters?

Likelihood maximization is something that always helps us.

So the top line of this slide is the log-likelihood of our model,

and we need to maximize it with respect to our parameters.

Now, let us make some modifications to this formula.

So first, let us apply logarithm,

and we will have a sum of logarithms instead of the logarithm of a product.

Then, let us just get rid of the probability of the document, because

the probability of the document does not depend on our parameters;

in fact, our model does not even attempt to model these probabilities.

So we just forget about them.

What we care about is the probabilities of words in documents.

So we substitute them with the sum over topics.

So this is what our model says.
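
Putting these steps together (writing n_dw for the number of occurrences of word w in document d, a standard notation that may differ from the slide), the objective becomes:

```latex
\sum_{d} \sum_{w \in d} n_{dw} \ln p(w \mid d)
  \;=\; \sum_{d} \sum_{w \in d} n_{dw} \ln \sum_{t} \phi_{wt}\, \theta_{td}
  \;\to\; \max_{\phi,\,\theta}
```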

Great. So that's it.

And we want to maximize this likelihood,

and we need to remember about constraints.

So our parameters are probabilities.

That's why they need to be non-negative,

and they need to be normalized.
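
So the constraints are the usual ones for probability distributions:

```latex
\phi_{wt} \ge 0, \quad \sum_{w} \phi_{wt} = 1 \ \ \text{for each topic } t;
\qquad
\theta_{td} \ge 0, \quad \sum_{t} \theta_{td} = 1 \ \ \text{for each document } d.
```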

Now, you can notice that this term that we need to maximize is not very nice.

We have a logarithm of a sum,

and it is not at all clear how to maximize something this ugly.

But fortunately, we have the EM-algorithm;

you may have heard about this algorithm in another course in our Specialization.

But now, I just want to approach this algorithm intuitively.

So let us start with some data.

So we are going to train our model on plain text.

So this is everything of what we have.

Now, let us remember that we know the generative model.

So we assume that every word in this text has

exactly one topic that was drawn when the model decided which word would come next.

So let us pretend, just for a moment,

just for one slide,

that we know these topics.

So let us pretend that we know that the words sky, raining,

and clear up come from, say, topic number 22, and that's it.

So we know these assignments.

How would you then calculate the probabilities of words in topics?

So you know you have four words for this topic,

and you want to calculate the probability of sky, let's say.

This is how you do it.

You just say, "Well,

I like one word out of these four words.

So the probability will be one divided by four."

By n_wt here, I denote the count of

how many times this particular word was assigned to this particular topic.

So, can you imagine how we would estimate the probabilities of

topics in this document for this colorful case?

Well, it's just the same.

So we know that we have four words about this red topic,

and we have 54 words in our document,

that's why we have this probability for this example.
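
As a tiny sketch of these hard-count estimates (the counts come from the slide's example; the variable names are made up):

```python
# Hard-count estimates, assuming we knew every word's topic assignment
# (as we pretend on the "colorful" slide). All names here are hypothetical.

# n_wt: how many times each word was assigned to our topic of interest.
n_wt = {"sky": 1, "raining": 1, "clear": 1, "up": 1}

# phi for "sky": one word out of these four words.
phi_sky = n_wt["sky"] / sum(n_wt.values())
print(phi_sky)  # 0.25

# theta for this topic: 4 words of the topic out of 54 words in the document.
n_td, n_d = 4, 54
theta_t = n_td / n_d
print(round(theta_t, 3))  # 0.074
```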

Well, unfortunately, life is not like this.

We do not know these colorful topic assignments.

What we have is just plain text. And that's a problem.

But, can we somehow estimate those assignments?

Can we somehow estimate the probabilities of the colors for every word?

Yes, we can. So, the Bayes rule helps us here.

What we can do is say that we need the probabilities of topics for each word

in each document, and then apply the Bayes rule and the product rule.

So, to understand this,

I just advise you to forget about d in all these formulas,

and then everything will be very clear.

So we just apply these two rules,

and we get some estimates for probabilities of

our hidden variables, probabilities of topics.
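
Written out, the estimate that the Bayes rule and the product rule give us is:

```latex
p(t \mid d, w)
  \;=\; \frac{p(w \mid t)\, p(t \mid d)}{p(w \mid d)}
  \;=\; \frac{\phi_{wt}\, \theta_{td}}{\sum_{s} \phi_{ws}\, \theta_{sd}}
```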

Now, it's time to put everything together.

So, we have the EM-algorithm, which has two steps: the E-step and the M-step.

The E-step is about estimating the probabilities of hidden variables,

and this is what we have just discussed.

The M-step is about those updates for the parameters.

So we have discussed it for the simple case when we know the topic assignments exactly.

Now, we do not know them exactly.

So, it is a bit more complicated to compute the n_wt counts.

This is no longer just how many times the word is connected with this topic,

but it is still doable.

So, we just take the words,

we take the counts of the words,

and we weight them with the probabilities that we know from the E-step.

And that's how we get some estimates for NWT.

So this is not an integer counter anymore.

It becomes a float-valued variable that still has the same meaning,

still has the same intuition.
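
A minimal sketch of these soft counts (all numbers and names are made up for illustration): n_dw holds the raw word counts, p_tdw the E-step probabilities for one fixed topic.

```python
# Soft counts for the M-step: instead of counting hard assignments, we
# weight each word count by the E-step probability p(t | d, w).
# All data here is hypothetical.

n_dw = {("d1", "sky"): 2, ("d2", "sky"): 1}       # raw counts of the word per document
p_tdw = {("d1", "sky"): 0.9, ("d2", "sky"): 0.5}  # p(our topic | d, w) from the E-step

# Soft n_wt for the word "sky" and this topic: 2 * 0.9 + 1 * 0.5 = 2.3
n_wt_sky = sum(n_dw[key] * p_tdw[key] for key in n_dw)
print(n_wt_sky)
```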

So, the EM-algorithm is a super powerful technique,

and it can be used any time when you have your model,

you have your observable data,

and you have some hidden variables.

So, these are all the formulas that we need for now.

You just need to understand that to build your topic model,

you need to repeat those E-step and M-step updates iteratively.

So, you scan your data,

you compute probabilities of topics using your current parameters,

then you update parameters using

your current probabilities of topics and you repeat this again and again.

And this iterative process converges, and hopefully,

you will get your nice topic model trained.
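
Putting the two steps together, a toy PLSA trainer might look like this. This is only a sketch under simplifying assumptions (plain dictionaries, a fixed number of iterations, no convergence check), not the lecturer's code; the corpus and all names are made up.

```python
import random


def normalize(counts):
    """Turn non-negative counts into a probability distribution."""
    total = sum(counts.values())
    return {key: value / total for key, value in counts.items()}


def train_plsa(n_dw, words, docs, n_topics, n_iters=50, seed=0):
    """Toy PLSA trainer via EM; n_dw[(d, w)] holds raw word counts."""
    rng = random.Random(seed)
    # Random normalized initialization of phi (words given topics)
    # and theta (topics given documents).
    phi = {t: normalize({w: rng.random() for w in words}) for t in range(n_topics)}
    theta = {d: normalize({t: rng.random() for t in range(n_topics)}) for d in docs}

    for _ in range(n_iters):
        # E-step: p(t | d, w) from the current parameters via the Bayes rule.
        p_tdw = {}
        for (d, w) in n_dw:
            z = sum(phi[t][w] * theta[d][t] for t in range(n_topics))
            p_tdw[(d, w)] = {t: phi[t][w] * theta[d][t] / z for t in range(n_topics)}

        # M-step: accumulate soft counts n_wt and n_td, then renormalize.
        n_wt = {t: {w: 0.0 for w in words} for t in range(n_topics)}
        n_td = {d: {t: 0.0 for t in range(n_topics)} for d in docs}
        for (d, w), cnt in n_dw.items():
            for t in range(n_topics):
                n_wt[t][w] += cnt * p_tdw[(d, w)][t]
                n_td[d][t] += cnt * p_tdw[(d, w)][t]
        phi = {t: normalize(n_wt[t]) for t in range(n_topics)}
        theta = {d: normalize(n_td[d]) for d in docs}

    return phi, theta


# A tiny made-up corpus, just to show the call.
docs = ["d1", "d2"]
words = ["sky", "rain", "cat", "dog"]
n_dw = {("d1", "sky"): 3, ("d1", "rain"): 2, ("d2", "cat"): 4, ("d2", "dog"): 1}
phi, theta = train_plsa(n_dw, words, docs, n_topics=2)
```

After training, each phi[t] and theta[d] is a normalized distribution, which is exactly the constraint we required above.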