And this really forms the basis for what we call the baseline predictor.

What we do is start with the raw average, which, remember from last time,

was just 3.5. Then we add in a bias for the user and we add in a bias for

the movie.
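
Written out, the baseline predictor described here might look like the following. The symbols are notation assumed for this sketch, not taken from the lecture: \hat{r}_{ui} is the predicted rating of user u for movie i, \bar{r} is the raw average, b_u is the user bias and b_i is the movie bias.

```latex
\[
\hat{r}_{ui} = \bar{r} + b_u + b_i, \qquad \bar{r} = 3.5
\]
```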

So the biases are going to be positive if the movie is better received or if

the user tends to be a more lenient critic. And they're going to be negative

if the movie tends to be worse received or if the critic tends to be more harsh.

And so we have to find the bias for each of these users and movies.

That's the whole idea here: we really need to find those bias values.

Finding the biases is somewhat of an art, actually. There is probably a

quote-unquote right way to do it, which would be to solve a complicated

optimization problem, but we're not dealing with any calculus or linear

algebra in this course, so we won't go that far.

We will just look at a simpler approach to doing this.

So we need to find the bias values, and we'll try to do something intuitive.

Why don't we just compare each user's or movie's own average rating to the

mean of the overall data set? Again, remember that the overall mean was 3.5.

So let's see how much higher or lower each of those averages is than 3.5,

and that's how we find the bias.

So we take the average of that user's or movie's ratings.

Maybe we'll start with the harsh critic, D. We take the average of D's

values, 2, 3, 1 and 2, so that's 2 plus 3 plus 1 plus 2, divided by 4.

Again, remember we're not including anything that is in the test set when we

work these biases out. That is 8 divided by 4, which is 2.

But then we actually have to subtract out the mean, because we want the bias

relative to the overall mean. So we can just write up here that this is

really minus 3.5, and we get 2 minus 3.5, which is negative 1.5.

So that's the bias, and that should make sense because D is a harsh critic:

his average rating is 1.5 below the overall average of 3.5.
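
The arithmetic for D can be sketched in a few lines of Python. This is only an illustration of the simple approach walked through above; the function name `bias` and the constant `OVERALL_MEAN` are hypothetical names chosen for this sketch.

```python
OVERALL_MEAN = 3.5  # raw average of the training ratings, as stated in the lecture


def bias(training_ratings, overall_mean=OVERALL_MEAN):
    """Return how far the average of these ratings sits above or below the overall mean."""
    return sum(training_ratings) / len(training_ratings) - overall_mean


# Critic D's four training ratings from the walkthrough above.
d_ratings = [2, 3, 1, 2]
bias_d = bias(d_ratings)
print(bias_d)  # -1.5
```

A negative result means the critic rates below the overall average, and a positive result means above it.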

So that was for D. Now let's try the good movie, movie 3, so we take the

average of the values again. We add up 4 plus 5 plus 3 plus 5, and there are

four of those values. This one we don't have, and this one is in the test

set, so we don't use it. So this comes out to be 17 over 4, which is 4.25.

And then, remember, we subtract 3.5 from all of these values.

So this is 4.25 minus 3.5, which gives us positive 0.75.

So for D we have a negative value, negative 1.5, and for movie 3 we have a

positive value, 0.75, which is reasonably far above zero. That is what we

expected, because we expected that movie 3 was a much better movie than the

others. And the key idea here, again, is that you can't use the test set.

So the test set has to go.
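
One way to make the "test set has to go" rule concrete is to mark every entry that is missing or held out for testing and skip it when averaging. The sketch below assumes such entries are stored as `None`. The function name is hypothetical, and the positions of the `None` entries in the movie 3 list are illustrative only, not taken from the lecture's table.

```python
OVERALL_MEAN = 3.5  # raw average of the training ratings, as stated in the lecture


def bias_excluding_unavailable(ratings, overall_mean=OVERALL_MEAN):
    """Bias computed from training ratings only.

    None marks an entry that is either missing or held out in the test set.
    """
    usable = [r for r in ratings if r is not None]
    if not usable:
        raise ValueError("no training ratings available")
    return sum(usable) / len(usable) - overall_mean


# Movie 3: the four training ratings 4, 5, 3, 5, plus one missing entry and
# one test-set entry (their positions in the list are an assumption).
movie3_ratings = [4, 5, None, 3, None, 5]
bias_movie3 = bias_excluding_unavailable(movie3_ratings)
print(bias_movie3)  # 0.75
```

Because the held-out entries never enter the average, the RMSE measured on the test set later is not contaminated by ratings the predictor has already seen.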

We're not using it in these prediction schemes because that's what we're

going to test the RMSE on. And so we can work the rest of the values out,

and note that I've given the values at the ends of the columns and rows

here. So the bias for A is positive 0.83, for B it's 0.5, and for C it's

negative 1. It's really easy to see how we could do C, for instance: there

are only two values, a 2 and a 3, so we just do 2 plus 3, divided by 2,

minus 3.5. That is 5 divided by 2, which is 2.5, minus 3.5, which is

negative 1.0. And you can verify the rest of these also.

And then on the movie side, again, movie 2 is actually a pretty negatively

rated movie: we take 3 plus 2 plus 2, divided by 3, and then subtract out

the mean of 3.5 from that.
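
Once user and movie biases are available, they combine with the raw average as described at the start of this section. The sketch below plugs in the two biases worked out earlier, D's and movie 3's. The function name `baseline_predict` is hypothetical, and pairing D with movie 3 is purely illustrative, since the lecture does not say here whether that rating is in the test set.

```python
OVERALL_MEAN = 3.5  # raw average of the training ratings, as stated in the lecture


def baseline_predict(user_bias, movie_bias, overall_mean=OVERALL_MEAN):
    """Baseline prediction: raw average plus the user bias plus the movie bias."""
    return overall_mean + user_bias + movie_bias


bias_d = -1.5       # harsh critic D, computed earlier
bias_movie3 = 0.75  # well-received movie 3, computed earlier

prediction = baseline_predict(bias_d, bias_movie3)
print(prediction)  # 2.75
```

The harsh critic pulls the prediction down by 1.5 and the good movie pushes it up by 0.75, so the result lands below the overall average of 3.5.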