0:22

And a vector called x, where x is n by 1, and

z is n by 2, and z looks like this: j n1 and

then an n2 vector of 0's,

and then an n1 vector of 0's and then j n2. So

the Z matrix looks like the two-way ANOVA matrix from previously, but

we've appended an x onto it as well, an x covariate.

So, this is the example, if we do least squares with this W.

We are interested in fitting models where we have a regression line.
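As a quick sketch of this setup in numpy (the group sizes n1 and n2 here are made up for illustration):

```python
import numpy as np

# Hypothetical group sizes, just for illustration.
n1, n2 = 4, 6
n = n1 + n2

# Two-way ANOVA design matrix Z: j_n1 stacked on an n2 vector of 0's
# in the first column, and an n1 vector of 0's stacked on j_n2 in the second.
Z = np.zeros((n, 2))
Z[:n1, 0] = 1.0
Z[n1:, 1] = 1.0

# Append a covariate x to get the ANCOVA design matrix W = [Z, x].
rng = np.random.default_rng(0)
x = rng.normal(size=n)
W = np.column_stack([Z, x])
```

Least squares with this W then fits a separate intercept for each group plus a common slope on x.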

Â 1:53

Let me just call it mu, the vector of mu's.

Okay, so we can write it out like that and

then let's figure out what this works out to be.

So, let's use our standard trick where we hold beta fixed and

we come up with the estimate for mu as it depends on beta.

Well, if beta's held fixed, that's just a vector.

And this is just a two-way ANOVA problem that we discussed previously.

Remember that the two-way ANOVA problem

Â 2:36

So, that has to be y1 bar, the group one mean for the y's, minus x1 bar times beta.

And then mu 2, the mean for group two, as it depends on beta,

has to be y2 bar minus x2 bar times beta.

Now, if I were to plug those back in for mu 1 and mu 2, into here.

Â 3:06

What I get is nothing other than the centered versions of y.

So, I get y1 minus y1 bar times j n1,

y2 minus y2 bar times j n2.

That vector minus

the centered x vector: x1 minus x1 bar times j n1, x2 minus x2 bar times j n2.

I didn't define x1 and x2.

But let me just say those are just the group components of x.

x1 is the first n1 measurements of x,

and x2 is the latter n2 measurements of x.

And that should be times beta, okay.

So, oops, and I shouldn't say equals.

It has to be greater than or equal to, because we plugged in

the optimal estimates for mu1 and mu2 for a fixed beta.

But this is now nothing other than regression to the origin,

with the group-centered version of y and the group-centered version of x.

So, we know that the estimate of beta, beta hat, the best beta hat that I can

Â 4:16

get has to work out to be the regression to the origin estimate from these data.

So, that's just a summation.

Let me write it out this way.

The double sum of, well, here.

Probably the easiest way to write it out first is y tilde,

the inner product of y tilde and x tilde,

over the inner product of x tilde with itself,

where y tilde is the group-centered version of y, and

x tilde is the group-centered version of x.

In other words,

by group-centered I mean each observation has its group mean subtracted off.
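In code, this inner-product form is one line once the data are group-centered (a sketch on simulated data; the true common slope is 2 by construction here):

```python
import numpy as np

rng = np.random.default_rng(1)
n1, n2 = 5, 7
x1, x2 = rng.normal(size=n1), rng.normal(size=n2)
# Simulated responses sharing a common slope of 2, with different group means.
y1 = 1.0 + 2.0 * x1 + 0.1 * rng.normal(size=n1)
y2 = 3.0 + 2.0 * x2 + 0.1 * rng.normal(size=n2)

# Group-centered versions: each observation minus its own group's mean.
y_tilde = np.concatenate([y1 - y1.mean(), y2 - y2.mean()])
x_tilde = np.concatenate([x1 - x1.mean(), x2 - x2.mean()])

# Regression to the origin: <y_tilde, x_tilde> / <x_tilde, x_tilde>.
beta_hat = (y_tilde @ x_tilde) / (x_tilde @ x_tilde)
```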

Â 4:59

And you can show, and I have this in the notes, okay,

well, let's just do it really quickly here.

What does this work out to be?

This works out to be the double sum, let's say,

over i and j, of yij minus y bar i,

so i equal 1 to 2 and j equal 1 to n sub i,

times xij minus x bar i, all over the double

sum of xij minus x bar i, squared.

And let me just explain my notation here: yij is the j-th component of the vector yi.

Let's say y11 is the first component of the vector y1,

y12 is the second component of vector y1,

y21 is the first component of vector y2, and so on.
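The double-sum form and the inner-product form are the same number, which is quick to check numerically (a sketch; the data are simulated):

```python
import numpy as np

rng = np.random.default_rng(2)
# Two groups of (y_i, x_i) data, sizes n1 = 5 and n2 = 8.
groups = [(rng.normal(size=5), rng.normal(size=5)),
          (rng.normal(size=8), rng.normal(size=8))]

# Double-sum form: sum over groups i and components j.
num = sum(np.sum((y - y.mean()) * (x - x.mean())) for y, x in groups)
den = sum(np.sum((x - x.mean()) ** 2) for _, x in groups)
beta_double_sum = num / den

# Inner-product form on the stacked group-centered vectors.
y_tilde = np.concatenate([y - y.mean() for y, _ in groups])
x_tilde = np.concatenate([x - x.mean() for _, x in groups])
beta_inner = (y_tilde @ x_tilde) / (x_tilde @ x_tilde)
```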

Â 6:02

So, we can write this out.

And I think this is probably the nicest way to write this out:

we can write this out as p

times beta 1 hat plus 1 minus p times beta 2 hat,

where beta 1 hat is the regression estimate.

Â 6:27

The regression estimate for only group 1,

if you only had the x1 and y1 data, the centered x1 data and the centered y1 data.

And beta 2 hat is the regression estimate if you only had the y2,

the centered y2 data and the centered x2 data.

Â 6:46

Okay, so it is interesting to note that the slope

Â from ANCOVA works out to be a weighted average

Â of the individual group-specific slopes, where, in this case,

Â p works out to be the summation of xij minus x.

Â Summation of x1j minus x1 bar over sum.

Â This should have been squared.

Â Sorry about that.

Â The double sum of x1j minus x1 bar.

Â 7:37

That is, the denominator is the double sum of xij minus x bar i, squared.

So, p works out to be the percentage of the total variation in the x's

that comes from group one. So if most of the variation in your x's is in group one,

then your group one slope contributes more to the overall estimate of the slope;

if group two is more variable, then group two contributes more; and

if they are equally variable, then both of them contribute equally.
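This weighted-average identity, beta hat equal to p times beta 1 hat plus 1 minus p times beta 2 hat, can be verified numerically (a sketch with simulated data):

```python
import numpy as np

def centered_slope(y, x):
    """Regression-to-the-origin slope on the centered data."""
    yc, xc = y - y.mean(), x - x.mean()
    return (yc @ xc) / (xc @ xc)

rng = np.random.default_rng(3)
y1, x1 = rng.normal(size=6), rng.normal(size=6)
y2, x2 = rng.normal(size=9), rng.normal(size=9)

# Group-specific slopes.
b1, b2 = centered_slope(y1, x1), centered_slope(y2, x2)

# Pooled ANCOVA slope from the stacked group-centered data.
y_tilde = np.concatenate([y1 - y1.mean(), y2 - y2.mean()])
x_tilde = np.concatenate([x1 - x1.mean(), x2 - x2.mean()])
beta_hat = (y_tilde @ x_tilde) / (x_tilde @ x_tilde)

# p: group one's share of the total centered variation in the x's.
ss1 = np.sum((x1 - x1.mean()) ** 2)
ss2 = np.sum((x2 - x2.mean()) ** 2)
p = ss1 / (ss1 + ss2)
```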

Okay, so let's go back: now, once we have our beta hat,

we can figure out what our mu1 hat and our mu2 hat are.

So, mu1 hat is equal to y1 bar minus x1 bar beta hat,

and mu2 hat is equal to y2 bar

minus x2 bar beta hat.

So, the difference in the means, mu1 hat minus mu2 hat,

works out to be y1 bar minus y2 bar, minus x1 bar minus x2 bar times beta hat.
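A numerical sketch of this adjusted difference (simulated data; under my choices the true difference in group means, holding x fixed, is 1):

```python
import numpy as np

rng = np.random.default_rng(4)
n1, n2 = 50, 50
x1 = rng.normal(loc=0.0, size=n1)
x2 = rng.normal(loc=1.0, size=n2)  # the groups differ on the covariate
y1 = 1.0 + 2.0 * x1 + 0.1 * rng.normal(size=n1)
y2 = 0.0 + 2.0 * x2 + 0.1 * rng.normal(size=n2)

# Pooled slope from the group-centered data.
y_tilde = np.concatenate([y1 - y1.mean(), y2 - y2.mean()])
x_tilde = np.concatenate([x1 - x1.mean(), x2 - x2.mean()])
beta_hat = (y_tilde @ x_tilde) / (x_tilde @ x_tilde)

# Fitted group means and their difference.
mu1_hat = y1.mean() - x1.mean() * beta_hat
mu2_hat = y2.mean() - x2.mean() * beta_hat
adjusted = mu1_hat - mu2_hat        # near the true value of 1
unadjusted = y1.mean() - y2.mean()  # distorted by the imbalance in x
```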

Â 8:44

Now, so, one way to think about this,

the most common way to think about ANCOVA, is the instance where

you want to compare treatments, so treatment one versus treatment two,

but you have some confounding factor that you need to adjust for.

Say, for example, you're looking at a weight loss treatment,

and your confounding factor is the initial weight of the person.

Â 9:11

Okay.

And so if the initial starting weight of the people receiving the one weight loss

treatment is different than the initial weight of those receiving the other weight loss treatment,

then you'd be worried about just directly comparing the two means.

Well, this shows you what, in addition to the two means, you need to subtract off

if you model the data as an ANCOVA model.

Most interestingly, if you randomize, and

your randomization is successful in the sense of balancing this observed

covariate, the baseline weight,

then the group one average should be pretty close to the group two average.

So, this difference in means should be quite small, so that whether

you adjust for baseline weight or omit baseline weight from your model and

just do a straight two-group ANOVA, the estimates should be very similar.

However, on the other hand, if you happened not to have randomized,

and had an imbalance, so that the average for

group one is very different than the average for group two, then the difference

between the unadjusted estimate and the adjusted estimate can be quite large.
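That contrast can be sketched with a small simulation (all numbers here are my own illustrative choices, not from the lecture):

```python
import numpy as np

def estimates(y1, x1, y2, x2):
    """Return (unadjusted, adjusted) estimates of the group difference."""
    y_t = np.concatenate([y1 - y1.mean(), y2 - y2.mean()])
    x_t = np.concatenate([x1 - x1.mean(), x2 - x2.mean()])
    beta_hat = (y_t @ x_t) / (x_t @ x_t)
    unadjusted = y1.mean() - y2.mean()
    return unadjusted, unadjusted - (x1.mean() - x2.mean()) * beta_hat

rng = np.random.default_rng(5)
n = 200

# Balanced covariate, as after successful randomization.
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y1 = 1.0 + 2.0 * x1 + 0.1 * rng.normal(size=n)
y2 = 0.0 + 2.0 * x2 + 0.1 * rng.normal(size=n)
u_bal, a_bal = estimates(y1, x1, y2, x2)

# Imbalanced covariate: group two starts out much higher on x.
x2_imb = x2 + 2.0
y2_imb = 0.0 + 2.0 * x2_imb + 0.1 * rng.normal(size=n)
u_imb, a_imb = estimates(y1, x1, y2_imb, x2_imb)
```

With balance, adjusting barely moves the estimate; with imbalance, the unadjusted and adjusted estimates differ by roughly x1 bar minus x2 bar, times beta hat.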

Okay, so that's an important example;

I have some more written about it in the notes.

But I think you can actually learn a lot about regression and

adjustment just by thinking about this one example.
