0:00
Greetings and welcome to Lecture Set 6.
This has a rather long title but it's actually composed of two parts.
In Lecture 6A, we're going to give an overview of the idea of
multiple regression for estimation, adjustment, and basic prediction.
And then we're going to look in sections B through D at the specific case of
multiple linear regression for these purposes.
In lecture sets 7 and 8, we'll cover the same purposes of multiple regression for
logistic and Cox proportional hazards regression.
0:31
So in this set of lectures, we will develop a framework for
multiple linear, logistic, and
Cox proportional hazards regression, as I said before, in the first section.
And then in the remaining sections, we'll focus on multiple linear regression, which
is an extension of what we did with simple linear regression, and
it provides a general framework
for estimating the mean of a continuous outcome based on multiple predictors,
each of which may be binary, categorical, or continuous.
1:02
So hopefully, at the end of this section, you'll be able to
identify the group comparisons being made by a multiple regression slope
regardless of the outcome variable type, whether we have a continuous, binary, or
time-to-event outcome being modeled by the regression.
And appreciate that multiple regression allows for an outcome to
be predicted by taking into account multiple predictors with one method.
We don't have to look at associations one at a time anymore; we can better
predict an outcome by bringing multiple predictors into the story with one equation.
Multiple regression also allows for easy adjustment of a relationship or
relationships of interest for potential confounding variables.
And, something we will cover after we lay down the basic framework of
multiple regression for these three types:
these methods can be extended to also look at
effect modification, which we'll take on in lecture set 9.
1:59
And then I want you to realize, and hopefully you could almost do this in your sleep
now, you're so used to doing it,
that the approach to creating confidence intervals for
multiple regression intercepts and
slopes, regardless of the type of regression, is more of the same.
2:16
So let's just go back to taking what we did in Statistical Reasoning 1 and extending it.
So regression provides a general framework for
the estimation and testing procedures that we covered in the first term.
And as we discussed in lectures 1 through 3, many of the methods from
Statistical Reasoning 1 can be framed as simple regression models.
But regression is nice because it allows for extensions; we
can add more predictors to our predictor set. As such, multiple regression
allows for the extension of the methods from Statistical Reasoning 1 to handle
multiple predictors of an outcome in a single method, and allows for
the estimation of adjusted associations relatively easily.
3:19
So here's the basic structure.
It's basically going to look like our simple regression models with more Xs.
So, I actually cued you a little bit in the simple regression lectures when we did
the multi-categorical predictors; we needed more than one x to model that.
So in the strict definition, it was actually a form of
multiple regression because there was more than one x.
But there was only one predictor in that model,
it just required more than one x to model it.
So I think of it conceptually as a simple regression.
But the basic structure of a multiple regression model will be
a linear equation with potentially multiple X's and slopes.
3:57
So I'll just say it's equal to some intercept, beta naught hat,
plus beta 1 hat times x1, plus beta 2 hat times x2, onward and upward to
beta p hat times xp, where p is just some number of x's that we have.
And these Xs are the predictors of interest.
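Written out, here's that estimated equation in symbols, with "LHS" as a placeholder for whichever quantity sits on the left-hand side (spelled out just below):

$$
\widehat{\text{LHS}} \;=\; \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2 + \cdots + \hat{\beta}_p x_p
$$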
And the only difference is for Cox regression.
It'll look pretty much the same, but just replace that beta naught hat
with the lambda naught hat of t in the above equation, because in Cox regression,
even in the multiple-predictor situation,
we'll have an intercept that varies as a function of time.
4:41
For continuous outcomes, the left-hand side, the thing that we'll be estimating
as a linear function of our predictors, is the mean of this outcome variable.
And we can estimate the mean as a function of multiple predictors
by the equation we get.
For binary outcomes, the left-hand side is the log odds of the binary outcome,
just like we saw in simple logistic regression.
And then for time-to-event outcomes, the left hand side is log of
the hazard rate or incidence rate for the time-to-event outcome.
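Putting each left-hand side together with the common right-hand side, here is a sketch of the three estimated models. I'm writing the Cox intercept as the log of the baseline hazard so that both sides sit on the log scale, and the hat notation for the estimated mean, odds, and hazard is my own shorthand:

$$
\begin{aligned}
\text{Linear:} \quad & \hat{\mu}_{y \mid x_1,\ldots,x_p} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \cdots + \hat{\beta}_p x_p \\
\text{Logistic:} \quad & \ln\!\left(\widehat{\text{odds}}(y = 1)\right) = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \cdots + \hat{\beta}_p x_p \\
\text{Cox:} \quad & \ln\!\left(\hat{\lambda}(t)\right) = \ln\!\left(\hat{\lambda}_0(t)\right) + \hat{\beta}_1 x_1 + \cdots + \hat{\beta}_p x_p
\end{aligned}
$$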
5:13
So the right-hand side, the thing that looks the same regardless of
the regression type, with that slight switch-out for
Cox regression on the intercept, includes our predictors of
interest; generically I'll call them x1, x2, up to xp.
And these can be binary, categorical, or continuous.
So the thing that we're going to see here is, we're going to extend the definition
of the comparison we make with multiple regression a little more,
make it a little more specific than when we had simple regression.
Generically speaking,
each slope estimates the difference in the left-hand side of the equation
for a one-unit difference in the corresponding x.
That's the same thing we talked about in simple linear regression.
The key here is that this comparison, when we have more than one predictor,
is adjusted for the other x variables, or other predictors, in the model.
So the associations we get between an outcome and
single predictors via the slopes
will automatically have been adjusted for the other variables in the model.
So this gives us a nice framework for
looking at adjusted associations in the presence of potential confounders.
The intercept estimates whatever's on the left-hand side, whether it be a mean for
continuous data, a log odds for binary data,
or a log incidence rate or
hazard rate for time-to-event data, when all of our x's are zero.
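In symbols, a rough sketch of what each slope and the intercept estimate, holding the other predictors fixed at any common set of values:

$$
\begin{aligned}
\hat{\beta}_k &= \widehat{\text{LHS}}\,\big|_{\,x_k = c + 1,\ \text{other } x\text{'s fixed}} \;-\; \widehat{\text{LHS}}\,\big|_{\,x_k = c,\ \text{other } x\text{'s fixed}} \\
\hat{\beta}_0 &= \widehat{\text{LHS}}\,\big|_{\,x_1 = x_2 = \cdots = x_p = 0}
\end{aligned}
$$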
6:37
So let's just look at a generic interpretation example to start.
Suppose we wish to estimate a multiple regression with three predictors. We've
done a study of intravenous drug users
from four cities, so this is sort of an international study with
four different cities, including Baltimore.
We have London to represent Europe,
we have Delhi in India, and then we have Cape Town in South Africa.
So we want to look at some outcome
and see how it's related to three predictors at once.
One is the sex of the person.
7:12
The second predictor, which requires three x's because it has four levels and
is nominal categorical, is city.
So I'm going to make Baltimore the reference group and
I'm going to create indicators for each of the other three cities.
So x2 will indicate London, x3 will indicate Delhi,
x4 will indicate Cape Town, and x5 is the age of
the IVDU, the intravenous drug user, in the sample in years.
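Here is a minimal sketch, in Python with pandas, of how these five x's might be coded from raw study variables; the data frame, column names, and values are all hypothetical, just to make the indicator coding concrete:

```python
import pandas as pd

# Hypothetical records from the IVDU study; names and values are illustrative only.
df = pd.DataFrame({
    "sex":  ["F", "M", "F", "M"],
    "city": ["Baltimore", "London", "Delhi", "Cape Town"],
    "age":  [31, 45, 27, 38],
})

df["x1"] = (df["sex"] == "F").astype(int)          # x1: 1 = female, 0 = male
df["x2"] = (df["city"] == "London").astype(int)    # x2: indicator of London
df["x3"] = (df["city"] == "Delhi").astype(int)     # x3: indicator of Delhi
df["x4"] = (df["city"] == "Cape Town").astype(int) # x4: indicator of Cape Town
# Baltimore is the reference group: all three city indicators equal 0 for it.
df["x5"] = df["age"]                               # x5: age in years
```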
7:40
So the general regression model we'd estimate for any type of outcome would,
depending on the outcome, have this left-hand side.
Again, this could be a mean, a log odds, or a log incidence rate.
And it'll be equal to some intercept plus some slope times x1, and
then our second predictor. We only have three predictors but we have five x's,
and the way I think of that is our second predictor is city, and that requires three x's
because there are four cities, four categories. And then our third predictor is age.
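So the estimated equation, whatever the left-hand side happens to be, looks like this:

$$
\widehat{\text{LHS}} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2 + \hat{\beta}_3 x_3 + \hat{\beta}_4 x_4 + \hat{\beta}_5 x_5
$$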
So just for example,
beta 1 hat here estimates the difference in the value of the left-hand side.
Remember, sex here is coded as one for females, zero for males.
So it's the difference for a one-unit difference in our x,
and the only one-unit difference, higher to lower, is one to zero,
or females to males.
So the slope for sex is the difference in the estimated value of the left-hand side
for females compared to males, adjusted for city distribution differences
between the sexes and age distribution differences between the sexes.
So in other words, this is
the difference in the value of the left-hand side for
females compared to males of the same city and age.
We've removed any variability in those between the sex groups.
Beta 5, for example, is the difference in the value of the left hand side for
subjects who differ by one year in age, adjusted for sex and city.
So of the same sex and from the same city.
9:16
So this compares whatever we have on the left hand side for
a one year difference in age adjusted for those two things.
Beta 2 here, remember x2 is the indicator for
London, a 1 if London and a 0 if not, would be the difference
between London and the reference city, which is Baltimore, after adjusting for
sex differences and age differences between those two sites.
9:41
So, just a reminder about the metric on which these slopes will
make their comparisons: if the outcome is continuous, the
left-hand side is the mean of some continuous outcome, and
the slopes are adjusted mean differences comparing the groups we just discussed.
If the outcome is a binary y, a one or a zero,
then the slopes are adjusted log odds ratio estimates,
and we can exponentiate them to get adjusted odds ratios.
And if the outcome is time-to-event, where we have some binary y
which indicates whether an event occurred or the observation was censored, and
then there was the time to go with it,
then the left-hand side is the log hazard of having an event
at a given time, and the slopes are the adjusted log hazard ratio estimates.
10:37
If we want to get confidence intervals for our slopes and intercept,
if our intercept has relevance to the population from which the sample comes,
if it's not a placeholder quantity and
we'd be interested in using it to actually quantify some aspect of our population,
we can get that by the intercept estimate plus or minus two
standard errors of the intercept, which will be given to us by the computer.
And for any slope, if we have the estimate and the standard error,
we can get the confidence interval in the same manner.
This will generally be done by a regression package on a computer, but
it's exactly the same idea as almost all other inferences we've done.
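As one concrete sketch of what "given to us by the computer" might look like, here is a multiple linear regression fit in Python with statsmodels on simulated data in the spirit of the IVDU example; the package, variable names, and numbers are my own illustrative choices, not part of the lecture:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 200

# Simulated study data; all names and effect sizes are made up for illustration.
sim = pd.DataFrame({
    "female": rng.integers(0, 2, n),
    "city":   rng.choice(["Baltimore", "London", "Delhi", "Cape Town"], n),
    "age":    rng.integers(18, 60, n),
})
sim["y"] = 50 + 3 * sim["female"] - 0.2 * sim["age"] + rng.normal(0, 5, n)

# Treatment("Baltimore") makes Baltimore the reference, creating the three city indicators.
fit = smf.ols('y ~ female + C(city, Treatment("Baltimore")) + age', data=sim).fit()
print(fit.params)      # intercept and the five estimated slopes
print(fit.bse)         # their estimated standard errors
print(fit.conf_int())  # roughly estimate +/- 2 standard errors for each
```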
11:24
What about a general approach to hypothesis testing?
Well, we'll think of this in the context of slopes, and the hypotheses for
any single slope in the model.
Of course, the null hypothesis is that the slope at
the population level is equal to zero;
there is no association between the outcome and
this predictor xi after adjusting for the other predictors in the model.
The alternative hypothesis is that the population-level association is not zero;
in other words, there is an association after adjusting.
So the general concept of the null hypothesis is: after accounting for
the information in the other predictors in the model, this particular x,
xi, is not associated with the outcome,
does not add information about the outcome above and
beyond the other predictors, in the population from which the data was taken.
12:20
How would we do this hypothesis test?
Well, the same old approach.
The general approach is to compute a distance measure.
Sometimes it's called T, sometimes Z, but it's always the same computation.
We take our estimated slope and subtract what we expect it to be under the null,
which is zero, so we really just take the estimate and
divide by the standard error of our estimate, our estimated standard error,
and that gives us how far, in standard errors, our estimate is from zero,
what we'd expect it to be under the null, and we can translate that into a p-value.
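In symbols, for any single slope in the model, that distance measure is:

$$
t = \frac{\hat{\beta}_k - 0}{\widehat{SE}(\hat{\beta}_k)}
$$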
12:57
Something I didn't discuss in simple linear regression, because I
just wanted to get the idea off the ground, but something to think about, is
when we have multi-categorical predictors like city, ethnicity, etc.,
that require more than one x to uniquely specify each level of the category.
13:14
If we actually want to ask the general question of whether such
a predictor on the whole is associated with the outcome,
we're going to have to test more than one slope at once with one hypothesis test.
13:28
So let me just give you an example in this more generic regression model.
Let's assume I have a bunch of x's but,
like we did before, one of the predictors of interest is city.
And if I want to test whether there are city, or locational, differences
in the outcome at the population level after adjusting for
the other things in the model,
I can't do it by testing any one slope alone.
For example, if I just test beta 2, all I'm going to answer is whether there are any
differences between London and Baltimore, the reference group.
But I'm not going to actually get any information about
differences between Delhi, Cape Town, and
either the reference group, or differences between them and London.
So if I actually want to formally test whether the predictor city is
statistically associated with the left hand side after adjusting for
other things, the null I really want to test is that all three slopes are zero.
Because if all three slopes are zero, there are no differences between any
of the three cities represented by the x variables and the reference of Baltimore.
But additionally, we've seen that if I wanted to, for example, compare Delhi
to London, adjusting for the other things in the model, I could take
beta 3 minus beta 2 to get the difference between Delhi and
London, since neither was the reference group; I can still combine my
slopes to get differences between groups that are not the reference.
So if all slopes are zero, then all differences are zero;
in other words, there are no differences in the outcome between any of the groups,
the cities, after adjusting for the other things.
And the alternative is that at least one of these three is not zero.
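Written out, the joint null and alternative for the city predictor, along with the contrast just mentioned for comparing two non-reference cities, look like this:

$$
H_0: \beta_2 = \beta_3 = \beta_4 = 0 \qquad \text{vs.} \qquad H_A: \text{at least one of } \beta_2, \beta_3, \beta_4 \neq 0
$$

$$
\text{Delhi vs. London (adjusted): } \hat{\beta}_3 - \hat{\beta}_2
$$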
15:46
So in summary, multiple regression is a general method for
relating an outcome, whether it be continuous, binary, or
time-to-event, to multiple predictors with one model, one method.
And multiple regression models allow both for better outcome predictions, by
using more than one predictor at a time, and for estimation of adjusted associations.
In the next sections, we'll look at specific examples
where our outcome is continuous and we're using multiple linear regression.