Let's look at some additional exercises related to logistic regression. You may recall the logistic regression in which the outcome was response to antiretroviral therapy and the predictor was baseline CD4 count, dichotomized. The regression showed that those with lower baseline CD4 counts had higher log odds, and hence higher odds and a higher probability, of responding to treatment. The fitted model says the log odds of response equal the intercept, negative 1.67, plus 0.58 times x1. So 0.58 is the slope, and x1 is a binary indicator equal to 1 for baseline CD4 counts less than 250 and 0 for baseline CD4 counts greater than or equal to 250. I'm wondering if you could figure out the resulting estimates of the intercept and slope in a similar-looking model (I'm putting a star on them to indicate that we may get different numbers) if x1 is instead coded as 1 for subjects with the higher baseline CD4 counts and 0 for subjects with the lower CD4 counts. Let's think this through. Remember, we are ultimately estimating only two log odds, one for each group. In the original coding, the intercept was negative 1.67 and the slope was 0.58, with x1 equal to 1 for those with lower CD4 counts and 0 for those with higher CD4 counts. Let's parse this for a moment. The intercept gives the log odds for the reference group: for those with CD4 counts greater than or equal to 250, the log odds is negative 1.67. The slope is the difference in log odds for the group coded 1, those with lower CD4 counts, compared to the group coded 0, those with higher CD4 counts. The group coded 1 consists of those with CD4 counts less than 250, and we can get their log odds from the intercept and the slope.
We would take negative 1.67 and add the slope of 0.58 times 1, which is 0.58; doing the math gives negative 1.09. Now let's recast this: pretend we had fit the model with a different coding for x1 and figure out what the results would be, with x1 now equal to 1 for those with CD4 counts of 250 or greater and 0 for those with counts less than 250. The slope is still the difference in the log odds of response between the two CD4 count groups, just in the opposite direction. Before, it was the difference for those coded 1 (less than 250) compared to those greater than or equal to 250; now it is the same difference taken in the opposite direction. So this slope will just be the negative of what it was before, and the log odds equal some intercept plus negative 0.58 times x1 under the new coding. What is the intercept now? Under the new coding, the intercept represents the log odds when the new x1 equals 0, which is the group with CD4 counts less than 250. We already computed that log odds as negative 1.09, so the new intercept is negative 1.09. So we have worked out the intercept and the slope under a different coding. The estimated log odds for the two groups are the same regardless of how we code x1: negative 1.09 for the lower CD4 group, and negative 1.09 minus 0.58, or negative 1.67, for the higher CD4 group. But the intercept and slope change value to reflect that we've changed the reference group and the direction of comparison for the slope. One more thing to note: when we exponentiate the original slope, the odds ratio comparing the odds of response for the lower CD4 count group to the higher group is e to the 0.58, which equals 1.79. In the opposite direction, with the slope of negative 0.58, the odds ratio is e to the negative 0.58, which equals 0.56.
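The recoding argument can be checked numerically. The sketch below assumes only the two coefficients quoted above (intercept negative 1.67, slope 0.58 for CD4 less than 250); the variable names are illustrative. It derives the starred intercept and slope for the reversed coding and confirms that both codings give the same two group log odds, and that the two odds ratios are reciprocals.

```python
import math

# Coefficients quoted in the lecture (original coding: x1 = 1 if CD4 < 250, else 0).
b0, b1 = -1.67, 0.58

# Group log odds under the original coding.
log_odds_ge250 = b0          # reference group (x1 = 0)
log_odds_lt250 = b0 + b1     # group coded 1

# Reversed coding: x1* = 1 if CD4 >= 250, so CD4 < 250 becomes the reference group.
b0_star = log_odds_lt250     # new intercept = log odds of the new reference group
b1_star = -b1                # same comparison, opposite direction, so the sign flips

# Both codings reproduce the same two group log odds.
assert math.isclose(b0_star, log_odds_lt250)
assert math.isclose(b0_star + b1_star, log_odds_ge250)

or_low_vs_high = math.exp(b1)        # odds ratio, CD4 < 250 vs >= 250
or_high_vs_low = math.exp(b1_star)   # odds ratio in the opposite direction
print(round(b0_star, 2), round(b1_star, 2))                # -1.09 -0.58
print(round(or_low_vs_high, 2), round(or_high_vs_low, 2))  # 1.79 0.56
```

Only the fitted group log odds matter; the coding just decides which group the intercept describes and which direction the slope compares.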
Just FYI, e to the negative 0.58 is equal to 1 over e to the 0.58, which is 1 over the odds ratio in the opposite direction. So on the log odds scale, the two slopes simply change sign; they are negatives of each other. But when we exponentiate to the odds ratio scale, the odds ratio in one direction is the reciprocal of the corresponding odds ratio in the other direction. Now let's look at the respiratory failure and gestational age example a little further. Recall that even though gestational age is ordinal, the authors made four gestational age categories. They used full term, 37 to 40 weeks, as the reference group, and then made indicators for 34 weeks, 35 weeks, and 36 weeks. Recall the results of the logistic regression relating the log odds of respiratory failure to gestational age. The reference group, described by the intercept alone, was those born at 37 to 40 weeks; x1 was 1 for 34 weeks and 0 otherwise, x2 was 1 for 35 weeks and 0 otherwise, and x3 was 1 for 36 weeks and 0 otherwise. Based on this model, what is the odds ratio of respiratory failure for children with a gestational age of 36 weeks compared to children with a gestational age of 34 weeks? (The question should say weeks, not months.) Let's think about this. One way is to write out each group's estimated log odds in full. For the 36-week group, the estimated log odds is the intercept of negative 5.5 plus the slope for 36 weeks, 2.0. For the 34-week group, it is the intercept of negative 5.5 plus the slope for 34 weeks, 3.4.
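Here is a small sketch of how the indicator coding turns into one log odds per group. It assumes the coefficients quoted in the lecture: intercept negative 5.5, and slopes 3.4, 2.8, and 2.0 for 34, 35, and 36 weeks. The dictionary and function names are illustrative, not from the source.

```python
# Fitted model (coefficients as quoted in the lecture):
# log odds = -5.5 + 3.4*x1 + 2.8*x2 + 2.0*x3
# x1, x2, x3 indicate 34, 35, 36 weeks; 37-40 weeks is the reference group.
INTERCEPT = -5.5
SLOPES = {"34 weeks": 3.4, "35 weeks": 2.8, "36 weeks": 2.0}


def log_odds(group: str) -> float:
    """Estimated log odds of respiratory failure for one gestational-age group."""
    # The reference group gets the intercept alone; every other group adds its own slope.
    return INTERCEPT + SLOPES.get(group, 0.0)


for g in ["37-40 weeks", "36 weeks", "35 weeks", "34 weeks"]:
    print(g, round(log_odds(g), 2))
```

Because each group has its own indicator, the model places no constraint on how the log odds change from one category to the next.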
If we take the difference in log odds, the intercept cancels, and the result is just the difference in slopes: 2.0 minus 3.4, which equals negative 1.4. Exponentiating negative 1.4 gives an odds ratio of about 0.25. One thing to note: if I had exponentiated the difference directly, e to the (2.0 minus 3.4), that can be re-expressed as e to the 2.0 times e to the negative 3.4, which equals e to the 2.0 over e to the 3.4. These exponentiated slopes are odds ratios, so this is a ratio of odds ratios: the odds ratio of respiratory failure for the 36-week group compared to the reference, divided by the odds ratio for the 34-week group compared to the reference. It's the same thing mathematically, but it's interesting to note that on the log odds scale this is a difference in log odds, whereas on the ratio scale we could have taken the ratio of the odds ratios comparing each of those groups to the same reference. Now let's think about the structure here. The authors categorized the ordinal gestational ages, making the largest gestational age, 37 to 40 weeks, the reference, and having indicators for each of the earlier categories: 34, 35, and 36 weeks. Let's think about what happens as gestational age decreases. Going from 37 to 40 weeks down to 36 weeks, the jump in log odds is 2.0. Going from 37 to 40 weeks to 35 weeks, the jump is 2.8, so the difference between 36 and 35 weeks is 2.8 minus 2.0, or 0.8: starting at 36 weeks, we add 0.8 to get to 35 weeks. Going from 37 to 40 weeks to 34 weeks, the jump is 3.4, so between 35 and 34 weeks we start with the 2.8 and add an additional 0.6 to get to 3.4. So, let's think about this.
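The arithmetic above can be sketched as follows, again assuming the quoted slopes (3.4, 2.8, 2.0 relative to the 37-40 week reference). It computes the 36 versus 34 week odds ratio both as an exponentiated difference in slopes and as a ratio of odds ratios, then lists the successive jumps in log odds.

```python
import math

# Log odds differences vs. the 37-40 week reference (from the lecture).
SLOPES = {34: 3.4, 35: 2.8, 36: 2.0}

# 36 vs 34 weeks: the intercept cancels, leaving a difference in slopes.
diff = SLOPES[36] - SLOPES[34]
or_36_vs_34 = math.exp(diff)

# Same answer as a ratio of two odds ratios against the common reference.
ratio_of_ors = math.exp(SLOPES[36]) / math.exp(SLOPES[34])
assert math.isclose(or_36_vs_34, ratio_of_ors)

# Successive jumps in log odds as gestational age drops one category at a time.
jumps = {
    "37-40 -> 36": SLOPES[36],
    "36 -> 35": SLOPES[35] - SLOPES[36],
    "35 -> 34": SLOPES[34] - SLOPES[35],
}
print(round(diff, 2), round(or_36_vs_34, 2))  # -1.4 0.25
print({k: round(v, 2) for k, v in jumps.items()})
```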
The initial jump, from 37 to 40 weeks down to 36 weeks, was 2.0. From 36 weeks to 35 weeks it attenuated to 0.8, and from 35 weeks to 34 weeks it attenuated a bit further to 0.6. Qualitatively speaking, I would say these last two jumps are similar to each other but different from that first, larger jump from full term to the first level of pre-term. If there were a linear relationship, we would expect the differences between successive categories to be similar in value. I would argue that they are not, because the initial difference between the 37-to-40-week category and 36 weeks was larger than the subsequent jumps between 36 and 35 weeks and between 35 and 34 weeks. We certainly would have missed that nuance if the authors had modeled this as a strictly linear relationship. Still, the pattern is consistent: a decrease in gestational age is associated with an increase in the log odds of respiratory failure, or, phrased in the opposite direction, an increase in gestational age is associated with lower odds of respiratory failure. So a linear approach would still capture the big picture and estimate the correct direction of association, but it would miss some of the nuances between successive categories. Let's go back to the example of obesity and HDL cholesterol level. We've gone through this already; I just want to expand on the explanation I left at the end. Recall that I wanted the odds ratio of being obese for persons with an HDL of 100 mg/dL versus those with an HDL of 80 mg/dL. The difference on the log odds (slope) scale turned out to be the difference in the x values, 100 minus 80 or 20, times the slope of negative 0.034, which is negative 0.68. Exponentiating that gives the odds ratio comparing the odds of obesity for those with 100 mg/dL to those with 80 mg/dL: 0.51.
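As a quick check of the HDL calculation, here is a minimal sketch assuming the quoted slope of negative 0.034 per mg/dL. It works entirely on the log odds scale: scale the slope by the 20 mg/dL difference, then exponentiate.

```python
import math

slope_hdl = -0.034     # change in log odds of obesity per 1 mg/dL of HDL (from the lecture)
delta_x = 100 - 80     # comparing HDL of 100 mg/dL with 80 mg/dL

log_or = delta_x * slope_hdl    # difference in log odds for a 20 mg/dL difference
odds_ratio = math.exp(log_or)
print(round(log_or, 2), round(odds_ratio, 2))  # -0.68 0.51
```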
And then I said that if we had started with the odds ratio for a one-unit (1 mg/dL) difference in HDL cholesterol, the odds ratio of being obese was e to the slope, e to the negative 0.034, or about 0.967. If we start with that, we don't have to transform it back to the log scale, multiply by 20, and re-exponentiate. We can take that odds ratio itself and simply raise it to the 20th power. To show why the math works out (I'm not doing anything more than explaining in more detail what I did before): e to the (20 times beta 1 hat) can be rewritten as (e to the beta 1 hat) to the 20th power. But e to the beta 1 hat is simply the odds ratio for a one-unit difference in x, so this is 0.967 raised to the 20th power. So if you have the odds ratio for a one-unit difference and you want a comparison across a larger difference in x, you can raise that odds ratio to the power of the difference in x you're interested in; you don't have to go back to the log scale. Now let's look at breastfeeding and child age for a minute. The first question is another comparison of groups that differ by more than one unit. The log odds of being breastfed as a function of the child's age is as follows: the log odds equals the intercept of 7.29 plus the slope of negative 0.24 times age in months. We have the standard errors for the intercept and slope as well. What is the estimated odds ratio, with a 95% confidence interval, of being breastfed for children who are 30 months old compared to children who are 24 months old?
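The identity e to the (20 times beta 1 hat) equals (e to the beta 1 hat) to the 20th can be sketched with a small helper. The name `or_for_difference` is hypothetical, chosen for illustration; the slope is the quoted HDL slope of negative 0.034. The sketch confirms that the power route and the log-scale route agree.

```python
import math


def or_for_difference(or_per_unit: float, delta_x: float) -> float:
    """Odds ratio for a delta_x-unit difference in x, given the odds ratio for one unit.

    Relies on exp(delta_x * b1) == exp(b1) ** delta_x, so no return trip to the
    log odds scale is needed. (Hypothetical helper for illustration.)
    """
    return or_per_unit ** delta_x


b1 = -0.034                           # HDL slope from the lecture
or_per_mg = math.exp(b1)              # odds ratio per 1 mg/dL of HDL
via_power = or_for_difference(or_per_mg, 20)
via_log_scale = math.exp(20 * b1)
assert math.isclose(via_power, via_log_scale)
print(round(or_per_mg, 3), round(via_power, 2))  # 0.967 0.51
```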
Let me do this the fast way. The odds ratio for a one-unit (1-month) difference in age is e to the negative 0.24, which equals about 0.787, or roughly 0.79. That's a 21% reduction in the odds per 1-month increase in age. I want it for a 6-month difference in age, 30 months versus 24 months, so we take the odds ratio for a 1-month difference and raise it to the 6th power. That turns out to be roughly 0.24. It's just a coincidence that this ended up close to the odds ratio of about 0.25 we computed in the gestational age example. In any case, that's the cumulative effect of a 6-month difference: if the odds drop by 21% per month of age, that accumulates to a total reduction of about 76% over 6 months, in other words an odds ratio of about 0.24. Do we want a confidence interval for that? The easiest way uses the standard error. This odds ratio is solely a function of the slope; it doesn't involve the intercept, and we have the standard error for the slope. So we can create a 95% confidence interval for the slope: the slope estimate plus or minus 2 estimated standard errors. This gives confidence interval endpoints of negative 0.32 and negative 0.16. We could multiply these by 6 and exponentiate. Or we could exponentiate them first to get the confidence interval endpoints for the odds ratio for a 1-month difference in age, and then raise each of those to the 6th power. Either way, we get a 95% confidence interval for this comparison of 0.15 to 0.38. So our estimated odds ratio is about 0.24, with a 95% confidence interval of 0.15 to 0.38. The last thing we'll do is estimate the proportion of children who are breastfed at 30 months. Getting a confidence interval for that proportion isn't something I expect you to be able to do.
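The calculation above can be sketched as follows. It assumes the quoted slope of negative 0.24 and a slope standard error of 0.04; the lecture does not print the standard error here, so 0.04 is inferred from the stated slope interval of negative 0.32 to negative 0.16 (the slope plus or minus 2 standard errors). It also checks that "scale then exponentiate" and "exponentiate then raise to the 6th power" give the same endpoints.

```python
import math

slope = -0.24    # change in log odds of being breastfed per 1 month of age
se = 0.04        # assumption: inferred from the quoted slope CI (-0.32, -0.16) = slope +/- 2 SE
delta = 30 - 24  # 6-month difference in age

or_1mo = math.exp(slope)       # odds ratio for a 1-month difference
or_6mo = or_1mo ** delta       # same as exp(6 * slope), about 0.237

# 95% CI: build it for the slope, scale by the age difference, then exponentiate.
lo_slope, hi_slope = slope - 2 * se, slope + 2 * se
ci_6mo = (math.exp(delta * lo_slope), math.exp(delta * hi_slope))

# Exponentiating first and then raising to the 6th power gives the same endpoints.
assert math.isclose(math.exp(lo_slope) ** delta, ci_6mo[0])
assert math.isclose(math.exp(hi_slope) ** delta, ci_6mo[1])
print(round(or_1mo, 3), round(or_6mo, 2), [round(v, 2) for v in ci_6mo])
```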
I couldn't do it by hand even with the information given; we would need a computer. But I want to point out that you can get confidence intervals for predicted proportions based on logistic regression. Let's write this out in generic form. The equation is the intercept, beta 0 hat, plus the slope, beta 1 hat, times x1. To get the estimated p-hat for 30-month-olds, we'd first work on the regression scale and estimate the log odds for that group: generically, the intercept plus the slope times 30. Here that is 7.29 plus negative 0.24 times 30, which equals 0.09. So the log odds for this group is 0.09, and the odds of being breastfed for 30-month-olds is e to the 0.09. The estimated probability is the estimated odds over 1 plus the odds, which equals e to the 0.09 over 1 plus e to the 0.09. That's roughly 0.52, or 52%. So an estimated 52% of 30-month-old children are breastfed. Getting a confidence interval for this is complicated. The uncertainty starts on the log odds scale. Because the estimated log odds is a linear combination of the intercept and 30 times the slope, its standard error depends on the standard error of the intercept and on the standard error of the slope multiplied by 30, but it also has to involve how the two estimates covary across multiple studies of the same size. That's not something we can get just from the standard errors of each quantity separately. So even on the log odds scale it's difficult and requires a computer. Then the confidence interval endpoints based on that standard error for the log odds need to be transformed in the same two steps (log odds to odds, then odds to probability) to get back to the probability scale.
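Here is a sketch of both steps, assuming the quoted intercept of 7.29 and slope of negative 0.24. The point estimate only needs those two numbers. For the interval, the hypothetical helper `prob_ci` takes the variances of the intercept and slope and their covariance as inputs, because the lecture does not give them; in practice they would come from the fitted model's variance-covariance matrix in statistical software. It uses plus or minus 2 standard errors to match the lecture's convention and transforms each endpoint back to the probability scale.

```python
import math


def expit(z: float) -> float:
    """Convert log odds to a probability: odds / (1 + odds)."""
    return math.exp(z) / (1 + math.exp(z))


b0, b1 = 7.29, -0.24          # intercept and slope quoted in the lecture
age = 30
log_odds_30 = b0 + b1 * age   # estimated log odds for 30-month-olds
p_30 = expit(log_odds_30)     # estimated proportion breastfed


def prob_ci(b0, b1, x, var_b0, var_b1, cov_b01, z=2.0):
    """Approximate CI for the predicted probability at x (hypothetical helper).

    Var(b0 + x*b1) = var_b0 + x**2 * var_b1 + 2*x*cov_b01. The covariance term
    is not recoverable from the two standard errors alone; it must come from
    the fitted model. The interval is built on the log odds scale, then each
    endpoint is transformed to a probability.
    """
    eta = b0 + b1 * x
    se_eta = math.sqrt(var_b0 + x ** 2 * var_b1 + 2 * x * cov_b01)
    return expit(eta - z * se_eta), expit(eta + z * se_eta)


print(round(log_odds_30, 2), round(p_30, 2))  # 0.09 0.52
```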
So I'll just give you the end result, to show that this can be done by computer: the confidence interval is 0.42 to 0.62. What's the big picture here? I'm obviously not showing you how to do this by hand; I can't do it by hand either, I would need a computer. But I do want you to know that when we estimate proportions or probabilities from a logistic regression, those are based on estimated slopes and intercepts, so the end result is an estimate as well. There is uncertainty, or sampling variability, in that estimate: if we repeated the study over and over again and estimated this proportion from each set of logistic regression results, the estimates would vary. So we have to address that, and we can put confidence limits on these results even though they're not on the original logistic regression scale. Just be aware of that when you're working with others and thinking about how to present results: if they involve predicted probabilities and it's appropriate, you can give confidence limits. I hope these exercises were helpful in revisiting some of the ideas we discussed in the logistic regression unit, and we'll see you back online.