So I wanted to show you the slides that I was working on now, the actual R markdown files. This is the dot r m d file. And at first, I want to plot the data that I'm going to look at. This is the Galton parents' height and childrens' height data. So I am going to plot it here using g g plot. And there's my plot. Now I'd like to go with the information that is on the slide, which is some R code. So I thought it'd be probably better to just show you, show you the running of the R code rather than showing you a slide with the, the output of having run the R code. So here I, for convenience, I've defined my outcome as the child's height's galton dollar sign child, so let me define those. And I've defined the. X, the predictors, as the parent's height, Galton dollar sign parent, and this is just going to save me on some typing. Now first I'd like to indicate that the solution that we specified is the same solution that R will give you with its built in regression function. So here's beta one, it's the correlation between y and x. Times the standard deviation of y divided by the standard deviation of x. And here's my beta nought, which is the mean of y, minus the beta one that I just estimated, times the mean of x. So I can, for example, look at these with beta zero and beta one. In this c function for just creating a vector. Let me check that with the l m function. I'm going to copy the function and paste it down here and then I'll go through each of the components. L m in r stands for linear model. Regression is a component of linear models, and so, this function is the general function whether you want regression or you want some of the more elaborate versions of regression that we're going to cover later on in the class. So we want l m, the outcome y, tilde, the predictor x. And that does a linear regression. By default it includes an intercept. Coef takes the output of the linear model and just grabs the coefficients. And of course you see we get the same numbers, 23.94 and 0.64, 0.65. Very briefly now I just want to show you that if I reverse the y and x relationship the formula, of course holds but now with standard deviation of x in the numerator and standard deviation of y in the denominator. So I'm going to now define my beta one as correlation of y and x, but now with standard deviation of x in the numerator and standard deviation of y in the denominator. My intercept is now mean of x minus beta one times the mean of y. Now I'm simply going to concatenate these slope and intercept estimates with those that you get with l m where now x is on the left hand side of the tilde and y is on the right hand side of tilde, reversed from what it was previously. And there of course down here you see that you get a, you get the same numbers. So our formula is correct and we know how to use it and we know what happens when we reverse the x y relationship. Another point that was made thus far in the lecture was that regression through the origin yielded the same slope as linear regression with a not necessarily zero intercept. If you mean centered the y's and mean centered the x's first. So let's just check that computationally. My centered y is just going to be y minus the mean of y. My centered x is just going to be x minus the mean of x. And recall that the regression to the origin equation for the slope was just the sum of the y variable times the x variable, divided by the sum of the x variable squared. So, let's run that and get our coefficient that is estimated through a regression to the origin. And then now what I'm going to do. Is I'm going to take this beta one that I got through my equation of, for regression to the origin. Where I've mean-centered my data first. And then I'm going to compare it to the linear regression estimate, using l m with y tilde x, meaning y is my outcome, x is my response. And l m automatically puts in an intercept. And I am going to grab the second coefficient, which is the slope. When I do that, if you look down here you see of course that you get the same numbers. I want to very briefly also just show you how you can actually do regression to the origin. Using l m how you get rid of the intercept. So in this case I'll get the same number if I take the centered y and use the centered x as a predictor. But I want to subtract out the intercept, so you put a minus one to get rid of the intercept. And you'll see here point six four six three. You get the same slope estimate. Another point that was made in the lecture, was that if we were to normalize the y's or the x's so that they have standard deviation one, the slope would be the correlation. So let's just double check that really quickly. Here, I'm, I'm normalizing my child's heights by subtracting off the mean and dividing by the standard deviation. Let me just juggle, double check that these now have standard deviation one. There you go. And I'm going to do the same thing for my x variables now. So now both my y and my x variable, the y n and x n variables. Can be interpreted as standard deviations from the mean for the, in the y's case for the child's heights and in the x's case for the parent's heights. So we've gotten rid of the, the original units, the inches. And now they're in standard deviations from the mean. Now I simply want to show with this concatenation operator here. That the correlation between the original y and the original x, is the same as the correlation when I normalize the y and I normalize the x. And its the same as the coefficient for the linear regression when I fit the normalized y as the outcome and the normalized x as the predictor. And when I run it, if you go down here. Of course you see that all three numbers are the same. The last thing I'd like to do is simply show how to add a regression line to a g g two scatter plot. Here over on the right I am showing the somewhat fancy plot I've been given for this data. Where the sizes of the circles are the number of parents and childs. At that specific x y location. And the color of the circle is also representative of the frequency. I've defined over here my aesthetics, so that x is parent and y is child. And if you want to add a linear regression line to this point, you simply take your plot. Which I've assigned to the variable g. And you want to add geom underscore smooth. You want to use the method as l m. And you want to give it the formula, y tilde x. If you omit the formula, it's going to assume that formula is y tilde x. So let's add that layer to our g g plot and then let's re-plot it. And there you see the regression line overlaid on top of my scatter plot. I would also note that g g plot two does a very good thing for us on our behalf. Is they automatically give us a confidence interval around the line. We'll talk about how to generate this confidence interval later on in the lecture. But it's very nice that they're thinking of statistical uncertainty automatically.