So here's another example that I'm going to use to show you simple linear

regression using data analysis.

This data is actually downloaded from Zillow for homes on sale in Chicago and

there are over 2,000 data elements in this file.

Now, one of the things that we expect is that the square footage of a home impacts

the price of the house.

That is, the larger the house is, the higher the price of the home will be.

So I'm going to use simple regression to check this.

I'm going to go to Data, I'm going to go to Data Analysis, and

I'm going to pick Regression.

So first of all, what is our Y?

Y is our price here.

We are expecting price to be dependent on square footage.

So I'm going to pick price, and I'm going to press Ctrl+Shift+Down, and

pick all of it.

And as you can see there are over 2300 data points here.

That's my Y value, then I'm going to go up and pick the X value, and

that's the square footage of the home.

I'm going to say that I have labels and this time,

I'm going to just put the output right next to it.

So I'm going to click here, I'm going to click somewhere here and say okay.
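
For anyone following along outside Excel, the same kind of fit can be sketched in a few lines of Python. This is only a minimal sketch: the five homes below are made-up numbers, not the Zillow file, and the function name `fit_line` is just for illustration.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-deviations, and sum of squared deviations of x
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data (NOT the Zillow file): square footage (X) and price (Y)
sqft = [850, 1200, 1500, 2100, 3000]
price = [210000, 250000, 330000, 390000, 560000]

intercept, slope = fit_line(sqft, price)
print(f"intercept = {intercept:.2f}, slope = {slope:.2f} dollars per square foot")
```

Here `price` plays the role of the Y range and `sqft` the X range that were selected in the Regression dialog.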

I'm going to expand my output.

So here's our output.

First of all, look at R square, in this case 0.143, or 14.3%.

So only 14.3% of the variation in home prices is explained by how large the home is.

So it doesn't seem to be a very good explanatory variable.

Exactly what I had said before.

If you look here, the majority of it is coming from the residuals.

Residuals are all the other sources.

So if you think about the total variations that we see which is shown here,

only a small percentage of it is coming from regression and

the rest of it is coming from residual.

Remember, these are to the power of 14 and to the power of 15.

So there is an order-of-magnitude difference between the regression SS and the residual SS.
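
To make the split between regression and residual concrete, here is a small sketch of how the three sums of squares relate and how R square falls out of them. The data is hypothetical (deliberately noisy, and not the Zillow file), and the function name `sums_of_squares` is only for illustration.

```python
def sums_of_squares(x, y):
    """Return (SS regression, SS residual, SS total) for a least-squares line."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
             / sum((xi - mean_x) ** 2 for xi in x))
    intercept = mean_y - slope * mean_x
    fitted = [intercept + slope * xi for xi in x]
    ss_reg = sum((fi - mean_y) ** 2 for fi in fitted)          # explained by the line
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # everything else
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)               # total variation
    return ss_reg, ss_res, ss_tot

# Hypothetical data (NOT the Zillow file)
sqft = [900, 1100, 1400, 1400, 2000, 2600]
price = [300000, 180000, 420000, 250000, 310000, 520000]

ss_reg, ss_res, ss_tot = sums_of_squares(sqft, price)
r_square = ss_reg / ss_tot
print(f"SS regression = {ss_reg:.3e}")
print(f"SS residual   = {ss_res:.3e}")
print(f"SS total      = {ss_tot:.3e}")
print(f"R square      = {r_square:.3f}")
```

The regression and residual pieces add up to the total, so a small R square is the same thing as most of the total sitting in the residual row.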

However, if you look at, what we see here, if you look at the P-value for

the square footage, just focus on the P-value.

The P-value for the Square Footage is extremely small.

So what it's saying to you is the following.

That you have identified a variable, Square Footage.

Which has a significant relationship with the price of the home.

However, on its own it's not enough to explain the variability

that you see in home prices from one home to another.
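
It may seem odd that R square is low and the P-value is still tiny. A rough sketch of why: with one X variable, the t statistic for the slope can be written using only R square and the number of data points, so a large sample can make even a modest R square highly significant. The value n = 2300 is an assumption (the file is only described as having over 2300 data points), R square is the rounded 0.143, and the P-value uses a normal approximation rather than the exact t distribution.

```python
import math

r_square = 0.143  # R square read from the output (rounded)
n = 2300          # ASSUMPTION: the lecture only says "over 2300 data points"

# For a regression with one X, the slope's t statistic depends only on R square and n.
t_stat = math.sqrt(r_square * (n - 2) / (1 - r_square))

# Two-sided P-value using a normal approximation to the t distribution,
# which is reasonable here because n is in the thousands.
p_value = math.erfc(t_stat / math.sqrt(2))

print(f"t statistic is about {t_stat:.1f}")
print(f"approximate P-value: {p_value:.1e}")
```

The exact numbers in the Excel output will differ somewhat, since they come from the real data rather than rounded inputs.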

This is actually an example that I will expand on when we do multiple

regression.

So this is a process.

Sometimes when we do a simple linear regression,

we may find that we don't have a very good model, and that's when we have more than

one thing influencing our dependent variable, the one that we want to predict.

So it doesn't mean that you would throw away all your analysis, because what I'm

seeing is that I have come up with a good variable.

A variable that has a very strong influence on what I'm trying to predict.

Price of a home.

But on its own it's not enough.

Now, what is this saying, based on what you are seeing right now?

For every one-square-foot increase in the square footage, the price of the home goes up, on average, by

about 132 dollars.

That's basically what it means, this is the rate of change.

Remember, if I were drawing this, the line would slope upward:

it's a positive value because it has no minus sign, and the correlation is about 0.37, or 37%.

Remember R, which is the correlation, can go from -1 through zero to 1.

So the closer you are to 1 or -1, the stronger the correlation.

So they are correlated.

They are not as strongly correlated as we want them to be.

And that shows in R square, right?
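
The link between the correlation and R square can be checked with a quick calculation. This is a sketch using the rounded values quoted from the output, so the result is approximate: in a regression with one X, R is the square root of R square and takes the sign of the slope.

```python
import math

r_square = 0.143  # R square from the output (rounded)
slope = 132.69    # coefficient on square footage from the output

# Square root of R square, carrying the sign of the slope
r = math.copysign(math.sqrt(r_square), slope)
print(f"r = {r:.3f}")
```

With the rounded R square this comes out a little under 0.38, in line with the correlation read off the output.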

There are other things influencing the home's price.

It could be its location, it could be the age of the home,

how many bedrooms it has, how many bathrooms it has.

So we can develop this further.

But let me just focus on the coefficient of 132.

So what this is saying to us is that, if this is the square footage and

this is the price of the home, the line is somewhere like this.

The slope is 132.69.

So as you go up by one square foot, the price of the home will go up by $132.69.
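
As a small illustration of what the slope means as a rate of change, the sketch below scales the coefficient by a few differences in square footage. The function name `expected_price_difference` is hypothetical, and this only shows the average difference along the fitted line. It is not a price prediction for any particular home.

```python
slope = 132.69  # dollars per square foot, from the regression output

def expected_price_difference(extra_sqft, slope_per_sqft=slope):
    """Average price difference the fitted line associates with extra square feet."""
    return slope_per_sqft * extra_sqft

for extra in (1, 100, 500):
    diff = expected_price_difference(extra)
    print(f"{extra} more square feet -> about ${diff:,.2f} more on average")
```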

Now since R Square is so small, I am not going to use this output for prediction.

I'm not going to say, okay, if the house is 2,000 square feet, can you

predict what the price of the home will be? Because it's not a good predictive model.

We have just identified a variable which has a very small P-value,

which means it has a significant relationship with the price of the home.