In this segment, we will assume that we have some data, and we will do three things: we will look at the data, we will guess what the model is, and we will estimate the model.
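To make the last step concrete, here is a minimal Python sketch of estimating a model by ordinary least squares. It assumes the simplest possible guess, a straight line with a single explanatory variable, which the lecture has not committed to. The function name `fit_simple_ols` and the toy points are hypothetical and are not taken from the housing data.

```python
# Minimal sketch: ordinary least squares for ONE explanatory variable.
# Assumes a straight-line model y = b0 + b1 * x (an illustrative guess only).

def fit_simple_ols(x, y):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Slope = covariance(x, y) / variance(x); the 1/(n-1) factors cancel.
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variation; the slope is undefined")
    slope = sxy / sxx
    # The fitted line always passes through the point of means.
    intercept = y_mean - slope * x_mean
    return intercept, slope


if __name__ == "__main__":
    # Made-up points lying exactly on y = 1 + 2x.
    xs = [0.0, 1.0, 2.0, 3.0]
    ys = [1.0, 3.0, 5.0, 7.0]
    print(fit_simple_ols(xs, ys))  # (1.0, 2.0)
```

With several explanatory variables the same idea applies, but the coefficients are usually found by solving the normal equations or with a linear-algebra library rather than by hand.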

So, I chose this dataset simply because it's available, but today there are data repositories in various places that offer free regression datasets where you can go and test your skills.

This particular dataset comes from an R library, and you will soon see where it comes from.

What it has is data on housing prices. The price is the last variable in this dataset, called the median value of owner-occupied homes, in thousands of dollars.
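Because the price is the last column, one way to separate the response from the explanatory variables is sketched below in Python. The helper `split_response` is hypothetical, the short column names (`crim`, `indus`, `chas`, `nox`, `medv`) are assumed to follow the usual abbreviations for this dataset, and the numbers are made up rather than copied from the data.

```python
# Minimal sketch: split a table into explanatory variables (X) and the
# response (y), assuming the response is the LAST column.
# Column names are assumed abbreviations; the numbers are illustrative only.

def split_response(header, rows):
    """Return (x_names, X, y_name, y) with the last column as the response."""
    *x_names, y_name = header
    X = [list(row[:-1]) for row in rows]  # every column except the last
    y = [row[-1] for row in rows]         # the last column only
    return x_names, X, y_name, y


header = ["crim", "indus", "chas", "nox", "medv"]
rows = [
    [0.1, 5.0, 0, 0.50, 24.0],  # made-up values
    [0.3, 8.0, 1, 0.60, 21.5],  # made-up values
]
x_names, X, y_name, y = split_response(header, rows)
print(y_name, y)  # medv [24.0, 21.5]
print(x_names)    # ['crim', 'indus', 'chas', 'nox']
```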

The rest are all what we call the explanatory variables. So we can think of the price as the response variable and the other variables as explanatory variables, and we are trying to explain the response variable in terms of the explanatory variables.
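One common way to write "explaining the response in terms of the explanatory variables" is a linear model. This is only an assumption for illustration, since we have not yet guessed the model: here $y_i$ is the median value for tract $i$, $x_{ij}$ is the $j$-th explanatory variable for that tract, $p$ is the number of explanatory variables, and $\varepsilon_i$ is an error term.

```latex
y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip} + \varepsilon_i,
\qquad i = 1, \dots, n
```

Under this assumption, estimating the model means choosing $\beta_0, \dots, \beta_p$, for example by least squares, which minimizes $\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$.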

The other thing to note is that you have variables like the crime rate and the percentage of land occupied by non-retail businesses. These are all numerical, but we also have one variable that indicates whether the tract bounds the Charles River. It is a one or a zero, so it is a qualitative variable, yes or no, but it is coded as one or zero.

So what we see is that explanatory variables can be numerical or qualitative. When they are qualitative, they often have to be coded into different levels: red, blue, green; high price, medium price, low price; and so forth. So in regression it is not necessary to have only numerical variables.
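A common way to code a qualitative variable with several levels is dummy (indicator) coding: one 0/1 column per level, with one level left out as the baseline. The Python sketch below illustrates this with the red/blue/green example; the helper `dummy_code` and its `is_` column prefix are hypothetical choices, not something the lecture specifies.

```python
# Minimal sketch of dummy (indicator) coding for a qualitative variable.
# The first level is the baseline and gets no column of its own, which avoids
# a redundant column when the model also has an intercept.

def dummy_code(values, levels):
    """Return (column_names, rows) with one 0/1 column per non-baseline level."""
    baseline, *others = levels  # baseline is represented by all zeros
    unknown = set(values) - set(levels)
    if unknown:
        raise ValueError(f"values not in levels: {sorted(unknown)}")
    names = [f"is_{level}" for level in others]
    rows = [[1 if v == level else 0 for level in others] for v in values]
    return names, rows


colors = ["red", "green", "blue", "red"]
names, rows = dummy_code(colors, levels=["red", "blue", "green"])
print(names)  # ['is_blue', 'is_green']
print(rows)   # [[0, 0], [0, 1], [1, 0], [0, 0]]
```

With only two levels, for example `levels=["no", "yes"]`, this produces a single 0/1 column, which is exactly how the Charles River variable is coded.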

Similarly, in this particular case the response variable is numerical: it's the median value. The median value takes decimal values because it is measured in thousands of dollars. But the response variable could also be qualitative, as we will see in the next module.

It's always good practice to go over these and make sure that we have a complete list of all the features of the data that we think will affect price.

When I first saw this, for example, there was a variable like nitric oxide concentration in parts per 10 million. I wondered whether it would affect house prices. We don't know.

So we do a review, and then we collect the whole set of values listed there.