Unlike the assumptions of Normality, Linearity, and Homoscedasticity,

the assumption of Independence cannot be fixed by transforming or

otherwise modifying the variables in your analysis, or by excluding observations.

This is because it is typically the structure of the data itself

that results in violation of this assumption.

So it's important to understand the process or

study design that generated your data.

If that process produces data that are hierarchically structured,

clustered, or otherwise correlated,

then the best solution may be to use an alternative regression method

that can take into account the lack of independence in your data.

Most of these methods are simply extensions of the linear regression model.

So having a good understanding of linear regression will make it easier to

understand and apply these alternative statistical methods that can account for

a lack of independence among observations.

Although they are not among the big four assumptions,

outliers and multicollinearity can affect your analyses in undesirable ways.

So you have to do some investigating to see if either one or both are present.

Outliers are observations that have unusual or

extreme values relative to the other observations.

In regression analysis, outliers can have an unusually large influence on

the estimation of the line of best fit.

A few outlying observations, or even just one,

can affect your linear regression assumptions or change your results,

specifically in the estimation of the line of best fit.

The analysis will attempt to fit the outliers.

As a result, the estimated regression line will not fit the rest of the data as well as it

should, increasing the prediction error for the majority of the observations.
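As a rough sketch of this effect, here is a small example with made-up data and a plain ordinary-least-squares fit (not any specific dataset from this lecture), showing how a single extreme point pulls the fitted line away from the rest of the observations:

```python
# Minimal sketch with made-up data: one extreme point pulls an
# ordinary-least-squares line away from the rest of the observations.

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b  # intercept, slope

xs = list(range(10))
ys = [2 * x + 1 for x in xs]        # points that fall exactly on y = 1 + 2x

a0, b0 = fit_line(xs, ys)           # recovers intercept 1, slope 2

# Add one observation far from the line the rest follow.
a1, b1 = fit_line(xs + [9], ys + [60])

# Both coefficients move to accommodate the outlier, so the new line
# fits the original ten points worse than before.
print((a0, b0), (a1, b1))
```

Refitting with and without a suspect observation, as done here, is also a quick way to see how much influence that one point has on your results.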

You can often identify outliers by just looking at a scatter plot.

In this scatter plot, there are two observations that appear to be different

or unusual, compared to the other observations.

This observation here is definitely an outlier.

It's far from all the other observations, and it's nowhere near the regression line.

This single observation could definitely have an impact

on your regression assumptions, and on your results.

If this is the case, then something needs to be done with it.

The other observation here also looks like it could be an outlier,

because it is far away from the values of the other observations.

But it still fits along the regression line.

It may have an extreme value, but

it shows the same linear association as the rest of the observations.

This means that including this observation

will not have an impact on your results, and it should be retained in your analysis.

Histograms and box plots can also be used to identify univariate and

bivariate outliers.
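The box-plot rule can also be applied numerically. Here is a sketch with hypothetical values, flagging any point more than 1.5 interquartile ranges beyond the quartiles:

```python
# Sketch of the box-plot rule with hypothetical values: flag points
# more than 1.5 interquartile ranges (IQR) beyond the quartiles.
import statistics

values = [4, 5, 5, 6, 6, 7, 7, 8, 8, 30]    # 30 looks extreme

q1, _, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr

outliers = [v for v in values if v < low or v > high]
print(outliers)  # prints [30]: only the extreme value is flagged
```

This is the same fence a box plot draws with its whiskers, just computed directly instead of read off the figure.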

So the question is, what do you do with outliers?

Just getting rid of them might not be the answer.

Here's a decision flow chart that can help you decide what to do with outliers.

The first thing to do is check whether the observation changes

whether or not your regression assumptions are met.