In probability theory and statistics,

the mathematical concept of correlation is used to quantify

a possible linear association between two continuous variables.

Correlation is measured by a statistic called the correlation coefficient.

A correlation coefficient measures how strong the relationship between two variables is.

It is a dimensionless quantity that takes a value in the range of -1 to +1.

A correlation coefficient of zero

indicates that no linear relationship exists between the two continuous variables,

and a correlation coefficient of -1 or +1

indicates a perfect linear relationship.

If the coefficient is a positive number,

then the variables are directly related.

This means that when the value of one variable goes up,

the value of the other variable tends to go up.

On the other hand, if the coefficient is a negative number,

then the variables are inversely related.

This means that when the value of one variable goes up,

the value of the other variable tends to go down.

It is important to note that if the relationship

between two continuous variables is not linear,

the correlation coefficient may not capture it, even when the variables are strongly related.

A well-known and commonly used type of

correlation coefficient is Pearson's product-moment correlation coefficient.

It is denoted by the Greek letter rho for a population parameter,

and by r for a sample statistic.

The difference between a statistic and a parameter

is that a statistic describes a sample,

whereas a parameter describes an entire population.

This coefficient is a measure of the strength and

direction of the linear relationship between two variables,

and is defined as the covariance of the variables

divided by the product of their standard deviations.
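
To make that definition concrete, here is a minimal sketch that computes the sample coefficient r directly as covariance divided by the product of the standard deviations. The function name `pearson_r` and the body mass index and blood pressure values are hypothetical, chosen only for illustration, and the sample (n - 1) denominators are an assumption of this sketch.

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation: covariance / (sd_x * sd_y)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sample covariance and sample standard deviations (n - 1 denominator).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))
    return cov / (sd_x * sd_y)

# Made-up values, not real patient data.
bmi = [21.0, 24.5, 27.0, 30.5, 33.0]
systolic = [112, 118, 125, 134, 140]

r = pearson_r(bmi, systolic)
print(f"r = {r:.3f}")  # close to +1 for this nearly linear toy data
```

Because the (n - 1) factors cancel between the numerator and the denominator, using population formulas instead would give the same value of r.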

Pearson's coefficient is typically used when both variables being studied are approximately normally distributed.

For example, you might want to quantify

the association between body mass index and systolic blood pressure,

or between hours of exercise undertaken and body fat percentage.

Another statistical method used for the analysis of medical data is regression analysis.

Regression analysis uses mathematical models to describe relationships.

The main difference between correlation analysis and regression analysis

is that correlation analysis focuses primarily on association,

while regression analysis is used to make predictions.

Simple linear regression is used to model the relationship between two variables,

where one variable, the dependent variable, denoted by y,

is expected to change as the other variable,

the independent variable, denoted by x, changes.

This technique fits a straight line to the data,

known as the regression line.

For example, suppose that height was the only determinant of body weight.

If we were to plot body weight,

the dependent variable on the y axis, as a function of height,

the independent variable on the x axis,

we would likely see a linear relationship.
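
As a rough sketch of how such a regression line could be obtained, the code below fits a straight line by ordinary least squares and uses it to make a prediction. The lecture does not name the fitting method, so least squares is an assumption here, as are the function name `fit_line` and the height and weight values, which are made up for illustration.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products around the means.
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    # The fitted line always passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up values: height is the independent variable (x),
# body weight is the dependent variable (y).
height_cm = [150, 160, 170, 180, 190]
weight_kg = [52, 58, 66, 73, 81]

slope, intercept = fit_line(height_cm, weight_kg)

# Use the regression line to predict weight for a height of 175 cm.
predicted = slope * 175 + intercept
print(f"weight = {slope:.2f} * height + {intercept:.1f}")
print(f"predicted weight at 175 cm: {predicted:.1f} kg")
```

With these toy numbers the slope works out to about 0.73 kg per centimetre. This is the prediction step that distinguishes regression from correlation, which only summarises the strength of the association.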

Graphical representations are particularly

useful for exploring associations between variables.

Such graphical displays are usually in the form of scatter plots.

Scatter plots are similar to line graphs,

in that they use horizontal and vertical axes to plot data points.

Scatter plots are important in statistics,

because they can show the extent of correlation, if any, between two continuous variables.

Scatter plots can be used to visualize many examples of medical and healthcare data,

ranging from birth rate and life expectancy to

lung cancer death rates versus all other cancer death rates.

In the next video, we're going to look at

how probabilistic modelling can be applied to population health studies,

with a few examples of how it can be used to address population health concerns.