0:00

[MUSIC]

We round off week three with a further look at this concept of variance, and

I'd also like to extend our discussion from the previous

session on the normal distribution as well.

So, variance.

When we introduced this as S squared, a few sessions ago,

we were looking at the sample variance of a set of data.

Now what I'd like to consider here is the so called population variance, i.e.,

the variance of a theoretical probability distribution.

So let's backtrack a little bit and think back to our week two,

where we introduced some simple probability distributions.

Let's return to the concept of the score on a fair die.

So remember for that, we had our sample space, the possible values which this

variable x could take, namely those positive integers, one, two, three, four,

five, and six, and we said, if it was a fair die, those six outcomes were each

equally likely, and we developed the probability distribution, and

we assigned the probability of 1/6 to each of those six possible outcomes.

We then introduced the concept of the expectation of X.

We viewed this as an average, effectively a mean, but

this was a mean with respect to some population or theoretical distribution.

So remember, we consider the expectation of X as a probability-weighted average,

whereby we took each value of X,

multiplied it by its respective probability, and added them all together.

So there we found that the expectation of X, where X was the score on a fair die,

was equal to 3.5.

We also noted that this was never an observable value in any

single roll of the die; rather, we view this as a long-run average.
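As a quick sanity check, the probability-weighted average just described can be computed directly. A minimal Python sketch (not part of the lecture):

```python
# Expectation of the score on a fair die as a probability-weighted average:
# take each value of X, multiply it by its probability, and sum.
outcomes = [1, 2, 3, 4, 5, 6]
probs = [1 / 6] * 6  # fair die: each outcome equally likely

mu = sum(x * p for x, p in zip(outcomes, probs))
print(mu)  # 3.5
```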

So, distinguish the expectation of X, which we might denote by the Greek

letter mu, to indicate a population or theoretical mean, and

contrast that with the sample mean, X bar, which we've seen this week, which is

the mean just of a set of our observations drawn from some wider population.

So having introduced X bar, we then also considered the sample variance

S squared as a measure of dispersion, but again with respect to the sample.

So I think now we are in a position to work out the equivalent

concept of the variance, but at the theoretical level,

the so called population variance with respect to a probability distribution.

So remember, when I introduced S squared, I asked you

to think of it like an average.

The average squared deviation about the mean, and we had our formula,

of course the mean we're talking about here, was the sample mean X bar.

What if we want to work out the variance for a theoretical distribution?

We really want the same kind of concept, i.e., we need an average, i.e.,

an expectation, of the squared deviation about the mean.

So whereas previously our expectation was the expectation of X,

the expectation of that random variable, we still require an expectation,

but now the expectation of X minus mu, all squared, i.e.,

the expected squared deviation of X about the mean.

So just as E of X was a probability-weighted average,

the expectation of X minus mu,

all squared, is also a probability-weighted average.

It's just that now,

we don't multiply the X values by their corresponding probabilities, rather

we multiply the X minus mu squared values by their corresponding probabilities.

So let's revisit the score on a fair die and

calculate the true variance of such a score.

So we know the values of X are one, two, three, four, five and six.

We've already determined that the expectation of X,

which hereafter, we can denote by mu, was 3.5.

So for each value of X, for example, the 1, we subtract mu, so

(1 - 3.5), square that value,

and do a similar operation on the other remaining five values.

We then multiply each of these by the corresponding probabilities of occurrence,

but of course, as this was an assumed fair die,

each of those scores has the same probability of occurrence of 1/6.

So you multiply each one of these by 1/6, and

add them all together, and doing so you will get a total of 2.92,

and this represents the variance for the score on a fair die.
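The same calculation can be sketched in code: square each deviation from mu, weight it by its probability, and sum. A minimal sketch of the fair-die case described above:

```python
# Population variance of the score on a fair die: the
# probability-weighted average of squared deviations about the mean.
outcomes = [1, 2, 3, 4, 5, 6]
probs = [1 / 6] * 6

mu = sum(x * p for x, p in zip(outcomes, probs))  # 3.5
sigma_squared = sum((x - mu) ** 2 * p
                    for x, p in zip(outcomes, probs))
print(round(sigma_squared, 2))  # 2.92
```

The exact value is 35/12, which rounds to the 2.92 quoted in the lecture.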

If we wanted to, we could take the positive square root and

consider the standard deviation of the score on the fair die, but

do be conscious of the notation being applied.

So sigma squared will correspond to a population variance, and

its positive square root, sigma, to the population standard deviation.

Do be clear conceptually about the distinction between those,

which are derived from a theoretical probability distribution,

and their sample counterparts, the sample variance S squared and

the sample standard deviation S.

Now we're going to make much more use of these different means, and

variances, and standard deviations as we progress to

the statistical inference part of the course over the next couple of weeks.

But perhaps just a nice way to round off our week three, is to revisit the normal

distribution, because now we have perhaps a clearer understanding about what mu,

the population mean, and Sigma squared, the population variance, represent.

So we mentioned in the previous section that really there's an infinite number

of different normal distributions, each characterized by different combinations

of values for those parameters of mu and Sigma squared.

Now it would be helpful if we could perhaps have some kind of standardized

normal distribution.

One that's very easy to relate to.

Well such a distribution exists, called the standard normal distribution.

Now because this is so special, we will assign it its own special letter of Z.

So whenever you come across the letter Z in statistical courses,

think in terms of standardized variables.

Now why on Earth are these things of any great importance to us?

Well, first of all, let's define what we mean by a standardized variable.

This is one which has a mean of 0 and a variance of 1, and of course,

given that the standard deviation is the positive square root of the variance,

if the variance is 1, then by extension, so too is the standard deviation.

So in notation we might say Z, as a standard normal variable,

is distributed as a normal distribution with a mean of 0, that's the value for

mu in this special case, and a variance, sigma squared, of 1.

So why are standardized variables, of use to us?

Well, we've previously mentioned the concept of an outlier.

Remember when we were comparing means and medians, and

which one might be a preferable measure of central tendency, what we did note

was that means are very sensitive to the inclusion of any outliers.

But, as yet, we haven't really offered any sort of formal definition of

what an outlier might be, other than it's a sort of extreme observation.

9:06

If we now extended this from not just one standard deviation from the mean,

but to two standard deviations of the mean,

that's now going to capture approximately 95% of the total area,

under the curve, and if we went one standard deviation further, and

hence considered the mean, plus or minus 3 standard deviations for

a normal distribution this captures about 99.7% of the total area under the curve.

Hence, it is very unlikely to observe a normal random variable

taking a value beyond three standard deviations of the mean.

But now let's consider this in standardized terms, i.e.,

we have a standardized variable, i.e., its mean is zero, and

its standard deviation is equal to one.

So if mu is zero and sigma is equal to one on a standardized, i.e.,

a Z, scale, there are some very simple numbers we just need to remember.

So, for a symmetric continuous distribution

such as the normal, we said that

within one standard deviation of the mean we get about 68% probability.

So on a standardized basis, with the mean of zero, and a standard deviation of one,

that equates on a Z scale to being between -1 and +1.

So there should be about 68% chance of being between -1 and +1.

Now, extending it to being within two standard deviations from the mean,

well, on a standardized basis this means being between plus and minus 2,

and of course if we consider three standard deviations from the mean

that equates to being between plus and minus 3.
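Those three benchmark probabilities can be recovered from the standard normal distribution using only the standard library, since P(-k < Z < k) equals erf(k/√2). A sketch to verify the numbers quoted above:

```python
import math

def within_k_sds(k):
    # P(-k < Z < k) for Z ~ N(0, 1), via the error function:
    # Phi(k) - Phi(-k) = erf(k / sqrt(2)).
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(within_k_sds(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```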

So if we convert things to a standardized variable then immediately

we can decide whether an observation is an extreme value or not.

So for example, if we looked at the returns on a stock, or

maybe movements of an exchange rate.

Let's say a stock moved by 3.6% in a particular day.

Now is this a dramatic movement or a less dramatic movement?

Well, it's quite hard to judge just by considering that return of 3.6%,

because it depends on the context of the distribution: what is its mean,

and what is its standard deviation?

We would really need to know those to judge how extreme such a movement might be.

But if we now converted such a percentage change to a standardized basis,

i.e., take the original observation, subtract the mean,

divide by the standard deviation, and express it on a standardized Z

scale then immediately we can see whether or not we have an extreme observation.

Because then we simply compare that Z value to the range between minus one and

plus one, about a 68% chance of such an event occurring,

between minus two and plus two, about a 95% chance of that occurring, and

between plus or minus three, roughly a 99.7% chance of that occurring.

So if you were told that you had an event occurring on a standardized basis of,

let's say, four or five, meaning four or five standard deviations

beyond the mean, then this would correspond to an extremely rare event, something

which we may wish to call an outlier, indeed perhaps an extreme outlier.
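To make the stock example concrete, here is a sketch of the standardization just described. The mean and standard deviation of daily returns below are hypothetical figures for illustration, not values from the lecture:

```python
# Hypothetical assumption: daily returns average 0.05% with a
# standard deviation of 1.2% (illustrative numbers only).
observation = 3.6   # the day's return, in percent
mean, sd = 0.05, 1.2

# Standardize: subtract the mean, divide by the standard deviation.
z = (observation - mean) / sd
print(round(z, 2))  # 2.96
```

Under these assumed figures the move sits nearly three standard deviations above the mean, so on the Z scale it would already count as a rare event.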

So if you're looking at comparing different variables

which are measured on very different scales, then doing this

standardization transformation, by getting them onto the same scale,

allows you to compare apples with apples,

rather than comparing apples with oranges.

So going into our week four of the course, we'll be doing more statistical

inference when we're going to be drawing on some of this theoretical knowledge.

So join me for that.

[MUSIC]