Ever heard of the normal distribution? And did you know that it looks like this? You probably have. But how do you know that your data follows this distribution? If this is the histogram of your CTQ, does it follow the normal distribution? In this video, I will teach you how to answer this question. We will study a tool called the probability plot to find the correct distribution for your data. But why do you want to know which distribution fits your data? Well, you will need that information for many statistical tools, such as the empirical CDF, ANOVA or regression, which are all coming up in the next videos. So, let's have a look at a probability plot. For that, we first go back to our call center, where a project has started to improve the total handling time, which is the total time that an employee is busy answering a call. The CTQ is therefore the total handling time. You gathered data about this process, and one of the variables is the total handling time. This is what it looks like. In this video, I will show you how to fit a distribution to this data. But first, let me show you the distributions that we will consider. You saw them already in the video on normal, Weibull, and lognormal distributions. There are many different probability distributions. In a Six Sigma project, though, you will often only encounter these three, so that makes your life a lot easier. You will see the normal, the Weibull, or the lognormal distribution. Let's return to our example of the total handling time. From the shape of the histogram, we can already see that the data is not normally distributed. But is it then Weibull or lognormally distributed? The tool we will use to find the best fitting distribution is called the probability plot. Let's take a look at how to make one with Minitab. So, pause the video and load your data before you continue. I only copied the variable THT, the total handling time, into Minitab. But you can have the rest there too, it doesn't matter.
For the probability plots, we go to Graph, and then you'll find Probability Plot over here. Select it. We have a single variable, so we select Single and OK. Well, we want to make a probability plot for total handling time, that's it. OK, let's have a look. This is a probability plot for the normal distribution. Let's make one for the lognormal and Weibull, before we interpret the graph. So we go back to Graph. You go back to Probability Plot > Single > OK. And we still have total handling time there, but now we go to Distribution. And instead of keeping the default, Normal, we select Lognormal. OK > OK. And we have a second probability plot for the lognormal distribution. Let's make another one. An easier way to go back is with this Edit Last Dialog button, and you get there straight away. Go to Distribution, and select the Weibull distribution here. OK > OK, and this is our third probability plot. Well, let's have a look at these outputs. This is the probability plot. On the horizontal axis, you see the data. To check if the distribution fits well, we have to look at whether the measurements, the dots, are on a straight line and, as much as possible, between the two outside lines. The middle line represents the normal distribution. We see here that the normal distribution is a poor fit for the total handling time. Let's take a look at another probability plot. Here, the data is compared to the Weibull distribution. The data is not in between the lines. Hence, it does not fit well. So, let's take a look at the lognormal distribution. Here we see that the data lie in between the red lines, so the data is lognormally distributed. If you compare all three graphs, you can conclude that a lognormal distribution fits your THT data best. As a second check, you can compare the AD values. AD is the Anderson-Darling statistic. It measures the distance between your data and the theoretical distribution, such as the normal, Weibull, or lognormal distribution.
The lower the value, the better the fit. So we select the distribution with the lowest AD value. In this case, the lognormal distribution has the lowest AD value and therefore fits the data best. Remember this diagram? The probability plot is the tool to determine which distribution you should use to model the population. Apart from this, you can also use the probability plot to spot data anomalies. So, concluding, the probability plot is a powerful tool to determine which distribution fits best, and it can be used to spot these anomalies. Let's have a look at some common situations. This probability plot clearly shows outliers. It can also help to detect bimodality. In that case, you will see an S shape representing the two distributions. Probability plots that have small stacks of data show that the data have been rounded off. You should then focus on the midpoints of the data stacks. In this example we can conclude that the data is approximately normally distributed, but also rounded off. In summary, a probability plot can be used to find the theoretical distribution that fits your data best. If many data points deviate from the straight line and do not fall between the two outside lines, then we call it a poor fit. If the data lies more or less on the straight line, and many of the points lie between the two outside lines, we call it a good fit.
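If you want to try the same idea outside Minitab, here is a minimal Python sketch of what the three probability plots check. It is not Minitab's procedure and it does not compute the Anderson-Darling statistic. As a stand-in for judging by eye whether the dots lie on a straight line, it computes the correlation between the sorted data and the theoretical quantiles of each distribution. The function names, the median-rank plotting positions, and the synthetic handling times are all assumptions made for this illustration; they are not the course data.

```python
import math
import random
from statistics import NormalDist, fmean


def pearson_r(xs, ys):
    """Pearson correlation; values near 1 mean the points lie close to a straight line."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def probability_plot_fit(data):
    """Return {distribution: r}, the straightness of each probability plot.

    Hypothetical helper for illustration; the data must be strictly positive
    because the lognormal and Weibull plots use log(x).
    """
    x = sorted(data)
    n = len(x)
    # Median-rank plotting positions (Benard's approximation), an assumption here.
    p = [(i - 0.3) / (n + 0.4) for i in range(1, n + 1)]
    z = [NormalDist().inv_cdf(pi) for pi in p]        # standard normal quantiles
    w = [math.log(-math.log(1.0 - pi)) for pi in p]   # quantiles for the Weibull plot
    log_x = [math.log(v) for v in x]
    return {
        "normal": pearson_r(x, z),         # x against normal quantiles
        "lognormal": pearson_r(log_x, z),  # log(x) against normal quantiles
        "weibull": pearson_r(log_x, w),    # log(x) against log(-log(1 - p))
    }


# Synthetic, made-up handling times in seconds (NOT the course data set).
rng = random.Random(42)
tht = [rng.lognormvariate(5.0, 0.6) for _ in range(500)]

fit = probability_plot_fit(tht)
best = max(fit, key=fit.get)
for name, r in fit.items():
    print(f"{name:10s} r = {r:.4f}")
print("straightest probability plot:", best)
```

Here the rule is the opposite of the AD rule: the higher the correlation, the straighter the plot and the better the fit. On real data you would still look at the plots themselves, because a single number cannot show you outliers, an S shape, or stacks of rounded values.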