Continuous random variables are usually measurements, and they can take on an infinite number of values; for example, the exact temperature outside. Probabilities for this type of random variable are not assigned to specific values. Instead, they are defined over an interval of values and are represented by the area under a curve. In advanced mathematics, this area is computed with an integral. The probability of observing any single exact value is 0, since the number of values the random variable may assume is infinite. Formally, a continuous random variable X can take on any value over an interval of real numbers. Let's consider water temperature. Water freezes at 0 degrees Celsius and boils at 100 degrees Celsius. So if water temperature is what we are recording, then X, our random variable, is water temperature, and it can take on any value between 0 and 100. The probability density function of X, p(x), must then satisfy two conditions. First, p(x) ≥ 0 for all values of x in that interval; in other words, no value has a negative probability density. Second, the total area under the curve is equal to 1, which means that collectively all possibilities are accounted for and they sum to one. The normal distribution is one of the best-known distributions used for continuous random variables. But as you will see as you go through this course, the normal distribution applies to more than continuous variables. Given the right conditions, we can use it to approximate distributions used for discrete variables as well. That is why it's very important to learn about this distribution and how to use it. The normal distribution is a bell-shaped curve defined by its center point, the mean μ, and its standard deviation, σ. The mean, which is also the median, represents the center point of the distribution, and the standard deviation controls the spread of the curve. Assume we have the following distribution, where the mean is 500 and the standard deviation is 15.
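As a rough numerical check of the two density conditions, here is a minimal Python sketch (an illustration, not part of the course materials) that writes out the normal density for the mean 500, standard deviation 15 example and approximates areas with a simple midpoint rule. The helper names normal_pdf, area_under and pdf are hypothetical, and integrating over ±10 standard deviations instead of the whole real line is an assumption that leaves out only a negligible sliver of tail area.

```python
import math

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Height of the normal density with mean mu and standard deviation sigma at x."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-0.5 * ((x - mu) / sigma) ** 2)

def area_under(f, a: float, b: float, n: int = 100_000) -> float:
    """Approximate the integral of f over [a, b] with the midpoint rule using n slices."""
    width = (b - a) / n
    return sum(f(a + (i + 0.5) * width) for i in range(n)) * width

mu, sigma = 500.0, 15.0

def pdf(x: float) -> float:
    """Density of the example distribution (mean 500, standard deviation 15)."""
    return normal_pdf(x, mu, sigma)

# Condition 1: the density is never negative (spot-checked on a grid of points).
print(all(pdf(x) >= 0 for x in range(300, 701)))  # True

# Condition 2: the total area is 1. The tails beyond 10 standard deviations hold
# a negligible amount of area, so [mu - 10*sigma, mu + 10*sigma] stands in for (-inf, inf).
total = area_under(pdf, mu - 10 * sigma, mu + 10 * sigma)
print(round(total, 6))  # 1.0

# A single value is an interval of zero width, so its probability (area) is 0.
print(area_under(pdf, 510.0, 510.0, n=1))  # 0.0
```

The last line illustrates why any single exact value has probability 0: the interval has no width, so it encloses no area.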
This curve peaks at the mean of 500. Now, if we have a distribution with the same standard deviation of 15 but a mean of 300, the curve will look like this. The two curves have exactly the same shape because they have the same standard deviation. The second curve, however, peaks at 300, its mean. Now imagine another normal curve. This time the mean is the same value of 500, but the standard deviation is 5. As you can see, both curves peak at 500, but the second curve is much less spread out than the first: it is narrower and taller. So the larger the standard deviation, the more spread out the values of the distribution are, as is the case for the first curve. Likewise, a smaller standard deviation means the values are less spread out, in which case the mean is a much better representation of what one might observe. Another property of a normal curve is that it is symmetrical about its mean; that is, the left and right sides are mirror images of one another. Both tails of the distribution extend to infinity, getting closer and closer to the horizontal axis but never touching it. The left side goes to negative infinity, while the right side goes to positive infinity. And in spite of its infinite width, the area under the curve is 1. Since the normal curve is symmetrical, the area under the curve to the right of the mean equals the area to the left of the mean, and each of these equals 50%. Together, they make 100%, or 1. Imagine we have a normal distribution with a mean of 1,000 and a standard deviation of 10. We want to know: what is the probability that the random variable X is greater than or equal to 995 and less than or equal to 1,005? To answer this question, let's first see what we are asking in visual form. The answer to our question is the blue shaded area under this normal curve.
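The shape comparisons for the three curves with means and standard deviations of 500 and 15, 300 and 15, and 500 and 5 can be checked with a short Python sketch (an illustration, not the course's material) that evaluates the normal density at each curve's mean and at equal distances on either side of it. The helper normal_pdf and the offset of 7 units are arbitrary choices made for this example.

```python
import math

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Height of the normal curve with mean mu and standard deviation sigma at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

# (mean, standard deviation) for the three curves compared in the lecture
curves = [(500, 15), (300, 15), (500, 5)]

for mu, sigma in curves:
    peak = normal_pdf(mu, mu, sigma)  # each curve is highest at its mean
    # Symmetry: points the same distance below and above the mean have equal heights.
    left = normal_pdf(mu - 7, mu, sigma)
    right = normal_pdf(mu + 7, mu, sigma)
    print(f"mean {mu}, sd {sigma}: peak {peak:.4f}, symmetric {math.isclose(left, right)}")

# Expected peaks: about 0.0266, 0.0266 and 0.0798
```

Because the area under every normal curve is 1, a smaller standard deviation forces a taller, narrower peak: the sd 5 curve peaks three times as high as the sd 15 curves, while the two sd 15 curves have identical peaks at different locations.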
To answer this mathematically, we could find the equation for the curve and then integrate under the curve between the two end points. This is definitely one way to solve the question, but there is another, more expeditious way, and that is by understanding a special form of the normal curve known as the standard normal curve. The standard normal distribution is a special case of the normal distribution, which has a mean of 0 and a standard deviation of 1. A value of a standard normal random variable is called a standard score, or more commonly, a z-score. The z-score tells us how many standard deviations the value of interest is above or below the average. We saw the z-score in an earlier lesson: it's calculated by taking the value of interest minus the average of all values, divided by the standard deviation of all values. Now, why bother with this? Because the area under the standard normal curve has already been calculated for all possible values of z. Those areas are listed in normal tables, and all statistical software has them built in. So for any given normal curve, all we need to do is convert it to the standard normal curve, instead of doing the calculations individually for each combination of mean and standard deviation. In Excel, NORM.S.DIST(z, TRUE) returns the cumulative probability P(Z ≤ z) from the standard normal distribution, and NORM.DIST(x, mean, standard_dev, TRUE) does the same for a normal distribution with any mean and standard deviation. Make sure to watch the Excel illustration videos to learn about this. Now back to our example. With a mean of 1,000 and a standard deviation of 10, we want to know the probability of the random variable being between 995 and 1,005. We can convert these values to z-scores; again, this is called standardizing. For 995, z is calculated by subtracting 1,000 from 995 and dividing the difference by 10, which gives negative 0.5. For 1,005, z is (1,005 - 1,000) / 10, which is positive 0.5.
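Standardizing is a one-line formula. A minimal Python sketch, where z_score is a hypothetical helper name, applied to the two end points of the example:

```python
def z_score(x: float, mean: float, sd: float) -> float:
    """How many standard deviations x lies above (positive) or below (negative) the mean."""
    return (x - mean) / sd

mean, sd = 1000.0, 10.0
print(z_score(995, mean, sd))   # -0.5
print(z_score(1005, mean, sd))  # 0.5
```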
So asking for the probability of X being between 995 and 1,005 is the same as asking for the probability of Z being between negative 0.5 and positive 0.5. We can answer this translated question much faster, either by looking up the values in a normal table or by using software like Excel, which gives 0.3829. In other words, for a normally distributed population with a mean of 1,000 and a standard deviation of 10, the probability of observing a value between 995 and 1,005 is 38.29%. Again, to learn how to use Excel to find these values, please watch the Excel tutorial videos. For many data sets, the majority of observations clump around the average, with the number of observations decreasing the farther values are from the average in either direction. Standard deviation is the most common measure of variation and tells us how the whole collection of values varies. For a normal distribution, we expect about 68% of the population to be within one standard deviation of the mean, about 95% of observations to fall within two standard deviations of the mean, and about 99.7% to fall within three standard deviations of the mean. Observations more than three standard deviations from the mean are considered rare and are often called outliers. So now, let's practice. We have these two normal curves centered around the value of 50. Can you approximate the standard deviation of curve A and curve B? Look at curve A. The curve seems to extend from the mean to about 95 before the tail becomes very thin, so 95 is roughly 3 standard deviations from the mean. That gives a width of 95 minus 50, which is 45, and if that is about three standard deviations, the standard deviation must be around 15. Now look at curve B. The same approximation applies. The curve seems to get thin at about 75, so take 75 minus 50, which is 25, and divide it by three. This distribution has a standard deviation of about 8.
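The table lookup can also be reproduced in code. This Python sketch (an illustration, not the course's Excel method) computes the standard normal cumulative probability from the error function, using the identity Φ(z) = (1 + erf(z/√2))/2, and then applies it to the example, to the 68-95-99.7 rule, and to the eyeball estimates for curves A and B. The helper std_normal_cdf is a hypothetical name.

```python
import math

def std_normal_cdf(z: float) -> float:
    """P(Z <= z) for a standard normal Z, via Phi(z) = (1 + erf(z / sqrt(2))) / 2."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Example: mean 1,000, sd 10, so 995 and 1,005 standardize to -0.5 and 0.5.
p = std_normal_cdf(0.5) - std_normal_cdf(-0.5)
print(round(p, 4))  # 0.3829

# 68-95-99.7 rule: share of a normal population within k standard deviations of the mean.
for k in (1, 2, 3):
    share = std_normal_cdf(k) - std_normal_cdf(-k)
    print(k, round(share, 4))  # prints 1 0.6827, then 2 0.9545, then 3 0.9973

# Practice curves: if the visible tail ends about 3 sd from the mean, sd is roughly width / 3.
print((95 - 50) / 3)  # 15.0
print((75 - 50) / 3)  # 8.333333333333334
```

In Excel, the same probability should come from =NORM.S.DIST(0.5,TRUE)-NORM.S.DIST(-0.5,TRUE), or, skipping the standardizing step, from =NORM.DIST(1005,1000,10,TRUE)-NORM.DIST(995,1000,10,TRUE).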
While perfectly symmetrical curves may not exist in real-world applications, we often encounter real-world phenomena that follow at least a nearly normal distribution. This allows researchers to use the normal distribution as a model for assessing probabilities associated with these phenomena. Furthermore, as you will learn later in the course, the normal curve allows us to use sample information to gain an understanding of the population. The normal curve can also serve as a good approximation for some discrete random variables and distributions. For these reasons, it is considered the most important probability distribution in the field of statistics.