Hi, everybody. Now, we are going to talk about Gaussian probability distributions in one dimension. These are very, very useful distributions, and you will be a happier person if you understand a little bit about how they work. When we draw a probability distribution, the x-axis represents the values our random number x can take on. The y-axis is the probability density. There are lots of different kinds of distributions with lots of different shapes. There's an example of one distribution, kind of exponential looking. Here is another one. Here is another one, very tall and skinny. But there's one very important, very special probability distribution called the Gaussian probability distribution. Since it is so special, it is often called the normal distribution. And when you hear someone talk about a bell curve, they are talking about the Gaussian probability distribution. How do we write the Gaussian probability distribution? We write that P with a G for Gaussian. The probability density at x is 1 over the square root of 2 pi sigma squared, times e to the minus x minus mu, quantity squared, all over 2 sigma squared. If we were to draw this out, we would find that the distribution is centered at mu, and it has a width that is more or less described by the parameter sigma. Sometimes you'll see the Gaussian drawn on a logarithmic scale. So, in this case, the y-axis is the log of the probability density. In this case, the distribution looks like an upside-down parabola. And that is because the log of a Gaussian distribution is equal to some constant minus x minus mu, quantity squared, all over 2 sigma squared. And that is the equation for a parabola. But for our purposes, we won't worry about the logarithmic scale. We'll just think of it as this bell curve shape. So, how do you think about probability distributions? There are many ways, but I like to think of probability distributions in terms of random number generators. 
So, let's say you have a random number generator with a little button on top. And every time you push the button, a number pops out. The distribution, p of x, tells you the probability that the number that pops out is between a lower bound, say a, and an upper bound, b. And the way the distribution tells you that is the following. The probability that x is between a and b, the probability the random number generator will spit out a number between a and b, is the integral from a to b of that distribution p of x dx. So, let's put a g there for Gaussian. Because we're doing all this calculus stuff and dealing with integrals, technically, p sub g of x, this Gaussian distribution, is a probability density. A density just means that you have to integrate over an interval to find the probability of your random number being within that interval. So, for example, if the value at b were, say, 0.25, does that mean that there is a 25% chance of drawing the value b when you push the button on your random number generator? No. In fact, because this is a distribution or density over a continuous set of numbers, there is a zero percent chance that the number that pops out will be exactly b. And this is because the random number generator can spit out an infinite number of different numbers, and so it will never spit out exactly the number you are looking for. Therefore, the probability of that number is zero, yet you can still calculate finite probabilities that the number that gets spit out will be within a certain interval. And these probabilities are always between zero and one. What if a is minus infinity and b is plus infinity? What is the probability that x is between minus infinity and positive infinity? Well, that should be 1, because all of the real numbers are between plus and minus infinity. That implies that if you were to do this integral from x equals minus infinity to x equals plus infinity of the Gaussian distribution, you will get 1. 
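To make this interval idea concrete, here is a minimal Python sketch (Python isn't part of the lecture, and the function name is my own) that computes the probability that a Gaussian random number lands in an interval [a, b]. Instead of integrating the density numerically, it uses the standard error function, which gives the same answer exactly.

```python
import math

def gaussian_prob(a, b, mu=0.0, sigma=1.0):
    """P(a <= x <= b) for a Gaussian random number generator,
    computed from the error function rather than a numerical
    integral of the density."""
    def cdf(x):
        return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))
    return cdf(b) - cdf(a)

# The whole real line carries probability 1 (up to floating point):
print(gaussian_prob(-1e9, 1e9))   # very close to 1.0
# A single exact point carries probability 0, as in the lecture:
print(gaussian_prob(0.5, 0.5))    # 0.0
```

Note that any finite interval still gets a finite probability, even though every individual point has probability zero.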
And in fact, this is true for every distribution. A probability distribution should always integrate to 1. Let's go back to the math and take a look at the actual form of this distribution. So, we said that the probability density at a point x was 1 over the square root of 2 pi sigma squared, times e to the minus x minus mu, quantity squared, over 2 sigma squared. Let's analyze this just a little bit. P of x is a function of x, and what are its two parameters? Well, e, that's a constant. 2, that's a constant. Pi, that's a constant. So, its parameters are mu and sigma. And the way to think about mu and sigma is that mu controls the center of the distribution, and sigma controls the spread of the distribution. So, here are a couple of examples. So, that's p, and that's x. So, here's maybe one distribution centered at mu 1. If we were to increase mu, that distribution would move over to mu 2. If we were to increase sigma, the distribution would spread out. Notice, however, that the area under the distribution, that is, the integral of the distribution, has to be 1. So if sigma increases and the distribution spreads out, it necessarily has to lower its peak. Because mu and sigma are the parameters of the Gaussian distribution, sometimes we write this distribution in shorthand as N of mu and sigma squared. You can put the x there to indicate that it's a distribution over x. The N stands for normal. Now, mu and sigma, which are the parameters of the Gaussian distribution, are not just any parameters; they are very special. They are related to the mean and variance of the distribution. In fact, they are so special, and so related to the mean and variance, that mu actually is the mean of the distribution, and sigma squared is the variance. And how can you figure that out? Well, how do you calculate the mean, or the expected value, of a random variable x, which we sometimes write as E[x]? 
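You can see the "mu is the center, sigma is the spread" story directly from the random number generator picture. Here is a short Python sketch (my addition, not from the lecture; the parameter values are arbitrary) that pushes the button many times and checks that the sample mean and spread land near mu and sigma.

```python
import random

random.seed(0)  # reproducible "button pushes"

mu, sigma = 2.0, 0.5
# Push the button on the Gaussian random number generator many times.
samples = [random.gauss(mu, sigma) for _ in range(100_000)]

# The sample mean should sit near mu (the center)...
mean = sum(samples) / len(samples)
# ...and the sample standard deviation near sigma (the spread).
var = sum((s - mean) ** 2 for s in samples) / len(samples)

print(mean)        # close to 2.0
print(var ** 0.5)  # close to 0.5
```

Changing mu slides the whole cloud of samples along the x-axis; changing sigma widens or narrows it, without changing the total probability of 1.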
Well, the expected value of x is just the integral from minus infinity to infinity of all the possible xs, so we're adding up all of the possible xs. But we're weighting them by their respective probabilities, so that is the expected value of x. If you were to write out p sub g explicitly, that would be x times 1 over the square root of 2 pi sigma squared, times e to the minus x minus mu, quantity squared, all over 2 sigma squared, dx. And the amazing thing is that if you calculate that integral, which you should at least once in your life, you will get the answer mu. So mu is the mean; it is the expected value of x. What about the variance? What is the variance? The mean told us where our distribution was centered, and the variance tells us how spread out the distribution is. It does that by being the average, or expected value, of the squared distance of our random number from the mean. So if most of the samples from that random number generator are very near the mean, the quantity inside the brackets will be close to 0, and we will have a small variance. If the samples from our random number generator are far away from the mean, that value will be large on average, so we will have a large variance. You can calculate this in a similar way to how we calculated the mean, but now we're adding up all of those possible squared distances, and then weighting them by their respective probabilities. And again, this is an integral you should do at least once in your life. If you do that integral, you will see that it equals sigma squared. Thus, sigma squared is the variance of your distribution. That's pretty cool. How would we figure out the standard deviation of the distribution? Well, that's easy, the standard deviation is just the square root of the variance. So the standard deviation is just sigma. That's pretty cool. This means that mu and sigma squared appear explicitly in our expression for our Gaussian probability density. The mean and the variance are right there. 
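If you'd rather not do those integrals by hand just yet, you can check them numerically. The sketch below (my addition; the values of mu and sigma are arbitrary) approximates the three integrals from the lecture with a crude Riemann sum: the density integrates to 1, x times the density integrates to mu, and the squared distance from the mean integrates to sigma squared.

```python
import math

mu, sigma = 1.5, 0.7

def p(x):
    """The Gaussian probability density with parameters mu and sigma."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

# Crude Riemann-sum versions of the integrals, over a range wide
# enough (+/- 10 sigma) to capture essentially all the probability mass.
dx = 0.001
xs = [mu - 10 * sigma + i * dx for i in range(int(20 * sigma / dx))]

total    = sum(p(x) * dx for x in xs)                    # integral of p(x)         -> 1
mean     = sum(x * p(x) * dx for x in xs)                # integral of x p(x)       -> mu
variance = sum((x - mean) ** 2 * p(x) * dx for x in xs)  # integral of (x-mu)^2 p(x) -> sigma^2

print(total, mean, variance)
```

The numbers come out as 1, mu, and sigma squared, which is exactly the claim that mu is the mean and sigma squared is the variance.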
You don't have to do the integral, like you would for a lot of other probability distributions, to figure out the mean and the variance of the Gaussian. If you have it written out, you just look for mu and sigma squared. That is a very nice property, and it also means that the Gaussian is completely specified by its mean and its variance. As I said at the beginning of this tutorial, there are many different kinds of probability distributions. And they are all distributions over random numbers or, as you'll probably hear more often, random variables. However, the Gaussian is the most special distribution. So if you're trying to find a Christmas or a birthday gift for one of your friends, and you have to give them a probability distribution, give them a Gaussian distribution. That will mean the most. So, why are Gaussians so great? I've told you they're great, but I haven't really given you many reasons. Well, I have given you one reason, and that is that the mean and variance completely specify them. So they're completely specified by the mean, mu, and the variance, sigma squared, or alternatively the standard deviation, sigma. Another reason is that in real life, which is one of the best kinds of life, lots of things are approximately Gaussian. So for instance, the current generated through a rod photoreceptor membrane after a photon strikes it is approximately Gaussian. The average firing rate of a large population of neurons is approximately Gaussian in many cases. When we present white noise stimuli, such as those we used when figuring out the spike-triggered average or doing spike-triggered covariance analyses, these are Gaussian stimuli. What Gaussian means, in terms of the white noise stimulus, is that if you were to present the white noise stimulus over and over and over again, and look at the stimulus's value at a specific point in time across all of those trials, you would see that it would be Gaussian distributed. 
Sometimes we actually enforced Gaussianity on our stimuli during our experiments in order to make our lives easier. And you can see this because, if you've been reading the discussion forums or reading up on things like maximally informative dimensions and whatnot, Gaussian assumptions make things like the STA, the spike-triggered average, much more useful. If your distributions are non-Gaussian, the STA won't always give you a good measure of the stimuli that are causing your neuron to spike. Instead you have to go to more complicated analyses like spike-triggered covariance or maximally informative dimensions, and those can be a handful sometimes. Gaussian stimuli and Gaussian-distributed spike-triggered ensembles make life easier. Next, if you build a new random variable out of other random variables that are independent and Gaussian distributed, the new random variable will be Gaussian distributed. So for instance, let's say you have the variable x, and that's Gaussian distributed. Sometimes we write this little twiddle in order to indicate "is distributed according to." So x is distributed according to a Gaussian with a mean of mu sub x and a variance of sigma squared sub x. If that is the case, then if you multiply your random variable by a constant a to get a new random variable y, that new random variable will be distributed according to a new Gaussian. And that new Gaussian will have a mean of a mu sub x and a standard deviation of a sigma sub x, which means that it has a variance of a squared sigma x squared. Secondly, let's say you have a random variable z that is also Gaussian distributed with its own mean and its own variance. In this case, suppose we define a new random variable w, which is x plus z. If w is the sum of two Gaussian random variables, then w itself will be Gaussian distributed with mean mu x plus mu z, and variance sigma x squared plus sigma z squared. 
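These two closure rules, scaling by a constant and summing two independent Gaussians, are easy to check by simulation. Here is a Python sketch (my addition; all the numeric values are arbitrary) that draws samples of y = a x and w = x + z and compares the empirical means and variances against the formulas above.

```python
import random

random.seed(1)

a = 3.0
mu_x, sig_x = 1.0, 0.5
mu_z, sig_z = -2.0, 1.5
n = 200_000

xs = [random.gauss(mu_x, sig_x) for _ in range(n)]
zs = [random.gauss(mu_z, sig_z) for _ in range(n)]

ys = [a * x for x in xs]               # y = a x    ->  N(a mu_x, a^2 sig_x^2)
ws = [x + z for x, z in zip(xs, zs)]   # w = x + z  ->  N(mu_x + mu_z, sig_x^2 + sig_z^2)

def mean(v):
    return sum(v) / len(v)

def var(v):
    m = mean(v)
    return sum((s - m) ** 2 for s in v) / len(v)

print(mean(ys), var(ys))  # near a*mu_x = 3.0 and a^2*sig_x^2 = 2.25
print(mean(ws), var(ws))  # near mu_x + mu_z = -1.0 and sig_x^2 + sig_z^2 = 2.5
```

Both new variables are themselves Gaussian, so these two numbers per variable are a complete description of their distributions.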
So if your building blocks are Gaussian, then the things you build out of them will be Gaussian as well, and they will have means and variances that you can figure out right away. Now we come to a very important reason why Gaussians are great, and this is the central limit theorem. The central limit theorem says, okay, let's suppose we have a bunch of random variables that are all independent and all identically distributed. That means that you keep pushing the button on that random number generator over and over again, and you get out this list of random variables, each from the same distribution and each independent of all the others. Sometimes we write that as i.i.d., independent and identically distributed. So, let's say we have n of these variables. And let's say, and this is important, that we don't know what distribution was used to generate those random variables. So maybe each one came from the same exponential distribution, or maybe each one came from the same uniform distribution. It doesn't matter what distribution those guys came from, as long as they all came from the same distribution. So this can be from any distribution. Now let's define a new random variable z that is the average of all of our i.i.d. variables. So the average is x one plus x two plus dot dot dot plus x n, divided by the number of samples we had. The central limit theorem, CLT for short, tells us that as n goes to infinity, the probability distribution over z, the average of all of our samples, goes to a Gaussian. This is amazing: here we made no assumption about the original distribution that each of our x's was drawn from. And yet, when we take the average of a large number of them, that average is, itself, Gaussian distributed. That's very cool. This comes up all the time in science. It is very often the case that you will run an experiment over and over and over again, and get many trials. 
And you may not know what distribution is generating the observations you get in each of those trials. However, if you have enough of those, and they are, presumably, all generated from the same underlying distribution, then you can be fairly certain that their average will be Gaussian distributed. This is very useful for doing things like hypothesis tests or calculating p-values. This is the central limit theorem. And the last important reason that Gaussians are so great is that they are unimodal distributions. All that means is that if you were to actually draw the distribution out, there is only one peak. This is great because a lot of times when doing estimation problems, it is our goal to find the value of the random variable with the highest probability. If you know that random variable is Gaussian distributed, then if you find one peak of that distribution, you know you're at the global peak of the whole distribution. You know that there is not another peak over here that you forgot about, because you know that your random variable is Gaussian distributed. So this makes optimization problems easier. Now, there are some subtler reasons why Gaussians are useful, but they won't really come up in this course. But if you're curious, some of them are that, one, the Fourier transform of a Gaussian is also Gaussian. Two, of all the distributions with a given mean and variance, the Gaussian is, in a certain way, the most random distribution. And that certain way specifically refers to its entropy. So the Gaussian is the distribution with a given mean and variance that has the highest entropy. And don't worry about what that means for now if you don't understand it. We will talk about that a little bit later when we get to information theory. And another really nice thing about Gaussians is that if you have two variables, x and y, and they are jointly Gaussian distributed. 
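The central limit theorem is easy to watch in action. Below is a Python sketch (my addition; the choice of the uniform distribution, n = 50, and the trial count are arbitrary) that averages i.i.d. uniform random variables, a distribution that looks nothing like a bell curve, and checks the Gaussian predictions for the average.

```python
import random

random.seed(2)

def average_of_uniforms(n):
    """One sample of z: the average of n i.i.d. uniform(0, 1) random
    variables. The underlying distribution is deliberately non-Gaussian."""
    return sum(random.random() for _ in range(n)) / n

n = 50            # number of i.i.d. variables in each average
trials = 100_000  # how many averages we draw
zs = [average_of_uniforms(n) for _ in range(trials)]

# CLT prediction: z is approximately Gaussian with mean 1/2 and
# variance (1/12)/n, where 1/12 is the variance of uniform(0, 1).
m = sum(zs) / trials
v = sum((z - m) ** 2 for z in zs) / trials
print(m, v)  # near 0.5 and 1/600

# A rough Gaussianity check: about 68% of samples fall within one
# standard deviation of the mean, as a bell curve would predict.
sigma = v ** 0.5
inside = sum(1 for z in zs if abs(z - m) < sigma) / trials
print(inside)  # near 0.68
```

Swap the uniform for any other fixed distribution and, for large enough n, the histogram of averages will still look like a bell curve.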
Then, for example, if x is hidden and y is observed, the optimal estimator for x given y is a linear function of y, and everybody likes things that are linear. So that's great. All right, hopefully I've convinced you that Gaussians are pretty great, but you shouldn't get too carried away. There are a lot of times when it's a very bad idea to use a Gaussian. So I'm just going to list a couple so that you are aware of when they are not going to be a good approximation to a true probability distribution. One example is when the true distribution is bimodal, or you think your true distribution is bimodal. That is, your true distribution looks something like that, or it has more than two modes. Since a Gaussian has only one mode, it will miss a lot of the important qualities of your true distribution. Secondly, a Gaussian is a bad idea if your true distribution is a distribution over small integers. In this case you're probably better suited to a Poisson distribution. The third case is when your distribution is strictly positive, so, things like waiting times. Think of the random variable that corresponds to how long you have to wait for the bus if you have just arrived at the bus stop. A waiting time is always positive, and therefore it will not be well fit by a Gaussian distribution, because the Gaussian distribution assigns non-zero probability to negative numbers. However, you can occasionally fit a Gaussian to a strictly positive distribution, if it's narrow enough that the probability of a negative value becomes very, very small. So for example, say you have a neuron that fires on average 200 times a second when you present a stimulus, but it varies a little bit about that. Even though the neuron's firing rate has to be positive, you might be able to fit a decently skinny Gaussian around that mean firing rate of 200 spikes per second. So you can probably think of some more scenarios when Gaussians wouldn't work very well. But here are a few common ones just to be aware of. 
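The "skinny Gaussian on a positive quantity" point can be made quantitative. In this Python sketch (my addition; the lecture gives the 200 spikes/s mean, but the standard deviations of 10 and 150 are illustrative values I chose), we compute how much probability a Gaussian fit places on impossible negative firing rates.

```python
import math

def gaussian_cdf(x, mu, sigma):
    """P(value <= x) for a Gaussian, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

# Firing rate around 200 spikes/s with a little variability.
# sigma = 10 is an illustrative "skinny" choice, not from the lecture.
mu, sigma = 200.0, 10.0

# The Gaussian does assign probability to (impossible) negative rates,
# but here the mean is 20 standard deviations above zero...
p_negative = gaussian_cdf(0.0, mu, sigma)
print(p_negative)  # astronomically small; the fit is safe

# ...whereas a very wide Gaussian on the same mean puts real
# probability mass on negative rates, so the fit is a bad idea.
print(gaussian_cdf(0.0, 200.0, 150.0))
```

The rule of thumb: a Gaussian fit to a strictly positive quantity is only defensible when the mean sits many standard deviations above zero.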
So as a quick summary of our important points, we had, first of all, that a Gaussian is written in shorthand as N sub x of mu sub x and sigma squared sub x. And the actual mathematical form is 1 over the square root of 2 pi sigma sub x squared, times e to the minus x minus mu sub x, quantity squared, all over 2 sigma sub x squared. I guess we'll give the mus and sigmas little x subscripts to stay consistent. So that's what it looks like. And we had that mu sub x was the mean and sigma squared sub x was the variance. Secondly, we have that the probability, and this isn't a density, this is the true probability, that x is between a and b is equal to the integral from a to b over the probability density of x, dx. And this is always the case regardless of whether your distribution is Gaussian or not. When it is Gaussian, this quantity is actually pretty tricky to calculate by hand, but there are a lot of computer programs that will do it for you. Next, we had the central limit theorem, and that said that sums and averages of i.i.d. random variables are Gaussian distributed if you have enough samples in your sum or average. And lastly, just beware, because Gaussians, even though they can do a whole lot, they can't do everything. So there are a lot of instances when a Gaussian will not work for your situation. However, if a Gaussian is a good approximation to your situation, then your life is going to be much, much easier. That's all for now, see you next time.