Welcome back to our notebook here on stationarity. Now we're going to go over the exercises, which should recap what we learned in the last video, and then we're going to get into how to actually identify stationarity. In this example we're again working with 100 data points. We pull in our different files, each of which is a NumPy array, and we plot our run sequence plots. Looking at each of these, hopefully after the examples we gave in the last video, it's fairly clear why each of these may be non-stationary. The first one, dataset SNS1, clearly has heteroskedasticity: the variance changes right in the middle there. And for dataset SNS2, we see that there's some trend, or maybe heavy autocorrelation, which again is clearly not stationary.

So what are some better practices, besides just these run sequence plots, for identifying stationarity? Starting off, as we mentioned, run sequence plots do a fairly good job of telling us right off the bat whether there's something like a trend, whether there's seasonality, and so on. Another way to check whether the mean and variance are constant is to chop the series up into smaller sections, and then see whether the mean and the variance stay constant in each of those sections. If we look at our trend data, np.split will take our original array and split it into as many sections as we want; here we split it into ten sections. Then we print these out: the chunks themselves, plus string formats where we pass in the labels "mean" and "variance", a row of dashes so we have something like a chart, and then the mean and the variance of each chunk.
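A minimal sketch of that chunking check. The actual datasets aren't shown here, so a synthetic trend series (linear trend plus Gaussian noise) stands in for the trend data:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in for the trend series: linear trend + noise.
trend = 0.5 * np.arange(100) + rng.normal(0, 1, 100)

# np.split cuts the array into 10 equal sections.
chunks = np.split(trend, 10)

# Print a small chart of per-chunk statistics.
print(f"{'mean':>10} | {'variance':>10}")
print("-" * 23)
for chunk in chunks:
    print(f"{np.mean(chunk):10.2f} | {np.var(chunk):10.2f}")
```

For a trend series like this, each row's mean should be visibly larger than the last, while the variance column hovers around the noise variance.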
And we see here that the mean continually increases throughout for our trend series, whereas the variance stays relatively constant. That makes sense, given we were looking again at that trend series. What would have been faster, if we just wanted the means of the chunks, is to call np.mean on the chunks, which computes each one separately, and likewise np.var on the chunks. So again, while the variance is relatively constant, the mean changes over time. And something I want to note here is that we do expect some fluctuation in the values. It's highly unlikely that either the mean or the variance will remain exactly the same from chunk to chunk, but for a stationary series they should be close in general.

Now, if we want something more sophisticated, we can run statistical tests, and we will do that later on. But another useful tool is to plot the histogram. Here we're looking at the histogram of the trend series, and plotting the histogram of a time series should give you some important clues to the underlying structure. A roughly normal distribution generally gives confidence that the mean and variance are constant. Whereas if we look at the values and see that they are uniform, like we have here, that's consistent with a trend. The idea is that every single value is far away from every other value, and if every value is far from every other value, they're all roughly equally likely to show up in your histogram, so you get a uniform distribution. Whereas, again, the normal distribution that pops up for the stationary series means most of the values are clustered around zero: they're a little less likely to be out in the tails, and they keep coming back to that zero value. Now, a statistical test that we can use to test for stationarity is the augmented Dickey-Fuller test.
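Both shortcuts can be sketched together, again on synthetic stand-in series since the notebook's data isn't reproduced here. Stacking the chunks into a 2-D array lets np.mean and np.var reduce along axis=1 in one call, and np.histogram gives the histogram counts without plotting:

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical stand-ins for the notebook's series.
stationary = rng.normal(0, 1, 100)                    # white noise
trend = 0.5 * np.arange(100) + rng.normal(0, 1, 100)  # trend + noise

# Vectorized per-chunk statistics: stack the 10 chunks into a
# (10, 10) array and reduce along axis=1 instead of looping.
chunks = np.array(np.split(trend, 10))
chunk_means = chunks.mean(axis=1)
chunk_vars = chunks.var(axis=1)
print("chunk means:", np.round(chunk_means, 2))
print("chunk vars: ", np.round(chunk_vars, 2))

# Histogram counts: roughly bell-shaped for the stationary series,
# roughly flat (uniform) for the trend series.
stat_counts, _ = np.histogram(stationary, bins=10)
trend_counts, _ = np.histogram(trend, bins=10)
print("stationary histogram:", stat_counts)
print("trend histogram:     ", trend_counts)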
And this is a statistical procedure to help sort out whether a time series is stationary or not. We won't go into the nitty-gritty details here, but we will have an example later on where we test series with different variances. What I'll highlight, if you do want to dive deeper into the augmented Dickey-Fuller test, is that it tests for a unit root, or in other words, whether there's very high correlation between the current value and some lagged values. That will be the case if you have trends, whether it's the trend going up and down as we saw in this plot here, or the autocorrelation structure in the plot we have up here.

But coming back to where we are: our null hypothesis is that the series is non-stationary, and our alternative is that it is stationary. So if we reject the null, we have evidence that the series probably is stationary. We import this adfuller test and call adfuller on our stationary series. That gives us the ADF test statistic, which is about -10, and the p-value. We can see that the p-value is very, very small, and if we recall from any kind of hypothesis testing, when the p-value is very small we reject the null. Therefore the null hypothesis that the series is non-stationary would be rejected. Some other outputs that may be important are the number of observations, reminding us that we're working with 99 observations here (one is lost to lagging), and, also useful, the critical values for the test. For the test statistic to pass the 1% threshold, it would need to be below about -3.5; the 5% threshold is a little less extreme, at about -2.9; and the 10% threshold, which is even easier to reject at, is about -2.6.
Now let's run the ADF test on the trend series. Whereas before, with the stationary series, the p-value was tiny, here the p-value is very, very high, and we would definitely not reject the null that the series is non-stationary, which is correct, since we are working with a non-stationary series. So the test was able to identify that. And we get the same result when we look at that lag data, which is again the highly autocorrelated series I showed you just above: we would not reject the null that it's a non-stationary series. Now, I'm going to pause the video as we did last time, and in the next video we'll get into the exercises we have here, as well as jump into some common non-stationary-to-stationary transformations that you can use, which we'll go into more deeply in the lecture after this. All right, I'll see you there.