In this video, we're going to walk through some methods of identifying non-stationary time series. If we can identify the sources of non-stationarity within our data, it will allow us to move forward with constructing stationary model inputs from that non-stationary data. There are several ways to identify non-stationary time series data. The first is the run-sequence plot, which is just the actual line plot over time. That's what we've been looking at thus far, and we've seen how we can identify non-stationarity through the examples we discussed. Another option is summary statistics: looking at the means and variances throughout our time series to see if they remain roughly constant. We can also leverage histogram plots, since the distribution of stationary data will typically be close to the normal distribution. Then there are statistical tests such as the Dickey-Fuller test, which we'll get into in a bit. Basically, we'll be testing the null hypothesis that our data is non-stationary.
Starting off with the run-sequence plot: that is simply a plot of the unadjusted time series data over time. This is often going to be the first step in any time series analysis, because it will show whether there's any underlying structure. With that, we should be on the lookout for trend, seasonality, and the heavy autocorrelation that we discussed. We see in the plot to the right a great example of a stationary time series. It shows a constant mean and a constant variance, and there's no autocorrelation structure strong enough to make the series wander away from that mean.
The next method that we can leverage is calculating the mean and variance over time. A simple but effective way to do this is to split the data into chunks and then compute statistics for each one of those chunks.
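As a rough illustration of that chunking idea, here is a minimal Python sketch. The helper name `chunk_stats`, the two synthetic series, and the choice of three chunks are assumptions made for this example rather than anything shown in the video, and it uses only the standard library.

```python
import random
import statistics


def chunk_stats(series, n_chunks=3):
    """Return a (mean, sample variance) pair for each consecutive chunk."""
    size = len(series) // n_chunks
    stats = []
    for i in range(n_chunks):
        # The last chunk absorbs any leftover points.
        end = (i + 1) * size if i < n_chunks - 1 else len(series)
        chunk = series[i * size:end]
        stats.append((statistics.mean(chunk), statistics.variance(chunk)))
    return stats


# Hypothetical example data: pure noise versus noise around an upward trend.
rng = random.Random(42)
stationary = [rng.gauss(0, 1) for _ in range(300)]
trending = [0.1 * t + rng.gauss(0, 1) for t in range(300)]

for name, series in [("stationary", stationary), ("trending", trending)]:
    print(name)
    for i, (mean, var) in enumerate(chunk_stats(series), start=1):
        print(f"  chunk {i}: mean={mean:.2f}, variance={var:.2f}")
```

With data like this, the chunk means of the trending series should climb from one chunk to the next, while those of the noise series should stay close to zero.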
Large deviations in either the mean or the variance among those chunks would suggest that the data is probably not stationary. As an example, say we split our data into three equally sized chunks. You can use more than three; it can be five, and it just depends on the size of the dataset. Here we see that the mean stays relatively constant across the three chunks. Again, when we divide the data into chunks, each one is a different time period: Chunk 1 is the first time period within our full time series, Chunk 2 is the second, and Chunk 3 is the third. We also see that the variance remains roughly constant throughout. Because both the mean and the variance remain roughly constant across the chunks, we would say that this is probably a stationary dataset.
Now take a different dataset. We see here that the means are quite different, with 49.5 being fairly far off from 19.8 and then jumping back up to 30.6. The variances still remain relatively constant, but because the mean changes, we would conclude that this is not a stationary dataset. We'd need some type of transformation to make it stationary.
Another useful way to clue you in on whether your data is stationary is a histogram plot. If the distribution of your data is approximately normal, then it's likely the time series is stationary. On the other hand, if the data is not normal, it's likely that you're not working with stationary data. We can imagine data with a consistent mean and variance, which again is a stationary dataset, as having most of its values towards the center, near that consistent mean, with values becoming less and less likely the further we move away from the mean.
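To make the histogram check a little more concrete, here is a small standard-library Python sketch that bins a series and prints a text histogram. The helper name `histogram_counts`, the bin count, and both synthetic series are assumptions for illustration only; the trending series is included purely for contrast.

```python
import random


def histogram_counts(series, n_bins=10):
    """Count how many values fall into each of n_bins equal-width bins."""
    lo, hi = min(series), max(series)
    width = (hi - lo) / n_bins
    if width == 0:
        # Constant series: everything lands in the first bin.
        return [len(series)] + [0] * (n_bins - 1)
    counts = [0] * n_bins
    for value in series:
        # Clamp so the maximum value falls in the last bin.
        index = min(int((value - lo) / width), n_bins - 1)
        counts[index] += 1
    return counts


# Hypothetical example data: noise around a fixed mean versus a steady upward trend.
rng = random.Random(7)
stationary = [rng.gauss(0, 1) for _ in range(1000)]
trending = [0.05 * t + rng.gauss(0, 1) for t in range(1000)]

stationary_counts = histogram_counts(stationary)
trending_counts = histogram_counts(trending)

for name, counts in [("stationary", stationary_counts), ("trending", trending_counts)]:
    print(name)
    for count in counts:
        print("  " + "#" * (count // 10))
```

The bars for the noise series should pile up around the middle bins, while the bars for the trending series should come out much flatter.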
If we have a consistent mean and a consistent variance, we would expect most of the values to be close to the mean, and the further away from the mean a value is, the less likely it is to occur within our time series. On the other hand, imagine a series with a heavy trend. If it is trending up, for example, there is no clustering around any specific value, because every value is different as the series continues to climb. We'd end up with something like the uniform distribution that we see here, and thus a non-stationary dataset.
The final way of identifying stationarity that we'll discuss here is the augmented Dickey-Fuller (ADF) test, which is built specifically for testing stationarity. It's a hypothesis test of the null hypothesis that our data is non-stationary or, more specifically, that there is a unit root. We won't go over exactly what a unit root means, but what the test is best for is checking whether the mean is stationary, that is, whether we have the kind of autocorrelation structure that we saw would lead to non-stationarity. The test returns a p-value. If that value is less than 0.05, we reject the null of non-stationarity and are more likely to be working with a stationary series. The test is less appropriate for small datasets, because we're looking for statistical significance and it's hard to find statistical significance with smaller datasets. As mentioned, it also won't do as well at detecting something like changes in variance, or heteroscedasticity, because it is more focused on that autocorrelation structure or some type of difference in trend. Because the ADF test can't capture every aspect of non-stationarity, it's best practice to pair it with the other techniques that we discussed.
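To give a feel for what the test does under the hood, here is a simplified standard-library Python sketch. It is not the full augmented test: it runs only the basic Dickey-Fuller regression (the change in the series regressed on the previous value plus a constant, with no extra lagged differences), and instead of a p-value it compares the t-statistic against the commonly tabulated large-sample 5% critical value of roughly -2.86. The function name `dickey_fuller_stat` and the synthetic series are assumptions for this example. In practice you would normally rely on a library routine, for example `adfuller` in Python's statsmodels package, which returns the test statistic and the p-value.

```python
import math
import random


def dickey_fuller_stat(series):
    """t-statistic on gamma in: (y[t] - y[t-1]) = alpha + gamma * y[t-1] + error.

    Assumes the series is not constant (otherwise the regression is undefined).
    """
    lagged = series[:-1]                                        # y[t-1]
    diffs = [b - a for a, b in zip(series[:-1], series[1:])]    # y[t] - y[t-1]
    n = len(diffs)
    mean_x = sum(lagged) / n
    mean_d = sum(diffs) / n
    sxx = sum((x - mean_x) ** 2 for x in lagged)
    sxy = sum((x - mean_x) * (d - mean_d) for x, d in zip(lagged, diffs))
    gamma = sxy / sxx                    # OLS slope
    alpha = mean_d - gamma * mean_x      # OLS intercept
    ssr = sum((d - alpha - gamma * x) ** 2 for x, d in zip(lagged, diffs))
    std_err = math.sqrt(ssr / (n - 2) / sxx)
    return gamma / std_err


# Approximate large-sample 5% critical value for the constant-only case.
CRITICAL_5PCT = -2.86

# Hypothetical example data: white noise versus a random walk (which has a unit root).
rng = random.Random(0)
noise = [rng.gauss(0, 1) for _ in range(500)]
walk = []
level = 0.0
for _ in range(500):
    level += rng.gauss(0, 1)
    walk.append(level)

stat_noise = dickey_fuller_stat(noise)
stat_walk = dickey_fuller_stat(walk)
for name, stat in [("noise", stat_noise), ("random walk", stat_walk)]:
    verdict = "reject unit root" if stat < CRITICAL_5PCT else "cannot reject unit root"
    print(f"{name}: statistic={stat:.2f} -> {verdict}")
```

A strongly negative statistic, like the one the noise series should produce, is evidence against a unit root. The random walk's statistic will usually sit above the critical value, though that depends on the particular draw, so the result is best read as one piece of evidence alongside the other techniques.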
Those techniques include run-sequence plots, summary statistics, and histograms.
Just to recap: in this section, starting with the last video, we covered how changes in trend and variance lead to non-stationarity, how dependence on recent observations, or high autocorrelation, leads to non-stationarity, and how seasonal patterns do the same. We saw examples of each in that last video with their respective line plots. Then in this video, we discussed different approaches to identifying sources of non-stationarity, such as plotting run sequences, looking at summary statistics and histograms, and running statistical tests such as the augmented Dickey-Fuller test. That closes out this section. In the next video, we'll take what we learned and discuss the best means of transforming our data so that we can work with stationary inputs when building our models.