Hello, everyone. This is Tural Sadigov.
And in this lecture, we will continue our SARIMA fitting process
and this time we're going to look at the sales data at a souvenir shop in Australia.
The objectives are to fit SARIMA models to the dataset about
the sales at that souvenir shop - this data is from the Time Series Data Library.
And then the second objective is to forecast the future values of the same time series.
And our modeling process is the same.
We're going to look at the time plot;
if the data needs a transformation,
we're going to transform the data.
If we need differencing - seasonal or non-seasonal,
we're going to do differencing.
And then we're going to look at ACF and PACF to determine our orders.
Now, there are orders for the autoregressive terms, moving average terms,
seasonal autoregressive terms, and seasonal moving average terms.
Once we have some idea about our orders p, q, P, and Q,
we're going to look at a few different models as we did before.
We're going to use the parsimony principle and choose the model with the smallest AIC value.
At the end, of course,
we're going to do the residual analysis.
And let me just remind you,
the parsimony principle that we adopted in this lecture, or in
this lesson, is that the sum of our parameters,
p + d + q + P + D + Q,
should be less than or equal to 6.
And the Time Series Data Library,
as we said before,
was created by Rob Hyndman,
a professor of statistics at Monash University in Australia.
So this dataset is the monthly sales for a souvenir shop in Queensland, Australia,
recorded from January 1987 until December 1993.
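As a concrete starting point, here is a minimal R sketch; it assumes the series is available as the fancy dataset in the fma package, which distributes this TSDL series, and the name souvenir is just our label for it:

```r
# Minimal sketch, assuming the souvenir series is available as 'fancy'
# in the 'fma' package (a packaged copy of this TSDL series).
library(fma)

souvenir <- fancy  # monthly sales, Jan 1987 - Dec 1993, frequency 12

# Time plot of the raw monthly sales
plot(souvenir, xlab = "Year", ylab = "Sales",
     main = "Monthly souvenir shop sales, Queensland, Australia")
```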
And if you look at the time plot of the monthly sales,
we see the following.
We see some kind of seasonality going on, right?
Every year there is a high value.
And every year, it seems like the variation increases.
So there's a change in variation, and there is seasonality.
In fact, if you look at this carefully,
we can almost see a non-seasonal trend as well:
the values are almost always increasing.
Okay. So we can look at our ACF and PACF.
The ACF will already tell us whether there is seasonality or not.
We can see autocorrelation at lag 12,
lag 24, lag 36, and so forth.
Seasonality is definitely present in this data.
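In R, astsa's acf2 draws the sample ACF and PACF in one figure; a sketch, using the souvenir object from above:

```r
library(astsa)

# ACF/PACF of the raw series; the spikes at lags 12, 24, 36, ...
# are the signature of the annual seasonality.
acf2(souvenir, max.lag = 48)
```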
Since there is already a trend and
changing variation, we will have to do seasonal and
non-seasonal differencing. But even before all of these,
since the variation is increasing,
let's do a transformation first to stabilize the variance.
Because at the end of the day,
when we fit our SARIMA model to a dataset,
we expect our dataset to be a stationary dataset.
At this point, it's definitely not stationary.
Okay. So we're going to take the log transform - that's what we usually do -
and once we have the log transform,
we will need non-seasonal and seasonal differencing.
So d is going to be 1,
D is going to be 1, and of course,
the span of the seasonality is 12 months,
so s is going to be 12.
So basically, the operator we're going to apply to our dataset is
the logarithm, then the differencing, then the seasonal differencing: (1 - B)(1 - B^12) log X_t.
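In R, this operator - (1 - B)(1 - B^12) applied to log X_t - is just a log followed by two calls to diff; a sketch, again using the souvenir object from above:

```r
y   <- log(souvenir)       # log transform to stabilize the variance
dy  <- diff(y)             # non-seasonal difference, d = 1
ddy <- diff(dy, lag = 12)  # seasonal difference, D = 1, span s = 12

plot(ddy)                  # should now look roughly stationary
acf2(ddy, max.lag = 48)    # ACF/PACF used below to choose p, q, P, Q
```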
Okay. So let's look at it.
This is our dataset's time series.
Once I take the log transform, this is what we obtain.
We have somehow stabilized the variance,
even though there is definitely a trend and a seasonal pattern left there.
First, we take the non-seasonal difference,
which gets rid of that trend.
As you can see, there is no trend anymore,
but there is a seasonal pattern;
the seasonality is still there.
And once we take the seasonal difference,
we obtain this green plot,
which we will assume is now a stationary dataset.
Now, one can say that the variance at
the beginning of this time series is definitely different from the variance at the end,
but at this point, we will assume that this is a stationary time series.
If I look at the ACF and PACF of
our transformed, non-seasonally and seasonally differenced dataset,
we see the following:
we have one significant autocorrelation at lag 1, which tells
me that q - the order of the moving average term - is either 0 or 1.
Probably 1, but we'll see.
We don't see any significant autocorrelation at the other low lags.
So for us, q is going to be either 0 or 1.
If I look at the seasonal lags,
the autocorrelation at lag 12 is almost significant,
but not quite.
But I see a significant lag at 36,
and a significant lag at 22 - this one is actually at 34.
So we're going to try a few different values for the seasonal moving average order Q.
If I look at the PACF,
which will usually tell me the order of the autoregressive terms
and/or the seasonal autoregressive terms,
we have a significant lag at 1,
so our p can be 0 or 1.
But if I look at the seasonal lags, 12,
24 - there are no significant partial autocorrelations.
So we are going to assume that maybe capital P is
either 0 or 1, and we'll look at those values.
So, order specification: we have candidate values for q and
capital Q, and for p and capital P.
So of course, we look at a few different combinations of these orders.
Remember the parsimony principle:
these values should add up to six or less.
If I look at the AIC values,
the minimum on this slide
- there's one more slide that we're going to look at - on this slide,
the minimum value is actually negative 34.54, for this model.
But if you look at the next slide,
then we see that there's another minimum value, negative 34.98,
which is the minimum of all of these values.
That's the model we are going to adopt,
the one we're going to fit to our time series:
SARIMA (1,1,0,0,1,1)12.
And let me just note,
the smallest SSE value corresponds to a different model.
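One way to organize this comparison in R is a small grid search with astsa's sarima, keeping only candidates that satisfy the parsimony constraint; a sketch (the exact AIC values depend on the astsa version's scaling, and some combinations may need tweaking to converge):

```r
library(astsa)

# Grid-search candidate orders by AIC, honoring the parsimony
# principle p + d + q + P + D + Q <= 6 (here d = D = 1).
y <- log(souvenir)

results <- data.frame()
for (p in 0:1) for (q in 0:3) for (P in 0:1) for (Q in 0:1) {
  if (p + 1 + q + P + 1 + Q > 6) next
  fit <- sarima(y, p, 1, q, P, 1, Q, S = 12, details = FALSE)
  results <- rbind(results,
                   data.frame(p, q, P, Q,
                              AIC = fit$AIC,
                              SSE = sum(resid(fit$fit)^2)))
}
results[order(results$AIC), ]  # smallest AIC first
```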
So if I look at the residuals from the SARIMA (1,1,0,0,1,1)12 model,
these are my standardized residuals, and they look white.
There is no significant sample autocorrelation.
If I look at the Q-Q plot,
the middle part is linear,
but there is a systematic departure at the tails.
And if I look at the p-values from the Ljung-Box statistic,
they tell me that there is no significant autocorrelation left in the residuals.
So if we use the sarima routine, or the arima routine, in R,
we'll get these coefficients: one for the autoregressive term -
because we have order 1
for the autoregressive terms - and one for the
seasonal moving average term, since that order is 1 as well.
These are our estimates,
and the standard errors for these estimates.
And if I look at the p-values,
the p-values are very small,
which means that both of these coefficients are significant.
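In R, a single call to astsa's sarima produces this kind of coefficient table along with the residual diagnostics discussed above; a sketch:

```r
# Fit the chosen SARIMA(1,1,0)(0,1,1)[12] to the logged series.
# sarima() prints the coefficient table and also draws the standardized
# residuals, residual ACF, normal Q-Q plot, and Ljung-Box p-values.
fit <- sarima(log(souvenir), 1, 1, 0, 0, 1, 1, S = 12)
fit$ttable  # estimates, standard errors, and p-values
```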
So let's actually write down the model for our time series.
X_t is the sales at the souvenir shop,
but what we modeled is its logarithm.
So we call the logarithm Y_t.
Y_t became the SARIMA (1,1,0,0,1,1)12.
So I have one non-seasonal difference - that's the factor (1 - B).
One seasonal difference - that's the factor (1 - B^12).
And there's one autoregressive term,
which gives the polynomial (1 - phi B);
the 1 is the degree of that polynomial, basically.
There are no seasonal autoregressive terms.
On the right-hand side,
we do not have any moving average terms,
but we do have a seasonal moving average term -
that's why we have the factor (1 + Theta B^12).
If we expand it all,
we get a model for Y_t from our sarima routine.
On the previous slide, we obtained phi hat and Theta hat.
These are our point estimates for these coefficients.
If you plug them in, this becomes our model.
So this is the model for the logarithm of the sales data.
And here, Z_t is approximately normal;
as a model, it is normal with variance 0.03.
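Putting the pieces together, the model can be written compactly with the backshift operator B; this just restates what was described above, with phi hat and Theta hat the point estimates from the sarima output:

```latex
(1 - \hat{\phi} B)(1 - B)(1 - B^{12})\, Y_t
  \;=\; (1 + \hat{\Theta} B^{12})\, Z_t,
\qquad Y_t = \log X_t, \qquad Z_t \sim N(0,\ 0.03)
```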
If we look at the forecast routine,
this is basically the forecast for the logarithm.
It gives us the forecasts.
The first shaded area is the 80 percent prediction interval.
The second shaded area is the 95 percent prediction interval.
In fact, if you look at the forecasts for the next year,
we have the interval endpoints - the limits of
the 80 percent and the 95 percent prediction intervals.
We also have the point estimates.
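A sketch of one way to produce such a forecast with the forecast package (the 80% and 95% bands are its plotting defaults); exp() brings the log-scale forecasts back to the sales scale:

```r
library(forecast)

# Fit the same model on the log scale and forecast the next 12 months.
fit_log <- Arima(log(souvenir), order = c(1, 1, 0),
                 seasonal = c(0, 1, 1))
fc <- forecast(fit_log, h = 12)  # 80% and 95% intervals by default
plot(fc)                         # point forecasts plus the two shaded bands

exp(fc$mean)                     # point forecasts on the original sales scale
```

Alternatively, passing lambda = 0 to Arima applies the log transform as a Box-Cox transformation and back-transforms the forecasts automatically.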
Now, this is the data we looked at -
the time series we started with.
And this is the forecast for the next year,
which is stretched out beyond it.
So the data ends at, let's say, time 85,
and the forecast starts at 85 and goes up.
And if you combine them,
this is the monthly sales data up to here,
and then this last part,
the final stretch, is our forecast.
Now I'd like to note the following.
If you look at the ACF of the transformed,
seasonally and non-seasonally differenced dataset,
this is what we had.
And we said that the ACF shows only one significant autocorrelation,
which might tell me that q - the order of the moving average term - is actually 1.
Then one might say, wait a minute:
even though these two lags are not
significant - because they fall below the dashed lines - they are almost significant.
So one might try different values of q - little q - up to three.
If you do that,
we obtain another model,
which is SARIMA (0,1,3,0,1,1)12.
In this case, we do not have autoregressive terms,
but instead we have three moving average terms.
In fact, the AIC value and the SSE value of
this new model are actually smaller than those of our previous model.
So if you think that those two lags are actually significant,
you might want to fit SARIMA (0,1,3,0,1,1)12 instead of the model that we fit.
And if you look at the p-value from the Ljung-Box statistic, it's actually bigger.
And if you look at the residual analysis for this new model,
you see that the p-values are very high.
There is no significant sample autocorrelation.
The residuals look white, and they are almost normal,
though there's a systematic departure on the left tail.
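A sketch of fitting this alternative with astsa and comparing AICs (fit here is the SARIMA(1,1,0)(0,1,1)[12] object from earlier):

```r
# Alternative model: SARIMA(0,1,3)(0,1,1)[12] on the logged series,
# motivated by treating the nearly-significant low ACF lags as real.
fit2 <- sarima(log(souvenir), 0, 1, 3, 0, 1, 1, S = 12)

c(AIC_model1 = fit$AIC, AIC_model2 = fit2$AIC)  # smaller is better
```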
Okay. So what have we learned?
We have learned how to fit SARIMA models to the dataset about
the sales at the souvenir shop in Australia.
This dataset was from the Time Series Data Library.
And we learned, again, how to forecast
future values of the examined time series.