Welcome back to our notebook on manipulating time series with pandas. In the last video we saw how to work with the DatetimeIndex and how pandas can leverage it. Here we'll continue leveraging that DatetimeIndex, starting with the resampling functionality available to us. Resampling means changing the frequency: from days to months, from days to weeks, or even from months back down to days. We'll see how that happens using both resampler objects and the asfreq method, spelled A-S-F-R-E-Q as we see here. First up is resampling, and we'll start with downsampling: moving to a longer period, so from days to weeks or days to months. Our sales_new has a daily frequency. We call resample, and notice that if I just call resample on sales, we get a resampler object; to actually get a DataFrame out of it, we have to call some type of aggregation. Here we call .sum() and all of those days get summed into a week, so we see weekly values: the 9th, the 16th, the 23rd. Then we do the same thing for monthly, quarterly, and annual sales. You can see each of those values here, and they should increase as we increase the span of time being aggregated over. Upsampling moves in the opposite direction, say from years back down to months or days. When we do that, say we're working with the annual data, each annual value sits on a single timestamp, so when we resample from a year down to months, the assumption is that we're missing the other 11 months of data.
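To make the downsampling step concrete, here is a minimal sketch. The `sales_new` name and the date range are hypothetical stand-ins for the course's data; the point is that `resample()` alone returns a resampler object, and an aggregation like `.sum()` is what turns it back into data.

```python
import numpy as np
import pandas as pd

# Hypothetical daily series standing in for the course's sales_new
idx = pd.date_range("2023-01-01", periods=28, freq="D")
sales_new = pd.Series(np.arange(1, 29), index=idx, name="sales")

# resample() by itself gives a resampler object; .sum() aggregates it
weekly = sales_new.resample("W").sum()    # days summed into weeks
monthly = sales_new.resample("MS").sum()  # days summed into months
```

Note that the total is preserved: the weekly sums and the monthly sum both add up to the same grand total as the daily data, which is a quick sanity check after any downsampling.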
Therefore, when we resample up from years down to months, we start off with a bunch of null values, and we have the option to interpolate those nulls using different methods; the interpolate link we have here describes the methods available for filling in those values. I'm going to run just this top one first to see what it looks like. If we just call .sum(), for example, it only shows the one real value and then zeros throughout the rest. Then when we call interpolate and look at the monthly values, we start at the top with 157,192.9; let's actually look at the top 13 values with head. Interpolate fills in those middle months with evenly spaced values between the first observation and the next one, and it does the same for office supplies as that decreases and for technology as that one decreases as well. Now, we can do this using asfreq too: pass in 'D' for calendar day, 'B' for business day, and if we want to change it to hourly, we can use 'h'. We can see this hourly data now. Again, this is upsampling, so it's just going to have null values throughout; you can imagine there will be quite a few blanks, 24 per day to be exact. If we look at 25 rows, we can see the first value come up for the next day. Now, moving on to the variable transformations that will be available to us. For time series models, we often want to use log transforms or differences, and we'll see how those differences play a role in getting that stationarity we talked about.
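Here is a small sketch of that asfreq-then-interpolate pattern, using two made-up monthly observations rather than the course's actual sales figures. `asfreq` inserts the missing rows as NaN, and `interpolate()` (linear by default) fills the gap with evenly spaced values.

```python
import pandas as pd

# Two hypothetical monthly observations; everything between them is "missing"
monthly = pd.Series([120.0, 182.0],
                    index=pd.to_datetime(["2023-01-01", "2023-02-01"]))

# asfreq("D") inserts a row for every calendar day; the new rows are NaN
daily = monthly.asfreq("D")

# interpolate() fills the gap with evenly spaced values by default
filled = daily.interpolate()
```

With 31 days between the two observations and a gap of 62, each interpolated day steps up by exactly 2.0, which is the "evenly spaced fill" behavior described above.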
In order to do those efficiently, we can leverage the DatetimeIndex. Using sales_monthly, which already has that monthly DatetimeIndex, we can call .diff(). Let's run this to see what it looks like: we get the difference from one value to the next, a decrease of about 3,000 in the first month and then an increase of about 12,000. We can call .pct_change() and we see the 60 percent decrease and then the 580 percent increase from there. Then we can do a log transformation and just take the log of the whole thing; that part is not specific to working with the DatetimeIndex. We use one plus the values to ensure there are no zeros going into the log. What's also available with NumPy is np.log1p, which computes the same thing, and comparing the two gives the same output: we had 8.7 before, and calling it again we still have that 8.7. Then, just to prove this is working correctly, we join that sales monthly percent change right here to our original sales_monthly, and where the column names are the same, we add the suffix percent_change. We see that furniture decreased 60 percent, which makes sense given the furniture values, and then increased by 580 percent as it went from 2,130 to 14,574. Now, another thing available to us is working with rolling averages over a certain window. We may want to smooth out our data; again, this will be powerful for many of our models as well. To smooth our data, we first set window_size equal to seven, and since we're using that daily data, we're averaging across each week. Calling .rolling() with that window creates a rolling object (we didn't define the window size just yet, so we'll put that above as well), and then we can just call the mean; when we call the mean, we have that rolling average across the past seven days.
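A quick sketch of the three transformations just described. The series below is hypothetical, chosen so that the percent changes match the 60 percent drop and 580 percent jump mentioned in the video; `np.log1p(x)` really is the same as `np.log(1 + x)`.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly totals mimicking the -60% / +580% moves in the video
sales = pd.Series([100.0, 40.0, 272.0],
                  index=pd.date_range("2023-01-01", periods=3, freq="MS"))

diffs = sales.diff()         # change from one period to the next
pct = sales.pct_change()     # proportional change: -0.6 is a 60% decrease
logged = np.log1p(sales)     # identical to np.log(1 + sales)
```

The first element of both `diffs` and `pct` is NaN, since there is no earlier period to compare against.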
You can see that as we move forward, we start off with null values, because you can't have a seven-day rolling average until those seven days are available to you. Here we actually drop those null values, and we can see the rolling mean, the rolling standard deviation, and then, going back to the original sales_new, we're no longer working with rolling window objects: we can call cumulative sum and get the running total as we move forward. We see the values keep adding onto one another, the cumulative sum growing over time. Now, pandas has built-in plotting functionality. If we recall our sales quarterly or sales monthly, let's look at that data very quickly so we know what we're plotting; calling head, you see that we have the date as the index and then different columns: furniture, office supplies, and technology. If you just call .plot() when working with time series data, the default creates a time plot for you, and if you have different columns, it draws a different line for each column. You see here we have the furniture, office supplies, and technology trends, this one for every quarter, this one for every month, and then this one for every day. The next things we plot are the rolling standard deviation, the percent changes, the cumulative sum, and then the percent changes for the quarterly data. Again, if we look back up at those rolling averages and cumulative sales, the output is in that same pandas DataFrame format, so when we plot it out, we again get the date on the x-axis and a different line for each category, and we can see the rolling standard deviation, the rolling seven-day average, the monthly sales percent change, the cumulative weekly sales, and the quarterly sales.
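The rolling-window and cumulative-sum steps can be sketched like this, again on a small hypothetical daily series rather than the course data. Note that the first `window_size - 1` rows of any rolling statistic are NaN, which is why the video drops them before plotting.

```python
import numpy as np
import pandas as pd

# Hypothetical two weeks of daily data
daily = pd.Series(np.arange(1.0, 15.0),
                  index=pd.date_range("2023-01-01", periods=14, freq="D"))

window_size = 7
roll = daily.rolling(window_size)      # a rolling object, not data yet
rolling_mean = roll.mean()             # NaN until 7 days are available
rolling_std = roll.std()
running_total = daily.cumsum()         # cumulative sum, no window involved
```

Calling `rolling_mean.dropna()` mirrors what the notebook does before plotting, and `daily.plot()` on any of these results gives the default time plot with the dates on the x-axis.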
Now, the last thing I want to touch on are some important plots for working with time series data, and we'll get more into many of these later on. The first pair, the ACF and PACF plots, help you find where there's correlation between one period and the next. We discussed in lecture how autocorrelation is an important concept to keep in mind when working with time series data. Then we'll see the month plot as well as the quarter plot, where we can see over time whether a given month or quarter runs lower or higher than the others, and we'll see what that looks like in just a second. So we import plot_acf, plot_pacf, month_plot, and quarter_plot; these are all available in statsmodels' time series plotting module. We call plot_acf and just pass in one of our columns with lags set to 30: looking back up to 30 time periods, how autocorrelated is each value with the value at lag 1, lag 2, lag 3, all the way through lag 30? The blue band indicates statistical significance, and while we see some spikes jump up, none of them are too high in terms of how much they correlate with past values. Autocorrelation, which we'll talk about a bit further later on, is how much a value is correlated with its past values. Partial autocorrelation is the same idea, except it removes the influence of the other lags that may be interfering. So if there is autocorrelation at lag 1, you'd imagine that lag 2 would also look autocorrelated, because it's correlated with lag 1, which is correlated with the following value. Partial autocorrelation rules that out and looks specifically at lag 2 while accounting for lag 1, holding it constant.
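The numbers behind an ACF plot can be sanity-checked directly in pandas with `Series.autocorr`. This sketch simulates a hypothetical AR(1)-style series (each value is 0.8 times the previous one plus noise) so the lag-1 autocorrelation is high by construction and decays at longer lags, which is exactly the shape `plot_acf` would draw for it.

```python
import numpy as np
import pandas as pd

# Simulated AR(1)-style series: value = 0.8 * previous + noise (hypothetical)
rng = np.random.default_rng(42)
values = [0.0]
for _ in range(499):
    values.append(0.8 * values[-1] + rng.normal())
series = pd.Series(values)

lag1 = series.autocorr(lag=1)   # close to 0.8 by construction
lag5 = series.autocorr(lag=5)   # smaller, decaying roughly like 0.8 ** 5
```

Passing a series like this to `plot_acf(series, lags=30)` would show tall early spikes that shrink toward the blue significance band, the pattern described above.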
Then, just to make this a bit clearer, we look at the sales data. Here we're looking at lags of 12, and we see autocorrelation that slowly diminishes, which is something we'll get used to as we start to talk about autoregressive models and what this actually means. In the partial autocorrelation plot, the spikes show a strong lag 1 and lag 2 correlation, and beyond that there are no statistically significant lags. Now we're going to look at the month plot. What the month plot shows on the x-axis is January, February, March, April, and so on, and then for every year it plots the actual value for that month; the red line is the average of those values. So we can see that the December sales are much higher than the January sales, for example. We can do the same thing quarterly: now we're just seeing Q1, Q2, Q3, and Q4, and we see how sales trend up over those quarters no matter what year you're working with. That closes out our discussion of everything we wanted to introduce you to for working with pandas time series. In the next video, we'll go over the exercises, which hopefully you've already worked through, on coming up with different manipulations and plots of your own. I'll see you there.
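The red averages that month_plot draws can be reproduced with a plain pandas groupby, which is a useful way to check what the plot is showing. The seasonal series below is hypothetical, constructed so December is always the peak, mirroring the December-versus-January comparison above.

```python
import pandas as pd

# Hypothetical seasonal monthly sales over three years:
# each month's value is 100 + 10 * month, so December always peaks
idx = pd.date_range("2020-01-01", "2022-12-01", freq="MS")
sales = pd.Series(100 + 10 * idx.month, index=idx)

# The red line in statsmodels' month_plot is this per-month average
monthly_avg = sales.groupby(sales.index.month).mean()
```

Here `monthly_avg` is indexed 1 through 12, and its maximum lands on month 12, exactly the "December is much higher than January" pattern the month plot makes visible.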