In this video, we're going to take a look at applying deep learning to a regression. I'm going to use the same airlines data set, but this time we're going to predict the arrival delay. And for a bit of variety, we're not excluding so many fields. Our business case here is: we want to know how many minutes late the airplane will be after it's taken off. This should be a much easier problem, because we get to know how many minutes late it was leaving.

The only fields we're excluding are the answer itself; the boolean "was it delayed or not"; the actual elapsed time, which of course is another way of saying the answer; and arrival time, which is obviously cheating. We're also excluding tail number, simply because it's high cardinality and we're presuming it's low information. See the previous videos for why.

The other interesting thing I'm going to do this time is include variable_importances = True. This will tell us which of the fields are providing the most information. It's a bit approximate when you're using deep learning, but as we'll see it's better than nothing, and actually fairly useful, fairly insightful.

So, jumping back to Python. I've run all these commands: we've initialized h2o, imported the data, split it, set up which fields we're going to use. Everything you've seen before. Bring in the deep learning estimator and then run it. variable_importances = True is set in the constructor; everything else we're using defaults. That took about 18 seconds to run, so the regression is taking approximately the same amount of time as the classification we looked at earlier.

Let's see how it did. We got an MSE of 55.6. These other numbers might be more useful because they are in minutes: on average it guessed 3.8 minutes wrong. The root mean squared error is notably higher, and we're going to come back to why that might be later on.

This command will produce a variable importance plot. The default is to give you the first 10 variables; I've set it to give me the first 30.
But the most important variable is air time: how long it was in the air. Then, curiously, year is the second most important, and the third is how long the flight was supposed to take. The fourth is departure delay. Nothing too fascinating here. Distance is also important, and you can see "was the departure delayed or not": if it was, that's an important variable.

The next thing I tried was giving it more time. As we talked about before, these are the defaults for early stopping; I'm just specifying them explicitly. So the only change is giving it 200 epochs, that is, up to 20 times more effort. This took one and three-quarter minutes to run, and the MSE went down, from 55 was it? To 47. These other numbers have also come down just slightly.

The scoring history is quite interesting. In the end it used this model: at about 30 epochs was the best model. Not that different, though, from about 15 epochs, and not that different from about 45 epochs. But from that point on you can see this gap just steadily grows, so it's entered the realm of overfitting. By using the scoring_history command, you can see that same chart numerically.

What I wanted to focus on here, though, was comparing the RMSE with the MAE. You can see the RMSE, the root mean squared error, is always higher than the mean absolute error. And what does that mean? It means when it gets it wrong, it gets it more wrong. And if that doesn't make sense, it's because the distribution of ArrDelay is very long-tailed.

So I've taken that particular field, arrival delay, which is our y, and run a histogram on it. I've requested 200 bars just so we can see some detail. You can see it's centered around 0; there are actually two peaks, just to either side of 0, but both of them look like an exponential decay. So I had this crazy idea: let's try the Laplace distribution. The only thing I'm changing is setting the distribution. It took about the same amount of time to run; let's see how it did.
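The "long tail inflates RMSE" point can be demonstrated with a small simulation. This is a standalone sketch, not from the video: it compares errors drawn from a short-tailed uniform distribution against errors with exponential decay on either side of zero, roughly the shape the ArrDelay histogram shows:

```python
import math
import random

random.seed(0)

def rmse(errs):
    return math.sqrt(sum(e * e for e in errs) / len(errs))

def mae(errs):
    return sum(abs(e) for e in errs) / len(errs)

# Short-tailed errors: uniform between -5 and +5 minutes
short = [random.uniform(-5, 5) for _ in range(100_000)]

# Long-tailed errors: exponential decay either side of 0
# (a Laplace-like shape, similar to the ArrDelay histogram)
long_tail = [random.choice((-1, 1)) * random.expovariate(1 / 5)
             for _ in range(100_000)]

# The ratio RMSE/MAE is noticeably larger for the long-tailed errors:
# rare big misses dominate the squared term.
print("short tail:", rmse(short) / mae(short))
print("long tail: ", rmse(long_tail) / mae(long_tail))
```

In the limit, a uniform distribution has an RMSE/MAE ratio of about 1.15, while a Laplace distribution's is √2 ≈ 1.41, so "when it gets it wrong, it gets it more wrong" shows up directly in that ratio.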
And, if you can remember the numbers from earlier, again we've seen a slight improvement, although the root mean squared error is still significantly more than the mean absolute error. This is some Python just to give us that as a nice little chart. The first one is mean absolute error: by giving it more time and changing the distribution, we managed to improve from being 4.5 minutes wrong on our guesses to being 2.2 minutes wrong. And the root mean squared error, we managed to improve that from 7.3 to 5.3.

So do be aware of the different distributions that h2o offers, even in a deep learning regression model, and bear in mind the distribution of the thing you're trying to predict, to make an educated guess of what might be better.
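The intuition behind that distribution choice can be sketched with constant predictors (this is an illustration with synthetic data, not the video's model): a Gaussian distribution corresponds to squared-error loss, which a sample's mean minimizes, while a Laplace distribution corresponds to absolute-error loss, which the median minimizes. On a long-tailed target those two predictors diverge:

```python
import random
import statistics

random.seed(1)

def mae(y, pred):
    return sum(abs(v - pred) for v in y) / len(y)

def mse(y, pred):
    return sum((v - pred) ** 2 for v in y) / len(y)

# A skewed, long-tailed synthetic sample standing in for arrival delays
delays = [random.expovariate(1 / 10) - 5 for _ in range(100_000)]

mean_pred = statistics.fmean(delays)     # best constant under squared error (Gaussian)
median_pred = statistics.median(delays)  # best constant under absolute error (Laplace)

# The median wins on MAE and the mean wins on MSE, which is why
# matching the distribution to a long-tailed target can help.
print("MAE: median", mae(delays, median_pred), "vs mean", mae(delays, mean_pred))
print("MSE: mean  ", mse(delays, mean_pred), "vs median", mse(delays, median_pred))
```

So when the target's histogram looks like exponential decay on either side of a peak, a Laplace objective is a reasonable educated guess, exactly the experiment the video runs.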