We've learned a lot about supervised machine learning so far, including different model architectures, how they learn, and some of the approaches for training and evaluating them. But now we're going to zoom out a bit and talk about how we evaluate the performance of our models. To begin, we're going to revisit our discussion about data. And yes, I know we've already talked a lot about data, and that's to be expected. After all, a common saying in machine learning goes that 80% of a machine learning project consists of data cleaning, labeling, and preparation, while the other 20% consists of data cleaning, labeling, and preparation.

>> We discussed some of these points on data earlier, and that the data we use is split into different sets: training, validation, and test. But let's spend a little more time on this to set up for our discussion of model metrics. Usually an 80-20 or 70-30 train-test split is considered reasonable, where 70 to 80% of your data is used for training and validation, and the rest is used for testing the model's performance. Setting aside data as a test set for later is absolutely critical. This is a little trickier with time series projects, but most people hold aside the data from the most recent time period as the test set and train the model on the data before that. For example, training and validating on data from 2012 through 2015, and then testing on 2016 through 2017.

>> There's other nomenclature to be aware of when we're discussing splitting up our data before starting a project. You may encounter both the train-test split and the train-validation-test split. Remember the model hyperparameters and how we decide to set them: we typically experiment with a bunch of different options, training different models initialized with different hyperparameters and then comparing their performance on the validation set. We can then pick the hyperparameters associated with the best performance to use in our final model, which we'll evaluate on the holdout test set. The terminology can be confusing here, and some swap the definitions of the validation and test sets, so please be mindful of this. Also, the term validation here does not mean external or clinical validation, which can be another point of confusion, so it's good to be mindful of that as well. And some groups refer to this as the development set, or dev set, rather than the validation set. Finally, keep in mind that the relationships between the train and test sets also apply to the validation set. We want the data in the validation set to be data the model hasn't encountered in the training set, but that is still similar in nature to it.

>> Another method for shuffling the training and testing data is something called cross-validation, or k-fold cross-validation. This is very similar conceptually to a train-test split, but instead of creating just one split, we essentially do this multiple times. More specifically, we split our dataset into k subsets, or folds, and then train on k minus one of them, holding out the last subset to use as a test set. When we do this over and over, we get many estimates of the model's performance on a variety of test sets, or subsamples of the data. This can be an advantage when there is not much data to begin with.
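To make the splits described above concrete, here is a minimal sketch of a train-validation-test split using scikit-learn's train_test_split. The random data, the 60-20-20 ratios, and the variable names are assumptions for illustration only, not something taken from the lecture.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data standing in for real features and labels (illustrative only).
X = np.random.rand(1000, 10)
y = np.random.randint(0, 2, size=1000)

# First carve off 20% as the holdout test set (an 80-20 split).
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.20, random_state=42
)

# Then split the remaining 80% into training and validation sets,
# giving roughly 60% train / 20% validation of the original data.
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.25, random_state=42
)
```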
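For the time series example (training and validating on 2012 through 2015, testing on 2016 through 2017), one simple approach is to split on a date or year column rather than at random. The DataFrame below and its "year" column are hypothetical, just to show the idea.

```python
import pandas as pd

# Hypothetical dataset with a 'year' column (illustrative only).
df = pd.DataFrame({
    "year": [2012, 2013, 2014, 2015, 2016, 2017] * 100,
    "feature": range(600),
    "label": [0, 1] * 300,
})

# Train and validate on the older data, test on the most recent years.
train_val = df[df["year"] <= 2015]
test = df[df["year"] >= 2016]
```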
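And here is one way the hyperparameter-selection step could look in code, continuing from the split sketch above (it reuses X_train, X_val, X_test, and the corresponding labels). The logistic regression model and the grid of C values are arbitrary choices for illustration.

```python
from sklearn.linear_model import LogisticRegression

# Try a few candidate hyperparameter values (arbitrary illustrative grid).
best_C, best_score = None, -1.0
for C in [0.01, 0.1, 1.0, 10.0]:
    model = LogisticRegression(C=C, max_iter=1000)
    model.fit(X_train, y_train)          # train on the training set
    score = model.score(X_val, y_val)    # compare on the validation set
    if score > best_score:
        best_C, best_score = C, score

# Retrain with the winning hyperparameter and evaluate once on the test set.
final_model = LogisticRegression(C=best_C, max_iter=1000).fit(X_train, y_train)
print(final_model.score(X_test, y_test))
```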
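Finally, a hedged sketch of k-fold cross-validation using scikit-learn's KFold and cross_val_score. The choice of k = 5, the toy data, and the logistic regression model are assumptions for illustration.

```python
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.linear_model import LogisticRegression

# Toy data standing in for a small real dataset (illustrative only).
X = np.random.rand(200, 5)
y = np.random.randint(0, 2, size=200)

# k = 5 folds: each fold serves as the test set exactly once,
# while the model is trained on the other k - 1 folds.
kf = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(LogisticRegression(), X, y, cv=kf)

# One performance estimate per fold, plus their mean and spread.
print(scores, scores.mean(), scores.std())
```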