In this demonstration, we use convolutional neural networks to analyze and classify image data. The CIFAR-10 data set is a collection of images from the Canadian Institute for Advanced Research that has been widely used to train and benchmark computer vision models. The data set contains 10,000 total 32x32 color images from 10 classes: airplanes, automobiles, birds, cats, deer, dogs, frogs, horses, ships, and trucks. Our goal is to train deep learning algorithms to recognize and distinguish between the 10 classes of images and then use our model to classify new images.

To begin, let's first load the SWAT package as always, and let's also load the repr package to set the default plot dimensions. Next, connect to CAS with the same arguments as in previous demonstrations and create the conn connection object.

In all our previous lessons, we've loaded data into memory from the client. In this demonstration, the CIFAR-10 data set is on the Linux server, and we need to move it to the CAS server. To do so, we need to add a caslib that connects to the server directory where the data is stored. Note that this method of adding a caslib is identical to the one we used when we saved our champion machine learning model. From the table action set, we use the addCaslib action. We'll name the library mycl, which is short for "my caslib"; the path is the location of the data; and the data source type is path. We'll set the activeOnAdd argument to false to keep the default library active. That is, we're using this new caslib only to load data onto the CAS server. Notice also that we set subDirectories to true. This enables tables or files in subdirectories to be accessed from the caslib. We add this option because the cifar10 directory contains 10 subdirectories, one for each image class, and each subdirectory is named for its class, such as airplane or automobile. This makes it easier to account for the image labels when we read the images into the CAS server.

Next, we'll load the image action set. As you can see, it contains actions to load, save, compare, and process images, as well as other tools for working with image data. Let's first use the loadImages action to move our images to the CAS server. The caslib argument specifies the mycl CAS library we just added, and the path argument specifies the name of the folder in the caslib that contains the data. Remember, inside this cifar10 folder there are 10 separate folders, one for each image class, and each folder contains 1,000 images. Given the structure of the data in the subdirectories, we set recurse equal to true. This loads all the images from the 10 subdirectories and, most importantly, keeps the subdirectory folder names as labels for the images. This is much easier than loading all the images from a single folder and then trying to label them on the CAS server. The decode argument, when set to true, decompresses the images to the output table; the distribution argument specifies how to distribute the images to the worker nodes; and the labelLevels argument specifies the maximum number of directory levels to include in the label. Because the subdirectories are only one level down from the cifar10 folder, we set this argument to 1. Finally, we'll save these images in a CAS table named cifar10, and therefore the indata variable name is cifar10 as well. When we run this cell, it tells us that we did in fact load all 10,000 images from the cifar10 folder into memory.
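The calls below are a minimal R SWAT sketch of these setup and load steps. The host name, port, and server directory path are placeholders rather than values from the demonstration, and the argument names follow the CAS action documentation.

```r
library(swat)
library(repr)
options(repr.plot.width = 8, repr.plot.height = 6)   # default plot dimensions

# Connect to CAS (host and port are placeholders)
conn <- CAS("cas-server.example.com", 5570)

# Add a caslib pointing at the server-side directory that holds the data
cas.table.addCaslib(conn,
                    name = "mycl",                       # short for "my caslib"
                    path = "/data/images",               # placeholder server path
                    dataSource = list(srcType = "PATH"),
                    activeOnAdd = FALSE,                 # keep the default caslib active
                    subDirectories = TRUE)               # expose the 10 class subfolders

# Load the image action set and read the images into memory
loadActionSet(conn, "image")
cas.image.loadImages(conn,
                     caslib = "mycl",
                     path = "cifar10",                   # folder holding 10 class subfolders
                     recurse = TRUE,                     # descend into the subfolders
                     labelLevels = 1,                    # keep folder names as labels
                     decode = TRUE,                      # decompress images to the output table
                     casOut = list(name = "cifar10", replace = TRUE))
```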
The summarizeImages action provides a few more details about the images now contained in the cifar10 CAS table. The images are all PNG files, all 10,000 images are of dimension 32x32, and the three channels refer to pixel intensities on the red, green, and blue scales, respectively. The columnInfo action displays the column names in the image table, as well as the type and length of the variables. For image data, the _label_ variable holds the target class that we'll use in our deep learning model.

Let's ensure that the _label_ variable does in fact hold the labels for the 10 unique classes. First, load the simple action set. Then use the freq action and pass the _label_ variable to the inputs argument. The freq action shows the 10 categories of the _label_ variable, airplane through truck, and each category contains 1,000 images.

Now, let's partition these 10,000 images into training, validation, and test sets. To do so, first load the sampling action set. In the srs action, we'll set the training sample percentage to 60% and the validation sample percentage to 20%, and the remaining 20% is saved for test data. Set the partition indicator to true to add the partition indicator to the existing table. When we run this cell, we can see that 6,000 observations were sampled for training, and 2,000 each for validation and test.

Remember, shuffling the data before building a deep learning model is very important. If the data are sorted by the target level, the model is likely to overestimate the probability of the target for the first level passed to the algorithm. For example, out of the 6,000 training observations, if the first 600 observations have the target level airplane, the deep learning model will overestimate the probability that an image is of an airplane. Shuffling simply scrambles the order of the images to prevent overfitting on any specific class level. In the shuffle action, we need only specify the table and pass the same table name to the casOut argument to replace the table with its shuffled rows.
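Here is a sketch of these inspection, partitioning, and shuffling steps in the same R SWAT style. The seed value is an arbitrary placeholder, not a value from the demonstration.

```r
# Inspect the loaded images and the columns of the image table
cas.image.summarizeImages(conn, imageTable = "cifar10")
cas.table.columnInfo(conn, table = "cifar10")

# Frequency of the 10 class labels
loadActionSet(conn, "simple")
cas.simple.freq(conn, table = "cifar10", inputs = "_label_")

# Partition: 60% training (indicator 1), 20% validation (2), 20% test (0)
loadActionSet(conn, "sampling")
cas.sampling.srs(conn,
                 table = "cifar10",
                 sampPct = 60,
                 sampPct2 = 20,
                 partInd = TRUE,
                 seed = 12345,                          # arbitrary seed for reproducibility
                 output = list(casOut = list(name = "cifar10", replace = TRUE),
                               copyVars = "all"))

# Shuffle the rows in place so no class level is grouped together
cas.table.shuffle(conn,
                  table = "cifar10",
                  casOut = list(name = "cifar10", replace = TRUE))
```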
Now, let's load the deepLearn action set and create our convolutional neural network. We first initialize our deep learning model with the buildModel action and then specify the type of neural network; in this case, we're building a CNN. We'll name this network in the model argument the same as its type, cnn. Now we can build the network by interactively adding layers with the addLayer action. The first layer is, of course, the data layer. The model argument specifies the same name from the buildModel action, and I like to use the replace equal to true option to avoid accidentally reusing layer names. We'll name this layer data so that we know how to connect it to the subsequent layers in the model. The layer argument contains all the options unique to this layer. First, the type is set to input. Recall from the data description that all the images are 32x32 color images. Because our data are color images, we set nChannels to 3: the data are represented as three color channels, or matrices of pixel intensities, one for each of red, green, and blue. For black-and-white images, you can set nChannels to 1. The width and height of the images are set to 32. If necessary for the input data, we can choose to scale the images down for computational ease, but at a cost in image resolution. For larger, more complex images, scaling might help focus the model on the most important aspects of the photograph. Finally, you can standardize the input using the std option.

Next, we'll add a convolutional hidden layer to the model. We'll name it cnn1 and connect it to the data layer as its source. In the layer argument, the type is convolutional. We'll use the rectified linear (relu) activation function and the Xavier weight initialization method. To extract information from the inputs, we'll use 10 total filters, each with a width and height of 5, and slide them across the input matrices with a stride of only 1. The three-channel input images are 32x32 in width and height by 3 colors, so each filter is 5x5 in width and height by 3 separate channels, with each filter learning its own set of weights to create a feature map. The relu activation function transforms the output values of the feature maps, which are then passed to the next hidden layer.

After a convolutional layer in which we created multiple feature maps, we add a pooling layer to increase computational efficiency by downsampling the incoming information. The model name is, of course, cnn; we'll name this hidden layer pool1; and the source layer is the previous convolutional layer. In the layer argument, the type is set to pooling, and the width, height, and stride hyperparameters are each set to 2, which means that we analyze only 2x2 localized regions of the incoming information and slide the region 2 units at a time. In each 2x2 neighborhood of the transformed feature map we keep, in this case, the maximum value, but you can also summarize the region with other functions, such as the minimum or the average. Taking the maximum, or using a pooling layer in general, effectively summarizes each localized region of the feature map and aggressively shrinks the network to improve computational efficiency. Generating a summary of each localized region has the added benefit of making the output approximately invariant to small deviations in the input. That is, pooling is useful when the modeler is more concerned with whether an object exists than with its exact location. If spatial differences are a great concern for the problem of interest, then pooling should be used carefully or not at all.

Next, we'll use a fully connected layer to map the pooled features to the output. I'll name this hidden layer fc1 and connect it to the pool1 source layer. In the layer argument, we set type to fullconnect, and I'm choosing to use 100 neurons. Note that the fully connected layer incorporates a large number of parameters and is therefore expensive to train, so n should be set small initially. The activation and weight initialization options are the same as in the convolutional layer: relu and Xavier. I also added a 40% dropout rate to avoid potential overfitting. Recall that with dropout, hidden units, inputs, or both are removed from training for several iterations and then returned, and this process continues throughout the optimization routine. Dropout effectively requires each hidden unit to be more of a generalist than a specialist, because each hidden unit must reduce its reliance on the other hidden units in the model. You can run your model and try different dropout rates, depending on the training and validation loss summary. That is, you can overfit the model initially and then apply dropout to generalize it.

The final layer is, of course, the output layer, which we'll connect to the fully connected layer above. Because we have multiple target classes, we'll use the softmax activation function and Xavier initialization. Finally, we'll use the modelInfo action to view the structure of the model. After we run this cell to create the network, we see a table describing the cnn model. This convolutional neural network has five total layers: one input layer, one convolutional layer, one pooling layer, one fully connected layer, and one output layer. This is a relatively simple model for image classification, chosen to illustrate convolutional neural networks and to train quickly. But you can expand on the basic architecture by, for example, adding more convolutional and pooling layers, using more filters with different sizes and strides, and so on.
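The sketch below assembles the five-layer network as described. The output layer name (outlayer) is my own choice, and the exact layer-option keywords (for example, type = "convo", pool = "max", and std = "STD") follow the deepLearn action documentation as I understand it, so treat them as assumptions.

```r
loadActionSet(conn, "deepLearn")

# Initialize an empty convolutional neural network
cas.deepLearn.buildModel(conn, model = list(name = "cnn", replace = TRUE),
                         type = "CNN")

# Input layer: 32x32 images with 3 color channels, standardized
cas.deepLearn.addLayer(conn, model = "cnn", name = "data", replace = TRUE,
                       layer = list(type = "input", nChannels = 3,
                                    width = 32, height = 32, std = "STD"))

# Convolutional layer: 10 filters of size 5x5, stride 1
cas.deepLearn.addLayer(conn, model = "cnn", name = "cnn1", replace = TRUE,
                       srcLayers = "data",
                       layer = list(type = "convo", nFilters = 10,
                                    width = 5, height = 5, stride = 1,
                                    act = "relu", init = "xavier"))

# Max-pooling layer: 2x2 regions, stride 2
cas.deepLearn.addLayer(conn, model = "cnn", name = "pool1", replace = TRUE,
                       srcLayers = "cnn1",
                       layer = list(type = "pooling", width = 2, height = 2,
                                    stride = 2, pool = "max"))

# Fully connected layer: 100 neurons with a 40% dropout rate
cas.deepLearn.addLayer(conn, model = "cnn", name = "fc1", replace = TRUE,
                       srcLayers = "pool1",
                       layer = list(type = "fullconnect", n = 100,
                                    act = "relu", init = "xavier",
                                    dropout = 0.4))

# Output layer: softmax over the 10 classes
cas.deepLearn.addLayer(conn, model = "cnn", name = "outlayer", replace = TRUE,
                       srcLayers = "fc1",
                       layer = list(type = "output", act = "softmax",
                                    init = "xavier"))

# View the five-layer architecture
cas.deepLearn.modelInfo(conn, modelTable = "cnn")
```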
Now that we've built the network, let's fit the model using the dlTrain action. In the table and validTable arguments, we specify the training and validation data respectively, using the partition indicator. The target here is the _label_ variable, and the input is the _image_ variable, both created when the data were read into memory. We'll set a seed, specify the modelTable with the name cnn that we created in the previous cell, and save the model weights as trained_weights. For the optimization arguments, we'll use a miniBatchSize of 50 and train for 100 epochs, and you can also use the logLevel argument to keep the output messages verbose. In the algorithm argument, we'll set the method to momentum and the learningRate to 0.01.

We saw in previous deep learning demonstrations that the Adam algorithm adapts the step size using estimates of the first and second moments of the gradient. Momentum is a different extension of stochastic gradient descent. Unlike Adam, which adapts the step size of the weight movements, momentum dampens highly oscillating search paths. Because stochastic gradient descent uses a subset of the data to update the weights, the search path can vary and take an indirect route to the optimal solution. It is nonetheless computationally efficient with large data sets and converges faster than using the entire data set for each update. Momentum effectively blends each update with the previous updates, curving the search path toward the optimal solution. Momentum is simply an alternative to Adam and can potentially improve convergence, depending on the model and data at hand. But of course, there is no one best optimizer for all deep learning problems.
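A sketch of the dlTrain call with the settings described above. The seed is an arbitrary placeholder, and the partition-indicator values (1 for training, 2 for validation) follow the srs convention from the partitioning step.

```r
cas.deepLearn.dlTrain(conn,
                      table = list(name = "cifar10", where = "_PartInd_ = 1"),
                      validTable = list(name = "cifar10", where = "_PartInd_ = 2"),
                      target = "_label_",
                      inputs = "_image_",
                      seed = 12345,                       # arbitrary fixed seed
                      modelTable = "cnn",
                      modelWeights = list(name = "trained_weights", replace = TRUE),
                      optimizer = list(miniBatchSize = 50,
                                       maxEpochs = 100,
                                       logLevel = 2,      # verbose optimization notes
                                       algorithm = list(method = "momentum",
                                                        learningRate = 0.01)))
```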
Let's run the cell and look at the model output. The notes at the beginning of the output are produced by the logLevel argument above and display various information about the optimization process, such as the memory cost and the time to fit the model. The ModelInfo table shows the network architecture again, along with the total number of parameters. For this convolutional neural network with only five layers, the model uses over 250,000 weights. This might seem like a large number, but remember that the model is analyzing every pixel of each image and passing that information through the successive hidden layers to classify individual pictures. The training and validation errors decrease similarly over the 100 epochs and are nearly equivalent at the last epoch.

To get a final assessment of this model, we'll use the dlScore action to score the test data set. The table argument specifies the test partition using the partition indicator, the model is the same as before, and the initWeights are the saved weights from our model. We'll use the copyVars argument to copy not only the scored results to the output table but also the target _label_ and the _image_ variable so that we can compare predictions later. The layerImageType argument specifies how to store the output images; here we'll use jpg even though the data are PNG files. Finally, we'll save the scored table as cnn_scored. When we run this action, we see that the misclassification error is near 50%. Although this is a large error in general, a random-guess model would have a 90% misclassification error rate. Of course, we could try reducing the error further by running the model longer, using more hidden layers, or changing the hyperparameters of the optimization routine. Instead, let's analyze this model's results further.

Using the crossTab action from the simple action set, we'll compute a contingency table of the actual by predicted classes for the test data. The table is the scored test output data, the rows are the target _label_ levels, and the columns are the predicted levels. Recall that the predicted-name variable, DL_PredName, is provided by default in the scored output table. Running the crossTab action pulls the contingency table to the client as a data frame. So instead of creating an object reference to the scored data and then downloading it to the client, this is an alternative method of bringing the results, in some form or fashion, to the client. When we run this cell, we see the actual versus predicted results with numeric levels for the rows and columns, as opposed to the actual image class names. Regardless, we can see that the largest numbers do appear on the diagonal, indicating that in most cases the predictions are correct.

In the next cell, I'm using the data frame results on the client to find the proportions of correct and misclassified images for each class by simply dividing the diagonal elements, which are the correct predictions, by the total number of images in each respective class. The classes variable holds the names of the image classes so that we can represent the labels without the arbitrary numbers used in the contingency table. Finally, I'm joining the classes vector and the proportions of correct and misclassified images for each target level, and giving the three columns appropriate names in the new data frame, df. Some images are certainly more challenging to classify than others. For example, the bird, cat, dog, and deer images all had high misclassification rates, but the mechanical images, such as airplanes, automobiles, ships, and trucks, were more easily identified by this five-layer convolutional neural network. Because this data frame is on the client, we can plot the misclassification rates using open source functionality. In the bar chart, we can see that the frog pictures were the easiest to classify and that the pictures of cats and other mammals were the hardest. Finally, use the endSession action to end the CAS session. A sketch of these scoring and evaluation steps appears below.
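The following sketch covers scoring, the contingency table, and the client-side analysis. It assumes the dlScore output column is named _DL_PredName_ (written as DL_PredName in the narration), that the crossTab result is keyed Crosstab with the row labels in its first column, and that base R barplot stands in for whatever open source plotting the demonstration used; all of these are assumptions.

```r
# Score the held-out test partition (_PartInd_ = 0)
cas.deepLearn.dlScore(conn,
                      table = list(name = "cifar10", where = "_PartInd_ = 0"),
                      modelTable = "cnn",
                      initWeights = "trained_weights",
                      copyVars = c("_label_", "_image_"),
                      layerImageType = "jpg",
                      casOut = list(name = "cnn_scored", replace = TRUE))

# Actual-by-predicted contingency table, returned to the client as a data frame
ct <- cas.simple.crossTab(conn,
                          table = "cnn_scored",
                          row = "_label_",
                          col = "_DL_PredName_")   # predicted class written by dlScore

# Client-side accuracy by class: diagonal (correct) over row totals
counts  <- as.matrix(ct$Crosstab[, -1])            # drop the row-label column
correct <- diag(counts) / rowSums(counts)
classes <- c("airplane", "automobile", "bird", "cat", "deer",
             "dog", "frog", "horse", "ship", "truck")
df <- data.frame(class = classes,
                 correct = correct,
                 misclassified = 1 - correct)

# Plot the misclassification rate for each class
barplot(df$misclassified, names.arg = df$class, las = 2,
        ylab = "Misclassification rate")

# End the CAS session
cas.session.endSession(conn)
```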