Where do data engineers come into the picture? Don't forget, data engineers build data pipelines, and machine learning pipelines are no different. If we want a flexible pipeline for all stages of machine learning, Kubeflow is a great option.

Many people think that machine learning products are all about the code that ML scientists write locally on their machines. But does that code ensure the data going into it is clean? Can the code autoscale to the clients who want to use it for serving predictions? If we have to retrain the model, does it go offline at that point? The truth is, production machine learning systems are large, complicated distributed systems. There's a lot of DevOps involved, with things like monitoring and process management tools. Google started building Kubeflow to tackle these DevOps challenges using Kubernetes and containers. So one option to help manage the overhead of productionizing ML pipelines is to use Kubeflow.

The capabilities provided by Kubeflow Pipelines fall largely into three buckets: orchestrating ML workflows; sharing, reusing, and composing components; and rapid, reliable experimentation. You can think of the benefits as similar to those of Cloud Composer, but better tailored for ML workloads.

Let's see what a pipeline looks like. To make things more concrete, let's look at a screenshot of an illustrative workflow that was run on Kubeflow Pipelines. This is just one example; users can author and run many different kinds of workflow topologies, with different code and tools in the various steps. For each workflow run on Kubeflow Pipelines, you get a rich visual depiction of its topology, so you know exactly what was executed as part of the workflow. In this workflow, we start with a data preprocessing and validation step. The preprocessed data then flows to a feature engineering step.
This is followed by a fork, where we train many different kinds of models in parallel: a wide-and-deep model, an XGBoost model, and a convolutional neural network, or CNN. During training, you can click on a model in the UI to view its critical performance characteristics. Here we can see the ROC curve of false positive rate versus true positive rate for the model during training. If you're familiar with TensorBoard, you can view the TensorBoard metadata for the model as well.

Once training is complete, the trained models are analyzed and compared against each other on a test dataset, and you can choose the one that performed best for your use case. Most ML projects will stop here and iterate back to the beginning to continue improving model performance before moving to production. Finally, once you're happy with model performance, you can have a Kubeflow node push it to a production serving endpoint.

For each step of the workflow, you can see the precise configuration parameters, inputs, and outputs. Thus, for a model trained with Kubeflow Pipelines, you never have to wonder, "How exactly did I create this model?" You can quickly see how long the model training took, where the trained model is stored, and what data was used for training and evaluation.

You can define the ML workflow using Kubeflow's Python SDK. By defining the workflow, we mean specifying each step's inputs and outputs, and how the various steps are connected. The topology of the workflow is implicitly defined by connecting the outputs of an upstream step to the inputs of a downstream step. You can also define looping constructs as well as conditional steps. Another nice Kubeflow feature is the ability to package pipeline components, which adds an element of portability, since you can then move your ML pipelines even between cloud providers. Kubeflow Pipelines also separates the work for different parts of the pipeline, enabling people to specialize.
For example, an ML engineer can focus on feature engineering and other parts of creating the model, such as hyperparameter tuning. The ML engineer's solutions can then be bundled up and used by a data engineer as part of a data engineering solution, which in turn can appear as a service used by data analysts to derive business insights.

Kubeflow also makes it easy to run a number of ML experiments at the same time. For example, if you're doing hyperparameter optimization, you can easily deploy a number of training instances with different hyperparameter sets. Kubeflow's run overview then makes it easy to home in on the techniques or parameters generating the best results: you can quickly identify what worked and what did not.
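The idea of homing in on the best run can be sketched in plain Python. This is not the Kubeflow API, just an illustration of comparing experiment runs; the hyperparameter names and metric values are made up:

```python
# Hypothetical experiment records, similar in spirit to what the
# Kubeflow Pipelines run overview lists for each run.
runs = [
    {"learning_rate": 0.1,  "max_depth": 4, "val_auc": 0.81},
    {"learning_rate": 0.01, "max_depth": 6, "val_auc": 0.87},
    {"learning_rate": 0.01, "max_depth": 8, "val_auc": 0.84},
]

# Home in on the run whose validation metric is best.
best = max(runs, key=lambda run: run["val_auc"])
print(best["learning_rate"], best["max_depth"])  # → 0.01 6
```

In practice the run overview does this comparison visually, sorting runs by their recorded metrics so the winning hyperparameter set stands out at a glance.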