Much of the material on machine learning has focused on building models, and on the different layer types you can use to build models for particular solutions, such as convolutional neural networks for image recognition or recurrent ones for natural language processing. But ultimately, in creating a solution with ML, building models is just a small part of what you need to do. Production systems require a lot more, from data management through monitoring, analysis, process management, and beyond. Those topics are beyond the scope of this course, but the big thing I want you to understand is the serving infrastructure, which you can see here. We'll focus on that this week, and on how to get it to work for your models.

The typical pipeline for creating a model looks like this. You start with data ingestion, go to data validation, then data transformation, before training the model itself. Once training is done, you can analyze your model, and if you're happy with it, put it into production. But then what happens? Maybe you use TensorFlow Lite to deploy the model to a mobile device so your users can start running inference on it right away. We have a course for this, where you can learn how to convert your model to the TensorFlow Lite format and then deploy it to Android, iOS, or embedded systems. Or you could convert the model into a JavaScript representation and deploy it straight to the browser, where JavaScript code can be used to run inference on it.

But in both of these cases, the model was deployed to a remote device, be it a native mobile device or a browser running on mobile or desktop. A better option might be to have a centralized model on a server that desktops, mobile devices, and more can make requests to. The server would then execute the inference for you and return the predictions, and the results can be rendered on the device that made the call.

There is another distinct advantage of an architecture like this, and it's perhaps best illustrated with an example. Maybe you have three clients. These could be mobile devices or browsers to which you've deployed a model. You've created model version one, and they all have access to it. But then you update your model, and you use the usual techniques to update your clients. It might be an app update in the Play Store for Android, for example, but not all of your clients will get the new model right away, so they could have a different experience. Over time, other clients may get the update, but even then not everybody will have it immediately, and it can take a lot of time and management before everyone has the same model and, with it, the same experience. But if the model isn't on the clients and is instead centralized on a server, everybody gets the same inference experience, and that continues even after the model is updated to a new version.

Another nice thing is that when you centralize like this, the serving environment can be scaled based on the demand for your service, with additional hardware or with additional capacity, by having multiple serving processes supported by some form of load balancing. This is ideal in cloud-based environments, where serving processes can be assigned dynamically and you only pay for what you use. To achieve this with TensorFlow, you can use the TensorFlow Serving APIs.
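To make that centralized setup concrete, here is a minimal sketch of what a client request to a model hosted by TensorFlow Serving could look like over its REST API. The host, port, model name, and input values below are illustrative placeholders, not part of any setup from this course.

```python
import json
import requests

# Hypothetical endpoint: TensorFlow Serving exposes a REST API
# (port 8501 by default), and "my_model" is a placeholder model name.
url = "http://localhost:8501/v1/models/my_model:predict"

# One example instance to run inference on; the shape and values
# are purely illustrative and depend on your model's inputs.
payload = {"instances": [[1.0, 2.0, 3.0, 4.0]]}

# The server runs the inference and returns the predictions as JSON.
response = requests.post(url, data=json.dumps(payload))
predictions = response.json()["predictions"]
print(predictions)
```

Whether the client is a mobile app, a browser, or a desktop program, it only needs to make an HTTP call like this; the inference itself runs on the server, which is what makes the centralized architecture work.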
These serving APIs are part of TFX, and like I said before, we won't be going over all of TFX in this course, as there's a lot there, but we will focus on serving. So of course, in order to use it, you need to install it.
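Once TensorFlow Serving is installed, it also needs a model to serve, and it loads models in the SavedModel format from a version-numbered directory. As a rough sketch, assuming a tiny placeholder Keras model and an illustrative export path, saving a model for serving might look like this:

```python
import tensorflow as tf

# A tiny placeholder model; in practice this would be your trained model.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(1),
])

# TensorFlow Serving loads SavedModels from a versioned directory:
# the final path component ("1" here) is the model version number.
export_path = "/tmp/my_model/1"  # illustrative path and version
tf.saved_model.save(model, export_path)
```

That version number in the path is also what lets the server pick up new versions as you export them, which is how the centralized update story described above plays out in practice.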