In this module, we will be discussing TFX pipeline orchestration and workflows on Google Cloud. TFX orchestrators are responsible for scheduling TFX pipeline components sequentially, based on a directed graph of artifact dependencies. Let's dive in to learn more. First, it is necessary to revisit the motivation for orchestrating TFX pipelines. Why orchestrate your ML workflows? Orchestration is about bringing standardization and software engineering best practices to machine learning workflows, so you can spend more time focusing on solving your machine learning problem and have the details of your computing environment abstracted away. Without standardized machine learning pipelines, data science and machine learning engineering teams face unique project setups, arbitrary log file locations, and unique debugging steps that quickly accumulate costly technical debt. Standardizing your machine learning pipelines and project setups with versioning, logging, and monitoring enables code sharing and reuse, and allows your pipeline to run portably across multiple environments. Production machine learning is ultimately a team sport. Standardization allows machine learning teams to collaborate more easily. Experienced team members can focus on the problem they're applying machine learning to, while having the tools to manage and monitor their pipelines. New team members familiar with TFX pipelines and components also have an easier time ramping up on projects and making effective contributions. Orchestrating machine learning workflows in a standardized way is a key technique that Google has applied to industrialize machine learning across Alphabet. In the TFX pipeline development workflow, experimentation typically begins in a Jupyter notebook. You can build your pipeline iteratively using an interactive context object, as shown, to handle component execution and artifact visualization in embedded HTML windows within the notebook.
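To make the scheduling idea concrete, here is a minimal sketch, not TFX code, of how an orchestrator can order components from a directed graph of artifact dependencies using a topological sort. The component names and the dependency graph are hypothetical stand-ins for a typical TFX pipeline.

```python
# Minimal sketch (not the TFX API): an orchestrator schedules components
# in dependency order by topologically sorting the artifact graph.
from graphlib import TopologicalSorter

# Hypothetical graph: each component maps to the set of components whose
# output artifacts it consumes.
pipeline_dag = {
    "ExampleGen": set(),
    "StatisticsGen": {"ExampleGen"},
    "SchemaGen": {"StatisticsGen"},
    "Trainer": {"ExampleGen", "SchemaGen"},
}

# static_order() yields a valid execution sequence: every component appears
# only after all of its upstream dependencies.
execution_order = list(TopologicalSorter(pipeline_dag).static_order())
print(execution_order)
```

Any real orchestrator adds scheduling, retries, and distributed execution on top, but the dependency-ordering core is the same.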
The interactive context object also sets up an ephemeral in-memory ML Metadata store using SQLite, and automatically stores and organizes pipeline artifacts on the local file system. Using the interactive context's export to pipeline method, you can also export your local TFX pipeline to a production-ready orchestrator such as Apache Beam with minimal code changes. Although notebooks are great for experimentation and interactive execution, they do not provide the level of automation for continuous training and computation necessary for production machine learning. Different orchestrators are needed to serve as an abstraction that sits over the computing environment that supports your pipeline, scaling with your data. For production, TFX pipelines are portable across orchestrators, which means you can run your pipeline on premises or on a cloud provider such as Google Cloud. TFX supports orchestrators such as Apache Airflow, Kubeflow Pipelines, and Apache Beam. TFX uses the term DAG runner to refer to an implementation that supports an orchestrator. Notice that no matter which orchestrator you choose, TFX produces the same standardized pipeline directed acyclic graph. Let's examine each of these orchestrators individually, along with the use cases for each. First, TFX can use the Apache Beam direct runner to orchestrate and execute the pipeline DAG. The Beam direct runner can be used for local debugging without incurring the extra Airflow or Kubeflow dependencies, which simplifies system configuration and pipeline debugging. In fact, using the Beam direct runner is a great option for extending TFX notebook-based prototyping. You can package your pipeline defined in a notebook into a pipeline.py file using the interactive context's export to pipeline method. You can then execute that file locally using the direct runner to debug and validate your pipeline before scaling your pipeline's data processing on a production orchestrator on Google Cloud.
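The DAG runner idea above can be sketched in plain Python. This is an illustration of the pattern, not the TFX API: one pipeline definition, and interchangeable runner classes behind a common interface. All class and function names here are hypothetical.

```python
# Illustrative sketch of the "DAG runner" pattern (not TFX classes):
# the same pipeline definition can be handed to different runners.
from typing import Callable, List

Component = Callable[[], None]

class DagRunner:
    """Common interface that every runner implements."""
    def run(self, components: List[Component]) -> None:
        raise NotImplementedError

class DirectDagRunner(DagRunner):
    """Executes components in-process, one after another (local debugging)."""
    def run(self, components: List[Component]) -> None:
        for component in components:
            component()

class RemoteDagRunner(DagRunner):
    """Stand-in for a production runner that submits work to a cluster."""
    def __init__(self) -> None:
        self.submitted: List[str] = []
    def run(self, components: List[Component]) -> None:
        for component in components:
            self.submitted.append(component.__name__)  # record instead of executing

# Hypothetical no-op components standing in for ExampleGen, Trainer, etc.
def example_gen() -> None: pass
def trainer() -> None: pass

pipeline = [example_gen, trainer]   # same pipeline definition either way
DirectDagRunner().run(pipeline)     # run locally for debugging
remote = RemoteDagRunner()
remote.run(pipeline)                # "submit" to a production environment
print(remote.submitted)
```

The point of the pattern is that swapping environments changes only the runner you instantiate, not the pipeline definition itself.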
When your pipeline works, you can then run it using the Beam runner on a distributed compute environment, such as the cloud, to scale your pipeline's data processing up to your production needs. Second, TFX pipelines can run on top of the Kubeflow Pipelines orchestrator on Google Cloud for hosted and managed pipelines. Kubeflow is an open-source machine learning platform dedicated to making deployments of machine learning workflows on Kubernetes simple, portable, and scalable. Kubeflow Pipelines services on Kubernetes include a hosted ML Metadata store, a container-based orchestration engine, a notebook server, and a UI to help users develop, run, and manage complex machine learning pipelines, including TFX, at scale. From the UI, you can create or connect to an easily scalable Kubernetes cluster for your pipeline's compute and storage. KFP also allows you to take care of service account setup for secure access to Google Cloud services, like Cloud Storage for artifact storage and BigQuery for data processing. Third, TFX pipelines can be orchestrated by Apache Airflow, a platform to programmatically author, schedule, and monitor your workflows. Google Cloud has a fully managed version of Airflow called Cloud Composer that tightly integrates with other services that can power TFX, including BigQuery and Dataflow. The Airflow scheduler executes tasks on an array of workers while following the specified dependencies. Rich command-line utilities make complex graph construction and update operations on TFX pipeline DAGs easily accessible. The UI allows you to visualize TFX pipelines running in production, monitor job progress, and troubleshoot issues when needed. Airflow is the most mature orchestrator for TFX, with the flexibility to run your pipeline along with pre-pipeline data processing and post-pipeline model deployment and model prediction logic. Put another way, Airflow is a more general orchestrator for the entirety of your machine learning system.
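Scaling Beam-based data processing from local execution to the cloud is largely a matter of pipeline options. As a hedged sketch, the fragment below shows standard Apache Beam pipeline options that select the Dataflow runner; the project ID and bucket names are placeholders, and the exact options your pipeline needs may vary.

```python
# Sketch: Beam pipeline options that move the same pipeline's data processing
# from the local direct runner to Dataflow. Project and bucket are placeholders.
beam_pipeline_args = [
    "--runner=DataflowRunner",              # use Dataflow instead of the direct runner
    "--project=my-gcp-project",             # placeholder Google Cloud project ID
    "--region=us-central1",                 # Dataflow region
    "--temp_location=gs://my-bucket/tmp",   # placeholder Cloud Storage bucket
]
```

With the direct runner, none of these flags are needed; the pipeline simply executes in the local process.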
Finally, the TFX command-line utility for manual pipeline orchestration tasks is important to mention. The TFX command-line interface, also abbreviated as CLI, performs a full range of pipeline actions using pipeline orchestrators such as Airflow, Beam, and Kubeflow Pipelines. For example, you can use the CLI to create, update, and delete your pipelines; run a pipeline and monitor the run on various orchestrators; and copy over template pipelines to modify and launch in order to accelerate your pipeline development. Now that you have a clear idea of the orchestrator choices for your TFX pipelines, let's revisit Apache Beam in greater depth to see its data processing capabilities.