This is Module 4; we will talk about telemetry and observability with Istio, which is another problem that Istio is trying to solve. In this module, we'll cover telemetry collection, Stackdriver, and on-premise integrations. So we'll start with telemetry first. The word is derived, again, from Greek of course: tele, meaning remote, and metron, meaning measure. The idea behind telemetry is that we want to measure the performance of our applications. We want to know how long it took for a service, for instance, to reply; we want to know how much latency our network introduced. We want to see perhaps a full trace from one microservice to the other, maybe from the beginning of our mesh to the end of it. We want full observability, and the way we used to do that was with application instrumentation. You can see here, in the top right corner, that we had an application with binaries. The developers had to develop it themselves, and then they had to instrument it, meaning use libraries that allow us to collect telemetry. If you were to use any of these, you would have to bake them inside the binaries themselves, and that can create a bit of a problem. The first problem is that with a monolith it's fine, right? You have one application, and that one application can deliver all its telemetry from one place. But now you've broken down your application into multiple services in multiple places, and each and every one of them speaks a different language. So now all of your libraries should be at the same level for all of those languages, and that is really difficult when you have a polyglot environment; a polyglot environment is basically a fancy word for saying you have multiple programming languages in the same environment. That is a really hard problem to solve, and you need a lot of discipline to keep the same libraries speaking to the same backend, so it becomes a bit problematic.
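To make the instrumentation problem concrete, here is a minimal Python sketch of what baking telemetry into your own binaries looks like. The `METRICS` store, the `instrumented` decorator, and the `checkout.total_price` metric name are all hypothetical; in a real application this would be a vendor client library compiled into the binary, which is exactly what makes the polyglot case painful.

```python
import time
from functools import wraps

# Hypothetical in-process metrics store. In a real app this would be a
# telemetry client library (Prometheus, Stackdriver, ...) baked into the binary,
# and you would need an equivalent library in every language you run.
METRICS = {}

def instrumented(name):
    """Decorator that records how long each call takes, in milliseconds."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                elapsed_ms = (time.perf_counter() - start) * 1000
                METRICS.setdefault(name, []).append(elapsed_ms)
        return wrapper
    return decorator

@instrumented("checkout.total_price")
def total_price(items):
    # Business logic and telemetry are coupled in the same binary.
    return sum(items)
```

Every service, in every language, would need its own copy of this plumbing kept at the same version, which is the discipline problem the mesh removes.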
What we really want to do is decouple the operators from the developers. We want developers to develop their applications and the operators to do their own thing outside of the realm of the binaries. The solution to that, actually, is to use Istio, and because all of your network traffic goes through the Envoy proxies anyway, they can report back to us everything we want to know, all the telemetry information, and we can decouple it from the code. What's really, really cool about that is that once you remove all the code dependencies, you have a truly portable application. If I don't have any telemetry libraries and instrumentation in my application, it can go anywhere, really: it can go on-premise, it can be on the Cloud, I can migrate it tomorrow, and I don't have to update anything, right, because the mesh takes care of the telemetry bit. Now, what's really cool about the telemetry bit in Mixer is that it has adapters. Everybody perhaps uses different telemetry backends: you have maybe Prometheus, Stackdriver, or InfluxDB, and all of them basically have an adapter into Mixer. So Mixer is not really a telemetry backend itself; it's a messenger. It takes all the telemetry that the mesh collects and then translates it into the adapter of your choice, so it integrates with any backend that you currently have. Imagine, basically, that you can decouple all of the telemetry, all of the observability bit, from your code into the network, and the network basically reports everything to you. You have traces, you have telemetry, and all of those things are integrated into the system of your choice, whatever you use on-premise or on the Cloud. The telemetry you get with Istio is quite unique, because what you get is full observability. You have monitoring, which is alerting and dashboarding, but you can also now do debugging, profiling, tracing, and dependency analysis.
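The adapter idea can be sketched in a few lines of Python. This is purely illustrative and not the real Mixer API: a "mixer" object receives raw telemetry records from the mesh and fans them out to pluggable backend adapters, each of which translates the same record into its own backend's format. All class names, record fields, and metric names here are made up for the sketch.

```python
class PrometheusAdapter:
    """Hypothetical adapter: formats records as Prometheus-style text lines."""
    def __init__(self):
        self.lines = []

    def handle(self, record):
        labels = ",".join(f'{k}="{v}"' for k, v in sorted(record["labels"].items()))
        self.lines.append(f'{record["name"]}{{{labels}}} {record["value"]}')

class StackdriverAdapter:
    """Hypothetical adapter: keeps records as structured points."""
    def __init__(self):
        self.points = []

    def handle(self, record):
        self.points.append({"metric": record["name"], "value": record["value"]})

class Mixer:
    """Not a backend itself, just a messenger that dispatches to adapters."""
    def __init__(self, adapters):
        self.adapters = adapters

    def report(self, record):
        # Fan the same mesh telemetry out to every configured backend.
        for adapter in self.adapters:
            adapter.handle(record)
```

The point of the design is that the proxies and the mesh never need to know which backend you chose; swapping Prometheus for InfluxDB means swapping an adapter, not touching any service binary.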
You have an end-to-end view of all of your communication across the whole mesh, across different services. You can draw dependencies, for instance; there are tools today that will create a dependency tree based on the traffic that is routed in your service mesh. So you can see, for instance, if you want to release software, which dependencies you need to notify, which teams need to know that you are now updating your application, which services communicate with it. It's really powerful if you think about it. If you had to do that at the level of your code, of your binaries, it becomes very fragmented and very hard to manage. If you can have that functionality at the level of your network, at the level of your mesh, it basically solves a lot of problems, and it makes your application truly portable, because now your application focuses on its binaries and its business logic, and not on instrumenting telemetry. For observability, what you see here is that Mixer has an open API, of course, for a pluggable architecture. You will see that also with Pilot when we discuss it next. You have an open API, you have a pluggable infrastructure. So for instance, here you have the adapter for Prometheus, or Stackdriver, or whatever you want to use, and it will standardize the environment, collect all the information from the proxies, and deliver it into the backend for you, depending on whichever backend you want to use. Now, this is a bit of a break from Istio itself: I want to focus on the SREs, our version of DevOps inside of Google. They're basically a team that focuses on metrics, and the four golden signals they work from are latency, traffic, errors, and saturation. Latency is how much time it took my application to respond. If you remember the life of a packet: service A sends a message to service B, and then service B processes my request and sends it back.
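The dependency-tree idea can be illustrated with a short sketch. Assuming (hypothetically) that the proxies report each call as a `(source, destination)` pair, building the dependency graph and answering "who do I notify before updating this service?" is straightforward. The service names here are invented for the example.

```python
from collections import defaultdict

# Hypothetical observed mesh traffic: (source_service, destination_service)
# pairs as the proxies might report them.
observed_calls = [
    ("frontend", "cart"),
    ("frontend", "catalog"),
    ("cart", "payments"),
    ("frontend", "cart"),  # duplicates are fine; we only need the edge
]

def dependency_graph(calls):
    """Collapse raw traffic into a service dependency graph."""
    graph = defaultdict(set)
    for src, dst in calls:
        graph[src].add(dst)
    return graph

def who_calls(graph, service):
    """Which services (and therefore teams) to notify before updating `service`."""
    return sorted(src for src, dsts in graph.items() if service in dsts)
```

Because the graph is derived from actual routed traffic rather than from documentation, it stays accurate as the mesh evolves, which is what makes this hard to replicate at the level of individual binaries.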
Every step of the way, telemetry is being sent to Mixer. So now I can say, well, it took me about three milliseconds to move across the network to get from A to B. Then, once service B's proxy picks up my packet, it delivers it into the service, and the service took, let's say, about 20 milliseconds to process my request and send it back, and it took maybe three more milliseconds on the way back. Once you have that information, you can have a graph and you can find any bottlenecks that you have in your infrastructure, and when it grows really complex, it can really help you understand your infrastructure. So that is latency. Traffic is how many requests you have, so almost like requests per second: how many HTTP requests did you have, how many queries per second did you have for your database, etc. Errors are anything that is a 5xx response; this is another golden signal. Saturation is something that you define yourself. With saturation, you say, well, you know what, service B can handle up to 10,000 QPS, queries per second (or requests per second). So this is another measurement you can track once you define the capacity of your service, and as long as you keep an eye on all of these, mostly you are covered when it comes to a service; that is the general idea. These are the four golden signals, and three of them you get out of the box: with Istio you get latency, traffic, and errors across your whole service mesh. Saturation, though, is something for which you have to define the limit yourself, and once you have a dashboarding mechanism, you can define what saturation means for you.
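The four golden signals can be sketched as a small computation over per-request records like the ones the mesh reports. This is an illustration, not Istio's actual implementation: the request data, the two-second window, and the 10,000 QPS saturation limit are all assumptions, and the saturation limit in particular is the one number you must define yourself.

```python
# Hypothetical per-request records from the mesh: (status_code, duration_ms).
requests = [(200, 23.0), (200, 26.0), (503, 5.0), (200, 24.0)]
WINDOW_SECONDS = 2.0
SATURATION_LIMIT_QPS = 10_000  # capacity we defined for the service ourselves

def golden_signals(records, window_s, limit_qps):
    durations = sorted(d for _, d in records)
    traffic_qps = len(records) / window_s
    return {
        # Latency: how long requests took (median here; p99 in practice).
        "latency_ms_p50": durations[len(durations) // 2],
        # Traffic: requests per second over the observation window.
        "traffic_qps": traffic_qps,
        # Errors: fraction of responses that were 5xx.
        "error_rate": sum(1 for s, _ in records if 500 <= s < 600) / len(records),
        # Saturation: how close we are to the capacity WE defined.
        "saturation": traffic_qps / limit_qps,
    }
```

Latency, traffic, and errors fall straight out of the mesh data; saturation only exists once you supply the limit, which matches the "three out of the box, one you define" split described above.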