After watching this video, you will be able to recognize how Site Reliability Engineering differs from DevOps, recognize what Site Reliability Engineering and DevOps have in common, and explain how Site Reliability Engineering and DevOps can be used together.

You might be wondering how DevOps compares with Site Reliability Engineering (SRE). Before we explore this topic, we should describe what SRE is, how it differs from DevOps, and how you can leverage SRE in a DevOps environment.

According to Benjamin Treynor Sloss, SRE is "…what happens when a software engineer is tasked with what used to be called operations." Most system administrators are happy to do the same manual tasks day after day, perhaps because they feel that doing those tasks is their job. But if you ask a software engineer to build a server, they will probably do it manually the first time. If a few days later you ask them to build another server just like the first one, they might do that one manually as well. But by the time you ask for a third server, a software engineer is going to start writing a program that builds servers automatically. That's just how software engineers think: they're programmers, so they write programs.

The goal of site reliability engineers is to automate themselves out of a job. Of course, that will never happen, because there is always more to automate. One of the tenets of SRE is to hire only software engineers. You want people who know how to write code so that they can automate repetitive tasks, for example by using Infrastructure as Code.

Site reliability engineers focus on reducing toil, that is, repetitive, manual work. It is recommended that they spend about 50% of their time reducing toil through automation. The idea is that anything you do repeatedly should be automated; you shouldn't be doing the same manual task day after day.

SRE teams are separate from development teams. This is a big difference between DevOps and SRE.
DevOps is the recognition that working in separate, siloed teams is inefficient. SRE, on the other hand, keeps those silos in place: the development team is separate and distinct from the operations team.

In SRE, stability is controlled through something known as error budgets. Developers are allowed to deploy their applications to production as long as they don't cause too many production outages. The upper limit on the outages that errors are allowed to cause is the error budget. For example, say you've got a service-level objective (SLO) of 99.9% uptime. That equates to about 44 minutes of downtime per month. As long as outages stay below 44 minutes per month, the developers are free to keep deploying their releases to production. Once the developers have caused enough outages to exceed their error budget, they're no longer allowed to deploy to production. This actually works pretty well. It solves the problem of developers waiting for operations, yet it still gives operations control over the stability of the production environment.

One last thing about SRE is that developers spend about 5% of their time rotating through the operations team so that they understand what the SRE team does on a daily basis. Also, if developers cause too many outages, or toil exceeds 50% of the site reliability engineers' time, more developers are shifted to operations to help bring things back into balance.

There is a big difference in teaming between SRE and DevOps. As we've learned, SRE maintains separate development and operations teams, but it has a single staffing pool. That means if you need another site reliability engineer, you take away one of the developers, and if you want another developer, you take away one of the site reliability engineers. This is an effort to keep the two sides in balance. DevOps, on the other hand, breaks down the silos into one team with one common business objective: to deploy software to production quickly and safely.
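The error-budget arithmetic and the rebalancing rules described above can be sketched in a few lines of Python. This is a minimal, hypothetical illustration rather than a standard SRE tool: the function names error_budget_minutes, can_deploy, and needs_rebalancing are made up for this example, a month is assumed to be an average of 365.25 / 12 days, downtime is assumed to be measured in minutes of full outage, and the 50% toil cap is the rule of thumb from this video. Under those assumptions, a 99.9% objective works out to roughly 43.8 minutes of downtime per month.

```python
"""Hypothetical sketch of the SRE error-budget and toil policy (illustration only)."""

# Assumption: an average month of 365.25 / 12 days, expressed in minutes (43,830).
MINUTES_PER_MONTH = 365.25 / 12 * 24 * 60

# Assumption: rule of thumb that toil should not exceed half of an SRE's time.
TOIL_CAP = 0.5


def error_budget_minutes(slo: float, period_minutes: float = MINUTES_PER_MONTH) -> float:
    """Return the downtime (in minutes) that an availability SLO allows per period."""
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be a fraction between 0 and 1, e.g. 0.999")
    return (1.0 - slo) * period_minutes


def can_deploy(downtime_minutes: float, budget_minutes: float) -> bool:
    """Developers may keep deploying only while downtime stays below the budget."""
    return downtime_minutes < budget_minutes


def needs_rebalancing(
    downtime_minutes: float,
    budget_minutes: float,
    toil_hours: float,
    total_hours: float,
    toil_cap: float = TOIL_CAP,
) -> bool:
    """Return True when developers should be shifted to operations.

    Triggered when the error budget is spent or when toil exceeds the cap.
    """
    if total_hours <= 0:
        raise ValueError("total_hours must be positive")
    budget_exhausted = not can_deploy(downtime_minutes, budget_minutes)
    toil_fraction = toil_hours / total_hours
    return budget_exhausted or toil_fraction > toil_cap


if __name__ == "__main__":
    budget = error_budget_minutes(0.999)
    # Prints: 99.9% SLO allows 43.8 minutes of downtime per month
    print(f"99.9% SLO allows {budget:.1f} minutes of downtime per month")
    print(can_deploy(downtime_minutes=12.0, budget_minutes=budget))  # True
    print(can_deploy(downtime_minutes=50.0, budget_minutes=budget))  # False
    # 90 of 160 hours is about 56% toil, which is over the 50% cap.
    print(needs_rebalancing(12.0, budget, toil_hours=90, total_hours=160))  # True
```

Using a strict less-than comparison in can_deploy means the budget counts as spent the moment downtime reaches it. A real policy would likely also need to decide how to count partial outages and whether to use calendar months or a rolling window.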
The other big difference between DevOps and SRE is how they maintain production stability. As we said, SRE uses error budgets, based on service-level objectives, that development has to comply with. When developers exceed the error budget, making production unstable, they can no longer deploy to production. In contrast, DevOps maintains stability through automation in Continuous Delivery pipelines and by making sure that everyone is responsible for the code that runs in production. DevOps has a "you build it, you run it" mantra: unlike in SRE, developers are responsible for their applications in production.

There is commonality between the two practices. Both seek to make development and operations visible to each other. Whether developers rotate through operations, as in SRE, or development and operations sit on the same team, as in DevOps, everyone understands what it takes to keep production stable. Both also require a blameless culture. No one comes to work wanting to take down production; it's usually the system that fails the people, not the other way around. So a blameless culture is important in both practices, because it lets people speak openly and honestly about how things are going and how to improve them. The objective of both is the same: to deploy software faster while keeping it stable. So DevOps and SRE do have common goals; they just achieve them in completely different ways.

When we look at how DevOps and SRE can complement each other and be used together, I like to think of SRE as the team that maintains the infrastructure and DevOps as the team that uses the infrastructure to maintain its applications. In a cloud environment, SRE includes the people who operate the cloud, and DevOps includes the people who consume the cloud. This is why things like platform as a service (PaaS) are so important to DevOps: the SRE teams provide a platform, and the DevOps teams use that platform to deploy their applications.
In this video, you learned that SRE takes a different approach than DevOps, that SRE and DevOps share some common goals, and that SRE and DevOps can be used together to both maintain and use computer infrastructure.