[MUSIC] Hi. So today we're going to look at our second case study on a multi-tenant cloud network virtualization system, which is NVP, the Network Virtualization Platform introduced in the paper "Network Virtualization in Multi-tenant Datacenters" by Teemu Koponen and co-authors at NSDI 2014. This comes out of a product developed by the Nicira startup, which was acquired by VMware.

All right, last time we saw that what VL2 was trying to do was build a virtual Layer 2 network, and we're moving in the same direction now. But a couple of the architectural ideas in NVP are going to be pretty different. One of them is that the service we provide is going to be not just a Layer 2 network, but an arbitrary network topology. Many of those topologies will simply be a single Layer 2 switch connecting all of the virtual machines for that particular tenant, but sometimes you want something more complex, for example some Layer 2 switching connected to Layer 3 routing, because maybe the application expected to be deployed in an environment where there was one Layer 2 broadcast domain and a separate Layer 2 broadcast domain that weren't directly connected. So the idea is to replicate that arbitrary network environment a particular application or tenant was expecting, which allows us to move that application into the cloud.

So what do we need from the physical network? Well, the physical network might be a Clos network of the kind we've come to know and love, but NVP is not going to depend on what that network looks like. It just expects to see a standard Layer 3 network: a bunch of servers with IP addresses. We're not going to rely on special mechanisms in the switches, like decapsulation at particular top-of-rack switches or the use of anycast. We just see a regular Layer 3 network, and the virtualization happens in software, outside of it.

Now, the service and the physical network are brought together by this network hypervisor layer. The intuition behind the architecture on this slide is an analogy to server virtualization, where we can take an arbitrary virtual machine and run it on unmodified hardware, and the virtual machine doesn't necessarily even know where it's running; the virtualization layer decouples the virtual and physical environments. What NVP is doing is, much more explicitly, building a network virtualization layer that separates an arbitrary network service from an arbitrary physical network.

What is the service it provides? Let's look in a little more detail. We might have this virtual network that the tenant wants to build, with a Layer 2 switch connected to a Layer 3 router, connected to another Layer 2 switch. What does that mean exactly? In NVP, that is modeled as a sequence of data path elements representing those switches, and each of these data path elements is an OpenFlow forwarding table. That means the table will match on certain packet header patterns and take certain resulting actions, like dropping the packet, modifying certain fields in the packet, or forwarding the packet on. So the idea is that we can model the switching and routing gear with the right sequences of OpenFlow tables that we set up.
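To make that match-action idea concrete, here's a minimal Python sketch of a flow table that matches on header fields and returns a list of actions. This is only an illustration of the concept, not NVP's or Open vSwitch's actual data structures; all class names, field names, and the table-miss behavior are hypothetical.

```python
# Toy match-action table in the spirit of an OpenFlow forwarding table.
class FlowEntry:
    def __init__(self, match, actions, priority=0):
        self.match = match        # dict: header field -> required value
        self.actions = actions    # list of (action, argument) tuples
        self.priority = priority

class FlowTable:
    def __init__(self, entries):
        # Check higher-priority entries first, as OpenFlow does.
        self.entries = sorted(entries, key=lambda e: -e.priority)

    def lookup(self, packet):
        """Return the actions of the first entry whose match fields all
        equal the packet's header fields; drop on a table miss (assumed)."""
        for entry in self.entries:
            if all(packet.get(f) == v for f, v in entry.match.items()):
                return entry.actions
        return [("drop", None)]

# Example: a logical L2 forwarding table that sends one known MAC out a
# specific logical port and floods everything else.
l2_table = FlowTable([
    FlowEntry({"eth_dst": "00:00:00:00:00:02"}, [("output", "lport2")], priority=10),
    FlowEntry({}, [("flood", None)], priority=0),
])

packet = {"eth_src": "00:00:00:00:00:01", "eth_dst": "00:00:00:00:00:02"}
print(l2_table.lookup(packet))   # -> [('output', 'lport2')]
```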
So walking through it: we have a virtual machine, and it's going to send packets through what it thinks is its NIC. Those packets go to the network virtualization layer, which realizes that this particular tenant virtual machine is connected to this particular virtual network, and that the first hop into that virtual network is this Layer 2 switch, represented by this logical data path. The packet is going to make its way through each of the tables: it's checked by the inbound ACL (access control list) table, then handed off to the Layer 2 forwarding table, then the outbound ACL, and then sent out of this virtual network element, the Layer 2 switch. The network hypervisor then realizes that, all right, that network element is connected to this logical Layer 3 router. The packet again proceeds through those hops, is again mapped by the network hypervisor into the last virtual switch, and there it reaches the end.

Now at this point, having gone through this software data plane, we have determined the output location of this particular packet in the virtual network. We know which virtual port it's going out of, in other words, which virtual machine it's destined for. And at that point, with all of this still occurring on the sender-side host, we tunnel it to the final destination across the underlying Layer 3 network fabric.

Okay, so that walkthrough is the network service. We're providing, a little more explicitly, two interfaces here. There's a packet abstraction, where virtual machines are able to inject traffic into the virtual network. And there's a control abstraction, where the tenant is able to define that entire virtual network pipeline, that sequence of OpenFlow flow tables. That's the interface, at least the lowest-level interface, that the tenant is given to program their virtual network.

Okay, so how does this get built into the overall system? We start with our underlying physical fabric, and what NVP sees is a bunch of hosts with IP addresses connected to that fabric. Now, some of those servers are going to be running a particular tenant's virtual machines, and let's say that tenant wants this particular control abstraction, this particular virtual network: it decides to construct this Layer 2 switch, Layer 3 router, and another Layer 2 switch. Using APIs at the network hypervisor's controllers, it's going to say: here's the service, here's the virtual network, I want to define. The controller will then instantiate that by programming the underlying software switching data plane, which sits in the virtualization layer of each of the servers on which the tenant's VMs reside. That is the virtual switch; the actual program is called Open vSwitch. And notice that an instance of the virtual network is present at each one of those servers where the tenant's VMs reside. So when a VM injects packets into the virtual network, the local copy at that server processes the packet through the pipeline until we know the destination, and then the packet gets tunneled to the final destination over tunnels that we set up between each of these VM pairs. All right, so that's how the system comes together.
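Before we get to the performance challenges, here's a minimal sketch, in Python, of that sender-side behavior: the packet traverses the locally instantiated logical pipeline until its logical output port is known, and is then tunneled to the host where the destination VM lives. For brevity the sketch uses a single logical L2 switch rather than the full switch-router-switch topology, and all stage names, port names, and addresses are made up for illustration; this is not NVP's actual code.

```python
# Sender-side processing sketch: logical pipeline, then tunnel to the
# destination hypervisor over the physical L3 fabric.

def inbound_acl(packet):
    # Toy ACL: only allow IPv4 traffic in this example.
    return packet if packet.get("ethertype") == 0x0800 else None

def l2_forward(packet):
    # Logical L2 switch: map destination MAC to a logical output port.
    mac_table = {"00:00:00:00:00:02": "lport2"}
    packet["logical_out_port"] = mac_table.get(packet["eth_dst"], "flood")
    return packet

def outbound_acl(packet):
    return packet  # allow everything in this sketch

LOGICAL_SWITCH_A = [inbound_acl, l2_forward, outbound_acl]

# Where does each logical port's VM physically live?  This mapping is what
# lets the sender pick the tunnel endpoint (hypothetical addresses).
LPORT_TO_HOST = {"lport2": "10.0.7.12"}

def process_on_sender_host(packet):
    for stage in LOGICAL_SWITCH_A:
        packet = stage(packet)
        if packet is None:
            return "dropped"
    dest_host = LPORT_TO_HOST.get(packet["logical_out_port"])
    # In the real system this would be an encapsulation tunnel (e.g. STT).
    return f"tunnel packet to {dest_host}"

pkt = {"ethertype": 0x0800, "eth_dst": "00:00:00:00:00:02"}
print(process_on_sender_host(pkt))   # -> tunnel packet to 10.0.7.12
```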
Now, building this system did bring several performance challenges, and some of them are on the controller side, because there's a lot of state to compute. We are ultimately computing a full virtual network state, a full instance of the virtual network, at every host with one of the tenant's VMs, so that host can locally do the virtual network processing and decide the ultimate destination of the packet. We also have to set up tunnels connecting the VMs, so if a particular tenant has n VMs, each one has to connect with the n minus 1 others, which means on the order of n squared tunnels.

One part of the solution NVP adopted is that this total state is not recomputed from scratch every time; there will be changes, but you want to make them incrementally. And that's done automatically, not by hand with a hand-constructed state machine, but with a declarative language that NVP developed, called nlog, to specify the data path and automatically make incremental adaptations. A second part of the solution to the performance challenge is that the controller does not compute the low-level forwarding tables of every instance of the virtual network. What it does instead is compute a higher-level representation called universal flows, a single version that is good enough for any one of the hypervisors hosting that tenant's VMs. That is then sent out to physical controllers, which more locally take that universal flow abstraction and translate it down to the exact OpenFlow forwarding table entries at each particular host.

After we solve the performance challenges at the controller, there are also performance challenges at each host. First, we're processing this data path pipeline entirely in software, in the software virtual switch, and there can be many elements in that pipeline. That can be slow. So one solution here is that NVP sends the first packet of each flow through the entire pipeline and does the full computation, but subsequent packets of the same flow, which will be handled the same way, can be cached, so to speak, in the kernel of the operating system. And that caching is done without the complex matching operations that appear in OpenFlow: once you know what should happen to that particular packet, you can cache it as an exact match, where an exact match on certain packet header fields produces this result.

A second challenge here is that tunneling interferes with TCP segmentation offload. Recall that TSO allows the operating system to offload the work of splitting a data stream into smaller segments to be sent over the wire with TCP, along with some of the checksumming and so on, handing some of that TCP dirty work to the hardware NIC, the network interface card. If we're tunneling the packet, slapping an additional IP header on the front, the NIC doesn't see the TCP header where it expects it to be. So the solution here is an approach called STT, where we add a fake outer TCP header that is there just for the convenience of the NIC.
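Here's a minimal sketch of that exact-match caching idea: the first packet of a flow takes the slow path through the full pipeline, and the resulting action is stored under an exact key built from its header fields, so later packets of the same flow skip the pipeline. This is just an illustration of the concept; the field choices, function names, and the stand-in slow path are hypothetical, not how the Open vSwitch kernel cache is actually implemented.

```python
# Exact-match flow cache sketch: slow path once per flow, fast path after.
flow_cache = {}  # exact header tuple -> cached action

def cache_key(packet):
    # Exact values for a fixed set of header fields -- no wildcards.
    return (packet["eth_src"], packet["eth_dst"],
            packet["ip_src"], packet["ip_dst"],
            packet["tp_src"], packet["tp_dst"])

def slow_path(packet):
    # Stand-in for the full logical pipeline traversal in user space.
    return ("tunnel", "10.0.7.12")

def fast_path(packet):
    key = cache_key(packet)
    action = flow_cache.get(key)
    if action is None:               # cache miss: first packet of the flow
        action = slow_path(packet)
        flow_cache[key] = action     # subsequent packets hit the cache
    return action

pkt = {"eth_src": "a", "eth_dst": "b", "ip_src": "10.0.0.1",
       "ip_dst": "10.0.0.2", "tp_src": 12345, "tp_dst": 80}
print(fast_path(pkt))   # computed via the slow path
print(fast_path(pkt))   # served from the exact-match cache
```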
What we saw this week was one of the first killer apps for software-defined networking: we needed to automate control of a rapidly changing, dynamic environment, and no past solution was well suited to that. And we can see in NVP, I think very strongly, the features of the SDN architecture. We have an API to the data plane: we're programming Open vSwitch with the OpenFlow protocol. We have a centralized controller that's orchestrating that programming. And then we have high-level control abstractions, both in terms of the declarative language and also applications that construct certain kinds of virtual networks, so that tenants don't have to program OpenFlow themselves.

So we see the data plane API, the centralized controller, and the high-level control abstractions; those SDN principles come through very strongly. But next week we're going to see another example of a system that is also very much SDN, yet ends up looking very different. Rather than providing arbitrary virtual networks as a service, it's going to be oriented towards optimizing network performance and utilization. And rather than programming software switches entirely at the edge of the network, we're going to have more direct control of the forwarding hardware. We'll see that as we move on next week into the wide-area needs of cloud networking. [MUSIC]