[MUSIC] >> Hi everybody, today we are here with Martin Casado. Martin runs the networking and security business unit at VMware, and before that he was a co-founder of Nicira. And before that, a PhD student with Nick McKeown at Stanford University. And we're going to be talking about multi-tenant data centers and software defined networking today. So, hello Martin. >> Hello. >> Thanks a lot for taking the time to chat. >> Yeah, delighted to be here. >> So, let's start with maybe a little background about yourself. How you got into SDN, the genesis of SDN, since you were one of the ones there at the very beginning. >> So, I guess the quick sketch is, after my undergrad I worked for the defense agencies. Actually, I worked for the DOE, at the national laboratories, and then I moved into the intelligence agencies, so this is after 9/11. And it was pretty clear at the time that market forces didn't really create technologies that were suitable for these environments, right? You were assuming that you were being attacked by nation states, with the resources of nation states. And so, on the compute side we were able to program operating systems, or whatever, in order to be more fit for this type of environment. But when it came to the network, we just didn't have a model for changing it. Basically, whatever you got from a vendor was what you could use. And so when I left that, and I went to Stanford, we just looked at this problem of, okay, how do you change the networking architecture to make it more programmable? And so we did some early work, you know, SANE and Ethane, and then that led to OpenFlow, and here we are. So that's kind of the quick crooked path. >> So what would be an example of what you want to program, really quickly? Because I mean, if you think about the way the network was originally envisioned, there wasn't much to do, right? You put in your packets and then they get delivered and that's the end of the story. >> Yeah, so the original motivation was actually just saying, I want some high level policy that's going to dictate how things can communicate on the network. So take the context of the intelligence environment that I worked in. There were these guys that were not technologists who had come up with policies on how they wanted the network to work. They're like, well, this group of people can or cannot access this type of information from certain locations at certain times of day, or whatever. And so there's these policies from the security wonks, and the lawyers, and the regulators. And these policies were written basically in, say, email documents or Word documents. And the trick was to get the infrastructure to obey these, right? Now, if you gave a systems person this problem — I have some way that I, as a human, can express how I want an operating system to behave — what they would probably do is create a domain specific language. They'd create a compiler for that language. They'd write declarations for what they wanted to do in that language, and then they would implement a runtime. And so when we were looking at this problem we're like, we know how we want the network to operate, right? We know who we want to be able to communicate, and when, and how. But there was no way to basically create a high level language and then compile that down so it was enacted throughout the entire network.
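To make that idea concrete, here is a minimal, hypothetical sketch of what such a compilation step could look like: a declarative policy over user groups, data, and time of day is translated into low-level allow rules that the network could enforce. The names and bindings (Policy, compile_policy, the example groups and subnets) are illustrative assumptions, not the actual SANE or Ethane language.

    # Hypothetical policy "compiler": high-level declarations in,
    # low-level network rules out. Illustrative only.
    from dataclasses import dataclass

    @dataclass
    class Policy:
        group: str            # who (a named group of users)
        resource: str         # what (a named class of data/servers)
        allowed_hours: range  # when (hours of day access is permitted)

    # Bindings from high-level names to network-level identifiers; in a real
    # system these would come from authentication and an inventory service.
    GROUPS = {"analysts": ["10.1.0.0/16"], "contractors": ["10.2.0.0/16"]}
    RESOURCES = {"finance-db": [("10.9.0.5", 5432)]}

    def compile_policy(policies):
        """Compile declarative policies into low-level allow rules."""
        rules = []
        for p in policies:
            for src in GROUPS[p.group]:
                for dst_ip, dst_port in RESOURCES[p.resource]:
                    rules.append({
                        "src_prefix": src,
                        "dst_ip": dst_ip,
                        "dst_port": dst_port,
                        "hours": (p.allowed_hours.start, p.allowed_hours.stop),
                        "action": "allow",
                    })
        return rules  # default-deny: anything not emitted here is dropped

    if __name__ == "__main__":
        policy = [Policy("analysts", "finance-db", range(8, 18))]
        for rule in compile_policy(policy):
            print(rule)

In SANE and Ethane the namespace and the enforcement path were far richer than this; the sketch only shows the shape of the pipeline being described: declarations in, low-level network state out.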
And in fact, that was what we first built: a high level language with a namespace that includes users, that includes data, that includes things like that, and then is enforced throughout the network. >> And that was the Ethane work that you're referring to? >> Yeah, it started as SANE and then that went to Ethane, right? So SANE was kind of a first go, and we're like, all we care about is security. [LAUGH] So it was the most paranoid implementation of a network architecture ever. Even paths were these onion-encrypted source routes, so even if you got a hold of one of these things you didn't know where the packet would go. So we were even hiding topology information. And then Ethane was a much more practical, kind of more backwards compatible go at it, which was focused more on manageability than real paranoid security. >> Mm-hm. So that was the beginnings. And then, how did the work change while you were doing your PhD and then leading into the startup? >> Yeah, so we built the system Ethane. We actually ran it in the Gates Building, which is the computer science building at Stanford. And so we'd kind of rewired the entire Gates Building to run across our switches. And writing that high level policy, we took the network down like three or four times. [LAUGH] Because it turns out it's actually really hard to write policies that support the types of communication mechanisms that happen in normal networks. And I mean, networks are super chatty, they rely on things like broadcast for discovery and so forth. So it took us a while to figure it out, but we got that working. And at that time we were mostly just focused on, how do you have high level languages drive networking? And we're like, you know, this is actually a fairly general way of building networking or doing networking. Which is, you decouple the control plane, which has been suggested many times before, but now you can turn it into a distributed systems problem. So, when you're managing state on the data path, you can have stronger consistency guarantees, and you can use higher level languages to do that. And so then we're like, there's two things you need. You need the ability to talk to switches in a very general manner. So then we created OpenFlow. >> Mm-hm. >> And the software platform, which before was basically a language compiler, you can generalize into something that you can run multiple applications on top of. Then we created NOX, which was kind of our first go at a controller. And the idea is, you expose data structures like a graph of the network, and you have high level language bindings, so that you could run multiple applications that would control routing, or control security, or do whatever folks wanted to. So that was kind of the next step. >> Mm-hm, got it. And so then, that work obviously had a lot of impact. And you felt there was some opportunity to have some commercial impact as well. >> Yeah, that's right. You know, what's so interesting about this whole journey is, I think at every step, we learned a lot more. And so at first we were focused on the protocol, the switch protocol, and we were like, well, whatever. Whether it's OpenFlow or something else, it doesn't really matter. You just need to have some guarantees on the protocol. You need to make sure that you can keep state consistent. You need to make sure that you can efficiently track changes. Okay, but whatever the wire protocol is doesn't matter as long as everybody implements it.
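As a rough illustration of that controller model, here is a minimal sketch of an application that consumes a graph of the network and emits per-switch match/action entries along a shortest path. The topology, host locations, and emitted rule format are assumptions for illustration, not the real NOX or OpenFlow APIs.

    # Pseudo-controller sketch: the platform exposes the network graph,
    # the application decides what forwarding state to install.
    from collections import deque

    # Adjacency list of switches (hypothetical topology).
    TOPOLOGY = {"s1": ["s2"], "s2": ["s1", "s3"], "s3": ["s2"]}
    HOST_LOCATION = {"10.0.0.1": "s1", "10.0.0.2": "s3"}  # host -> access switch

    def shortest_path(graph, src, dst):
        """Plain BFS over the network graph."""
        queue, seen = deque([[src]]), {src}
        while queue:
            path = queue.popleft()
            if path[-1] == dst:
                return path
            for nxt in graph[path[-1]]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(path + [nxt])
        return None

    def install_route(src_ip, dst_ip):
        """Emit one match/action entry per switch along the path."""
        path = shortest_path(TOPOLOGY, HOST_LOCATION[src_ip], HOST_LOCATION[dst_ip])
        for i, switch in enumerate(path):
            out = path[i + 1] if i + 1 < len(path) else "local-port"
            print(f"{switch}: match(dst_ip={dst_ip}) -> output({out})")

    if __name__ == "__main__":
        install_route("10.0.0.1", "10.0.0.2")

A real controller would react to topology and flow events rather than print rules, but the division of labor is the one described: the platform surfaces the graph, the application decides what state to install.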
And then we thought, okay, we're going to build these general purpose controllers. And everybody was super focused on controllers at the time. There were probably ten different open source projects on controllers, and everybody had their version of a controller. But a controller doesn't solve any specific problem, it's just a platform, right? So it was clear at the time that networks hadn't really kept pace with demands in industry, especially in these large multi-tenant data centers, which we'll be talking about today. And we can talk more about that. So we're like, what we're going to do is we're going to solve the operational problem of multi-tenant data centers. Compute had been virtualized, and that compute was being resold to people, but the network architecture was fairly static. And so even though they could spin up things like VMs in any configuration they wanted, the network still required manual provisioning and so forth. So we thought, you know what we're going to do is we're going to use this SDN approach to virtualize the network, so you can make it as consumable and as flexible as compute virtualization. That was the idea. The big ah-ha, the big, big takeaway that we learned when we were doing that is, we were so focused on the controller, but it turns out actually writing an application is probably two orders of magnitude more work. So just like the protocol itself wasn't super architecturally important, as long as you got a few things right, it turns out the controllers aren't architecturally important either. Whichever of the number of controllers you use, the application manages all the state and does all the distributed state management, and that's where all of the complexity comes in. So we started this company, Nicira, and it was to solve this network virtualization problem. We spent pretty much all of our time figuring out how to build the application for multi-tenant data centers. >> So, getting into the technical aspects of what Nicira built, the original design. Did you start with, okay, we're going to build a data center out of OpenFlow switches and build up from there? Or did it start with a software, virtual approach? >> That's a great question. So we actually started from the problem statement and worked backwards, then we took a number of goes at it, and then we kind of figured it out. So the problem statement was the one I just stated to you, which is, more and more data centers look like big computers, right? Those computers have been carved up and either used by internal developers, or used for applications, or resold in the public cloud, or whatever. That fungibility of the network as an infrastructure piece didn't exist. We were able to carve up compute in any way that we wanted to, and even storage to some degree. But offering networking services, and different network service models, on the same infrastructure still required human beings, or buying more gear, or whatever it was. So the problem statement was, let's make the networking have the same operational properties as compute. That was the start. Then we tried a number of approaches. A lot of this was based on heritage. Our first go was, okay, what we're going to do is we're going to reprogram switches. So we took a go at that for about a year, and we looked at the problem pretty deeply, and we realized that that was actually the wrong solution. So there were actually two major trends that caused the big ah-ha behind Nicira.
So the first one is, if you looked at the time, and this is now we're talking 10 years ago, if you looked at the mega data centers, how they were being built, and by mega data centers I mean the Googles and the Facebooks and so forth. They'd kind of all opted, on their own, to just build large L3 ECMP fabrics. That was it. So their physical networks were super simple. They're just L3 fabrics, running OSPF or BGP or whatever, and they would use ECMP to efficiently use all links, and that was it. And so they would set them up and they wouldn't touch them again. [COUGH] And so you could see that they had actually decoupled forwarding from features. So they'd rewritten the applications to do things you normally do on the network. So the applications would do the security, and do the discovery, and do the isolation, and so forth. And all of that was done in the application. So you look at that architecture and you're like, huh. It gives you a lot of the same properties of SDN, where you use high level languages and software to handle a lot of the features, and then the physical network just provides forwarding. The second thing that we noticed was that at the time, in 2009, the number of virtual ports, these were ports connected to virtual machines, was quickly approaching the number of physical access ports globally. So companies like VMware were becoming the largest networking companies. And so you had these two trends: big data centers were going to these very simple physical networks where software was providing the function, albeit not in an SDN way but in an edge way. And in networking, the access layer of the network, which is architecturally the right place to put functionality, and we can discuss why that is, all of those ports were going into this virtual layer. And so we're like, you know what would be cool? Okay, now we've got all these VMs connected to virtual ports. How about we use SDN techniques to control these virtual ports? And implement all of traditional networking functionality there, L2, L3, L4 through 7 services, and traditional management interfaces. And do it in a way that you don't have to change your application. You don't have to be a Google. You can use traditional applications and use traditional interfaces, but you get all of the operational properties and capex properties and opex properties of these big data centers. And so that was kind of the major shift. That happened about a year into the company, around 2009, when we kind of realized that controlling switches directly was probably the wrong approach. >> And did you give up anything by not having access to the hardware in that way? >> So for data centers, where you have an insertion point at the edge, whatever it is, whether it's in the OS, whether it's in the hypervisor, I don't believe there's anything that you need to do specifically with the hardware that's outside of operations and management. Yeah, I think that the best technical, architectural approach is to push all the service model intelligence to the edge. The physical network should do load balancing. It should be able to isolate elephant and mice flows, but you can do this through normal packet marking, so that, for example, heavy hitter flows don't fill up queues and add latency for the smaller flows. But I actually don't think from a performance standpoint or from a functionality standpoint you give up anything.
In fact, I think you gain a lot, because now the functionality has a general purpose processor, and it's distributed, so you don't have to do it at an aggregation point like a top of rack switch. The one caveat is operations and management, where I think you need to be able to talk about the physical and the virtual layer to do things like debugging, trend analysis, and so forth. But I do think horizontal interfaces are the right way to do that, so that you actually have cross-layer visibility, and then you build tools to allow you to debug. >> I see. So then the core physical network worries about packet delivery, reliability, and performance, basically. >> Yeah, the way I like to think about it is, if you see how chassis have evolved: if you buy a networking chassis, you've got line cards and you've got a backplane. For performance reasons, for functionality reasons, often you put the intelligence in the line cards, and the backplane just delivers packets reliably with the quality of service that you're asking for, and that's generally the model. So now the intelligence is on the x86 servers, which already have to do tons and tons of stuff anyways, right? I mean, if you think about it, if a compute node sends a packet on the network, it goes from some high level language, through some high level language libraries, through some guest operating system, to the hypervisor; all of this is part of the communication chain. So you add a little more functionality there, which you can totally attribute to that whole software path anyways, and that all just becomes part of the sending. You only have to do that at 10 gigs or whatever the NIC is on that thing, and then the physical network is just responsible for getting it to the next point. So I think it's a nice division of functionality. >> Yeah, that makes sense. So digging in a little more, though, you mentioned one thing: what exactly is the interface between the software and the hardware, so that the hardware can do its job? And I think one thing you mentioned was, well, maybe you have some kind of marking of packets, of the class of flow. Is that right? >> Yeah, well, I mean, I think that if I was to say what should the physical network do, what's its job? I'd say, okay, it's going to get a packet from point A to point B in a complex graph, in whatever topology, as efficiently as possible, obeying the requested class of service from the edge. Right. So, what does it need to do? It needs to look at a destination header and send it to the right destination, and it should be able to look at some sort of class-of-service markings and be able to enforce that class of service or those guarantees. I've found in practice that there generally aren't enough queues to do object- or entity-level class of service. If you have a large public cloud and you have hundreds of thousands of users, it's very unlikely that every user is going to get their own guaranteed bandwidth. That would require a queue in the hardware per user. So instead, what we see in practice is you have a certain set of queues that are just dedicated to high throughput flows, and you've got a separate set of queues that are for low latency flows, and if you can identify these at the edge somehow, you get these nice-running networks. But that means the physical network really is just about destination forwarding.
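Here is a small sketch of that division of labor, under assumed names and a made-up topology: the fabric does nothing but destination-based ECMP forwarding plus a coarse two-class queue choice driven by markings set at the edge.

    # Sketch of a fabric's job: destination forwarding + honoring edge markings.
    import hashlib

    NEXT_HOPS = {"10.9.0.0/16": ["spine1", "spine2", "spine3", "spine4"]}
    QUEUES = {"low-latency": 0, "bulk": 1}  # two coarse classes, as discussed above

    def ecmp_next_hop(prefix, flow):
        """Hash the 5-tuple so one flow sticks to one path, but flows spread out."""
        key = f"{flow['src']}{flow['dst']}{flow['proto']}{flow['sport']}{flow['dport']}"
        digest = int(hashlib.md5(key.encode()).hexdigest(), 16)
        hops = NEXT_HOPS[prefix]
        return hops[digest % len(hops)]

    def forward(flow):
        # The fabric does not guess what the flow is; it trusts the edge marking.
        queue = QUEUES["bulk"] if flow.get("dscp") == "bulk" else QUEUES["low-latency"]
        hop = ecmp_next_hop("10.9.0.0/16", flow)
        print(f"flow {flow['src']}->{flow['dst']}: next hop {hop}, queue {queue}")

    if __name__ == "__main__":
        forward({"src": "10.1.0.5", "dst": "10.9.0.7", "proto": 6,
                 "sport": 44231, "dport": 5432, "dscp": "bulk"})

Hashing the 5-tuple keeps a given flow on one path, so its packets are not reordered, while spreading different flows across the equal-cost next hops.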
As far as the interface outside of O&M, it's just destination forwarding and looking at bits. Now, when it comes to O&M, operations and management, of course it's a much richer interface. >> Got it, yeah. Right, so if at the edge you can mark, say, your in-memory database queries versus your backups and larger flows into those two classes, then you can get essentially about as close to optimal as you can practically do. >> Yeah, that's right. And actually, it was obvious there were many benefits to pushing this intelligence into the edge, right? One is you have a general purpose processor, but another one is you actually have deeper semantics, because you're co-located with the application now. And this is a perfect, perfect example of what you just brought up, which is, as long as I can remember in networking, we've been trying to guess what a heavy hitter flow is, right? By doing some sort of approximation or some Bloom filter or some craziness, but you can never really guess, using counters, whether a flow is a long-lived flow or not, or a heavy hitter flow or not. But once you start pushing intelligence to the edge, you have access to all of the state. So we do tricks today even where you can do things like look into the buffers that are already there. So I can look and see how much outstanding data there is, and if there's a lot, then you can go ahead and mark the packets as a heavy hitter flow. And so not only do you potentially have application semantics, like you know the application, but you actually have all the system semantics as well, so you know what the application was requesting. And then you can mark that, like you said, on the outer packet, and then the physical fabric can enforce that. >> How far do you think that should go, or needs to go, into the application? So you mentioned you can look into the TCP send buffer; okay, that's something that the OS can do, or maybe the hypervisor, without the application even being aware. That's more information than the network has. >> Yes. >> Do you have to go even further into the application? >> Yes, I think this is a very good question, which is, how much should the network know? And in my opinion, you do want deep context and semantics, but not for stuff like efficiency and performance. I think you want it for security. So, when it comes to efficiency and performance, honestly, if you look at any application, the network is almost never the problem. I mean, we're networking guys, we like to talk about the network, but we measure end-to-end packet delays all the time from one VM to another VM, and honestly, just the memory handling in the hypervisor, like buffer copies and stuff, dwarfs anything that happens on the network. I do believe that there's a real problem between heavy hitters and non-heavy hitters, but I think that is sufficient context for getting good performance. And honestly, if we didn't have to deal with older versions of TCP, I'm guessing just turning elephants into mice, taking heavy hitters and then chopping them up, would be sufficient in almost every case. And again, if we're very honest about end-to-end packet delivery, almost all of the performance issues are on the edge or with storage. This is my experience in practice. >> Right. >> But on the performance side, I think you're okay without deep semantics.
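For concreteness, here is a Linux-specific sketch of that edge trick, assuming an agreed-upon "bulk" DSCP codepoint and an arbitrary threshold: peek at how much unsent data is sitting in a TCP socket's send buffer, and re-mark the flow once the backlog looks elephant-like.

    # Linux-only sketch: classify a flow as a heavy hitter from send-buffer
    # occupancy and mark its packets with a "bulk" DSCP value.
    import fcntl
    import socket
    import struct
    import termios

    HEAVY_HITTER_BYTES = 256 * 1024   # assumed threshold; tune per deployment
    DSCP_BULK = 8                     # assumed "bulk" codepoint (e.g. CS1)

    def outstanding_bytes(sock: socket.socket) -> int:
        """Bytes queued in the kernel send buffer but not yet sent (Linux only)."""
        buf = fcntl.ioctl(sock.fileno(), termios.TIOCOUTQ, struct.pack("i", 0))
        return struct.unpack("i", buf)[0]

    def mark_if_heavy(sock: socket.socket) -> None:
        """Re-mark the socket as bulk traffic once its backlog looks elephant-like."""
        if outstanding_bytes(sock) > HEAVY_HITTER_BYTES:
            sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, DSCP_BULK << 2)

In practice this logic would live in the vswitch or hypervisor rather than the application itself, but the signal it keys off, buffer occupancy that the edge can see and the fabric cannot, is the one described above.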
That said, when you're doing things like security, like regulatory compliance, where a regulator comes in and says, in order to handle PII or in order to handle credit card information you need to have this type of workload separated from that, and so forth — now, to enforce those policies, you do want to have deep insight into what things are being done, because it's part of the namespace of what you're trying to get done. My declaration is about these types of people and these types of data, so I have to have that as part of my namespace. But then the discussion is, okay, is this even a networking problem anymore? [LAUGH] No, I mean, that just becomes a distributed systems problem. But I do think, to solve those types of problems which are often attributed to networking, you do need to have a much deeper insight into the application. >> But that security layer, you're saying, would be a separate layer from virtualization of the multi-tenant data center, or even the network itself. >> Absolutely. I would say that's a layer above, and it has much deeper semantic access. So if you look at it, you'd see a policy layer at the top, and the namespace should surface all the stuff you want to write policy over. You want to be able to talk about users like Brighten and users like Martin, different types of data. You want to be able to say all of those things, and it's also a cross layer: policy always touches all pieces of infrastructure — compute, networking, and storage. So it's not just the networking bit. So that policy will compile down to abstractions that are decoupled from the physical hardware: VMs, virtual networks, virtual storage units. Right, so it will compile down to a management layer that works with these abstractions, and then those map down to the actual hardware. I think that's kind of the stack. So on the networking piece, the physical networking should still just be moving packets, but there should be a virtual networking layer at the edge that exposes what look like physical networks but are actually virtual abstractions, and then that can be driven by the policy layer. >> Got it. Got it. So, you made a point that was a good one, that if we look at performance, latency specifically, which is something that we're talking about in the class — if you look at where the latency comes from end to end, it's not propagation delay through the network, for example, which, since we're talking about relatively short distances in the data center, is just the speed of light. >> Right. >> It might in some cases be congestion, but you've always got that latency that you're paying to go up and down the host stacks on both ends. >> Yeah. >> And arguably, as we make more and more use of software and virtualization at the edge, maybe we're making that problem worse. Are we, and how do we deal with that? >> Yeah, no, okay, so I think this is actually the question. Which is, we have such a deconstructionist view [LAUGH] of everything. [LAUGH] We're like, I study networking, so I'm just going to focus on the network. And I think having a holistic, end-to-end systems view is so important to answering this question. And every time we add a layer there are implications. So I actually just want to walk through. Let's say- >> Okay. >> One VM is sending something to another VM. So what normally happens? Let's say you've got some Python code, and that Python code is talking to some Python library. Right, so now you've got that interaction.
And the Python library is probably going to the Python interpreter, and that's probably calling some libc user space library, that's making the system call, and now you go to the kernel. And in the kernel you've got all of this kind of software overhead. Now you're in the kernel, you've got the L2 and L3 stacks. You've got all the retransmission timers. You've got all the buffering that's going on there, all of that memory management. That will trap into the hypervisor. And every time you do a domain crossing you lose your TLB, right? You probably lose some cache locality. I mean, this is non-trivial stuff. >> Mm-hm. >> And then you put the packet, you DMA the packet, and then it goes out. So if you look at that, that is a ton of software, a ton of memory handling that's going on. And in fact, the most expensive thing in the case of virtualization is actually that domain crossing between the guest and the hypervisor, because you do lose your TLB, you lose cache locality. So for example, if you're on bare metal, you can send packets at 10 gigs, no problem. But if you put it in a hypervisor, that number often drops down to, say, 2 gigabits per second. So it turns out the most expensive thing, like I said, is that domain crossing. >> Yeah. >> So if you can somehow reduce that domain crossing, you get a lot more performance. So what is done in modern systems today? We were all looking at the network and we thought the network was bad. Then we started profiling it and we realized that the domain crossing was very expensive. So you do optimizations, for example, expose TSO to the virtual drivers. TSO is TCP Segmentation Offload. So now you're saying, okay, if you send packets to me, if you send the TCP thing, instead of sending 1,500 bytes, which would be like an MTU-sized packet, you can send 32K, these massive frames, and we'll go ahead and do the segmentation in the NIC, for example. Now you get back to 10 gigs. So if you can reduce the interrupts between those two things by sending bigger frames, all of a sudden you get your performance back, and these days VMs can send traffic almost as fast as without there being a VM. You can easily get 10 gigs out of these now. But the way you got there is you understood what was happening at the edge. This was like operating systems and computer architecture work, not networking work, to get there. So I know that was a long answer. Yes, I do think we're adding a lot of overhead, and yes, I do think we can solve those things. But the way to do it is not making it a network problem. It's making it a systems, operating systems, computer architecture problem. >> Right, right. Yeah, and that's something that is kind of exposing one of the broader challenges when we move from sort of a more hardware oriented world to a more software oriented world. We have some of these growing pains. >> Yeah, and I would also say it's as the disciplines grow up and we realize how interrelated they are, and you have boundaries moving. Like the access layer of the network is now basically an x86 problem. So those of us that, you and I, went to school in networking — I remember all my introductory classes, L2 and L3 and the TCP sawtooth and all this other stuff. And you realize that even doing switching performance now, you have to understand caching architectures. [LAUGH] So there's really a blending going on across disciplines in order to provide end-to-end guarantees.
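As a rough, self-contained way to see the per-crossing overhead being described, this sketch pushes the same amount of data over a loopback TCP connection using 1,500-byte writes and then 64 KB writes. There is no guest/hypervisor boundary on loopback, so the numbers are only suggestive, but the principle is the one above: handing the stack bigger chunks (which is what exposing TSO to the virtual drivers accomplishes for the NIC) cuts the number of times you pay for the trip down the stack.

    # Rough comparison: many small writes vs. few large writes over loopback.
    import socket
    import threading
    import time

    TOTAL = 256 * 1024 * 1024  # 256 MB of payload either way

    def sink(server: socket.socket) -> None:
        """Accept one connection and drain it until the sender closes."""
        conn, _ = server.accept()
        while conn.recv(1 << 20):
            pass

    def send_all(chunk_size: int) -> float:
        """Send TOTAL bytes in chunk_size pieces; return elapsed seconds."""
        server = socket.create_server(("127.0.0.1", 0))
        threading.Thread(target=sink, args=(server,), daemon=True).start()
        client = socket.create_connection(server.getsockname())
        payload = b"x" * chunk_size
        start = time.perf_counter()
        sent = 0
        while sent < TOTAL:
            client.sendall(payload)
            sent += chunk_size
        client.close()
        return time.perf_counter() - start

    if __name__ == "__main__":
        for size in (1500, 64 * 1024):
            secs = send_all(size)
            print(f"{size:>6}-byte writes: {TOTAL / secs / 1e9 * 8:.2f} Gbit/s")

Typically the large-write case comes out meaningfully faster, purely from fewer syscalls and copies per byte delivered.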
>> Okay, so these were some of the challenges that Nicira, the company, and you worked through in order to get to high performance. Now, we've talked a lot about performance, a little bit about security. One of the interesting things, I think, is where you started: building networks in a programmable way. And you started with high level languages, and Nicira kind of went towards a controller, or an application I should say, that was targeted towards a particular use case. Where do you think the SDN controller and SDN application space is going, or should go, in terms of solving actual use cases? Are there going to be controllers for specific problems? Are there going to be general high level languages? How is that architecture going to shake out, do you think? >> Yeah, I mean, this question really points to probably one of the three major insights that I've learned, or things that I've noticed, over the past eight years, which is the following. I mean, we got so over-rotated on the controller. We're going to build this general tool, it will make everyone's life easier, so your mom can use Python to write networking apps. And it just turns out that is so patently false. I'll tell you why; I think that we got to the source of why. Which is, networking problems are distributed problems. It's a distributed systems problem, right? And so, in distributed systems, normally you're making trade-offs between state consistency and scalability, and the only thing that can actually make those trade-offs is the application. I don't know, from a controller architecture, what you as an application writer need — what guarantees you need, what consistency models you need, and so forth. So we ended up building this second iteration of the controller, called Onix; you know, there's a paper on Onix. And a colleague and I were working on this controller, and we're like, okay, so NOX was great, but it wasn't really a distributed system, it had very basic notions of distribution. So we're going to build a controller that provides all of the distribution components generally, and the application can use those. We'll have different distributed data stores and different distributed coordination mechanisms, like distributed locking, and so forth. And that will make it much easier. We probably spent, I don't know, maybe four man-years building that. But then what you realize is, the only way to do something truly general is to expose all of these options to the application. So you're really pushing the complexity into the application. Now the application has to make all of the actual choices. Any time it manages data, it has to choose whether it's eventually consistent or strongly consistent; it has to do all of this itself. And so you're just kind of moving the complexity bubble up to the application. And so if I do a comparison: we built a network virtualization system, and that was probably 100 man-years of work, versus 4 man-years of work for the controller. And so I think the industry's understanding this now. I mean, it does help to have a general controller platform, but because so much complexity goes into the application, I think these should be as thin as possible. And we should stop using the analogy to OS platforms, because it's very, very different. I mean, there's a much more limited number of applications for a network. Right, I mean, you know, think about it.
In compute, listen, we use the same machines to solve physics problems, to do work, and to play Pac-Man, right? Like, compute is for anything, solving the world's problems. All applications for SDN are just to run a network. We don't play Pac-Man on networks, and we don't solve physics problems. They're all for the network. So I think there's a limited domain, and I think each one of those has its own set of trade-offs. So I think we should have very thin controller layers, and we should really focus on building out correct applications. And because there are orders of magnitude fewer applications for the network, I don't think we need to over-rotate on how much duplicate work we do in them, just because they're so distributed by nature. >> Got it. Okay, so we've got a thin controller. And what does thin mean? Does that mean kind of a pretty low-level interface, similar to OpenFlow? >> Yeah, so what I would do is, well — and again, I think it's probably not super architecturally significant — I would have a higher level interface to state management on the network. You need more than a wire protocol; you need some higher level state interface. So I like viewing the forwarding state of the network as basically a database, as a state consistency problem. So I think that the controller should be like, okay, here's state, and if you manage this state it's going to be consistent on the physical network. And you can manage one device and you can manage across devices, but any sort of distributed coordination you're doing, any sort of consistency guarantees, that's kind of up to you, how you implement that. So I think actually a wrapper over the wire protocol, OpenFlow or whatever protocol you're using, is probably sufficient, and then you want to use that protocol, like OpenFlow, as efficiently as possible. So for example, if state changes down there, then you get the changes up here, and you're pushing things down in an efficient manner. But I do think that viewing that layer as a state management layer is probably the right approach, and having a high level, protocol-independent interface is sufficient, and this is just not a hard thing to do. >> So the data that the controller exposes might be pretty low level forwarding rules. Matches and actions, similar to OpenFlow or MPLS or others — not that those are directly comparable, but they're dealing with certain kinds of matches and actions on packets. That kind of low level, but you don't want the applications to have to each implement the communication protocol with the switches, so you expose something more like a database of this information. Is that what you're saying? >> That's what I'm saying. So the forwarding information that's exposed is still basic forwarding information, but it looks kind of like a local cache of something that you're doing. I mean, the actual data maintenance is something that's actually provided by the controller layer. But the thing that's difficult, I've found, about exposing higher level abstractions to applications is that the way we describe communication really is best described by match-action lookups, right? And maybe if you're writing a new type of application this isn't the case, but if you're running existing applications, they require L2 broadcast. That's just kind of an assumption that it's going to be there, right? All the packet headers, which are chiseled into 30 years of operating system stacks,
assume IP addresses, assume L2 addresses. So the addressing structure and the service models are kind of baked in there already. And so, if you're going to build things in a way that these actually work, you're going to want to operate over these types of fields. And I think maybe, as you evolve the addressing layer at the edge and you evolve the basic expectations of the application for the network — like you don't get multicast, you don't get broadcast — then I think you can change the forwarding abstractions. But our first attempt was to use a high level domain specific language for managing state, and the problem is it just wasn't sufficiently general to manage legacy applications. And if that's the case, you always revert to doing it the low level way anyways. So I think for the general purpose case, yes, you should just expose the low level stuff. Maybe, as we evolve applications so that you don't have L2, we can expose higher level abstractions or different addressing architectures. >> Uh-huh. Yeah, and if you look at Nicira's controller versus Google's controller for their wide area backbone, those are both, you can call them SDN applications, but they're very different sorts of applications and controllers. >> Totally, absolutely. And I have always maintained, and I still do, that the problem of the WAN is very different from the problem of the data center. So, if you look at the data center, I actually don't think there's much of that kind of networking that you want to do. There's no real path selection; I mean, in the data center, bandwidth is really cheap and it's all close together, so any path is almost as good as any other path. So just using simple randomization like ECMP will get very close to optimal. Right, I mean, there's even a mathematical result on this, as you know. You've seen people approximate Valiant load balancing, and Valiant load balancing, which is basically sending a packet to a random location and then on to the destination, is within a factor of optimal for any traffic matrix. So I think that entire problem, moving a packet between point A and point B efficiently, is solved by the physical network in the data center. In the core, that's absolutely not the case. In the core, different paths are very different: they may cost a lot more money because bandwidth's expensive, or you may not have a lot of bandwidth on a certain path. So intelligently choosing a path, rather than just randomly selecting one, is a very specific problem, and one in which global optimization actually makes a lot of sense. So I think the controller you would build for that, and certainly the application you would build for that, is very, very different from what you would do for a data center. So I think even the idea of having the same controller for the core and the data center may not make a lot of sense — maybe that thin state management layer >> State, yeah. >> that we spoke about, but nothing more than that. They're just profoundly different problems. >> Right, right. Okay, so we've talked about these different application domains, and Nicira found an impactful application for SDN in the context of cloud networking. So since this is a class on cloud networking, let's focus on the cloud for a little. You know, there was this need to build virtual networks to support multi-tenancy in the data center, so that was one of the challenges to realize the cloud in an effective way.
Where do you think cloud networking is going beyond that? Have we solved the data center problems? Have we solved the wide area or other problems? Where are we headed as the cloud evolves? >> Yeah, so I like to carve it out into three types of problems, just to focus on the data center right now. [INAUDIBLE] I'm not clear what the problems are in the core. Honestly, it seems to me that the job of the core is to move packets between data centers. So, when it comes to the data center, especially modern cloud data centers, here are the problems. The first one's an operational problem: the time it takes to configure and provision the network, relative to applications and compute, is embarrassing, right? I mean, it takes minutes or seconds to spin up the VMs, it takes minutes or seconds to provision and orchestrate the provisioning of applications; the network took a long time. So the direct impact, say, to a business is, if I want to get a job done — like deploying an application, or onboarding a new employee, or onboarding a new customer, whatever it is — spinning up an environment takes a long time. If you reduce that time, say, to zero, there's value to whoever is trying to get something done, in this case the business. So I think the first class of problems is basically automating the provisioning and configuration of networking resources, and I think network virtualization does a good job of this, just like compute virtualization. So that's the first problem to me. I think that the problem space is well understood; all of the impediments to people adopting this are around the operations of it. They have to change their operational run book. So I think that we've moved out of a technical domain; now it's an organizational domain. And that's the state of where that is. >> I see. >> The second problem domain that's very interesting is security. There's this old adage — we've been saying this for decades, but it's true — that the majority of security is implemented at the perimeter, and there are technical reasons for this. So if you look industry wide, about 80% of security spend is on the perimeter. But the perimeter is only some fraction of the bandwidth, right? I mean, even in the most massive data centers you don't have 100 gigabits per second going in and out of the perimeter, and that's mostly client traffic, right? I mean, it's just not a lot of bandwidth. However, the traffic within the data center — the data center has terabits of capacity. I mean, one top of rack switch is probably a terabit, something like 48 ports of 10 gigs down and a handful of 40-gig uplinks. That's a ton of data. And so if we're going to be putting networking services, in particular security services, on traffic that never leaves the data center, the east-west traffic, you have to handle terabits of traffic in even moderate sized data centers. And so that's a very technical problem. Again, if you look at the cloud networking problem, you're saying, okay, I'm basically taking a big data center, I'm carving it up, and I'm allowing people to use this carved-up resource, and often the traffic never leaves. So you now have to put security services on all of that traffic. And again, network virtualization tackles this problem pretty well, because it distributes all of these services and it runs them at the edge. There's a lot more work to do there because, as we spoke about before, there's a lot of context you can get, because you're already there with the application and you're already there at the edge.
And people have only been able to distribute a couple of security services so far — for example, distributed firewalling. So if you think about it, you're taking something like a firewall, or an IDS, or an existing security appliance, and you're chopping it up into little pieces and you're running it all at the edge. This has only been successfully done with firewalling and IDS, to my knowledge. And so I think there's a lot more work to do there. And then the final one is the inter-data-center type of use cases. So you've got automation, you've got security, and then the type of use case that's between data centers. If you want to migrate workload between data centers for planned maintenance, or maybe lightning strikes and takes out a data center and you want another one to pop up, now you have to solve some of these issues. You ideally want to do this without disrupting communication. We know how to do this technically, but it's a much more difficult problem when you have to have a bunch of different organizations agree to different types of technologies, and there's kind of a massive upgrade problem. So if I go across those three use cases, I would say for automation, we understand the technical problems as an industry and as a discipline — >> Yeah. >> but there are issues we haven't gotten through yet, and there's a lot of tooling to make this kind of easier, but I don't think we're there yet. Security still has tons of technical work to do: getting better semantics, and just figuring out how to distribute these services. And then on the inter-data-center type use cases, I think that we understand the technologies; I don't think there are deep technical problems there. But figuring out how to roll it out in legacy environments is incredibly difficult, because people aren't going to just implement LISP everywhere, or whatever your chosen technical solution is. >> And do you think that's going to happen? So in the first case there are these kind of human, organizational problems. In the second case, the threat surface is changing and broadening, even quantitatively, right? You have much more traffic to process. And along with that, in the security challenge there's the technical challenge, but there's also this opportunity that you have much more context and information at the edge than you previously had. >> Yeah, absolutely. I think there's actually a ton of good technical work still to do, because you've got a global view, more visibility than you've ever had. You've got the ability now to consistently manage state. You've got more semantics, just like you said. And now we actually know how to build separate trust domains at the edge. You can have a separate trust domain running there where you can do things like remediation. You don't have to, like, send a packet to a firewall to drop it; you can actually drop it within the hypervisor, which is a separate trust domain. So it just feels like you can take a lot of our understanding of operating systems, of building secure domains in operating systems, even, say, language constructs, and apply it to the network security problem and enforce it globally. So I just feel like there's so much work that we can do there that we're just starting. I mean, really, the state of the art today is, I've got a fully distributed [INAUDIBLE] firewall that runs on every server. So it looks like you've got a terabit firewall, which is great, but that's actually taking a very traditional piece of security equipment
and distributing it, as opposed to fundamentally evolving it now that you're running on x86. And I do think there's a lot of great work there to do. >> Cool. And then the last problem, getting to the wide area. You mentioned availability and dynamic relocation, I think you said, of workloads. >> Yeah, or disaster recovery. >> Oh, disaster recovery. >> Yep. >> So these are getting into kind of very large technical challenges. How do you think this will play out between the hyperscale cloud providers providing services, versus some entities building private clouds, and getting to that level of very high availability and being able to do disaster recovery? These are complex services to provide. >> Yeah, so, well, it seems to me that from a technical component standpoint, we understand what's needed to do things like high availability or disaster recovery across data centers. Like, if you take anybody that's studied networking — okay, you need to make sure that you decouple an IP address from the location. They'll say LISP, or... [LAUGH] >> Right. >> We've got so many tools that we can throw at this problem. >> Yeah. >> And we know all of these tools, whether you do something super stupid like, I'm going to do /32s, or source routing, I'm going to advertise every time I move to a new location, or I'm going to use a level of indirection like LISP or like tunneling or whatever — we know all of these. The problem is that this is a global problem. You've got everybody on the Internet connecting to a set of data centers, and so you need to affect everybody on the Internet, and so it becomes an Internet upgrade problem. Which is, in order to do this effectively at the network layer, you're going to have to update all the networking components to do it. And I think that's why these problems are being solved at higher layers now. And I think a very interesting question — it's a little bit different than the question you asked, and we can get back to that question — but the most interesting question I have is, it seems that there's a race between the network layer evolving to solve problems and the applications. And it seems like the applications are winning, right? If you look at modern applications, they solve a lot of these problems themselves, so they don't rely on the network, they don't rely on infrastructure. They have their own techniques. They do their own discovery. And if all this functionality moves to the application because it's taking so long for the infrastructure to do it, it may be that the problem, even the global one, is just getting packets from point A to point B. And don't worry about relocating. Don't worry about movement. >> Mm-hm. >> So again, I think to answer your question, my guess, if I was to predict the future, is that cloud applications — you know, software applications that you get as a service — are going to actually consume all of these things, and it's not going to be the infrastructure, in the long term, that provides it. Whether that's written and run by an enterprise, or run by a big cloud service, my expectation is it's all going to go in the application. >> Okay. Well, on that prediction of the future, maybe that's a good note to wrap up on. We'll check back with you in five years or so. >> [LAUGH] [INAUDIBLE] >> See where that went. So thanks so much for taking the time to chat, Martin. >> [INAUDIBLE] >> And you're welcome back any time. >> All right.
Thanks so much. >> Okay. Bye bye. [MUSIC]