Two weeks ago I gave a short presentation at the Open Networking Summit. With only 15 minutes allocated per speaker, I wasn’t sure I’d be able to make much of an impact. However, there has been a lot of reaction to the talk – much of it positive – so I’m posting the slides here and including them below. A video of the presentation is also available in the ONS video archive (free registration required).
[This post was written by JR Rivers, Bruce Davie, and Martin Casado]
One of the important characteristics of network virtualization is the decoupling of network services from the underlying physical network. That decoupling is fundamental to the definition of network virtualization: it’s the delivery of network services independent of a physical network that makes those services virtual. Furthermore, many of the benefits of virtualization – such as the ability to move network services along with the workloads that need those services, without touching hardware – follow directly from this decoupling.
In spite of all the benefits that flow from decoupling virtual networks from the underlying physical network, we occasionally hear the concern that something has been lost by not having more direct interaction with the physical network. Indeed, we’ve come across a common intuition that applications would somehow be better off if they could directly control what the physical network is doing. The goal of this post is to explain why we disagree with this view.
It’s worth noting that this idea of getting networks to do something special for certain applications is hardly a novel idea. Consider the history of Voice over IP as an example. It wasn’t that long ago when using Ethernet for phone calls was a research project. Advances in the capacity of both the end points as well as the underlying physical network changed all of that and today VOIP is broadly utilized by consumers and enterprises around the world. Let’s break down the architecture that enabled VOIP.
A call starts with end-points (VOIP phones and computers) interacting with a controller that provisions the connection between them. In this case, provisioning involves authenticating end-points, finding other end-points, and ringing the other end. This process creates a logical connection between the end-points that overlays the physical network(s) that connect them. From there, communication occurs directly between the end-points. The breakthroughs that allowed Voice Over IP were a) ubiquitous end-points with the capacity to encode voice and communicate via IP and b) physical networks with enough capacity to connect the end-points while still carrying their normal workload.
Now, does VOIP need anything special from the network itself? Back in the 1990s, many people believed that to enable VOIP it would be necessary to signal into the network to request bandwidth for each call. Both ATM signalling and RSVP (the Resource Reservation Protocol) were proposed to address this problem. But by the time VOIP really started to gain traction, network bandwidth was becoming so abundant that these explicit communication methods between the endpoints and the network proved un-necessary. Some simple marking of VOIP packets to ensure that they didn’t encounter long queues on bottleneck links was all that was needed in the QoS department. Intelligent behavior at the end-points (such as adaptive bit-rate codecs) made the solution even more robust. Today, of course, you can make a VOIP call between continents without any knowledge of the underlying network.
These same principles have been applied to more interactive use cases such as web-based video conferencing, interactive gaming, tweeting, you name it. The majority of the ways that people interact electronically are based on two fundamental premises: a logical connection between two or more end-points and a high capacity IP network fabric.
Returning to the context of network virtualization, IP fabrics allow system architects to build highly scalable physical networks; the summarization properties of IP and its routing protocols allow the connection of thousands of endpoints without imposing the knowledge of each one on the core of the network. This both reduces the complexity (and cost) of the networking elements, and improves their ability to heal in the event that something goes wrong. IP networks readily support large sets of equal cost paths between end-points, allowing administrators to simultaneously add capacity and redundancy. Path selection can be based on a variety of techniques such as statistical selection (hashing of headers), Valiant Load Balancing, and automated identification of “elephant” flows.
Is anything lost if applications don’t interact directly with the network forwarding elements? In theory, perhaps, an application might be able to get a path that is more well-suited to its precise bandwidth needs if it could talk to the network. In practice, a well-provisioned IP network with rich multipath capabilities is robust, effective, and simple. Indeed, it’s been proven that multipath load-balancing can get very close to optimal utilization, even when the traffic matrix is unknown (which is the normal case). So it’s hard to argue that the additional complexity of providing explicit communication mechanisms for applications to signal their needs to the physical network are worth the cost. In fact, we’ll argue in a future post that trying to carefully engineer traffic is counter-productive in data centers because the traffic patterns are so unpredictable. Combine this with the benefits of decoupling the network services from the physical fabric, and it’s clear that a virtualization overlay on top of a well-provisioned IP network is a great fit for the modern data center.
[This post was written by Bruce Davie and Martin Casado.]
With the growth of interest in network virtualization, there has been a tendency to focus on the encapuslations that are required to tunnel packets across the physical infrastructure, sometimes neglecting the fact that tunneling is just one (small) part of an overall architecture for network virtualization. Since this post is going to do just that – talk about tunnel encapsulations – we want to reiterate the point that a complete network virtualization solution is about much more than a tunnel encapsulation. It entails (at least) a control plane, a management plane, and a set of new abstractions for networking, all of which aim to change the operational model of networks from the traditional, physical model. We’ve written about these aspects of network virtualization before (e.g., here).
In this post, however, we do want to talk about tunneling encapsulations, for reasons that will probably be readily apparent. There is more than one viable encapsulation in the marketplace now, and that will be the case for some time to come. Does it make any difference which one is used? In our opinion, it does, but it’s not a simple beauty contest in which one encaps will be declared the winner. We will explore some of the tradeoffs in this post.
There are three main encapsulation formats that have been proposed for network virtualization: VXLAN, NVGRE, and STT. We’ll focus on VXLAN and STT here. Not only are they the two that VMware supports (now that Nicira is part of VMware) but they also represent two quite distinct points in the design space, each of which has its merits.
One of the salient advantages of VXLAN is that it’s gained traction with a solid number of vendors in a relatively short period. There were demonstrations of several vendors’ implementations at the recent VMworld event. It fills an important market need, by providing a straightforward way to encapsulate Layer 2 payloads such that the logical semantics of a LAN can be provided among virtual machines without concern for the limitations of physical layer 2 networks. For example, a VXLAN can provide logical L2 semantics among machines spread across a large data center network, without requiring the physical network to provide arbitrarily large L2 segments.
At the risk of stating the obvious, the fact that VXLAN has been implemented by multiple vendors makes it an ideal choice for multi-vendor deployments. But we should be clear what ”multi-vendor” means in this case. Network virtualization entails tunneling packets through the data center routers and switches, and those devices only forward based on the outer header of the tunnel – a plain old IP (or MAC header). So the entities that need to terminate tunnels for network virtualization are the ones that we are concerned about here.
In many virtualized data center deployments, most of the traffic flows from VM to VM (“east-west” traffic) in which case the tunnels are terminated in vswitches. It is very rare for those vswitches to be from different vendors, so in this case, one might not be so concerned about multi-vendor support for the tunnel encaps. Other issues, such as efficiency and ability to evolve quickly might be more important, as we’ll discuss below.
Of course, there are plenty of cases where traffic doesn’t just flow east-west. It might need to go out of the data center to the Internet (or some other WAN), i.e. “north-south”. It might also need to be sent to some sort of appliance such as a load balancer, firewall, intrusion detection system, etc. And there are also plenty of cases where a tunnel does need to be terminated on a switch or router, such as to connect non-virtualized workloads to the virtualized network. In all of these cases, we’re clearly likely to run into multi-vendor situations for tunnel termination. Hence the need for a common, stable, and straightfoward approach to tunneling among all those devices.
Now, getting back to server-server traffic, why wouldn’t you just use VXLAN? One clear reason is efficiency, as we’ve discussed here. Since tunneling between hypervisors is required for network virtualization, it’s essential that tunneling not impose too high an overhead in terms of CPU load and network throughput. STT was designed with those goals in mind and performs very well on those dimensions using today’s commodity NICs. Given the general lack of multi-vendor issues when tunneling between hypervisors, STT’s significant performance advantage makes it a better fit in this scenario.
The performance advantage of STT may be viewed as somewhat temporary - it’s a result of STT’s ability to leverage TCP segmentation offload (TSO) in today’s NICs. Given the rise in importance of tunneling, and the momentum behind VXLAN, it reasonable to expect that a new generation of NICs will emerge that allow other tunnel encapsulations to be used without disabling TSO. When that happens, performance differences between STT and VXLAN should (mostly) disappear, given appropriate software to leverage the new NICs.
Another factor that comes into play when tunneling traffic from server to server is that we may want to change the semantics of the encapsualution from time to time as new features and capabilities make their way into the network virtualization platform. Indeed, one of overall advantages of network virtualization is the ease with which the capabilities of the network can be upgraded over time, since they are all implemented in software that is completely independent of the underlying hardware. To make the most of this potential for new feature deployment, it’s helpful to have a tunnel encaps with fields that can be modified as required by new capabilities. An encaps that typically operates between the vswitches of a single vendor (like STT) can meet this goal, while an encaps designed to facilitate multi-vendor scenarios (like VXLAN) needs to have the meaning of every header field pretty well nailed down.
So, where does that leave us? In essence, with two good solutions for tunneling, each of which meets a subset of the total needs of the market, but which can be used side-by-side with no ill effect. Consequently, we believe that VXLAN will continue to be a good solution for the multi-vendor environments that often occur in data center deployments, while STT will, for at least a couple of years, be the best approach for hypervisor-to-hypervisor tunnels. A complete network virtualization solution will need to use both encapsulations. There’s nothing wrong with that – building tunnels of the correct encapsulation type can be handled by the controller, without the need for user involvement. And, of course, we need to remember that a complete solution is about much more than just the bits on the wire.
[This post was written with input from Martin Casado, Ben Pfaff, Justin Pettit and Ben Basler.]
The Open vSwitch (OVS) project is obviously dear to our hearts at Nicira (and now VMware). OVS is a fairly standard open source project, with many dozens of people from companies around the world contributing patches and reviewing them. However, there is more to openness than just open source software; open protocols (with freely accessible specs) are also important. Of course, Open vSwitch is well known as an implementation of the OpenFlow protocol, for which the specs are freely available. But there is another protocol, arguably just as important as OpenFlow, which is used to manage Open vSwitch instances. That protocol is known as the Open vSwitch Data Base management protocol or OVSDB protocol. While the specification of that protocol can be found within the Open vSwitch sources, it’s a bit of an effort to figure out exactly how it works. With that in mind, as well as a desire to be as open as possible, we decided to publish a spec for the OVSDB protocol in a more familiar and accessible format – an Internet draft.
To be clear, anyone can publish an Internet draft, and that does not make something into a standard. There’s a possibility that this Internet draft may be suitable for publication as an informational RFC. That wouldn’t make it a standard either, but it would at least provide an archival publication mechanism for a protocol that is quite widely used. Whether that happens or not depends on the “Independent Stream Editor”, part of the rather complex organization that handles RFC publication. (See http://www.rfc-editor.org/RFCeditor.html.)
So, what is this OVSDB protocol? Obviously, you could just go and read our new Internet draft, but here is the quick summary. While OpenFlow establishes flow state in a switch, there’s a lot more to Open vSwitch – indeed there is more to networking – than just setting up flow (or forwarding) table entries. In Open vSwitch, you can create many virtual switch instances, attach interfaces to those switches, set QOS policies on interfaces, and so on. None of these configuration tasks can be done with OpenFlow, so you need a management/configuration protocol to do them.
The OVSDB protocol has been part of the Open vSwitch implementation for many years. It is essentially a general purpose protocol for interacting with a database, and in Open vSwitch the database in question is a set of tables representing switch configuration data. (Some readers may be familiar with of-config – the OpenFlow config protocol – a more recent effort; we believe that protocol could actually be implemented on top of OVSDB.)
To step back for a moment, networking folks often think of any network device as having a control plane and a data plane. Sadly, the management plane has been all too often neglected. OVSDB is a protocol that was created to address that important but neglected aspect of networking. We think that making networks dramatically easier to manage is in fact one of the major benefits of network virtualization. That’s a topic we’ve discussed elsewhere; for now, I’ll just urge readers of this blog to go take a look at our current approach to managing and configuring Open vSwitch instances.
[This post was written with Jesse Gross, Ben Basler, Bruce Davie, and Andrew Lambeth]
Tunneling has earned a bad name over the years in networking circles.
Much of the problem is historical. When a new tunneling mode is introduced in a hardware device, it is often implemented in the slow path. And once it is pushed down to the fastpath, implementations are often encumbered by key or table limits, or sometimes throughput is halved due to additional lookups.
However, none of these problems are intrinsic to tunneling. At its most basic, a tunnel is a handful of additional bits that need to be slapped onto outgoing packets. Rarely, outside of encryption, is there significant per-packet computation required by a tunnel. The transmission delay of the tunnel header is insignificant, and the impact on throughput is – or should be – similarly minor.
In fact, our experience implementing multiple tunneling protocols within Open vSwitch is that it is possible to do tunneling in software with performance and overhead comparable to non encapsulated traffic, and to support hundreds of thousands of tunnel end points.
And that is the goal of this post: to start the discussion on the performance of tunneling in software from the network edge.
An emerging method of network virtualization is to use tunneling from the edges to decoupled the virtual network address space from the physical address space. Often the tunneling is done in software in the hypervisor. Tunneling from within the server has a number of advantages: software tunneling can easily support hundreds of thousands of tunnels, it is not sensitive to key sizes, it can support complex lookup functions and header manipulations, it simplifies the server/switch interface and reduces demands on the in-network switching ASICs, and it naturally offers software flexibility and a rapid development cycle.
An idealized forwarding path is shown in the figure below. We assume that the tunnels are terminated within the hypervisor. The hypervisor is responsible for mapping packets from VIFs to tunnels, and from tunnels to VIFs. The hypervisor is also responsible for the forwarding decision on the outer header (mapping the encapsulated packet to the next physical hop).
Some Performance Numbers for Software Tunneling
The following tests show throughput and cpu overhead for tunneling within Open vSwitch. Traffic was generated with netperf attempting to emulate a high-bandwidth TCP flow. The MTU for the VM and the physical NICs are 1500bytes and the packet payload size is 32k. The test shows results using no tunneling (OVS bridge), GRE, and STT.
The results show aggregate bidirectional throughput, meaning that 20Gbps is a 10G NIC sending and receiving at line rate. All tests where done using Ubuntu 12.04 and KVM on an Intel Xeon 2.40GHz servers interconnected with a Dell 10G switch. We use standard 10G Broadcom NICs. CPU numbers reflect the percentage of a single core used for each of the processes tracked.
The following results show the performance of a single flow between two VMs on different hypervisors. We include the Linux bridge to show that performance is comparable. Note that the CPU only includes the CPU dedicated to switching in the hypervisor and not the overhead in the guest needed to push/consume traffic.
|Throughput||Recv side cpu||Send side cpu|
|Linux Bridge:||9.3 Gbps||85%||75%|
|OVS Bridge:||9.4 Gbps||82%||70%|
This next table shows the aggregate throughput of two hypervisors with 4 VMs each. Since each side is doing both send and receive, we don’t differentiate between the two.
|OVS Bridge:||18.4 Gbps||150%|
Interpreting the Results
Clearly these results (aside from GRE, discussed below) indicate that the overhead of software for tunneling is negligible. It’s easy enough to see why that is so. Tunneling requires copying the tunnel bits onto the header, an extra lookup (at least on receive), and the transmission delay of those extra bits when placing the packet on the wire. When compared to all of the other work that needs to be done during the domain crossing between the guest and the hypervisor, the overhead really is negligible.
In fact, with the right tunneling protocol, the performance is roughly equivalent to non-tunneling, and CPU overhead can even be lower.
STT’s lower CPU usage than non-tunneled traffic is not a statistical anomaly but is actually a property of the protocol. The primary reason is that STT allows for better coalescing on the received side in the common case (since we know how many packets are outstanding). However, the point of this post is not to argue that STT is better than other tunneling protocols, just that if implemented correctly, tunneling can have comparable performance to non-tunneled traffic. We’ll address performance specific aspects of STT relative to other protocols in a future post.
The reason that GRE numbers are so low is that with the GRE outer header it is not possible to take advantage of offload features on most existing NICs (we have discussed this problem in more detail before). However, this is a shortcoming of the NIC hardware in the near term. Next generation NICs will support better tunnel offloads, and in a couple of years, we’ll start to see them show up in LOM.
In the meantime, STT should work on any standard NIC with TSO today.
The point of this post is that at the edge, in software, tunneling overhead is comparable to raw forwarding, and under some conditions it is even beneficial. For virtualized workloads, the overhead of software forwarding is in the noise when compared to all of the other machinations performed by the hypervisor.
Technologies like passthrough are unlikely to have a significant impact on throughput, but they will save CPU cycles on the server. However, that savings comes at a fairly steep cost as we have explained before, and doesn’t play out in most deployment environments.
[This post was written with Bruce Davie]
Network virtualization has been around in some form or other for many years, but it seems of late to be getting more attention than ever. This is especially true in SDN circles, where we frequently hear of network virtualization as one of the dominant use cases of SDN. Unfortunately, as with much of SDN, the discussion has been muddled, and network virtualization is being both conflated with SDN and described as a direct result of it. However, SDN is definitely not network virtualization. And network virtualization does not require SDN.
No doubt, part of the problem is that there is no broad consensus on what network virtualization is. So this post is an attempt to construct a reasonable working definition of network virtualization. In particular, we want to distinguish network virtualization from some related technologies with which it is sometimes confused, and explain how it relates to SDN.
A good place to start is to take a step back and look at how virtualization has been defined in computing. Historically, virtualization of computational resources such as CPU and memory has allowed programmers (and applications) to be freed from the limitations of physical resources. Virtual memory, for example, allows an application to operate under the illusion that it has dedicated access to a vast amount of contiguous memory, even when the physical reality is that the memory is limited, partitioned over multiple banks, and shared with other applications. From the application’s perspective, the abstraction of virtual memory is almost indistinguishable from that provided by physical memory, supporting the same address structure and memory operations.
As another example, server virtualization presents the abstraction of a virtual machine, preserving all the details of a physical machine: CPU cycles, instruction set, I/O, etc.
A key point here is that virtualization of computing hardware preserves the abstractions that were presented by the resources being virtualized. Why is this important? Because changing abstractions generally means changing the programs that want to use the virtualized resources. Server virtualization was immediately useful because existing operating systems could be run on top of the hypervisor without modification. Memory virtualization was immediately useful because the programming model did not have to change.
Virtualization and the Power of New Abstractions
Virtualization should not change the basic abstractions exposed to workloads, however it nevertheless does introduce new abstractions. These new abstractions represent the logical enclosure of the entity being virtualized (for example a process, a logical volume, or a virtual machine). It is in these new abstractions that the real power of virtualization can be found.
So while the most immediate benefit of virtualization is the ability to multiplex hardware between multiple workloads (generally for the efficiency, fault containment or security), the longer term impact comes from the ability of the new abstractions to change the operational paradigm.
Server virtualization provides the most accessible example of this. The early value proposition of hypervisor products was simply server consolidation. However, the big disruption that followed server virtualization was not consolidation but the fundamental change to the operational model created by the introduction of the VM as a basic unit of operations.
This is a crucial point. When virtualizing some set of hardware resources, a new abstraction is introduced, and it will become a basic unit of operation. If that unit is too fine grained (e.g. just exposing logical CPUs) the impact on the operational model will be limited. Get it right, however, and the impact can be substantial.
As it turns out, the virtual machine was the right level of abstraction to dramatically impact data center operations. VMs embody a fairly complete target for the things operational staff want to do with servers: provisioning new workloads, moving workloads, snapshotting workloads, rolling workloads back in time, etc.
- virtualization exposes a logical view of some resource decoupled from the physical substrate without changing the basic abstractions.
- virtualization also introduces new abstractions – the logical container of virtualized resources.
- it is the manipulation of these new abstractions that has the potential to change the operational paradigm.
- the suitability of the new abstraction for simplifying operations is important.
Given this as background, let’s turn to network virtualization.
Network Virtualization, Then and Now
As noted above, network virtualization is an extremely broad and overloaded term that has been in use for decades. Overlays, MPLS, VPNs, VLANs, LISP, Virtual routers, VRFs can all be thought of as network virtualization of some form. An earlier blog post by Bruce Davie (here) touched on the relationship between these concepts and network virtualization as we’re defining it here. The key point of that post is that when employing one of the aforementioned network virtualization primitives, we’re virtualizing some aspect of the network (a LAN segment, an L3 path, an L3 forwarding table, etc.) but rarely a network in its entirety with all its properties.
For example, if you use VLANs to virtualize an L2 segment, you don’t get virtualized counters that stay in sync when a VM moves, or a virtual ACL that keeps working wherever the VM is located. For those sorts of capabilities, you need some other mechanisms.
To put it in the context of the previous discussion, traditional network virtualization mechanisms don’t provide the most suitable operational abstractions. For example, provisioning new workloads or moving workloads still requires operational overhead to update the network state, and this is generally a manual process.
Modern approaches to network virtualization try and address this disconnect. Rather than providing a bunch of virtualized components, network virtualization today tries to provide a suitable basic unit of operations. Unsurprisingly, the abstraction is of a “virtual network”.
To be complete, a virtual network should both support the basic abstractions provided by physical networks today (L2, L3, tagging, counters, ACLs, etc.) as well as introduce a logical abstraction that encompasses all of these to be used as the basis for operation.
And just like the compute analog, this logical abstraction should support all of the operational niceties we’ve come to expect from virtualization: dynamic creation, deletion, migration, configuration, snapshotting, and roll-back.
Cleaning up the Definition of Network Virtualization
Given the previous discussion, we would characterize network virtualization as follows:
- Introduces the concept of a virtual network that is decoupled from the physical network.
- The virtual networks don’t change any of the basic abstractions found in physical networks.
- The virtual networks are exposed as a new logical abstraction that can form a basic unit of operation (creation, deletion, migration, dynamic service insertion, snapshotting, inspection, and so on).
Network Virtualization is not SDN
SDN is a mechanism, and network virtualization is a solution. It is quite possible to have network virtualization solution that doesn’t use SDN, and to use SDN to build a network that has no virtualized properties.
SDN provides network virtualization in about the same way Python does - it’s a tool (and not a mandatory one). That said, SDN does have something to offer as a mechanism for network virtualization.
A simple way to think about the problem of network virtualization is that the solution must map multiple logical abstractions onto the physical network, and keep those abstractions consistent as both the logical and physical worlds change. Since these logical abstractions may reside anywhere in the network, this becomes a fairly complicated state management problem that must be enforced network-wide.
However, managing large amounts of states with reasonable consistency guarantees is something that SDN is particularly good at. It is no coincidence that most of the network virtualization solutions out there (from a variety of vendors using a variety of approaches) have a logically centralized component of some form for state management.
The point of this post was simply to provide some scaffolding around the discussion of network virtualization. To summarize quickly, modern concepts of network virtualization both preserve traditional abstractions and provide a basic unit of operations which is a (complete) virtual network. And that new abstraction should support the same operational abstractions as its computational analog.
While SDN provides a useful approach to building a network virtualization solution, it isn’t the only way. And lets not confuse tools with solutions.
Over the next few years, we expect to see a variety of mechanisms for implementing virtual networking take hold. Some hardware-based, some software-based, some using tunnels, others using tags, some relying more on traditional distributed protocols, others relying on SDN.
In the end, the market will choose the winning mechanism(s). In the meantime, let’s make sure we clarify the dialog so that informed decisions are possible.
On a lark, I compiled a list of “open” OpenFlow software projects that I knew about off-hand, or could find with minimal effort searching online.
You can find the list here.
Unsurprisingly, most of the projects either come directly from or originated at academia or industry research. As I’ve argued before, with respect to standardization, the more design and insight that comes from real code and plugging real holes, the better. And it is still very, very early days in the SDN engineering cycle. So, it is nice to see the diversity in projects and I hope the ecosystem continues to broaden with more controllers and associated projects entering the fray.
If you know of a project that I’ve missed (I’m only listing those that have code or bits available for free online — with the exception of Pica8 which will send you code on request) please mention it in the comments or e-mail me and I’ll add it to the list.