Networking Doesn’t Need a VMWare …Posted: January 9, 2012
[This post was written with Andrew Lambeth. Andrew has been virtualizing networking for long enough to have coined the term "vswitch", and led the vDS distributed switching project at VMware. ]
Or at least, it doesn’t need to solve the problem in the same way.
It’s commonly said that “networking needs a VMWare”. Hell, there have been occasions in which we’ve said something very similar. However, while the analogy has an obvious appeal (virtual, flexible, thin layer of indirection in software, commoditize, commoditize, commoditize!), a closer look suggests that it draws from a very superficial understanding of the technology, and in the limit, it doesn’t make much sense.
It’s no surprise that many are drawn to this line of thought. It probably stems from the realization that virtualizing the network rather than managing the physical components is the right direction for networks to evolve. On this point, it appears there is broad agreement. In order to bring networking up to the operational model of compute (and perhaps disrupt the existing supply chain a bit) virtualization is needed.
Beyond this gross comparison, however, the analogy breaks down. The reality is that the technical requirements for server virtualization and network virtualization are very, very different.
Server Virtualization vs. Network Virtualization
With server virtualization, virtualizing CPU, memory and device I/O is incredibly complex, and the events that need to be handled with translation or emulation happen at CPU cycle timescale. So the virtualization logic must be both highly sophisticated and highly performant on the “datapath” (the datapath for compute virtualization being the instruction stream and I/O events).
On the other hand, the datapath operations for network virtualization are almost trivially simple. All they involve is mapping one address/context space to another address/context space. This effectively reduces to an additional header on the packet (or tag), and one or two more lookups on the datapath. Somewhat revealing of this simplicity, there are multiple reasonable solutions that address the datapath component, NVGRE, and VXLAN being two recently publicized proposals.
If the datapath is so simple, it’s reasonable to ask why network virtualization isn’t already a solved problem.
The answer, is that there is a critical difference between network virtualization and server virtualization and that difference is where the bulk of complexity for network virtualization resides.
What is that difference?
Virtualized servers are effectively self contained in that they are only very loosely coupled to one another (there are a few exceptions to this rule, but even then, the groupings with direct relationships are small). As a result, the virtualization logic doesn’t need to deal with the complexity of state sharing between many entities.
A virtualized network solution, on the other hand, has to deal with all ports on the network, most of which can be assumed to have a direct relationship (the ability to communicate via some service model). Therefore, the virtual networking logic not only has to deal with N instances of N state (assuming every port wants to talk to every other port), but it has to ensure that state is consistent (or at least safely inconsistent) along all of the elements on the path of a packet. Inconsistent state can result in packet loss (not a huge deal) or much worse, delivery of the packet to the wrong location.
It’s important to remember that networking traditionally has only had to deal with eventual consistency. That is “after state change, the network will take some time to converge, and until that time, all bets are off”. Eventual consistency is fine for basic forwarding provided that loops are prevented using a TTL, or perhaps the algorithm ensures loop freedom while it is converging. However, eventual consistency doesn’t work so well with virtualization. During failure, for example, it would suck if packets from tenant A managed to leak over to tenant B’s network. It would also suck if ACLs configured in tenant A’s were not enforced correctly during convergence.
So simply, the difference between server virtualization, and network virtualization is that network virtualization is all about scale (dealing with the complexity of many interconnected entities which is generally a N2 problem), and it is all about distributed state consistency. Or more concretely, it is a distributed state management problem rather than a low level exercise in dealing with the complexities of various hardware devices.
Of course, depending on the layer of networking being virtualized, the amount of state that has to be managed varies.
All network virtualization solutions have to handle basic address mapping. That is, provide a virtual address space (generally addresses of the packets within the tunnel) and the physical address space (the external tunnel header), and a mapping between the two (virtual address X is at physical address Y). Any of the many tunnel overlays solutions, whether ad hoc, proprietary, or standardized provide this basic mapping service.
To then virtualize L2, requires almost no additional state management. The L2 forwarding tables are dynamically populated from passing traffic. And the size of a single broadcast domain has fairly limited scale, supporting hundreds or low thousands of active MACs. So the only additional state that has to be managed is the association of a port (virtual or physical) to a broadcast domain which is what virtual networking standards like NVGRE and VXLAN provide.
As an aside, it’s a shame that standards like NVGRE and VXLAN choose to dictate the wire format (important for hardware compatibility) and the method for managing the context mapping between address domains (multicast), but not the control interface to manage the rest of the state. Specifying the wire format is fine. However, requiring a specific mechanism (and a shaky one at that) for managing the virtual to physical address mappings severely limits the solution space. And not specifying the control interface for managing the rest of the state effectively guarantees that implementations will be vertically integrated and proprietary.
For L3, there is a lot more state to deal with, and the number of end points to which this state applies can be very large. There are a number of datacenters today who have, or plan to have, millions of VMs. Because of this, any control plane that hopes to offer a virtualized L3 solution needs to manage potentially millions of entries at hundreds of thousands of end points (assuming the first hop network logic is within the vswitch). Clearly, scale is a primary consideration.
As another aside, in our experience, there is a lot of confusion on what exactly L3 virtualization is. While a full discussion will have to wait for a future post, it is worth pointing out that running a router as a VM is *not* network virtualization, it is x86 virtualization. Network virtualzation involves mapping between network address contexts in a manner that does not effect the total available bandwidth of the physical fabric. Running a networking stack in a virtual machine, while it does provide the benefits of x86 virtualization, limits the cross-sectional bandwidth of the emulated network to the throughput of a virtual machine. Ouch.
For L4 and above, the amount of state that has to be shared and the rate that it changes increases again by orders of magnitude. Take, for example, WAN optimization. A virtualized WAN optimization solution should be enforced throughout the network (for example, each vswitch running a piece of it) yet this would incur a tremendous amount of control overhead to create a shared content cache.
So while server virtualization lives and dies by the ability to deal with the complexity of virtualizing complex hardware interfaces of many devices at speed, network virtualization’s primary technical challenge is scale. Any solution that doesn’t deal with this up front will probably run into a wall at L2, or with some luck, basic L3.
This is all interesting … but why do I care?
Full virtualization of the network address space and service model is still a relatively new area. However, rather than tackle the problem of network virtualization directly, it appears that a fair amount of energy in industry is being poured into point solutions. This reminds us of the situation 10 years ago when many people were trying to solve server sprawl problems with application containers, and the standard claim was that virtualization didn’t offer additional benefits to justify the overhead and complexity of fully virtualizing the platform. Had that mindset prevailed, today we’d have solutions doing minimal server consolidation for a small handful of applications on only one or possibly two OSs, instead of a set of solutions that solve this and many many more problems for any application and most any OS. That mindset, for example could never have produced vMotion, which was unimaginable at the outset of server virtualization.
At the same time, those who are advocating for network virtualization tend to draw technical comparisons with server virtualization. And while clearly there is a similarity at the macro level, this comparison belies the radically different technical challenges of the two problems. And it belies the radically different approaches needed to solve the two problems. Network virtualization is not the same as server virtualization any more than server virtualization is the same as storage virtualization. Saying “the network needs a VMware” in 2012 is a little like saying “the x86 needs an EMC” in 2002.
Perhaps the confusion is harmless, but it does seem to effect how the solution space is viewed, and that may be drawing the conversation away from what really is important, scale (lots of it) and distributed state consistency. Worrying about the datapath , is worrying about a trivial component of an otherwise enormously challenging problem.