[This post was co-authored by Bruce Davie and Ken Duda]
Almost a year ago, we wrote a first post about our efforts to build virtual networks that span both virtual and physical resources. As we’ve moved beyond the first proofs of concept to customer trials for our combined solution, this post serves to provide an update on where we see the interaction between virtual and physical worlds heading.
Our overall approach to connecting physical and virtual resources can be viewed in two main categories:
- terminating the overlay on physical devices, such as top-of-rack switches, routers, appliances, etc.
- managing interactions between the overlay and the physical devices that provide the underlay.
We first started working to design a control plane to terminate network virtualization overlays on physical devices in 2012. We started by looking at the information model, defining what information needed to be exchanged between a physical device and a network virtualization controller such as NSX. To bound the problem space, we focused on a specific use case: mapping the ports and VLANs of a physical switch into virtual layer 2 networks implemented as VXLAN-based overlays. (See our posts on the issues around encapsulation choice here and here). At the same time, we knew there were a lot more use cases to be addressed, so we picked a completely extensible protocol to carry the necessary information: OVSDB. This was important: we knew that over time we’d have to support a lot more use cases than just L2 bridging between physical and virtual worlds. After all, one of the tenets of network virtualization is that a virtual network should faithfully reproduce all of the networking stack, from L2-L7, just as server virtualization faithfully reproduces a complete computing environment (as described in more detail here.)
So, the first thing we added to the solution space once we got L2 working was L3. By the time we published the VTEP schema as part of Open vSwitch in late 2013, distributed logical routing was included. (We use the term VTEP – VXLAN tunnel end-point – as a general term for the devices that terminate the overlay.) Let’s take a look at how logical routing works.
Distributed logical routing is an example of a more general capability in network virtualization, the distribution of services. Brad Hedlund wrote some time ago about the value of distributing services among hypervisors. The same basic arguments apply when there are VTEPs in the picture — you want to distribute functions, like logical routing, so that packets always follow the shortest path without hair-pinning, and so that the capacity to perform that function scales out as you add more devices, be they hypervisors or physical switches.
So, suppose a VM (VM1) is placed in logical subnet A, and a physical server (PS1) that is in subnet B is located behind a ToR switch acting as a VTEP (see picture). Say we want to create a logical network that interconnects these two subnets. The logical topology is created by API requests to the network virtualization controller, which in turn programs the vswitches and the ToR to instantiate the desired topology. Part of this process entails mapping physical ports to the logical topology via API requests. Everything the ToR needs to know to participate in the logical topology is provided to it via OVSDB.
Suppose VM1 needs to send a packet to PS1. The VM will send an ARP request towards its default gateway, which is implemented in a distributed manner. (We assume the VM learned its default gateway via some prior step; for example, DHCP may be used.) The ARP request will be intercepted by the local vswitch on the hypervisor. Acting as the logical router, the vswitch will reply to the ARP, so that the VM can now send the packet towards the router. All of this happens without any packet leaving the hypervisor.
The logical router now needs to ARP for the destination — PS1 (assuming an ARP cache miss for the first packet). It will broadcast the ARP request by sending it over a VXLAN tunnel to the VTEP (and potentially to other VTEPs as well, if there are more VTEPs that are involved in logical subnet B). When the ARP packet reaches the ToR, it is sent out on one or more physical interfaces — the set of interfaces that were previously mapped to this logical subnet via API requests. The ARP will reach PS1, which replies; the ToR forwards the reply over a VXLAN tunnel to the vswitch that issued the request, and now it’s able to forward the data traffic to the ToR which decapsulates the packet and delivers it to PS1.
For traffic flowing the other way, the role of logical router would be played by the physical VTEP rather than the vswitch. This is the nature of distributed routing — there is no single box performing all the work for a single logical router, but rather a collection of devices. And in this case, the work is distributed among both hardware VTEPs and vswitches.
We’ve glossed over a couple of details here, but one detail that’s worth noting is that, for traffic heading in the physical-to-virtual direction, the hardware device needs to perform an L3 lookup followed by VXLAN encapsulation. There has been some uncertainty regarding the capabilities of various switching chips to perform this operation (see this post, for example, which tries to determine the capabilities of Trident 2 based on switch vendor information). We’ve actually connected VMware’s NSX controller to ToR switches using at least four different classes of switching silicon (two merchant vendors, two custom ASIC-based designs). This includes both Arista’s 7150 series and 7050X switches. All of these are capable of performing the necessary L3+VXLAN operations. We’ll let the switch vendors speak for themselves regarding product specifics, but we’re essentially viewing this as a non-issue.
OK, that’s L3. What next? Overall, our approach has been to try to provide the same capabilities to virtual ports and physical ports, as much as that is possible. Of course, there is some inherent conflict here: hardware-based end-points tend to excel at throughput and density, while software-based end-points give us the greatest flexibility to deliver new features. Also, given our rich partner ecosystem with many hardware players, it’s not always going to be feasible to expose the unique features of a specific hardware product through the northbound API of NSX. But we certainly see value in being able to do more on physical ports: for example, we should be able to create access control lists on physical ports under API control. Similarly, we’d like to be able to control QoS policy at the physical ingress, e.g. remarking DSCP bits or trusting the value received and copying to the outer VXLAN header. More stateful services, such as firewalling or load-balancing, may not make sense in a ToR-class device but could be implemented in specific appliances suited for those tasks, and could still be integrated into virtualized networks using the same principles that we’ve applied to L2 and L3 functions.
In summary, we see the physical edges of virtual networks as a critical part of the overall network virtualization story. And of course we consider it important that we have a range of vendors whose devices can be integrated into the virtual overlay. It’s been great to see the ecosystem develop around this ability to tie the physical and the virtual together, and we see a lot of opportunity to build on the foundation we’ve established.
[This post was written with input from Martin Casado, Ben Pfaff, Justin Pettit and Ben Basler.]
The Open vSwitch (OVS) project is obviously dear to our hearts at Nicira (and now VMware). OVS is a fairly standard open source project, with many dozens of people from companies around the world contributing patches and reviewing them. However, there is more to openness than just open source software; open protocols (with freely accessible specs) are also important. Of course, Open vSwitch is well known as an implementation of the OpenFlow protocol, for which the specs are freely available. But there is another protocol, arguably just as important as OpenFlow, which is used to manage Open vSwitch instances. That protocol is known as the Open vSwitch Data Base management protocol or OVSDB protocol. While the specification of that protocol can be found within the Open vSwitch sources, it’s a bit of an effort to figure out exactly how it works. With that in mind, as well as a desire to be as open as possible, we decided to publish a spec for the OVSDB protocol in a more familiar and accessible format – an Internet draft.
To be clear, anyone can publish an Internet draft, and that does not make something into a standard. There’s a possibility that this Internet draft may be suitable for publication as an informational RFC. That wouldn’t make it a standard either, but it would at least provide an archival publication mechanism for a protocol that is quite widely used. Whether that happens or not depends on the “Independent Stream Editor”, part of the rather complex organization that handles RFC publication. (See http://www.rfc-editor.org/RFCeditor.html.)
So, what is this OVSDB protocol? Obviously, you could just go and read our new Internet draft, but here is the quick summary. While OpenFlow establishes flow state in a switch, there’s a lot more to Open vSwitch – indeed there is more to networking – than just setting up flow (or forwarding) table entries. In Open vSwitch, you can create many virtual switch instances, attach interfaces to those switches, set QOS policies on interfaces, and so on. None of these configuration tasks can be done with OpenFlow, so you need a management/configuration protocol to do them.
The OVSDB protocol has been part of the Open vSwitch implementation for many years. It is essentially a general purpose protocol for interacting with a database, and in Open vSwitch the database in question is a set of tables representing switch configuration data. (Some readers may be familiar with of-config – the OpenFlow config protocol – a more recent effort; we believe that protocol could actually be implemented on top of OVSDB.)
To step back for a moment, networking folks often think of any network device as having a control plane and a data plane. Sadly, the management plane has been all too often neglected. OVSDB is a protocol that was created to address that important but neglected aspect of networking. We think that making networks dramatically easier to manage is in fact one of the major benefits of network virtualization. That’s a topic we’ve discussed elsewhere; for now, I’ll just urge readers of this blog to go take a look at our current approach to managing and configuring Open vSwitch instances.