
Network Virtualization Gets Physical

Network virtualization, as others have noted, is now well past the hype stage and in serious production deployments. One factor that has facilitated the adoption of network virtualization is the ease with which it can be incrementally deployed. In a typical data center, the necessary infrastructure is already in place. Servers are interconnected by a physical network that already meets the basic requirements for network virtualization: providing IP connectivity between the physical servers. And the servers are themselves virtualized, providing the ideal insertion point for network virtualization: the vswitch (virtual switch). Because the vswitch is the first hop in the data path for every packet that enters or leaves a VM, it’s the natural place to implement the data plane for network virtualization. This is the approach taken by VMware (and by Nicira before we were part of VMware) to enable network virtualization, and it forms the basis for our current deployments.

In typical data centers, however, not every machine is virtualized. “Bare metal” servers — that is, unvirtualized, or physical machines — are a fact of life in most real data centers. Sometimes they are present because they run software that is not easily virtualized, or because of performance concerns (e.g. highly latency-sensitive applications), or just because there are users of the data center who haven’t felt the need to virtualize. How do we accommodate these bare-metal workloads in virtualized networks if there is no vswitch sitting inside the machine?

Our solution to this issue was to develop gateway capabilities that allow physical devices to be connected to virtual networks. One class of gateway that we’ve been using for a while is a software appliance. It runs on standard x86 hardware and contains an instance of Open vSwitch. Under the control of the NSX controller, it maps physical ports, and VLANs on those ports, to logical networks, so that any physical device can participate in a given logical network, communicating with the VMs that are also connected to that logical network. This is illustrated below.

[Figure: An NSX gateway connects physical devices to virtual networks]

As an aside, these gateways also address another common use case: traffic that enters and leaves the data center, or “north-south” traffic. The basic functionality is similar enough: the gateway maps traffic from a physical port (in this case, a port connected to a WAN router rather than a server) to logical networks, and vice versa.

Software gateways are a great solution for moderate amounts of physical-to-virtual traffic, but there are inevitably some scenarios where the volume of traffic is too high for a single x86-based appliance, or even a handful of them. Say you had a rack (or more) full of bare-metal database servers and you wanted to connect them to logical networks containing VMs running application and web tiers for multi-tier applications. Ideally you’d like a high-density and high-throughput device that could bridge the traffic to and from the physical servers into the logical networks. This is where hardware gateways enter the picture.

Leveraging VXLAN-capable Switches

Fortunately, there is an emerging class of hardware switch that is readily adaptable to this gateway use case. Switches from several vendors are now becoming available with the ability to terminate VXLAN tunnels. (We’ll call these switches VTEPs — VXLAN Tunnel End Points or, more precisely, hardware VTEPs.) VXLAN tunnel termination addresses the data plane aspects of mapping traffic from the physical world to the virtual. However, there is also a need for a control plane mechanism by which the NSX controller can tell the VTEP everything it needs to know to connect its physical ports to virtual networks. Broadly speaking, this means:

  • providing the VTEP with information about the VXLAN tunnels that instantiate a particular logical network (such as the Virtual Network Identifier and destination IP addresses of the tunnels);
  • providing mappings between the MAC addresses of VMs and specific VXLAN tunnels (so the VTEP knows how to forward packets to a given VM);
  • instructing the VTEP as to which physical ports should be connected to which logical networks.

In return, the VTEP needs to tell the NSX controller what it knows about the physical world — specifically, the physical MAC addresses of devices that it can reach on its physical ports.

There may be other information to be exchanged between the controller and the VTEP to offer more capabilities, but this covers the basics. This information exchange can be viewed as the synchronization of two copies of a database, one of which resides in the controller and one of which is in the VTEP. The NSX controller already implements a database access protocol, OVSDB, for the purposes of configuring and monitoring Open vSwitch instances. We decided to leverage this existing protocol for control of third party VTEPs as well. We designed a new database schema to convey the information outlined above; the OVSDB protocol and the database code are unchanged. That choice has proven very helpful to our hardware partners, as they have been able to leverage the open source implementation of the OVSDB server and client libraries.
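The state being synchronized can be pictured as a few simple tables. The sketch below is a loose, illustrative model only; the table and field names are simplified stand-ins inspired by the hardware VTEP database schema, not its exact definition:

```python
# Illustrative model of the state synchronized over OVSDB between the
# NSX controller and a hardware VTEP. Names are simplified for clarity.

# Logical networks and the VXLAN Network Identifier (VNI) used on the wire.
logical_switches = {
    "web-tier": {"tunnel_key": 5001},
    "db-tier":  {"tunnel_key": 5002},
}

# Controller -> VTEP: which VXLAN tunnel (hypervisor IP) reaches each
# remote VM MAC, so the VTEP can forward without flooding.
ucast_macs_remote = {
    ("web-tier", "00:16:3e:aa:bb:01"): {"tunnel_dst_ip": "10.0.0.21"},
    ("web-tier", "00:16:3e:aa:bb:02"): {"tunnel_dst_ip": "10.0.0.22"},
}

# Controller -> VTEP: which (physical port, VLAN) pair attaches to
# which logical network.
port_bindings = {
    ("eth1", 100): "web-tier",
}

# VTEP -> controller: physical MACs the switch has learned locally.
ucast_macs_local = {
    ("web-tier", "00:1b:21:cc:dd:01"): {"port": "eth1", "vlan": 100},
}
```

Keeping the controller's copy and the VTEP's copy of these tables in sync over OVSDB is the whole of the control-plane contract; no new protocol machinery is required.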

The upshot of this work is that we can now build virtual networks that connect relatively large numbers of physical ports to virtual ports, using essentially the same operational model for any type of port, virtual or physical. The NSX controller exposes a northbound API by which physical ports can be attached to logical switches. Virtual ports of VMs are attached in much the same way to build logical networks that span the physical and virtual worlds. The figure below illustrates the approach.
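To make the "same operational model" point concrete, here is a hypothetical sketch of that attachment operation; the function, identifiers, and payload fields are invented for illustration and do not reflect the actual NSX API:

```python
# Hypothetical sketch: attaching ports to a logical switch.
# One operation covers both virtual and physical attachments;
# only the attachment descriptor differs.

def attach_port(switches, switch_id, attachment):
    """Record an attachment (virtual or physical) on a logical switch."""
    switches.setdefault(switch_id, []).append(attachment)

switches = {}
# A VM's virtual interface on a vswitch...
attach_port(switches, "ls-web", {"type": "vif", "vif_id": "vm-42-eth0"})
# ...and a bare-metal server behind a hardware VTEP's physical port.
attach_port(switches, "ls-web",
            {"type": "vtep", "gateway": "tor-1", "port": "eth1", "vlan": 100})
```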

[Figure: A hardware VTEP controlled by an NSX controller]

It’s worth noting that this approach has no need for IP multicast in the physical network, and limits the use of flooding within the overlay network. This contrasts with some early VXLAN implementations (and the original VXLAN Internet Draft, which didn’t entirely decouple the data plane from the control plane). The reason we are able to avoid flooding in many cases is that the NSX controller knows the location of all the VMs that it has attached to logical networks — this information is provided by the vswitches to the controller. And the controller shares its knowledge with the hardware VTEPs via OVSDB. Hence, any traffic destined for a VM can be placed on the right VXLAN tunnel from the outset.

In the virtual-to-physical direction, it’s only necessary to flood a packet if there is more than one hardware VTEP. (If there is only one hardware VTEP, we can assume that any unknown destination must be a physical device attached to that VTEP, since we know where all the VMs are.) In this case, we use the NSX Service Node to replicate the packet to all the hardware VTEPs (but not to any of the vswitches). Furthermore, once a given hardware VTEP learns a physical MAC address on one of its ports, it writes that information to the database, so subsequent packets to that address need not be flooded. The net result is that the amount of flooded traffic is quite limited.
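The forwarding decision just described can be sketched in a few lines. This is an illustrative simplification, not NSX internals; the names are invented:

```python
# Sketch of the virtual-to-physical forwarding decision: use a known
# tunnel if we have one, avoid flooding when a single hardware VTEP
# makes the answer unambiguous, and otherwise replicate via the
# service node (to hardware VTEPs only, never to vswitches).

def forward(dst_mac, mac_to_tunnel, hardware_vteps, service_node):
    """Return the tunnel endpoint(s) a packet for dst_mac is sent to."""
    if dst_mac in mac_to_tunnel:
        # Known VM or previously learned physical MAC: one exact tunnel.
        return [mac_to_tunnel[dst_mac]]
    if len(hardware_vteps) == 1:
        # Unknown MAC but only one hardware VTEP: it must live there.
        return [hardware_vteps[0]]
    # Otherwise hand off to the service node for replication.
    return [service_node]
```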

For a more detailed discussion of the role of VXLAN in the broader landscape of network virtualization, take a look at our earlier post on the topic.

Hardware or Software?

Of course, there are trade-offs between hardware and software gateways. In the NSX software gateway, there is quite a rich set of features that can be enabled (port security, port mirroring, QoS features, etc.) and the set will continue to grow over time. Similarly, the feature set that the hardware VTEPs support, and which NSX can control, will evolve over time. One of the challenges here is that hardware cycles are relatively long. On top of that, we’d like to provide a consistent model across many different hardware platforms, but there are inevitably differences across the various VTEP implementations. For example, we’re currently working with a number of different top-of-rack switch vendors. Some partners use their own custom ASICs for forwarding, while others use merchant silicon from the leading switch silicon vendors. Hence, there is quite a bit of diversity in the features that the hardware can support.

Our expectation is that software gateways will provide the greatest flexibility, feature-richness, and speed of evolution. Hardware VTEPs will be the preferred choice for deployments requiring greater total throughput and density.

A final note: one of the important benefits of network virtualization is that it decouples networking functions from the physical networking hardware. At first glance, it might seem that the use of hardware VTEPs is a step backwards, since we’re depending on specific hardware capabilities to interconnect the physical and virtual worlds. But by developing an approach that works across a wide variety of hardware devices and vendors, we’ve actually managed to achieve a good level of decoupling between the functionality that NSX provides and the hardware that underpins it. And as long as there are significant numbers of physical workloads that need to be managed in the data center, it will be attractive to have a range of options for connecting those workloads to virtual networks.

Comments on “Network Virtualization Gets Physical”

  1. Cristiano Monteiro says:

    Hi Bruce,

Sometimes you also have traditional middleboxes in place inside the data center; I suppose that in this case gateways would also be used to accommodate them. What happens when I have more than one middlebox providing HA? I know that such integration is not ideal in your approach, but I’m curious whether this kind of situation is common.


    • drbruced says:

      Gateways can certainly be used to connect to middleboxes. Exactly how you achieve HA depends on the middlebox function, but you can essentially build any topology you like (e.g. put both instances of the middlebox on the same logical L2 segment if necessary). It’s also worth noting that, although we haven’t done so as yet, a middlebox could itself implement the VTEP function for direct connection to logical networks under NSX control.

  2. Great post. Glad to see you guys have a way to address the virtual-port-to-physical-node communication challenge. I guessed that this could be done in software, and, much as in the early days of x86 virtualization performance, there is a need for further collaboration with hardware providers to add extensions that improve throughput at the virtualization layer.

    I believe the challenge will be getting the major hardware vendors to commit to helping expand the interaction and performance of virtual networks across physical devices. I don’t see the market incentive for them to do so. I’m looking forward to the tipping point where they will be forced to innovate similar to server vendors.

    • The virtual network is a major market disruption and is providing opportunity for the next generation of hardware vendors who do see the value in an open ecosystem that enables the virtual-physical interface. While many of the incumbent vendors will preach that a closed stack is the way to the virtual network, I believe the path is via strong partnerships and open protocols that allow the interaction of the various components of the virtualized data center…

    • drbruced says:

      We’re actually seeing a lot of interest from hardware vendors to play in this space. Customers are asking for it, which is all the market incentive they need. Stay tuned for announcements of our hardware partners at VMworld.

      • Jason Nolet says:

        To Bruce’s point, customers are asking for a way to interconnect their virtual and physical data center assets in as seamless a way as possible, and a VTEP as a feature of a ToR or aggregation switch is a natural way to tackle this need with high levels of performance and scalability. The other point to make here is that, even in the presence of a virtual network, the physical network remains critical to the success of the application. Throughput, latency, efficiency, automation and reliability are just as important in this new world as they were before. In fact, one could argue that automation and simplicity will be the new battleground for physical network infrastructure, given that overlays add yet another layer of networking to the infrastructure.

  3. Nice post Bruce! Interesting looking switch you used in that diagram?

  4. qthrul says:

    Disclosure: I’m with VCE

    Cool post. I see this physicality requirement come up again and again.

    So, I was wondering if you could contrast or compare the operational impact (reasonable ability to achieve) in order to leverage a physically separated IDS/IPS hardware appliance when using:

    a) gateway capabilities you outline

    b) Cisco Nexus 1000v (ERSPAN)

    c) Cisco Nexus 1100

    d) a combination of all of the above

    (Please feel free to make assumptions like vSphere 5.1 or greater and a pre Cisco DFA data center scenario if that simplifies the discussion)

