Posted: January 13, 2015
By Justin Pettit, Ben Pfaff, Chris Wright, and Madhu Venugopal
Today we are excited to announce Open Virtual Network (OVN), a new project that brings virtual networking to the OVS user community. OVN complements the existing capabilities of OVS to add native support for virtual network abstractions, such as virtual L2 and L3 overlays and security groups. Just like OVS, our design goal is to have a production quality implementation that can operate at significant scale.
Why are we doing this? The primary goal in developing Open vSwitch has always been to provide a production-ready low-level networking component for hypervisors that could support a diverse range of network environments. As one example of the success of this approach, Open vSwitch is the most popular choice of virtual switch in OpenStack deployments. To make OVS more effective in these environments, we believe the logical next step is to augment the low-level switching capabilities with a lightweight control plane that provides native support for common virtual networking abstractions.
To achieve these goals, OVN’s design is narrowly focused on providing L2/L3 virtual networking. This distinguishes OVN from general-purpose SDN controllers or platforms.
OVN is a new project from the Open vSwitch team to support virtual network abstraction. OVN will put users in control over cloud network resources, by allowing users to connect groups of VMs or containers into private L2 and L3 networks, quickly, programmatically, and without the need to provision VLANs or other physical network resources. OVN will include logical switches and routers, security groups, and L2/L3/L4 ACLs, implemented on top of a tunnel-based (VXLAN, NVGRE, Geneve, STT, IPsec) overlay network.
OVN aims to be sufficiently robust and scalable to support large production deployments. OVN will support the same virtual machine environments as Open vSwitch, including KVM, Xen, and the emerging port to Hyper-V. Container systems such as Docker are growing in importance but pose new challenges in scale and responsiveness, so we will work with the container community to ensure quality native support. For physical-logical network integration, OVN will implement software gateways, as well as support hardware gateways from vendors that implement the “vtep” schema that ships with OVS.
Northbound, we will work with the OpenStack community to integrate OVN via a new plugin. The OVN architecture will simplify the current Open vSwitch integration within Neutron by providing a virtual networking abstraction. OVN will provide Neutron with improved dataplane performance through shortcut, distributed logical L3 processing and in-kernel based security groups, without running special OpenStack agents on hypervisors. Lastly, it will provide a scale-out and highly available gateway solution responsible for bridging from logical into physical space.
The Open vSwitch team will build and maintain OVN under the same open source license terms as Open vSwitch, and is open to contributions from all. The outline of OVN’s design is guided by our experience developing Open vSwitch, OpenStack, and Nicira/VMware’s networking solutions. We will evolve the design and implementation in the Open vSwitch mailing lists, using the same open process used for Open vSwitch.
OVN will not require a special build of OVS or OVN-specific changes to ovs-vswitchd or ovsdb-server. OVN components will be part of the Open vSwitch source and binary distributions. OVN will integrate with existing Open vSwitch components, using established protocols such as OpenFlow and OVSDB, using an OVN agent that connects to ovs-vswitchd and ovsdb-server. (The VTEP emulator already in OVS’s “vtep” directory uses a similar architecture.)
OVN is not a generic platform or SDN controller on which applications are built. Rather, OVN will be purpose built to provide virtual networking. Because OVN will use the same interfaces as any other controller, OVS will remain as flexible and unspecialized as it is today. It will still provide the same primitives that it always has and continue to be the best software switch for any controller.
The design and implementation will occur on the ovs-dev mailing list. In fact, a high-level description of the architecture was posted this morning. If you’d like to join the effort, please check out the mailing list.
Posted: November 13, 2014
[This post was written by OVS core contributors Justin Pettit, Ben Pfaff, and Ethan Jackson.]
The overhead associated with vSwitches has been a hotly debated topic in the networking community. In this blog post, we show how recent changes to OVS have elevated its performance to be on par with the native Linux bridge. Furthermore, CPU utilization of OVS in realistic scenarios can be up to 8x below that of the Linux bridge. This is the first of a two-part series. In the next post, we take a peek at the design and performance of the forthcoming port to DPDK, which bypasses the kernel entirely to gain impressive performance.
Open vSwitch is the most popular network back-end for OpenStack deployments and widely accepted as the de facto standard OpenFlow implementation. Open vSwitch development initially had a narrow focus — supporting novel features necessary for advanced applications such as network virtualization. However, as we gained experience with production deployments, it became clear these initial goals were not sufficient. For Open vSwitch to be successful, it not only must be highly programmable and general, it must also be blazingly fast. For the past several years, our development efforts have focused on precisely this tension — building a software switch that does not compromise on either generality or speed.
To understand how we achieved this feat, one must first understand the OVS architecture. The forwarding plane consists of two parts: a “slow-path” userspace daemon called ovs-vswitchd and a “fast-path” kernel module. Most of the complexity of OVS is in ovs-vswitchd; all of the forwarding decisions and network protocol processing are handled there. The kernel module’s only responsibilities are tunnel termination and caching traffic handling.
When a packet is received by the kernel module, its cache of flows is consulted. If a relevant entry is found, then the associated actions (e.g., modify headers or forward the packet) are executed on the packet. If there is no entry, the packet is passed to ovs-vswitchd to decide the packet’s fate. ovs-vswitchd then executes the OpenFlow pipeline on the packet to compute actions, passes it back to the fast path for forwarding, and installs a flow cache entry so similar packets will not need to take these expensive steps.
Until OVS 1.11, this fast-path cache contained exact-match “microflows”. Each cache entry specified every field of the packet header, and was therefore limited to matching packets with this exact header. While this approach works well for most common traffic patterns, unusual applications, such as port scans or some peer-to-peer rendezvous servers, would have very low cache hit rates. In this case, many packets would need to traverse the slow path, severely limiting performance.
OVS 1.11 introduced megaflows, enabling the single biggest performance improvement to date. Instead of a simple exact-match cache, the kernel cache now supports arbitrary bitwise wildcarding. Therefore, it is now possible to specify only those fields that actually affect forwarding.. For example, if OVS is configured simply to be a learning switch, then only the ingress port and L2 fields are relevant and all other fields can be wildcarded. In previous releases, a port scan would have required a separate cache entry for, e.g., each half of a TCP connection, even though the L3 and L4 fields were not important.
The introduction of megaflows allowed OVS to drastically reduce the number of packets that traversed the slow path. This represents a major improvement, but ovs-vswitchd still had a number of responsibilities, which became the new bottleneck. These include activities like managing the datapath flow table, running switch protocols (LACP, BFD, STP, etc), and other general accounting and management tasks.
While the kernel datapath has always been multi-threaded, ovs-vswitchd was a single-threaded process until OVS 2.0. This architecture was pleasantly simple, but it suffered from several drawbacks. Most obviously, it could use at most one CPU core. This was sufficient for hypervisors, but we began to see Open vSwitch used more frequently as a network appliance, in which it is important to fully use machine resources.
Less obviously, it becomes quite difficult to support ovs-vswitchd’s real-time requirements in a single-threaded architecture. Fast-path misses from the kernel must be processed by ovs-vswitchd as promptly as possible. In the old single-threaded architecture, miss handling often blocked behind the single thread’s other tasks. Large OVSDB changes, OpenFlow table changes, disk writes for logs, and other routine tasks could delay handling misses and degrade forwarding performance. In a multi-threaded architecture, soft real-time tasks can be separated into their own threads, protected from delay by unrelated maintenance tasks.
Since the introduction of multithreading, we’ve continued to fine-tune the number of threads and their responsibilities. In addition to order of magnitudes improvements in miss handling performance, this architecture shift has allowed us to increase the size of the kernel cache from 1000 flows in early versions of OVS to roughly 200,000 in the most recent version.
The introduction of megaflows and support for a larger cache enabled by multithreading reduced the need for packets to be processed in userspace. This is important because a packet lookup in userspace can be quite expensive. For example, a network virtualization application might define an OpenFlow pipeline with dozens of tables that hold hundreds of thousands of rules. The final optimization we will discuss is improvements in the classifier in ovs-vswitchd, which is responsible for determining which OpenFlow rules apply when processing a packet.
Data structures to allow a flow table to be quickly searched are an active area of research. OVS uses a tuple space search classifier, which consists of one hash table (tuple) for each kind of match actually in use. For example, if some flows match on the source IP, that’s represented as one tuple, and if others match on source IP and destination TCP port, that’s a second tuple. Searching a tuple space search classifier requires searching each tuple, then taking the highest priority match. In the last few releases, we have introduced a number of novel improvements to the basic tuple search algorithm:
- Priority Sorting – Our simplest optimization is to sort the tuples in order by the maximum priority of any flow in the tuple. Then a search that finds a matching flow with priority P can terminate as soon as it arrives at a tuple whose maximum priority is P or less.
- Staged Lookup – In many network applications, a policy may only apply to a subset of the headers. For example, a firewall policy that requires looking at TCP ports may only apply to a couple of VMs’ interfaces. With the basic tuple search algorithm, if any rule looks at the TCP ports, then any generated megaflow would look all the way up through the L4 headers. With staged lookup, we scope lookups from metadata (e.g., ingress port) up to layers further up the stack on an as-needed basis.
- Prefix Tracking – When processing L3 traffic, a longest prefix match is required for routing. The tuple-space algorithm works poorly in this case, since it degenerates into a tuple that matches as many bits as the longest match. This means that if one rule matches on 10.x.x.x and another on 192.168.0.x, the 10.x.x.x rule will also require matching 24 bits instead of 8, which requires keeping more megaflows in the kernel. With prefix tracking, we consult a trie that allows us to only look at tuples with the high order bits sufficient to differentiate between rules.
These classifier improvements have been shown with practical rule sets to reduce the number of megaflows needed in the kernel from over a million to only dozens.
The preceding changes improve many aspects of performance. For this post, we’ll just evaluate the performance gains in flow setup, which was the area of greatest concern. To measure setup performance, we used netperf’s TCP_CRR test, which measures the number of short-lived transactions per second (tps) that can be established between two hosts. We compared OVS to the Linux bridge, a fixed-function Ethernet switch implemented entirely inside the Linux kernel.
In the simplest configuration, the two switches achieved identical throughput (18.8 Gbps) and similar TCP_CRR connection rates (696,000 tps for OVS, 688,000 for the Linux bridge), although OVS used more CPU (161% vs. 48%). However, when we added one flow to OVS to drop STP BPDU packets and a similar ebtable rule to the Linux bridge, OVS performance and CPU usage remained constant whereas the Linux bridge connection rate dropped to 512,000 tps and its CPU usage increased over 26-fold to 1,279%. This is because the built-in kernel functions have per-packet overhead, whereas OVS’s overhead is generally fixed per-megaflow. We expect that enabling other features, such as routing and a firewall, would similarly add CPU load.
|Linux Bridge||Pass BPDUs||688,000||48%|
|Open vSwitch 1.10||12,000||100%|
|Open vSwitch 2.1||Megaflows Off||23,000||754%|
While these performance numbers are useful for benchmarking, they are synthetic. At the OpenStack Summit last week in Paris, Rackspace engineers described the performance gains they have seen over the past few releases in their production deployment. They begin in the “Dark Ages” (versions prior to 1.11) and proceed to “Ludicrous Speed” (versions since 2.1).
Now that OVS’s performance is similar to those of a fixed-function switch while maintaining the flexibility demanded by new networking applications, we’re looking forward to broadening our focus. While we continue to make performance improvements, the next few releases will begin adding new features such as stateful services and support for new platforms such as DPDK and Hyper-V.
If you want to hear more about this live or talk to us in person, we will all be at the Open vSwitch Fall 2014 Conference next week.