[A lot of the content of this post was drawn from conversations with Juan Lage, Rajiv Ramanathan, and Mohammad Attar]
Some of the more aggressive buzz around OpenFlow has lauded it as a mechanism for re-implementing networking in total. While in some (fairly fanciful) reality, that could be the case, categorical statements of this nature tend to hide practical design trade-offs that exist between any set of technologies within a design continuum.
What do I mean by that? Just that there are things that OpenFlow and more broadly SDN are better suited for than others.
In fact, the original work that led to OpenFlow was not meant to re-implement all of networking. That’s not to say that this isn’t a worthwhile (if not quixotic) goal. Yet our focus was to explore new methods for datapath state whose management was difficult using the traditional approach of full distribution with eventual consistency.
And what sort of state might that be? Clearly destination-based, shortest path forwarding state can be calculated in a distributed fashion. But there is a lot more state in the datapath beyond that used for standard forwarding (filters, tagging, policy routing, QoS policy, etc.). And there are a lot more desired uses for networks than vanilla destination-based forwarding.
Of course, much of this “other” state is not computed algorithmically today. Rather, it is updated manually, or through scripts whose function more closely resembles macro replacement than computation.
Still, our goal was (and still is) to compute this state programmatically.
So the question really boils down to whether the algorithm needed to compute the datapath state is easily distributed. For example, if the state management algorithm has any of the following properties it probably isn’t, and is therefore a potential candidate for SDN.
- It is not amenable to being split up into many smaller pieces (as opposed to fewer, larger instances). The limiting property in these cases is often excessive communication overhead between the control nodes.
- It is not amenable to running on heterogeneous compute environments, for example those with varying processor speeds and available memory.
- It is not amenable to relatively long RTTs for communication between distributed instances. In the purely distributed case, the upper bound for communicating between any two nodes scales linearly with the longest loop-free path.
- The algorithm requires sophisticated distributed coordination between instances (for example, distributed locking or leader election).
There are many examples of algorithms that have these properties which can be (and are) used in networking. The one I generally use as an example is a runtime policy compiler. One of our earliest SDN implementations was effectively a Datalog compiler that would take a topologically independent network policy and compile it into flows (in the form of ACL and policy routing rules).
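To make the idea of compiling a topology-independent policy into flows a bit more concrete, here is a toy sketch in Python. It is not Datalog and it is not our implementation. The policy format, the group names, the addresses, and the `compile_policy` function are all made up for illustration.

```python
from itertools import product

# Hypothetical, topology-independent policy: rules are written over group
# names, not over addresses or switch ports.
POLICY = [
    ("web", "db", "tcp", 5432, "allow"),
    ("guest", "db", "any", None, "deny"),
]

# Hypothetical bindings from group names to the addresses currently in them.
# In a running network these bindings would change as hosts come and go.
GROUPS = {
    "web": ["10.0.1.10", "10.0.1.11"],
    "db": ["10.0.2.20"],
    "guest": ["10.0.9.5"],
}


def compile_policy(policy, groups):
    """Expand each group-level rule into concrete ACL-style flow entries."""
    flows = []
    for src_group, dst_group, proto, port, action in policy:
        # One entry per (source address, destination address) combination.
        for src, dst in product(groups[src_group], groups[dst_group]):
            flows.append(
                {"src": src, "dst": dst, "proto": proto, "port": port, "action": action}
            )
    return flows


if __name__ == "__main__":
    for flow in compile_policy(POLICY, GROUPS):
        print(flow)
```

Even in this toy form, the input to the computation is global (every rule plus every binding), and the output has to be recomputed whenever a binding changes. That is roughly why this kind of algorithm doesn't split naturally into small pieces running on each switch.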
Other examples include implementing global solvers to optimize routes for power, cost, security policy, etc., and managing distributed virtual network contexts.
The distribution properties of most of these algorithms are well understood (and have been for decades). And so it is fairly straightforward to put together an argument which demonstrates that SDN offers advantages over traditional approaches in these environments. In fact, outside of OpenFlow, it isn’t uncommon to see elements of the control plane decoupled from the dataplane in security, management, and virtualization products.
However, networking’s raison d’être, its killer app, is forwarding. Plain ol’ vanilla forwarding. And as we all know, the networking community long ago developed algorithms to do that which distribute wonderfully across many heterogeneous compute nodes.
So, that begs the question. Does SDN provide any value to the simple problem of forwarding? That is, if the sole purpose of my network is to move traffic between two end-points, should I use a trusty distributed algorithm (like an L3 stack) that is well understood and has matured for the last couple of decades? Or is there some compelling reason to use an SDN approach?
This is the question we’d like to explore in this post. The punchline (as always, for the impatient) is that I find it very difficult to argue that SDN has value when it comes to providing simple connectivity. That’s simply not the point in the design space that OpenFlow was created for. And simple distributed approaches, like L3 + ECMP, tend to work very well. On the other hand, in environments where transport is expensive along some dimension, and global optimization provides value, SDN starts to become attractive.
First, let’s take a look at the problem of forwarding:
Forwarding to me simply means find a path between a source and a destination in a network topology. Of course, you don’t want the path to suck, meaning that the algorithm should efficiently use available bandwidth and not choose horribly suboptimal hop counts.
For the purposes of this discussion, I’m going to assume two scenarios in which we want to do forwarding: (a) let’s assume that bandwidth and connectivity are cheap and plentiful and that all paths between two points are roughly equivalent; (b) let’s assume none of these properties.
We’ll start with the latter case.
In many networks, not all paths are equal. Some may be more expensive than others due to costs of third party transit, some may be more congested than others leading to queuing delays and loss, different paths may support different latencies or maximum bandwidth limits, and so on.
In some deployments, the forwarding problem can be further complicated by security constraints (all flows of type X must go through middleboxes) and other policy requirements.
Further, some of these properties change in real time. Take for example the cost of transit. The price of a link could increase dramatically after some threshold of use has been hit. Or consider how a varying traffic matrix may affect the queuing delay and available bandwidth of network paths.
Under such conditions, one can start to make an argument for SDN over traditional distributed routing protocols.
Why? For two reasons. First, the complexity of the computation increases with the number of properties being optimized over. It also increases with the complexity in the policy model, for example a policy that operates over source, protocol and destination is going to be more difficult than one that only considers destination.
Second, as the frequency with which these properties change increases, the amount of information that needs to be disseminated increases. An SDN approach can greatly limit the total amount of information that needs to hit the wire by reducing the distribution of the control nodes. Fully distributed protocols generally flood this information as it isn’t clear which node needs to know about the updates. There have been many proposals in the literature to address these problems, but to my knowledge they’ve seen little or no adoption.
I presume these costs are why a lot of TE engines operate offline where you can throw a lot of compute and memory at the problem.
So, if optimality is really important (due to the cost of getting it wrong), and the inputs change a lot, and there is a lot of stuff being optimized over, SDN can help.
Now, what about the case in which bandwidth is abundant, relatively cheap and any sensible path between two points will do?
The canonical example for this is the datacenter fabric. In order to accommodate workload placement anywhere, datacenter physical networks are often built using non-oversubscribed topologies. And the cost of equipment to build these is plummeting. If you are comfortable with an extremely raw supply channel, it’s possible to get 48 ports of 10G today for under $5k. That’s pretty damn cheap.
So, say you’re building a datacenter fabric, you’ve purchased piles of cheap 10G gear and you’ve wired up a fat tree of some sort. Do you then configure OSPF with ECMP (or some other multipathing approach) and be done with it? Or does it make sense to attempt to use an SDN approach to compute the forwarding paths?
It turns out that efficiently calculating forwarding paths in a highly connected graph, and converging those paths on failure is something distributed protocols do really, really well. So well, in fact, that it’s hard to find areas to improve.
For example, the common approach of using multipathing approximates Valiant load balancing which effectively means the following: if you send a packet to an arbitrary point in the network, and that point forwards it to the destination, then for a regular topology, and any traffic matrix, you’ll be pretty close to fully using the fabric bandwidth (within a factor of two given some assumptions on flow arrival rates).
That’s a pretty stunning statement. What’s more stunning is that it can be accomplished with local decisions, and without requiring per-flow state, or any additional control overhead in the network. It also obviates the need for a control loop to monitor and respond to changes in the network. This latter property is particularly nice as the care and feeding of control loops to prevent oscillations or divergence can add a ton of complexity to the system.
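As a rough sketch of what "local decisions without per-flow state" can look like, each switch can hash a flow's header fields and use the result to pick among its equal-cost next hops. The field names, the choice of SHA-256, the switch names, and the `pick_next_hop` helper are assumptions for illustration. Real switches do this in hardware with their own hash functions.

```python
import hashlib


def pick_next_hop(flow, next_hops):
    """Pick an equal-cost next hop using only a hash of the flow's headers.

    No per-flow table is kept. The same 5-tuple always hashes to the same
    next hop, so all packets of one flow follow one path.
    """
    key = "|".join(str(flow[f]) for f in ("src", "dst", "proto", "sport", "dport"))
    digest = hashlib.sha256(key.encode()).digest()
    index = int.from_bytes(digest[:4], "big") % len(next_hops)
    return next_hops[index]


if __name__ == "__main__":
    # Hypothetical uplinks from a leaf switch to four spine switches.
    uplinks = ["spine-1", "spine-2", "spine-3", "spine-4"]
    counts = {hop: 0 for hop in uplinks}
    for sport in range(1000):
        flow = {"src": "10.0.1.10", "dst": "10.0.2.20", "proto": 6,
                "sport": 10000 + sport, "dport": 443}
        counts[pick_next_hop(flow, uplinks)] += 1
    print(counts)  # flows should spread roughly evenly across the uplinks
```

Nothing here needs a controller, a control loop, or any knowledge beyond the switch's own list of next hops, which is the property the fabric case leans on.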
On the other hand, trying to scale a solution using classic OpenFlow almost certainly won’t work. To begin with the use of n-tuples (say per-flow, or even per host/destination pair) will most likely result in table space exhaustion. Even with very large tables (hundreds of thousands) the solution is unlikely to be suitable for even moderately sized datacenters.
Further, to efficiently use the fabric, multipathing should be done per flow. It’s highly unlikely (in my experience) that having the controller participate in flow setup will have the desired performance and scale characteristics. Therefore the multipathing should be done in the fabric (which is possible in a later version of OpenFlow like 1.1 or the upcoming 1.2).
Given these constraints, an SDN approach would probably look a lot like a traditional routing protocol. That is, the resulting state would most likely be destination IP prefixes (so we can take advantage of aggregation and reduce the table requirements by a factor of N over source-destination pairs). Further, multipathing, and link failure detection would have to be done on the switch.
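A back-of-the-envelope sketch of the table-size argument, assuming the worst case where every host talks to every other host and ignoring prefix aggregation entirely (which would only help the destination-based case further). The `table_entries` helper and the host count are illustrative, not measurements.

```python
def table_entries(num_hosts):
    """Return (destination-based entries, source-destination pair entries)."""
    per_destination = num_hosts                  # one entry per destination
    per_pair = num_hosts * (num_hosts - 1)       # one entry per ordered pair
    return per_destination, per_pair


if __name__ == "__main__":
    dest, pairs = table_entries(10_000)
    print(dest, pairs, pairs // dest)
```

For 10,000 hosts that is 10,000 entries versus 99,990,000, so even a table with hundreds of thousands of entries is nowhere near enough for the pairwise case.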
Another complication of using SDN is establishing connectivity to the controller. This clearly requires each switch to run something to bootstrap communication, like a traditional protocol. In most SDN implementations I know of, L3 is used for this purpose. So now, not only are we effectively mimicking L3 in the controller to manage datapath state, we haven’t been able to get rid of the distributed approach potentially doubling the control complexity network wide.
So, does SDN provide value for forwarding in these environments? Given the previous discussion it is difficult to argue in favor of SDN.
Does SDN provide more functionality? Unlikely, being limited to manipulating destination prefixes with multipathing being carried out by switches robs SDN of much of its value. The controller cannot make decisions on anything but the destination. And since the controller doesn’t know a priori which path a flow will take, it will be difficult to add additional table rules without replicating state all over the network (clearly a scalability issue).
How about operational simplicity? One might argue that in order to scale an IGP one would have to manually configure areas which presumably would be done automatically with SDN. However, it is difficult to argue that building a separate control network, or in-band-configuring is any less complex than a simple ID to switch mapping.
What about scale? I’ll leave the details of that discussion to another post. Juan Lage, Rajiv Ramanathan, and I have had an on-again, off-again e-mail discussion comparing the scaling properties of SDN to those of L3 for building a fabric. The upshot is that there are nice proof-points on both sides, but given today’s switching chipsets, even with SDN, you basically end up populating the same tables that L3 does. So any scale argument revolves around reducing the RTT to the controller through a single-function control network, and reducing the need to flood. Neither of these tricks is likely to produce a significant scale advantage for SDN, but they seem to produce some.
So, what’s the take-away?
It’s worth remembering that SDN, like any technology, is actually a point in the design space, and is not necessarily the best option for all deployment environments.
My mental model for SDN starts with looking at the forwarding state, and asking the question “what algorithm needs to run to compute that state”. The second question I ask is, “can that algorithm be easily distributed amongst many nodes with different compute and memory resources”. In many use cases, the answer to this latter question is “not-really”. The examples I provide generally reduce to global solvers used for finding optimal solutions with many (often changing) variables. And in such cases, SDN is a shoo-in.
However, in the case of building out networking fabrics in which bandwidth is plentiful and shortest path (or something similar) is sufficient, there are a number of great, well tested distributed algorithms for the job. In such environments, it’s difficult to argue the merits of building out a separate control network, adding additional control nodes, and running vastly less mature software.
But perhaps, that’s just me. Thoughts?
For those of you who are interested, my keynote at last week’s Open Networking Summit provides some background on OpenFlow and SDN in the form of a historical narrative. However, the main point of the talk (which doesn’t come across as well as I would have liked) is that it really is the community, and not necessarily the technology, that makes the SDN movement so special. I believe the technology will work itself out. However, building a diverse community with strong representation across the networking ecosystem (from ODMs, to customers, and everyone in between) is a very difficult undertaking. And now that we have such a community let’s be sure to acknowledge its importance and focus on continuing to cultivate and grow it.
And for some only tangentially related trivia, I’ve been tracking down the origins of the term ‘SDN’. From what I’ve been able to dig up, it was coined by Kate Greene (who had covered software defined radio) while putting together this article. So, thanks Kate.
[This post is written by Teemu Koponen, and Martin Casado. Teemu has architected and been involved in the implementation of multiple widely used OpenFlow controllers.]
Over the last year, there has been some discussion within the OpenFlow and broader SDN community about designing and standardizing a controller API in addition to the switch interface.
To be honest, standardizing a controller API sounds like a poor idea. The controller is software, not a protocol. To our knowledge there are 9 controllers in the wild today, and more than half of them are open source. If the community wants an open software platform, it should divert resources from arguing over standards to building open source software.
(yeah ok, this comic is only tangentially relevant … )
In addition to being a waste of time, premature standardization could be detrimental to the SDN effort. Minimally, it will unnecessarily constrain a fledgling software ecosystem. Software innovation should come through development and experience with deployment, not through consensus built on prophecies.
Further, it’s not clear how to design and standardize an interface to a (non-existent) large software system out of whole cloth. Even Unix which started in the early 70s did not get standardized until the mid/late 80s.
For SDN in particular, there is really very little experience building controller systems in the wild today. We (the authors) have built four controllers, three of which have seen production use, and two of which have multiple products built on them from different organizations. And still, we would certainly not argue for standardizing a particular interface. Hell, we’re not even clear if there is a one-size-fits-all interface for controllers targeted at drastically different deployment environments. In fact, our experience suggests the contrary.
In our opinion, if an interface is going to be standardized, it should follow the software model of standardization. Either (a) take a system that has demonstrated broad success and generality over many years and call it a standard, or (b) during the design phase of a particular software project, have the *developers* work to create a standard API with the appropriate community feedback (which is probably product and system developers).
But most importantly, and it really bears repeating, we all win if we let an unbridled software ecosystem flourish within the controller space.
OK, so while we think any discussion about standardizing a controller interface is extremely premature, the question of “what makes a good controller API” is absolutely crucial to SDN and very much worth discussing. However, there has been relatively little discourse on controller APIs. And that’s exactly what we’ll be focusing on in the rest of this post. We’ll start by providing a brief look at common controller designs, and then take a closer look at Onix, the controller platform we work on.
Common Controller Types
If you survey the existing controller landscape there appear to be three broad classes of API.
Single Purpose Controllers: These controllers are built for a specific function. So while they may use some common code (like an OpenFlow library) to communicate with the switch, they are otherwise not built with different uses in mind. Or put another way, they lack a control-logic agnostic interface for extension, so there really is little more to talk about.
OpenFlow Controllers: Yeah, that is probably a confusing name, but bear with us. A number of controllers provide a high-level interface to OpenFlow (specifically) and some infrastructure for hosting one or more control “applications”. These are general purpose platforms meant for extension. However, the interface is built around the OpenFlow protocol itself, meaning the interface is a thin wrapper around each message, without offering higher-level abstractions. Therefore, as the protocol changes (e.g., a new message is added), the API necessarily reflects this change. Most controllers (including one we’ve written) fall in this camp.
In our experience, the primary problem with this tight coupling is that it is challenging to evolve any aspect of the control application, switch protocol, or controller state distribution (assuming the controllers form a cluster) without refactoring all the layers of the software stack in tandem.
General SDN Controllers: We’re sort of making these names up as we go along, but a general SDN controller is one in which the control application is fully decoupled from both the underlying protocol(s) communicating with the switches, and the protocol(s) exchanging state between controller instances (assuming the controllers can run in a cluster).
The decoupling is achieved by turning the network control problem into a state management problem, and only exposing the state to be managed to the application, and not the mechanisms by which it arrived.
Applications operate over control traffic, flow table state, configuration state, port state, counters, etc. However, how this data is pulled into the controller, and how any changes are pushed back down to the switch is not the application’s business. It can just happily modify network state as if it were local and the platform will take care of the state dissemination.
The platform we’ve been working on over the last couple of years (Onix) is of this latter category. It supports controller clustering (distribution), multiple controller/switch protocols (including OpenFlow) and provides a number of API design concessions to allow it to scale to very large deployments (tens or hundreds of thousands of ports under control). Since Onix is the controller we’re most familiar with, we’ll focus on it.
So, what does the Onix API look like? It’s extremely simple. In brief, it presents the network to applications as an eventually consistent graph that is shared among the nodes in the controller cluster. In addition, it provides applications with distributed coordination primitives for coordinating nodes’ access to that graph.
Conceptually the interface is as straightforward as it sounds. However, understanding its use in practice, especially in a distributed setting, bears more discussion.
So, digging a little further …
The Network as an Eventually Consistent Graph
In Onix, the physical network is represented as a graph for the control application. Elements in the graph may represent physical objects such as switches, or their subparts, such as physical ports, or lookup tables. And it may contain logical entities such as logical ports, tunnels, BFD configuration, etc.
Control applications operate on the graph to find elements of interest and read and write to them. They can register for notifications about state changes in the elements, and they can even extend the existing elements to hold network state specific to a particular control logic.
We happen to call this graph “the NIB” (for Network Information Base).
The NIB abstracts away both the protocols for state synchronization between the controller and the switches and the protocols used for state synchronization between controllers (discussed further below).
Regarding switch state, the control applications operate on the state directly, independent of how it is synchronized with the switch. For example, control logic may traverse the NIB looking for a particular switch, modify some forwarding table entries, and then place a trigger to receive notification on any future changes (such as a port status change event).
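To give a feel for that find, modify, and trigger pattern, here is a toy, single-process sketch in Python. It is emphatically not the Onix API. The `Nib` class, its method names, and the entity ids are hypothetical, and everything to do with synchronizing state with switches or other controllers is left out.

```python
from collections import defaultdict


class Nib:
    """Toy, in-memory stand-in for a network information base."""

    def __init__(self):
        self._entities = {}                  # entity id -> attribute dict
        self._triggers = defaultdict(list)   # entity id -> list of callbacks

    def add(self, entity_id, **attrs):
        self._entities[entity_id] = dict(attrs)

    def find(self, predicate):
        """Return ids of entities whose attributes satisfy the predicate."""
        return [eid for eid, attrs in self._entities.items() if predicate(attrs)]

    def read(self, entity_id, key):
        return self._entities[entity_id][key]

    def write(self, entity_id, key, value):
        """Update local state and notify anyone watching this entity.

        A real platform would also propagate the change toward the switch
        and other controller nodes; that is deliberately omitted here.
        """
        self._entities[entity_id][key] = value
        for callback in self._triggers[entity_id]:
            callback(entity_id, key, value)

    def on_change(self, entity_id, callback):
        self._triggers[entity_id].append(callback)


if __name__ == "__main__":
    nib = Nib()
    nib.add("switch-1", kind="switch", flow_table=[])
    nib.add("switch-1/port-1", kind="port", status="up")

    # Control logic: find a switch, modify its forwarding table, then
    # register for notification of future port status changes.
    switch = nib.find(lambda attrs: attrs.get("kind") == "switch")[0]
    nib.write(switch, "flow_table", [{"dst": "10.0.2.0/24", "out_port": 1}])
    nib.on_change("switch-1/port-1", lambda eid, key, value: print(eid, key, value))

    # Stand-in for the platform reporting a port going down.
    nib.write("switch-1/port-1", "status", "down")
```

The control logic only ever touches the graph. How a write reaches the switch, or how a port event reaches the graph, is the platform's problem.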
Doing this within a single controller node is pretty straightforward. However, for resilience and scale, most production deployments will want to run with multiple controllers. Hence, the platform needs to support clustering.
Within Onix, the NIB is the central mechanism for distributed state sharing. That is, the NIB is actually a shared datastructure among the nodes in the cluster. If a node updates the NIB, eventually that update will propagate to the other nodes interested in that portion of the graph (assuming no network failures).
Under this model, the control logic remains unaware of the specifics of the distribution mechanisms for controller-to-controller state. However, it is the responsibility of the application logic to coordinate which controller instance is responsible for which portion of the graph at any given time. To aid with this, Onix includes a number of standard distributed coordination primitives such as distributed locking, leader election, etc. Using them, control applications can safely partition the work amongst the cluster.
Because the control logic is decoupled from the distribution mechanisms, Onix may be configured to use different distributed datastores to share different parts of the graph. For more dynamic state, it may use a high-performance, memory-only key/value store, whereas for more static information (such as admin configured parts of the graph) it may prefer storage that is strongly consistent and provides durability.
Of course, a design like this pushes a lot of complexity to the application. For example, the application is responsible for all distributed coordination (though tools for doing so are provided), and if the application chooses to use an eventually consistent storage back-end, it must tolerate eventual consistency in the NIB (including conflicts, data disappearing, etc.).
An alternative approach would have been to constrain the distribution model to something simple (like strongly consistent updates to all data) which would have greatly simplified the API. However, this would come at a huge cost to scalability.
Which begs the question …
Is This the Right Interface?
Perhaps for some environments. Almost certainly not for others. There have been multiple control applications built on Onix, and it is used in large production deployments in the data centers, as well as in the access and core networks. However, it is probably too heavyweight for smaller networks (the home or small enterprise), and it is certainly too complex to use as a basic research tool.
The NIB is also a very low-level interface. Clearly, there is a lot of scaffolding one can build on top to aid in application development. For example, routing libraries, packet classification engines, network discovery components, configuration management frameworks, and what not.
And that is the beauty of software. If you don’t like an API, you fix it, build on it, or you build your own platform. Sure, as a platform matures, binary compatibility and stability of APIs becomes crucial, but that is more a matter of versioning than a priori standardization and design.
So our vote is to keep standards away from the controller design, to support and promote multiple efforts, and to let the ecosystem play out naturally.
Scott’s keynote at Ericsson research on SDN. I really encourage anyone who is interested in OpenFlow and/or SDN to view it. It is, in my opinion, the cleanest justification for SDN, and appropriately articulates where OpenFlow fits in the broader context (… as a minor mechanism that is capable, but generally unimportant).
Wow, over the last couple of weeks there has been an escalation in the confusion around Openflow. The problem is on both sides of the metaphorical aisle. Those behind the hype engine are spinning out of control with fanciful tales about bending the laws of physics (“Openflow will keep the ice cream in your refrigerator cold during a power outage!”). And those opposed to OpenFlow (for whatever reason) are using the hype to build non-existent strawmen to attack (“Openflow cannot bend the laws of physics, therefore it isn’t useful”).
So for this post, I figured I’d go through some of the dubious claims and bogus strawmen and try to beat some sense into the discussion. Each of the claims I address below was pulled from some article/blog/interview I ran into over the last few weeks.
Claim: “Openflow provides a more powerful interface for managing the network than exists today”
Patently false. APIs expose functionality; they don’t create it. Openflow is a subset of what switching chipsets offer today. If you want more power, sign an NDA with Broadcom and write to their SDK directly.
What Openflow does attempt to do is provide a standard interface above the box. If you believe that SDN is a powerful concept (and I do), then you do need some interface that is sufficiently expressive and hopefully widely supported.
This is almost not worth saying, but clearly Openflow is more expressive than a CLI. CLIs are fine for manual configuration, but they totally blow for building automated systems. The shortcomings are blindingly obvious, for example traditional CLIs have no clear data schema or state semantics and they change all the time because the designers are trying to solve an HCI not an API versioning problem. However, I don’t think there is any real disagreement on this point.
Claim: “Openflow will make hardware simpler”
I highly doubt this. Even with Openflow, a practical network deployment needs a bunch of stuff. It needs to do lookups over common headers (minimally L2/L3/L4), it may need hash-based multi-pathing, it may need port groups, it may need tunneling, it may need fine-grained link timers. If you grab a chip from Broadcom, I’m not exactly sure what you’d throw away if using Openflow.
What Openflow may discourage is stupid shit like creating new tags as an excuse for hardware stickiness (“you want new feature X? It’s implemented using the Suxen tag(tm) and only our new chip supports it.”). This is because an Openflow-like interface can effectively flatten layers. For example, I don’t have to use the MPLS tag for MPLS per se. I can use it for network virtualization (for instance) or for identifying aggregates that correspond to the same security group. However, that doesn’t mean hardware is simpler. Just that the design isn’t redundant to fulfill a business need. (Post facto note: There are some great points regarding this issue in the comments. We’ll work to spin this out into a separate post.)
Or more to the point. There are an awful lot of fields and bits in a header. And an awful lot of lookup capacity in a switch. If you shuffle around what fields mean what, you can almost certainly do what you want without having to change how the hardware parses the packet.
Claim: “Openflow will commoditize the hardware layer”
This is total Kaiser Soze-esque nonsense (yup, that’s the picture at the top of the post). To begin with, networking hardware is already on its way to horizontal integration. The reason that Arista is successful has nothing to do with Openflow (they strictly don’t support it and never have), it has nothing to do with SDN, it’s because Ken Duda and Andy Bechtolsheim are total bad-asses and have built a great product. Oh, and because merchant silicon has come of age, meaning you don’t need an ASIC team to be successful in the market.
The difference between OpenFlow and what Arista supports is that with Openflow you choose to build to an industry standard interface, and not a proprietary one. Whether or not you care is, of course, totally up to you.
So Openflow does provide another layer of horizontal integration, and once the ecosystem develops, that is, in my opinion, a very good thing. But the ecosystem is still embryonic, so it will take some time before the benefits can be realized.
I think the power of merchant silicon and the rise of the “commodity” fabric are a far greater threat to the crusty scale-up network model. Oddly, Openflow has become a relatively significant distraction. That said, as this area matures, creating a horizontal layer “above the box” will grow in significance.
Claim: “Openflow does not map efficiently to existing hardware”
True! Older versions of Openflow did not map well to existing silicon chips. This is a *major* focus of the existing design effort; to make Openflow more flexible. As with any design effort, the trade-off is between having a future proof roadmap, and having a practical design that can have tangible benefits now. So we’re trying to thread the needle practically but with some foresight. It will take a little patience to see how this evolves.
Claim: “Openflow reduces complexity”
This is a meaningless statement. Openflow is like USB, it doesn’t “do” anything new. A case can be made for SDN to reduce complexity, but it does this by constraining the distribution model and providing a development platform that doesn’t suck. No magic there.
Claim: “Openflow will obviate the need for traditional distributed routing protocols”
I hear this a lot, but I just don’t buy it. There are certainly those within the Openflow community who disagree with me, so please take this for what it’s worth (very little …)
In my opinion, traditional distributed protocols are very good at what they do: scaling, routing packets in complex graphs, converging, etc. If I were to build a fabric, I’d sure as hell do it with a distributed routing protocol (again, not everyone agrees on this point). What traditional protocols suck at is distributing all the other state in the network. What state is that? If you look at the state in a switching chip, a lot of it isn’t traditionally populated through distributed protocols. For example, the ACL tables, the tunnels, many types of tags (e.g. VLAN), etc.
Going forward, it may be the case that distributed routing protocols get pulled into the controller. Why? Because controllers have much higher CPU density and therefore can run the computation for multiple switches. A multicore server will kick the crap out of the standard switch management CPU. Like I described in a previous post, this is no different than the evolution from distance vector to link state, it’s just taking it a step further. However, the protocol certainly is still distributed.
Claim: “OpenFlow cannot scale because of X”
I’ve addressed this at length in a previous post. Still, I’m going to have to call Doug Gourlay out. And I quote …
“I honestly don’t think it’s going to work,” Arista’s Gourlay said. “The flow setup rates on Stanford’s largest production OpenFlow network are 500 flows per second. We’ve had to deal with networks with a million flows being set up in seconds. I’m not sure it’s going to scale up to that.”
Other than being a basic logical fallacy (Stanford’s largest production OpenFlow network has absolutely nothing to do with the scaling properties of the OpenFlow or SDN architecture), there appears to be an implicit assumption that flow setups are in some way a limiting resource. This clearly isn’t the case if traffic never leaves the datapath, which is a valid (and popular) deployment model for OpenFlow. Doug’s a great guy, at a great company, but this is a careless statement, and it doesn’t help further the dialog.
Claim: “OpenFlow helps virtualize the network”
Again, this is almost a meaningless statement. A compiler helps virtualize the network if it is being used to write code to that effect. The fact is, the pioneers of network virtualization, such as VMware, Amazon, and Cisco, don’t use OpenFlow.
So yes, you can use OpenFlow for virtualization (that’s a help, I guess … ). And open standards are a good thing. But no, you certainly don’t need to.
OK, that’s enough for now.
To be very clear, I’m a huge fan of OpenFlow. And I’m a huge fan of SDN. Yet, neither is a panacea, and neither is a system or even a solution. One is an effort to provide a standardized interface for building awesome systems (OpenFlow). The other is a philosophical model for how to build those systems (SDN). It will be up to the systems themselves to validate the approach.
In my experience, it’s rare to have a discussion about SDN without someone getting their panties in a bunch over scaling. Never mind that SDN networks have already been implemented that scale to tens of thousands of ports. And never mind that distributed systems are built today that manage hundreds of thousands of entities and petabytes of data. It remains a perennial point of contention.
So here is a hand-wavy attempt at explaining the scaling limits of SDN. It is probably not totally convincing, but hopefully it offers some intuitive understanding.
But first! Some history: Unfortunately, I think I am partially to blame for the sorry state of affairs. In some of the earliest writeups we would describe an SDN controller as “logically centralized”. The intended meaning of this was something along the lines of “it has a centralized programmatic model, but is really distributed”.
Now, what does “logically centralized” actually mean? Nothing. It’s a nonsense term and the result of sloppy thinking. Either you’re centralized, or you’re distributed (thanks to Teemu Koponen for pointing that out). So by logically centralized, I guess we really meant “distributed”, and that is what we should have said. Whoops!
Clearly, a centralized controller cannot scale. Perhaps it can handle a really large network. Hell, I’ve heard claims of single node SDN networks that scale to thousands of switches and tens of thousands of ports. Whether or not that’s true, at some point either CPU or memory will run out.
However, controllers can (and should!) be distributed. And in that case, I would argue that there are no inherent bottlenecks. Trivially, you can distribute controllers such that the total amount of available CPU and memory for the control path is equivalent to the sum total of the management CPUs on the switches themselves. Controllers can distribute state amongst themselves like network nodes do today.
Still not convinced? Let’s deconstruct this …
Latency: How might SDN affect latency? That depends on how it is used. In some implementations, datapath traffic never leaves the hardware, in which case datapath latency should be identical to traditional networking. Other implementations will forward the first packet of a flow up to the controller and then cache the forwarding decision on the datapath. In this case, the flow setup will have to pay an RTT to the controller. The next paragraph discusses how much that might be.
Whether or not data traffic is sent to the controller, control traffic often is (for example BGP from a neighboring network, or the signalling of a port status change). In this case the additional overhead is the RTT to the controller + the cost of the computation to determine what to do next. How much that is will vary wildly by deployment. It’s possible to keep this sub-millisecond (200-300us is not unusual). I’ve seen as little as 70us claimed, but I doubt that could be maintained under any real load.
So you may not want to pay this latency per flow (though in many cases it probably doesn’t matter). However, there isn’t any appreciable overhead for responding to network events. And as I’ll describe later, it probably will reduce the total time needed to disseminate control traffic.
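To make the per-flow case concrete, here is a minimal sketch of the "first packet goes to the controller, then the decision is cached" model. Everything in it is an assumption for illustration: the `ReactiveDatapath` class, the `(source, destination)` flow key, and the toy controller policy are hypothetical and not part of any OpenFlow API.

```python
from typing import Callable, Dict, Tuple

# Hypothetical, simplified flow key: (source, destination).
FlowKey = Tuple[str, str]


class ReactiveDatapath:
    """Toy datapath that consults a controller only on a flow-table miss."""

    def __init__(self, controller: Callable[[FlowKey], str]) -> None:
        self.controller = controller
        self.flow_table: Dict[FlowKey, str] = {}
        self.controller_round_trips = 0

    def forward(self, key: FlowKey) -> str:
        action = self.flow_table.get(key)
        if action is None:
            # Miss: pay one round trip to the controller, then cache the decision.
            self.controller_round_trips += 1
            action = self.controller(key)
            self.flow_table[key] = action
        return action


def toy_controller(key: FlowKey) -> str:
    # Hypothetical policy: derive an output port name from the destination.
    return "port-" + key[1]


dp = ReactiveDatapath(toy_controller)
for _ in range(1000):
    dp.forward(("10.0.0.1", "10.0.0.2"))
print(dp.controller_round_trips)
```

Only the first of the thousand packets pays the controller RTT; the rest are served from the cached entry. A proactive deployment would simply pre-populate `flow_table` so that no packet ever misses.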
Datapath Scaling: The datapath is where the original OpenFlow design was broken. This is because OpenFlow used the abstraction of a switch datapath as a single TCAM. Clearly, this can be problematic for some forwarding rulesets. Assume, for example, that you are building an SDN application which would integrate with BGP, and map prefixes to tags with QoS policies. Implementing this within a single flow table would require (RIB * QoS rules) entries. Ouch.
However, later versions of OpenFlow support multiple tables which take care of this Cartesian explosion nightmare. Rather than having to multiply all the rules together, they can be placed in separate tables limiting the maximum size requirements of a single table (by a lot!).
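A rough sketch of why the numbers differ so much, under some loud assumptions: the rule sets below are made up, routing and QoS are treated as two independent match dimensions, and prefixes are matched as exact keys rather than by longest-prefix match. It is not how any particular switch lays out its tables.

```python
from itertools import product

# Hypothetical rule sets; names and values are illustrative only.
routes = {"10.0.0.0/8": "port1", "172.16.0.0/12": "port2", "192.168.0.0/16": "port3"}
qos = {"voice": "queue0", "video": "queue1", "bulk": "queue2", "default": "queue3"}

# Single table: one entry per (prefix, traffic class) combination.
single_table = {
    (prefix, cls): (routes[prefix], qos[cls])
    for prefix, cls in product(routes, qos)
}


def lookup_single(prefix: str, cls: str):
    return single_table[(prefix, cls)]


def lookup_pipeline(prefix: str, cls: str):
    # Table 0 resolves the output port, table 1 resolves the queue.
    return (routes[prefix], qos[cls])


print(len(single_table))       # entries needed with one table: 3 * 4
print(len(routes) + len(qos))  # entries needed with two tables: 3 + 4
```

Both lookups return the same result for every input, but the single table grows as the product of the two rule sets while the pipeline grows as their sum. With a full RIB on one side, that difference is what makes the single-table approach impractical.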
Still, many modern implementations of SDN dispense with the abstraction of a flow altogether and program the switch hardware tables directly. I think we’ll see a lot more of this going forward.
Convergence on failure: Alright, what happens on link (or node) failure? Traditionally, when a link fails, the information about the failed link is propagated through flooding. Flooding, of course, scales linearly with the length of the longest loop-free path. With SDN, instead of being flooded, the information goes to one of the controllers, which sends the updated routing tables to the affected switches.
A few things to note:
- With SDN, the controller should only be updating the switches whose tables have actually changed.
- With SDN, the total cost is link propagation to the controller + cost of computation + time to update the affected switches.
- In a distributed SDN case, an implementation may distribute the link update among the controller clusters, which then do the computation for their piece of the network.
Unfortunately for SDN, when in-band control is in play, things are a lot uglier. In-band control basically means that the datapath is used both for SDN control traffic and data. So a failure on the network may affect connectivity to the controller, which must be patched up somehow before the controller can fix the problem for the rest of the network.
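The first point in the list above can be sketched as follows: the controller recomputes next-hop tables after a link failure and pushes updates only to the switches whose tables differ. This is a toy under stated assumptions: unit link costs, BFS instead of a real routing computation, and a made-up six-switch topology. All function names are hypothetical.

```python
from collections import deque
from typing import Dict, Set

Graph = Dict[str, Set[str]]
Tables = Dict[str, Dict[str, str]]  # switch -> {destination: next hop}


def next_hops(graph: Graph) -> Tables:
    """Compute next-hop tables for every switch with BFS (unit link costs)."""
    tables: Tables = {node: {} for node in graph}
    for dst in graph:
        # BFS outward from the destination; the node that discovers a
        # switch is that switch's next hop toward the destination.
        seen = {dst}
        queue = deque([dst])
        while queue:
            current = queue.popleft()
            for neighbor in sorted(graph[current]):
                if neighbor not in seen:
                    seen.add(neighbor)
                    tables[neighbor][dst] = current
                    queue.append(neighbor)
    return tables


def remove_link(graph: Graph, a: str, b: str) -> Graph:
    updated = {node: set(neighbors) for node, neighbors in graph.items()}
    updated[a].discard(b)
    updated[b].discard(a)
    return updated


def affected_switches(before: Tables, after: Tables) -> Set[str]:
    """Switches whose tables changed and therefore need an update."""
    return {switch for switch in before if before[switch] != after[switch]}


# Hypothetical topology: a ring A-B-C-D with stubs E (on D) and F (on C).
topology: Graph = {
    "A": {"B", "D"}, "B": {"A", "C"}, "C": {"B", "D", "F"},
    "D": {"A", "C", "E"}, "E": {"D"}, "F": {"C"},
}
before = next_hops(topology)
after = next_hops(remove_link(topology, "A", "B"))
print(sorted(affected_switches(before, after)))
```

In this topology the stub switches E and F keep exactly the same tables after the A-B link fails, so the controller has nothing to send them.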
Patching up the control channel, in this case, is generally done with an IGP, so we’re strictly worse off than if we just used an IGP to begin with. Suck.
So bottom line, if convergence time is important, out of band control should be used. It’s simple to build a cheap, reliable out of band control network using traditional gear (the amount of traffic is minimal).
Computation and Memory: It’s easy to see that the latest multi-core server available from your friendly server vendor can beat the pants off of whatever crap embedded CPU you’ll find in most networking gear (800 MHz to 1 GHz is common).
I think Nick McKeown said it best. If we look historically, early routing protocols used distance vector routing algorithms (like RIP) which have very low computational requirements, but suck in a bunch of other ways (convergence times, count to infinity, etc.). As CPUs became more powerful, it was possible for each node to compute single source shortest path to all destinations on each failure, which is the standard approach used today.
Calculating more routes (e.g. all pairs shortest path, or multiple source to all destinations) on stronger CPUs really follows as a natural evolution. The clear “next step”. And a beefy server with lots of memory can compute *a lot* of routes, and quickly. Further, we already know how to distribute the algorithms.
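As a rough illustration of that "next step", the sketch below runs the same single-source shortest-path computation a link-state router performs, once per source, to get all-pairs distances on one machine. The graph, its costs, and the function names are hypothetical, and a real controller would also track paths and next hops, not just distances.

```python
import heapq
from typing import Dict

WeightedGraph = Dict[str, Dict[str, int]]


def single_source(graph: WeightedGraph, source: str) -> Dict[str, int]:
    """Dijkstra: shortest distance from source to every reachable node."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for neighbor, cost in graph[node].items():
            candidate = d + cost
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist


def all_pairs(graph: WeightedGraph) -> Dict[str, Dict[str, int]]:
    # What each switch would compute for itself, done centrally for all of them.
    return {source: single_source(graph, source) for source in graph}


# Hypothetical three-switch network with symmetric link costs.
network: WeightedGraph = {
    "s1": {"s2": 1, "s3": 5},
    "s2": {"s1": 1, "s3": 2},
    "s3": {"s1": 5, "s2": 2},
}
routes = all_pairs(network)
print(routes["s1"]["s3"])
```

Each per-source run is independent of the others, so this particular computation spreads across cores or controller nodes without any coordination.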
Some parting notes: The goal of SDN is not to scale a simple network fabric. This is something that distributed routing algorithms do just fine. The goal of SDN is to provide a design paradigm which allows the creation of more sophisticated control paradigms: TE, security, virtualization, service interposition, etc. Perhaps that means just manipulating state at the edge of the network while traditional protocols are used in the core. Or perhaps that means using SDN throughout. In either case, SDN architecturally should be able to handle the scale. Remember, it’s the same amount of state that’s being passed around.
I think the right question to ask is not “does it scale” but “is it worth the hassle of building networks this way”? Building an SDN network requires some thought. In addition to the physical network nodes, a control network (probably), and controller servers need to be thrown into the mix.
So, is it worth it?
That is up to you to decide. Ultimately the answer will depend on what gets built using SDN and who wants to run it. I think it’s still too early in the development cycle to make a meaningful prediction.
If you haven’t heard the talk, or seen the slides, these are a must. Professor Shenker, one of the creators of SDN, put together this talk about why programming to high-level abstractions is better than protocol design, and why SDN helps achieve it.