The ONOS (Open Network Operating System) software-defined networking project has shipped the second iteration of its code, and with it hopes to encourage the SDN controller sector to start publishing detailed performance metrics.
Prajakta Joshi, director of products at ON.Lab, told The Register that while SDN is seeing strong take-up in the data centre space, service providers are more wary. As the home of six-nines network requirements, telcos need to be able to validate a lot more than the top-line I/O performance provided by tools like Cbench.
In Blackbird, the project has sought to broaden the metrics carriers can measure to describe how quickly the SDN controller reacts to changes in its environment. These include topology changes (new links or new switches); the throughput of flow operations (how many flows the controller can handle, and how quickly it reacts to changes in flows); “intent” throughput and latency (which I'll explain in detail below); and control-plane scalability.
Joshi said that as well as publishing the measurements the ONOS Project has made of Blackbird against its various metrics, the group wants other SDN controllers – both open and proprietary – to publish similar numbers.
“It's about starting the conversation”, she told Vulture South.
Joshi said the kinds of questions service providers are trying to answer before they get into SDN are things like “As your network grows, if you have 100 switches – then you get another 200 switches – can you scale up the controller by adding instances?
“If something happens southbound, can you detect it fast enough? On the northbound, applications are trying to program the network – how many applications can you service, and how quickly can you service them?
“If something happens to the network – can you tell the applications? If your application reacts to reprogram the network, how quickly?”
In the SDN context, the latency targets we're talking about aren't about how fast packets are moving, but how fast the infrastructure can react to changes. In the service provider context, that's vital: being able to flick traffic to a different path fast enough to leave conversations or sessions uninterrupted, for example, is how a carrier makes sure a dead switch doesn't become a crisis.
In all cases, Joshi told The Register, ONOS set latency targets of 100 ms worst-case and an ideal of less than 10 ms.
In a white paper to be published shortly, ONOS says Blackbird is at or close to its latency targets in most measurements:
- Switch latency – between 66 ms and 77 ms on switch up events, and between 8 ms and 13 ms on switch down events;
- Link latency – between 8 ms and 22 ms for link up events, and between 3 ms and 4 ms for link down events (it takes longer to discover a new link than to identify a broken link);
- Flow setups – the group claims 500,000 flow installations per second for a single instance, and for a seven-instance cluster, 3 million local flows and 2 million non-local flows per second.
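The shape of a measurement like this – time from a topology event to the controller's view reflecting it, graded against the 100 ms worst-case and 10 ms ideal targets – can be sketched as below. This is a hypothetical harness for illustration, not the ONOS project's own test code; `inject_event` and `controller_sees_event` are stand-ins for whatever drives and observes the controller under test.

```python
import time

# Targets quoted by ONOS: 100 ms worst-case, under 10 ms ideal.
WORST_CASE_MS = 100
IDEAL_MS = 10

def measure_reaction_latency(inject_event, controller_sees_event, timeout_ms=1000):
    """Time, in milliseconds, from injecting a topology event (say, a
    link-down) until the controller's topology view reflects it.
    Both callables are hypothetical stand-ins for the real test rig."""
    start = time.monotonic()
    inject_event()
    while not controller_sees_event():
        if (time.monotonic() - start) * 1000 > timeout_ms:
            raise TimeoutError("controller never converged on the event")
        time.sleep(0.001)  # poll the controller's view
    return (time.monotonic() - start) * 1000

def grade(latency_ms):
    """Classify a measured latency against the stated targets."""
    if latency_ms <= IDEAL_MS:
        return "ideal"
    if latency_ms <= WORST_CASE_MS:
        return "acceptable"
    return "fail"
```

On this grading, the link-down figures above (3–4 ms) land in the ideal band, while switch-up events (66–77 ms) are acceptable but shy of the 10 ms goal.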
SDN with intent
“Intent” refers to how quickly the SDN controller can respond to a policy input. As Joshi explained: “the applications tell the framework what they need, then the framework carries it out. 'I want a 10 Gbps path between two hosts, optimised for cost' – that is what we would call intent.”
In measuring intent latency, then, the test framework is looking at how long it takes the SDN controller to do something like computing the requested path and pushing it down to the switches.
The ONOS lab work, Joshi said, measures how long it takes to initiate or withdraw intent, or how long the controller needs to reroute traffic.
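What servicing an intent involves can be made concrete with a sketch: given Joshi's example – a 10 Gbps path between two hosts, optimised for cost – the controller must find the cheapest path whose links can all carry the requested bandwidth, then translate it into per-switch flow rules. ONOS's actual intent framework is Java; this is a language-neutral illustration with hypothetical names, not its API.

```python
import heapq

def cheapest_path(graph, src, dst, min_bw_gbps):
    """Dijkstra over links annotated (cost, bandwidth); links that can't
    carry the requested bandwidth are ignored.
    graph: {node: [(neighbour, cost, bandwidth_gbps), ...]}"""
    dist, prev = {src: 0}, {}
    pq = [(0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == dst:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v, cost, bw in graph.get(u, []):
            if bw < min_bw_gbps:
                continue  # link can't honour the intent
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(pq, (nd, v))
    if dst not in dist:
        return None  # intent cannot be satisfied
    path, node = [dst], dst
    while node != src:
        node = prev[node]
        path.append(node)
    return list(reversed(path))

def flow_rules(path):
    """One (switch, next-hop) pair per hop – a placeholder for the flow
    entries a controller would push southbound to install the path."""
    return [(path[i], path[i + 1]) for i in range(len(path) - 1)]
```

The intent-latency numbers ONOS quotes cover this whole cycle: receiving the request, computing the path, and pushing the rules to the switches.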
Currently the group claims Blackbird is well under the 100 ms target for intent installation, withdrawal and reroute – a single instance responds within 15 ms to all three triggers, while a multi-node cluster responds within 40 ms for install/withdrawal and 20 ms for reroute.
The intent performance, Joshi said, will be subject to further tuning to get closer to target. “Some things remain to be optimised”, she said.
In scale-out, she said, Blackbird is achieving close to linear scalability – if you have 100 switches managed with four servers, then eight servers should serve 200 switches.
There is a small scalability penalty because of the extra east-west traffic for communications between instances, but “that's been minimised to get reasonably linear scale-out”.
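That back-of-envelope arithmetic – capacity growing in proportion to servers, discounted slightly by inter-instance chatter – can be written down directly. The overhead fraction here is a hypothetical modelling knob, not a figure from ONOS.

```python
def switches_served(base_switches, base_servers, servers, overhead=0.0):
    """Capacity under near-linear scale-out: each server added beyond the
    baseline costs a small fraction (overhead) to east-west traffic
    between controller instances. A toy model, not ONOS's own."""
    per_server = base_switches / base_servers
    ideal = per_server * servers
    extra_servers = max(servers - base_servers, 0)
    return ideal * (1 - overhead) ** extra_servers
```

With zero overhead this reproduces the article's example exactly (four servers for 100 switches implies eight servers handle 200); a one-per-cent penalty per added server trims that to roughly 192.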
So service providers can run the same tests, either to validate Blackbird's performance for themselves or to test other controllers against the same metrics, the test setups and methodology are available here. ®