Inside OpenStack: Gifted, troubled project that wants to clobber Amazon

Networks may be its Achilles heel, but its devs are working on it

OpenStack Summit OpenStack is just like a precocious child – the open-source data center management and service layer keeps on doing terrifically clever things, while infuriating everyone that deals with it.

As the open-source cloud control freak's summit unfolds this week in Hong Kong, we decided to take a look at the progress of the ambitious software.

OpenStack was born in mid-2010 with compute and storage code donated from NASA and Rackspace, respectively.

The software is designed to manage public and private clouds and is seen by many companies as their best hope for gaining features found in the monolithic proprietary clouds operated by Google, Amazon, and Microsoft.

Along the way it has gathered interest from hundreds of companies such as Intel, Red Hat, AT&T, Brocade and F5, and become the foundation of various public clouds, such as HP's, and IBM's upcoming SoftLayer-revamp.

Though many bill OpenStack as "the Linux of the cloud", the technology so far fails to meet the expectations of usability and compatibility that define the Linux community, although it is improving rapidly.

As of this month, many of its core components are usable and survive at scale. Many people El Reg has spoken to had good words to say about the Cinder block storage component, the Horizon web front-end, and the Keystone identity service. Some of the younger features also do well, with Oracle plugging the nascent Swift object-cum-blob store into its cloud.

But there is one crucial area where the technology has run into problems: the Neutron networking component. This is causing concern among the community, and handwringing behind closed doors and in IRC channels.

The challenges are to be expected – after all, while open source has made vigorous inroads into storage and compute systems in the past two decades, networking systems have remained resolutely in the domain of proprietary suppliers until very recently. There hasn't been much change, or much open code, to build expertise around. And it shows.

The Neutron Networking Nightmare

The Neutron "network-as-a-service" component was formerly known as Quantum and came along in the Folsom release of the system in September 2012. After a dispute with Quantum Corp, the name was changed to Neutron with the most recent Havana release.

IRC discussion logs from the OpenStack networking channel seen by The Register indicate that the community has found bugs in Neutron that they have been forced to hurriedly patch for the Havana release. Though these are all being fixed, their presence highlights the immense difficulty of developing a stable networking component that can scale to thousands of servers.

"If you stay within the basic realm of Neutron, it's not bad and if you try to go much deeper than that it's pretty dicey," said Hernan Alvarez, the vice-president of operations and product at OpenStack-hosting company Bluebox.

One posting to the OpenStack public mailing list on September 27 asked whether anyone else was using Quantum or Neutron in production. It hasn't received a single public reply.

The poster highlighted several areas that did not "inspire confidence" in the tech, including: "no support for clean network teardown", difficulties in updating subnets, no support for multi-host deployments, "little to no support in IRC from Neutron devs", and "confusing and unrealistic assumptions about deployments" in the network administration documentation.

"The only thing driving us into Neutron is the threat of nova-network's deprecation," the user wrote.

Clearly, one person's experience cannot be representative of the whole, and it's likely that many people are playing around with Neutron on private and/or secret projects, but the posting points to problems that El Reg has heard in various chats over the past few months.

Another issue is endemic to both within-compute networking (Nova) and the standalone, in-development networking module (Neutron). For single-host and flat networks, the IP allocation, IP routing, NAT, DHCP, and OpenStack metadata services sit in a single chunk of code, making them difficult to interface with. In a multi-host format, the services are distributed across hypervisors, presenting a much larger attack surface.

"The worst architectural decision you can make is stay with default networking for a production system because the default networking model in OpenStack is broken for use at scale," said Randy Bias, chief executive of OpenStack specialist CloudScaling.

Work is being done on Neutron, and we understand that many companies are working hard to increase the stability of this critical component, but some of these solutions are accidentally fragmenting the community. Just as we've seen fragmentation in Linux and Android, it's happening in OpenStack. The community needs to get a handle on this, otherwise it could spawn a flock of separate distributions that are hard to migrate between.

"People are solving high availability in Nova and in Neutron in lots of different ways and there's not really a best practise," said Alvarez of Bluebox. "Every time you get into one of these they're kind of hand-carved – that's a shortcoming of what's going on."

Alvarez's reference to "hand-carved" features highlights another area where OpenStack has had problems: many of its stock components seem to require heavy tweaking according to the use case. That is a bad thing for the majority of users, because companies may be tempted to tweak their own installation to the point where it's hard to merge in the improved features from subsequent community releases.

"Downloading an OpenStack deployment is a complex undertaking with a lot of rough edges – there's a lot of ways to hurt yourself," said Randy Bias of CloudScaling. "OpenStack is a very flexible system [but it] presents problems in that there can be certain design decisions that make it hard to upgrade."

This echoes the thoughts of OpenStack software provider Metacloud, whose director of systems architecture Chet Burgess told El Reg recently that setting up the software can be difficult for novice administrators, and that it therefore attracts organizations that are likely to make tweaks to deal with its problems.

"It isn't easy to deploy OpenStack, and it's not easy to troubleshoot and run it," Burgess said. "If you're trying, like OpenStack, to have a lot of features and a lot of options it increases the complexity of keeping everything working."

One piece of advice that CloudScaling, Metacloud, and Bluebox all share is that companies shouldn't meddle too heavily with their distro. (Yes, all of these companies benefit from outsourced OpenStack development, but we believe their points are driven by a care for the community and respect for the product, with a dash of self-interest, rather than vice versa.)

"I think the problem is... the stance is 'go download OpenStack and do anything you want to it', it doesn't tell you the other side of the coin which is that if you do that you are committing to maintaining a customized system," Bias said. "It's like saying, 'why don't you go out and build a custom Linux distribution,' you're going to make all kinds of weird decisions."

What OpenStack needs, more than anything else, is a smaller range of distributions and a concerted effort by the community to harden all of the core features, we reckon.

This is similar to the approach taken by the Hadoop lot, which has seen a wealth of companies converge around the stock Apache Hadoop project, including Intel, Hortonworks, and Microsoft, while a few companies such as MapR or Cloudera produce heavily proprietary "open core" distros.

"People should be using products based on OpenStack and not doing DIY. You either get the main factor - the stock car... or you get the hot rod car - you don't get to have both," Bias said. "Define what your strategy is."

With the Icehouse release due in six months or so, many in the community are hopeful that the networking issues will be fixed, and that other core features will be worked on.

"It's certainly not a process without bumps in the road," Burgess of Metacloud said. "The past two years has proved it is working - we have come a long way, it has gotten remarkably better." This thought is echoed by Alvarez of Bluebox, who added: "What's been developed in the last 18 months is absolutely astounding."

As long as the community keeps dazzling with its technical innovations while working on the fundamentals, OpenStack looks set to grow further. But the development of Neutron provides a cautionary example of the pitfalls of trying to build a management system for every data center of every size. ®
