CERN IT boss: What we do is not really that special

You'll all be doing the same - in about 10 years' time

When the head of infrastructure services at CERN tells you that he has come to the conclusion that there’s nothing intrinsically “special” about the systems at the multi-billion atom-smasher, you naturally want to check you’ve heard correctly.

After all, when we sat down with Tim Bell at the OpenStack Summit in Paris recently, it was rather noisy as around 5,000 visibly excited engineers and techies swapped war stories about the open cloud computing platform while vendors hurled hospitality and job offers at them.

Few of them, however, would be running the sort of systems for which Bell and his team are responsible: a 100PB archive growing at 27PB a year, with 11,000 servers supporting 75,000 disk drives and 45,000 tapes. And that data is being thrown off by the machine that recently found the Higgs boson, the so-called God particle. Most tech managers would say that’s at the upper end of data infrastructure challenges.

Inside the CMS experiment

But as Bell continued, some of his issues certainly sounded familiar. The Large Hadron Collider itself is in the midst of an upgrade which will double the amount of energy produced, and the supporting systems at CERN have to keep up.

When your IT system has to keep up with the Large Hadron Collider

The LHC is due to fire up again in April, and Bell and his team have been retooling the IT infrastructure since 2013. As of the beginning of this month, CERN is running four OpenStack-based clouds, the largest consisting of 70,000 cores running across 3,000 servers, while three other instances clock up a further 45,000 cores. The total number of cores should hit 150,000 in the first quarter of next year. Just in time for the reboot of the LHC.

But two years ago, when Bell and his team started planning for the upgrade, it was time to do some hard thinking - a not uncommon practice at CERN, you’d assume. Even the world of top-end physics has to operate within human laws such as economics – to some degree anyway. And, according to Bell, this means no more staff, a decreasing materials budget, and legacy tools that are “high maintenance and brittle”. And just in case you were wondering, the “users” expect fast self-service.

“The big thing in this case was to apply that to the IT department ... we were basically challenging some fundamental assumptions that CERN has to create its own solutions. That they’re special.”

He continued, “When you go further you need to start challenging those assumptions that led software to be developed locally at CERN rather than taking open source and contributing to it.”

So, that double-take again: CERN IT is not special? Really?

Thinking differently

“There are clearly some special parts,” says Bell. “But there are also often things that are of interest to other people. The key thing to avoid is where we end up doing something that is similar to what is being done outside.”

And what crystallised that realisation? “We had a moment where we worked out just how much computing resources we were going to have to give to the physicists next year.”

“It was a point where we were confronted by a problem that was difficult [and] we had to step back. It wasn’t going to be solved by doing a little bit extra - we had to basically rethink things from the beginning ... and I think that helped to set a few ideas in place.” Sound familiar?

It’s not as if Bell’s team were completely starved of resources. The IT on-site at CERN has been supplemented by a new data centre in Hungary. Even so, Bell continues, “What we needed to appreciate was the extent to which the organisation needed to change as well as it just being a matter of installing some more servers.”

Hence the decision to get up close and personal with OpenStack in general and Rackspace in particular. It might be worth noting that, back when we wrote this, the firehose Bell’s team was drinking from was pumping out a mere 25PB a year.

“After a few months of prototyping then we had the basis to set in place something where we could map out the roadmap to retire the legacy and the legacy environment. The decommissioning of it started on the 1st of November,” Bell says. “So in 18 months we basically produced a tool chain [which is] replacing the legacy environment that we’d run for the previous 10 years.”

OK, so that still sounds a bit special. But like every organisation, Bell’s team has hit a few bumps as experienced hands went cold turkey on the “not invented here” approach.

“That’s involved a lot of work with the people looking after the services and helping them with some training - either formal or informal - in order to use the new tools. But so far we’ve had a lot of positive feedback of the new tools, so that’s all helped to get people on board.”

And the formal training has often come from youngsters straight out of college, who are familiar with the new tools and new ways of doing things. While this is perhaps an inversion of the traditional way IT is run, it is also increasingly common. Or at least commonly talked about. And it’s also faster, at least in theory, than having a constant tide of recruits get familiar with tools they will only find in one organisation.

“Many times people are joining CERN with the knowledge of the tools from university,” says Bell. “So it means that the training time is considerably less - you can buy a book that will tell you about Puppet whereas in the past you would have had to sit down with the guru to understand how the old system worked.”

No one’s suggesting that disgruntled older hands are getting the hump and leaving. But Bell says CERN has always been structured “to assume a regular turnover” of staff, from summer interns to fellows programmes.

“As part of CERN’s mission, it’s not only the physics. There is a clear goal for CERN to act also as a goal for people to arrive [and] spend a short period of time at CERN - up to five years on short-term contract - and then return to their home countries with those additional skills. That could be engineering, [equally] it could be physics and computing.”

“Now in this case what’s great is that we take a Linux expert out of university and we produce someone that’s trained in OpenStack and Puppet and they find themselves in a lot of demand at such time as they have a contract at CERN finish.”

Biting the hand that feeds IT © 1998–2022