Let's play immutable infrastructure! A game where 'crash and burn' works both ways

Leavin' it to the Netflixes... for now


If you’ve ever had the misfortune to work as a systems administrator (and it doesn’t matter if it’s a Windows or Linux shop), you’ll know the feeling of logging on on a Monday morning, checking a few log files and noticing something’s not quite right.

It might be file systems filling up, log directories flooded by a spam attack, or an important OS update waiting to be applied. You’re a good sysadmin and fix the problems there and then; everybody’s happy, but unwittingly you’ve created a major problem.

You’re managing dozens of machines: the majority are much the same, running the same OS and possibly the same applications (load-balanced web servers, for instance), and you’ve just updated one but not the others. Your infrastructure drifts: servers run slightly different versions, applications are not quite the same, and one day during an update it all comes crashing down.

OK, it might not be that bad, but you might face the nightmare of slightly different update procedures on each machine, or worse, update scripts that fail in some cases. There is one way to stop the problem happening: don’t let anyone log on to a server once it’s been set up. This is immutable infrastructure - it never changes.

If a sysadmin can log on to a machine, the temptation to change settings (and perhaps fill in the change log forms later, if you have time) can be just too much to bear.

Immutable infrastructure stops this: you don’t alter machines, you just dump them and replace them with new ones. The servers you need are built from scripts - something most likely to happen with cloud-based servers but, with careful installation procedures, something you can do locally too. Immutable infrastructure is a concept that has been hotly debated for some time.

The last thing your script does is turn off the SSH port (or whatever method your OS uses to let you log on), so you can’t log on to the machine any more. Your machines will run like this until they are destroyed and replaced by a new script-built set of machines.
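That build-from-a-script idea can be sketched in a few lines. Everything below is illustrative: the `deploy-app` tool and the package list are hypothetical, and the rendered script is cloud-init-style shell, with sealing the box as the final step.

```python
# Sketch of an immutable-server build definition (names hypothetical).
# The provisioning steps are expressed as data, and the final step
# disables SSH so nobody can log on once the machine is up.

def build_user_data(packages, app_version):
    """Render a cloud-init-style shell script for a throwaway server."""
    steps = [
        "apt-get update -y",
        "apt-get install -y " + " ".join(packages),
        f"deploy-app --version {app_version}",   # hypothetical deploy tool
        # Last step: seal the box. To change anything, build a new server.
        "systemctl disable --now ssh",
    ]
    return "#!/bin/sh\nset -e\n" + "\n".join(steps) + "\n"

script = build_user_data(["nginx"], "2.4.1")
print(script)
```

Because the whole build is a function of its inputs, two servers built from the same script and version really are identical - there is no hand-applied tweak for them to differ by.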

Of course, all these scripts will be under source control so that, if needs be, they can be rolled back to a previous state. The point is that all the machines will always have the same configuration, the data will be safe on another disk (backed up, of course!) and in-house application software will be deployed by continuous deployment.
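The dump-and-replace cycle itself is simple enough to sketch. The `CloudAPI` class below is a toy in-memory stand-in for a real provider SDK, not any actual API; the point is only the order of operations - bring up the full replacement fleet first, then tear down the old one, and never patch a running server.

```python
# Sketch of a replace-don't-patch rollout (CloudAPI is a stand-in for
# whatever your provider's SDK offers; nothing here is a real API).

class CloudAPI:
    """Toy in-memory stand-in for a cloud provider."""
    def __init__(self):
        self.servers = {}          # server id -> image id it was built from
        self._next = 0

    def launch(self, image_id):
        self._next += 1
        sid = f"srv-{self._next}"
        self.servers[sid] = image_id
        return sid

    def destroy(self, sid):
        del self.servers[sid]

def roll_out(cloud, old_ids, new_image, count):
    """Bring up a full replacement fleet, then destroy the old one."""
    new_ids = [cloud.launch(new_image) for _ in range(count)]
    for sid in old_ids:            # only tear down once replacements exist
        cloud.destroy(sid)
    return new_ids

cloud = CloudAPI()
v1 = [cloud.launch("img-v1") for _ in range(3)]
v2 = roll_out(cloud, v1, "img-v2", 3)
# Every running server now comes from the v2 image; nothing was edited in place.
```

Rolling back is the same operation run with the previous image ID, which is why keeping the build scripts in source control matters so much.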

So now our infrastructure really is cattle and not pets: the sysadmin probably won’t know the names of most of the machines that make up the application, it will all be code numbers and version numbers. Old timers like me will lament the passing of servers named after planets in Star Trek, but such is the price of progress.

Sharp readers will have noticed that there is a price to pay for this: if your infrastructure is code-based then it will need testing before it is deployed. That means it will need test infrastructure, test scripts to make sure it works when it’s brought up, and a testing schedule to ensure all tests are covered and correct.
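Those test scripts are often just smoke tests run against a freshly built server before it joins the fleet. A minimal sketch, with entirely illustrative endpoint names and a stub standing in for real HTTP calls so the same checks can run in CI:

```python
# Minimal smoke-test sketch for a freshly built server (names illustrative).
# In practice `probe` would do a real HTTP GET against the new host; here it
# is injected so the same checks can run against a stub.

def smoke_test(probe):
    """Run each named check against the server; return the list of failures."""
    checks = {
        "health endpoint answers": lambda: probe("/healthz") == 200,
        "version endpoint answers": lambda: probe("/version") == 200,
    }
    return [name for name, ok in checks.items() if not ok()]

# Stub standing in for an HTTP GET against the candidate server:
fake_server = {"/healthz": 200, "/version": 200}
failures = smoke_test(lambda path: fake_server.get(path, 404))
assert failures == []   # an empty list means the server may join the fleet
```

The testing schedule then reduces to a rule: no image goes live until its smoke test returns an empty failure list.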

However, this isn’t like pure software, which can generally be tested quickly; deploying a new server takes time, even for the fastest cloud-based operators, and that delay will be frustrating. Here’s an example: your services are under attack from some nasty hackers and there’s an OS fix to counter the attack. In the old days your admin team would deploy the fix across the server range, embarrassment avoided. Now you need to go to the test environment, test the fix, and then redeploy. Time to fix has stretched.

The delay in fixing problems might not be the only problem: it’s fairly well accepted that the biggest problems with code are introduced during programming (even if the programmers are following requirements!).

Now that our infrastructure is code, deploying new servers is potentially susceptible to the same problem: without a thorough testing phase, the infrastructure scripts might be buggy. Worse still, what if there is a scripter with a grudge? Bugs in the infrastructure could be nasty - destroying data disks is just one example to bear in mind.

It’s worth bearing in mind that the immutable dream isn’t all there yet. Companies have been working with Docker, AWS and Azure to make it happen, but there is no simple and cheap off-the-shelf solution just yet - look at how much code Netflix has built (and open sourced) to make its platform work.

Immutability certainly works for the big boys, those with global operations, but for small and medium enterprises it’s almost certainly a step too far - for now. ®


