IT outsourcing: SLAs, patches – and how uptime funk's going to get to you
Cheaper, better than running it yourself? Maybe
Outsourcing generally has a bad reputation, scarred by countless failed projects in the public and private sectors, with cost cutting, rather than improved service delivery, seeming to drive business decisions.
It's big business: the global IT outsourcing market was worth $318.5bn (£232.5bn) in 2019, according to one report. So not every CIO can be wrong when they decide that bringing in an external technology provider is preferable to doing it themselves. Can they?
The failures can be high profile in nature, be it the Home Office's digital border services contract with Raytheon; IBM and WPP; Capita and the Primary Care contract; Capita and the Army recruitment services; Capita and the school bug; Capita and... we'll stop there.
Yet to some outsourcing makes sense – particularly in the SME sector, where it is far more economical to get in a third party with a bunch of bright, highly qualified engineers than it would be to employ your own.
Security – we've heard of it
There are various problems with outsourcing, though, the first of which is the security of the outsourced suppliers themselves. Let's look back to the day when X.25 networks were still a thing, and a leading travel software provider used one as a means of providing remote support to customers (travel companies) that ran its software.
Guess what? A number of its customers had their networks penetrated and lost money because the supplier's security, and that of the X.25 network and related kit, wasn't up to scratch – so bad actors were able to access the booking systems and associated payment engines of some of the customers. This writer was consulting for one of the affected companies at the time and the loss stung.
Even the big names, who you'd think would know better, have the occasional hole in their security armour. Like the big US telco's call centre operator who forgot to walk me through the authentication process because they were so busy telling me how cool my English accent was.
And coming right up to date, the Kaseya ransomware and SolarWinds attacks exploited vulnerabilities in a supplier in order to infect its clients. Not strictly outsourcing in the latter case, but a variation on a theme.
One might think that the supplier's own security is the most important problem, but that's not the case: far more concerning is the concept of where the buck stops, because it's never with the supplier and always with the client.
Take the example of a law firm that was setting up an office in a new location and decided, perfectly sensibly, to start up in a managed office suite. A security researcher phoned them one day and asked them: do you know your printer is on the internet, and do you know the admin password is set to the factory default? The researcher had been gazing around for interesting things using the Shodan.io device security search engine, and among the boring stuff had noticed something that looked like a printer. So, he pointed his web browser at it, Googled the default password for that make and model, checked out the printer's address book and realised whose printer it was.
In this case the outcome was a happy one, but it should be obvious that a printer that's visible from the internet is a simple path into a company's internal network. And had the result been, say, the theft of masses of personal data from the law firm's files, the company shamed by the press wouldn't have been the service provider. Customers can write penalty clauses into outsource agreements, but we'd be hard pressed to find one contract that transfers the reputational impact - it's simply not possible.
So, then, accountability is a worse problem than the supplier's own security. But the next problem is even worse than both of the above: ownership.
Contracts should be water-tight – but remember that can sometimes work against you
The service provider is contractually obliged to provide the services stated in the contract, to the level agreed therein. And that's it. If you're smugly thinking: "Oh, my provider does more than that", this may be true some of the time: A half-decent service provider that comes across a new vulnerability – perhaps even one that affects its own systems – will, as a courtesy, mention it to the customer. But that doesn't mean it is delivering a robust Threat Intelligence (TI) service, just that the provider's mentioning a potential vulnerability once in a while.
A typical contract with the service provider will probably include the requirement to ensure that security patches are installed within a strict time limit after release. But here's to betting it doesn't include checking to see whether any of the vulnerabilities that those patches fix have been exploited in the meantime. Although some security patches fix vulnerabilities completely and permanently, others will slam the door shut but not notice the bad guys are already inside; unless your service provider takes the time to look, the installation of the patch may have been largely pointless.
And when the agreement includes "patching of Windows servers", what does that mean? Unless customers are careful, it means installing Windows patches. But what about the motherboard firmware? The firmware of the RAID card? The software of the "lights-out" management adaptor?
All of the above can be catered for in the service contract and the Service Level Agreement (SLA), of course, but in some cases such cover is conspicuous by its absence. Yet vendors regularly put out firmware fixes for their kit because of both functionality bugs (which can cause your production servers to turn up their toes without warning) and security vulnerabilities.
Firmware can bite you on the backside when you least expect it, such as the morning my primary production database shut itself down because a firmware bug had wrongly caused the hardware to turn itself off. And if you still don't believe firmware patching is a necessity, check out a March 2021 report from Microsoft, which says that 80 per cent of organisations have experienced a firmware-related attack over a two-year period.
Interestingly the SLA associated with the outsourced service can be an enemy rather than a friend. Take, for example, a global company that outsourced its IT support to a well-known name, who dealt with a service request from a tester who needed to be enabled to log into a dozen or so servers to run CPU and memory loading tests.
The tester arrived the next morning to find that he could indeed log into those servers … and after a little digging realised that this had been achieved by the service provider making him a Domain Administrator. Lightning-fast response, and the target time of the service level agreement was well and truly met, but overall it was a somewhat sub-optimal outcome.
The front-line team were focused on quick response, and they achieved it, but as junior people in a vast organisation it never occurred to them to think of the deeper implications.
So, we've looked at a variety of instances where your outsourced service provider hasn't provided – or isn't contracted to provide – the service you need or expect. There's another factor at play, though: the uptime you expect your service provider to give you.
The concept of "five nines" uptime is a common one – that is, the expectation that your systems will be up and working for 99.999 per cent of the time, which permits a tad over five minutes' downtime per year. And that's fine… but does that mean five minutes' total downtime, or five minutes' unplanned downtime?
It's an important distinction, because particularly with firmware upgrades it'll take more than five minutes to run in the new code and reboot the device. This means that unless you have fully resilient systems – clustered pairs of firewalls, physical servers connected to multiple independent LAN switches, dual internet routers, and so on – your provider simply won't be able to keep everything up to five-nines quality if total uptime is what you're measuring them on.
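To put rough numbers on that distinction, here is a minimal Python sketch of the arithmetic. It assumes a 365-day year, and the function names and the 30-minute firmware upgrade figure are hypothetical, chosen purely for illustration rather than taken from any real SLA.

```python
# Assumption: a plain 365-day year; a real SLA may measure per month or per quarter.
MINUTES_PER_YEAR = 365 * 24 * 60


def downtime_budget_minutes(availability_percent: float) -> float:
    """Minutes of downtime per year permitted by an availability target."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


def fits_budget(
    availability_percent: float,
    planned_minutes: float,
    unplanned_minutes: float,
    count_planned: bool = True,
) -> bool:
    """Check a year's downtime against the target.

    count_planned=True measures total downtime; False measures
    unplanned downtime only.
    """
    total = unplanned_minutes + (planned_minutes if count_planned else 0)
    return total <= downtime_budget_minutes(availability_percent)


if __name__ == "__main__":
    # Five nines works out at a little over five minutes a year.
    print(f"{downtime_budget_minutes(99.999):.3f} minutes per year")

    # Hypothetical year: one 30-minute firmware upgrade plus a 2-minute outage.
    print(fits_budget(99.999, 30, 2, count_planned=False))
    print(fits_budget(99.999, 30, 2, count_planned=True))
```

With those illustrative figures the provider meets the target if only unplanned downtime counts, and misses it by a wide margin if the maintenance window counts too, which is why the wording of the SLA matters.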
And yes, it's perfectly possible to implement resilient systems, but of course there's a cost and so the majority of companies can't afford to make absolutely everything resilient.
And even without formal uptime requirements, it's easy to get into a death spiral of failing to upgrade. Like the Infrastructure-as-a-Service (IaaS) provider whose core LAN infrastructure collapsed and died when a minor change was made to a VLAN setting; it was bitten by a bug that had long since been fixed but the firmware hadn't been upgraded in order that their customers' service shouldn't be interrupted.
The moral of the story is this, then. As a customer, you are entirely within your rights to expect a certain type and level of service from the companies to which you outsource your IT support – and these elements will be in black and white in the contract. But take a step back and consider what the words in the document mean.
Installing Windows patches means just that; it doesn't mean any additional thought is given to what the provider is fixing and whether there's something related that you ought to be checking. Informal notifications of new, critical vulnerabilities are just that – it's nice of the provider to let you know, but it is not obliged to do so, nor can you blame the firm if you're hit by a vulnerability it didn't tell you about.
And if you won't allow the service provider the downtime that it needs in order to keep your systems up to date at every level – firmware, operating system, drivers, software – then you really have no right to point the finger when something goes pear-shaped. ®