The Spectre processor design vulnerability is here to stay. Even if you choose to ignore it, the problem still exists. This is potentially a very bad thing for public cloud vendors. It may end up being great for chip manufacturers. It's fantastic for VMware.
Existing patches can fix Meltdown, but only seem to be able to mitigate Spectre, not fix it. By many accounts, we'll be playing whack-the-vulnerability with Spectre until at least the next generation of silicon.
The definitive paper on Spectre says: "While makeshift processor-specific countermeasures are possible in some cases, sound solutions will require fixes to processor designs as well as updates to instruction set architectures (ISAs) to give hardware architects and software developers a common understanding as to what computation state CPU implementations are (and are not) permitted to leak."
A number of security experts I have spoken to confirm that the Spectre problem has not gone away, nor is it going to any time soon. There is some concern, however, about the messaging that is emerging around this vulnerability.
A great many individuals – not only those who work for Intel – have recently been putting a lot of time into telling everyone that we should calm down, not worry about Spectre, and simply carry on with business as usual. There are patches, they say, and even if those patches cause problems now, the problems will be addressed soon.
It's not quite that simple, and Spectre may ultimately change computing forever. In the short term, it means a lot of pain for some pretty big companies.
One can apply a patch for Meltdown, take the performance hit (which can be 30 per cent or more for some workloads), and then never think about Meltdown again. This isn't ideal, but from a risk management standpoint it's fire and forget.
Spectre is a different story. Even with microcode updates, Spectre can't be completely fixed without significant changes to the architecture of modern CPUs, and that means hardware replacement. Unfortunately, the CPUs we all need to buy in order to guarantee that we're not affected by Spectre don't actually exist yet.
This isn't exactly good news if you're a public cloud provider trying to build enough trust to absorb a significant percentage of the world's regulated workloads. It's one thing for software vulnerabilities to exist; it's another to have known hardware vulnerabilities. That's not good when you're selling the concept of shared infrastructure.
Even if public cloud providers wanted to replace some or all of their systems with CPUs that don't have the Spectre hardware bug, they can't. Yes, older Atom processors and other in-order CPUs aren't affected by Spectre.
Unfortunately, none of them are as fast as the out-of-order Xeons that power our servers. In fact, there probably aren't enough in-order x86 CPUs on the planet to meet the requirements of even a tier-two public cloud provider. As soon as realistic replacement chips are produced, complete replacement cycles will most likely follow. Legal uncertainty puts pressure on public cloud providers to put this Spectre issue to bed once and for all, if for no other reason than risk management.
That's great news for Intel (after all, who else are you going to buy from?), but it's far from ideal for the public cloud providers. Unexpected hardware refresh cycles take time and money, and they eat into margins. Wall Street doesn't like things that affect margins.
Forget what they taught you in kindergarten; sharing is bad
Spectre can theoretically allow code running in a VM to use the physical CPU's caches as a side channel to read memory it should never see. If anyone figures out how to exploit it, someone executing code in one VM could peek at what's in the memory of another VM.
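To make the cache side channel a little more concrete, here is a toy Python model of the bounds-check-bypass variant (Spectre variant 1) described in the Spectre paper. It is a simulation under loudly stated assumptions, not an exploit: the "cache" is a Python set, the memory layout and the names `ToyCPU`, `victim` and `probe` are hypothetical, and the branch-predictor training and cache-timing measurements a real attack needs are simply assumed to have worked (the `mispredict` flag). What it shows is the core idea: a load that is architecturally rolled back still leaves a footprint in shared cache state, and that footprint encodes a secret byte.

```python
# Toy model of a Spectre variant 1 (bounds check bypass) leak.
# Nothing here touches real hardware: the "cache" is a Python set and
# speculation is simulated with an explicit `mispredict` flag.

LINE = 64  # assumed cache-line size in bytes (illustrative)


class ToyCPU:
    def __init__(self):
        self.cache = set()  # indices of cache lines currently "warm"

    def load(self, memory, addr):
        # Any load, even a transient one, pulls its line into the cache.
        self.cache.add(addr // LINE)
        return memory[addr]

    def flush(self):
        self.cache.clear()


def victim(cpu, memory, array1_base, array1_size, array2_base, x, mispredict):
    """Gadget shape: if (x < size) y = array2[array1[x] * LINE].

    With mispredict=True the body runs even when x is out of bounds,
    as if the branch predictor had guessed wrong. The architectural
    result is discarded (None is returned), but the cache is not rolled back.
    """
    in_bounds = x < array1_size
    if in_bounds or mispredict:
        value = cpu.load(memory, array1_base + x)      # may read a secret
        cpu.load(memory, array2_base + value * LINE)   # secret picks the line
        if in_bounds:
            return value
    return None


def probe(cpu, array2_base):
    # Stand-in for timing each probe line: report which ones are cached.
    return [b for b in range(256)
            if (array2_base + b * LINE) // LINE in cpu.cache]


# Hypothetical layout: a 16-byte array1, a secret byte outside it at
# offset 100, and a 256-line probe array2 starting at offset 4096.
memory = bytearray(4096 + 256 * LINE)
ARRAY1_BASE, ARRAY1_SIZE, ARRAY2_BASE = 0, 16, 4096
SECRET_OFFSET = 100
memory[ARRAY1_BASE:ARRAY1_BASE + ARRAY1_SIZE] = bytes(range(ARRAY1_SIZE))
memory[SECRET_OFFSET] = 42  # the "secret"

cpu = ToyCPU()
cpu.flush()  # attacker starts from a cold cache
leaked = victim(cpu, memory, ARRAY1_BASE, ARRAY1_SIZE, ARRAY2_BASE,
                SECRET_OFFSET, mispredict=True)
print(leaked)                   # None: nothing returned architecturally
print(probe(cpu, ARRAY2_BASE))  # [42]: recovered from cache state alone
```

Roughly speaking, in a cloud setting `victim` stands for code belonging to the hypervisor or another tenant, while `probe` stands for measurements taken from the attacker's own VM on the same physical CPU. The two only have to share the cache.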
Because public clouds are built on shared resources, nothing stops a malicious actor from running code on VMs they rent from the cloud provider. This could let them get at data being processed by other VMs that the malicious actor didn't rent. This means that – hypothetically, at least – every workload running in the public cloud that isn't on a dedicated host is vulnerable to random malicious actors.
State-sponsored actors absolutely have the resources to produce malware to exploit Spectre. Let none of us pretend that they don't. From a legal standpoint, this may not be a huge problem: it's unlikely any judge will expect the average company to defend itself against nation states.
Unfortunately, nation states likely aren't the only ones with the resources to exploit Spectre. Consider the theft of cryptocurrency over the years. At today's prices, multiple billions of dollars' worth has been stolen since 2010. In 2017 alone, $225m worth of Ether was stolen. That's not counting the various Bitcoin thefts, or thefts of other, smaller cryptocurrencies. A quick Google search for Bitcoin thefts in 2017 suggests that we could easily be looking at hundreds of millions of dollars there as well.
Add in the steady payday of ransomware over the past few years, and the net result is malicious actors with significant digital crime experience and potentially a lot of money: enough to make very serious plays for Spectre zero-days.
Yes, Spectre patches exist that mitigate the problem. And as soon as a new attack is discovered, patches will emerge to mitigate those attacks too. The problem is that everyone now knows where to look for guaranteed exploits, and there are likely to be more people trying to come up with new attacks than there are people trying to create new mitigation patches.
The above needs to be considered in the context of increasing regulatory pressure. The GDPR looms large. Canada is moving towards mandatory breach notification. Australia is already there, with some US states joining in as well.
Some of the newer regulatory regimes aren't satisfied with just patching and pretending everything is OK. They basically say that organisations have to do everything within their power to protect against any flaws that they reasonably should have known existed. The more we collectively talk about Spectre – and the tech press isn't giving up on this any time soon – the harder it becomes to stand up in front of a judge and say: "Your honour, burying our heads and engaging in business as usual was the best practice at the time."
It is here that the real dichotomy emerges in discussions about Spectre.
Public cloud providers, Intel, and software vendors that exist primarily in the cloud ecosystem are largely hoping that nobody pulls a Max Schrems and challenges this in court before they can replace their CPUs. They want to promote calm to ensure that the adoption of public cloud services continues uninterrupted.
And the public clouds matter. Shared infrastructure is increasingly dominant, and projected by many of the top analyst houses to host the lion's share of enterprise IT by 2020.
Note that this doesn't mean the majority of workloads. What it means is that enterprises are relying on the public cloud to handle the really large workloads. Big Data analytics, machine learning, artificial intelligence: the sort of workloads that I lump together under the term Bulk Data Computational Analysis (BDCA).
The key word there is "computational": these are CPU- and GPU-heavy workloads. And they often operate on highly sensitive datasets, such as medical data. Public cloud companies don't want to lose this work, and if we're being perfectly honest about it, there aren't enough trained BDCA IT operations people to allow enterprises to bring these workloads in-house anyway.
As a result of all this, frank discussions about Spectre are politically fraught territory. This was made clear early on when CERT deleted a note, attached to one of the two Spectre CVEs, saying that the solution was to replace CPU hardware. The original wording was: "Underlying vulnerability is caused by CPU architecture design choices. Fully removing the vulnerability requires replacing vulnerable CPU hardware."
Reality didn't change between the initial posting of that recommendation and its retraction. Fully fixing Spectre still requires replacing the CPUs. The thing is, there's nothing any of us can do about that right now, and CERT's original recommendation could lead to anxiety.
Anxiety about Spectre is considered by many to be a very bad thing. There is an argument to be made that anybody rocking the boat with anything other than "remain calm" messaging is a threat to US national security. The public clouds are not only a major economic consideration for the US, but they are increasingly a strategic asset.
Having companies and governments around the world suddenly decide to pull their data off US servers run by US companies that must obey US subpoenas is not something the US wants to encourage.
Your own toys
There is no reason for doom and gloom, however, as public cloud providers already have the solution to this problem to hand. The rental of bare-metal systems under the control of a single organisation is the ultimate mitigation against Spectre.
Yes, the Spectre vulnerability is still there, buried underneath it all. But malicious actors can no longer simply rent time on the same physical box that your workload is executing on, and try to get at your encryption keys, passwords, and other goodies. The bad guys would have to break in the old-fashioned way: through your layers of firewalls, application security and other defence-in-depth measures.
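If bare metal is the strongest mitigation, it follows that not every workload needs it, and the choice can be written down as policy rather than made case by case. Below is a minimal Python sketch of such a rule. The sensitivity tiers, the `Workload` type and the `placement` function are hypothetical illustrations, not any cloud provider's or vendor's API, and a real policy would weigh far more than two inputs.

```python
from dataclasses import dataclass
from enum import Enum


class Sensitivity(Enum):
    PUBLIC = 1
    INTERNAL = 2
    REGULATED = 3  # hypothetical tier, e.g. personal or medical data


class Tenancy(Enum):
    SHARED = "shared multi-tenant hosts"
    DEDICATED = "dedicated / bare-metal hosts"


@dataclass
class Workload:
    name: str
    sensitivity: Sensitivity
    holds_secrets: bool  # encryption keys, passwords, tokens


def placement(w: Workload) -> Tenancy:
    """Hypothetical rule: regulated data or secrets stay off hosts where
    strangers can rent a neighbouring VM on the same physical CPU."""
    if w.sensitivity is Sensitivity.REGULATED or w.holds_secrets:
        return Tenancy.DEDICATED
    return Tenancy.SHARED


workloads = [
    Workload("marketing-site", Sensitivity.PUBLIC, holds_secrets=False),
    Workload("patient-analytics", Sensitivity.REGULATED, holds_secrets=False),
    Workload("payment-gateway", Sensitivity.INTERNAL, holds_secrets=True),
]
for w in workloads:
    print(f"{w.name}: {placement(w).value}")
```

Defaulting anything that touches secrets to dedicated hardware errs on the side of cost rather than exposure; an organisation with different risk appetite would move that line.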
In many ways, VMware on AWS may just be the ultimate solution here. After all, it runs on hardware dedicated to a single customer. VMware on AWS isn't alone – Microsoft, for example, rolled its own version – and renting dedicated servers from service providers has been a thing for some time.
I find it deeply ironic that perhaps the greatest reason for organisations to seriously consider VMware on AWS isn't exactly something VMware can get out there and start loudly advertising. Don't expect to see any "use VMware on AWS because Spectre means that shared infrastructure is a bad plan for sensitive workloads and Intel is going to take a couple of years to get the world replacement chips" whitepapers. VMware can't really afford to pee in either Amazon's or Intel's Cheerios.
Other than Google, there may not be anyone who can.
Defence in depth
For most, the above is an abstract discussion. None of us can really do anything about Spectre other than patch. Public cloud adoption programs are large, slow-moving undertakings with a lot of momentum, and they aren't easily slowed or halted. In the time it would take most organisations to bring their workloads back in-house, the hardware replacement will have taken place.
But Spectre may cause security-conscious organisations to delay new public cloud migrations. It may also shift discussions away from shared infrastructure and towards dedicated servers. If that happens on a large enough scale, it changes cloud economics, and could even have a noticeable impact on global electricity consumption.
Defence in depth is ultimately the only real choice any of us have. Some vendors may do well here. HyTrust (which acquired DataGravity) is one vendor I expect will get a second look from a lot of organisations. Its elevator pitch has always been about enforcing security-based policies across private and public clouds, and automating security just got a whole lot more important for everyone.
But all the firewalls, network microsegmentation, policy automation, Role Based Access Controls (RBAC) and so forth that we layer on top of our networks guarantee nothing. Our best bet is proper holistic IT, and some serious investment in automated incident response.
Part of defence in depth now requires that we pay careful attention to which workloads we place on shared infrastructure and which workloads we insist must operate on nodes only our organisation uses. We must now assume that everything is compromised. Even the CPUs upon which our workloads run. ®