AWS is fed up with tech that wasn’t built for clouds because it has a big 'blast radius' when things go awry
Which is why it's built its own UPS, from the firmware up. And also why Graviton ignores Intel's and AMD's best tricks
Amazon Web Services is tired of tech that wasn’t purpose-built for clouds and hopes that the stuff it’s now building from scratch will be more appealing to you, too.
That’s The Register’s takeaway from today’s “Infrastructure Keynote” at the cloud giant’s elongated re:Invent conference, which featured veep for global infrastructure leadership Peter DeSantis revealing a little about how AWS keeps itself running.
Among the nuggets he revealed was that AWS has designed its own uninterruptible power supplies (UPS) and that there’s now one in each of its racks. AWS took that approach because the UPS systems it previously needed were so big they required a dedicated room to hold the sheer quantity of lead-acid batteries needed to keep its kit alive. Maintaining that facility created more risk and made for a larger “blast radius” (the extent of an incident's impact) in the event of failure or disaster.
AWS is all about small blast radii, DeSantis explained, which is why the company previously wrote its own firmware for third-party UPS products.
“Software you don’t own in your infrastructure is a risk,” DeSantis said, outlining a scenario in which reporting a firmware problem to a device's vendor kicks off a slow process: the vendor first tries to replicate the issue, then develops a fix, then deploys it.
“It can take a year to fix an issue,” he said. That’s many months too slow for AWS, given that a bug can mean downtime for customers.
This approach has also seen AWS design its own software to manage switchgear, the devices that cut over from mains power to UPS in the event of an outage.
Amazon’s home-grown Arm processor, the Graviton2, was developed for similar reasons.
DeSantis said commercial UPSes and switchgear don’t meet AWS’ needs because they’re designed for the many scenarios in which they’ll be put to work, rather than for Amazon’s specific requirements. The same logic applies to CPUs, he said, arguing that the likes of Intel and AMD make products that sell well by making them general-purpose devices.
The result is processors that pack in features to make them suitable for more tasks. When raw power was needed, multi-core CPUs were the answer. When CPU utilisation rates became an issue, simultaneous multithreading (SMT) came along. None of that tech ever left mainstream CPUs, DeSantis argued, and the result is architectures that are ripe for side-channel attacks and deliver variable performance. DeSantis said he thinks the HPC crowd turns off SMT to avoid the latter issue.
AWS would rather use processors designed for the cloud. Hence its investment in Graviton, whose many-core architecture and extra-large caches allow better per-core performance without the need for other trickery. The architecture is designed from the ground up for microservices, which AWS sees as the dominant wave of software development.
“Graviton2 delivers 2.5 to 3 times better performance/watt than any other CPU in our cloud,” DeSantis said.
In conversation with The Register he added that such performance is only possible thanks to AWS’ Nitro silicon, to which the cloud colossus offloads virtualisation and networking chores.
DeSantis declined to tell The Register what’s inside a Nitro device but did say the company is now using a fourth-generation device and that it is not correct to characterise them as SmartNICs.
“SmartNIC is a subset of its functionality,” he said. “It is very specialised hardware for us, really deeply formed for AWS.” DeSantis allowed that there are “some similarities, logically, but it is more specialised.”
Later, in his keynote, he showed one of the devices and said it connects to AWS’ new Mac instances over Thunderbolt.
Much of DeSantis’ talk was dedicated to AWS’ green credentials – it has just ordered a stack more renewable energy – and not-so-subtle digs at the language cloud rivals use when describing the physical separation of availability zones. AWS, he said, is perfectly clear that its data centres are a disaster-proof distance from one another, but less than a millisecond of latency apart.
That’s a configuration that he said delivers what cloud apps need: enough distance to be safe, but not so network-challenged that stateless apps will struggle.
DeSantis also talked up AWS’ newly revealed plans for a second Australian region. ®