AWS intros homebrew Graviton CPU tuned for HPC, network stack tuned to updated Nitro system
'Scalable Reliable Datagram' uses multi-path topology to smoke TCP
Amazon Web Services has introduced a CPU customized for high-performance computing, an updated Nitro system capable of handling more traffic, and a network protocol that can make both sing.
The CPU is called Graviton3E and has been optimized for floating point and vector math – common operations in the world of high-performance computing. AWS's senior veep for utility computing, Peter DeSantis, declined to detail the CPU's capabilities, but said it performs better than the vanilla Graviton3 on benchmarks used to measure performance for life sciences and financial modelling workloads.
HPC workloads typically involve and move a lot of data, so AWS created the elastic fabric adapter (EFA) to ensure the data will flow. Accompanying EFA is Scalable Reliable Datagram (SRD) – a TCP alternative that DeSantis said will play a big part in the AWS cloud.
To explain why, he revealed that AWS internal networking relies on many custom switches and multipath routing. But the TCP used by most applications and networks prefers to use a single path – even if that path has a slow node that compromises performance.
Amazon's own SRD makes use of multipath routing and doesn't transmit packets in order, but can tidy things up when packets arrive out of order. DeSantis claimed it will retransmit dropped packets "in microseconds, not milliseconds" and speed up networks hosted on the AWS cloud.
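For readers who want a feel for the idea, here is a toy Python sketch of spraying packets over several paths and tidying them up on arrival. It is not AWS's implementation: the round-robin path choice, the simulated latencies, and every name in it (send_multipath, ReorderBuffer, transfer) are assumptions made purely for illustration, and retransmission of dropped packets is not modelled at all.

```python
import random


def send_multipath(payloads, num_paths=4, seed=1):
    """Simulate spraying packets over several paths of differing latency.

    Returns (seq, data) pairs in arrival order, which is usually not
    the order in which they were sent.
    """
    rng = random.Random(seed)
    # Hypothetical per-path latency; a real fabric would measure this.
    path_latency = [rng.uniform(1.0, 5.0) for _ in range(num_paths)]
    in_flight = []
    for seq, data in enumerate(payloads):
        path = seq % num_paths  # round-robin spraying, an assumption
        arrival_time = seq * 0.1 + path_latency[path]
        in_flight.append((arrival_time, seq, data))
    in_flight.sort()  # earliest arrival first
    return [(seq, data) for _, seq, data in in_flight]


class ReorderBuffer:
    """Hold early packets until every lower sequence number has arrived."""

    def __init__(self):
        self.next_seq = 0
        self.pending = {}

    def receive(self, seq, data):
        """Accept one packet and return the payloads now deliverable in order."""
        if seq < self.next_seq or seq in self.pending:
            return []  # duplicate, ignore it
        self.pending[seq] = data
        ready = []
        while self.next_seq in self.pending:
            ready.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return ready


def transfer(payloads, num_paths=4, seed=1):
    """Send payloads over simulated paths and return them restored to order."""
    buf = ReorderBuffer()
    delivered = []
    for seq, data in send_multipath(payloads, num_paths, seed):
        delivered.extend(buf.receive(seq, data))
    return delivered
```

Calling transfer(list("abcdefgh")) hands back the letters in their original order even though the simulated paths deliver them scrambled. The point of the design is that no single slow path stalls the sender, while the receiver pays only for a small buffer.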
It's also fast: DeSantis said it outperforms TCP within the Amazonian cloud because it is tuned to the Nitro hardware that AWS uses to isolate networking and storage chores from hosts. The veep said it improves tail latency in circumstances such as database writes so effectively that Amazon Elastic Block Store io2 volumes will run SRD as standard to ensure users gain performance improvements.
AWS has also built a new version of its elastic network adapter (ENA) – the network driver offered with EC2 instances – called ENA Express that offers native SRD support.
SRD and ENA Express both offload work to Nitro cards. DeSantis said fifth generation Nitro cards have almost doubled compute capacity, boast 50 percent more DRAM bandwidth, twice the PCIe bandwidth, and support 60 percent higher packets per second with 30 percent latency reduction.
A forthcoming C7gn instance type teams the Graviton3 and fifth-gen Nitro to provide a network-optimized compute option. The Graviton3E and fifth-gen Nitro have been paired in an instance called Hpc7g designed to offer HPC users another cloudy option.
DeSantis also detailed work on EC2 instance types tuned to the needs of machine learning, then detailed improvements to AWS's Lambda serverless environment.
Lambda, he explained, is built on virtual machines because AWS thinks they offer the most appropriate and powerful isolation. Lambda functions therefore execute in a runtime that exists in a very small VM that occupies a "slot" on a host.
AWS tries to keep those VMs running for as long as possible because restarting a VM – which AWS calls a "cold start" – takes enough time that it can create sub-optimal user experiences.
The cloud giant has therefore developed SnapStart, tech that it says takes a snapshot of a function and its runtime, then caches it.
"When the function is invoked and subsequently scales up, Lambda SnapStart resumes new execution environments from the cached snapshot instead of initializing them from scratch, significantly improving startup latency."
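The snapshot-then-resume pattern AWS describes can be sketched in a few lines of Python. This is a loose analogy under stated assumptions, not how Lambda works internally: the FunctionRuntime class and its methods are hypothetical names, and copying an in-process object stands in for whatever AWS actually snapshots and restores.

```python
import copy


class FunctionRuntime:
    """Toy model of cold starts versus resuming from a cached snapshot."""

    def __init__(self, init_fn):
        self.init_fn = init_fn  # the expensive initialization step
        self.snapshot = None    # cached post-init state, if any
        self.cold_inits = 0     # how many times init_fn really ran

    def _initialize(self):
        self.cold_inits += 1
        return self.init_fn()

    def publish(self):
        """Run initialization once and cache the resulting state."""
        self.snapshot = self._initialize()

    def new_environment(self):
        """Return state for a fresh execution environment."""
        if self.snapshot is not None:
            # Resume from the cached snapshot; each environment gets its own copy.
            return copy.deepcopy(self.snapshot)
        # No snapshot available, so pay for a full cold start.
        return self._initialize()


def slow_init():
    # Stand-in for loading code, libraries, and configuration.
    return {"config": {"region": "example"}, "ready": True}
```

Without publish(), every call to new_environment() runs slow_init again. After publish(), any number of environments can be created while cold_inits stays at 1, which is the saving the quote above describes.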
AWS boffins have posted a paper explaining the tech, which is already available at no cost in the Amazonian cloud.
The Register has asked AWS to provide details of the Graviton3E and Nitro 5, as we would quite like to know things like core count, clock speed, manufacturing process, and other details of the devices. We will update this story if we receive useful information. ®