Descartes Labs, an outfit that analyses big data, has managed to nab the 136th spot on the top 500 list of the world's fastest publicly known supercomputers – with $5,000 and an Amazon Web Services (AWS) account.
The AWS "supercomputer" has 41,472 cores and 157,824 GB RAM, and achieved 1,926.4 TFlop/s on the LINPACK benchmark, which solves a dense system of linear equations. That is a long way short of the 148,600 TFlop/s posted by the number one supercomputer, Summit at the Oak Ridge National Laboratory in the US, but remarkable for pay-as-you-go computing on the public cloud.
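For the curious, what LINPACK actually measures can be sketched in a few lines. This is a toy single-node illustration using NumPy, not the real HPL benchmark the Top500 uses; the function name and problem size are ours, but the nominal operation count of 2/3 n³ + 2n² flops is the standard one used to score the benchmark.

```python
# Toy illustration of the LINPACK benchmark: solve a dense n-by-n
# system Ax = b and convert elapsed time into a flop rate.
# This is a sketch, not the real HPL code run on the AWS cluster.
import time
import numpy as np

def linpack_toy(n: int, seed: int = 0) -> float:
    """Solve a random dense system and return achieved GFlop/s."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((n, n))
    b = rng.standard_normal(n)
    start = time.perf_counter()
    x = np.linalg.solve(a, b)  # LU factorisation plus triangular solves
    elapsed = time.perf_counter() - start
    # Sanity check: the computed solution should satisfy the system
    assert np.allclose(a @ x, b)
    flops = (2.0 / 3.0) * n**3 + 2.0 * n**2  # HPL's nominal flop count
    return flops / elapsed / 1e9
```

Running `linpack_toy(2000)` reports a single-node GFlop/s figure; the AWS run aggregated 41,472 cores to reach 1,926.4 TFlop/s on the real, distributed HPL code.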
"Our team merely followed the standard steps to request a 'placement group', or high-network throughput instance block, which is sort of like reserving a mini-Oakridge inside the AWS infrastructure," Descartes Labs commented. "We were granted access to a group of nodes in the AWS US-East 1 region for approximately $5,000 charged to the company credit card."
The system ran Amazon Linux 2 on normal EC2 instances and was managed by Descartes Labs CTO Mike Warren, using HashiCorp Packer to build automated machine images and MPI (Message Passing Interface) for parallel computing.
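MPI jobs like this one divide the problem among processes ("ranks"), each working on its own slice of the data. As a rough illustration of the idea — a pure-Python stand-in with our own function name, where a real run would query `MPI_Comm_rank` and `MPI_Comm_size` — here is the usual block distribution of rows across ranks:

```python
# Illustrative sketch of how an MPI job divides work: each of `size`
# ranks gets a contiguous block of rows, with any remainder spread
# over the first ranks -- the common block distribution.
# (Pure Python stand-in; no MPI runtime required.)
def block_range(n_rows: int, rank: int, size: int) -> range:
    base, extra = divmod(n_rows, size)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return range(start, stop)

# Example: 10 rows over 4 ranks gives blocks of 3, 3, 2 and 2 rows,
# covering every row exactly once.
```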
AWS has featured in the Top500 before, its best performance being in 2013 when a submission from AWS itself achieved a rank of 64 with 484.2 TFlop/s. It could no doubt do better today, especially as "we provisioned them before the instances were launched publicly, so no one else was on the fleet at the time," said Amazon's Deepak Singh on Twitter. However, Descartes Labs believes it has achieved the highest Top500 performance to date for a system on public cloud.
"We believe that true HPC applications will eventually migrate over to the cloud en masse," the biz said, though since the company specialises in a "data-refinery on a cloud-based supercomputer" such a declaration is hardly surprising. Cloud HPC does have many attractions, though, including relatively low cost of entry and the ability to burst capacity as needed.
There is a cloud premium to pay, which is the margin AWS (in this instance) makes on the compute resources. You also have to factor in the cost of getting your data into the cloud, which can be an issue.
Dr Paul Calleja, director of the University of Cambridge Research Computing Services, which operates Cumulus, number 107 on the Top500, spoke to The Reg from the International Supercomputing Conference under way in Frankfurt. "The cost of running off-prem is significantly higher than the cost of running on-prem," he said. "With our cost models it's roughly 3x, which is a big number when you are talking petascale."
Another issue is performance. "AWS may run LINPACK well, which is notoriously easy. I/O performance sucks on AWS. A more HPC-like infrastructure has RDMA (Remote Direct Memory Access) networks. So there's many issues with this. A big LINPACK number doesn't mean anything."
The Cambridge team has some bragging rights when it comes to I/O, occupying the world number 1 spot on the IO-500 list in the latest ranking.
Calleja is an advocate of hybrid cloud using OpenStack, an open source cloud computing platform which his team is customising for this purpose. "It allows us to instantiate an on-prem cloud configuration within an off-prem commercial cloud. So you would use off-prem if you really needed to burst much larger than your in-house infrastructure, with time to solution as a commercial driver. If time to solution does not have a commercial driver you might as well just wait and run it on-prem three times cheaper." ®
Another approach to democratising HPC is this up-to-3,000-core Raspberry Pi system, which you can set up without a data centre. Performance is not that great, but it is a far cheaper way to test your code before renting space on the public cloud.
Juicy cores of Raspberry Pi in a box