Amazon cloud accused of network slowdown

Users cry latency spike


Amazon's sky-high EC2 service has experienced a significant increase in network latency in recent days, according to data from two separate companies running widely used management tools in tandem with the service.

Cloudkick - one of the many outfits that offer a service for overseeing the use of Amazon EC2 and other so-called compute clouds - first noticed an Amazon latency spike around Christmas time, and the problem has grown steadily over the past few weeks.

"We have very concrete evidence that Amazon is having latency issues on its network," Cloudkick's Alex Polvi tells The Reg. "If you look at data across all of our customers, latency was completely smooth until about Christmas time and then things start going nuts."

Amazon EC2 - short for Elastic Compute Cloud - provides on-demand access to scalable compute resources via the net. Similar services are available from the likes of Rackspace, Slicehost, and GoGrid.

Cloudkick's data - posted to the company's blog here - covers "several hundred" EC2 server instances across a wide range of customers. Normally, Polvi says, the average ping latency is around 50 milliseconds. But in recent weeks, it has climbed as high as 1000 milliseconds.

[Chart: Cloudkick maps Amazon EC2 latency]
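For a sense of how such numbers are gathered, here is a minimal sketch in the spirit of Cloudkick's measurement: average the ICMP round-trip time across a fleet of hosts. The hostnames are placeholders and a Linux-style ping utility is assumed - this is an illustration, not Cloudkick's actual tooling.

```python
import re
import subprocess

# Placeholder hostnames - substitute the public DNS names of real instances.
INSTANCES = ["ec2-host-1.example.com", "ec2-host-2.example.com"]

def ping_rtt_ms(host, count=5):
    """Return the average ICMP round-trip time to host, in milliseconds."""
    out = subprocess.run(["ping", "-c", str(count), host],
                         capture_output=True, text=True)
    # Linux ping prints a summary line such as:
    # "rtt min/avg/max/mdev = 48.1/52.3/60.9/4.2 ms"
    match = re.search(r" = [\d.]+/([\d.]+)/", out.stdout)
    return float(match.group(1)) if match else None

rtts = [r for r in (ping_rtt_ms(h) for h in INSTANCES) if r is not None]
if rtts:
    print(f"fleet average: {sum(rtts) / len(rtts):.1f} ms")
```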

"Amazon has a great track record in performance and reliability, so this is why we are so surprised by this data," reads Cloudkick's blog post on the matter.

Cloudkick's numbers are limited to Amazon's "US-East" availability zone. EC2 serves up processing power from two separate geographic locations - the US and Europe - and each geographic region is split into multiple zones designed never to vanish at the same time.
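For reference, the zones visible to an account can be listed through the EC2 API. A minimal sketch using the boto3 library - an assumption on our part, and one that requires AWS credentials to already be configured:

```python
import boto3

# List the availability zones in the US-East region and their health state.
ec2 = boto3.client("ec2", region_name="us-east-1")
for zone in ec2.describe_availability_zones()["AvailabilityZones"]:
    print(zone["ZoneName"], zone["State"])
```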

enStratus, an outfit similar to Cloudkick, confirms the latency increase, but it says the spike is significantly smaller. Response time from the company's network into "all regions" of the Amazon cloud increased by 10 per cent on January 9, enStratus CTO George Reese tells The Reg, and it has remained roughly that high ever since. Reese's sample size is around 300 server instances.

Cloudkick and enStratus released their data in the wake of a blog post from Alan Williamson, co-head of the UK-based cloud consultancy AW2.0, who asked whether Amazon was experiencing capacity issues after one of his customers experienced a serious slowdown beginning at the end of last year. "We began noticing [the problem] around the end of November," Williamson tells The Reg. "We had been running with Amazon for approximately 20 months with absolutely no problems whatsoever. We could throw almost anything at them and it wouldn't even hiccup."

Echoing what Cloudkick and enStratus have seen, Williamson says he eventually traced the problem back to network latency. On the application in question, the average time needed to turn around a web request jumped from between 2 and 3 milliseconds to between 50 and 100 milliseconds.
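A jump of that size shows up with even crude instrumentation. The sketch below times repeated HTTP requests using only the Python standard library; the endpoint URL is hypothetical, and this illustrates the measurement rather than Williamson's actual setup.

```python
import time
import urllib.request

URL = "http://app.example.com/health"  # hypothetical endpoint

def request_ms(url):
    """Time one full HTTP request-response cycle, in milliseconds."""
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=10) as resp:
        resp.read()
    return (time.perf_counter() - start) * 1000.0

samples = [request_ms(URL) for _ in range(20)]
print(f"mean turnaround: {sum(samples) / len(samples):.1f} ms")
```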

Responding to an inquiry about the post from Data Center Knowledge, Amazon said that its infrastructure does not have capacity issues. And this afternoon, the company sent a similar statement to The Reg.

"We do not have over-capacity issues. When customers report a problem they are having, we take it very seriously," a company spokeswoman said. "Sometimes this means working with customers to tweak their configurations or it could mean making modifications in our services to assure maximum performance."

When we specifically asked about latency problems, the company did not respond.

Since posting to his blog, Williamson has discussed the issue with Amazon. But it is still unresolved. Williamson says the problem abates if he upgrades to more expensive instances. The least expensive EC2 instance offers 1.7GB of memory, one virtual core, 160GB of storage, a 32-bit platform, and "moderate" I/O performance. More expensive instances offer greater resources.

According to Williamson, Amazon also says that more expensive instances reside in a different part of its infrastructure. Cloudkick's data represents all instance sizes, but Alex Polvi says it likely skews toward small instances, because those are what people use most. Smaller instances are not only cheaper. They're the default.

This could mean that more expensive instances don't have the latency problem, says enStratus's George Reese, but it may simply mean that the problem can be masked with greater resources. He also points out that if you're using many small-server instances as opposed to a few big ones, latency will be more of an issue: an application that chains requests across ten small instances pays the network delay ten times over, where a single large instance pays it once.

Cloudkick's Alex Polvi speculates that the issue could be down to the use of different hardware. Smaller instances, for example, may be served by different networking gear.

Thorsten von Eicken, CTO of a third cloud-management firm, RightScale, has not noticed a latency spike and points out that ICMP pings may be a poor judge of latency because they receive low priority. "I can't confirm the issues reported as we have not seen these problems ourselves," he tells The Reg. "We tend to use larger instances, as do most of our customers, so we may not see these issues as much."
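Von Eicken's caveat is easy to test: measure the same host with an ICMP ping and with the time to complete a TCP handshake on a real service port, which network gear is less likely to deprioritize. A minimal sketch, with a placeholder host and port:

```python
import socket
import time

HOST, PORT = "ec2-host-1.example.com", 80  # placeholder host and port

def tcp_connect_ms(host, port):
    """Time a TCP three-way handshake, in milliseconds."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=5):
        pass
    return (time.perf_counter() - start) * 1000.0

print(f"TCP handshake: {tcp_connect_ms(HOST, PORT):.1f} ms")
```

If the handshake time stays flat while ICMP round-trips balloon, the ping figures say more about how ICMP is handled than about the data path itself.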

George Reese stresses that he's not seeing problems with processing or moving data. He's only seeing problems with latency. This means that customers will only notice the problem on certain applications. His customers are not seeing major problems, he says, but that could be because most of their applications are transactional enterprise tools, not the sort of thing that demands particularly low latency. ®
