FYI: Processor bugs are everywhere – just ask Intel and AMD

More chip flaws await


In 2015, Microsoft senior engineer Dan Luu forecast a bountiful harvest of chip bugs in the years ahead.

"We’ve seen at least two serious bugs in Intel CPUs in the last quarter, and it’s almost certain there are more bugs lurking," he wrote. "There was a time when a CPU family might only have one bug per year, with serious bugs happening once every few years, or even once a decade, but we’ve moved past that."

Thanks to growing chip complexity, compounded by hardware virtualization, and reduced design validation efforts, Luu argued, the incidence of hardware problems could be expected to increase.

This month's Meltdown and Spectre security flaws that affect chip designs from AMD, Arm, and Intel to varying degrees support that claim. But there are many other examples.

Last March, there was a bug affecting AMD's Ryzen chips that got patched with a workaround. And in June, AMD replaced some Ryzen 7 chips that weren't tuned to perform well under load.

That same summer, problems with hyperthreading surfaced in Intel's Skylake and Kaby Lake processors.

In February last year, clock problems with Intel's Atom C2000 chips surfaced, requiring widespread replacement.

Webpage slinger Cloudflare this month recounted a problem with Intel's Broadwell chips that it encountered last year.

In February 2017, while fixing a security issue the company dubbed Cloudbleed, Cloudflare engineers spotted a number of unexplained NGINX process crashes.

These segmentation faults (SIGSEGV) killed server processes intermittently but often enough to attract attention because the company runs so many servers.

The crashes produced core dumps and sifting through them requires some effort because they can be several gigabytes in size.

After ruling out memory errors, explains Cloudflare systems engineer David Wragg a blog post, those working on the issue noticed a common factor: the crashes were all occurring on Intel Xeon E5-2650 v4 servers.

Suspicions of a hardware problem were validated when engineers noticed an entry in Intel's errata for that processor model.

"The Specification Update described 85 issues, most of which are obscure issues of interest mainly to the developers of the BIOS and operating systems," said Wragg. "But one caught our eye: 'BDF76 An Intel Hyper-Threading Technology Enabled Processor May Exhibit Internal Parity Errors or Unpredictable System Behavior.'"

Intel fixed issue BDF76 through a microcode patch that Cloudflare delivered through a BIOS update from its server vendor. After the patch was applied, the number of unexplained core dumps dropped significantly.

Expect more hardware flaws to come. ®

Similar topics

Broader topics


Other stories you might like

  • Intel demands $625m in interest from Europe on overturned antitrust fine
    Chip giant still salty

    Having successfully appealed Europe's €1.06bn ($1.2bn) antitrust fine, Intel now wants €593m ($623.5m) in interest charges.

    In January, after years of contesting the fine, the x86 chip giant finally overturned the penalty, and was told it didn't have to pay up after all. The US tech titan isn't stopping there, however, and now says it is effectively seeking damages for being screwed around by Brussels.

    According to official documents [PDF] published on Monday, Intel has gone to the EU General Court for “payment of compensation and consequential interest for the damage sustained because of the European Commissions refusal to pay Intel default interest."

    Continue reading
  • Cloudflare explains how it managed to break the internet
    'Network engineers walked over each other's changes'

    A large chunk of the web (including your own Vulture Central) fell off the internet this morning as content delivery network Cloudflare suffered a self-inflicted outage.

    The incident began at 0627 UTC (2327 Pacific Time) and it took until 0742 UTC (0042 Pacific) before the company managed to bring all its datacenters back online and verify they were working correctly. During this time a variety of sites and services relying on Cloudflare went dark while engineers frantically worked to undo the damage they had wrought short hours previously.

    "The outage," explained Cloudflare, "was caused by a change that was part of a long-running project to increase resilience in our busiest locations."

    Continue reading
  • Intel delivers first discrete Arc desktop GPUs ... in China
    Why not just ship it in Narnia and call it a win?

    Updated Intel has said its first discrete Arc desktop GPUs will, as planned, go on sale this month. But only in China.

    The x86 giant's foray into discrete graphics processors has been difficult. Intel has baked 2D and 3D acceleration into its chipsets for years but watched as AMD and Nvidia swept the market with more powerful discrete GPU cards.

    Intel announced it would offer discrete GPUs of its own in 2018 and promised shipments would start in 2020. But it was not until 2021 that Intel launched the Arc brand for its GPU efforts and promised discrete graphics silicon for desktops and laptops would appear in Q1 2022.

    Continue reading
  • Linux Foundation thinks it can get you interested in smartNICs
    Step one: Make them easier to program

    The Linux Foundation wants to make data processing units (DPUs) easier to deploy, with the launch of the Open Programmable Infrastructure (OPI) project this week.

    The program has already garnered support from several leading chipmakers, systems builders, and software vendors – Nvidia, Intel, Marvell, F5, Keysight, Dell Tech, and Red Hat to name a few – and promises to build an open ecosystem of common software frameworks that can run on any DPU or smartNIC.

    SmartNICs, DPUs, IPUs – whatever you prefer to call them – have been used in cloud and hyperscale datacenters for years now. The devices typically feature onboard networking in a PCIe card form factor and are designed to offload and accelerate I/O-intensive processes and virtualization functions that would otherwise consume valuable host CPU resources.

    Continue reading
  • AMD to end Threadripper Pro 5000 drought for non-Lenovo PCs
    As the House of Zen kills off consumer-friendly non-Pro TR chips

    A drought of AMD's latest Threadripper workstation processors is finally coming to an end for PC makers who faced shortages earlier this year all while Hong Kong giant Lenovo enjoyed an exclusive supply of the chips.

    AMD announced on Monday it will expand availability of its Ryzen Threadripper Pro 5000 CPUs to "leading" system integrators in July and to DIY builders through retailers later this year. This announcement came nearly two weeks after Dell announced it would release a workstation with Threadripper Pro 5000 in the summer.

    The coming wave of Threadripper Pro 5000 workstations will mark an end to the exclusivity window Lenovo had with the high-performance chips since they launched in April.

    Continue reading
  • AMD bests Intel in cloud CPU performance study
    Overall price-performance in Big 3 hyperscalers a dead heat, says CockroachDB

    AMD's processors have come out on top in terms of cloud CPU performance across AWS, Microsoft Azure, and Google Cloud Platform, according to a recently published study.

    The multi-core x86-64 microprocessors Milan and Rome and beat Intel Cascade Lake and Ice Lake instances in tests of performance in the three most popular cloud providers, research from database company CockroachDB found.

    Using the CoreMark version 1.0 benchmark – which can be limited to run on a single vCPU or execute workloads on multiple vCPUs – the researchers showed AMD's Milan processors outperformed those of Intel in many cases, and at worst statistically tied with Intel's latest-gen Ice Lake processors across both the OLTP and CPU benchmarks.

    Continue reading
  • Microsoft fixes under-attack Windows zero-day Follina
    Plus: Intel, AMD react to Hertzbleed data-leaking holes in CPUs

    Patch Tuesday Microsoft claims to have finally fixed the Follina zero-day flaw in Windows as part of its June Patch Tuesday batch, which included security updates to address 55 vulnerabilities.

    Follina, eventually acknowledged by Redmond in a security advisory last month, is the most significant of the bunch as it has already been exploited in the wild.

    Criminals and snoops can abuse the remote code execution (RCE) bug, tracked as CVE-2022-30190, by crafting a file, such as a Word document, so that when opened it calls out to the Microsoft Windows Support Diagnostic Tool, which is then exploited to run malicious code, such spyware and ransomware. Disabling macros in, say, Word won't stop this from happening.

    Continue reading

Biting the hand that feeds IT © 1998–2022