IBM slips automatic tranny into Power7

Self-aware chippery


Hot Chips "Civilization advances by extending the number of important operations which we can perform without thinking about them," said mathematician Alfred North Whitehead, in his 1911 tome, An Introduction to Mathematics. And with its Autonomic Computing effort, IBM believes it's advancing civilization.

On Monday in Palo Alto, California, IBM gave attendees at this year's Hot Chips conference a deep dive into three of the latest developments in its nearly decade-long effort to create computers that dynamically self-optimize.

"As good engineers we always put guardband in just to cover our tails," Power7 EnergyScale architect Michael Floyd told his Hot Chips audience, somewhat less elegantly than Whitehead, whose quote introduces Autonomic Computing on Big Blue's website. Floyd was introducing one of the three autonomic features in the Power7 processor's EnergyScale power-management system: the reduction of wasteful guardband.

By "guardband", Floyd was referring to the necessary but inefficient practice of adding a smidgen of time into each clock cycle so that variabilities in, say, clock and data signals won't mess up signal-recognition timing.

Those variabilities can result from a broad range of unwelcome timing tweakers: voltage droop and power supply variabilities, thermal variabilities, processor aging, and more.

The problem with most guardband schemes is that they're static — they're built into the chip, and as such need to guard against worst-case scenarios. What IBM has added to EnergyScale for the Power7 — and which wasn't in EnergyScale for the Power6 — is a Critical Path Monitor (CPM) that performs what the company calls "circuit margin feedback" to monitor and adjust guardband in real time.

According to Floyd, the benefits of CPM-based dynamic guardbanding can be used to either boost performance or save energy. He claimed that IBM testing has shown that such a technique — all else being equal — can either allow a CPU to be overclocked by 7.3 per cent or have its power needs reduced by up to 15.8 per cent.
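The idea of trading spare timing margin for speed can be sketched as a simple feedback step. This is an illustration only, not IBM's implementation: the function name, the picosecond margin readings, the target margin, and the step size are all assumptions, and real hardware might adjust voltage rather than frequency.

```python
def next_frequency_mhz(freq_mhz, margin_ps, target_margin_ps, step_mhz=25):
    """Return the next clock frequency for one feedback step.

    margin_ps is the timing slack reported by a hypothetical
    critical-path monitor; target_margin_ps is the slack we want to keep.
    """
    if margin_ps > target_margin_ps:
        return freq_mhz + step_mhz  # spare margin: run faster
    if margin_ps < target_margin_ps:
        return freq_mhz - step_mhz  # margin eroded: back off
    return freq_mhz                 # on target: hold steady


# Walk through a few made-up margin readings.
freq = 3000
for margin in (30, 30, 12, 20):
    freq = next_frequency_mhz(freq, margin, target_margin_ps=20)
    print(freq)
```

A static guardband would, in effect, pin the clock at whatever the worst reading allows. The loop above follows the readings instead.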

For you product-testing geeks out there, Floyd got these numbers from a 32-core IBM Power 750 Express Server with 64GB of memory, running SPECPower_ssj at 100 per cent load, and with the EnergyScale policy set at DPS-FP. Oh, and the ambient temp was 22°C.

A second autonomic feature in the Power7 EnergyScale scheme is low-activity detection (LAD), which drops processor frequency, thus saving power, when the processor has nothing better to do — for example, when running memory-bound workloads and waiting for data to arrive.

"As you guys know," Floyd said to the geek-filled crowd, "a lot of workloads are memory-bound, or at least certain points in time are memory-bound, and you don't always need the full processor at peak frequency during those times. The interesting thing that we found ... is that systems that appear to be 100 per cent utilized when using traditional metrics to measure what the system's doing, actually are 100 per cent idle."

"This may sound counterintuitive at first, but if you're running an idle loop, or if you're polling a place in memory, you may be really busy but technically you're idle — and you're not getting a whole lot of work done," he said. Not good — a waste of power.

So when the LAD detects such a condition, it instructs the digital PLL (DPLL) to drop the frequency — which it can do at 25MHz resolution at an up-or-down speed of 50MHz per microsecond, dropping frequency by up to 50 per cent (or, in other scenarios, raising it by up to 10 per cent). This same effect can be accessed by software using a technique that Floyd called Green Polling.
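To put those DPLL figures in perspective, here is a rough sketch of the arithmetic. The 25MHz step, the 50MHz-per-microsecond slew rate, and the 50 per cent and 10 per cent limits come from Floyd's talk. The function names and the 3,000MHz nominal clock in the usage lines are assumptions made for illustration.

```python
STEP_MHZ = 25          # DPLL frequency resolution
SLEW_MHZ_PER_US = 50   # maximum rate of change, up or down
MAX_DROP_PCT = 50      # largest allowed drop below nominal
MAX_BOOST_PCT = 10     # largest allowed boost above nominal


def clamp_target(nominal_mhz, requested_mhz):
    """Clamp a requested frequency to the allowed range, then snap it
    down to the 25MHz grid. Assumes nominal_mhz itself sits on the grid."""
    low = nominal_mhz * (100 - MAX_DROP_PCT) // 100
    high = nominal_mhz * (100 + MAX_BOOST_PCT) // 100
    clamped = min(max(requested_mhz, low), high)
    return (clamped // STEP_MHZ) * STEP_MHZ


def ramp_time_us(current_mhz, target_mhz):
    """Microseconds needed to slew from one frequency to another."""
    return abs(target_mhz - current_mhz) / SLEW_MHZ_PER_US


nominal = 3000  # hypothetical nominal clock in MHz
target = clamp_target(nominal, 1000)        # asks for more than a 50% drop
print(target, ramp_time_us(nominal, target))
```

Under these assumptions, a full 50 per cent drop from 3,000MHz takes 30 microseconds.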

The third autonomic power-saving upgrade in Power7 EnergyScale is what Floyd called the processor-core Power Proxy — a way of finding out what a processor core's power consumption is without directly measuring it.

"We have eight processor cores running on this chip, and due to system constraints we can't put an external voltage regulator on each one of these processor cores — it's too prohibitive," said Floyd, explaining why all the cores share the same voltage point.

To be more exact, to use IBM lingo, it's not just the Power7's cores that share the same voltage point, but all its "chiplets" — that's Big Blue's term for a core, its associated L2 and L3 caches, and some connective tissue.

This lack of per-chiplet voltage regulators creates a problem for power management: "You can't make intelligent decisions because you don't know how much each of their eight processor-core chiplets is burning," Floyd pointed out. "You can see how much they're burning as a whole, but that doesn't help if you're trying to do power shifting or power trade-offs between the multiple cores."

To the rescue comes the Power Proxy scheme, a hardware-based system that samples activities in different areas of each chiplet — e.g., a cache read or write, an execution pipeline issue, or some such — then weights each activity to represent how much power it consumes, combines the weighted results from the chiplet's subsections, then sends the final stats off to the EnergyScale firmware.
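In software terms, the weighting step amounts to a weighted sum of activity counts. The sketch below is a minimal illustration, assuming made-up event names and weights. The real design does this in hardware, and its actual events and weights were not given in the talk.

```python
# Hypothetical per-event weights: relative power cost of one occurrence.
WEIGHTS = {
    "cache_read": 2.0,
    "cache_write": 3.0,
    "pipeline_issue": 1.5,
}


def power_proxy(activity_counts, weights=WEIGHTS):
    """Estimate a chiplet's relative power from sampled activity counts."""
    return sum(weights[event] * count
               for event, count in activity_counts.items())


# One sampling window for one chiplet (invented numbers).
sample = {"cache_read": 10, "cache_write": 4, "pipeline_issue": 20}
print(power_proxy(sample))
```

The result is a unitless estimate. Turning it into watts would need calibration against a real measurement, which is outside the scope of this sketch.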

That firmware, in turn, treats the Power Proxy inputs as if they were direct measurements of power rather than estimates based on chiplet subsection activity, and allocates power among cores — or even to other components such as memory — as needed.
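One way firmware might combine such estimates with the chip-wide power reading is to split the measured total in proportion to each chiplet's proxy value. This is a guess at the general approach, not a description of IBM's firmware, and the function name and figures are invented for illustration.

```python
def apportion_power(total_watts, proxies):
    """Split a measured chip-wide power figure across chiplets in
    proportion to their proxy estimates."""
    proxy_sum = sum(proxies)
    if proxy_sum == 0:
        # No activity recorded anywhere: fall back to an even split.
        return [total_watts / len(proxies) for _ in proxies]
    return [total_watts * p / proxy_sum for p in proxies]


# Eight chiplets, one of them much busier than the rest (made-up values).
shares = apportion_power(100.0, [3, 1, 1, 1, 1, 1, 1, 1])
print(shares)
```

With per-chiplet shares in hand, a power-shifting policy has something to act on. The chip-wide total alone can't provide that.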

These three new autonomic features are just the latest additions to IBM's EnergyScale architecture. A more-complete discussion of the power-saving features in EnergyScale's Power7 implementation, including its policy-based, customer-managed tunability, can be found in a 51-page white paper (PDF). ®

Biting the hand that feeds IT © 1998–2022