IBM's zEnterprise 196 CPU: Cache is king

'The fastest CPU in the world.' And more

Analysis IBM is a funny technology company in that its top brass doesn't like to talk about feeds and speeds and seems to be allergic to hardware in particular. Which is particularly idiotic for a hardware company that sells servers, storage, and chips.

Thursday, in launching the new System zEnterprise 196 mainframe, IBM didn't say much about the feeds and speeds of the new quad core processor at the heart of the system. About the only tech talking point the company offered was that the new machine's processors ran at 5.2 GHz, making it "the fastest microprocessor in the world."

Well, yes, if you are looking at raw clock speed alone. But there is more to this z196 processor than fast clocks and more to any system than its cores.

The quad-core z196 processor bears some resemblance to the 4.4 GHz quad-core z10 processor it replaces in the System z lineup. The z196 processor is implemented in a 45 nanometer copper/silicon-on-insulator process (a shrink from the 65 nanometer processes used in the z10 chip), which means Big Blue could cram all kinds of things onto the chip, and it did just that. Much as it did with the eight-core Power7 chips announced in February.

The z196 processor has 1.4 billion transistors and weighs in with 512.3 square millimeters in real estate, making it a bit larger than the Power7 chip in both transistor count and area. The z196 chip uses IBM's land grid array packaging, which have golden bumps called C4 instead of pins. The z196 processor has a stunning 8,093 power bumps and 1,134 signal bumps.

Each core on the z196 chip has 64 KB of L1 instruction cache and 128 KB of L1 data cache, just like the z10. The cores are very similar, except that the z196 has 100 new instructions to play with and some tweaks to the superscalar pipeline allows for instructions to be reordered in ways that makes the pipeline more efficient than the z10 but in a way that is invisible to compiled code. Each core has 1.5 MB of its own L2 cache as well. Take a look at the chip below:

zEnterprise 196 Mainframe CPU

IBM's z196 mainframe processor

The z196 engine's superscalar pipeline can decode three z/Architecture CISC instructions per clock cycle and execute up to five operations per cycle. Each core has six execution units: two integer units, one floating point unit, two load/store units and one decimal (or money math) unit. IBM says that the floating point unit has a lot more oomph than the one used in the z10 chip, but did not say how many flops it could do per clock. Some of the prior z/Architecture CISC instructions have been busted into pieces, allowing for them to be spread across the pipeline more efficiently and making the z196 a bit more RISCy.

Like the Power7 chip, the z196 implements embedded DRAM (eDRAM) as L3 cache memory on the chip. Which this eDRAM memory is slower than static RAM (SRAM) normally used to implement cache memory, you can cram a lot of it onto a given area. For many workloads, having more memory closer to the chip is more important than having fast memory. The z196 processor has 24 MB of eDRAM L3 cache memory, which is split into two banks and managed by two on-chip L3 cache controllers.

Each z196 chip as a GX I/O bus controller - the same as is used on the Power family of chips to interface with host channel adapters and other peripherals - and a memory controller that interfaces with the RAID-protected DDR3 main memory allocated to each socket. Each z196 chip also has two cryptographic and compression co-processors, the third generation of such circuits to go into IBM's mainframes.

Two cores share one of these co-processors, which have 16 KB of their own cache memory. Finally, each z196 chip has an interface to a SMP Hub/shared cache chip. Two of these chips, which are shown below, are put onto each z196 multichip module (MCM), and they provide the cross-coupling that allows all six sockets on the MCM to be linked to each other with 40 GB/sec links.

IBM zEnterprise 196 L4 Cache Hub

The zEnterprise 196 SMP hub/shared cache

In the IBM mainframe lingo, the z196 processing engine is a CP, or central processor, while the interconnect chip for the CPs is called the SC, short for shared cache. Each SC has six CP interfaces to link to each of the CPs and three fabric interfaces to link out to the three other MCMs in a fully loaded z196 system.

What's neat about this SMP hub is that it is loaded to the gills with L4 cache memory, which most servers do not have. (IBM added some L4 cache to its EXA chipsets for Xeon processors from Intel a few years back). This L4 cache is necessary for one key reason, I think: the clock speed on the mainframe engine is a lot higher than main memory speeds, and only by adding another cache layer can the z196 engines, which are terribly expensive, be kept fed. Anyway, this SMP Hub/shared cache chip is made in the same 45 nanometer processes as the CPs, and weighs in at 1.5 billion transistors and 478.8 square millimeters of real estate. It has 8,919 bumps in its package, so to speak.

Six CPs and two SCs are implemented on each MCM, which is a square that is 96 millimeters on a side, which dissipates 1,800 watts. Each processor book has one of these MCM puppies, and a fully connected system has 96 CPs, a dozen memory controllers able to access up to 3 TB of RAID memory, and up to 32 I/O hub ports with a maximum of 288 GB/sec of I/O bandwidth. Up to 80 of the CPs in the top-end zEnterprise 196 M80 machine can be used to run workloads; others are used for coupling systems together using Parallel Sysplex clustering, managing I/O, hot spares, and such. ®

Similar topics

Narrower topics

Other stories you might like

  • IBM finally shutters Russian operations, lays off staff
    Axing workers under 40 must feel like a novel concept for Big Blue

    After freezing operations in Russia earlier this year, IBM has told employees it is ending all work in the country and has begun laying off staff. 

    A letter obtained by Reuters sent by IBM CEO Arvind Krishna to staff cites sanctions as one of the prime reasons for the decision to exit Russia. 

    "As the consequences of the war continue to mount and uncertainty about its long-term ramifications grows, we have now made the decision to carry out an orderly wind-down of IBM's business in Russia," Krishna said. 

    Continue reading
  • IBM AI boat to commemorate historic US Mayflower voyage finally lands… in Canada
    Nearly two years late and in the wrong country, we welcome our robot overlords

    IBM's self-sailing Mayflower Autonomous Ship (MAS) has finally crossed the Atlantic albeit more than a year and a half later than planned. Still, congratulations to the team.

    That said, MAS missed its target. Instead of arriving in Massachusetts – the US state home to Plymouth Rock where the 17th-century Mayflower landed – the latest in a long list of technical difficulties forced MAS to limp to Halifax in Nova Scotia, Canada. The 2,700-mile (4,400km) journey from Plymouth, UK, came to an end on Sunday.

    The 50ft (15m) trimaran is powered by solar energy, with diesel backup, and said to be able to reach a speed of 10 knots (18.5km/h or 11.5mph) using electric motors. This computer-controlled ship is steered by software that takes data in real time from six cameras and 50 sensors. This application was trained using IBM's PowerAI Vision technology and Power servers, we're told.

    Continue reading
  • IBM buys Randori to address multicloud security messes
    Big Blue joins the hot market for infosec investment

    RSA Conference IBM has expanded its extensive cybersecurity portfolio by acquiring Randori – a four-year-old startup that specializes in helping enterprises manage their attack surface by identifying and prioritizing their external-facing on-premises and cloud assets.

    Big Blue announced the Randori buy on the first day of the 2022 RSA Conference on Monday. Its plan is to give the computing behemoth's customers a tool to manage their security posture by looking at their infrastructure from a threat actor's point-of-view – a position IBM hopes will allow users to identify unseen weaknesses.

    IBM intends to integrate Randori's software with its QRadar extended detection and response (XDR) capabilities to provide real-time attack surface insights for tasks including threat hunting and incident response. That approach will reduce the quantity of manual work needed for monitoring new applications and to quickly address emerging threats, according to IBM.

    Continue reading

Biting the hand that feeds IT © 1998–2022