IBM's z12 mainframe engine makes each clock count

All 5.5 billion of them – and then some

Hot Chips When you charge hundreds of thousands of dollars per core for an engine that is designed to run full-out all day doing online transaction processing and all night running single-threaded big batch jobs, you have no choice but to believe in higher clock speeds and doing anything and everything to boost that single-thread performance. And that is what IBM's new z12 mainframe engines are all about.

Big Blue announced the System zEnterprise EC12 mainframes smack dab in the middle of the Hot Chips chippery extravaganza in Cupertino, California last week, and did so several weeks in advance of the planned launch for the systems.

IBM had planned to speak about the z12 processors at Hot Chips, and Kevin Shum, senior technical staff member for System z processor development in IBM's Systems and Technology Group, gave the talk. Shum told El Reg that the paper he submitted for his presentation didn't even have a die shot or talk about many of the aspects of the zEC12 system because these were not ready for primetime when he submitted his materials months ago. We already told you a little about the z12 processors and a lot about the zEC12 systems, but we can now tell you a little bit more about the z12 processors and how they are different from prior generations of mainframe engines.

The first thing to consider is the batch orientation of mainframe jobs, where you want to start a sequential piece of work and finish it as soon as possible. Unlike the makers of other processor architectures, IBM has used its ongoing chip fabrication process improvements to ramp up the clock speeds of its processors rather than plunk a lot of cores onto a single die. IBM has also used its expertise in packaging, ceramics, and cooling to cram as many hot processing elements (these z12 chips reportedly run at about 300 watts) into a rack as possible. In this case, a full-on System zEnterprise EC12-HA1 system has four processor books with five sockets sporting six-core z12 engines in each socket, for a total of 120 raw compute engines. These engines are used for a variety of tasks inside the box, and up to 101 of them can be configured to run z/OS, z/VM, z/VSE, z/TPF, or Linux, act as zIIP or zAAP coprocessors to speed up DB2 or Java workloads, or support system I/O and clustering operations.
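For those who like to check the sums, here is a minimal Python sketch of the engine count. The figures come from the paragraph above; the variable names are ours, and the article does not break down what the engines beyond the configurable 101 are used for.

```python
# Figures quoted in the article for a fully loaded zEC12-HA1.
BOOKS = 4               # processor books per system
SOCKETS_PER_BOOK = 5    # z12 sockets per book
CORES_PER_SOCKET = 6    # cores per z12 chip
CONFIGURABLE = 101      # engines that can be configured for workloads

raw_engines = BOOKS * SOCKETS_PER_BOOK * CORES_PER_SOCKET
# Engines that are not configurable; their use is not broken down in the article.
not_configurable = raw_engines - CONFIGURABLE

print(raw_engines, not_configurable)  # 120 19
```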

The top-end EC12 has z12 engines that are 25 per cent more powerful, at around 1,600 MIPS, than the z11 engines used in the zEnterprise 196 servers announced two years ago. The full-on system capacity is about 50 per cent greater, at around 75,000 MIPS.
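Working backwards from those percentages gives a rough idea of where the prior generation sat. This is a back-of-the-envelope sketch only: the z11 figures below are implied by the article's rounded numbers rather than quoted by IBM, and the per-engine average assumes all 101 configurable engines are running workloads.

```python
# Ratings and gains quoted in the article (all approximate).
Z12_ENGINE_MIPS = 1_600
Z12_SYSTEM_MIPS = 75_000
ENGINE_GAIN = 0.25            # per-engine gain over z11
SYSTEM_GAIN = 0.50            # full-system gain over the zEnterprise 196
CONFIGURABLE_ENGINES = 101    # assumption: all of them run workloads

# Implied prior-generation figures (derived, not quoted by IBM).
z11_engine_mips = Z12_ENGINE_MIPS / (1 + ENGINE_GAIN)
z11_system_mips = Z12_SYSTEM_MIPS / (1 + SYSTEM_GAIN)

# Average rating per engine when the whole box is loaded up.
avg_mips_per_engine = Z12_SYSTEM_MIPS / CONFIGURABLE_ENGINES

print(round(z11_engine_mips), round(z11_system_mips), round(avg_mips_per_engine))
```

The average per engine in a full system comes out well below the single-engine rating, which is the usual story with big SMP boxes: capacity does not scale linearly with engine count.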

Getting that 50 per cent boost in scalability is relatively easy. Getting that 25 per cent boost in single-thread performance is very, very hard, especially when you consider that the prior generation of mainframe engines was already running at a very high 5.2GHz.

The jump in processor speed for IBM's mainframes has been quite dramatic in the past several years:

IBM has cranked the clocks on System z processors, and it has to

IBM was able to crank the clocks above 1GHz with the z6 engines in the System z990 servers back in 2003 by moving to a superscalar CISC pipeline. With the z10 engines in 2008, concurrent with a shrink to 65 nanometer processes, IBM also added a much deeper pipeline, allowing the clocks to jump up to 4.4GHz and substantially improving single-threaded performance for the mainframe engines. Two years ago, IBM shifted to out-of-order execution on the z11 engine pipelines and did a shrink to 45 nanometers, pushing the clocks up to 5.2GHz. And according to Shum, a second-generation out-of-order execution stream plus the shrink to 32 nanometer processes is what is allowing IBM to get two more cores on the die while boosting the clock speed to 5.5GHz.
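The clock bump alone does not account for the 25 per cent single-thread gain quoted earlier. A rough decomposition, assuming performance scales as clock speed times work done per clock (a simplification of ours, not an IBM model), looks like this:

```python
# Clock speeds and single-thread gain quoted in the article.
Z11_CLOCK_GHZ = 5.2
Z12_CLOCK_GHZ = 5.5
SINGLE_THREAD_GAIN = 1.25   # z12 engine versus z11 engine

# Simplifying assumption: performance = clock * work_per_clock.
clock_gain = Z12_CLOCK_GHZ / Z11_CLOCK_GHZ
per_clock_gain = SINGLE_THREAD_GAIN / clock_gain

print(f"from clock speed:    +{(clock_gain - 1) * 100:.1f}%")
print(f"from work per clock: +{(per_clock_gain - 1) * 100:.1f}%")
```

Under that assumption, most of the improvement has to come from the pipeline and cache changes rather than from the extra 300MHz.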

"The frequency runs all out, 24 by 7, because our customers run these machines all the time," bragged Shum. "And as a transactional engine, we are not just moving data from place to place, either."

The changes that IBM has made in the out-of-order execution scheme give the processor more out-of-order groups, allowing the pipeline to dispatch more grouped instructions and issue more micro-operations in a cycle than the prior z11 chip's pipeline. IBM has added an instruction queue to the decode/dispatch unit on the chip, and added a virtual branch unit for relative branch execution and a virtual branch queue for relative branch queuing. The instruction issue bandwidth was pumped up by 40 per cent to seven micro-ops per cycle. The improved branch prediction unit now has two levels, and has three times the capacity of the BRU in the z11 chip.
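The article does not describe how IBM's two prediction levels are organised, so the following is only a toy illustration of the general idea of a two-level branch target buffer: a small first level backed by a larger second level, with entries promoted and demoted between them. The class name, the table sizes, and the LRU replacement policy are all our assumptions, not details of the z12.

```python
from collections import OrderedDict


class TwoLevelBTB:
    """Toy two-level branch target buffer (hypothetical, not IBM's design)."""

    def __init__(self, l1_entries=4, l2_entries=12):
        self.l1 = OrderedDict()   # branch address -> target, oldest first
        self.l2 = OrderedDict()
        self.l1_entries = l1_entries
        self.l2_entries = l2_entries

    def predict(self, branch_addr):
        """Return (target, level), or (None, None) if the branch is unknown."""
        if branch_addr in self.l1:
            self.l1.move_to_end(branch_addr)      # refresh LRU position
            return self.l1[branch_addr], 1
        if branch_addr in self.l2:
            target = self.l2.pop(branch_addr)
            self._install(branch_addr, target)    # promote into level 1
            return target, 2
        return None, None

    def update(self, branch_addr, target):
        """Record the resolved target of a taken branch."""
        self.l2.pop(branch_addr, None)            # avoid a stale level 2 copy
        self._install(branch_addr, target)

    def _install(self, branch_addr, target):
        self.l1[branch_addr] = target
        self.l1.move_to_end(branch_addr)
        if len(self.l1) > self.l1_entries:
            victim, victim_target = self.l1.popitem(last=False)
            self.l2[victim] = victim_target       # demote the LRU entry
            if len(self.l2) > self.l2_entries:
                self.l2.popitem(last=False)       # drop the oldest entry


btb = TwoLevelBTB(l1_entries=2, l2_entries=4)
for addr, target in [(0x100, 0x200), (0x104, 0x300), (0x108, 0x400)]:
    btb.update(addr, target)

print(btb.predict(0x108))   # still in level 1
print(btb.predict(0x100))   # found in level 2, then promoted
print(btb.predict(0x999))   # never seen
```

The point of the second level is capacity: branches that fall out of the small, fast table are not forgotten, which matters for the large instruction footprints typical of transaction processing code.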

"When your pipeline is deep, branch prediction is extremely important. A lot of people brag about their branch prediction, but we have the Cadillac," boasted Shum.

Block diagram of the z12 core

Each z12 core has 64KB of L1 instruction cache and 96KB of L1 data cache. IBM did a few funky things with the cache inside the z12 engine to keep those 5.5GHz threads fed. First, it broke the L2 cache into instruction and data caches, mirroring what most chip etchers do with L1 caches. Most of the L2 caches out there in the world are unified, meaning they are used for both data and instructions.

IBM has 1MB of L2 data cache and 1MB of L2 instruction cache on each z12 core. Big Blue also embedded the L2 data directory inside of the L1 data cache, right next to its own directory, and logically indexed the L2 data cache just like the L1 cache. What this means is that when there is a miss in the L1 data cache, the core knows it and can look it up and see if it is in L2 data cache right then and there, reducing the L2 hit latency by 45 per cent. The core includes a global L2 cache controller and directory as well to keep things in lockstep.
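To get a feel for what a 45 per cent cut in L2 hit latency buys, here is a small sketch using the textbook average-access-time formula. Only the 45 per cent figure comes from the article; the cycle counts and the miss rate are hypothetical placeholders, and the model assumes every L1 miss hits in L2.

```python
def avg_access_cycles(l1_hit_cycles, l1_miss_rate, l2_hit_cycles):
    """Average data access time, assuming every L1 miss hits in L2."""
    return l1_hit_cycles + l1_miss_rate * l2_hit_cycles


L2_LATENCY_CUT = 0.45        # from the article

# Hypothetical placeholder values, not z12 measurements.
L1_HIT_CYCLES = 4.0
L1_MISS_RATE = 0.10
OLD_L2_HIT_CYCLES = 20.0

new_l2_hit_cycles = OLD_L2_HIT_CYCLES * (1 - L2_LATENCY_CUT)
before = avg_access_cycles(L1_HIT_CYCLES, L1_MISS_RATE, OLD_L2_HIT_CYCLES)
after = avg_access_cycles(L1_HIT_CYCLES, L1_MISS_RATE, new_l2_hit_cycles)

print(f"average access: {before:.2f} -> {after:.2f} cycles")
```

The higher the L1 miss rate of the workload, the more of that 45 per cent shows up in the average, which is why the trick is attractive for data-heavy transaction work.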

Die schematic of the z12 mainframe chip

Add it up, and the z12 core can issue seven micro-ops per clock, decode three instructions per clock, and complete three instructions per clock. The memory controllers on the die also support transactional memory, which we discussed elsewhere in the zEC12 system announcement last week.
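Those widths put a theoretical ceiling on what a single core can retire. The sketch below is our own arithmetic on the quoted figures; it treats the seven-wide issue stage as counting micro-ops, as the article describes it earlier. It is a peak that real code will not sustain, and it should not be compared directly with mainframe MIPS ratings.

```python
CLOCK_HZ = 5.5e9            # z12 clock speed from the article
DECODE_PER_CLOCK = 3        # instructions decoded per clock
ISSUE_UOPS_PER_CLOCK = 7    # micro-ops issued per clock
COMPLETE_PER_CLOCK = 3      # instructions completed per clock

# Instructions enter through decode and leave through completion,
# so the narrower of the two caps sustained instruction throughput.
peak_instr_per_clock = min(DECODE_PER_CLOCK, COMPLETE_PER_CLOCK)
peak_instr_per_sec = peak_instr_per_clock * CLOCK_HZ

# Issue slots available per instruction at that peak rate.
uops_per_instr = ISSUE_UOPS_PER_CLOCK / peak_instr_per_clock

print(f"{peak_instr_per_sec / 1e9:.1f} billion instructions/sec peak")
print(f"{uops_per_instr:.2f} micro-op issue slots per instruction")
```

The wider issue stage makes sense for a CISC machine, where one decoded instruction can be cracked into several micro-ops.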

The z12 processor is implemented in a 15-layer high-K metal gate process whipped up with IBM's copper/silicon-on-insulator technologies; it etches circuits at 32 nanometers in this generation. The z12 chip has 2.75 billion transistors and includes an on-chip DDR3 memory controller, which also implements IBM's Redundant Array of Independent Memory (RAIM) parity and striping protection across memory chips. The zEC12 system supports 3TB of addressable memory, just like the z196.

The various coprocessors on the mainframe engine are now allocated to each core for them to use by their lonesome instead of being shared by the cores. (That's the "cop" in the die schematic above.) These include vector math units and encryption engines, among other things. The z12 chip has 48MB of embedded DRAM (eDRAM) L3 cache memory to feed the cores, which is twice the L3 cache the z11 processor had. There are two L3 cache controllers at the heart of the chip, as was the case with the previous z11 chips.
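Totting up the on-chip cache from the figures quoted in this article gives the following. The per-core L3 number is only a nominal share, since the L3 is shared across all six cores, and the z11 figure is derived from the "twice the L3" statement rather than quoted directly.

```python
CORES = 6                    # cores per z12 chip
L1_I_KB, L1_D_KB = 64, 96    # per-core L1 instruction and data caches
L2_I_MB, L2_D_MB = 1, 1      # per-core split L2 caches
L3_MB = 48                   # shared eDRAM L3 per chip

l1_total_kb = CORES * (L1_I_KB + L1_D_KB)
l2_total_mb = CORES * (L2_I_MB + L2_D_MB)
l3_share_per_core_mb = L3_MB / CORES    # nominal share only
z11_l3_mb = L3_MB // 2                  # implied by "twice the L3"

print(f"L1 total: {l1_total_kb}KB, L2 total: {l2_total_mb}MB, L3: {L3_MB}MB")
print(f"nominal L3 per core: {l3_share_per_core_mb:.0f}MB, z11 L3: {z11_l3_mb}MB")
```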

The off-chip SMP hub/shared L4 cache controller chip is still part of the architecture of the System z mainframe, but Shum did not talk about it during his presentation. With the zEnterprise 196 system, four z11 engines and two of these SMP hub chips, with a total of 192MB across two L4 controllers, were linked together to create a multi-chip module (MCM) that is welded onto each processor book.

We know the zEnterprise EC12 has five processor sockets in the book and that L4 cache memory has been doubled to a total of 384MB, but it is not clear if IBM just doubled up the L4 cache per SMP hub chip or doubled up the number of units on the book. The former seems likely, but the latter is possible. ®
