Imagination touts next series of GPU cores aimed at cloud server acceleration ...and cars, mobile, IoT

If no one else can help, and if you can find them, maybe you can hire... The B-Team


Imagination will today launch its latest line of graphics processor cores, including one we're told is destined for cloud hardware.

The new family, dubbed the B-series, follows December's A-series launch. Don't let the naming fool you: while some folks might think of it along the lines of an A and B team, the B-series is supposed to be superior to the A-series rather than a fallback.

The latest additions to Imagination's extended PowerVR family are: the top-end BXT, the mid-range BXM, the entry-level BXE, and the safety-aware BXS. The cores are intended to be clocked between 1 and 1.1GHz at the mobile end, and 1.5GHz at the cloud server end, and are designed with 7nm, 5nm and 3nm process nodes in mind through Imagination's partnership with TSMC.

The cores can be split over multiple chiplets within a single processor package. For example, a collection of BXT cores can be placed on a chiplet die, and two of these dies can be placed inside a single package, and work together as if all the cores were on the same contiguous piece of silicon. The chiplet approach is used by AMD and lately Intel; one advantage is that it helps keep manufacturing yield up by reducing the complexity and size of each individual die. Making massive dies that each feature all the cores on one piece of silicon is difficult, and prone to faults and other failures.
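To put rough numbers on that yield argument, here's a minimal sketch using the textbook Poisson yield model, in which the share of defect-free dies falls off exponentially with die area. The defect density and die sizes are hypothetical figures picked for illustration; they don't come from Imagination, TSMC, or anyone else.

```python
import math

def poisson_yield(area_mm2: float, defects_per_mm2: float) -> float:
    """Fraction of dies expected to be defect-free under a simple Poisson model."""
    return math.exp(-area_mm2 * defects_per_mm2)

# Hypothetical inputs, for illustration only.
DEFECT_DENSITY = 0.001          # defects per square millimetre
MONOLITHIC_MM2 = 600.0          # one big die holding every core
CHIPLET_MM2 = MONOLITHIC_MM2 / 2  # the same logic split over two dies

monolithic_yield = poisson_yield(MONOLITHIC_MM2, DEFECT_DENSITY)
chiplet_yield = poisson_yield(CHIPLET_MM2, DEFECT_DENSITY)

# Silicon area fabricated per working package, assuming chiplets are
# tested before packaging so only good dies get paired up.
monolithic_cost_mm2 = MONOLITHIC_MM2 / monolithic_yield
chiplet_cost_mm2 = 2 * CHIPLET_MM2 / chiplet_yield

print(f"monolithic yield: {monolithic_yield:.1%}, "
      f"silicon per good part: {monolithic_cost_mm2:.0f} mm^2")
print(f"chiplet yield:    {chiplet_yield:.1%}, "
      f"silicon per good part: {chiplet_cost_mm2:.0f} mm^2")
```

Under these made-up inputs, each half-size die is more likely to come out clean, so less silicon is scrapped per working package. Real-world yield models are more involved, and packaging two dies carries its own costs that this sketch ignores.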

As a fabless semiconductor business, UK-based and China-owned Imagination draws up blueprints for others to license and place into chips as they see fit. So these B-series GPU cores aren't components you can just order online; it's the tech that might power the next gadget you use, be it an Android handheld or smart TV, etc. Let's step through these new designs.


Last time we spoke to Imagination about its A-series GPUs, it was rather focused on the automotive industry, offering technology for dashboards, infotainment units, and so on. And then out of nowhere, perhaps, Innosilicon licensed the B-series's BXT designs for GPU-powered PCIe 4 cards to be fitted into servers, we're told. This kit "will power 5G cloud gaming and data centre applications," Innosilicon said in a canned statement on Tuesday. By that we assume Innosilicon will produce expansion cards featuring system-on-chips that contain BXT GPU cores, which will, it is hoped, be used to accelerate the math in AI workloads and similar software, and play games in the cloud, streaming the video over cellular internet to people's phones.

These 4K and 8K-capable cards may also be touted for desktop machines; the server-bound gear at least is "set to hit the market very soon," according to Roger Mao, Innosilicon's veep of engineering.

The BXT can scale to four clusters, with each cluster containing four GPU cores, and it is aimed at all devices from mobile to cloud servers. A quad-cluster, four-GPU-cores-per-cluster BXT 32-1024 MC4 can, according to Imagination, hit as much as 6 TFLOPS using FP32 (that's six trillion 32-bit floating-point math operations a second) and 24 TOPS using INT8 (24 trillion 8-bit operations a second), and output 192 gigapixels per second.

BXT four-cluster diagram

The BXT four-cluster, four-cores-per-cluster design ... USC is a unified shader core, TPU is a texture processing unit, and FW is the firmware processor. All illustrations – Source: Imagination.

Last year's top-end single-cluster, four-core AXT 64-2048 was said to reach 2 TFLOPS FP32, 8 TOPS INT8, and render 64 gigapixels per second. Imagination appears to have tripled its top-end GPU performance, from the AXT 64-2048 to the BXT 32-1024 MC4, by quadrupling the core count and increasing the clock frequency from last year's 1GHz.

The 64-2048 designation for the AXT means its four-core cluster can do 64 texels and 2,048 FP32 FLOPs per clock cycle. The BXT's 32-1024 designation therefore suggests each of its four-core clusters has half that performance per clock cycle, but with four clusters and a bump in clock speed, you get your triple performance. Given there's been less than a year between the A-series and B-series launches, this level of progress isn't too surprising.
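The arithmetic behind those headline figures can be checked in a few lines. This is a sketch under two assumptions: that the BXT configuration runs at the 1.5GHz quoted for the cloud server end while the AXT runs at last year's 1GHz, and that the texels-per-clock figure maps one-to-one onto the quoted gigapixels per second. The `GpuConfig` class and its field names are ours, invented for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GpuConfig:
    name: str
    texels_per_clock: int  # first number in the designation, per cluster
    flops_per_clock: int   # second number in the designation (FP32), per cluster
    clusters: int
    clock_ghz: float       # assumed clock, see the note above

    def tflops_fp32(self) -> float:
        # FLOPs per clock x clusters x GHz gives GFLOPS; divide for TFLOPS.
        return self.flops_per_clock * self.clusters * self.clock_ghz / 1000

    def gigapixels_per_second(self) -> float:
        return self.texels_per_clock * self.clusters * self.clock_ghz

axt = GpuConfig("AXT 64-2048", 64, 2048, clusters=1, clock_ghz=1.0)
bxt = GpuConfig("BXT 32-1024 MC4", 32, 1024, clusters=4, clock_ghz=1.5)

for gpu in (axt, bxt):
    print(f"{gpu.name}: {gpu.tflops_fp32():.3f} TFLOPS FP32, "
          f"{gpu.gigapixels_per_second():.0f} gigapixels/s")

speedup = bxt.tflops_fp32() / axt.tflops_fp32()
print(f"BXT over AXT: {speedup:.1f}x")
```

Half the per-clock width, four times the clusters, and one-and-a-half times the clock works out to the threefold jump, with the results rounding to the quoted 2 and 6 TFLOPS.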

It's worth pointing out Imagination reckons its B-series cores use less die space and power than the A-series, and customers can choose the clustering approach they want to suit their application – not everyone has a need for speed, and power and die size can be just as important if not more.

The BXT clusters are said to have a lightweight interface between themselves, needing just an interrupt line to announce they've completed their share of the work: you can assign each cluster a particular region of the screen to render, with each cluster working independently to fill the screen in parallel.
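As a rough illustration of that split-the-screen idea, the sketch below carves a 4K frame into one horizontal band per cluster and has each worker report back when finished. The band layout, the function names, and the use of a thread pool as a stand-in for clusters and their interrupt lines are all assumptions for illustration; Imagination hasn't said how regions are actually assigned.

```python
from concurrent.futures import ThreadPoolExecutor

def split_screen(width, height, clusters):
    """Cut the screen into horizontal bands, one per cluster, as (x, y, w, h)."""
    base_rows, leftover = divmod(height, clusters)
    regions, y = [], 0
    for i in range(clusters):
        # Spread any leftover rows over the first few bands.
        rows = base_rows + (1 if i < leftover else 0)
        regions.append((0, y, width, rows))
        y += rows
    return regions

def render_region(region):
    """Stand-in for a cluster's rendering work: report the pixels it covered."""
    _x, _y, w, h = region
    return w * h

regions = split_screen(3840, 2160, clusters=4)
with ThreadPoolExecutor(max_workers=len(regions)) as pool:
    # Each worker runs independently; collecting its result plays the
    # role of the "I'm done" interrupt.
    pixels_done = list(pool.map(render_region, regions))

assert sum(pixels_done) == 3840 * 2160  # every pixel rendered exactly once
```

The point of the design is that the workers never need to talk to each other mid-frame, only to signal completion.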

Each cluster features a firmware processor (FW in Imagination-speak) that is programmed by Imagination and coordinates the rendering pipeline, power usage, timings, and so on. The BXT uses an in-house custom CPU architecture for its FW.


The BXE is aimed at providing graphical user interfaces on embedded and entry-level devices, such as Internet-of-Things gadgets, 8K displays and smart TVs, industrial equipment, and so on. It too can be arranged in up to four clusters, with just one FW CPU in total – a MIPS implementation Imagination still has access to, as it once owned the architecture before selling it off. A four-cluster, one-GPU-core-per-cluster BXE 4-32 can manage up to 128 GFLOPS and 16 gigapixels per second, it is said.

BXE four-core diagram

A four-cluster, one-core-per-cluster BXE arrangement

The BXE, like other B-series cores, supports Imagination's lossless and lossy image compression format IMGIC, which is used between the output of the GPU cores and the display driver hardware, reducing the bandwidth required to send frames to the screen.


The BXM is the mid-range offering, aimed at mobile gaming and stuff that needs a more demanding user interface. A four-cluster, one-core-per-cluster BXM 4-64 MC4 can achieve up to 256 GFLOPS and 16 gigapixels per second, we're told.

BXM four-core diagram

A four-cluster, one-core-per-cluster BXM configuration


Last but not least, the BXS is aimed at automotive and other systems in which safety is a concern. It is ISO 26262-capable, Imagination said, and in a multi-core setup, one cluster can check another's output to see if it is as expected and react to any deviation caused by hardware faults. The BXS uses an in-house-designed RISC-V CPU core as its FW, and Imagination is said to be considering using more RISC-V-compatible CPU cores in its future GPU clusters.
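The cross-checking idea is a form of redundant execution: run the same work twice and compare. Here's a toy sketch of the principle, with plain functions standing in for clusters and a deliberately flipped bit standing in for a hardware fault. The function names and the workload are hypothetical, and how the BXS actually schedules and compares work isn't described.

```python
def render_workload(values, fault_at=None):
    """Stand-in for a cluster's work: double each value.

    If fault_at is given, flip one bit of that result to mimic a hardware fault.
    """
    out = [v * 2 for v in values]
    if fault_at is not None:
        out[fault_at] ^= 0x04
    return out

def cross_check(primary, checker):
    """Return the indices at which the two clusters' outputs disagree."""
    return [i for i, (a, b) in enumerate(zip(primary, checker)) if a != b]

frame = list(range(8))

healthy = cross_check(render_workload(frame), render_workload(frame))
faulty = cross_check(render_workload(frame, fault_at=3), render_workload(frame))

print("healthy run mismatches:", healthy)
print("faulty run mismatches: ", faulty)
```

A clean run produces no mismatches, while the injected fault shows up at exactly the index where it was planted, which is the cue for the system to react.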

BXS four-core diagram

A four-cluster, one-core-per-cluster BXS arrangement

Finally, the BXS offers something called tile region protection, in which specific, safety-critical areas of a display – such as a speedometer on a dashboard – are rendered with extra integrity checks to ensure there is no corruption or broken graphics. For instance, it can produce a CRC code for critical data moving through the GPU system, which is checked to determine whether information has been damaged in transit by a hardware fault.
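The CRC principle is easy to demonstrate in software. The sketch below uses the standard CRC-32 from Python's `zlib` module on a hypothetical 32x32 tile of 8-bit pixels; which CRC polynomial the BXS uses, and where in the pipeline it computes and checks it, isn't stated, so treat this as an illustration of the technique rather than of Imagination's hardware.

```python
import zlib

def tile_crc(pixels: bytes) -> int:
    """Compute a CRC-32 over a tile's pixel data."""
    return zlib.crc32(pixels)

def tile_is_intact(pixels: bytes, expected_crc: int) -> bool:
    """Recompute the CRC at the receiving end and compare with the original."""
    return zlib.crc32(pixels) == expected_crc

# Hypothetical protected tile: 32x32 pixels, one byte each.
tile = bytes(range(256)) * 4
crc_at_source = tile_crc(tile)

# Simulate a hardware fault flipping a single bit in transit.
damaged = bytearray(tile)
damaged[10] ^= 0x01

print("clean tile passes:  ", tile_is_intact(tile, crc_at_source))
print("damaged tile passes:", tile_is_intact(bytes(damaged), crc_at_source))
```

CRC-32 is guaranteed to catch any single-bit error, so the damaged tile fails the check and the system knows not to trust what's about to hit the screen.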

Illustration of Imagination's protected tiles

An Imagination illustration showing the protected tile design ... the highlighted squares are selected important regions of the display that undergo extra integrity checks to ensure data isn't corrupt on its way to the screen

A single-cluster BXS 4-64 can reach 64 GFLOPS and four gigapixels per second, it is said.

By the time you read this, more details should be online over here. ®


Biting the hand that feeds IT © 1998–2020