Opinion We are teetering on the brink of a golden age of AI. It must be true: we keep being told so. This week's preacher is Samsung, which says it has integrated processors with memory to achieve stellar AI numbers: "Approximately twice the performance in AI-based recommendation applications and a 40 per cent decrease in system-wide energy usage."
There's no reason to doubt that. One of the downsides of High Bandwidth Memory (HBM) is that it takes a lot of power to push a lot of data through a very fast external bus, and a benefit of processor-in-memory architecture (PIM) is that the data can go from RAM to CPU and back again without needing much of a push. If Los Angeles were a London borough next to Islington, all those moody youths pitching their screenplays could stop worrying about their carbon footprints.
But why does Samsung specify AI? The physics of HBM-PIM doesn't care whether you're doing massively parallel image analysis using machine learning, rendering a superhero movie, or just churning through a million customer records looking for someone to sell a phone plan upgrade to.
AI/ML is the poster child here because it is unusually well placed to benefit from in-place processing, which is to say in-place processing isn't going to be that good for other things.
It has been a truism of high-performance computing since the 1960s that the real limiting factor is bandwidth, not computing power. Computers are useless without data, and while you can do all sorts of tricks to make a CPU small and fast, you're by definition pushing the edge of what you can do at a small scale. And your data lives at a larger scale, and will be slower to arrive.
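For the arithmetically inclined, that truism can be put in numbers with a roofline-style bound: achievable throughput is the lesser of peak compute and memory bandwidth multiplied by the work done per byte moved. What follows is only a back-of-envelope sketch. The machine figures and the two workloads are invented for illustration, not taken from Samsung or any real part.

```python
def attainable_flops(peak_flops, bandwidth_bytes_per_s, flops_per_byte):
    """Roofline-style bound: compute or memory traffic, whichever runs out first."""
    return min(peak_flops, bandwidth_bytes_per_s * flops_per_byte)


# Hypothetical machine (assumed numbers): 10 TFLOP/s peak, 500 GB/s memory bandwidth.
PEAK = 10e12
BANDWIDTH = 500e9

# Hypothetical workloads, described by FLOPs performed per byte fetched.
record_scan = attainable_flops(PEAK, BANDWIDTH, 0.25)  # little work per byte
dense_math = attainable_flops(PEAK, BANDWIDTH, 50.0)   # lots of work per byte

print(f"record scan: {record_scan / PEAK:.2%} of peak compute")
print(f"dense math:  {dense_math / PEAK:.2%} of peak compute")
```

On these made-up figures the record scan is starved by the bus and reaches only a sliver of peak, while the dense maths is limited by the processor instead.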
PIM AI can hide that because it does useful work on big data sets that nonetheless can fit in today's big memory, but it's a thoroughbred application that needs tending and tuning. It can't be both racehorse and carthorse.
We know how this limits things because it's happened already. Let's look at GPUs, which neatly illustrate the tensions between markets and technology. Gaming drives the GPU market, and that accounts for the bulk of GPUs' architectural features – massively parallel geometric computation, tightly bound to very fast memory, and an increasing array of machine learning tricks to guesstimate what something's going to look like without doing all the maths.
Other kinds of computational task fit that model quite well, certainly better than they fit the general-purpose von Neumann model of CPUs. And so GPUs find homes in data centres for specific machine learning tasks, in scientific supercomputers and – alas – in the cryptomills of the currency mines.
These areas have the common feature that they make sense even with, and only with, the high level of expertise needed to make them work well. There's no app store, no Steam, for GPU verticals; you expect to pay for the development to make each task work. Your expensive Nvidia room-heater doesn't make your computer faster in general, even though it has more teraflops than the world knew in 1985.
This is profoundly frustrating. Of all the resources in short supply in computing, extreme human cleverness is right up there with successful non-x86 Intel chips. Some very promising sectors, such as FPGA supercomputing, have really struggled because of it. You can, on paper, make a superb computer where you can completely reconfigure the hardware to implement your algorithms in native silicon; it's just that there aren't enough people who know how to make that happen. A mass market needs tools that regularly smart people can use to make better than regularly good products.
Even where GPUs are just doing graphics, that hasn't really happened. From architecture to architecture, even from model to model, differences in the hardware make it hard to maintain stacks that are high performance, reliable, efficient and portable.
Work done once often needs doing again. The idea was that the advance of low-level APIs would reduce the need for drivers to do all the heavy lifting, and developers could innovate without being beholden to the GPU makers getting their darn software right. That hasn't happened. It's still easier to make hardware with lots of potential than it is to write compilers that release it.
This is what could stymie efforts like AI-in-RAM. Many of these will have a good run in rich niche markets, but we won't be able to use the AI without proper general-purpose tools. Those tools won't happen without proper standards to give them a wide set of target platforms, and the possibility of long-lasting returns on investment across multiple generations of hardware. Nobody's talking about those standards. Every vendor is selling their own AI hardware, in GPUs, in memory chips, in mobile phone video processing, in this or that or the other accelerated niche.
We need that general-purpose AI market, because that future will be fantastic. Who doesn't want to have a graphics program that will respond to the command "draw me a picture of Alan Turing receiving a CBE from the Queen, who is wielding a sword studded with Xeon die for the occasion," and have it work no matter where the AI hardware lives?
Our actual cross-platform graphics software these days? Mutant offspring of MS Paint. That's a long way from the Golden Age of AI. Until the hardware makers put more effort into software, talking to each other, and making architectures with an eye for general, not best, cases, we won't get there any time soon. ®