For the average AI shop, sparse models and cheap memory will win

Massive language models aren't for everyone, but neither is heavy-duty hardware, says AI systems maker Graphcore

As compelling as the leading large-scale language models may be, the fact remains that only the largest companies have the resources to actually deploy and train them at meaningful scale.

For enterprises eager to leverage AI to a competitive advantage, a cheaper, pared-down alternative may be a better fit, especially if it can be tuned to particular industries or domains.

That’s where an emerging set of AI startups is hoping to carve out a niche: by building sparse, tailored models that, while maybe not as powerful as GPT-3, are good enough for enterprise use cases and run on hardware that ditches expensive high-bandwidth memory (HBM) for commodity DDR.

German AI startup Aleph Alpha is one such example. Founded in 2019 and based in Heidelberg, the company offers a natural-language model called Luminous that boasts many of the same headline-grabbing features as OpenAI’s GPT-3: copywriting, classification, summarization, and translation, to name a few.

The model startup has teamed up with Graphcore to explore and develop sparse language models on the British chipmaker's hardware.

“Graphcore’s IPUs present an opportunity to evaluate the advanced technological approaches such as conditional sparsity,” Aleph Alpha CEO Jonas Andrulis said in a statement. “These architectures will undoubtedly play a role in Aleph Alpha’s future research.”

Graphcore’s big bet on sparsity

Conditionally sparse models — sometimes called mixture of experts or routed models — only process data against the applicable parameters, something that can significantly reduce the compute resources needed to run them.

For example, if a language model were trained on all the languages on the internet and then asked a question in Russian, it wouldn’t make sense to run that data through the entire model; only the parameters related to the Russian language are needed, explained Graphcore CTO Simon Knowles in an interview with The Register.

“It’s completely obvious. This is how your brain works, and it’s also how an AI ought to work,” he said. “I’ve said this many times, but if an AI can do many things, it doesn’t need to access all of its knowledge to do one thing.”
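The routing idea can be sketched in a few lines of Python. This is a toy illustration, not Graphcore's or Aleph Alpha's implementation: the function names, the four stand-in "experts," and the hard-coded gate scores are all assumptions made for the example. A real router would compute the scores from the input itself.

```python
import math


def softmax(scores):
    """Turn raw gate scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(gate_scores, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    probs = softmax(gate_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def sparse_forward(x, experts, gate_scores, k=1):
    """Run x through only the selected experts and blend their outputs."""
    out = 0.0
    used = []
    for idx, weight in route_top_k(gate_scores, k):
        out += weight * experts[idx](x)
        used.append(idx)
    return out, used


# Hypothetical experts: each scalar function stands in for a block of parameters.
experts = [lambda x: x + 1.0, lambda x: 2.0 * x, lambda x: x * x, lambda x: -x]
# Hypothetical gate scores; expert 1 is the clear winner for this input.
gate_scores = [0.1, 3.0, 0.2, -1.0]

y, used = sparse_forward(3.0, experts, gate_scores, k=1)
print(f"output={y}, experts touched={used} of {len(experts)}")
```

With `k=1`, three of the four experts are never evaluated, which is the compute saving the sparse approach is after.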

Knowles, whose company builds accelerators tailored for these kinds of models, unsurprisingly believes they’re the future of AI. “I’d be surprised if, by next year, anyone is building dense language models,” he added.

HBM-2 pricey? Cache in on DDR instead

Sparse language models aren’t without their challenges. One of the most pressing, according to Knowles, has to do with memory. The HBM used in high-end GPUs to deliver the bandwidth and capacity these models require is expensive, and it comes attached to an even more expensive accelerator.

This isn’t an issue for dense language models, where you might need all of that compute and memory, but it poses a problem for sparse models, which favor memory over compute, he explained.

Interconnect tech, like Nvidia’s NVLink, can be used to pool memory across multiple GPUs, but if the model doesn’t require all that compute, the GPUs could be left sitting idle. “It’s a really expensive way to buy memory,” Knowles said.

Graphcore’s accelerators attempt to sidestep this challenge by borrowing a technique as old as computing itself: caching. Each IPU features a relatively large SRAM cache — 1GB — to satiate the bandwidth requirements of these models, while raw capacity is achieved using large pools of inexpensive DDR4 memory.

“The more SRAM you've got, the less DRAM bandwidth you need, and this is what allows us to not use HBM,” Knowles said.
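Knowles' point about SRAM and DRAM bandwidth can be illustrated with a generic cache simulation. The sketch below is not a model of how an IPU manages its on-chip memory: the least-recently-used eviction policy, the function name `count_dram_fetches`, and the access pattern are all assumptions chosen for the example. It simply counts how often a small fast memory has to go back to the larger, slower one.

```python
from collections import OrderedDict


def count_dram_fetches(accesses, cache_slots):
    """Count fetches from the slow backing store for an LRU cache
    holding at most `cache_slots` blocks (an assumed policy)."""
    cache = OrderedDict()
    misses = 0
    for block in accesses:
        if block in cache:
            # Hit: served from fast memory; mark as most recently used.
            cache.move_to_end(block)
        else:
            # Miss: the block has to come from the slow backing store.
            misses += 1
            cache[block] = True
            if len(cache) > cache_slots:
                cache.popitem(last=False)  # evict the least recently used block
    return misses


# Hypothetical access pattern: blocks 0 and 1 are reused often, 2 and 3 less so.
accesses = [0, 1, 2, 0, 1, 3, 0, 1, 2, 0, 1, 3]

for slots in (1, 3, 4):
    misses = count_dram_fetches(accesses, slots)
    share = misses / len(accesses)
    print(f"{slots} cache slots: {misses} slow-memory fetches ({share:.0%} of accesses)")
```

As the fast memory grows, fewer accesses fall through to the backing store, so the backing store needs less bandwidth to keep up.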

Decoupling memory from the accelerator makes it far less expensive for enterprises to support larger AI models: the added cost is that of a few commodity DDR modules.

In addition to supporting cheaper memory, Knowles claims the company’s IPUs also have an architectural advantage over GPUs, at least when it comes to sparse models.

Instead of running on a small number of large matrix multipliers — like you find in a tensor processing unit — Graphcore’s chips feature a large number of smaller matrix math units that can address the memory independently.

This provides greater granularity for sparse models, where “you need the freedom to fetch relevant subsets, and the smaller the unit you’re obliged to fetch, the more freedom you have,” he explained.
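A back-of-the-envelope sketch shows why fetch granularity matters for sparse access. The numbers here are made up for illustration and do not describe any real chip: `needed` is a hypothetical list of parameter indices a sparse model wants, and `block_size` is the smallest unit the hardware is assumed to fetch.

```python
def elements_fetched(needed, block_size):
    """Total elements moved when every access pulls in a whole block."""
    blocks = {index // block_size for index in needed}
    return len(blocks) * block_size


# Hypothetical: four scattered parameters out of a 1,024-element table.
needed = [3, 130, 517, 900]

for block_size in (256, 16, 1):
    moved = elements_fetched(needed, block_size)
    print(f"block size {block_size:>3}: moved {moved} elements to use {len(needed)}")
```

The smaller the mandatory fetch unit, the less unwanted data rides along with each scattered lookup.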

The verdict is still out

Put together, Knowles argues, this approach enables Graphcore’s IPUs to train large AI/ML models with hundreds of billions or even trillions of parameters at substantially lower cost than GPUs.

However, the enterprise AI market is still in its infancy, and Graphcore faces stiff competition in this space from larger, more established rivals.

So while development of ultra-sparse, cut-rate language models is unlikely to abate anytime soon, it remains to be seen whether it’ll be Graphcore’s IPUs or someone else’s accelerator that ends up powering enterprise AI workloads. ®

Biting the hand that feeds IT © 1998–2022