AI startup Lamini bets future on AMD's Instinct GPUs
Oh MI word: In the AI race, any accelerator beats none at all
Machine learning startup Lamini has revealed that its large language model (LLM) fine-tuning platform runs "exclusively" on the House of Zen's silicon.
Having emerged from stealth mode earlier this year, Lamini wants to help enterprises build and run generative AI products by fine-tuning existing foundation models, such as OpenAI's GPT-3 or Meta's Llama 2, against their internal datasets.
If this sounds familiar, it's because you may have seen similar services from the likes of IBM with watsonx. What sets Lamini apart is its choice of hardware. While most of the big AI clusters deployed by Google, Meta, Microsoft, and others run on Nvidia A100s or H100s, Lamini has opted exclusively for AMD's Instinct GPUs.
Lamini claims its platform, which has attracted interest from Amazon, Walmart, eBay, GitLab, and Adobe, to name a few, has been running on more than 100 AMD GPUs in production all year and could be scaled up to "thousands of MI GPUs."
AMD's Instinct MI250X GPUs are at the heart of some of the most powerful supercomputers in the world, including the chart-topping 1.1-exaflop Frontier system, but the MIs haven't enjoyed the same fanfare as Nvidia's chips.
Moving forward, AMD hopes to bring the world around to its accelerator story. "This is our number one strategic priority, and we are engaging deeply across our customer set to bring joint solutions to the market," CEO Lisa Su said during a call with Wall Street analysts earlier this year.
During AMD's second quarter earnings call last month, Su boasted that the company had seen a seven-fold increase in AI customer engagements since its datacenter event in June. "It is clear that AI represents a multibillion-dollar growth opportunity for AMD," she opined. "In the datacenter alone, we expect the market for AI accelerators to reach over $150 billion by 2027."
This surge may simply come down to supply and demand. At least for Lamini, one of the major selling points behind AMD's hardware was that customers wouldn't be stuck waiting for GPUs to ship. "You can stop worrying about 52-week lead times for Nvidia H100s," the company quipped in a blog post.
AMD's ecosystem challenge
However, silicon, no matter how potent, won't get you very far without software to run on it. This is one of the challenges AMD president Victor Peng has been working on for the past year with the company's Unified AI Stack. The goal of this project is to develop a common software framework for running inference workloads across AMD's growing portfolio of AI hardware, which now includes CPUs, Instinct GPUs, and Xilinx FPGAs.
The chipmaker has also worked with PyTorch — a popular machine-learning framework — to upstream support for the ROCm software stack used by its Instinct GPUs. And, in June, the company solicited the help of Hugging Face to optimize open source AI models to run on its chips.
The partnership with Lamini marks AMD's latest ecosystem play to make its Instinct accelerators and ROCm runtime more accessible to developers. The startup claims that, with its software, ROCm achieves parity with Nvidia's CUDA, at least for large language models.
Building a robust AI software ecosystem to challenge Nvidia is not just AMD's fight, of course. Last week, Intel highlighted the work it has done to drive adoption of the oneAPI and OpenVINO software frameworks used by its chips, and CTO Greg Lavender even challenged developers to use AI to convert legacy CUDA code to run on the company's cross-platform SYCL runtime.
Faster hardware on the way
The Instinct MI200 accelerators used in Lamini's systems, which it calls LLM Superstations, were introduced in late 2021 and deliver between 181 and 383 TFLOPS of FP16 performance, depending on form factor. However, AMD customers won't have to wait long to get their hands on a far more powerful chip.
AMD's next-gen Instinct MI300-series accelerators are due out later this year and promise 8x the AI performance and 5x the performance per watt of the MI250X. Based on these claims, our sibling site The Next Platform estimates the chip will deliver somewhere in the neighborhood of 3 petaFLOPS of FP8 or 1.5 petaFLOPS of FP16 performance.
The first of these, dubbed the MI300A ("A" being for APU), pairs 24 Zen 4 cores with six CDNA 3 GPU dies and up to 128GB of third-gen high-bandwidth memory (HBM3). The chip, which is already sampling to customers, is slated to power Lawrence Livermore National Laboratory's upcoming El Capitan supercomputer.
The GPU-only version of the chip, dubbed the MI300X, ditches the CPU cores in favor of two more GPU dies and boosts the supply of HBM3 to 192GB, more than twice that of Nvidia's flagship H100. Much like previous Instinct accelerators, up to eight of these GPUs can be meshed together using AMD's "Infinity Architecture."
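For readers who want to sanity-check the MI300 numbers, here is a back-of-envelope sketch. Neither AMD nor The Next Platform has published this calculation; it assumes the "8x" uplift is measured against the MI250X's 383 TFLOPS FP16 peak and lands as an FP8 figure, that FP16 runs at half the FP8 rate, and that the H100 comparison point is the 80GB part. The constant and function names are purely illustrative.

```python
# Back-of-envelope checks on the MI300-series figures quoted above.
# ASSUMPTIONS (not confirmed by AMD or The Next Platform):
#   * the "8x" AI uplift is measured against the MI250X's 383 TFLOPS FP16 peak
#     and corresponds to an FP8 figure on the new chip;
#   * FP16 runs at half the FP8 rate on the new chip;
#   * the H100 comparison point is the 80GB part.

MI250X_FP16_TFLOPS = 383
CLAIMED_AI_SPEEDUP = 8
MI300X_HBM3_GB = 192
H100_HBM_GB = 80
GPUS_PER_NODE = 8  # "up to eight" GPUs linked via Infinity Architecture


def estimate_mi300_pflops(base_tflops: float, speedup: float) -> tuple[float, float]:
    """Return (fp8_pflops, fp16_pflops) under the assumptions above."""
    fp8_pflops = base_tflops * speedup / 1000  # scale by the claim, convert TFLOPS -> PFLOPS
    fp16_pflops = fp8_pflops / 2               # FP16 assumed to run at half the FP8 rate
    return fp8_pflops, fp16_pflops


fp8, fp16 = estimate_mi300_pflops(MI250X_FP16_TFLOPS, CLAIMED_AI_SPEEDUP)
hbm_ratio = MI300X_HBM3_GB / H100_HBM_GB        # per-GPU memory advantage
node_hbm_gb = MI300X_HBM3_GB * GPUS_PER_NODE    # total HBM3 in a fully populated node

print(f"MI300 estimate: ~{fp8:.2f} PFLOPS FP8, ~{fp16:.2f} PFLOPS FP16")
print(f"MI300X vs 80GB H100 memory: {hbm_ratio:.1f}x")
print(f"Eight-GPU MI300X node: {node_hbm_gb} GB of HBM3")
```

Under those assumptions the arithmetic lands close to The Next Platform's ballpark of roughly 3 petaFLOPS FP8 and 1.5 petaFLOPS FP16, while 192GB works out to 2.4 times the memory of an 80GB H100, or 1,536GB across an eight-GPU node.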
According to AMD, we can expect to see these chips start making their way to vendors later this quarter. ®