Nvidia's subscription software empire is taking shape
$4,500 per GPU per year adds up pretty quick – even faster when you pay by the hour
Comment In the wake of the AI boom, Nvidia has seen its revenues skyrocket to the point at which it briefly became the most valuable corporation in the world.
That growth was overwhelmingly driven by demand for its datacenter GPUs to train and run the ever-growing catalog of better, smarter, and bigger AI models. But as much as investors would like to believe CEO Jensen Huang's graphics processor empire will continue to grow, doubling quarter after quarter, nothing lasts forever.
As The Next Platform's Timothy Prickett Morgan predicted on last week's episode of The Register's Kettle podcast, Nvidia's revenues will one day plateau.
If Nvidia's future revolved solely around selling GPUs and nothing else, that might be a big deal. But as Huang frequently reminds folks, Nvidia is every bit as much a software business as a hardware one.
Enabling new markets
From early on, Nvidia recognized the value of software to drive the adoption of GPUs. During a fireside chat with journalist Lauren Goode at SIGGRAPH last week, Huang drove home this point.
"Every time we introduce a domain specific library, it exposes accelerated computing to a new market," he explained. "It's not just about building the accelerator, you have to build the whole stack."
The first release of Nvidia's Compute Unified Device Architecture – better known now as CUDA – came in 2007 and provided an API for parallelizing non-graphics workloads across GPUs. While this still required developers and researchers to refactor their code, the improvements over general-purpose processors were hard to ignore.
This was especially true for those in the HPC community – one of the first markets Nvidia pursued outside its old territories of gaming and professional graphics. In late 2012, Nvidia's software investments helped to put the Oak Ridge National Laboratory's Titan supercomputer in the number one spot on the Top500.
Seventeen years after its initial release, CUDA is just one of an ever-growing list of compute frameworks tailored to specific markets – ranging from deep learning to computational lithography and quantum computing emulation.
Those frameworks helped Nvidia to create markets for its accelerators where little to none previously existed.
Going beyond enablement
Software is Nvidia's not-so-secret weapon, but until recently that weapon has taken the form of enablement. Over the past two years we've seen the accelerator champ's software strategy embrace a subscription pricing model in a meaningful way.
In early 2022, months before OpenAI's ChatGPT set off the AI gold rush, Nvidia CFO Colette Kress detailed the GPU giant's subscription-fuelled roadmap – which, she opined, would eventually drive a trillion dollars in revenues.
At the time, Kress predicted $150 billion of that opportunity would be driven by Nvidia's AI Enterprise software suite. Even now that it's posting $26 billion quarters, the business is still well short of that trillion-dollar goal – but we are starting to get a better picture of how it may grow.
From a software standpoint, much of the work on AI enablement has already been done. Nvidia has poured enormous resources into developing tools like cuDNN, TensorRT-LLM, and Triton Inference Server to get the most out of its hardware when running AI models.
However, these are just pieces of a puzzle that must be carefully assembled and tuned to extract that performance – and the tuning will be different for each model. It takes a level of familiarity with the model, the software, and the underlying hardware that enterprises are unlikely to have in-house.
Building an AI easy button
At its GTC event last northern spring Nvidia revealed a new offering designed to lower the barrier to adopting and deploying generative AI at scale. That technology – called Nvidia Inference Microservices, or NIMs for short – essentially consists of containerized models and tools which ship with everything you need to run them preconfigured.
NIM containers can be deployed across just about any runtime that supports Nvidia's GPUs. That might not sound that exciting – but it's kind of the point. Container orchestration isn't exactly an easy problem to solve – just ask the Kubernetes devs. So why reinvent the wheel, when you can make use of existing tools and services in which customers are already invested?
The real value of NIMs seems to come from Nvidia engineers tuning things like TensorRT-LLM or Triton Inference Server for specific models or use cases, like retrieval augmented generation (RAG). If you're not familiar, you can find our hands-on guide on RAG here, but the takeaway is that Nvidia is playing system integrator not only with its hardware, but with its software as well.
NIMs are not just clever packaging. By working toward a common API for how models and tools should communicate with one another, Nvidia can provide customers with templates designed to address specific use cases.
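To make that concrete: NIM containers expose an OpenAI-style chat completions endpoint, so swapping one model for another is largely a matter of pulling a different container. Here's a minimal sketch of querying a locally deployed NIM with the openai Python client – the port, base URL, and model identifier are illustrative and will depend on which NIM you actually run:

```python
# Minimal sketch: querying a locally deployed NIM container.
# Assumes a NIM is already running and serving an OpenAI-compatible
# API on localhost:8000 -- the port and model name are illustrative.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # the NIM's local endpoint
    api_key="not-used",                   # local containers don't need a real key
)

completion = client.chat.completions.create(
    model="meta/llama-3.1-405b-instruct",  # whichever NIM you pulled
    messages=[{"role": "user", "content": "Summarize RAG in one sentence."}],
    max_tokens=128,
)
print(completion.choices[0].message.content)
```

Because the interface stays the same across models, application code written against one NIM should, in principle, carry over to the next with little more than a model-name change.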
Nvidia's pricing ladder
A lower barrier to adoption and deployment of AI inferencing has upsides for both software licensing and hardware sales. On the software side of things, the AI Enterprise license necessary to deploy NIMs in production will set you back $4,500 per GPU per year, or $1 per GPU per hour.
So to deploy Meta's Llama 3.1 405B model with NIMs you'd not only need to rent or buy a system with 8x H100s or H200s – the minimum necessary to run the model without resorting to more aggressive levels of quantization – but you'd also be looking at $36,000/yr or $8/hr in licensing fees.
Assuming a useful lifespan of six years, that works out to between $216,000 and $420,480 in license revenues – per system – depending on whether you pay up front or by the hour. And realistically, enterprises looking to deploy AI are going to need more than one system for both redundancy and scale.
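Those figures fall out of simple multiplication. A quick Python sketch, using nothing beyond Nvidia's two published rates and an assumed 24/7 runtime for the hourly case:

```python
# Back-of-the-envelope AI Enterprise licensing costs for one 8-GPU
# system over a six-year lifespan, at Nvidia's published rates of
# $4,500/GPU/year up front or $1/GPU/hour.
GPUS_PER_SYSTEM = 8
LIFESPAN_YEARS = 6
ANNUAL_RATE = 4_500        # USD per GPU per year
HOURLY_RATE = 1            # USD per GPU per hour
HOURS_PER_YEAR = 24 * 365  # assumes the system runs around the clock

annual_total = GPUS_PER_SYSTEM * ANNUAL_RATE * LIFESPAN_YEARS
hourly_total = GPUS_PER_SYSTEM * HOURLY_RATE * HOURS_PER_YEAR * LIFESPAN_YEARS

print(f"Annual licensing: ${annual_total:,}")   # $216,000
print(f"Hourly licensing: ${hourly_total:,}")   # $420,480
```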
That price delta might make committing to an annual license seem like an obvious choice. But remember that we're talking about microservices which, if implemented properly, should scale up or down with demand – if your GPUs aren't busy around the clock, paying by the hour could work out cheaper.
But let's say Llama 3.1 405B is a little overkill for your needs, and a smaller model running on a far less costly L40S – or even an L4 – might suffice. Nvidia's pricing structure is set up in a way that drives customers toward its more powerful and capable accelerators.
The AI Enterprise license costs the same regardless of whether you're running eight L40Ss or eight H200s. This creates a scenario where it may well be more economical to buy or rent fewer high-end GPUs and run the model at higher batch sizes or with deeper queues – since your license fees will be lower over the lifetime of the deployment.
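As a sketch of the incentive at work – with a purely hypothetical throughput ratio, since the real one depends on the model and workload – compare the license bills for two fleets sized to serve the same load:

```python
# Illustrative only: AI Enterprise fees scale with GPU count, never
# with GPU class, so denser fleets carry smaller license bills.
# The 3:1 throughput ratio implied below is hypothetical, not a benchmark.
ANNUAL_RATE = 4_500  # USD per GPU per year, same for any supported GPU

l40s_fleet = 12  # hypothetical count of lower-end cards for a workload
h200_fleet = 4   # hypothetical high-end count serving the same load

print(f"L40S fleet: ${l40s_fleet * ANNUAL_RATE:,}/yr in licenses")  # $54,000/yr
print(f"H200 fleet: ${h200_fleet * ANNUAL_RATE:,}/yr in licenses")  # $18,000/yr
```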
And with single A100 and H100 instances becoming more common – Oracle Cloud Infrastructure, for example, announced availability last week – enterprises may want to factor this in when evaluating the total cost of such a deployment.
A blueprint for competition
Assuming NIMs see widespread adoption, they could quickly become a major growth driver for Nvidia.
A little back-of-the-napkin math tells us that if NIMs helped Nvidia attach an AI Enterprise license to each of the two million or so Hopper GPUs it's expected to ship in 2024, it'd be looking at another $9 billion to $17.5 billion in annual subscription revenues. Realistically, that's not going to happen – but even if it can realize a fraction of that, we're still talking about billions of dollars in annual revenue.
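For anyone checking our sums, that range is just Nvidia's two published rates multiplied across the fleet:

```python
# Napkin math behind the $9B-$17.5B range: 100 percent AI Enterprise
# attach on ~2 million Hopper GPUs -- a ceiling, not a forecast.
HOPPER_GPUS = 2_000_000
ANNUAL_RATE = 4_500                   # USD per GPU per year, paid up front
HOURLY_RATE_PER_YEAR = 1 * 24 * 365   # USD per GPU per year, paid hourly

print(f"Annual-rate ceiling: ${HOPPER_GPUS * ANNUAL_RATE / 1e9:.1f}B")           # $9.0B
print(f"Hourly-rate ceiling: ${HOPPER_GPUS * HOURLY_RATE_PER_YEAR / 1e9:.2f}B")  # $17.52B
```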
That's not to say NIMs are without challenges. Compared to AI training, inferencing isn't particularly picky. There are several model runners that support inferencing across Nvidia, AMD, and even general-purpose CPUs. NIMs, by comparison, only run on Nvidia hardware – which could prove limiting for customers looking to leverage container orchestration systems like Kubernetes to deploy and serve their models at scale.
This probably won't be a big issue while Nvidia still controls the lion's share of the AI infrastructure market, but it will no doubt be a red flag for customers wary of vendor lock-in.
It might also grab the attention not only of shareholders, but also of the Department of Justice – which is already said to be building an antitrust case against the GPU giant.
That said, if you just want to make models easier to deploy across various cloud and on-prem infrastructure, there's really nothing stopping anyone from creating their own NIM equivalents, tuned to their hardware or software of choice. In fact, it's surprising that more developers haven't done something like this already. We can easily imagine AMD and Intel bringing similar services to market – potentially even undercutting Nvidia by offering them at no cost.
Ultimately, the success of Nvidia's NIMs may depend on just how much more efficient or performant their tuning is, and how much easier they are to stitch together. ®