Why we will not have a unified HPC and AI software environment, ever
No good reason for vendors to play ball with each other
Register Debate Welcome to the latest Register Debate in which writers discuss technology topics, and you the reader choose the winning argument. The format is simple: we propose a motion, the arguments for the motion will run this Monday and Wednesday, and the arguments against on Tuesday and Thursday. During the week you can cast your vote on which side you support using the poll embedded below, choosing whether you're in favour or against the motion. The final score will be announced on Friday, revealing whether the for or against argument was most popular.
This week's motion is: A unified, agnostic software environment can be achieved. We debate the question: can the industry ever have a truly open, unified, agnostic software environment in HPC and AI that can span multiple kinds of compute engines?
Arguing AGAINST the motion today is Dan Olds, Chief Research Officer for HPC/AI industry analyst firm Intersect360 Research.
There is no way in hell this will happen. Why? Because this is a world of human beings who are working in the interests of themselves and their organizations. APIs are sources of competitive advantage for many companies and, as such, not something that those suppliers should want to completely standardize – particularly when that standard is being driven by the largest and most influential supplier in the industry.
This doesn’t mean that standards shouldn’t exist. But they should, and will, only exist in situations where competitive advantage won’t accrue to a single player or a small group of players. Self-serving standards won’t survive in the long run – they stifle innovation and shackle new products and competition by forcing new entrants to adhere to the rules established by legacy products and vendors. But this is a self-correcting mechanism in most cases: the old standards get overrun by the sheer brilliance of new ways of doing things and fade away into obscurity.
We have seen plays like this before. The one that I’m most intimately familiar with was IBM’s gambit in the late 1990s and early 2000s, along with SCO and Sequent, to build a unified version of the Unix operating system, called Project Monterey. It was supposed to be the best Unix ever, with features and performance that would quickly make it the dominant flavor.
To give you a little context, I was employed by IBM at this time, and I started my tech career several years earlier with Sequent. I attended an internal IBM RS/6000 (as their Power systems were called back then) briefing in Austin and saw slide after slide about the new tech they were injecting into their proprietary AIX variant of Unix. As I was one of the guys who was going to be explaining this to customers, I kept asking the same questions over and over: “Why are you continuing to go full speed ahead with AIX when IBM is supposed to be driving the Project Monterey bus?” and, “Will these wondrous innovations be shared with the upcoming Project Monterey?”
I didn’t receive clear answers to my queries, which told me that even though IBM was the biggest supporter of a unified Unix, it was at best hedging its bets by pushing AIX development or, at worst, being completely hypocritical and looking to use Monterey to roil the Unix market.
I’m not trying to toss shade at Intel for pushing oneAPI or AMD for pushing ROCm. Both will soon be suppliers of a large number of compute engines, ranging from CPUs to GPUs to FPGAs (and in Intel’s case, custom ASICs for AI). Each of these has, for the most part, a unique API, which is a recipe for eventually drowning under the load of making sure that everything can work with everything else.
In short, Intel needs to have oneAPI at least for its own products in order to keep churning them out and keep the complexity down internally. It will also help Intel tell a better story to customers, in that a common API will make it easy for developers to utilize full Intel product stacks. If I’m Intel, this is the right move. Ditto for AMD and ROCm.
But to think that other vendors will embrace oneAPI or ROCm is naïve. Imagine that you are an up-and-comer in the tech field. You have a new accelerator that outperforms everything out there. You tell your engineers that you have decided that, for business reasons, your fancy new accelerator needs to utilize oneAPI or ROCm in order to maintain compatibility with what may be the only instruction set that will matter in the future.
As your engineers start yelling at you, you catch fragments of sentences coming through the noise: “You can’t be serious, this means we have to completely re-engineer our code!” and, “It’s our instructions that make us so much faster!” and, “Without our custom instructions, we can’t get X or Y to work without serious performance hits.” The many, many expletives in those sentences have been removed.
Shaken, you leave the meeting with their cursing and insults echoing in your ears. Perhaps you should reconsider your decision before it’s too late. Maybe there is some competitive advantage to be had in doing your own thing and not adhering to standards that throttle the performance of your new accelerator.
The lesson of history is that companies do not compromise to be one of the crowd when it is not in their best interest. ®
Cast your vote below. We'll close the poll on Thursday night and publish the final result on Friday.