VCs warn: Pumping millions into an AI startup? You mean, pumping millions into Azure, AWS or Google Cloud...

And forget SaaS-y upstarts: These machine-learning darlings are more like traditional service outfits

Despite all the hype around artificial intelligence, trendy startups built upon the tech are said to have lower margins than funding-magnet software-as-a-service (SaaS) companies.

“Anecdotally, we have seen a surprisingly consistent pattern in the financial data of AI companies, with gross margins often in the 50-60 per cent range – well below the 60-80 per cent [and above] benchmark for comparable SaaS businesses,” said Martin Casado and Matt Bornstein, venture-capitalists at Silicon Valley's venerable Andreessen Horowitz partnership, this week.
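For context, gross margin is simply revenue minus cost of goods sold, divided by revenue. A quick sketch of the gap the VCs describe (the dollar figures here are illustrative, only the percentage bands come from the quote):

```python
def gross_margin(revenue, cogs):
    """Gross margin as a fraction: (revenue - COGS) / revenue."""
    return (revenue - cogs) / revenue

# Illustrative numbers: same revenue, different cost base.
saas = gross_margin(10_000_000, 2_500_000)  # 0.75 -> inside the 60-80% SaaS band
ai = gross_margin(10_000_000, 4_500_000)    # 0.55 -> inside the 50-60% AI band

print(f"SaaS: {saas:.0%}, AI: {ai:.0%}")
```

On identical revenue, the AI firm's heavier variable costs – cloud compute, human labeling – eat an extra 20 points of margin.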

And don't be fooled into thinking machine-learning-based upstarts follow the SaaS-y build-once, run-for-everyone model: they are more like traditional service providers in that they will likely roll out customized deployments, the pair explained. “Maintaining them can feel, at times, more like a services business – requiring significant, customer-specific work and input costs beyond typical support and success functions.”

The sky-high costs of cloud compute time for machine learning and the painstaking human effort required to clean up the data used to train AI systems are also major financial sinkholes.

"We’ve had AI companies tell us that cloud operations can be more complex and costly than traditional approaches, particularly because there aren’t good tools to scale AI models globally," the duo said. "As a result, some AI companies have to routinely transfer trained models across cloud regions – racking up big ingress and egress costs – to improve reliability, latency, and compliance."

For a sense of the compute involved, just look at research, where the best state-of-the-art models are built. Top AI labs at Google, Facebook, DeepMind, Microsoft, and OpenAI often spin up hundreds of GPUs or TPU pods to crunch through heaps of data to train giant, complex neural networks that can play video games or generate text.

If you're not a hyperscaler, or close pals with one, eye-watering cloud bills can reach tens or hundreds of thousands of dollars, or even spill over into the millions – something most startups cannot afford.

Before these models can even be taught and deployed, however, a significant amount of labor needs to go into curating the training dataset. That data, whether it's a series of images, audio clips, or pages of text, needs to be labelled and cleaned, typically, by humans.

For example, the video feeds from cameras mapping roads for self-driving cars need to be analysed. Bounding boxes need to be drawn around every road sign, pedestrian, cyclist, and vehicle, to teach machines how to recognize objects. Voice-controlled assistants are frequently backed by small armies of human transcribers listening to snippets of private conversations algorithms failed to understand, feeding the correct wording back into neural networks to improve them. Don't forget: systems fed low-quality training data perform poorly.
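A bounding-box annotation from such a labeling pipeline is, at its core, just a class name plus pixel coordinates. A minimal sketch of what one annotated frame might look like (the field and class names are illustrative, not from any particular dataset):

```python
from dataclasses import dataclass

@dataclass
class BoundingBox:
    label: str   # e.g. "pedestrian", "road_sign", "cyclist"
    x_min: int   # pixel coordinates of the box corners
    y_min: int
    x_max: int
    y_max: int

    def area(self) -> int:
        """Pixel area of the box -- handy for filtering out tiny, noisy labels."""
        return (self.x_max - self.x_min) * (self.y_max - self.y_min)

# One human-annotated video frame: every object of interest gets a box.
frame_labels = [
    BoundingBox("pedestrian", 120, 80, 160, 210),
    BoundingBox("road_sign", 300, 40, 340, 90),
]

print(sum(box.area() for box in frame_labels))  # total labeled pixels in the frame
```

Multiply a structure like this by every object, in every frame, across thousands of hours of footage, and the scale of the human labor becomes clear.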

The human labor required to clean the training data can be outsourced to third parties, but these services quickly rack up costs, even if the workers are paid poorly, and can run into privacy headaches. AI software has to be retrained regularly to adapt to the dynamic nature of data, and algorithms have to be written and tweaked for specific customer or application workloads.

Even after the model – whether it's a robo-driver or a facial-recognition system – has been trained, running it can be expensive, too, if it's particularly complicated. The inference stage, let alone the training, may involve non-trivial number crunching, demanding significant resources and specialized silicon in the cloud or at network edges, or compromises on embedded devices.

“Taken together, these forces contribute to the 25 per cent or more of revenue that AI companies often spend on cloud resources. In extreme cases, startups tackling particularly complex tasks have actually found manual data processing cheaper than executing a trained model,” the Andreessen Horowitz VCs said.

“To summarize: most AI systems today aren’t quite software, in the traditional sense. And AI businesses, as a result, don’t look exactly like software businesses. They involve ongoing human support and material variable costs. They often don’t scale quite as easily as we’d like. And strong defensibility – critical to the 'build once / sell many times' software model – doesn’t seem to come for free." ®
