OpenAI's rapid growth loaded with 'corner case' challenges, says Fivetran CEO

GenAI poster child is a 100-story-tall baby with simple infrastructure but extreme demands

Interview When OpenAI launched GPT-4 in March last year, it was coy about the model's size and what went into making it. Nonetheless, the company, currently the focus of AI-obsessed media and investors, is understood to have used a diverse dataset of around 1 petabyte. Beyond the challenge of getting that data to produce meaningful output, it also had to get the data into the right place.

Step forward Fivetran, an automated data integration outfit that isn't shy about discussing its partnership with OpenAI, the company that – for good or for ill – has come to symbolize the tidal wave of interest in GenAI.

Speaking to The Register, CEO George Fraser said OpenAI represented one extreme of its customer base, while long-established global businesses such as consumer goods giant Procter & Gamble represented the other.

"You look at a company like OpenAI or other startups; they have infrastructure that looks like a small company infrastructure, except for scale. It's like a baby that's like 100 stories tall. You encounter unexpected and different problems," he said.

Fraser explained that companies like P&G generally have a lot of data spread across enterprise systems such as SAP. Those systems are complex, but the user knows them well.

"The company that's had large data volumes for a long time, like Procter & Gamble – you come in, there are challenges, but you tend to work them out in the proof-of-concept phase," he said.

However, a user that has grown with Fivetran, such as OpenAI, presents different challenges in terms of data integration, he said.

"The scale of data brings serious challenges, but not the ones that people think. People tend to think the problem of scale is spinning up lots of machines… lots of CPUs, and crunching numbers really hard, but that's not really it: that part is easy.

"The hard part is that you hit corner cases of the APIs that no one ever really thought about. You find you cannot pull an endpoint as frequently as you want. Or you have weird, like, n-squared behavior when you try to update data.

"It's more like problems with the design of all the other systems you have to work around. Whoever designed this system and designed the APIs didn't anticipate this extreme scenario or new problems appearing in these extreme scenarios. It's not like the sort of big iron number-crunching, supercomputer-type stuff that people want it to be."
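The "n-squared behavior" Fraser mentions can be illustrated with a minimal sketch. It assumes a hypothetical record store where each update is applied by scanning for the matching row; this is not Fivetran's or OpenAI's code, and the function names are invented for illustration. Applying changes one at a time makes the work grow roughly with the square of the data size, while building an index once keeps it linear.

```python
def make_store(n):
    """Create a hypothetical table of n rows, each with an id and a value."""
    return [{"id": i, "value": 0} for i in range(n)]


def update_one_by_one(store, changes):
    """Apply each change with a linear scan, as a naive record-at-a-time API might.

    Returns the number of row comparisons performed.
    """
    comparisons = 0
    for key, value in changes:
        for row in store:  # scan from the start for every single change
            comparisons += 1
            if row["id"] == key:
                row["value"] = value
                break
    return comparisons


def update_in_batch(store, changes):
    """Build an index once, then apply every change with a dictionary lookup.

    Returns a rough count of the steps taken (one pass plus one lookup per change).
    """
    index = {row["id"]: row for row in store}  # single pass over the store
    for key, value in changes:
        index[key]["value"] = value
    return len(store) + len(changes)


if __name__ == "__main__":
    n = 1000
    changes = [(i, i * 2) for i in range(n)]
    print("one by one:", update_one_by_one(make_store(n), changes))
    print("batched:   ", update_in_batch(make_store(n), changes))
```

In this toy setup, growing the table tenfold makes the one-by-one path do roughly a hundred times more comparisons, which is the kind of cost that stays invisible at small scale and only surfaces at extreme volumes.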

In September, Fivetran announced it had surpassed $300 million in annual recurring revenue, up from $200 million in 2023, although these figures have not been audited according to the rules of public companies.

The company says its aim is to help organizations move data securely and efficiently, supporting GenAI, real-time decision-making, and optimized business operations. Recent wins included UK-based retail group Kingfisher, which owns the B&Q and Screwfix brands.

Fivetran remains VC-funded. Its most recent funding round was in 2021, when it announced a Series D round of $565 million, valuing the company at $5.6 billion. At the same time, it used some of its startup capital to buy HVR, a data pipeline company that specializes in replicating data from commonly used mission-critical databases.

Despite its popularity, Fivetran has attracted criticism for being slow to support data lakes, especially those using AWS S3 storage. The company launched that support last year and has since introduced a managed data lakes service.

It promised the new service would remove the repetitive work of managing data lakes by automating and streamlining the process for clients. The service currently supports Amazon S3, Azure Data Lake Storage (ADLS), and Microsoft OneLake, with support for Google Cloud on the horizon.

Fraser explained that table formats – particularly Apache Iceberg – had to become sufficiently mature before Fivetran could support data lakes.

"It also took time for us to develop a good implementation," he said. "The key thing we needed was Iceberg, and then there was a bunch of work that we had to do downstream of that. That took a long time. It took a couple of tries, and two years of development."

Despite the significant engineering investment, Fraser said Fivetran was not desperate to raise more capital. "We haven't raised money in years: we are a pretty mature business and our cash flows are pretty predictable. Like a lot of people, after COVID, we rediscovered how to be efficient. We operate basically on a break-even cash flow basis."

Nonetheless, he said the long-term plan was to take the company public, just like data lake and analytics company Databricks, which started talking about its long-delayed IPO about four years ago.

Fraser said: "We will go public but I'm not sure exactly, I joke it will be six months after Databricks." ®
