Updated Car manufacturer Jaguar Land Rover rejected leading graph database Neo4j over scalability concerns, according to its head of data and analytics.
A customer of rival database TigerGraph for a little over two and a half years, the £23bn turnover automotive manufacturer first applied the concept of graph analytics to its supply chain challenges, an area where use cases have been able to demonstrate their economic value, Harry Powell told The Register.
"When we started looking at graphs, using the more mainstream Neo4j, we very quickly found that with our kind of graph, which is very highly linked and very heterogeneous, most graphs wouldn't scale.
"You have exponential complexity: with the number of nodes you have and the number of steps you have, complexity goes up exponentially, really fast. We were very clear that we needed to find a graph database that partitioned across a distributed network."
Although it is the most popular graph database on the market, according to DB-Engines, Neo4j is not a distributed system capable of running across multiple nodes. Graph databases are designed to model and analyse networks of relationships, rather than put figures in rows and columns like a relational database.
Powell said Neo4j was easy to get hold of and play around with. "For small models, it's great. For models that are relatively homogenous, with not too many links, it does fine. What we found was as soon as we move beyond [a] point model, we just hit a wall."
The Register contacted Neo4j for comment. It took the opportunity to point out that Neo4j had been "battle-tested for performance and scale," while preserving ACID compliance and data integrity. Deployments are live in 2,000 startups, 800 enterprises, and 75 per cent of the Fortune 500, it said.
Tara Jana, senior product marketing director, said Neo4j's core algorithm is Raft, a distributed algorithm for maintaining a consistent log across multiple shared-nothing servers. "Neo4j processes graphs extremely efficiently. We have dozens of customers running with many billions of nodes/relationships, with the ability to take on 500+ million updates per day: the ability to process demanding workloads trumps architectural preferences," he said.
The supply chain gang: When parts go missing
But what works for some organisations was not working for Jaguar Land Rover. Chip shortages, COVID-19 disruption, and the greening of our economic infrastructure all have one thing in common for the Tata-owned automotive firm: they are headaches for supply chains, especially in the automotive sector.
The recent semiconductor supply problems had an impact too, temporarily shutting down two of the company's UK plants in April.
Initially, the distributed TigerGraph database was applied to understand how changes in supply or the availability of parts might affect production. "We have many parts, and many suppliers of parts, and we need to understand how they link with features of the car, like the hi-fi system or a panoramic roof, which might have parts [that] are shared," said Powell. "Then we look at orders and manufacturing. The question is, really, how do you deal with change?
"If you have a very static system that you're just trying to optimise you're just trying to squeeze that lemon: that's a relational database's kind of work. But it's not about dealing with change.
"When a whole bunch of tow bars don't turn up at the factory, and most of the cars you make need tow bars, you can't just stop the factory."
The challenge was then looking for options among the cars that could be made with the components already on-stream, Powell said. "That's a really hard problem if you're solving it with an Excel spreadsheet and a team of people. You might need three weeks, by which time you may have lost your opportunity."
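The kind of question Powell describes can be sketched as a walk from parts to features to orders. The snippet below is a toy illustration only: the part, feature, and order names are invented, the data lives in plain Python dictionaries rather than TigerGraph, and it says nothing about how Jaguar Land Rover actually models its supply chain.

```python
# Hypothetical toy data: which parts each feature needs.
# "wiring_loom_b" is shared between two features on purpose.
FEATURE_PARTS = {
    "tow_bar": {"tow_bar_assembly", "wiring_loom_b"},
    "hifi": {"amplifier", "speaker_set", "wiring_loom_b"},
    "panoramic_roof": {"glass_panel", "roof_motor"},
}

# Hypothetical toy data: which features each customer order includes.
ORDER_FEATURES = {
    "order_1": {"tow_bar", "hifi"},
    "order_2": {"hifi"},
    "order_3": {"panoramic_roof", "tow_bar"},
    "order_4": {"panoramic_roof"},
}


def buildable_orders(missing_parts: set[str]) -> list[str]:
    """Return the orders that need none of the missing parts."""
    # Step 1: part -> feature. A feature is blocked if any of its parts is missing.
    blocked_features = {
        feature
        for feature, parts in FEATURE_PARTS.items()
        if parts & missing_parts
    }
    # Step 2: feature -> order. An order is buildable if it uses no blocked feature.
    return sorted(
        order
        for order, features in ORDER_FEATURES.items()
        if not (features & blocked_features)
    )


if __name__ == "__main__":
    print(buildable_orders({"tow_bar_assembly"}))
    print(buildable_orders({"wiring_loom_b"}))
```

A shortage of the shared part blocks more orders than a shortage of the tow bar assembly alone, which is the sort of knock-on effect that is slow to trace by hand in a spreadsheet.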
Such an approach has come into play with the global chip shortage, Powell said.
"One of the things it's been able to do is to identify common tier-one suppliers, where we don't know much about the tier-two. So we can we can actually identify where to put our effort into working out where our exposure is."
The approach will become all the more important as the industry strives to meet demands to demonstrate an ethical and environmentally sustainable supply chain while at the same time making the transition to electric vehicles. The UK government wants to outlaw the sale of all petrol and diesel cars from 2030 and Jaguar Land Rover will move to all-electric production by 2025.
As well as talking about struggling with Neo4j in industrial graph applications, Powell also disparaged the idea of building graph capabilities on top of existing relational databases, as Oracle and PostgreSQL have done.
"That hasn't worked because the whole point of graph is your pre-encoding your joins," he said. "In graph, every step you do is essentially a join. The [relational databases] that I've seen have a kind of graph presentation layer on top but underneath it is still using the same old engine and that makes it slow in comparison."
Updated at 11.02 BST on 19 May 2021 to add:
Neo4j has asked The Register to point out that while the free community edition of its graph database does not run across multiple nodes and is not distributed, the paid-for enterprise edition was specifically architected for enterprise-scale deployment, so runs across multiple nodes and is distributed. Neo4j was approached during the writing of this article and chose not to clarify this point. While DB-Engines said at the time of writing that Neo4j was not distributed, Neo4j has since requested the website update its listing. ®