Want to train a dragon? You'll need 500 million files, 730TB of data, 54,000 CPU cores...

DreamWorks picks Gremlin to weave digital marvels

DataStax Accelerate: Family favourite DreamWorks Animation has built a cloud platform powered by microservices that uses a graph database and the Gremlin query language to guide the production of its films.

This digital pipeline handles everything from early pre-visualisation to the final render and will be employed in its next feature. This setup has been tested at length – the software engineers created an entire fake show to make sure it can do the job. The architecture of the platform was discussed for the first time at the DataStax Accelerate conference in America's capital this week.

3D animation has come a long way in the past two decades – the increasing complexity of images has pushed Hollywood to look for faster software and hardware. It may not be the first thing that comes to mind when you think about Madagascar or Kung Fu Panda, but these are technological, as much as they are artistic, achievements.

"The workflow is massive," Doug Sherman, principal engineer at DreamWorks, told the audience. "There's just tons and tons of process, tons of files. So how do we capture all of that information?"

Here there be dragons

The team was happy to furnish an example: the first Shrek film (2001) had a single dragon, involved 4.5 million files and 7TB of data, and required 2,000 CPUs.

DreamWorks' latest How To Train Your Dragon instalment, released in February, featured 60,000 different dragons, involved 500 million files and 730TB of data, and required 54,000 CPU cores to render it – all used at a time when the studio was releasing three films per year.

"They take years to produce," Sherman said. "Seven to ten movies are being produced in any given year, so you have to multiply all of those numbers by ten.

"A lot of television animation, a lot of studios that don't quite spend what DreamWorks spends, will shy away from more complicated stories because there's a lot of tech involved in telling those stories. We just go for it – insane or not."

DreamWorks needed a way to track and manage the entire process of creating a film, linking different stages of production together. The studio chose to do this through DataStax Enterprise Graph, which is based on two open-source projects: Apache Cassandra, a NoSQL database originally developed at Facebook, and Apache TinkerPop, a graph computing framework supported by multiple database vendors.

Graph databases are normally used to identify and analyse relationships between datapoints; at DreamWorks, the datapoints are the CGI assets like 3D models, lighting and rigs – the virtual bones that move the models. "This is the stuff that trades hands from one department to the next until the completion of the movie," Sherman said.
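
To make the idea concrete, here is a minimal sketch in plain Python. It is not DreamWorks' schema or code: the asset names and the "feeds" relationship are hypothetical, chosen only to show assets as vertices, hand-offs between departments as directed edges, and a breadth-first walk that finds everything downstream of a changed asset.

```python
from collections import deque

# Hypothetical asset graph: each key feeds the assets listed against it.
# None of these names come from DreamWorks; they are illustrative only.
FEEDS = {
    "dragon_model": ["dragon_rig", "dragon_surfacing"],
    "dragon_rig": ["shot_010_animation"],
    "dragon_surfacing": ["shot_010_lighting"],
    "shot_010_animation": ["shot_010_lighting"],
    "shot_010_lighting": ["shot_010_render"],
    "shot_010_render": [],
}


def downstream(graph, start):
    """Return every asset reachable from `start`, in breadth-first order."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:  # visit each asset once, even if fed twice
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


if __name__ == "__main__":
    # Which work would need revisiting if the rig changed?
    print(downstream(FEEDS, "dragon_rig"))
```

A relational design could answer the same question with repeated self-joins; the appeal of a graph store is that this kind of "what depends on what" walk is the native operation.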

"The whole reason we are talking about graph is because our script team, our simple way of doing these things, our brute-force methods were never going to scale – and we wanted to do more and more complexity. So we looked to microservices.

"Because those designs play very well to cloud, we are positioned much better than we have been in the past to handle the times when India decides they want to help us on some things, and the China studio wants to collaborate – the software which wasn't compatible with this way of working is now compatible, because it is containerized and distributed."

Yum yum

One of the most important elements of DreamWorks' technology stack is Gremlin – a relatively new graph traversal language that sits at the core of TinkerPop and essentially serves the same purpose that SQL does in relational databases.

"Gremlin has a steep, steep learning curve – it's very complicated," Sean Fennell, senior software engineer at DreamWorks, said. "With that complicated learning curve comes power, but it took a long time to learn the right ways to access data, right ways to write data, and we're still working through a lot of these things.

"Gremlin can do a lot more of the processing for us than any other query languages are able to do," he added.
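
For readers who have not seen Gremlin, a traversal is written as a chain of steps, each one narrowing or moving the current set of vertices. The toy Python class below imitates that chaining style on an in-memory graph. It is an illustration under stated assumptions, not TinkerPop itself: the class, the data, the property keys and the "feeds" edge label are all hypothetical, while the Gremlin line in the comment uses only standard steps (V, has, out, values).

```python
# Roughly equivalent Gremlin, assuming a graph with these hypothetical
# property keys and edge labels:
#   g.V().has('name', 'dragon').out('feeds').out('feeds').values('name')

GRAPH = {
    "vertices": {
        1: {"kind": "model", "name": "dragon"},
        2: {"kind": "rig", "name": "dragon_rig"},
        3: {"kind": "shot", "name": "shot_010"},
    },
    # Each edge is (source vertex id, label, destination vertex id).
    "edges": [(1, "feeds", 2), (2, "feeds", 3)],
}


class Traversal:
    """Toy stand-in for a Gremlin traversal: each step returns a new Traversal."""

    def __init__(self, graph, current):
        self.graph = graph
        self.current = list(current)

    def has(self, key, value):
        """Keep only vertices whose property `key` equals `value`."""
        keep = [v for v in self.current
                if self.graph["vertices"][v].get(key) == value]
        return Traversal(self.graph, keep)

    def out(self, label):
        """Move along outgoing edges carrying `label`."""
        nxt = [dst for v in self.current
               for (src, lbl, dst) in self.graph["edges"]
               if src == v and lbl == label]
        return Traversal(self.graph, nxt)

    def values(self, key):
        """End the chain by reading property `key` from the current vertices."""
        return [self.graph["vertices"][v][key] for v in self.current]


def V(graph):
    """Start a traversal from every vertex in the graph."""
    return Traversal(graph, graph["vertices"].keys())


if __name__ == "__main__":
    print(V(GRAPH).has("name", "dragon").out("feeds").out("feeds").values("name"))
```

The point of the chained form is that the walk across relationships happens inside the query, which is one reading of Fennell's remark that Gremlin does more of the processing than other query languages.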

The DreamWorks team had no previous experience in graph databases, having historically relied on relational DBs, but it was brave enough to try – after hearing about how graph was applied at Netflix.

"When we were building a proof of concept, we really wanted to work directly with production, we wanted to build something that was real – we didn't want to make a tinker toy in a box and hand it over to them," Fennell said. "We built a mock show. And in doing that we proved that yes, you can use graph for this purpose."

"This is the heart of the machine – literally our entire process would be in this graph database. And it is – we're about to go live with this on our next feature," Sherman added. ®
