
How serverless is transforming graph databases

Amazon Neptune expands serverless to deliver instant workload scaling

Sponsored Feature Amazon has spent the past few years creating databases that enable its customers to process and analyze their data in different ways. One of these is Neptune, its cloud-native graph database service.

With recent additions to its query language support and the advent of serverless functionality for Neptune this month, Amazon hopes to push further into what it has identified as a rapidly growing market. With serverless, users get instant scalability at a lower cost than traditional provisioning.

First announced at Amazon's re:Invent conference in 2017, Neptune went into general availability in May 2018. It was not the first graph database, though. That honor goes to the first network model databases that emerged in the 1960s, and there have been many since. However, Neptune does several things differently than its predecessors, explains Brad Bebee, general manager of Neptune at Amazon Web Services (AWS).

"Neptune's entry as a fully-managed graph service was looked on as a very positive thing in the graph community," he says. "Now that our customer base includes some of the fastest growing SaaS-based software security services and numerous Fortune 500 companies, we knew we had to make Neptune even more enterprise-class, which is why we added serverless."

Graph databases have grown from lending themselves to specific use cases such as charting social relationships to supporting a new generation of security and customer-360 solutions. Research firm Gartner has predicted they will drive a trend that will shape data and analytics (D&A) for years to come. AWS wants to position itself as a leader in this space and help organizations expand their initial use of graph databases from the pilot to the production stage.

"Many customers tell us that they get started with a free or community edition graph database. It works great for them in a proof of concept project," says Bebee. "When they need to put something into production with high availability, they have to license the Enterprise Edition and provision additional hardware."

"With Amazon Neptune serverless, not only is there no need to provision, the administrative overhead is basically zero allowing users to focus on designing new graph applications vs. worrying about setting up the environment to run them," he adds.

Developer choice through multiple graph support

To that end, Neptune provides customers with a choice of models spanning two categories: property graphs and knowledge graphs. Property graphs use nodes to represent data entities, connected by relationships known as edges. Nodes also carry properties describing the entity; a person might have properties such as name and age. Property graphs are useful in closed-world, bounded knowledge spaces focusing on a single database, where the kinds of information being stored are well understood and constrained.
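To make that concrete, here is a minimal sketch of the property graph model in plain Python. The node labels, property names, and edge type are invented for illustration rather than taken from any Neptune schema.

```python
# Minimal, illustrative property graph: nodes and edges both carry properties.
nodes = {
    "p1": {"label": "Person", "properties": {"name": "Alice", "age": 34}},
    "p2": {"label": "Person", "properties": {"name": "Bob", "age": 41}},
}

# An edge links two nodes, has a label, and can have its own properties.
edges = [
    {"from": "p1", "to": "p2", "label": "KNOWS", "properties": {"since": 2019}},
]

# Traversal here is just dictionary lookups: find everyone Alice knows.
alice_knows = [
    nodes[e["to"]]["properties"]["name"]
    for e in edges
    if nodes[e["from"]]["properties"]["name"] == "Alice" and e["label"] == "KNOWS"
]
print(alice_knows)  # ['Bob']
```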

Conversely, knowledge graphs address more open data structures that can encompass any and all data a company has to store, anticipating that you will want to bring together data sets from different sources. Their structure is also different, using triples instead of direct edge links between nodes. A triple comprises a subject (say, 'Publication'), a predicate ('Name-is'), and an object ('Register'). Nodes each have global identifiers that can be used universally to combine different graphs.
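That triple can be expressed with rdflib, a common open-source Python RDF library that is independent of Neptune; the URIs below are illustrative placeholders.

```python
from rdflib import Graph, Literal, Namespace

# Illustrative namespace; in practice the identifiers would be real, globally unique URIs.
EX = Namespace("http://example.org/")

g = Graph()
# One triple: subject, predicate, object.
g.add((EX.Publication, EX["name-is"], Literal("Register")))

# Because subjects and predicates are global URIs, triples from different
# sources can simply be merged into one graph.
for subject, predicate, obj in g:
    print(subject, predicate, obj)
```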

Whereas data schemas are not part of a property graph, a knowledge graph does include a rich schema defining data relationships and the rules that govern them. Knowledge graphs are based on the Resource Description Framework (RDF), originally introduced by the W3C to support the idea of the semantic web.

The semantic web concept articulated a network of knowledge in which data elements 'knew' what they were, thanks to a fabric of relationships that gave them context. Proponents labeled this as the original Web 3.0 at the time, but it never really made the jump from academic niche to general use.

Although we never saw semantic information redefine the whole web, knowledge graphs are still applicable in specific use cases. For example, the Yahoo Knowledge Graph runs globally on Neptune RDF graphs.

Property and knowledge graphs themselves are not mutually exclusive. Some companies are using both to address broad use cases. One example is Siemens, which uses Neptune for its Building Twins project.

The company wanted to marry different kinds of data, ranging from building models and construction schematics through to electrical and plumbing diagrams. "But then they also want different APIs that people can build against it that serve the property graph," Bebee says.

Expanding Neptune's query languages

By offering both of these graph types, Neptune can serve more potential customers, explains Bebee. To do so, it must also support the query languages that underpin them.

For RDF graphs, developers use SPARQL, a graph data query language that uses SELECT and WHERE clauses to find data.
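As a rough sketch, a SPARQL query can be posted to a Neptune cluster's SPARQL endpoint over HTTPS. The endpoint hostname below is a hypothetical placeholder, and clusters with IAM authentication enabled would additionally need the request signed.

```python
import requests

# Hypothetical cluster endpoint; Neptune serves SPARQL on port 8182 at /sparql.
SPARQL_ENDPOINT = (
    "https://my-neptune-cluster.cluster-xxxxxxxx.us-east-1"
    ".neptune.amazonaws.com:8182/sparql"
)

# A simple SELECT/WHERE query: find every subject that has a name.
query = """
PREFIX ex: <http://example.org/>
SELECT ?subject ?name
WHERE { ?subject ex:name ?name }
LIMIT 10
"""

# The query is sent as a form-encoded POST parameter; results come back as JSON bindings.
resp = requests.post(SPARQL_ENDPOINT, data={"query": query}, timeout=30)
resp.raise_for_status()
print(resp.json())
```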

On the property graph side, Neptune supports two query languages. Originally, it supported Gremlin, the graph traversal language of the Apache Software Foundation's open-source TinkerPop graph computing framework.

Gremlin is an imperative language, more like a programming language than a query-focused one like SQL. It provides fine-grained control when manipulating graphs. "It's good for developers that want to write manipulation code," explains Bebee. "But it's hard for folks coming from a SQL background."
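The imperative flavor is easiest to see in a short gremlinpython sketch: each step tells the engine exactly how to walk the graph. The endpoint and the vertex data are hypothetical, and this assumes the cluster accepts unauthenticated WebSocket connections.

```python
from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection

# Hypothetical Neptune endpoint; Gremlin is served over WebSocket on port 8182.
conn = DriverRemoteConnection(
    "wss://my-neptune-cluster.cluster-xxxxxxxx.us-east-1"
    ".neptune.amazonaws.com:8182/gremlin",
    "g",
)
g = traversal().withRemote(conn)

# Imperative, step-by-step: add a vertex, then walk outgoing KNOWS edges.
g.addV("Person").property("name", "Alice").property("age", 34).next()
names = g.V().hasLabel("Person").out("KNOWS").values("name").toList()
print(names)

conn.close()
```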

To make querying easier for users from a SQL background, AWS announced general availability of openCypher support in April of this year. This is a declarative language built around pattern-matching MATCH and RETURN statements, more like the SQL that many developers will be used to.

AWS chose to make openCypher and Gremlin operate over the same data model, explains Bebee. "So if you have a property graph, you can choose to use either Gremlin or openCypher, or you can use both at the same time," he says. "We believe that in the fullness of time customers will choose to use both, because there are cases where they want fine-grained control of their graph with Gremlin, and there's cases where they find that it's faster and easier to use something like openCypher."
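For comparison, here is the same "who does Alice know?" question expressed declaratively and posted to Neptune's openCypher HTTPS endpoint. Again, the endpoint hostname is a placeholder and IAM-enabled clusters would need signed requests.

```python
import requests

# Hypothetical endpoint; Neptune serves openCypher over HTTPS on port 8182.
OC_ENDPOINT = (
    "https://my-neptune-cluster.cluster-xxxxxxxx.us-east-1"
    ".neptune.amazonaws.com:8182/openCypher"
)

# Declarative pattern matching: describe the shape of the data you want,
# and let the engine work out the traversal.
query = """
MATCH (a:Person {name: 'Alice'})-[:KNOWS]->(friend:Person)
RETURN friend.name
"""

resp = requests.post(OC_ENDPOINT, data={"query": query}, timeout=30)
resp.raise_for_status()
print(resp.json())
```

Because both languages run over the same data stored in the property graph, the Gremlin traversal above and this openCypher query can be used side by side against the same cluster.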

This offers a degree of developer choice that Bebee hopes will help Neptune to meet a wider audience. AWS is also focusing heavily on the managed graph database's deployment and operations model to make it a more attractive proposition for 'graph-curious' users.

Cheaper and simpler to manage

The managed aspect already offers several benefits, he points out, including a lower total cost of ownership compared to more traditional on-premises or virtual lift-and-shift models. One factor is the elimination of the license fees and hardware purchases that can stop a pilot project from taking the next step.

A managed service is elastic, scrapping license fees in favor of usage charges. Traditionally, this happens via virtual machine charges, enabling customers to scale usage on demand. It has helped customers like NBC Universal, which migrated its content catalog and customer interactions to Neptune. This application has a volatile load, because events like Presidential debates, The Voice, and America's Got Talent can send usage soaring temporarily.

Security is another use case with unpredictable loads. Security teams often use graphs to investigate security issues, meaning that they must scale their compute power over time. This is a growth area for Neptune, says Bebee, who has seen more customers emerging in this space over the last six months. Graphs are just as useful for modeling application security posture as they are for detecting fraud, he explains.

Neptune has also featured autoscaling capabilities that support rules to add up to 15 low-latency read replicas for faster response times. That helps to reduce cost, but it still leaves customers to define their server size.

Along with I/O, storage (in GB per month), and backup charges beyond the one free backup per cluster, customers also pay for instance hours in five-minute intervals.

Serverless adds scalability and flexibility

AWS took another step forward in scalability and cost management by introducing serverless functionality for Neptune in October 2022. The company has been aggressively growing its serverless portfolio, and Neptune is now the fourth of its database services to get the upgrade. Neptune serverless is an on-demand option that automatically adjusts database capacity based on an application's needs. "With the new serverless option, graph database workloads can instantly scale to hundreds of thousands of queries," says Bebee.
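A minimal provisioning sketch with boto3 illustrates the idea: instead of picking an instance size, you set a capacity floor and ceiling. The identifiers below are hypothetical, and the parameter names follow the serverless scaling configuration in Neptune's API, so check the current API reference before relying on them verbatim.

```python
import boto3

# Sketch of creating a Neptune serverless cluster; capacity is expressed in NCUs.
neptune = boto3.client("neptune", region_name="us-east-1")

neptune.create_db_cluster(
    DBClusterIdentifier="my-serverless-graph",
    Engine="neptune",
    ServerlessV2ScalingConfiguration={
        "MinCapacity": 1.0,    # floor in NCUs
        "MaxCapacity": 128.0,  # ceiling the database can scale up to
    },
)

# Instances in a serverless cluster use the special 'db.serverless' class
# rather than a fixed instance size.
neptune.create_db_instance(
    DBInstanceIdentifier="my-serverless-graph-instance-1",
    DBInstanceClass="db.serverless",
    Engine="neptune",
    DBClusterIdentifier="my-serverless-graph",
)
```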

Instead of a fixed hourly cost, the serverless capability charges for the use of Neptune Capacity Units (NCUs). These units, which incorporate CPU usage, RAM, and network bandwidth, are finer-grained, enabling users to pay only for what they use.

Customers continue to pay for storage, I/O, and backup in a serverless model. However, these make up around a fifth of the average bill, with the remainder lying in instance usage, Bebee says. The database scales the NCUs it consumes up and down to accommodate volatile demand, and customers pay only for the NCUs they actually use. This eliminates the need to pay for idle virtual machine compute cycles while freeing customers from having to manage database capacity.
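A back-of-the-envelope sketch shows why that matters for spiky workloads. The per-NCU rate and the workload shape below are hypothetical placeholders, not AWS list prices.

```python
# Hypothetical comparison: provisioning for peak vs paying per NCU-hour consumed.
ncu_rate_per_hour = 0.12   # HYPOTHETICAL $ per NCU-hour, not an AWS list price
hours_in_month = 730

peak_ncus = 64             # capacity the workload needs during spikes
baseline_ncus = 2          # capacity it needs the rest of the time
peak_hours = 40            # hours per month spent at peak

# Provisioning for peak means paying for peak capacity around the clock.
provisioned_cost = peak_ncus * ncu_rate_per_hour * hours_in_month

# Serverless bills only the NCU-hours actually consumed.
serverless_cost = ncu_rate_per_hour * (
    peak_ncus * peak_hours + baseline_ncus * (hours_in_month - peak_hours)
)

print(f"Provisioned for peak: ${provisioned_cost:,.2f}")
print(f"Serverless:           ${serverless_cost:,.2f}")
print(f"Saving: {1 - serverless_cost / provisioned_cost:.0%}")
```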

"We believe customers using that pay-for-consumption model can save up to 90 percent over their provisioning cost surplus, so their savings can be very significant," says Bebee.

Adding more attractions for Neptune users

To further Neptune's reach, AWS has also been making more of its existing capabilities available for the system. In July, the company extended its global database capability to support the graph database. This enables users to automatically keep an up-to-date copy of a Neptune cluster in another region. Read replicas already provide low-latency read scalability in single regions, but this feature extends low-latency reads across different regions. It also provides disaster recovery from region-wide outages.
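A rough boto3 sketch of that pattern might look like the following. The identifiers are hypothetical, and the parameter names mirror the global database APIs AWS uses elsewhere, so they should be verified against the current Neptune API reference.

```python
import boto3

# Sketch: promote an existing cluster to a global database, then attach a
# secondary cluster in another region that receives a replicated, read-only copy.
primary = boto3.client("neptune", region_name="us-east-1")

primary.create_global_cluster(
    GlobalClusterIdentifier="my-global-graph",
    SourceDBClusterIdentifier=(
        "arn:aws:rds:us-east-1:123456789012:cluster:my-serverless-graph"
    ),
    Engine="neptune",
)

secondary = boto3.client("neptune", region_name="eu-west-1")
secondary.create_db_cluster(
    DBClusterIdentifier="my-global-graph-replica",
    Engine="neptune",
    GlobalClusterIdentifier="my-global-graph",
)
```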

AWS has also extended its machine learning capability to support Neptune with Neptune ML. This uses graph neural networks, a technique for applying machine learning to graphs and making predictions about their data. Use cases for that range from predicting fraud through to understanding the quality of a graph and what aspects of it might need enhancing. Amazon uses it internally to make product recommendations and spot malicious accounts, which Bebee says has saved the company tens of millions of dollars.

Graph databases are growing in popularity, says Bebee, and he wants potential users to know that AWS provides a serverless option that can scale to any size of graph workload (a free trial of Neptune can be found here). As AWS increasingly integrates the graph database with its other services, we could see more companies connecting the dots.

Sponsored by AWS.
