We'll know what we node, we'll grok what we've graphed: Neo4j nails graph machine learning to data science workbench

Can help build more efficient recommendation algos, amongst other stuff it hopes users want

Neo4j has added graph embeddings to its machine learning workbench in the hopes data scientists using its graph database will gain a productivity boost.

Version 1.4 of the Graph Data Science (GDS) workbench was rolled out today. It is similar to tools such as H2O.ai but specifically for graph databases, and now supports "embeddings" including node2Vec, FastRP and GraphSAGE to help data scientists build predictive models more rapidly.

Emil Eifrem, Neo4j founder and CEO, told The Register: "You could do that before, it was just a lot of manual work. In 1.4, we’ve just made it super simple, it has this native support for what's called 'graph embeddings'."

"Embeddings" is a concept used in natural language processing, where a word's relationship with other words in a corpus of text is expressed as a multi-dimensional vector. They are the basis of word2vec, the algorithm used in machine translation and predictive text. In graph databases, the embedding technique is applied to data nodes and the relationships between them, rather than words.

Eifrem said the feature would help data scientists create predictive models in recommendation engines as well as enable more rapid fraud detection.

He said Neo4j's objective was not to replace the most common relational databases with a graph database, but help users consider what else they could get from the data they have. "Think about the shape of the data, think about what you want to do with that, and then make a choice, and that choice, sometimes is one database right for that particular project but very frequently it is two or more databases for a particular project, but it will never just be one database for all your projects," he said.

Eifrem said he was not telling users it would be "unicorns and rainbows and ice cream" if they buy GDS in the expectation that it will change their entire toolchain. Instead, it could be used in addition to existing tools.

As well as the new "graph embeddings" algorithms, GDS ships with more than 50 commonly used graph algorithms pre-installed.

Other new features in Graph Data Science 1.4 include a tool to help data scientists create their own algorithms using the Pregel API, as used in Google Search's PageRank.

"It's the same API that you use locally when you develop against your own data, inside your enterprise," Eifrem said.

Carl Olofson, IDC research vice president, said the inclusion of embeddings in Neo4j's data science workbench was "an interesting development."

"Although other graph database companies such as TigerGraph offer support for AI/ML, the process of setting up a machine learning process and generating usable graphics is still a tricky business.

"IDC sees AI/ML as a huge area of development for all manner of applications, and databases must play a significant role in that adoption. We are seeing increasing interest in using graph databases for this purpose, because of their unique ability to capture and represent dynamically any relationship structure found in the data," Olofson said.

But graph databases still lack an official, standard query language. GQL was the leading candidate, but Neo4j developed its own query language called Cypher, he said.

Another factor holding back adoption was that "some people have a hard time wrapping their heads around the graph concept," Olofson added. ®

Biting the hand that feeds IT © 1998–2020