Pharma boffins sharpen hunt for target molecules using graph DB

French pharma firm Servier gets Neo4j to help find relationships in 'messy' data

French pharma firm Servier says its hit rate for finding target small molecules is up by an order of magnitude after it shifted the supporting data science to a graph database from Neo4j.

The drug company, part of the industry's middle tier with €4.9 billion in annual revenue, is working on moving its knowledge map from relational systems to the graph database as part of efforts to speed up drug discovery.

Thierry Dorval, head of data sciences and data management, explained that the goal was to create a library of small molecules – defined in chemistry as less than 1,000 atomic mass units and smaller than proteins and nucleic acids – based on links found in the graph database. A link could be due to phenotypic similarity (related to genetic interactions) or transcriptome similarity (related to RNA transcripts), for example. Because it is structured around nodes and edges, the graph database could take heterogeneous information from a range of pre-existing data at Servier and find these target molecules more rapidly than earlier approaches, he told us.

Jérémy Grignard, Servier data and research scientist, said that before using the graph database approach, a screening campaign might examine 1 million small molecules, randomly set up. This led to a hit rate – the share of candidate molecules considered "active" in relation to the target – of less than 1 percent. But using the knowledge graph based on Neo4j, the project screened fewer than 1,000 small molecules and obtained a hit rate of more than 15 percent.
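To put those figures side by side, here is a rough back-of-the-envelope calculation. It is only an illustration: Servier gave bounds rather than exact numbers, so the values below plug in those bounds as stand-ins, and the function name is made up for this sketch.

```python
def expected_hits(molecules_screened: int, hit_rate: float) -> float:
    """Expected number of active molecules for a campaign of a given size."""
    return molecules_screened * hit_rate


# Stand-in values taken from the bounds quoted in the article (assumptions,
# not Servier's exact figures).
RANDOM_RATE = 0.01   # "less than 1 percent"
GRAPH_RATE = 0.15    # "more than 15 percent"

random_screen = expected_hits(1_000_000, RANDOM_RATE)
graph_screen = expected_hits(1_000, GRAPH_RATE)

# Ratio of hit rates: how much more likely each screened molecule is to be active.
enrichment = GRAPH_RATE / RANDOM_RATE

print(f"random screen: at most {random_screen:,.0f} hits from 1,000,000 molecules")
print(f"graph-guided screen: on the order of {graph_screen:,.0f} hits from 1,000 molecules")
print(f"hit-rate enrichment: at least {enrichment:.0f}x")
```

On these stand-in numbers, each molecule picked with the knowledge graph is at least 15 times more likely to be a hit, which squares with the "order of magnitude" claim, while the campaign itself is about a thousand times smaller.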

Dorval said the result aligned with KPIs used to justify the project in terms of time to market for drug development, a critical differentiator in the pharma industry.

"It brings value to the business by improving the traits during the screening campaign. But on top of that, the compound has been selected in a rational way, so when you get to it, you know why your hit has been selected. It brings information and knowledge to the project about what worked and what didn't work," he said.

The small molecules example is just one application built on Servier's knowledge graph.

Debate has raged about graph databases. While advocates claim they are a boon in helping understand the relationships between things, whether they are chemical compounds or social media accounts, critics argue the benefits graph systems seem to offer can be created in relational systems, which have a longer history – and are arguably more mature and easier to manage – than their graph counterparts.

Grignard said Servier built its knowledge graph using data pipelines from multiple relational databases already in the business.

"It's just a mess to handle in relational tables because today we are designing databases, and in one month, we can have new queries and new questions. To improve the data model design is just a mess using tables like relational databases, but by using a graph it's quite straightforward because you can just add one node or one property in a node without having any impact on the rest of the graph," he said.
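Grignard's point about adding a node or a property without disturbing the rest of the graph can be shown with a toy property graph. This is a minimal sketch in plain Python, not Neo4j's API and not Servier's data model; every node name, property, and value below is invented for illustration.

```python
# Minimal in-memory property graph: nodes carry free-form property dicts,
# edges are (source, relationship, target) triples. All names and values
# here are hypothetical.
nodes = {
    "mol_1": {"label": "Molecule", "mass_da": 320.4},
    "target_1": {"label": "Target", "name": "example target"},
}
edges = [("mol_1", "ACTIVE_ON", "target_1")]

# Keep a copy of every node except the one we are about to change.
snapshot = {key: dict(props) for key, props in nodes.items() if key != "mol_1"}

# A new question arrives: add a property to a single node ...
nodes["mol_1"]["solubility"] = "high"
# ... and a new kind of node with its own relationship type.
nodes["assay_1"] = {"label": "Assay", "readout": "phenotypic"}
edges.append(("mol_1", "TESTED_IN", "assay_1"))

# Nodes that existed before (other than mol_1) are untouched.
unchanged = all(nodes[key] == props for key, props in snapshot.items())
print(unchanged)
```

In a relational design, the same change would typically mean altering a table's columns or adding a new table plus a join table, which is the kind of rework the quote describes.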

Dorval said it was the flexibility and speed of the graph system which appealed to the business. "Plenty of applications do not need to move to the graph, but in our case, it was about flexibility and the sparseness of the data. It was about creating long paths along the relationship. Of course, you can do that with another approach but [the graph is] so powerful and fast that using it was a no brainer," he said.
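The "long paths" Dorval mentions are chains of relationships that connect two entities through several intermediate ones. The sketch below shows the idea with a breadth-first search over a toy graph in plain Python. It is an assumption-laden illustration rather than how Neo4j or Servier does it: the node names are hypothetical, and a real graph database would run this kind of traversal through its own query language.

```python
from collections import deque

# Hypothetical toy graph: each key lists its neighbours
# (undirected edges are listed in both directions).
graph = {
    "molecule_A": ["phenotype_X"],
    "phenotype_X": ["molecule_A", "gene_G"],
    "gene_G": ["phenotype_X", "transcript_T"],
    "transcript_T": ["gene_G", "molecule_B"],
    "molecule_B": ["transcript_T"],
}


def shortest_path(graph, start, goal):
    """Breadth-first search: shortest list of nodes from start to goal, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbour in graph.get(node, []):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return None


path = shortest_path(graph, "molecule_A", "molecule_B")
print(" -> ".join(path))
```

Here the two molecules are four hops apart. In a relational schema, each hop would typically be expressed as another join, which is why multi-hop questions like this are often cited as a natural fit for graph systems.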

Last month, Neo4j claimed that a new approach to parallel runtime and change data capture had sped up analytical queries by up to 100 times, allowing both transactional and analytical processing within one database. ®
