Vector database Pinecone promises to bring ML data management under control with 2.0 release
Hybrid disk and RAM system should slash costs, firm says
Pinecone has upgraded its vector database, aiming at enterprises that are looking to boost productivity in machine learning projects.
Built by the team behind Amazon SageMaker, Pinecone is designed to allow machine learning engineers to search through catalogues of embeddings, the continuous vector representations of discrete variables fundamental to common ML algorithms such as word2vec.
With its 2.0 iteration, the company promises storage of metadata – such as a topic, author, and category – with each item, allowing users to filter vector searches by these criteria in a single stage.
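To illustrate the idea of filtering a vector search by metadata in a single stage, here is a minimal sketch in plain Python. It is not Pinecone's API or implementation: the `filtered_search` function, the item layout, and the sample data are all hypothetical, and a brute-force cosine-similarity scan stands in for a real vector index.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def filtered_search(items, query, metadata_filter, top_k=2):
    """Hypothetical helper: score only the items whose metadata matches.

    The filter is checked during the same pass that scores the vectors,
    rather than being applied to the results of an unfiltered search.
    """
    hits = []
    for item in items:
        matches = all(
            item["metadata"].get(key) == value
            for key, value in metadata_filter.items()
        )
        if matches:
            hits.append((cosine(item["vector"], query), item["id"]))
    hits.sort(reverse=True)  # highest similarity first
    return [(item_id, score) for score, item_id in hits[:top_k]]


# Made-up sample data: each item carries a vector plus metadata.
ITEMS = [
    {"id": "doc-1", "vector": [1.0, 0.0],
     "metadata": {"topic": "databases", "author": "alice"}},
    {"id": "doc-2", "vector": [0.9, 0.1],
     "metadata": {"topic": "cooking", "author": "bob"}},
    {"id": "doc-3", "vector": [0.0, 1.0],
     "metadata": {"topic": "databases", "author": "bob"}},
]

results = filtered_search(ITEMS, [1.0, 0.0], {"topic": "databases"})
print(results)
```

In this toy data, "doc-2" is close to the query vector but is never scored because its topic does not match the filter. Filtering only after an unfiltered top-k search could instead return fewer matching results than requested.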
Edo Liberty, founder and CEO of Pinecone, said that while relational databases use SQL to organise and query data, and text documents require an index, machine learning models relate to meanings or sentiment represented in multidimensional vectors.
"You don't care about the specific words; you care about meanings and sentiment. You need to do that with AI, and the way that it's done is not returned in an inverted index, it's done with vector representation of objects which is how deep learning models represent text."
The former director of research and engineering at Yahoo and AWS added: "Companies have those representations of the data and have metadata associated with it so you know whether the sentence came from a specific document, its specific time, and spoken by a specific person and so on. All this is indexed by Pinecone, made available for search and slice and dice."
The second development extolled by the vendor is combining in-memory and disk-based storage in a single system, which Liberty claimed would run the same workloads for a tenth of the cost by avoiding the need to pull data from disk into more expensive RAM.
Liberty also said the firm had improved horizontal scaling with an architecture designed to use Kafka and Kubernetes to make the vector database as reliable as any other enterprise-grade database.
Hyoun Park, chief analyst at Amalgam Insights, said that as machine learning and deep learning increasingly become normal business capabilities, organisations will hit performance limitations in standard relational databases.
For those building an ML architecture from scratch, starting with appropriately designed databases would be an advantage.
"Vector-based search is an important aspect to consider," he said, "as it allows for better context, more human language usage, and better alignment of graphic and audio binaries with existing semantic taxonomies.
"From a practical perspective, vector-based search helps companies to better align complex text, speech, pictures, videos, and sounds to existing business departments, categories, and goals."
Park said vector-based search should be considered as the "next step" for organisations wanting to bring the "entirety of their data ecosystem into their machine learning and AI efforts." ®