Databricks' lakehouse becomes foundation under fresh layer of AI dreams
Mega startup serves slice of GenAI with data engineering main as it tries to upstage Microsoft’s Fabric showpiece
Databricks has decided to launch a complete overhaul of its platform during the climax of Ignite, the global tech shindig run by Microsoft, the software giant with which the data analytics and ML vendor shares a significant partnership.
The company founded by some of the original creators of Apache Spark has announced it is building something else atop its "lakehouse" concept, which it launched in early 2020 as a means of combining structured BI and analytics workloads of data warehousing with the messy world of data lakes.
While retaining the lakehouse's unified governance layer across data and AI and a single unified query engine to span ETL, SQL, machine learning and BI, the company said it wants to move on to exploit the technology gained in its $1.3 billion buy of MosaicML, a generative AI startup.
In an announcement big on claims and scant on detail, Databricks says it is introducing a data intelligence layer it calls DatabricksIQ, which "fuels all parts of our platform."
The idea is to employ "AI models to deeply understand the semantics of enterprise data."
New genAI-enabled features Databricks claims it will introduce include end-to-end retrieval augmented generation (RAG) designed to help create "high quality conversational agents on your custom data." The company also plans to enable training of custom models either from scratch on an organization's data, or by continued pre-training of existing models. The company is yet to announce products or release data that reflects these aspirations.
Gartner senior director analyst Aaron Rosenbaum said Databricks is one of the vendors competing in the market for data fabric, "a design framework" which the analyst firm promotes.
"Enterprises will have a rich set of choices in 2024, with vendors offering both revolutionary and evolutionary approaches to the data fabric," he said.
He said Databricks' announcement would help it provide active metadata management, profiling, genAI for data management, and data cataloguing. The idea is to make it simpler for organizations to gain insights from data with quicker time to value and less staff time and expertise.
"However, organizational and cultural challenges to this new approach to data management will be a barrier to adoption for many enterprises," he said.
Rosenbaum declined to comment on the timing of Databricks' release, which coincided with the general availability of Microsoft's Fabric product portfolio. Like Databricks, Fabric uses the Delta table format to underpin most of its new data engineering and analytics products.
Hyoun Park, CEO and chief analyst with Amalgam Insights, pointed out that the companies collaborate on the Azure Databricks product, hosted in Microsoft's cloud platform. "It may be the most successful product on Microsoft Azure," he said.
Databricks closed $500 million in Series I VC funding during September, giving it a nominal $43 billion valuation and making it one of the highest valued pre-IPO startups. Its customers include Shell, Toyota, Air Canada, Rolls-Royce, and global bank ABN AMRO.
We asked Databricks for more details on the announcement. ®