Databases

Databricks: Ugh, just look at that messy data lake environment. Squints. You know... we could sort that out with a sweet shot of SQL

Data-wrangler previews another lakehouse concept tool

Fri 13 Nov 2020 // 19:32 UTC

Data management and machine learning framework biz Databricks is launching a tool it has claimed will bring SQL-style analytics to the messy world of data lakes.

SQL Analytics, the company claimed, expands the traditional scope of the data lake from data science and machine learning to include all data workloads including business intelligence and SQL. It is available for preview this week.

The tool is a manifestation of the company’s lakehouse concept, which, you guessed it, is an attempt to bring some of the governance, performance and order of the data warehouse world to the wild and messy world of data lakes. Data lakes, for their part, have the advantage of being able to ingest unstructured data quickly.

Speaking to The Register, Joel Minnick, Databricks product marketing veep, said: “Despite it being a little bit of a whimsical name for an architecture, lakehouse is probably the best way to articulate what the architecture is.”

SQL Analytics is built on Delta Lake, Databricks’ open-format data engine, which is supposed to help bring order and performance to existing data lakes. It also uses Delta Engine, a “polymorphic query execution engine” that rewrites Spark in C++ to take advantage of vectorisation, Minnick said. Apache Spark itself is written in Scala.
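
For readers unfamiliar with the term, vectorisation broadly means evaluating an expression over whole columns of values at once rather than one row at a time. The Python sketch below is purely conceptual and is not Delta Engine code: the table, the column names and both functions are made up for illustration. It computes the same total both ways; a native engine would typically run the columnar loop over contiguous memory, where a compiler can apply SIMD instructions.

```python
# Conceptual sketch only: contrasts row-at-a-time evaluation with
# column-at-a-time (vectorised) evaluation. This is NOT Delta Engine code;
# the data and function names are hypothetical.
from array import array

# A tiny table of orders, stored row by row.
rows = [
    {"price": 10.0, "qty": 3},
    {"price": 2.5, "qty": 4},
    {"price": 7.0, "qty": 1},
]


def total_row_at_a_time(rows):
    """Evaluate price * qty once per row, looking up fields each time."""
    total = 0.0
    for row in rows:
        total += row["price"] * row["qty"]
    return total


# The same table stored column by column in contiguous typed arrays.
prices = array("d", [10.0, 2.5, 7.0])
qtys = array("q", [3, 4, 1])


def total_columnar(prices, qtys):
    """Evaluate price * qty across whole columns in one tight loop."""
    return sum(p * q for p, q in zip(prices, qtys))


if __name__ == "__main__":
    print(total_row_at_a_time(rows))
    print(total_columnar(prices, qtys))
```

In pure Python the two versions perform much the same; the point is the data layout, which is what lets a C++ engine process many values per instruction.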

The idea, said Minnick, is that it allows users to auto-scale clusters that are structured to be high-performance SQL analytics clusters, which in turn is supposed to allow organisations to handle high user concurrency (many logged-in users) “behind the scenes”.

Databricks had also “done some engineering” to govern how queries were trafficked and executed to keep back and forth communication to a minimum, thereby reducing latency, he said.

Those familiar with SQL analytics or data engineering can explore the schema of their Delta Lake tables, to be able to “run SQL queries, and visualize the results,” Minnick said.
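
As a rough idea of that explore, query and visualise loop, here is a minimal sketch. It does not use Databricks' product or any Delta Lake API: Python's built-in `sqlite3` module stands in for the SQL endpoint, and the `events` table, its columns and its rows are all invented for illustration.

```python
# Stand-in sketch: sqlite3 plays the role of the SQL endpoint here.
# The "events" table and its contents are hypothetical, not from Databricks.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("emea", 120.0), ("apac", 80.0), ("emea", 30.0)],
)

# Step 1: explore the schema. PRAGMA table_info yields one row per column:
# (cid, name, type, notnull, dflt_value, pk).
schema = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(events)")]

# Step 2: run an ordinary SQL aggregate over the table.
totals = conn.execute(
    "SELECT region, SUM(amount) FROM events GROUP BY region ORDER BY region"
).fetchall()

# Step 3: "visualise" the result as a crude text bar chart,
# one '#' per 10 units of amount.
if __name__ == "__main__":
    print(schema)
    for region, total in totals:
        print(f"{region:<5} {'#' * int(total // 10)} {total}")
```

A BI tool would replace the text chart with a proper dashboard, but the shape of the workflow is the same: inspect the columns, write the SQL, plot what comes back.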

While the Databricks SQL Engine might help bring BI work to the data lake, and help users get value from that messy repository of data, it is unlikely to replace established enterprise data warehouses any time soon, opined Philip Carnelley, associate vice president of software research at IDC.

“The idea is to give you the best of both worlds, there is some merit to that. But this is a solution for companies with lots of technical resources. This will run alongside other enterprise data tools. It might be that people use data warehouse systems like Teradata a bit less, because they have these tools as well, but they are not going to switch off the data warehouse any time soon,” Carnelley said.

Databricks was one of the main vendors behind Spark, a data framework designed to help build queries for distributed file systems such as Hadoop. Matei Zaharia, Databricks' CTO and co-founder, was the initial author of Spark. ®
