Leaving Spark behind, Databricks enters new territory as it eyes 2021 IPO

Will its vision for unified analytics fly?

Databricks, the commercial company founded around the popular Apache Spark data processing engine, is making a strike for a new class of workloads and enterprise data management jobs in its make-or-break IPO year.

Hawking technology news from the company’s Data + AI Summit, CEO Ali Ghodsi spoke to The Register about the new technologies. Ghodsi said the firm's efforts to combine the order and SQL queries familiar to data warehousing with the schema-less architecture of data lakes would be pushed more aggressively against established vendors in data management, analytics and data warehousing.

In 2018, when launching the unified analytics concept, Databricks was promoting it for customers looking into the machine learning lifecycle, but this approach was not aggressive enough, Ghodsi said.

"We tiptoed around it to not upset the big vendors and the data warehouses. We knew we were sitting on kryptonite, and we were hiding it because we thought that it would be too upsetting for people, that it would be too competitive to everybody," he claimed.

Databricks was co-founded in 2013 by a team of academics who met at Berkeley, including computer scientist Matei Zaharia, who started Spark in 2009 during his PhD and also co-created the Apache Mesos cluster manager. Ghodsi remains an adjunct assistant professor at the University of California institution.

In 2019, Databricks introduced Delta Lake, an open-source project designed to address data lake reliability and addressability issues, which had caused the unflattering description of "data swamp" to gain traction.

Then in February last year, Databricks introduced the term "lakehouse" to the unsuspecting technology lexicon. This - you guessed it - was an effort to put across the idea it would combine the best of the data warehouse and data lake approach.

Fast-forward to November, and Databricks launched SQL Analytics, built on Delta Lake, Databricks’ open-format data engine that is supposed to help bring order and performance to existing data lakes. It also uses Delta Engine, a “polymorphic query execution engine,” which reimplements Spark's execution, originally written in Scala, in C++ to take advantage of vectorisation. Within Delta Engine, Databricks introduced the proprietary Photon, a Spark-compatible execution engine designed to accelerate Spark SQL workflows.

“We already had it from the beginning: we called it Unified Analytics. Basically, unify all your analytics: advanced analytics, all the way down to basic analytics: SQL. But it's very hidden and people didn't know it. Lakehouse is sort of the same thing but now it's in your face: data lake, plus data warehouses, combined together. Lake for AI; warehouse for BI, you get the best of the AI and BI in one platform, one copy of the data in an open platform,” Ghodsi said.

Into the mix, Databricks last week added Delta Live Tables aimed at easing ETL, a common use for Spark, by “abstracting away the low-level instructions, removing many potential sources of error,” Databricks said.

Meanwhile, Unity Catalog, based on industry-standard ANSI SQL, is designed to offer one interface to access both structured and unstructured data, across all cloud data lakes, in a bid to help users get a single view of their data on the Databricks Lakehouse Platform.

Databricks has also launched an open-source project called Delta Sharing, which will be donated to the Linux Foundation. Databricks claims it is the world’s first open protocol for securely sharing data across organizations in real time, completely independent of the platform on which the data resides. It is supported by AWS, Google Cloud, and BI and visualisation firm Tableau.

Sanjeev Mohan, Gartner veep and analyst, said Delta Live Tables was the “crowning jewel” for Databricks. “It makes the process of creating reliable data pipelines a declarative one – like SQL. You specify the destination and don’t worry about the underlying code which is system generated,” he said.
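To make the "declarative" point concrete, here is a minimal sketch in plain Python. It is not the Delta Live Tables API: the table names, the `PIPELINE` spec and the `run` helper are all hypothetical, invented for illustration. The idea it tries to show is the one Mohan describes: the user states which tables should exist and what each is derived from, and a small runner works out the execution order.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical declarative spec: each target table lists its inputs and how
# to derive it. Nothing here says in which order the steps must run.
PIPELINE = {
    "order_total": {
        "inputs": ["clean_orders"],
        "build": lambda clean_orders: sum(r["amount"] for r in clean_orders),
    },
    "clean_orders": {
        "inputs": ["raw_orders"],
        "build": lambda raw_orders: [r for r in raw_orders if r["amount"] > 0],
    },
    "raw_orders": {
        "inputs": [],
        "build": lambda: [
            {"id": 1, "amount": 120.0},
            {"id": 2, "amount": -5.0},   # bad record, filtered out downstream
            {"id": 3, "amount": 40.0},
        ],
    },
}


def run(pipeline):
    """Derive an execution order from the declared inputs, then build each table."""
    graph = {name: set(spec["inputs"]) for name, spec in pipeline.items()}
    results = {}
    # static_order() yields every table after the tables it depends on.
    for name in TopologicalSorter(graph).static_order():
        spec = pipeline[name]
        results[name] = spec["build"](*(results[dep] for dep in spec["inputs"]))
    return results
```

Calling `run(PIPELINE)` builds `raw_orders`, then `clean_orders`, then `order_total`, even though the spec lists them in the opposite order. A production system would presumably add scheduling, retries and data quality checks on top, which this sketch leaves out.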

Unity Catalog was also a good move from Databricks, as “data catalogues have been all the rage for many years.” The vendor “has a very ambitious roadmap to enhance its functionality,” Mohan said.

Delta Sharing was interesting, he added, because most existing data sharing technologies require the user to have an account on that platform before they can partake in its capabilities. “Databricks’ Delta Share removes that requirement,” the Gartner veep said.

While products from Microsoft and Google are also aimed at unifying the world of data lakes and warehouses, there are differences in emphasis, he said. “Some products are aimed at the data analyst persona but Databricks is aimed at helping data engineers deliver faster and more reliably,” he said.

IDC: Not so easy to peel users away from other vendors

But Philip Carnelley, associate veep, software research at IDC Europe, said that by sharing performance data on features like concurrency, an area where established data warehousing firms play well, Databricks was trying to convince the market it was a serious player. There was still inertia in favour of incumbent vendors, however.

“If you’ve been using something like Teradata for 30 years and you know it works, then it is important, you’re not going to move off it lightly,” the analyst said.

While Databricks said users could add capacity in the cloud, that always comes at a cost. “It is cost-performance, not just performance that is interesting here. I think that the Teradata of this world, can give assurances there because there is a lot of experience [in optimisation] that they can draw on,” he said.

Ghodsi told The Register it was the company’s aim to be “IPO-ready” this year. In the build-up to the big day, the company took a $1bn investment round in February, with AWS, Microsoft, Google, Andreessen Horowitz (Netscape founder Marc Andreessen's VC firm), and Salesforce Ventures chipping in. The splurge took the nominal value of the fledgling firm to $28bn.

Databricks is obviously hoping the narrative which sees it spreading its wings from its data lake home to more general analytics and BI technology is a story that will fly with the market. ®
