
Bright Sparks: Databricks emits system to sort out ‘data mess’

Data-nom from stream, lake and warehouse, they chirp

Apache Spark-wrangling biz Databricks has added a third pillar to its Unified Analytics Platform aimed at unifying data management.

The unified data management system, Delta, aims to simplify enterprises’ complex data architecture, which sees data spread across multiple data lakes and data warehouses.

CEO and co-founder Ali Ghodsi told The Register that Delta addressed one of three major roadblocks to widespread use of data analytics.

These are the need for data scientists to collaborate with non-experts, to manage complex infrastructure, and to ensure good performance, often in real time, on data in many formats.

Ghodsi said Delta – launched today at the Spark Summit in Dublin – aims to tackle the third problem, which sees customers dealing with a “data mess”, with data scattered across data lakes and data warehouses.

At the same time, they also run streaming systems, thanks to the increased need for real-time analytics – fraud detection, for instance – that can’t operate on stale data.

The idea of Delta, Databricks said, is to let customers cut out "complex, brittle extract, transform, and load processes that run across a variety of systems".

Ghodsi said it will combine streaming and batch processing, and do it with “the performance and reliability of data warehouses, with the advantages of data lakes - essentially that it’s separating compute and storage”.

Delta will store its data in Amazon S3 - Databricks said this would offer the scale of a data lake, and that it would be stored in a non-proprietary and open file format “to ensure data portability and prevent data lock-in”.

Meanwhile, the company said, Delta tables are used as both data source and sink, and will provide transactional guarantees for multiple concurrent writes from batch and streaming jobs.
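The article doesn't detail how those transactional guarantees are implemented, but the general idea - writers commit whole batches atomically to an ordered log, readers only ever see complete commits - can be sketched in a few lines of plain Python. This is a toy illustration of the concept, not Databricks' code; the class and method names are made up for the example.

```python
# Toy sketch of the transactional idea behind Delta tables: writers
# (batch or streaming) commit atomically to an ordered log, and readers
# get a consistent snapshot made only of fully committed batches.
import threading

class ToyDeltaTable:
    """Hypothetical stand-in for a Delta table; not a real API."""

    def __init__(self):
        self._log = []                 # ordered commit log
        self._lock = threading.Lock()  # serialises commits

    def commit(self, rows):
        """Atomically append one batch - a whole commit or nothing."""
        with self._lock:
            self._log.append(list(rows))

    def snapshot(self):
        """Readers see whole commits only, never a half-written batch."""
        with self._lock:
            return [row for batch in self._log for row in batch]

table = ToyDeltaTable()

# Two concurrent writers, e.g. a batch job and a streaming micro-batch.
t1 = threading.Thread(target=table.commit, args=([("alice", 1)],))
t2 = threading.Thread(target=table.commit, args=([("bob", 2)],))
t1.start(); t2.start(); t1.join(); t2.join()

rows = table.snapshot()  # both commits visible, neither torn
```

Either thread may win the race for commit order, but a snapshot always contains complete batches - which is the property that lets batch and streaming jobs safely share one table.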

Databricks also claims a number of automated abilities for Delta, including automated performance management that cuts out the need for manual tuning, a self-optimising data layout, and intelligent data skipping and indexing.
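"Data skipping" is the one of those abilities with a well-known mechanism behind it: keep min/max statistics per data file, and skip any file whose value range cannot overlap a query's predicate. The following is a minimal self-contained sketch of that technique in plain Python - an assumption about the general approach, not a description of Delta's actual implementation.

```python
# Toy data skipping: per-file min/max stats let a range query avoid
# reading files that cannot contain matching rows.

# Three "files" of values (stand-ins for Parquet files in a table).
files = {
    "part-0": [3, 7, 9],
    "part-1": [12, 15, 18],
    "part-2": [21, 25, 30],
}

# Per-file statistics, collected once at write time.
stats = {name: (min(vals), max(vals)) for name, vals in files.items()}

def scan(lo, hi):
    """Return rows in [lo, hi], recording which files were actually read."""
    hits, scanned = [], []
    for name, (fmin, fmax) in stats.items():
        if fmax < lo or fmin > hi:
            continue  # skipped purely on stats, file never opened
        scanned.append(name)
        hits.extend(v for v in files[name] if lo <= v <= hi)
    return hits, scanned

rows, touched = scan(10, 20)  # only part-1's range [12, 18] overlaps
```

Here the query touches one file out of three; on a real table with thousands of files, pruning on cheap per-file metadata is where the claimed performance win comes from.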

Ghodsi said that, as a cloud company, Databricks' “number one priority” was security, listing security accreditations and its partnership with the CIA’s investment arm In-Q-Tel.

He said that customers can be given access to full audits and logs for metadata and data, for data governance requirements, claiming that - because all data is validated when it is brought into the system - it is also reliable. ®
