Software

This article is more than 1 year old

Hortonworks CTO's ACID Merge: The impossible dream, realised

Hive-minded project injection

Fri 7 Apr 2017 // 12:30 UTC

Interview Hortonworks believes it's solved duplication issues in its Hadoop spin that menaced users when incrementally merging data.

And CTO Scott Gnau reckoned customers are heaving sighs of relief after Hortonworks has realised what must have seemed impossible.

Gnau was speaking to The Register at the DataWorks Summit in Munich this week.

The company has acknowledged the difficulties of many workloads in the Hadoop environment when they require "frequent or unpredictable updates".

The biggest problem? Developers are responsible not only for the update logic but also the rollback logic they must detect and resolve write conflicts and find some way to isolate downstream consumers from in-progress updates. Hadoop has limited facilities for solving these problems and people who attempted it usually ended up limiting updates to a single writer and disabling all readers while updates are in progress.

Gnau reckons they've tackled this with version 2.6 of Hortonworks Data Platform — the company's enterprise Hadoop distribution.

Another objective of 2.6 was to avoid the possibility that HDP became an environment simply for copying data and running complex analytics analytics with "interference" from the "real work" taking place in the enterprise data warehouse.

Hortonworks has therefore developed ACID Merge in Apache Hive, rather than introduce it via yet another open-source project.

Initially developed by Facebook, but now part of the Apache Software Foundation, Hive provides a SQL-like interface for querying data stored in Hadoop. "Others might have taken the approach of creating yet another project to go and solve this problem," Gnau told The Reg.

"[But] we really like to focus on making the core platform as good as it can be, and that makes it easier for our customers to consume it because they don't have to rewrite their applications or do anything other than load the new software, implement the performance enhancement capability, and the application just runs."

Gnau reckoned there were "many instances" where the same file loaded twice or the same record was duplicated.

"Being able to do ACID-compliant merge of that data as an incremental data maintenance operation is really important especially as we look at the kind of applications that are going to be driven by the technical query performance, dashboarding and other transaction analytics where you want to ensure that you don't get dirty reads, where you want to ensure that you don't double-count."

ACID compliance is historically a big issue in data processing. "Being able to do data merge and incremental data loads inside of the core platform was possible, but it required a lot of heavy lifting," Gnau said. "Now it's just completely automatic and because we do it inside of the engine we can guarantee that it's ACID compliant."

The response from customers has been verging on emotional, according to Hortonworks' CTO. During a meeting with Hortonworks customers he reckoned some had "just sighed with great relief because it's a problem which they'd been grappling with, trying to solve, and it was taking more work than they wanted it to do.

"I didn't have a lot of customers asking for the ACID Merge," Gnau reckoned, "but it's obviously a really big deal for them – it might have been they didn't ask because they thought it was impossible to do. Certainly it was not without challenge from an implementation perspective." ®

Topics

Special Features

Vendor Voice

Resources

Software

Hortonworks CTO's ACID Merge: The impossible dream, realised

Hive-minded project injection

More about

More about

More about

More about

More about

TIP US OFF

Other stories you might like

Global IT spending forecast to reach $5.06 trillion this year

Google will pump more than $100B into AI, says DeepMind boss

Japanese government rejects Yahoo! infosec improvement plan

Protecting distributed branch office environments from ransomware

Tencent Cloud to revisit design after circular dependencies slowed emergency API fix

Indian PM's 25-year roadmap laid out with help from AI

RISC-V AI chip upstart Rivos plans to undercut Nvidia, helped by a quarter-billion in VC lucre

Fire in the Cisco! Networking giant's Duo MFA message logs stolen in phish attack

FYI: This site claims to have harvested 4B+ Discord chats, today all yours for a price

MGM says FTC can't possibly probe its ransomware downfall – watchdog chief Lina Khan was a guest at the time

YouTube now sabotages ad-blocking apps that stream its vids

China scientists talk of powering hypersonic weapon with cheap Nvidia chip

About Us

Our Websites

Your Privacy