Sparks fly as HPE and Hortonworks hold hands for data mould

You want batch computations? Step into our database love nest

Hadoop spinner Hortonworks is tweaking Spark for bulky workloads with Hewlett Packard Enterprise as part of a "new strategic direction."

The pair declared a breakthrough in raising Spark's ability to shuffle data across a cluster. Spark already ships with shuffle utilities, but the companies claimed they'd managed “faster sorting and in-memory computations.”

Alongside this, the companies laboured to improve Spark's memory utilisation for “better performance and broader scalability”, which HPE and Hortonworks expect will lead to “new, large-scale use cases.”

Spark daddy, Matei Zaharia, told El Reg:

“The thing that's most unique about Spark and data streaming is that the same machine can do batch computation. Other projects are streaming only, or either/or, but not both.

"You can be receiving a stream of data and at the same time cross-compare that stream with stored static data. Or as you're receiving it, you can run new queries about it, like 'What's the video doing in this region of the world?', and with Spark Streaming you bring in this query.”
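The pattern Zaharia describes — joining a live event stream against stored static data — can be sketched in plain Python (a conceptual stand-in only; real Spark Streaming code would use DStreams or DataFrames, and the catalogue, field names, and events here are hypothetical):

```python
# Static reference data, loaded once up front.
# In Spark this would typically be a cached DataFrame or broadcast variable.
video_catalogue = {
    "vid-1": {"title": "Keynote", "region": "EU"},
    "vid-2": {"title": "Demo", "region": "US"},
}

def enrich(stream):
    """Cross-compare each incoming view event with the static catalogue,
    yielding the event merged with the stored metadata for its video."""
    for event in stream:
        meta = video_catalogue.get(event["video_id"])
        if meta is not None:
            yield {**event, **meta}

# Simulated incoming stream of view events.
events = [
    {"video_id": "vid-1", "views": 10},
    {"video_id": "vid-2", "views": 3},
]

enriched = list(enrich(events))
print(enriched[0]["region"])  # each stream row now carries static metadata
```

The same batch-style join logic runs over live data as it arrives — which is the "same machine can do batch computation" point Zaharia is making.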

Scott Gnau, Hortonworks' CTO, predicted the business will "continue to focus on the integration of Spark into broad data architectures supported by Apache YARN as well as enhancements for performance and functionality and better access points for applications like Apache Zeppelin.”

In a prepared PR blurb, HPE chief techie and Hortonworks board member Martin Fink said he hoped the collaboration would let the Spark community derive meaning more quickly from large data sets "without having to change a single line of code.” ®
