IBM stamps its pedal to the metal: Spark flies onto Big Blue's Bluemix

Citizen analysts and Excel junkies please take note


IBM has made its planned Apache Spark service available on its Bluemix cloud platform.

The service was announced in June, when IBM pledged 3,500 staffers to Apache's analytics project, with Spark also becoming the basis of IBM's analytics and commerce platforms and its Watson Health Cloud.

Spark-as-a-service will be available on a pay-as-you-go basis with the ability to reserve instances, but there won't be volume discounts for big firms.

According to IBM, Spark is like MapReduce – an architecture running at near-infinite scale that revolutionizes data processing for the masses. Spark is, of course, beginning to replace MapReduce for some.

No more proprietary boxes and chipsets or need to engage rocket scientists building high-performance clusters; this is analytics for the generation who don't like paying enterprise prices and for whom commodity x86 clusters are king.

But a lot's changed since stately IBM anointed Spark, which was created in 2009 and accepted into the Apache hall of open-source fame in 2013.

Amazon is working on its own analytics-as-a-service, Space Needle, which was revealed earlier this month for AWS. Amazon has been moving into the enterprise for some time, displacing the established infrastructure vendors – vendors like IBM. Now it's going up a level, into applications, and cash-rich BI and analytics apps at that.

S3, Redshift, and Aurora have been the building blocks. Last week Amazon told Wall Street Aurora is now its fastest growing service ever – having overtaken Redshift. Aurora is AWS's MySQL-compatible database engine.

Such are the waves AWS has created that SAP was forced into pre-announcing its analytics-as-a-service, Project Orca.

Analytics isn't a friendly world for users. The apps are pricey, there are lots of high-priced consultants, and it's a bit like ERP or CRM in that the final software never quite lives up to original expectations. The second you deploy it, it's out of date.

AWS argues that it's democratizing things – thanks to its low, low prices, ease of adoption, and flexibility of architecture. Amazon has forced IBM to compete in its own back yard, the enterprise, in a game it's long thought its own: data and analytics.

So how does IBM hope to arrest that giant sucking sound that is data going into Jeff Bezos' cloud?

A gamble by Big Blue?

IBM's aiming Spark-as-a-service not just at the suits plugging away at their BI and analytics dashboards, but also at those building tools and partnering. The open-source nature of Spark is therefore important, because anybody can build code and tools that plug into other Spark-based tools and frameworks.

In this respect, IBM is gambling it can repeat the success of the Eclipse tools framework from the early 2000s. IBM threw its weight into Eclipse, turning it into the framework for its Rational tools and achieving a major coup in the process: It cut the price of building its own Rational IDE and pulled rivals into supporting the Eclipse framework and, thereby, its own platform and runtimes.

Eclipse raised an army of ISVs building tools on open-source code that was IBM-ready. But open cuts little ice in the cloud of today, and that's where IBM's theory and past experience of Eclipse break down.

OpenStack was intended as the open alternative to AWS, but OpenStack public clouds are folding – Hewlett-Packard was the latest to bring down the shutters.

Proprietary platforms such as AWS are winning, as long as they are clothed in support for open-source operating systems, languages, and middleware.

No one was ever fired for buying IBM open software, right?

Derek Schoettle, IBM Cloud Data Services general manager, countered that open is important: AWS is proprietary and incomplete. Spark, with IBM, means not just choice, but also the opportunity to flesh out a more complete platform, from the user interface through to the processing engine, with everything in between that AWS lacks.

However, unlike with Eclipse, IBM can in no way be considered the standard bearer to which all others are forced to rally. Before IBM, Cloudera, Intel, Databricks, and MapR – to name but a few – were already committed to Spark. Moreover, Spark is available on AWS. So what exactly is IBM's differentiator this time?

"IBM's been around for a long time and has had success working with large organizations, and moving huge organizations and their data to a mixed or a purely private [cloud] estate," Schoettle told The Reg. "We have a full complement of offerings that lets enterprises move some or all or part of that environment to the cloud."

Also, IBM – while working with the traditional enterprise customers – wants to attract new, smaller firms who'd normally not consider working with IBM. "We are focused on attracting new customers. We can now partner with small and fast-moving organizations," Schoettle said. "The interest is from IBM's existing base of customers and also communities who've not worked with IBM in the past; the attractive part is that IBM is open – the Spark community."

Armed with Spark, Schoettle reckons IBM can win data scientists and "citizen analysts" who would have used Excel plus data visualization tools.

Just how much does IBM expect to be able to make from such grassroots? It's not saying. "It's early in this game," Schoettle said. "We feel very good about the next 5-10 years and our role in analytics and the cloud opportunity."

IBM won't lose out on Spark. It's just got a lot of ground to make up convincing the "as-a-service" converts it has something they need as much as IBM does. ®

Biting the hand that feeds IT © 1998–2022