This article is more than 1 year old

IBM lobs 3,500 staffers at Apache Spark

Big Blue researchers pile into cluster parade

IBM has thrown its full weight behind Spark, Apache’s open-source cluster computing framework.

Spark will form the basis of all of Big Blue's analytics and commerce platforms and its Watson Health Cloud. The framework will also be sold as a service on its Bluemix cloud.

IBM will commit more than 3,500 of its researchers and developers to Spark-related projects, and has promised a Spark Technology Center in San Francisco, California, where data scientists and developers can work with IBM designers and architects.

The giant also committed to releasing, under open source terms, its SystemML family of machine-learning libraries.

Spark was invented by researchers at the University of California at Berkeley in 2009, under Matei Zaharia, and donated to Apache in 2013.

Written mainly in Scala, with APIs for Java, Scala and Python, Spark is an in-memory system for processing large data sets. It consists of a scheduling and dispatching core, a SQL-style programming interface, a machine-learning framework and a distributed graph-processing framework.

Spark can scale to more than 8,000 production nodes and, while it works with Hadoop and MapReduce, is claimed to also be faster on certain workloads. Up until last year, Spark had just 465 contributors.

The presence of IBM can make or break open-source projects.

IBM adopted the Eclipse framework early on, making it the basis of its Rational programming tools. Serving as the foundation of IBM’s tools helped establish Eclipse as one of the industry’s biggest development environments, behind Microsoft’s Visual Studio, and guaranteed an entire ecosystem of ISVs building Eclipse plug-ins.

It’s been a virtuous circle: IBM is freed from having to maintain the IDE plumbing, ISVs and devs get an open, pluggable tools platform, and IBM benefits from advances and partners.

At the other extreme, you have Harmony – also an Apache project, intended as an independent alternative to the Java implementation from the now non-existent Sun Microsystems.

IBM threw in its lot because it vied with Sun for stewardship over Java.

When Sun ceased to exist, bought by Oracle, IBM withdrew from Harmony in October 2010 to join the OpenJDK project with Apple and Oracle.

Drained of its biggest backer, Harmony shut down 12 months later.

Oracle sought to make amends of a kind with Apache in 2011 by punting its OpenOffice productivity suite over to the open-source project shop.

Announcing its backing for Apache's Spark Monday, IBM painted Spark as a platform for data and analytics, the analogy being Linux – which IBM also contributes to – as a platform for apps.

The parallel, though, would seem closer to Eclipse. ®

 
