Elders tell cluster tool Apache Spark it's time to quit chillin' in the crib

Hadoop Swiss Army knife software graduates from Incubator to full-blown project


The Apache Software Foundation has promoted a fast data-processing tool out of the Apache Incubator in a further sign of the maturity of the Hadoop family.

Apache Spark is a fast processing layer for data stored in the open-source Hadoop Distributed File System (HDFS) or other shared file systems such as NFS. It supports Scala, Java, and Python. In some tests it has demonstrated a speedup of 100 times over Hadoop MapReduce when working on in-memory data sets, and 10 times for data held on hard disk.

On Sunday, Spark was unanimously voted to graduate from the Incubator, and those voting included Hadoop luminaries such as the technology's creator, Doug Cutting.

Now that Spark has been promoted, a project management committee will be established for the software, and Databricks co-founder and former AMPLab PhD student Matei Zaharia will be appointed to the role of "Vice President, Apache Spark".

Like Hadoop, Spark has become the foundation for other data-processing engines, such as Shark for SQL-on-Hadoop queries, MLlib for machine learning, Spark Streaming for dealing with streaming data, and GraphX for graph processing.

The technology's users include Baidu, Databricks, IBM's Almaden research group, Trend Micro, Yahoo! and Alibaba.

The graduation of Apache Spark caps off a vertiginous rise for the data-processing system, which was created at the University of California at Berkeley's AMPLab in 2009 and was published as open source in 2010.

Since then, the system has gained a vigorous developer community, with more than 120 developers from 25 companies contributing source code. There seems to be enough activity around the software for businesses to smell money – last week Hadoop hothouse Cloudera announced commercial support for the tool. ®


Biting the hand that feeds IT © 1998–2022