Yahoo! kills its stuffed elephant

For the love of Apache


Yahoo! has killed off its Hadoop distro, choosing to put its weight behind the core Apache Hadoop project instead.

Yahoo! was instrumental in bootstrapping Apache Hadoop, an open source distributed number-crunching platform based on Google's proprietary infrastructure, and in the summer of 2009, the web giant open sourced its own Hadoop distro, based on the code used in its "production" infrastructure. The idea was to let the rest of the world benefit from the work Yahoo! had done to build a platform that actually drives an internet-scale operation.

"We’ve put a lot of investment on our testing and deployment," Yahooligan Eric Baldeschwieler said in announcing the distro. "We’re going to take that work that we put into it and put it out on the web."

But on Tuesday, in a blog post, Baldeschwieler put an end to the Yahoo! distro, saying the company plans to remove all references to the distro from its website (developer.yahoo.com/hadoop), close its GitHub repo (yahoo.github.com/hadoop-common), and concentrate on working with the Apache Hadoop community.

The Yahoo! distro, he said, was causing a bit of a community split. "As the community grew, we experimented with using the 'Yahoo! Distribution of Hadoop' as the vehicle to share our work," the post reads. "Unfortunately, Apache is no longer the obvious place to go for Hadoop releases. The Yahoo! team wants to return to a world where anyone can download and directly use releases of Hadoop from Apache.

"We want to contribute to the stabilization and testing of those releases. We also want to share our regular program of sustaining engineering that backports minor feature enhancements into new dot releases on a regular basis, so that the world sees regular improvements coming from Apache every few months, not years."

But Yahoo! still wants its work open sourced, so now it has to find a way of getting years of code into the Apache project.

Currently, Yahoo! offers two code branches: a stable release and a "future" release. The latest stable code – which Yahoo! currently runs on 40,000 nodes – is based on Hadoop 0.20, and the company is now working to move this to Apache (in the branch: hadoop/common/branches/branch-0.20-security). The idea is to call this the 20.100 release, and if the community approves, it will become an official Apache release.

Yahoo!'s future branch includes, yes, new features, and this will be rolled into Apache as well. "We are working on a set of new features for Hadoop to improve its availability, scalability and interoperability to make Hadoop more usable in mission critical deployments. You're going to see another burst of email activity from us as we work to get hadoop-future patches socialized, reviewed and checked in. These bulk checkins are exceptional. They are the result of us striving to be more transparent."

Named for the yellow stuffed elephant cherished by the son of project founder Doug Cutting, Hadoop also underpins online services operated by everyone from Facebook, Twitter, and StumbleUpon to, apparently, Apple. The original open source project mimicked GFS, Google's distributed file system, and MapReduce, Mountain View's distributed number-crunching platform. It now includes various other infrastructure pieces, including HBase, based on Google's BigTable distributed database.

In 2004, Google published a pair of research papers on GFS and MapReduce, and Cutting used these to build a platform that would back Nutch, his open source web crawler. Yahoo! hired Cutting in 2006, but he has since left Yahoo! for Cloudera. ®


Biting the hand that feeds IT © 1998–2022