Updated You didn't shed a tear over the death of Yahoo!'s independent search engine? That may change.
As the two companies finally ended the epic gestation period for their inevitable web search pact, Yahoo! and Microsoft announced that Bing - Redmond's fledgling "decision engine" search engine - will be "the exclusive algorithmic search and paid search platform for Yahoo! sites." And though the two Google chasers made it clear that Yahoo! will continue to use its own technologies to drive other areas of its business, you have to wonder what the pact means for the future of Hadoop, the open-source grid platform that had finally restored Yahoo!'s mojo.
Yahoo! is the largest contributor to the increasingly popular Apache project, contributing more than 70 per cent of all patches, and it employs the project's founder, Nutch-crawler-creator Doug Cutting. But in signing its pact with Microsoft, it would appear that the company has agreed to bury its largest Hadoop application: the Yahoo! Search Webmap.
The Webmap - which provides the Yahoo! search engine with a database of all known web pages, complete with all the necessary metadata - has also been described (by Yahoo!) as the world's largest Hadoop application. And though Hadoop powers other portions of Yahoo!, it's unclear whether the company will put as much time and money into moving the platform forward. Yahoo! has not responded to our requests for comment. Nor has Microsoft.
Redmond told Cnet that it's "open" to merging Bing with Yahoo!'s Searchmonkey platform, a misguided effort to expose the company's search results to third-party developers. But although Bing's "reference vertical" uses Hadoop - thanks to the acquisition of semantic search startup Powerset - it seems unlikely that Redmond would embrace Hadoop on Bing proper. Indeed, Powerset's general manager has told us that nearly a year after the startup's acquisition, Microsoft has made no plans to do so.
Even if it did, that's beside the point. The point here is that Yahoo! - Hadoop's godfather - is giving up the crown jewel in its Hadoop empire.
Inspired by Google-published research papers describing Mountain View’s proprietary software infrastructure, Hadoop is a means of crunching epic amounts of data across a network of distributed machines. Doug Cutting originally developed the platform for use with Nutch, naming it after his son's stuffed elephant. But in 2006, he was hired by Yahoo!, and by the beginning of last year Hadoop had made its way onto Yahoo! production systems.
Webmap is the big example. But Yahoo! does use Hadoop for various other tasks. The platform now powers the real-time automated algorithms that select news stories for the Yahoo! home page. And in some cases it's used to optimize ads - i.e. to match content with relevant advertising.
Presumably, Hadoop will continue to drive these non-search tools. But does that mean Yahoo! will continue to put its considerable weight behind the project's continued development?
Christophe Bisciglia is confident that Yahoo!'s commitment will remain. "Hadoop isn't just about search," says Bisciglia, one of the minds behind Cloudera, a Silicon Valley startup offering a commercialized version of Hadoop. "Over the coming months, we will likely see Yahoo! shift resources towards the advertising and content businesses, but Hadoop plays a critical role there as well, so even if the clients for Hadoop change a bit, I don't see the overall investment from Y! decreasing.
"The expensive part of operating a search business is the hardware itself - not the development team working on Hadoop. If anything, this will better position their Hadoop team to attack challenges that have more impact on Yahoo!'s bottom line."
Granted, Bisciglia has a certain interest in Yahoo! maintaining its Hadoop efforts. But let's hope he's right. The destruction of Yahoo!'s search engine comes just as Hadoop is taking off. It underpins Facebook's backend infrastructure. It's offered up from Amazon's Web Services cloud. And last month's Hadoop Summit - driven by, yes, Yahoo! - attracted more than 700 developers from around the globe.
What's more, Hadoop had finally made Yahoo! relevant again. Yes, the project was inspired by work done at Google. But whereas Google has kept GFS and MapReduce largely hidden behind the walls of the Mountain View Chocolate Factory, Yahoo! has embraced this new-age distributed computing paradigm as an open source project, inspiring countless other developers and web outfits along the way. And at least until Google says otherwise, the open-source incarnation of MapReduce is outperforming the original.
After years as a frivolous headline that few actually bothered to click on, Yahoo! has finally found its mojo. What a shame it would be if Microsoft took it away. ®
With a blog post Thursday morning, after this story was published, Hadoop development VP Eric Baldeschwieler has reaffirmed Yahoo!'s commitment to the project. "Don't Panic!," he wrote. "We are as committed as ever to building a world class open source Cloud Computing infrastructure and Apache Hadoop remains our solution for batch computing. Hadoop is used to solve many, many internet scale problems beyond search at Yahoo. Today's deal only improves Yahoo's ability to invest in Hadoop.
"Yahoo is buzzing with more energy and bigger plans than ever before. The Hadoop team is running to keep up with our internal customers demands for ever larger, faster and better clusters. We are all looking forward to working with you, the wider Hadoop community, to build the better Hadoop that we all want."