Yahoo! search! results!, recommendations!, ad! flinging! code! is! now! open! source!
Here are the keys to ride Vespa into the sunset
Having nothing at all to do with scooters, the software provides a way to query structured and unstructured data, to organize and rank results, and to write data at scale. It's a system for running computations on large data sets in real time.
"To achieve both speed and scale, Vespa distributes data and computation over many machines without any single master as a bottleneck," explained Jon Bratseth, a distinguished (software) architect at Yahoo, in a blog post. "Where conventional applications work by pulling data into a stateless tier for processing, Vespa instead pushes computations to the data."
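To make "pushing computations to the data" concrete, here is a minimal sketch of the general scatter-gather pattern, assuming a toy in-memory setup. The `ContentNode` class, the `query` function and the word-count scoring are all hypothetical illustrations, loosely modeled on the idea of nodes that hold data and rank it locally; this is not Vespa's actual API or ranking. Each node scores only its own documents and ships back a short top-k list, and a stateless front end merges those lists, so the raw data never has to be pulled to a central tier.

```python
import heapq
from dataclasses import dataclass


# Hypothetical "content node": holds one slice of the corpus and ranks it
# locally, so only a small top-k list leaves the machine.
@dataclass
class ContentNode:
    docs: dict[str, str]  # doc id -> text

    def search(self, terms: list[str], k: int) -> list[tuple[int, str]]:
        # Toy relevance score: total occurrences of the query terms.
        scored = []
        for doc_id, text in self.docs.items():
            words = text.lower().split()
            score = sum(words.count(t) for t in terms)
            if score > 0:
                scored.append((score, doc_id))
        return heapq.nlargest(k, scored)


def query(nodes: list[ContentNode], text: str, k: int = 3) -> list[tuple[int, str]]:
    """Scatter the query to every node, then merge the per-node top-k lists."""
    terms = text.lower().split()
    # Each search call runs "where the data lives"; only results travel back.
    partials = [node.search(terms, k) for node in nodes]
    return heapq.nlargest(k, (hit for part in partials for hit in part))


nodes = [
    ContentNode({"a": "yahoo news today", "b": "sports scores"}),
    ContentNode({"c": "yahoo finance news news", "d": "flickr photos"}),
]
print(query(nodes, "news", k=2))  # [(2, 'c'), (1, 'a')]
```

Because each node returns at most k hits, the merge step handles at most k results per node regardless of how many documents each node stores, which is what lets this style of design scale out without a single master doing the heavy lifting.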
Vespa does this, we're told, by managing clusters of nodes and redistributing the data when machines fail or new capacity is added. It overlaps somewhat with services like Elasticsearch or relational databases, but also handles middleware container logic and live reconfiguration.
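Redistributing data when machines fail or join is a general problem, and one well-known way to keep the reshuffle small is rendezvous (highest-random-weight) hashing. The sketch below uses that technique purely as an illustration; it is not a description of Vespa's own data-distribution algorithm, and the `owner`, `placement` and `moved` helpers and the node names are hypothetical. Each key goes to whichever node scores highest for it, so losing a node moves only that node's keys, and adding a node moves keys only onto the newcomer.

```python
import hashlib


def owner(key: str, nodes: list[str]) -> str:
    """Rendezvous hashing: the key belongs to the node with the top score."""
    def score(node: str) -> int:
        digest = hashlib.sha256(f"{node}:{key}".encode()).digest()
        return int.from_bytes(digest[:8], "big")
    return max(nodes, key=score)


def placement(keys: list[str], nodes: list[str]) -> dict[str, str]:
    """Map every key to its owning node for a given cluster membership."""
    return {k: owner(k, nodes) for k in keys}


def moved(before: dict[str, str], after: dict[str, str]) -> list[str]:
    """Keys whose owner changed between two placements."""
    return [k for k in before if before[k] != after[k]]


keys = [f"doc{i}" for i in range(1000)]
before = placement(keys, ["n1", "n2", "n3", "n4"])

# Node n2 fails: only the keys it held need a new home.
after_failure = placement(keys, ["n1", "n3", "n4"])
print(len(moved(before, after_failure)), "keys moved after losing n2")

# Node n5 joins: every key that moves, moves onto n5.
after_growth = placement(keys, ["n1", "n2", "n3", "n4", "n5"])
print(len(moved(before, after_growth)), "keys moved onto n5")
```

The design choice being illustrated is that a node's score for a key does not depend on which other nodes exist, so membership changes only affect keys whose winning node actually changed.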
Vespa is used at Yahoo.com, Yahoo News, Yahoo Sports, Yahoo Finance, Yahoo Gemini, and Flickr to process search queries, serve content recommendations, and deliver ads, and its release elicited tweets of enthusiasm from appreciative developers.
"Many years in the making, lots of brainpower and innovation – kudos to the Vespa team and Yahoo for making this happen," said Amotz Maimon, VP and distinguished engineer at Amazon.
"Vespa is the single greatest piece of software Yahoo ever built," said npmjs co-founder and COO Laurie Voss. "It's like ElasticSearch but a hundred times better. I am so happy."
Until now, Yahoo!'s greatest code hit has been Hadoop, a distributed storage and data processing framework developed by the search biz and shepherded in 2006 into a successful second life as an Apache open source project.
Hadoop seeded the market for companies focused on processing data at scale, such as Cloudera, Hortonworks, and MapR.
Bratseth said Vespa has been substantially reworked over the past few years to fit modern computing stacks. It lets developers hand even huge data sets and models to the serving system and run computations over them at request time, which, he added, can deliver a better user experience at lower cost and with less complexity than alternative approaches. ®