Have you always wanted an algorithm that can search like Bing? Well, if you change your mind, one's on GitHub now
Make your app answer all the easy questions, like, 'Where can I download Chrome?'
Microsoft has open sourced a machine-learning algorithm that powers part of its web search engine Bing.
The code is designed to answer questions like, "When was Bing launched?" though we fear it can't answer the hardest questions of all: "Why is Bing still a thing?" and "Steve Ballmer?"
“Keyword search algorithms just fail when people ask a question or take a picture and ask the search engine, ‘What is this?’” said Rangan Majumder, group program manager on Microsoft’s Bing search and AI team, in announcing the availability of the code.
To answer those sorts of probing inquiries, Microsoft's Bing engineers encoded more than 150 billion pieces of data, from words and web pages to images, as vectors. Representing data in this way is common in deep learning, as it helps neural networks discern all the underlying patterns in a particular dataset, and work out how bits and pieces of information are related to one another. These relationships can be used to turn questions into answers, linking searches to results.
These vectors weren't used to build a neural network, though. Instead, another machine-learning algorithm known as Space Partition Tree And Graph (SPTAG) roots through this massive pile of vectors when someone tries to look up information. The algorithm clusters similar vectors together and, for each query, finds the nearest neighboring vectors, which point to the most relevant information.
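For the curious, here is a toy sketch of the "cluster, then look nearby" idea in Python. It is not SPTAG's code and makes no claim to match its internals: the vectors, the centroids, and the function names are all made up for illustration, and a single nearest-centroid lookup stands in for the real tree-and-graph machinery.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_partitions(vectors, centroids):
    """Group vector ids by the index of their nearest centroid."""
    partitions = {i: [] for i in range(len(centroids))}
    for vec_id, vec in vectors.items():
        nearest = min(range(len(centroids)),
                      key=lambda i: distance(vec, centroids[i]))
        partitions[nearest].append(vec_id)
    return partitions


def search(query, vectors, centroids, partitions, k=2):
    """Scan only the partition closest to the query and return the k nearest ids."""
    nearest = min(range(len(centroids)),
                  key=lambda i: distance(query, centroids[i]))
    candidates = partitions[nearest]
    return sorted(candidates, key=lambda vid: distance(query, vectors[vid]))[:k]


# Hypothetical 2-D vectors; real embeddings have hundreds of dimensions.
vectors = {
    "eiffel_photo_1": (0.9, 0.8),
    "eiffel_photo_2": (1.0, 0.9),
    "paris_skyline": (0.8, 1.0),
    "cat_photo": (-0.9, -0.8),
    "dog_photo": (-1.0, -0.7),
}
# Hypothetical cluster centres, chosen by hand rather than learned.
centroids = [(1.0, 1.0), (-1.0, -1.0)]

partitions = build_partitions(vectors, centroids)
results = search((0.92, 0.82), vectors, centroids, partitions, k=2)
print(results)
```

The point of the partitioning is that the query never touches the cat and dog vectors at all. The price is that the answer is approximate: a true nearest neighbor sitting just across a partition boundary would be missed by this simplified version.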
For example, if someone searches the web using an image of the Eiffel Tower, Bing will convert that picture into a vector using a PyTorch model, and then the SPTAG algorithm finds the closest vectors relating to the image to bring up other images of the Eiffel Tower. Similarly, asking Bing “How tall is the Eiffel Tower?” retrieves the information closest to those words: the actual height of the tower.
“Bing processes billions of documents every day, and the idea now is that we can represent these entries as vectors and search through this giant index of 100 billion-plus vectors to find the most related results in 5 milliseconds,” said Jeffrey Zhu, program manager on Microsoft’s Bing team.
Searching by vectors also means that users don’t have to be precise with their queries, Microsoft explained. Submitting something like “How tall is the tower in Paris?” will still return the right information about the Eiffel Tower even though the search query didn't explicitly contain the words “Eiffel Tower.”
“Vector search makes it easier to search by concept rather than keyword,” Team Redmond explained. The vector for Eiffel Tower is grouped together with the vectors for “tower” and “Paris” because they’re closely related.
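To see why a query can match a concept it never names, consider this minimal Python sketch. Everything in it is an assumption for illustration: the three-dimensional word vectors are hand-picked rather than learned, and averaging word vectors is a deliberately crude stand-in for whatever model Bing actually uses to embed a query.

```python
import math
import re


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def embed(text, word_vectors):
    """Average the vectors of the known words in text (unknown words are ignored)."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w in word_vectors]
    if not words:
        raise ValueError("no known words in query")
    dims = len(next(iter(word_vectors.values())))
    return tuple(sum(word_vectors[w][d] for w in words) / len(words)
                 for d in range(dims))


# Hypothetical axes: (tower-ness, Paris-ness, London-ness).
word_vectors = {
    "tower": (1.0, 0.0, 0.0),
    "paris": (0.0, 1.0, 0.0),
    "london": (0.0, 0.0, 1.0),
}
# Hypothetical vectors for indexed entities.
entities = {
    "Eiffel Tower": (0.9, 0.9, 0.0),
    "Tower of London": (0.9, 0.0, 0.9),
    "Louvre": (0.1, 0.9, 0.0),
}

query = "How tall is the tower in Paris?"
query_vec = embed(query, word_vectors)
best_match = max(entities, key=lambda name: cosine(query_vec, entities[name]))
print(best_match)
```

The query contains neither "Eiffel" nor the full landmark name, yet its vector lands closest to the Eiffel Tower entry because it shares both the "tower" and "Paris" directions.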
Microsoft released Bing's SPTAG algorithm to help enterprises perform searches within various applications and help academics study different methods of searching vectors. “We’ve only started to explore what’s really possible around vector search at this depth,” Majumder said.
You can play with the code on GitHub: it’s written in C++ with a Python wrapper, and is MIT licensed. ®