Bing finds meaning in Powerset

Well, at least a little


Powerset's semantic obsession is already working its way into Bing's primary search engine, helping to suss out the meaning behind end-user queries, generate captions for query results, and suggest related queries.

Microsoft acquired the San Francisco-based Powerset last summer in a deal worth a reported $100m, nearly a year before Bing's much-ballyhooed debut. At the time, the startup offered a semantic search engine that indexed nothing but Wikipedia, and this Wikicontraption was eventually bolted to the side of Bing's primary search engine and rechristened as a "Reference" vertical.

But the ultimate goal is to meld Powerset's semantic indexing with Bing proper, and according to Scott Prevost, who oversees Powerset's interplay with Redmond, the melding is well underway.

"We're taking pieces of our technology and integrating it throughout the Bing stack," Prevost tells The Reg. "So things like helping on some of the query processing. And we're now working on some of the caption generation - the text that occurs under the blue link on search results. This is part of a longer-term, deeper integration of our technologies throughout all of Bing."

He also says the outfit is "doing some work with related searches" - i.e. helping to suggest additional queries the user may be interested in.

Still based in San Francisco - several hundred miles away from Bing's Redmond base - the 65-person-strong Powerset is "diving very deeply" into the task of caption generation. "It's one of the things that helps users understand the relevance of a particular search result to their query," Prevost says. "If you have good captions, it helps users not waste time looking through pages.

"One of the challenges in developing captions is finding the right pieces of text on a page to represent that link, so semantic processing really helps. It helps pick the right sentences, sentences that may have the right concepts but not necessarily the keywords from [the user's query]. It helps us pick the piece of the sentence that's most relevant and not chop it off in places that makes it unreadable...

"You see things in Powerset captions such as whole phrases being highlighted, phrases where the words don't match all the keywords but the meaning of the words matches. Sometimes, you get a great sentence in an article and it doesn't have all the keywords but it's really the thing that best explains what the sentence is about."

Using its own back-end infrastructure, Powerset works to build a semantic index for at least a portion of the web. "When we index a document, we do much heavier processing," Prevost explains. "We do deep linguistic processing, everything from morphological analysis - scanning the words for our speech patterns - to full-on syntactic parsing of sentences.

"Then we have a component that extracts semantic relationships from those parses." For instance, the outfit's proprietary tech works to recognize synonyms or associated generic pronouns with particular names. Then, after doing a similar analysis on an end-user query, Powerset can match semantic data between query and index.

Yes, Powerset's back-end runs on Hadoop, the open-source distributed-computing platform modeled on Google's proprietary infrastructure. Powerset originated Hadoop's Hbase project, an open-source take on Google's distributed database, BigTable. And yes, that means open-source code is juicing at least a portion of Bing proper. "What we provide Bing with is data, and data can be produced using various open-source tools in Powerset's data center," Prevost says.

Famously, Microsoft spent years treating open source like a pariah, and even now it seems that relatively few of the company's shipping products embrace open code. But according to Prevost, Microsoft was always open to the idea of retaining Powerset's Hbase base.

"We obviously had a lot of conversations [with Microsoft] about what we were doing and why it was important," Prevost says. "Microsoft was very open to the idea of open source. Obviously, Microsoft has a lot of IP concerns with software in so many different domains, so they want to be very careful about these things...but it was really just a matter of working out the details."

After the acquisition, while these conversations played out, Powerset's two full-time Hbase committers took leave from the project. But by October, they were approved to resume contributing patches.

As you might expect, Microsoft has no plans to migrate Bing proper onto the platform. "We haven't done anything to the Bing code base that explicitly uses Hbase," Prevost says.

But whether it's underpinned by Hadoop or not, Powerset intends to build a semantic index for the entire web. It just needs some time - and some cheaper, faster processing power. "Where we are right now is that it's still very expensive. We spend a lot more time indexing a page and that takes a lot more processing power. And that creates a much larger index, which is more expensive to serve. It wouldn't make sense for us to index the entire web, because it would be highly expensive, and for certain kinds of pages, we might not see the value."

So, for the moment, Powerset is indexing Wikipedia. But there's more to come. It may add other, contained datasets to Bing's Reference vertical, before attempting to embrace the web as a whole. And yes, it will take that Reference tab out of hiding. As it stands, Powerset's Wikisearch is limited to a relatively small number of queries, including the search for "Albert Einstein."

Bootnote

How does Powerset avoid Wikinonsense? According to Prevost, it re-indexes the "free encyclopedia anyone can edit" every two hours or so. "We look for changes and re-index those articles," Prevost explains. "That helps to make sure we don't have pages that are vandalized...the vandalized pages get fixed pretty quickly." Or so it seems.
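For the curious, one common way to "look for changes" is to fingerprint each article and re-index only those whose fingerprint has moved since the last pass. The Python sketch below assumes that approach; the article does not say how Powerset actually detects changes, and the `fingerprint` and `changed_articles` helpers and the sample articles are hypothetical.

```python
import hashlib

def fingerprint(text):
    """Return a stable digest of an article's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def changed_articles(snapshot, seen):
    """Return titles whose text differs from what was last indexed, updating seen."""
    changed = []
    for title, text in snapshot.items():
        digest = fingerprint(text)
        if seen.get(title) != digest:
            seen[title] = digest
            changed.append(title)
    return changed

seen = {}  # title -> digest of the last indexed revision

# First pass: everything is new, so everything gets indexed.
first = changed_articles({"Albert Einstein": "Physicist.", "Bing": "Search engine."}, seen)
# Second pass: one article has been vandalized and is picked up.
second = changed_articles({"Albert Einstein": "Physicist.", "Bing": "VANDALIZED"}, seen)
# Third pass: the vandalism has been reverted, so the fix is picked up too.
third = changed_articles({"Albert Einstein": "Physicist.", "Bing": "Search engine."}, seen)
print(first, second, third)
```

Under this scheme a vandalized page does reach the index, but it is replaced on the next pass after editors revert it, which fits Prevost's "fixed pretty quickly."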



Biting the hand that feeds IT © 1998–2022