Google, Bing, Yahoo! data hoarding is like homeopathy. It doesn't work – new study claims

Boffins find search quality unaffected


Data, it has been argued, is the new oil – the fuel for the information economy – but its importance to search engines may be overstated.

In a paper released on Monday through the National Bureau of Economic Research, Lesley Chiou, an associate professor at Occidental College, and Catherine Tucker, a professor at the MIT Sloan School of Management, both in the US, argue that retaining search log data doesn't do much for search quality.

Data retention has implications in the debate over Europe's right to be forgotten, the authors suggest, because retained data undermines that right. It's also relevant to US policy discussions about privacy regulations.

A decade ago, Google changed its search data retention policy for server logs from as long as it wants, to... as long as it wants, with a caveat: the data is identifiable only for the first 18‑24 months, after which it gets anonymized.

It was an issue other search engine providers like Microsoft and Yahoo! had to confront, too.

By 2008, Google had settled on the removal of the last 8 bits of the IP address after nine months, and on more substantive anonymization after 18 months.
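
For readers wondering what dropping the last 8 bits looks like in practice, here is a minimal sketch. It assumes IPv4 addresses and that "removal" means zeroing the final octet; the function name and the sample address are hypothetical, and nothing here is taken from Google's actual implementation.

```python
import ipaddress


def mask_last_octet(ip: str) -> str:
    """Zero the low 8 bits of an IPv4 address (illustrative assumption only)."""
    # Convert the dotted-quad string to a 32-bit integer.
    addr = int(ipaddress.IPv4Address(ip))
    # Keep the top 24 bits and clear the bottom 8.
    return str(ipaddress.IPv4Address(addr & 0xFFFFFF00))


# 203.0.113.0/24 is a range reserved for documentation examples.
print(mask_last_octet("203.0.113.77"))
```

After masking, up to 256 addresses collapse into the same value, so a log entry points to a block of addresses rather than a single one.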

At the time, the company said one of its reasons for keeping search logs was "to improve our search algorithms for the benefit of users."

There are other reasons to retain data, such as legal compliance and anti-spam efforts.

But it can be beneficial to avoid keeping too much data around. Data retention turns a company into a magnet for legal requests and represents a liability in the event of hacking. Storage infrastructure also has a cost.

To determine whether retention policies affected the accuracy of search results, Chiou and Tucker used data from metrics biz Hitwise to assess web traffic being driven by search sites.

They looked at Microsoft Bing and Yahoo! Search over a period in which Bing changed its search data retention period from 18 months to 6 months and Yahoo! changed its retention period from 13 months to 3 months, and in which Yahoo! later had second thoughts and shifted to an 18‑month retention period.

According to Chiou and Tucker, data retention periods didn't affect the flow of traffic from search engines to downstream websites.

"Our findings suggest that long periods of data storage do not confer advantages in search quality, which is an often-cited benefit of data retention by companies," their paper states.

Asked via email whether these findings suggest that Google has overstated the value of search log data, Chiou told The Register, "Our study examined retention data policies for Yahoo! and Bing and did not study Google, as Google did not undergo any changes in its retention policy at the time. Our paper does not find evidence that Yahoo!'s and Bing's change conferred an advantage."

Chiou and Tucker observe that the supposed cost of privacy laws to consumers and to companies may be lower than perceived. They also contend that their findings weaken the claim that data retention affects search market dominance, which could make data retention less relevant in antitrust discussions of Google. ®

Biting the hand that feeds IT © 1998–2022