Ever felt that a few big tech companies are following you around the internet? That's because ... they are

Experimental blocking of sites that load resources from four big companies makes the web unusable


A new extension for Google Chrome has made explicit how most popular sites on the internet load resources from one or more of Google, Facebook, Microsoft and Amazon.

The extension, Big Tech Detective, shows the extent to which websites exchange data with these four companies by reporting each request a page makes to them. It can also optionally block sites that make such requests. Any such request is also effectively a tracker, since the provider sees the IP address and other request data for the user's web browser.

The extension was built by investigative data reporter Dhruv Mehrotra in association with the Anti-Monopoly Fund at the Economic Security Project, a non-profit research group financed by the US-based Hopewell Fund in Washington DC.

Cara Rose Defabio, editor at the Economic Security Project, said: "Big Tech Detective is a tool that pulls the curtain back on exactly how much control these corporations have over the internet. Our browser extension lets you 'lock out' Google, Amazon, Facebook and Microsoft, alerting you when a website you're using pings any one of these companies… you can't do much online without your data being routed through one of these giants."

One of the sites blocked by Big Tech Detective is that of its own sponsor, the Economic Security Project

Let's talk infrastructure

That, perhaps, is an exaggeration. Big Tech Detective will spot sites that use Google Analytics to report on web traffic, or host Google ads, or use a service hosted on Amazon Web Services such as Chartbeat analytics - which embeds a script that pings its service every 15 seconds according to this post - but that is not the same as routing your data through the services.

In terms of actual data collection and analysis, we would guess that Google and Facebook are ahead of AWS and Microsoft, and munging together infrastructure services with analytics and tracking is perhaps unhelpful.

Another point to note is that a third-party service hosted on a public cloud server at AWS, Microsoft or Google is distinct from services run directly by those companies. Public cloud is an infrastructure choice and the infrastructure provider does not get that data other than being able to see that there is traffic.

Dependencies

Defabio made the point, though, that the companies behind public cloud have huge power, referencing Amazon's decision to "refuse hosting service to the right wing social app Parler, effectively shutting it down." While there was substantial popular approval of the action, it was Amazon's decision, rather than one based on law and regulation.

She argued that these giant corporations should be broken up, so that Amazon the retailer is separate from AWS, for example. The release of the new extension is timed to coincide with US government hearings on digital competition, drawing on research from last year.

Digital power is only one of the issues which a utility like this reveals. Site and business reliability is another: if there is an outage with services like AWS or Microsoft's Office 365, the impact is huge. Even if the problem is just advertising scripts or Google-hosted fonts going offline, it can cause errors and performance issues. These are dependencies, and the more a site has, the less reliable it is – though most of the time the reason for problems is closer to home.

Privacy is a third issue and the ubiquity of tracking techniques, not only from these big four tech companies but also from elsewhere in the ad tech industry, is impacting society in unexpected ways. It is a short path from personalised advertising, so someone browsing the web sees ads for things more likely to interest them, to powerful techniques for manipulating public opinion.

El Reg takes it for a spin

What does Big Tech Detective actually reveal? We installed it in Chrome, which requires developer mode and a manual download since it is unlikely to be approved for the official Chrome Web Store. By default the tool gathers statistics, but there is an option to block sites which request data from any of these four companies. Engaging this "traffic lock" means that almost every site will be blocked, as will the search engine. Even the privacy-focused search engine DuckDuckGo, for example, fails because it loads resources from Microsoft.

There do seem to be some flaws with the extension. We tried it on a site which is close to the default you get from an ASP.NET Core application created using an official template in Microsoft's Visual Studio. We were concerned to find that, according to Big Tech Detective, it has a dependency on Microsoft. Looking more closely though, the CSS stylesheet it identified only had a comment in the source code referencing documentation on Microsoft's site. There was no actual data transfer.
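To illustrate the difference between a stylesheet that merely mentions a URL and one that causes the browser to fetch it, here is a minimal Python sketch. It is not how Big Tech Detective works internally, which we have not inspected. The stylesheet text, the function name `fetched_urls` and the regular expressions are all our own assumptions, and the parsing is deliberately naive: it only handles `url(...)` references and would be fooled by comment markers inside quoted strings.

```python
import re
from typing import List

# Hypothetical stylesheet: the only microsoft.com URL sits inside a comment.
CSS = """
/* See https://docs.microsoft.com/ for documentation on this template. */
@import url("https://fonts.googleapis.com/css?family=Roboto");
body { background: url(/img/bg.png); }
"""

COMMENT_RE = re.compile(r"/\*.*?\*/", re.DOTALL)
URL_RE = re.compile(r"url\(\s*['\"]?([^'\")]+)['\"]?\s*\)", re.IGNORECASE)


def fetched_urls(css: str) -> List[str]:
    """Return url(...) targets that remain once CSS comments are removed."""
    without_comments = COMMENT_RE.sub("", css)
    return [m.group(1).strip() for m in URL_RE.finditer(without_comments)]


if __name__ == "__main__":
    print("mentions microsoft.com:", "microsoft.com" in CSS)
    print("would fetch:", fetched_urls(CSS))
```

A plain substring search flags the stylesheet, while the comment-aware version finds no Microsoft request at all, which matches what we saw when we looked at the source.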

Did we find any sites that do not use any of these companies? One was sqlite.org, the official site for the widely used open source database engine, which also serves as a demonstration of how fast and lightweight it is. The site stated that it "handles about 400K to 500K HTTP requests per day, about 15-20 per cent of which are dynamic pages touching the database. Dynamic content uses about 200 SQL statements per webpage. This setup runs on a single VM that shares a physical server with 23 others."
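For a sense of scale, the quoted figures can be turned into a rough daily SQL workload. The short Python sketch below takes the site's numbers at face value and assumes, purely for illustration, that load is spread evenly across the day, which real traffic never is. The function name is ours.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def sql_statements_per_day(requests_per_day: int, dynamic_percent: int,
                           sql_per_page: int = 200) -> int:
    """Dynamic requests per day multiplied by SQL statements per dynamic page."""
    dynamic_requests = requests_per_day * dynamic_percent // 100
    return dynamic_requests * sql_per_page


# Low and high ends of the ranges quoted by sqlite.org.
low = sql_statements_per_day(400_000, 15)
high = sql_statements_per_day(500_000, 20)

if __name__ == "__main__":
    print(f"{low:,} to {high:,} SQL statements per day")
    # ASSUMPTION: uniform load across the day, for a rough average only.
    print(f"about {low / SECONDS_PER_DAY:.0f} to {high / SECONDS_PER_DAY:.0f} per second")
```

On those assumptions the site runs somewhere between 12 million and 20 million SQL statements a day.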

Big Tech Detective analyses sites you visit. This data is held locally, but data is sent to the extension’s server to map IP numbers to source companies.

The official Linux site, kernel.org, also passed, but not that of the Linux Foundation, which reportedly loads resources from Google and Amazon. Items included analytics, fonts and captcha scripts from Google, as well as services running on AWS.

The extension lists all such requests with the full URL of each, which can be instructive. A local newspaper site (a genre notorious for the extent of its intrusive advertising and tracking scripts) apparently made 166 requests to Amazon, 77 to Google and one to Microsoft, all for a single page. Big Tech Detective also blocked its sponsor's site, the anti-monopoly project at the Economic Security Project, reporting links to Google and Facebook (the embedded YouTube video likely did not help).

Big Tech Detective itself has a privacy policy informing us that: "The IP addresses of a requested webpage and the content it loads — for example, Facebook pixels and Google Analytics scripts, fonts, and images — are encrypted with HTTPS and sent it to a Cloud Application Platform called Digital Ocean, which follows a different privacy policy."

The reason for this is to connect with an application that identifies the company behind each IP address, and the policy stated that "Big Tech Detective does not store or retain any information about your browsing history on its servers."
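The general idea of mapping an IP address to an owning company, then counting requests per company, can be sketched in a few lines of Python. This is our own illustration of the technique and not the extension's server code, which we have not seen. The address ranges are placeholders taken from the blocks reserved for documentation, the owner names are made up, and the function names are hypothetical.

```python
import ipaddress
from collections import Counter
from typing import Iterable, Optional

# ASSUMPTION: placeholder ranges reserved for documentation, not real
# allocations belonging to any company. Owner names are invented.
RANGES = {
    "Company A": ["192.0.2.0/24"],
    "Company B": ["198.51.100.0/24"],
}

NETWORKS = [
    (ipaddress.ip_network(cidr), owner)
    for owner, cidrs in RANGES.items()
    for cidr in cidrs
]


def owner_of(ip: str) -> Optional[str]:
    """Return the owner whose range contains ip, or None if no range matches."""
    addr = ipaddress.ip_address(ip)
    for network, owner in NETWORKS:
        if addr in network:
            return owner
    return None


def tally(ips: Iterable[str]) -> Counter:
    """Count requests per owner, grouping unmatched addresses under 'other'."""
    return Counter(owner_of(ip) or "other" for ip in ips)


if __name__ == "__main__":
    requests = ["192.0.2.10", "192.0.2.11", "198.51.100.7", "203.0.113.5"]
    for owner, count in tally(requests).items():
        print(owner, count)
```

A lookup like this needs an up-to-date table of address ranges, which is one plausible reason to do it on a server instead of inside the browser, though the policy does not say so.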

Opinions on the goals of the sponsors of this project will vary, but as a mechanism for explaining the extent to which the internet depends on and links to resources from a few giant companies, it is effective. ®


Biting the hand that feeds IT © 1998–2021