Sourcehut to shun Google's Go Module Mirror over greed
Code hosting service fed up with excessive bandwidth consumption
Sourcehut, a code hosting service similar to GitHub, GitLab, and Gitea, plans to start blocking the Go Module Mirror, a proxy that fetches and caches code from git servers, because it has been using up too much network bandwidth.
Starting February 24, developers who run go get or a similar command to import modules from Sourcehut repositories will see an error message. To resolve it, they will have to use a workaround to fetch the desired code.
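One such workaround, sketched below under the assumption that the affected modules live under git.sr.ht, is to tell the go command to bypass the module proxy entirely via the GOPRIVATE (or narrower GONOPROXY) environment variable, so fetches clone directly from the origin server:

```shell
# Hedged sketch: bypass proxy.golang.org for Sourcehut-hosted modules.
# GOPRIVATE tells the go command to fetch matching module paths directly
# from the origin git server, skipping both the module proxy and the
# checksum database.
export GOPRIVATE=git.sr.ht
# A narrower alternative is GONOPROXY=git.sr.ht, which skips only the proxy.
# After this, a command such as:
#   go get git.sr.ht/~user/somemodule   # (hypothetical module path)
# clones straight from Sourcehut instead of going through Google's mirror.
echo "GOPRIVATE=$GOPRIVATE"
```

The trade-off is that direct fetches lose the proxy's caching and checksum verification, which is why Go users may prefer to set this only for the affected host rather than globally.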
In a blog post on Monday, Drew DeVault, founder of Sourcehut, explained that the decision follows from the behavior of Google's Go Module Mirror, which he described last year as a distributed denial-of-service attack in an attempt to convince Google's Go team to change how its systems behave.
The Go module proxy, according to DeVault, not only handles user requests via the go get command but also, on its own, clones git repos in their entirety from multiple servers that do not coordinate their requests.
The Go Module Mirror may make these requests as many as 2,500 times per hour, often in conjunction with up to a dozen clone operations. These, DeVault says, are highly redundant and can result in a single git repo being fetched more than 100 times per hour.
This represents roughly 70 percent of Sourcehut's outbound traffic, with a single module producing as much as 4 GiB of daily traffic from Google.
"The cost of bearing this traffic is no longer acceptable to us, and the Go team has made no attempts to correct the issue during this time," wrote DeVault. "We want to avoid causing inconvenience for Go users, but the load and cost is too high for us to continue justifying support for this feature."
On February 24, 2021, DeVault opened a GitHub issue asking Google's Go engineers to deal with the problem, but after two years of back and forth, the mitigations put in place have had only minimal impact.
Other Go module maintainers have also complained about the Go Module Mirror's heedless consumption of computing resources.
"Yesterday, Go Module Mirror downloaded 4 gigabytes of data from my server requesting a single module over 500 times (log attached)," wrote developer Ben Lubar in a May 30, 2021 post. "As far as I know, I am the only person in the world using this Go module. I would greatly appreciate some caching or at least rate limiting."
Russ Cox, Go programming language tech lead at Google, responded to a discussion of the issue on Hacker News by noting that the Go team has been making progress in its effort to address the issue.
Go 1.19, he said, includes a way to download modules with a -reuse flag that makes refresh operations use less bandwidth by avoiding unchanged data. Cox said that the proxy.golang.org service hasn't yet been revised to support this language change, but it's on the list of planned work for this year.
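The mechanism Cox refers to can be sketched as follows, assuming a Go 1.19 or later toolchain; the module path and file names here are illustrative, not taken from the article:

```shell
# Sketch of the Go 1.19 -reuse flow. The first invocation records
# per-module origin metadata (VCS ref, commit hash) as a JSON stream:
mkdir -p /tmp/reuse-demo && cd /tmp/reuse-demo
go mod init example.com/reuse-demo 2>/dev/null || true
go mod download -json all > modules.json
# A later refresh feeds that file back in; modules whose recorded origin
# information is unchanged are skipped rather than re-downloaded, which
# is what saves bandwidth on the origin git server:
go mod download -reuse=modules.json -json all > modules-updated.json
```

A proxy that adopted this flow would only re-clone a repository when its recorded origin data had actually changed, rather than fetching it afresh on every refresh.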
"On the one hand Sourcehut claims this is a big problem for them, but on the other hand Sourcehut also has told us they don't want us to put in a special case to disable background refreshes," said Cox, citing DeVault's rejection of such a special case as "a wrong solution" and his insistence that Sourcehut should not be afforded special treatment. "The offer to disable background refreshes until a more complete fix can be deployed still stands, both to Sourcehut and to anyone else who is bothered by the current load."
This is not the first time Google's code has been accused of wasting others' bandwidth. In August 2020, APNIC, the Regional Internet Registry for the Asia-Pacific region, complained that the Chromium team's 2008 decision to combine Google's browser search keyword input box with its URL input box as a single omnibox led to a huge amount of DNS traffic.
The extra data was the result of browser code designed to distinguish between search terms and URLs by checking whether network service providers were engaged in NXDomain hijacking, capturing errors – typos or queries for non-existent domains – and monetizing them by returning responses tied to their own services.
At the time, about half of the DNS root server traffic – 60 billion queries to the root server system per day – was attributed to Chromium's effort to separate queries from URLs. Google tamed its query-URL disambiguation probes in February 2021. ®