GitLab versus The Zombie Repos: An old plot needs a new twist

Git back, git back, git back to where your files belong

Opinion GitLab is chewing on life's gristle. The problem, we hear, is that deadbeat freeloaders are sucking up its hosting lifeforce. The company's repo hive is clogged with zombie projects, untouched for years but still plugged into life support. It's costing us a million bucks a year, sighed GitLab's spreadsheet wranglers, and for what?

$1 million is certainly a lot to be wasting on a fossil collection, and is a full quarter of the company's total hosting costs. Who wouldn't want to spend it on something more fun? One answer is to cut 'em loose, which was what GitLab was expected to do from September. In an attempt to forestall the inevitable tsunami of tetchiness, the GitLabbers set very generous rules – a project has to be untouched for a year, there'll be plenty of warning, and the merest brush of a code fairy's gossamer wings will reset the clock.

But that was never going to quell the outrage. Some of this is entitlement bias, but a lot of it is because of the harm done to open source when stuff just vanishes from places where it was once assured a safe harbor. Last week, just hours after The Reg exclusively broke the story, the org made a quick U-turn.

The problem of freemium

On the face of it, GitLab has fallen into the Freemium Brand Trap. The logic behind free tiers on a paid-for service is sound enough. If you have a system where the incremental cost of adding a new user is low enough, you can give restricted user access away for free. People can get their feet wet and build useful things; once they see your service as beneficial for larger projects, they can hand over the cash for the full-fat experience. For a service that's aimed at open source communities, there's the added bonus of Good Egg status – your brand becomes beloved. It's win-win. 

The trap comes when the cumulative load of free tier users starts to cost more than you planned. If you're a crass commercial outfit, then you can put the screws on and trust that the paying base will keep your profile high and your brand aloft. With open source, you're seen as damaging the community. Because you are. 

This is the first problem with the assumption that untouched code is dead code. Open source is built on stable components. Stable means not being fiddled with. But when you do need to revisit long-established code, say because a big yet ancient security flaw has just surfaced, you really need to revisit it. Open source should mean that this is always possible, no matter who used to own the code and what its life story has been since it was looked at. 

Those are immediate concerns about vanishing code. Open source also means having code available for research, for education, for whatever unforeseen reasons. Nobody forced GitLab to offer a free service to open source, but it did – and that means a responsibility. It's part of the world's communal memory now. 

Noble ideals aren't any good if you can't afford them, though, and a megabuck off the bottom line won't stop bleeding away by itself. Let's look at that figure. Is it legit? You can store a petabyte for up to five years in Fujifilm's Object Archive service for around $45,000, or less than $10k a year, one of the best deals around. GitLab has around 29 million inactive users at 5GB free repo space. That's 145 petabytes, or $1.3 million a year at that rate. The tyranny of the Freemium Brand Trap is the tyranny of numbers.

How much does the average zombie user actually use? It won't be 5GB. Only GitLab knows for sure, but there are clues. When the Windows source was moved to a Git repo for the first time, it was 3.5 million files supported by 4,000 engineers, and that came to a 300GB repo. That's 75MB apiece, or under 100MB per user. Repos grow rapidly as old data is kept, up to a point, but that's amenable to cleaning. At Windows levels of repo use, rounded up to 100MB a head, 29 million zombies fill 2.9 petabytes, and that's a $26,000 annual zombie tax.
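For anyone who wants to check the sums, here is a rough back-of-envelope sketch of the two estimates above. It uses only the figures already quoted (the $45,000 five-year archive price, 29 million inactive users, the 300GB Windows repo and its 4,000 engineers) and assumes decimal units, so a petabyte is a billion megabytes. The function and variable names are made up for illustration, and real hosting costs will not reduce to a single archive price.

```python
# Back-of-envelope check of the storage estimates quoted in the article.
# Assumption: decimal units (1 PB = 1,000,000,000 MB).

ARCHIVE_COST_PER_PB_5YR = 45_000  # dollars for five years, as quoted
INACTIVE_USERS = 29_000_000       # approximate figure, as quoted
MB_PER_PB = 1_000_000_000


def annual_cost(users, mb_per_user):
    """Yearly archive cost in dollars if every user stored mb_per_user."""
    petabytes = users * mb_per_user / MB_PER_PB
    return petabytes * ARCHIVE_COST_PER_PB_5YR / 5


# Worst case: every inactive user fills the whole 5GB free allowance.
worst_case = annual_cost(INACTIVE_USERS, 5_000)

# Windows-style usage: a 300GB repo shared by 4,000 engineers.
windows_mb_each = 300 * 1_000 / 4_000

# The article rounds that up to 100MB per user for its estimate.
rounded_up = annual_cost(INACTIVE_USERS, 100)
unrounded = annual_cost(INACTIVE_USERS, windows_mb_each)

print(f"Full 5GB per user:  ${worst_case:,.0f} a year")
print(f"Windows usage:      {windows_mb_each:.0f}MB per engineer")
print(f"At 100MB per user:  ${rounded_up:,.0f} a year")
print(f"At {windows_mb_each:.0f}MB per user:   ${unrounded:,.0f} a year")
```

The gap between the first and third lines of output is the point: the bill depends almost entirely on how much the average zombie actually stores, which is the number only GitLab has.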

Nobody can deny that GitLab has a zombie problem. It's not clear how big it really is, nor whether GitLab's proposed solution is optimal, especially given the nature of open source. But if the tyranny of numbers can work against you, it can work for you. What if the free tier were contingent on offering 10GB of your local storage to the community, with the resultant aggregated free tier storage managed by GitLab as the hosting system?

Would that even work? That turns out to be a set of really interesting questions which could turn a lot of freemium models on their head. 

It's not GitLab's job to develop startlingly innovative solutions to basic problems. But it is GitLab's job to be a good open source citizen, and that means not brooding in secret about problems and concocting The Answer. Go to the community. Lay out the facts. Say what solutions look plausible to you, then ask for input. Better believe you'll be having that conversation whether you like it or not, so own it from day one. 

It goes against the grain for commercial companies to present problems rather than solutions, but the whole idea of open source is community problem solving, along with a greater willingness to accept the truth than is the commercial norm. Sometimes, this means accepting that there's no such thing as a free lunch. Sometimes, this means compromise. Sometimes, this means a brilliant new idea.

We won't know, and GitLab won't know, if we don't try and find out. ®
