GitHub has tried to reassure users that it is targeting zero downtime with the help of new data centres and infrastructure software – some of which is being open-sourced.
“The fundamentals of GitHub is it’s there when you need it. GitHub needs to be as reliable as a light switch or a dial tone,” chief executive Chris Wanstrath told The Reg.
“Our goal is no outages. It’s hard to get to 99.999 but that’s what we are shooting for. Everything we are talking about falls apart if those things aren’t guaranteed.”
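For a sense of scale, the “99.999” figure can be turned into a yearly downtime budget. The snippet below is a back-of-the-envelope sketch rather than anything GitHub has published: it assumes a 365-day year, and the function name is purely illustrative.

```python
# Rough downtime budget for a given availability target.
# Assumption: a 365-day year; the function name is illustrative only.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def allowed_downtime_seconds(availability_percent: float) -> float:
    """Return the seconds of downtime per year that a target permits."""
    return SECONDS_PER_YEAR * (1 - availability_percent / 100)


if __name__ == "__main__":
    for target in (99.9, 99.99, 99.999):
        minutes = allowed_downtime_seconds(target) / 60
        print(f"{target}% uptime allows about {minutes:.1f} minutes down per year")
```

On that arithmetic, five nines leaves a little over five minutes of downtime a year.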
Wanstrath spoke at last week’s GitHub Satellite conference in London’s Docklands.
GitHub boasts 22 million developers and more than 59 million repos.
Founded in 2008, GitHub is regarded as the world’s largest version-control repository service, and its success has seen off code-hosting projects from some big names in tech – for example, Google Code and Microsoft’s CodePlex, the latter of which is shutting down this year.
But the service, ironically, has been dogged by ongoing and niggling slowdowns and outages, much to the ire of developers who depend on the service.
“I’ll admit we’ve already had scaling problems and some years are great and some are less great, but it’s not that we are reacting to the fact there may have been an outage or not an outage. We are going to be constantly pushing towards no outages, towards reliability and GitHub feeling fast no matter where you are in the world,” Wanstrath said.
GitHub has received $350m in venture investment in three rounds from Andreessen Horowitz and Sequoia Capital. At least some of that has gone on servers – building and buying data centres around the world.
The company also re-architected the underlying Git hardware foundations with Spokes, replacing a master-servant setup built on DRBD, RAID and a floating IP with a three-phase commit running on what GitHub calls “non-enterprise” storage.
Spokes uses a three-phase commit to write to Git, which gives you not just three copies of your data but also widely dispersed storage to avoid any local problems.
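Spokes itself is closed-source, so its internals are not public. As a rough illustration of the general idea only, here is a toy three-phase commit across three replicas. Everything in it is an assumption made for the example: the `Replica` class, its method names and the rule that every replica must agree are hypothetical and are not GitHub's design.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical storage node holding a copy of a repository's refs."""
    name: str
    healthy: bool = True
    staged: dict = field(default_factory=dict)  # updates awaiting commit
    refs: dict = field(default_factory=dict)    # committed ref -> SHA

    def can_commit(self, ref: str, new_sha: str) -> bool:
        # Phase 1: vote on whether this replica could accept the update.
        return self.healthy

    def pre_commit(self, ref: str, new_sha: str) -> bool:
        # Phase 2: stage the update without making it visible yet.
        if not self.healthy:
            return False
        self.staged[ref] = new_sha
        return True

    def do_commit(self, ref: str) -> None:
        # Phase 3: make the staged update the committed value.
        self.refs[ref] = self.staged.pop(ref)

    def abort(self, ref: str) -> None:
        # Discard any staged update for this ref.
        self.staged.pop(ref, None)


def three_phase_commit(replicas: list, ref: str, new_sha: str) -> bool:
    """Apply a ref update on every replica, or on none of them."""
    # Phase 1: every replica must vote yes before anything is staged.
    if not all(r.can_commit(ref, new_sha) for r in replicas):
        return False
    # Phase 2: stage everywhere; roll back if any replica fails to stage.
    if not all(r.pre_commit(ref, new_sha) for r in replicas):
        for r in replicas:
            r.abort(ref)
        return False
    # Phase 3: all replicas staged the update, so commit it on each.
    for r in replicas:
        r.do_commit(ref)
    return True
```

A real system also has to cope with timeouts, coordinator failure and replicas that disappear mid-protocol, none of which this sketch attempts.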
GitHub reckons it has a number of patents on the closed-source Spokes.
Sam Lambert, GitHub’s senior director for infrastructure engineering, promised that more infrastructure code is coming, and that this time it would be open-sourced.
Projects already open-sourced by GitHub include OctoDNS, for keeping DNS records in sync across different providers, gh-ost for online schema migrations, and GitHub Load Balancer (GLB).
Within the next six months, two more infrastructure projects at GitHub will be open-sourced, Lambert said, one being for database technologies “solving problems I’m sure everybody is having”.
GitHub builds much of its own infrastructure.
Lambert said GitHub is open-sourcing code to showcase what it’s building, attract talent, influence best practices and crowd-source problem solving.
“The great thing about infrastructure is you can open-source stuff and you are not giving away the secret sauce. Why not open-source it and let people learn from that,” Lambert said.
Wanstrath reckoned GitHub, which hosts its own Git servers, has had to build its own fixes and infrastructure because of the unique nature of its architecture.
“We don’t have database sharding problems – our data problems are on the Git side, the terabytes of data we store,” he told The Reg.
“There’s not a lot of prior art – we can’t grab a MySQL cluster solution and apply that to Git. A lot of the problems we’ve had we had to build our own,” said Wanstrath.