GitHub goes off the Rails as Microsoft closes in
Ruby shop turns to Go, Java, and Kubernetes for platform makeover
Analysis GitHub invited a handful of journalists to its San Francisco headquarters to explain how the social code hosting biz is evolving from a website into a platform.
The event was hosted by Sam Lambert, whose title – head of platform – removes any doubt about how things will turn out.
Founded in 2008, GitHub became a platform following the introduction of its Developer Program in 2014. The program, which is to say the associated API, allows developers to build applications that integrate with GitHub. It has turned GitHub, which now has something like 28 million users, into a lynchpin for automated code deployment.
Last year's debut of the GitHub Marketplace, which offers third-party tools for enhancing developer workflows, made the platform friendly for those selling software to augment GitHub operations. And this year, Microsoft App Center and Google Container Builder showed up as Marketplace apps, signalling that GitHub isn't just a storefront for startups but a cog in the code deployment machines of major tech firms.
"We're working a lot more with major partners to have them start to use our primitives to bring workflows to our users," said Lambert, teasing significant announcements at GitHub Universe in October that will broaden GitHub's value as a platform for developers.
Simultaneously, GitHub is a social network, one that has largely avoided the content moderation and data grabbing controversies of consumer-oriented social networks like Facebook and Twitter. The difference can be seen in the fact that Facebook and Twitter are banned in mainland China. GitHub, although it has had past run-ins with Chinese authorities, remains accessible because Chinese developers depend on it.
Lambert said the company does not have infrastructure in China and hasn't been required to store Chinese user data locally. "It's not an issue," he said, insisting that the company has a great relationship with users, companies and the government in China, thanks to the appeal of open source software.
"GitHub gets great treatment from pretty much everywhere due to how important we are to most companies' digital economies," he said.
That popularity and importance to software-focused businesses explain why Microsoft is in the process of acquiring GitHub for $7.5 billion, a deal expected to close before the end of the year.
Breaking up is hard to do
GitHub's platform group is about 155 people at the moment and growing, said Lambert. And much of the group's focus is on breaking GitHub apart.
GitHub is about a third of the way through an architectural change that began last year. The company is moving away from Ruby on Rails toward a more heterogeneous, composable infrastructure. Ruby still has a place at GitHub – Lambert referred to the company as a Ruby shop – but he said there's more Go, Java and even some Haskell being deployed for services. The goal, he explained, is to make GitHub's internal capabilities accessible to integrators and partners.
"Our monolith is starting to break up and we're starting to abstract things into services," said Lambert. "The platform we've chosen to put them on is Kubernetes."
Lambert said Kubernetes hasn't resulted in fewer outages. "It's not a more reliable way of running servers," he said. "It's a better interface and a more consolidated way for us to run services. It's allowed us to have more multitenancy across our stack. It has allowed our internal users to have a more streamlined development environment."
Internally, GitHub uses home-grown software called Moda to abstract Kubernetes and make it easier to manage. It's not yet an open source project, but Lambert says it will be, along with other applications like a Kubernetes node healer and the Kubernetes connector for GitHub's recently released datacenter load balancer, GLB Director.
"Moda is really awesome," explained Lambert. "You say to a chatbot, a Hubot, 'I want a new application,' and Moda just bootstraps a repository, puts all the Kubernetes config in there...and you can just start developing really, really quickly."
GitHub is a cloud service but it isn't a big user of cloud services. It relies on the public cloud for burst capacity when needed, and for some non-production workloads, but mostly it handles the load through its own data centers.
"We took this strategy of building data centers outside of major cities by having infrastructure inside the carrier hotels that just handles networking," said Lambert. "Then we run our own dark fiber out to these spoke data centers that have loads and loads of machines."
This arrangement is about 70 per cent cheaper than hosting inside data centers in major metro areas, he explained, at a latency cost of about 1-2ms. Using GitHub's own data centers, he said, is about ten times cheaper because the workloads tend to be I/O and storage intensive, which isn't great for the cloud.
Lambert said it's premature to think about adopting Microsoft Azure, adding that it's unlikely because the company operates in such a bespoke way. "Everything we do is on the extreme edge," he said, pointing to teams dedicated to kernel modification and designing custom hardware racks for its data centers.
"We don't try to sign ourselves up for that sort of lock-in," he said in reference to cloud services. "We prefer to get in the guts of everything we run in our own way."
DIY servers? Maybe some day
GitHub doesn't go so far as designing its own servers. Lambert sees that as the province of firms with millions of servers like Facebook and Google.
"You have to be hyperscale for that to be pragmatic," he said. But GitHub does have its own server rack configuration that can be ordered from a supplier like Dell, custom fit with switches and cables, and deployed in a data center in a matter of hours.
Even so, he acknowledged that some workloads, like AI, make sense to run in the cloud because the hardware – GPUs, TPUs, etc – is changing so fast. "I want the dust to settle a bit on GPU technology before we start making massive buys," he said.
GitHub also uses the cloud to route traffic via VPoPs, virtual points of presence.
"We actually will use a public cloud's network infrastructure to stand up a virtual PoP," he explained.
GitHub is doing that in three major European cities with large telco presences. Through data gathered about network response times, the company figured out, for example, that it can serve India better from Europe than it could by putting a PoP in India.
GitHub's data center strategy also helped it weather what Lambert claims is the largest DDoS attack on record. On February 28, the site was hit by an online attack that peaked at 1.35Tbps. The rather insignificant consequence was that about 25 per cent of users had difficulty getting to the site for about three minutes.
Lambert attributed GitHub's resilience to its history of being tested by hackers. "It's extremely rare for anything big enough to wake up our engineers," he said, adding that he has some idea who is responsible but cannot comment further because the investigation is ongoing.
AI? OMG! BFF!
With regard to AI, Lambert says he believes the hype that it will change the world, but over the course of decades. For GitHub, he sees the benefits coming sooner because code is, as he describes it, "a scheme of how human beings solve problems." GitHub's data set, he said, is really easily adapted to machine learning and AI applications.
Later this year, he said, GitHub plans to start talking about how it's exploring AI. For example, he said GitHub has natural language code search working internally. It allows developers to search codebases using natural language. "You can just type in, 'Show me code that connects to the database,' and it will return code that does that," he said.
The company also has prototypes of bots that automatically write code. If the software works as suggested, GitHub's current security vulnerability warnings could become pull requests to repair bad code instead of notifications to fix flaws yourself. The cloud code repo has also been testing sentiment analysis to identify helpful, friendly community members.
"We know from studies that software developers spend 50 per cent of their time on environmental tasks," he said, referring to the setup, configuration and maintenance that tends to get in the way of writing new code. "We'd like to take away some of that toil and burden."
What Lambert is advocating is less reinvention of the wheel and more opportunity to create. He pointed to companies like Glitch and Zeit as examples of where he sees software development headed.
"Glitch is really awesome," he said. "It's not so much the implementation as what it tells us about what people want. They want an interactive, easy way to just drop in and do stuff and get an immediate feedback loop. … We want to get that feeling back into the hands of developers."
Lambert argues, as have others in the Silicon Valley area, that the ability to write code is empowering and that more needs to be done to make software development accessible to a broader set of people.
"There used to be this pride in being super technical and getting into the weeds," he said. "That's kind of not cool anymore. What's cool is getting stuff to your users." ®