GitHub's journey towards microservices and more: 'We actually have our own version of Ruby that we maintain'

The Reg talks to Software Engineering veep Sha Ma


Interview GitHub has described efforts to break down its monolithic application architecture into microservices – and revealed that it still runs some services on AWS, even after the 2018 acquisition by Microsoft.

Sha Ma, VP of Software Engineering at GitHub, spoke on the subject at the November Qcon Plus virtual developer event and spent some time with us afterwards.

The online code shack is among the world's busiest sites, used by over 50 million developers and hosting over 100 million repositories.

GitHub VP of Software Engineering Sha Ma speaking at the Qcon Plus virtual developer event

GitHub was first built in 2008 using the Ruby on Rails web application framework. "GitHub's architecture is deeply rooted in Ruby on Rails," said Ma, adding that "a monolithic architecture got us pretty far," including multiple code deploys every day and high scale, "serving over a billion API calls daily."

The scale of the site disproves claims that Ruby on Rails, or a monolithic architecture, cannot scale. Why then is GitHub now migrating?

The decision is quite recent, Ma told us. "It really started at the beginning of this year. We acquired so many companies, like we've acquired Semmle which is based in Oxford, their primary stuff is in C and Python. And we've acquired Dependabot, NPM which is package management in JavaScript, and Pull Panda. Internally, we've also merged a few of the sister teams that were within Microsoft, so a lot of folks from Azure DevOps are now part of our team, and they are used to working in anything from C# to TypeScript.

"All that diversity which joined GitHub, which used to be just a Ruby on Rails shop, prompted us to think: how do we enable developers that have brought diversity in tech stack and skill set to be productive working together? That made us realise that the monolith as a sole development option is no longer viable."

Does that mean GitHub is migrating away from Ruby?

"Our strategy is not a complete replacement," said Ma. "The founders of GitHub were very deeply rooted in the Ruby community. They were contributors." GitHub has also hired leading Ruby developers over the years. "We actually have our own version of Ruby that we maintain," she told us.

"When things work well with GitHub then we contribute back into the Ruby open source... There are things we've done with the Ruby code base that are highly custom to make GitHub as performant as possible. We know we're never going to get away from that completely, and a lot of people are still very productive in that code base. For us it's going to be a hybrid environment for the foreseeable future."

On occasion, performance-critical code is written in other languages. "When we extracted authorization, we ended up rewriting that service outside the monolith, in Go," she said.

What about MySQL?

While MySQL has performed well overall, it has also been identified as an issue in some of the company's outage reports. Has GitHub considered migrating to a different database manager?

"Similar to Ruby, we have world-renowned DBAs [database administrators] that have scaled a lot of systems. For the foreseeable future we're going to remain with MySQL just because we have a lot of expertise there," Ma said.

Further, the process of splitting out functional groups into microservices will also enable some breaking apart of the data. "Even as a monolith, we've been able to scale," Ma told us. "We feel that our MySQL solution still has quite a bit of runway for us, both from a performance perspective and in terms of how much we can store."

How has the Microsoft acquisition influenced GitHub's infrastructure? Before the acquisition, GitHub was largely hosted in its own data centres. Is that moving to Azure?

"We're exploring things potentially to move," Ma said. "We actually still have things hosted on AWS. For example, a lot of our data analytics is on AWS and we've started a project to look at migration into Azure, especially since we get internal pricing which is more favourable for us. But a large part, I would say 80-90 per cent of our stuff is hosted in data centres that we physically maintain."

Migration steps and pitfalls

At Qcon, Ma explained some of the work the company is doing to enable its migration. "Good architecture starts with modularity," she said. "The first step towards breaking up a monolith is to think about the separation of code and data based on feature functionality. This can be done within the monolith before physically separating them in a microservices environment."

She also described some of the pitfalls. "I've seen a lot of cases where people start by pulling out the code logic, but still rely on calls into a shared database inside the monolith. This often leads to a distributed monolith which ends up being the worst of both worlds, having to manage the complexities of microservices without any of the benefits."

Ma explained that "It's important to keep in mind that dependency direction should always go from inside of the monolith to outside of the monolith and not the other way around."
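The dependency rule Ma describes can be checked mechanically. What follows is a minimal sketch, not GitHub's tooling: the component names and the dependency map are hypothetical, invented purely for illustration. It flags any extracted service that calls back into the monolith.

```python
# Hypothetical component names; the article does not describe GitHub's real layout.
MONOLITH = {"monolith.web", "monolith.repos"}

# caller -> set of components it calls (illustrative data only)
DEPENDENCIES = {
    "monolith.web": {"monolith.repos", "service.authn"},
    "monolith.repos": {"service.authz"},
    "service.authn": set(),
    "service.authz": {"monolith.repos"},  # an extracted service reaching back in
}


def backward_dependencies(deps, monolith):
    """Return sorted (caller, callee) pairs where a component outside the
    monolith depends on a component inside it."""
    return sorted(
        (caller, callee)
        for caller, callees in deps.items()
        if caller not in monolith
        for callee in callees
        if callee in monolith
    )


violations = backward_dependencies(DEPENDENCIES, MONOLITH)
print(violations)
```

Calls from the monolith out to a service pass the check; only edges pointing the other way are reported.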

"Getting data separation right is a cornerstone in migrating from a monolithic architecture," Ma said. "For example, we grouped everything related to repositories together, everything related to users together, and everything related to projects together... creating functional groups of database schemas will eventually help us safely split the data onto different servers and clusters needed for a microservices architecture."

This process means fixing database queries that cross these domain boundaries. "At GitHub we implemented a query watcher in the monolith to alert us any time a query crosses functional domains. We would then rewrite these queries into multiple queries that respect the domain boundaries and perform any necessary joins at the application layer," said Ma.
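To make the idea concrete, here is a rough sketch of what such a watcher and rewrite might look like. It is an assumption-laden illustration rather than GitHub's implementation: the table names, the table-to-domain mapping, and the regex-based table detection are all hypothetical, and a real watcher would need a proper SQL parser. It uses Python's built-in sqlite3 module instead of MySQL so that it runs standalone.

```python
import re
import sqlite3

# Hypothetical mapping of tables to functional domains.
TABLE_DOMAIN = {
    "users": "users",
    "repositories": "repositories",
}

# Crude heuristic: treat the identifier after FROM or JOIN as a table name.
TABLE_REF = re.compile(r"\b(?:FROM|JOIN)\s+([A-Za-z_][A-Za-z0-9_]*)", re.IGNORECASE)


def domains_touched(sql):
    """Return the set of functional domains whose tables the query references."""
    tables = (name.lower() for name in TABLE_REF.findall(sql))
    return {TABLE_DOMAIN[t] for t in tables if t in TABLE_DOMAIN}


def crosses_domains(sql):
    """True when a query references tables from more than one domain."""
    return len(domains_touched(sql)) > 1


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, login TEXT);
    CREATE TABLE repositories (id INTEGER PRIMARY KEY, owner_id INTEGER, name TEXT);
    INSERT INTO users VALUES (1, 'octocat'), (2, 'hubot');
    INSERT INTO repositories VALUES (10, 1, 'hello-world'), (11, 2, 'scripts');
""")

# A query that joins across the repositories and users domains.
cross_query = (
    "SELECT r.name, u.login FROM repositories r "
    "JOIN users u ON u.id = r.owner_id ORDER BY r.id"
)
flagged = crosses_domains(cross_query)  # a watcher would raise an alert here

# Rewrite: one query per domain, with the join done in application code.
repo_query = "SELECT id, owner_id, name FROM repositories ORDER BY id"
repos = conn.execute(repo_query).fetchall()
owner_ids = sorted({owner_id for _, owner_id, _ in repos})
placeholders = ",".join("?" for _ in owner_ids)
user_query = f"SELECT id, login FROM users WHERE id IN ({placeholders})"
logins = dict(conn.execute(user_query, owner_ids).fetchall())
joined = [(name, logins[owner_id]) for _, owner_id, name in repos]

print(flagged, joined)
```

The two rewritten queries each stay within a single domain, so either table could later move to a different server or cluster without changing the calling code beyond its connection.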

How far along is GitHub in its migration towards cloud native? "Not very far," Ma told us. "We're in the very early stages, I think because there is a lot of knowledge accumulated over the years including how to fine-tune Ruby and how to fine-tune MySQL to make the site as performant as it is today. Even if we do explore cloud native solutions, it will probably be newer services that are not core to GitHub itself, like Actions, or even Projects as it's getting rebuilt."

Does GitHub use Kubernetes to orchestrate containers? "We're very much using Kubernetes. In order to support multi-language variations and new services that are being created, a year and a half ago we started templatizing a lot of things that are common across multiple teams, so we have what we call microservices in a box, that has Kubernetes templates. We know that every service needs logging so we automatically log into Splunk. We know every service will need to be deployed so there is automatic deployment into our existing deployment process. So people can get up and running quickly on the operational side of things."

During the migration period, does GitHub add code to the monolith at the same time as writing new microservices? "Yes, absolutely," said Ma. "Because our strategy is enablement and not replacement, the code in the monolith needs to be maintained and improved and we're still doing that." The idea though is that when a microservice is ready, it should be used 100 per cent in place of existing code so "you don't have to maintain multiple versions inside and outside the monolith."

Extracting authentication is a priority, because of its role in letting developers continue to work if the website is down. "If GitHub is down, people can't actually perform any Git operations, and that's problematic," Ma told us. "People should still be able to work through command-line interfaces. That's why we're making it a priority to extract authentication as a core service outside the monolith, for enablement, so now people can use more of our systems when the web front end is not available."

Is there a target for when GitHub will be able to say it has a microservices architecture? "I would say years," Ma told us. "This shift is not just an architecture decision. It is also a cultural shift … I think eventually the gravitational pull will shift towards all the new services being built as microservices, and that a lot of the existing services will have been rebuilt and refactored out of the monolith, but for the foreseeable future we will still be operating at least a set of core services that will be part of the monolith."

It is a pragmatic approach. "Microservices is not your solution to technical debt and bad architecture," Ma told us. "I think there's been a trend of people who went down the microservices path and are now going back into monolith because microservices became too unwieldy for them. Microservices doesn't replace good architecture. Going through things like, what should be grouped together? How should we look for things that cross domain boundaries? How should we set up teams and on-call? pushed us towards better architectural practices that benefit us both in the monolithic and microservice world. A lot of the preparatory work we're doing, we're actually doing in the monolith before extracting it." ®


Biting the hand that feeds IT © 1998–2022