How does Monzo keep 1,600 microservices spinning? Go, clean code, and a strong team

Well-known software development principles count for more than technology choices

QCon London: Software engineers from digital bank Monzo told developers at the QCon event in London how and why the company runs its banking systems on 1,600 microservices.

Monzo's session at QCon was in stark contrast to Monday's presentation in which Sam Newman warned that a microservices architecture is a "last resort". Senior engineers Matt Heath and Suhail Patel described how microservices work well for the bank, founded in 2015 and now with over 4 million customers.

As a new business that hoped to grow substantially, Monzo had a requirement for a technology platform that was extensible, scalable, resilient and secure. The idea was to start with a few basic banking services, and then be able to add more as time and resources allowed.

Monzo was convinced early on of the value of a distributed system. The bank did not want a single, big resilient system with a complex failover that you hope never has to run.

"If you don't exercise those failover modes, how can you know that they work reliably?" said Heath. The team started with Mesos for cluster management, but by 2016 had switched to Kubernetes (K8s) as the "emerging market leader".

One of the goals was to abstract the complexity of infrastructure. "We think all the complexities about scaling of infrastructure, making sure that services are provisioned and databases are available, should be dealt with by a specific team, so that engineers can focus on the product," said Patel. The systems run on Amazon Web Services (AWS).

Matt Heath and Suhail Patel present at QCon London

K8s has not been completely pain-free. In 2017, Monzo "had quite a large outage, because a problem with K8s and how it interacted with etcd and linkerd, due to a combination of different bugs that were quite hard to test," said Heath.

Monzo picked Cassandra as the database because it scales horizontally (meaning you can simply add more hardware to scale, rather than having to migrate to a bigger system).

On the coding side: "We use Go as our primary programming language," said Heath. "It's quite simple, it's statically typed, and it makes it easy for us to get people on board." Go has a backwards-compatibility guarantee, meaning that when a new version appears, you can simply recompile existing code, getting the benefits of updates to features such as the garbage collector. Go is also well suited to strict policies about error handling.

"We have static analysis to make sure that you are not papering over errors," said Patel.

Banking systems are well suited to a modular approach. There is a requirement to link to many different systems such as BACS, CHAPS, Visa, Mastercard, Apple Pay and Google Pay. "Adding those things as separate systems allows us to keep them simpler," said Heath. Monzo builds integrations in-house as far as possible, rather than using third-party implementations, to get more control over resilience and performance (and probably to save money in the long term as well). They even built their own chat system, used internally and for support.

Monzo has also built its own tools for interacting with AWS and K8s, such as one called Shipper, which can deploy or roll back an individual service. Shipper can deploy directly from a pull request, which represents an update to code maintained in a Git repository.

Each Monzo microservice runs in a Docker container. "One of our biggest decisions was our approach to writing microservices," said Patel. There is a shared core library, which is available in every service; this is essentially copied into every container, though the build process will strip out unused code. This means that "engineers are not rewriting core abstractions like marshalling of data". It also enables metrics for every service so that after deployment it immediately shows up in a dashboard with analysis of CPU usage, network calls and so on. Automated alerting will identify degraded services.
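The idea of a shared library that handles marshalling and metrics once, for every service, can be sketched roughly as follows. This is an illustration only, not Monzo's library: `Handler`, `Metrics`, `Typed` and the balance types are hypothetical names, JSON is assumed as the wire format, and the sketch needs Go 1.18 or later for generics.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sync"
)

// Handler is a hypothetical endpoint: raw request bytes in, raw response bytes out.
type Handler func(raw []byte) ([]byte, error)

// Metrics counts calls and failures per endpoint name.
type Metrics struct {
	mu       sync.Mutex
	calls    map[string]int
	failures map[string]int
}

func NewMetrics() *Metrics {
	return &Metrics{calls: map[string]int{}, failures: map[string]int{}}
}

func (m *Metrics) record(name string, err error) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.calls[name]++
	if err != nil {
		m.failures[name]++
	}
}

// Snapshot returns the call and failure counts recorded for one endpoint.
func (m *Metrics) Snapshot(name string) (calls, failures int) {
	m.mu.Lock()
	defer m.mu.Unlock()
	return m.calls[name], m.failures[name]
}

// Typed wraps a typed function so that decoding, encoding and metrics are
// written once in the shared library rather than in each service.
func Typed[Req, Resp any](m *Metrics, name string, fn func(Req) (Resp, error)) Handler {
	return func(raw []byte) ([]byte, error) {
		var req Req
		var out []byte
		err := json.Unmarshal(raw, &req)
		if err == nil {
			var resp Resp
			if resp, err = fn(req); err == nil {
				out, err = json.Marshal(resp)
			}
		}
		m.record(name, err) // every call is counted, whether it succeeded or failed
		return out, err
	}
}

// Hypothetical request and response types for a balance lookup.
type BalanceRequest struct {
	AccountID string `json:"account_id"`
}

type BalanceResponse struct {
	Pence int64 `json:"pence"`
}

func main() {
	m := NewMetrics()
	h := Typed(m, "balance.read", func(r BalanceRequest) (BalanceResponse, error) {
		return BalanceResponse{Pence: 1250}, nil
	})
	out, err := h([]byte(`{"account_id":"acc_123"}`))
	fmt.Println(string(out), err)
	fmt.Println(m.Snapshot("balance.read"))
}
```

The service author writes only the typed function; a new endpoint is measured from its first call without any extra code.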

Monzo has an extensive shared library available within every microservice

A lot of thought goes into the interface or API that each service exposes. The team favours writing many small services, each dedicated to a single purpose, rather than fewer, more complex services. "Why do we have such granularity? We want to minimise the risk of change," said Patel. "For example, if we want to change the way contactless payments work, we're not affecting the chip and PIN system."

How do developers work on their code, given that running 1,600 microservices is not going to work on a laptop? "You are running a subset," said Heath. "We have an RPC filter that can detect you are trying to send a request to a downstream that isn't currently running, it can compile it, start it, and then send the request to it."

Why do microservices work for Monzo, whereas in some cases they add complexity without delivering much benefit? The Register has attended many QCon events, and while software development trends have changed from year to year, some things have remained consistent. One is that the way developers interact with each other in a team (and with management) counts for more than whatever development methodology they espouse. Another is that an incremental approach wins over occasional large changes. "An iterative process is generally what we take to heart at Monzo," said Heath, "both from an infrastructure perspective but also from a product perspective. By making small changes frequently we make sure we are going in the right direction."

Another recurrent QCon theme is the advantage of simplicity over complexity. Taken as a whole, what Monzo's system does is highly complex, but it has designed its systems in such a way as to divide that complexity into smaller, simpler pieces, and abstract it away from developers working on the code. "You don't need to know how 1,600 services work," said Heath. The hard task of managing K8s is delegated to specialists.

Monzo has also standardised "a small set of technology choices", said Heath, so that "as a group, we can collectively improve those tools". This could be frustrating for developers who have different technology preferences, but must help substantially with collaboration since everyone learns the same set of tools. "Code needs to be readable to other humans," said Patel. "We optimise code for readability. One of our engineering principles is not to optimise [performance] unless it is a bottleneck."

What Monzo presented at QCon seemed to be a strong template for software development and deployment in the case where you have a complex system with many components, and need to be able to respond quickly when requirements change or features are added.

Heath and Patel made a great case for the value of microservices. Note, though, that Monzo uses a lot of custom, in-house tools and libraries that are not easy to replicate. Further, many of the principles they presented – like writing clean, readable and disciplined code, focusing on a few carefully chosen pieces of technology, and taking an incremental approach – are winners in any software architecture. ®

Biting the hand that feeds IT © 1998–2022