How does Monzo keep 1,600 microservices spinning? Go, clean code, and a strong team
Well-known software development principles count for more than technology choices
QCon London Software engineers from digital bank Monzo told developers at the QCon event in London how and why it runs its banking systems on 1,600 microservices.
Monzo's session at QCon was in stark contrast to Monday's presentation in which Sam Newman warned that a microservices architecture is a "last resort". Senior engineers Matt Heath and Suhail Patel described how microservices work well for the bank, founded in 2015 and now with over 4 million customers.
As a new business that hoped to grow substantially, Monzo had a requirement for a technology platform that was extensible, scalable, resilient and secure. The idea was to start with a few basic banking services, and then be able to add more as time and resources allowed.
Monzo was convinced early on of the value of a distributed system. The bank did not want a single, big resilient system with a complex failover that you hope never has to run.
"If you don't exercise those failover modes, how can you know that they work reliably?" said Heath. They started with Mesos for cluster management, but by 2016 they switched to Kubernetes (K8s) as the "emerging market leader".
One of the goals was to abstract the complexity of infrastructure. "We think all the complexities about scaling of infrastructure, making sure that services are provisioned and databases are available, should be dealt with by a specific team, so that engineers can focus on the product," said Patel. The systems run on Amazon Web Services (AWS).
K8s has not been completely pain-free. In 2017, Monzo "had quite a large outage, because [of] a problem with K8s and how it interacted with etcd and linkerd, due to a combination of different bugs that were quite hard to test," said Heath.
Monzo picked Cassandra as the database because it scales horizontally (meaning you can simply add more hardware to scale, rather than having to migrate to a bigger system).
On the coding side: "We use Go as our primary programming language," said Heath. "It's quite simple, it's statically typed, and it makes it easy for us to get people on board." Go has a backwards-compatibility guarantee, meaning that when a new version appears, you can simply recompile existing code, getting the benefits of updates to features such as the garbage collector. Go is also well suited to strict policies about error handling.
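To make the error-handling point concrete, here is a minimal sketch of the style Go encourages. It is not Monzo's code: the account store, the `lookupBalance` and `canSpend` functions and the `ErrAccountNotFound` value are all invented for illustration. Every call that can fail returns an `error`, and the caller checks it and adds context rather than discarding it. Publicly available linters such as errcheck can flag calls whose error result is dropped.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrAccountNotFound is a hypothetical sentinel error for this sketch.
var ErrAccountNotFound = errors.New("account not found")

// balances stands in for a real data store (illustrative data only).
var balances = map[string]int64{"acc_1": 1250}

// lookupBalance returns the balance in pence, or an error the caller must handle.
func lookupBalance(accountID string) (int64, error) {
	b, ok := balances[accountID]
	if !ok {
		return 0, ErrAccountNotFound
	}
	return b, nil
}

// canSpend checks the error instead of discarding it with `_`.
func canSpend(accountID string, amount int64) (bool, error) {
	balance, err := lookupBalance(accountID)
	if err != nil {
		// Wrap with context; %w keeps the cause matchable via errors.Is.
		return false, fmt.Errorf("checking spend on %q: %w", accountID, err)
	}
	return balance >= amount, nil
}

func main() {
	ok, err := canSpend("acc_1", 500)
	fmt.Println(ok, err)

	_, err = canSpend("acc_missing", 500)
	fmt.Println(errors.Is(err, ErrAccountNotFound), err)
}
```

Writing `balance, _ := lookupBalance(id)` would compile, but it would treat a missing account as a zero balance, which is the kind of papered-over error a static check is meant to catch.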
"We have static analysis to make sure that you are not papering over errors," said Patel.
Banking systems are well suited to a modular approach. There is a requirement to link to many different systems such as BACS, CHAPS, Visa, Mastercard, Apple Pay and Google Pay. "Adding those things as separate systems allows us to keep them simpler," said Heath. Monzo builds integrations as much as possible in-house, rather than using third-party implementations, to get more control over resilience and performance (and likely to save money in the long term as well). They even built their own chat system, used internally and for support.
Monzo has also built its own tools for interacting with AWS and K8s, such as one called Shipper, which can deploy or roll back an individual service. Shipper can deploy directly from a pull request, which represents an update to code maintained in a Git repository.
Each Monzo microservice runs in a Docker container. "One of our biggest decisions was our approach to writing microservices," said Patel. There is a shared core library, which is available in every service; this is essentially copied in every container, though the build process will strip out unused code. This means that "engineers are not rewriting core abstractions like marshalling of data". It also enables metrics for every service so that after deployment it immediately shows up in a dashboard with analysis of CPU usage, network calls and so on. Automated alerting will identify degraded services.
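How a shared library can give every service metrics "for free" is easiest to see in code. The sketch below is an assumption about the general pattern, not Monzo's library: the `Handler` type, the `Metrics` struct, its `Wrap` and `Snapshot` methods and the `echoHandler` endpoint are hypothetical names. The idea is that the common library wraps each endpoint once, so service authors never write counting code themselves.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Handler is a hypothetical signature for a service endpoint.
type Handler func(request string) (string, error)

// Metrics counts calls and failures per endpoint name.
type Metrics struct {
	mu       sync.Mutex
	calls    map[string]int
	failures map[string]int
}

// NewMetrics returns an empty, ready-to-use Metrics.
func NewMetrics() *Metrics {
	return &Metrics{calls: map[string]int{}, failures: map[string]int{}}
}

// Wrap returns a handler that delegates to h and then records the outcome.
func (m *Metrics) Wrap(name string, h Handler) Handler {
	return func(req string) (string, error) {
		resp, err := h(req)
		m.mu.Lock()
		defer m.mu.Unlock()
		m.calls[name]++
		if err != nil {
			m.failures[name]++
		}
		return resp, err
	}
}

// Snapshot returns the counters recorded so far for one endpoint.
func (m *Metrics) Snapshot(name string) (calls, failures int) {
	m.mu.Lock()
	defer m.mu.Unlock()
	return m.calls[name], m.failures[name]
}

// echoHandler is a toy endpoint that rejects empty requests.
func echoHandler(req string) (string, error) {
	if req == "" {
		return "", errors.New("empty request")
	}
	return "echo: " + req, nil
}

func main() {
	metrics := NewMetrics()
	echo := metrics.Wrap("echo", echoHandler)

	if resp, err := echo("hello"); err == nil {
		fmt.Println(resp)
	}
	if _, err := echo(""); err != nil {
		fmt.Println("error:", err)
	}

	calls, failures := metrics.Snapshot("echo")
	fmt.Println(calls, failures) // prints "2 1"
}
```

A real system would export these counters to a monitoring backend and add timings; the point here is only that the instrumentation lives in one shared place.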
A lot of thought goes into the interface or API that each service exposes. The team favours writing many small services, each dedicated to a single purpose, rather than fewer more complex services. "Why do we have such granularity? We want to minimise the risk of change," said Patel. "For example, if we want to change the way contactless payments work, we're not affecting the chip and PIN system."
How do developers work on their code, given that running 1,600 microservices is not going to work on a laptop? "You are running a subset," said Heath. "We have an RPC filter that can detect you are trying to send a request to a downstream that isn't currently running, it can compile it, start it, and then send the request to it."
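The talk did not show how that filter is built, but the behaviour Heath describes can be sketched roughly as follows. Everything here is hypothetical: `LocalRouter`, `Service`, the `demoCatalogue` function and the service names are invented, and "compiling and starting" a service is simulated by looking it up in an in-memory catalogue.

```go
package main

import "fmt"

// Service stands in for a compiled, running microservice.
type Service func(request string) string

// LocalRouter is a hypothetical stand-in for the RPC filter.
// It is not safe for concurrent use; a real filter would need locking.
type LocalRouter struct {
	catalogue map[string]Service // services that could be built on demand
	running   map[string]Service // services already started
	Started   []string           // order in which services were started
}

// NewLocalRouter returns a router with nothing running yet.
func NewLocalRouter(catalogue map[string]Service) *LocalRouter {
	return &LocalRouter{catalogue: catalogue, running: map[string]Service{}}
}

// Call forwards the request, starting the downstream first if needed.
func (r *LocalRouter) Call(name, request string) (string, error) {
	svc, ok := r.running[name]
	if !ok {
		// Simulated "compile and start": look the service up in the catalogue.
		svc, ok = r.catalogue[name]
		if !ok {
			return "", fmt.Errorf("unknown service %q", name)
		}
		r.running[name] = svc
		r.Started = append(r.Started, name)
	}
	return svc(request), nil
}

// demoCatalogue returns two toy services with invented names.
func demoCatalogue() map[string]Service {
	return map[string]Service{
		"service.ledger": func(req string) string { return "ledger handled " + req },
		"service.cards":  func(req string) string { return "cards handled " + req },
	}
}

func main() {
	router := NewLocalRouter(demoCatalogue())

	resp, err := router.Call("service.ledger", "balance")
	fmt.Println(resp, err)

	// The second call reuses the already-running service.
	resp, err = router.Call("service.ledger", "history")
	fmt.Println(resp, err)

	fmt.Println(router.Started)
}
```

Only the services a developer actually touches get started, which is what makes a subset of 1,600 workable on a laptop.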
Why do microservices work for Monzo, whereas in some cases they add complexity without delivering much benefit? The Register has attended many QCon events, and while software development trends have changed from year to year, some things have remained consistent. One is that the way developers interact with each other in a team (and with management) counts for more than whatever development methodology they espouse. Another is that an incremental approach wins over occasional large changes. "An iterative process is generally what we take to heart at Monzo," said Heath, "both from an infrastructure perspective but also from a product perspective. By making small changes frequently we make sure we are going in the right direction."
Another recurrent QCon theme is the advantage of simplicity over complexity. Taken as a whole, what Monzo's system does is highly complex, but the bank has designed its systems in such a way as to divide that complexity into smaller, simpler pieces, and abstract it away from developers working on the code. "You don't need to know how 1,600 services work," said Heath. The hard task of managing K8s is delegated to specialists.
Monzo has also standardised "a small set of technology choices", said Heath, so that "as a group, we can collectively improve those tools". This could be frustrating for developers who have different technology preferences, but must help substantially with collaboration since everyone learns the same set of tools. "Code needs to be readable to other humans," said Patel. "We optimise code for readability. One of our engineering principles is not to optimise [performance] unless it is a bottleneck."
What Monzo presented at QCon seemed to be a strong template for software development and deployment in the case where you have a complex system with many components, and need to be able to respond quickly when requirements change or features are added.
Heath and Patel made a great case for the value of microservices. Note, though, that Monzo uses a lot of custom, in-house tools and libraries that are not easy to replicate. Further, many of the principles they presented – like writing clean, readable and disciplined code, focusing on a few carefully chosen pieces of technology, and taking an incremental approach – are winners in any software architecture. ®