Amazon’s Away Teams laid bare: How AWS's hivemind of engineers develop and maintain their internal tech
Cloud giant's structure, staff practices revealed
Deep dive Companies inside and out of Silicon Valley have found their own ways to rapidly develop and deploy features and functionality.
Within the belly of Amazon Web Services, the web giant's gigantic cloud beast, though, is a specific digestive system – a concept called Away Teams – that accepts certain weaknesses to achieve maximum velocity.
El Reg has spent a few months talking to about a dozen people who have lived inside this particular process, and now it's time to share it with you here. Our sources will remain anonymous as they are not authorized to speak publicly about Amazon. Official spokespeople for the US giant declined to comment on our findings.
Capturing the way things are at an organization as large as Amazon is always a challenge. The company has never publicly codified its management system as it has done for its leadership principles. But this picture might offer new ideas for people seeking to coordinate technology development at scale.
The problem at hand
Once your engineers and technical staffers number in the hundreds or thousands, the organization outgrows everything that works at the team level. When the whole mess is in production, some way must be found so those 20, 50, or 100 teams can get help from each other.
Agile, Scrum, and DevOps methods keep a specific project humming and evolving from conception to delivery, but they won't keep the work of a score of teams coordinated.
Creating a coherent design for a platform or application, of course, is a fundamental problem, and so is organizing the projects to implement such a design. But no matter how well you do at first, adjustments are needed.
Every one of those teams was set up to achieve certain objectives. Maybe they have an individual profit and loss (P&L), or Objectives and Key Results (the famous OKRs that Google adopted, inspired by Intel's use of them). But in a modern platform, almost all the services that make up the whole will use each other.
When someone shows up at your cube and asks for a new feature in the service you are offering or to fix a bug or to optimize performance, what do you do? Do you let them have access to your source code? If a new feature is popular with users or customers, do you keep it for your team or give it to the team where it may more naturally belong? If your team could add a capability that would help other teams make more money, should you do that before what is on your approved roadmap?
Anyone who thinks such issues are easily resolved and that everyone will just do the right thing has never worked inside a large organization in the real world.
Of course, good management should intervene to help teams work together. But seeking management attention slows things down. And, surprise, surprise: management doesn’t always make the right decision.
Amazon's system for internal collaboration
Amazon has faced these issues since its inception and has created a system based on the principles of service-oriented architecture (with some significant additions to codify the management innovations that have made Internet companies so successful).
Andrew Ng, the Stanford researcher, entrepreneur, and AI expert, in a talk at a San Francisco AI conference in 2017, explained that a real internet company was not a shopping mall with a website, but a company that embraced short cycle times, A/B testing, and pushed-down decision making.
Amazon is not re-inventing the wheel here – it's looking at a problem faced by a large number of firms – but it does seem to have found an interesting way to solve the problem. It has a system of optimizing internal collaboration by organizing development around a collection of independently managed services, with a fascinating set of policies for governing it all based on A/B testing, pushed-down decision making, and a carefully curated culture of collaboration that makes use of a novel concept: Away Teams.
As it turns out, Amazon’s system, especially the Away Teams, aligns with the ideas of technology thinkers, such as Ray Kurzweil’s explanation of the exponential progress of technology and MIT professor Eric von Hippel’s observations about the power of user-driven innovation.
From the Yegge rant to service-oriented collaboration
From what we know of his behavior, Amazon CEO Jeff Bezos is a huge fan of forcing functions, which, from a CEO perspective, are dictates from on high that mandate certain types of change.
Bezos uses his personal magnetism, the aura of his success, and his power as CEO to force the company to transform itself. Forcing Amazon.com to eat its own dogfood and use AWS was one such endeavor. The drive to move Amazon completely off Oracle is another, although the author of that may be Andy Jassy, head of AWS. But my favorite is the move toward service-oriented architecture, recounted in what became known as the Yegge Rant.
As told by Steve Yegge, an engineer who had moved to Google after several years at Amazon, around 2002 Bezos demanded that everyone at Amazon make their department’s offering available as services exposed through APIs. Yegge's post (on the now-defunct Google+) explains that this forcing function caused an ocean of pain as the company learned to address technical and operational issues such as debugging a service-oriented architecture, maintaining adequate performance when every internal user is a potential unwitting DoS attacker who may spike traffic, handling operational support, discovering what services were available, and lots of other stuff. We should note that Yegge was quickly contrite about the posting.
The forcing function worked as planned, however, and created a technology culture around services that had some interesting principles. One such principle, which we have not been able to get multiple sources to verify, is that once a team is the only remaining user of an API, it becomes the owner of that service, even if it didn’t initially develop it.
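To make that rumored ownership rule concrete, here is a minimal Python sketch of how such a policy could be expressed. Everything in it is hypothetical and invented for illustration: the `Service` record, the team names, and the `reassign_owner_if_sole_consumer` function are assumptions, not Amazon tooling, and the rule itself remains unverified.

```python
from dataclasses import dataclass, field


@dataclass
class Service:
    """Hypothetical registry entry for an internal service."""
    name: str
    owner: str
    consumers: set[str] = field(default_factory=set)


def reassign_owner_if_sole_consumer(service: Service) -> bool:
    """Hand the service to its last remaining consumer.

    Returns True only if ownership actually changed, i.e. exactly one
    consuming team is left and it is not already the owner.
    """
    if len(service.consumers) == 1:
        (sole_consumer,) = service.consumers
        if sole_consumer != service.owner:
            service.owner = sole_consumer
            return True
    return False


# Illustrative data only: team-a built the service, two other teams call it.
svc = Service("address-lookup", owner="team-a", consumers={"team-b", "team-c"})

# team-c stops calling the API, leaving team-b as the only user.
svc.consumers.discard("team-c")
changed = reassign_owner_if_sole_consumer(svc)
```

In this sketch ownership moves only when a single consumer remains, so a service with no callers at all stays with its original team.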
But alone, technology, tools, and operations for a mature service-oriented architecture don’t solve the problem of internal collaboration. Here’s where Amazon broke new ground, especially with the concept of the Away Team. The Register hasn’t heard that Amazon has a name for this system, but service-oriented collaboration seems apt.