Amazon’s Away Teams laid bare: How AWS's hivemind of engineers develop and maintain their internal tech

Cloud giant's structure, staff practices revealed


Deep dive Companies inside and out of Silicon Valley have found their own ways to rapidly develop and deploy features and functionality.

Within the belly of Amazon Web Services, the web giant's gigantic cloud beast, though, is a specific digestive system – a concept called Away Teams – that accepts certain weaknesses to achieve maximum velocity.

El Reg has spent a few months talking to about a dozen people who have lived inside this particular process, and now it's time to share it with you here. Our sources will remain anonymous as they are not authorized to speak publicly about Amazon. Official spokespeople for the US giant declined to comment on our findings.

Capturing the way things are at an organization as large as Amazon is always a challenge. The company has never publicly codified its management system as it has done for its leadership principles. But this picture might offer new ideas for people seeking to coordinate technology development at scale.

The problem at hand

Once your engineers and technical staffers number in the hundreds or thousands, the organization outgrows everything that works at the team level. When the whole mess is in production, some way must be found so those 20, 50, or 100 teams can get help from each other.

Agile, Scrum, and DevOps methods keep a specific project humming and evolving from conception to delivery, but they won't keep the work of a score of teams coordinated.

Creating a coherent design for a platform or application, of course, is a fundamental problem, and so is organizing the projects to implement such a design. But no matter how well you do at first, adjustments are needed.

Every one of those teams was set up to achieve certain objectives. Maybe they have an individual profit and loss (P&L), or Objectives and Key Results (the famous OKRs that Google adopted, inspired by Intel's use of them). But in a modern platform, almost all services that comprise the whole will use each other.

When someone shows up at your cube and asks for a new feature in the service you are offering or to fix a bug or to optimize performance, what do you do? Do you let them have access to your source code? If a new feature is popular with users or customers, do you keep it for your team or give it to the team where it may more naturally belong? If your team could add a capability that would help other teams make more money, should you do that before what is on your approved roadmap?

Anyone who thinks such issues are easily resolved and that everyone will just do the right thing has never worked inside a large organization in the real world.

Of course, good management should intervene to help teams work together. But seeking management attention slows things down. And, surprise, surprise: management doesn’t always make the right decision.

Amazon's system for internal collaboration

Amazon has faced these issues since its inception and has created a system based on the principles of service-oriented architecture (with some significant additions to codify the management innovations that have made Internet companies so successful).

Andrew Ng, the Stanford researcher, entrepreneur, and AI expert, explained in a talk at a San Francisco AI conference in 2017 that a real internet company was not a shopping mall with a website, but a company that embraced short cycle times, A/B testing, and pushed-down decision making.

Amazon is not re-inventing the wheel here – it's looking at a problem faced by a large number of firms – but it does seem to have found an interesting way to solve it. It optimizes internal collaboration by organizing development around a collection of independently managed services. Governing it all is a fascinating set of policies based on A/B testing, pushed-down decision making, and a carefully curated culture of collaboration that makes use of a novel concept: Away Teams.

As it turns out, Amazon’s system, especially the Away Teams, aligns with the work of technology thinkers such as Ray Kurzweil, who described the exponential progress of technology, and MIT Professor Eric von Hippel, who documented the power of user-driven innovation.

From the Yegge rant to service-oriented collaboration

From what we know of his behavior, Amazon CEO Jeff Bezos is a huge fan of forcing functions, which, from a CEO perspective, are dictates from on high that mandate certain types of change.

Bezos uses his personal magnetism, the aura of his success, and his power as CEO to force the company to transform itself. Forcing Amazon.com to eat its own dogfood and use AWS was one such endeavor. The drive to move Amazon completely off Oracle is another, although the author of that may be Andy Jassy, head of AWS. But my favorite is the move toward service-oriented architecture, recounted in what became known as the Yegge Rant.

As told by Steve Yegge, an engineer who had moved to Google after several years at Amazon, around 2002 Bezos demanded that everyone at Amazon make their department’s offering available as services exposed through APIs. Yegge's post (on the now-defunct Google+) explains that this forcing function caused an ocean of pain as the company learned to address technical and operational issues. These included debugging a service-oriented architecture, maintaining adequate performance when every internal user is a potential unwitting DoS attacker who may spike traffic, handling operational support, discovering what services were available, and lots of other stuff. We should note that Yegge was quickly contrite about the posting.
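To make the "unwitting DoS attacker" problem concrete, here is a minimal sketch of one common defense: a token-bucket quota kept per calling team. This is an illustration only. Nothing in Yegge's post or in our reporting says Amazon uses this technique, and the class names, team names, and limits below are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    """Token bucket for one caller (hypothetical, for illustration)."""
    capacity: float        # maximum burst size, in requests
    refill_per_sec: float  # sustained request rate allowed
    tokens: float          # tokens currently available
    last_seen: float       # timestamp of the previous call, in seconds

    def allow(self, now: float) -> bool:
        # Top up the bucket for the time elapsed since the last call.
        elapsed = max(0.0, now - self.last_seen)
        self.tokens = min(self.capacity,
                          self.tokens + elapsed * self.refill_per_sec)
        self.last_seen = now
        # Spend one token per request, or reject if none are left.
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class PerCallerLimiter:
    """Keeps a separate bucket for every internal caller."""

    def __init__(self, capacity: float, refill_per_sec: float):
        self.capacity = float(capacity)
        self.refill_per_sec = float(refill_per_sec)
        self.buckets = {}

    def allow(self, caller: str, now: float) -> bool:
        bucket = self.buckets.get(caller)
        if bucket is None:
            # New callers start with a full bucket.
            bucket = TokenBucket(self.capacity, self.refill_per_sec,
                                 tokens=self.capacity, last_seen=now)
            self.buckets[caller] = bucket
        return bucket.allow(now)


limiter = PerCallerLimiter(capacity=2, refill_per_sec=1.0)
# Three back-to-back calls from one team: the third exceeds the burst size.
burst = [limiter.allow("team-search", now=0.0) for _ in range(3)]
print(burst)
# A different team is unaffected by its neighbor's spike.
print(limiter.allow("team-ads", now=0.0))
```

Timestamps are passed in explicitly so the behavior is easy to reason about; a real service would read a monotonic clock instead. The point of keying the quota by caller is that one team's traffic spike does not consume the capacity other teams depend on.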

The forcing function worked as planned, however, and created a technology culture around services that had some interesting principles. One such principle, which we have not been able to get multiple sources to verify, is that once a team is the only remaining user of an API, they become the owners of that service, even if they didn’t initially develop it.
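The reported last-user rule is simple enough to state in a few lines of code. The sketch below is our own illustration of the policy as described to us, not Amazon's tooling; the `Service` record, the `apply_last_user_rule` function, and the team and service names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Service:
    """A hypothetical internal service record (names are illustrative)."""
    name: str
    owner: str                                  # team currently responsible
    consumers: set = field(default_factory=set)  # teams calling its API


def apply_last_user_rule(service: Service) -> Service:
    """Hand ownership to the sole remaining consumer, if there is exactly one."""
    if len(service.consumers) == 1:
        (last_user,) = service.consumers
        service.owner = last_user
    return service


pricing = Service(name="pricing-api", owner="team-catalog",
                  consumers={"team-checkout", "team-ads"})
pricing.consumers.discard("team-ads")  # one consumer migrates away
apply_last_user_rule(pricing)
print(pricing.owner)  # team-checkout
```

With two or more consumers, or none, the function leaves ownership where it is. How Amazon would handle those cases, or whether the rule is applied automatically at all, is not something our sources confirmed.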

But alone, technology, tools, and operations for a mature service-oriented architecture don’t solve the problem of internal collaboration. Here’s where Amazon broke new ground, especially with the concept of the Away Team. The Register hasn’t heard that Amazon has a name for this system, but service-oriented collaboration seems apt.

Biting the hand that feeds IT © 1998–2022