Amazon’s Away Teams laid bare: How AWS's hivemind of engineers develop and maintain their internal tech

Cloud giant's structure, staff practices revealed


The principles of Amazon service-oriented collaboration

Here’s how Amazon’s service-oriented collaboration works based on our research:

  1. Team structure
    • Each of the groups that owns a service has a set of goals and possibly a P&L that represents success. A roadmap is generally in place to meet those goals.
    • The teams are ostensibly autonomous and can make any important decision needed to meet their goals.
    • The "value to the customer" is part of the mission for each team. This is codified using content such as mock press releases to ensure developers keep end-user needs in mind.
    • As much as possible, teams are kept small, adhering to the two-pizza rule, meaning about six people.
    • Services can be refactored or new services can be spun out to new teams. Teams that don’t work are shut down and the technology they created is distributed to other teams or discarded.
    • New teams often are created to solve urgent, end-to-end problems.
  2. Development process
    • Teams use a shared set of development tools for source code and managing the development pipeline, some offered as shared services. There are many tools and services that are commonly or universally used, but no hard requirements. Every team can do what makes sense to get the job done fast. While this is true, at some point you may have to show with data why you deviated.
    • The DevOps model is fully embraced. Each team performs operational support for its service.
    • Access to most source code is not hard to get. One group can usually quite easily take a look at the source code of another without prior restraint. There are some exceptions.
    • A/B testing and detailed monitoring are widespread and used for almost every aspect of the site and infrastructure. The testing is based on the WebLab service, supported by a team that trains staff on how to run tests that yield statistically significant results.
    • Teams do not generally have to worry about the rates of internal use of resources. There is no internal currency changing hands for tracking such usage. Rates of usage internally across services are allocated as part of the budget process and monitored by finance teams who meet periodically with teams to discuss any unusual growth in services and encourage optimization.
    • Decreasing technical debt is not considered a good reason to do anything unless it has an impact on reaching the goals of the team.
  3. Collaboration practices
    • Changes to one team’s service may be implemented by another team that needs the enhanced capability, in what is called an Away Team. The Away Team works on the Home Team’s code to add what it needs according to established engineering standards, and then leaves that code in good order to be maintained by the Home Team that owns the service, with help when needed.
    • When an Away Team is not an option because the requestor doesn’t have the ability to implement improvements to the service, this does lead to a management discussion about how to optimize the big picture roadmap. Usually roadmaps are bursting, so accommodating a new request means reshuffling the existing roadmap.
    • If extending a service using an Away Team doesn’t work out for some reason, it is perfectly fine to duplicate and create whatever you need to accelerate your progress. There is no concern about duplication across the platform as long as you have a need that will help you move forward.
    • A team creating a service is given credit when they do something that has a positive downstream impact on other services. Management recognizes contributions to the big picture, usually on the P&L of the higher entity.
    • "Bar raisers" are Amazon staff, often from other teams, who act as independent experts and approve key decisions. They are used not only for hiring, for which they are widely known, but also for high-impact decisions on design, customer experience, architecture, and A/B testing. It is possible to go against the recommendation of a bar raiser, but such a move is noted and made visible to higher levels of management.

These principles operate somewhat differently based on the part of Amazon that is using them.

The oldest, original set of technology that morphed into services is generally called legacy. Next came MAWS, an internal platform of services that are not public. The public form, AWS, is the latest. There may be others we have not heard about.

For example, older products such as Amazon.com or Kindle may use services from all three of these layers. Newer products like Alexa and Echo tend to use more of the public services on AWS.

There have been many generations of evolution from legacy to MAWS to AWS and also with respect to development tools. All of these changes happen in waves that may take years to complete.

The teams outside AWS proper are less likely to have a P&L at the service or team level. In general, AWS teams are known for having the most methodological purity, a state in which the service, team, and P&L have the same boundary.

Keep in mind this picture was assembled from talking to many people with different perspectives at different levels of the organization. It would be wonderful to make it sharper. But finding someone who knows the whole picture and detailed history is not easy. Amazon PR staff take note: we are always ready to sit down with Werner Vogels, Amazon CTO, and go over the details.

How Kurzweil and Von Hippel explain the power of service-oriented collaboration

Amazon’s model encourages direct team-to-team, service-to-service collaboration, providing principles for collaboration so that as much progress as possible can take place based on each team optimizing the services it needs directly.

As your correspondent came to understand Amazon’s model, I realized that the structure of service-oriented collaboration used levers for acceleration that have been documented by two celebrated researchers who have studied how technology development can be optimized.

MIT professor Eric Von Hippel’s research into user-driven innovation shows that when the user is given direct access to the means for creating a solution, tremendous innovation can result, potentially at least. Transferring the "sticky information" that otherwise must be rendered into requirements documents or passed from user to builder is difficult and never complete. When this step doesn’t have to take place because user and builder are the same person or same team, the outcome is much better. Amazon’s Away Team model embraces this concept and allows teams to create building blocks that have ideal fit to purpose.

Ray Kurzweil’s analysis of the exponential pace of technology development provides another lens through which the power of Amazon’s model can be explained. Your correspondent has summarised Kurzweil’s model in Research Mission on Technology Leverage, but his thesis is as follows:

  • At first, progress in any area of technology seems slow because basic services are being developed.
  • But then, more complex services are built out of the simpler ones, and so on, accelerating the pace of development.
  • At the same time, funding goes to improving services that are most impactful.
  • As the services are used more, the fit to purpose improves.

Kurzweil’s research shows how this pattern has held throughout history in many different areas of technology. My view is that Amazon is still in the early stages of this exponential curve, which is being driven by use of services both inside and outside of Amazon.

Amazon’s model wouldn’t work without data from usage driving funding and optimization. End-to-end teams and Away Teams play a crucial role in identifying new services and improving the fit of existing services.

Right now, AWS has focused on creating general purpose higher level services that all fit into a generic platform for software development. The highest level services are being created on top of the platform by Amazon itself (Amazon.com, Alexa, Kindle, etc) and by AWS customers who are building all sorts of products and IT infrastructure.

Biting the hand that feeds IT © 1998–2021