Outposts, Local Zone, Wavelength: It's a new era of distributed cloud, says AWS architect
Adrian Cockcroft talks to El Reg about cloud architecture – and why we need more chaos in our systems
re:Invent The advent of Outposts, Local Zone and Wavelength - released at AWS's re:Invent conference in Las Vegas - amounts to a "new platform" that is now distributed rather than centralised, a company veep has told The Reg.
Adrian Cockcroft, cloud architecture strategy veep, has been at AWS "about three years", though he is also well known for leading the charge towards microservices and public cloud back when he was cloud architect for Netflix, a position he held until 2014.
"The most interesting architectural announcement [at re:Invent] is Outposts, Local Zone and Wavelength," he told us, "because this takes a bunch of the architectural assumptions about cloud, that it's a centralising influence, and turns it into a fully distributed thing."
Outposts is a rack of servers managed by AWS but physically on-premises. The customer provides the power and network connection, but everything else is done for them. If there is a fault, such as a server failure, AWS will supply a replacement for you to slide in; it is configured automatically. Outposts runs a subset of AWS services, including EC2 (VMs), EBS (block storage), container services, relational databases and analytics. S3 storage is promised for some time in 2020. Outposts was announced at re:Invent 2018, but is only now becoming generally available.
Local Zone, currently only available in Los Angeles, is an extension of an AWS Region running in close proximity to the customers that require it for low latency. In LA, the demand comes from video editing workloads.
Wavelength is a physical deployment of AWS services in data centres operated by telecommunication providers to provide low-latency services over 5G networks. Operators signed up so far include Verizon, Vodafone Business, KDDI and SK Telecom.
It turns out that these three services are closely related. "Outposts is a rack of machines. What we had to figure out is ways that we could let other people host those racks," said Cockcroft. "Local Zone, that's effectively a large clump of outposts. Wavelength is a service provided by Verizon or KDDI which lets you deploy into that, but the way we implement it is, we ship some Outposts to Verizon and they stick them near the 5G endpoints."
Unlike Outposts, Local Zone is multi-tenant. "We have to get a group of customers for us to invest in a Local Zone, there has to be enough local demand," said Cockcroft. It is unlikely that London would get one because there is already an AWS region there. Perth, on the other hand, could make a good case. "Perth is a very long way from Sydney. And we support Australia. There are mining companies who want the cloud but it's too far away. There's a lot of interest in countries where there is just one region, to create disaster recovery regions or backup regions."
For Cockcroft, this is a new architecture. "What we've done is taken a bunch of assumptions about the architecture of the cloud, that have been true for 10-15 years, and said no, that's not true any more. Now the machines can be separated over the network, we can have them deployed anywhere. People can start thinking, what is a cloud-native architecture in this new distributed world?"
Adrian Cockcroft, VP Cloud Architecture Strategy, at AWS re:Invent
Listening to Cockcroft, you would almost imagine that the ability to run on-premises is some new thing. The actual new thing is to be able to hand over management of your on-premises computing to AWS and manage it as if it were just another deployment zone.
Details of what a recommended distributed architecture with Outposts looks like are still emerging. "During 2020 we'll have more to talk about it," said Cockcroft.
How has best practice for architecting a resilient, scalable application changed since the days Cockcroft ran this at Netflix? "The network layer sophistication is probably the biggest change," he said. "The way that you arrange the networking traffic and segment things and the security models. It's not good enough to just have a disaster recovery site, you've also got to be secure here and secure there, and have your security architecture so that if you get a failure in your primary it doesn't kill the security architecture, they have to be independent but they also have to trust each other. There's a whole lot of interesting problems to solve around the identity and key management. There's a coordination problem of knowing what is working here and what is working there."
The complexity is such that Cockcroft believes that creating AWS Solutions, templates for best-practice deployments, will help. This was the case, he said, with data lakes. "Every customer is building data lakes. They were all doing it different ways. Most of them weren't building in role-based access control and security as a baseline thing, so we came up with Lake Formation, which is the generic 'everyone should build a data lake using this'," he said.
What about chaos engineering, the practice of inserting deliberate failures into systems in order to verify resilience? Netflix was an early advocate of this – is AWS providing tooling? "We do a bit already," Cockcroft told us. "It's piecemeal, every individual service has a different thing. Aurora has an ability to go into it and tell the database to misbehave. You can cause a master to fail, you can introduce latency, you can introduce error rates in the database. There's an interface for creating failure scenarios.
"As you look across the product line, we have to talk to every single team and say, what do you need to do to expose a few hooks where we can introduce some deliberate failures? But quite often you can do it at application level.
"The driver for this is that we have an increasing number of customers in safety and business-critical industries moving all-in to cloud. If that's a healthcare provider or an airline, or a bank or a financial institution, this has to work. We have to understand what happens and all the different possible failure modes."
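The application-level fault injection Cockcroft mentions can be sketched in a few lines of Python. This is purely illustrative - not AWS tooling, just a hypothetical wrapper that injects artificial latency and errors into calls so that retry logic can be exercised:

```python
import random
import time


class ChaosWrapper:
    """Wrap a callable and inject failures at a configurable rate.

    Illustrative only: error_rate is the probability a call raises an
    injected fault; latency_s is an artificial delay added to each call.
    """

    def __init__(self, func, error_rate=0.0, latency_s=0.0):
        self.func = func
        self.error_rate = error_rate
        self.latency_s = latency_s

    def __call__(self, *args, **kwargs):
        if self.latency_s:
            time.sleep(self.latency_s)          # simulate network latency
        if random.random() < self.error_rate:
            raise ConnectionError("injected fault")  # simulate a failure
        return self.func(*args, **kwargs)


def query(x):
    """Stand-in for a downstream call that normally succeeds."""
    return x * 2


# Wrap it so roughly 30 per cent of calls fail, forcing callers
# to prove their retry handling actually works.
flaky_query = ChaosWrapper(query, error_rate=0.3)


def query_with_retry(x, attempts=5):
    for _ in range(attempts):
        try:
            return flaky_query(x)
        except ConnectionError:
            continue
    raise RuntimeError("all retries exhausted")
```

The point of running such a wrapper in test (or even production) traffic is the one Cockcroft makes: failure modes have to be observed before a real outage, not guessed at afterwards.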
According to Cockcroft, current disaster recovery plans are in many cases inadequate. Businesses "have backup data centres that they daren't failover to because they know it wouldn't work. That's the common practice. Everyone looks embarrassed if you ask too many questions about how they test their disaster recovery."
Region-to-region recovery on AWS is a better solution, he claimed, because the target looks the same as the source, whereas "every data centre is different, so every data centre failover is custom built and very poorly tested".
AWS for on-premises. AWS for cloud. AWS for edge. AWS for disaster recovery. This is the world of "all-in" and it does require putting huge trust in one provider. That is one issue to think about, but what is less controversial is that if an organisation has made that decision, it pays to do it right, and in this respect there is still a lot to learn. ®