First came the cloud and then, in its wake, a small snowstorm of initialisms that explained how it was going to be offered to love-struck customers as services that could be paid for without stressing the capital budget. The famous examples are PaaS (Platform as a Service), IaaS (Infrastructure as a Service), and SaaS (Software as a Service), but there are plenty of others if you dive deeper.
These each describe how the cloud turns everything into a button press built on automation, ease of use, and the whole shebang that makes computing services an always-available utility that just works. But what’s not always as well explained is the hairy issue of how exactly all this clever service anchors complex underlying hardware resources and connectivity to the software that sits on top and around it.
Initially, the answer was cloud service tools backed by specialised configuration systems specific to individual resources, but there was a problem - these needed sysadmins to fiddle with CLI consoles, which often left under-pressure developers clock watching. One answer was scripting, a way of turning simple, repeatable activities into software processes the dev team could manage on their own. But scripting only gets you so far when the cloud turns into complex infrastructure, which eventually it always does. There is just too much to do and developers need a way to spin up applications and their hardware dependencies without waiting three days for an admin to do it.
This is how the whole Infrastructure as Code (IaC) movement got started a decade ago, which some also call continuous configuration automation (CCA). Standing back, it's a typical coders' way of looking at the cloud. Instead of managing each resource as a single unit on, say, AWS - adding a physical server, adding a database, configuring security credentials, taking down an old server - IaC is a way of doing all this using blueprint-like templates, as if the whole thing is software. Now when developers want to deploy an application, they can borrow or write code to do that, or more likely use custom IaC tools such as AWS CloudFormation, Azure Resource Manager, and Terraform, or the software configuration equivalents Chef, Puppet, and Ansible, to do the complicated stuff for them.
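To make the blueprint idea concrete, here is a minimal sketch in Python. It does not use any real IaC tool's API - the resource shapes and the `plan` function are invented for illustration - but it shows the core shift: infrastructure described as data, with an engine turning it into an ordered list of actions.

```python
# Hypothetical sketch: infrastructure described as plain data (a "blueprint"),
# rather than as hand-run CLI steps per resource. Not any real tool's format.
BLUEPRINT = {
    "resources": [
        {"type": "server", "name": "web-1", "size": "small"},
        {"type": "database", "name": "orders-db", "engine": "postgres"},
        {"type": "security_group", "name": "web-sg", "ingress": ["443"]},
    ]
}

def plan(blueprint):
    """Turn the declarative blueprint into an ordered list of actions."""
    return [f"create {r['type']} {r['name']}" for r in blueprint["resources"]]

for step in plan(BLUEPRINT):
    print(step)
```

Real tools such as Terraform or CloudFormation do the same thing at far greater depth: the template is data, and the engine works out what to create, change, or destroy.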
This wasn't exactly hubris - IaC grows bigger and more important by the day - but interesting start-ups are now appearing that want to tidy up the security problems this headlong rush has created. An example of the new wave is Bridgecrew, a startup based in San Francisco and Tel Aviv, recently uncloaked from stealth mode, which has developed an IaC scanning and remediation system called Bridgecrew Cloud. This is designed to offer visibility into misconfigurations affecting Amazon AWS, Azure, and Google Cloud Platform (GCP), as well as violations in Terraform, CloudFormation, and Kubernetes.
“Instead of going to an individual server and configuring it, with IaC you can build an entire blueprint of steps that will happen once it kicks in,” explains Guy Eisenkot, the co-founder and VP of Product. The problem? As helpful as these IaC blueprints might be, many publicly available examples were created to provision infrastructure rather than secure it. The possibility of misconfiguring something when using IaC templates is a big ongoing risk. Worse, knowing that something has gone awry because of an IaC template is incredibly difficult until the damage has been done, which might be within minutes.
It's no secret that misconfiguration is now the cloud's biggest security worry, although tying IaC to specific cloud security incidents is much harder - misconfiguration can happen via any interface, not only IaC. One way to grasp the scale of the issue is to infer the answer by looking at the IaC templates on public repositories such as GitHub - an approach used by Palo Alto's Unit 42 earlier this year when it uncovered 199,000 insecure templates, including many high and medium-severity flaws that would lead to serious misconfigurations. The worst affected were AWS CloudFormation (42 per cent of templates affected), Terraform (22 per cent), and Kubernetes YAML (nine per cent).
Let's define bad. Some of the problems were more a case of complacency, such as the fact that 60 per cent of storage lacked logging, essential to post-incident forensics. Then it heads downhill, for example that 22 per cent of user-configured AWS EC2 instances left SSH on port 22 exposed to the Internet, while 17 per cent of AWS security groups allowed open access to any inbound traffic from 0.0.0.0/0. Or that 43 per cent of databases lacked any encryption, a basic security protection mandated by every compliance regime. Similarly, over 10 per cent of Amazon S3 buckets referenced in templates were exposed, a finding that will surprise nobody given the recent barrage of incidents caused by this problem.
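The misconfiguration classes above are all mechanically detectable, which is the point of IaC scanning. The Python sketch below flags them against a simplified, made-up template structure - real CloudFormation or Terraform files have much richer schemas, so treat this as an illustration of the checks, not a working scanner.

```python
# Illustrative only: a toy scanner for the misconfiguration classes above,
# run over an invented template shape (real IaC schemas are far richer).
def scan(template):
    """Flag open SSH, unencrypted databases, public buckets, missing logging."""
    findings = []
    for res in template.get("resources", []):
        name, rtype = res["name"], res["type"]
        if rtype == "security_group":
            for rule in res.get("ingress", []):
                if rule.get("port") == 22 and rule.get("cidr") == "0.0.0.0/0":
                    findings.append(f"{name}: SSH (port 22) open to the Internet")
        if rtype == "database" and not res.get("encrypted", False):
            findings.append(f"{name}: storage not encrypted")
        if rtype == "bucket":
            if res.get("public", False):
                findings.append(f"{name}: bucket publicly accessible")
            if not res.get("logging", False):
                findings.append(f"{name}: access logging disabled")
    return findings
```

Note that every check defaults to "insecure unless stated otherwise" - a template that simply omits encryption or logging still gets flagged, which matches how most of the real-world failures above arise.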
Unpublished research by Bridgecrew bears this out, finding that 42 per cent of Terraform templates contained at least one misconfiguration, with the majority failing to enable logging, define backup and recovery, or enable encryption. This list of woe is by no means exhaustive, which underlines how bad things have got. The commonest fail is just making things public that shouldn’t be, says Eisenkot. “Instead of using a private IP address, they end up using a public one. Instead of being part of a VPN, it’s part of Amazon’s public Internet range. This can be an Oracle or Elasticsearch database, or a virtual machine,” he says.
“Misconfigured cloud resources are likely the main root cause for unintended exposure of sensitive data for cloud native companies. Misconfigured public interfaces, exposed secrets, and unencrypted databases are just a few very common examples where companies have made bad calls when configuring their cloud infrastructure.”
Of course, cybercriminals have now twigged that misconfigured assets are easy meat, not only to grab data but to borrow big servers for profitable hustles such as cryptomining at the host’s expense (Unit 42 found that 64 per cent of templates helpfully set no CPU or memory limits). The result is that exposed resources are spotted within minutes, before the developers who inadvertently created the hole have put down their polystyrene coffee cup after hitting ‘commit’. Patience is cheap and always rewarded.
“Vulnerabilities are exploited in seconds. The moment a database goes online, there are hundreds of online scanners that are scoping a given IP address range for weaknesses. Only ten minutes of access can allow someone to replicate the data.”
Filling the pipeline
According to Eisenkot, it’s not that IaC screw-ups cause all cloud misconfigurations, but that they are completely avoidable. This is code which can be analysed, after all. Bridgecrew’s answer is not simply to detect these misconfigurations - run-time tools that do that are commonplace, build-time ones less so. The company instead set out to automate the process of remediating and fixing them. That’s why it built Checkov, an open source IaC scanner that can detect misconfigurations in Terraform, CloudFormation, and Kubernetes.
Eisenkot uses the example of a developer building a backend service for a web application using Terraform, a declarative framework - the developer defines the goal and the software works out how to achieve it (as opposed to procedural frameworks, which carry out step-by-step instructions).
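The declarative-versus-procedural distinction can be sketched in a few lines of Python. The state shapes and the `reconcile` function below are invented for illustration: the developer states only the desired end state, and an engine computes the steps needed to get there - which is, at heart, what Terraform's plan/apply cycle does.

```python
# Sketch: declarative code states the desired end state; an engine diffs it
# against current state and derives the steps. Names invented for illustration.
desired = {"web": 3, "db": 1}   # declarative: "I want 3 web servers and 1 db"
current = {"web": 1}            # what is actually running now

def reconcile(current, desired):
    """Compute the actions needed to move current state to desired state."""
    actions = []
    for name, want in desired.items():
        have = current.get(name, 0)
        if want > have:
            actions.append(f"create {want - have} x {name}")
        elif want < have:
            actions.append(f"destroy {have - want} x {name}")
    return actions

print(reconcile(current, desired))
```

A procedural script, by contrast, would hard-code the "create 2 web servers" step itself, and break silently the moment the current state differed from what the author assumed.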
When the developer adds an Amazon RDS database, pushing it through a deployment tool such as Jenkins, it is first analysed by Bridgecrew. If this finds, say, that RDS has not been encrypted, the commit fails, leaving the developer to correct the encryption misconfiguration, ignore the warning if they are sure the protection should not apply, or raise a ticket using third-party tracking systems such as Jira. To aid the fix, Bridgecrew collects remediations into service-specific fixes called playbooks, in this instance to enforce the encryption of resources that would be exposed.
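The build-time gate described above can be sketched as follows. This is a hedged illustration, not Bridgecrew's implementation: the resource shape, the policy name, and the `gate` function are all assumptions made for the example.

```python
# Hypothetical sketch of a build-time policy gate: each resource in the
# commit is checked before deployment; any finding blocks the build.
def check_encryption(resource):
    """Invented policy: RDS storage must be encrypted."""
    if resource["type"] == "rds_instance" and not resource.get("storage_encrypted", False):
        return f"{resource['name']}: storage encryption disabled"
    return None

def gate(resources):
    """Return (passed, findings); a CI job fails the build when passed is False."""
    findings = [f for f in (check_encryption(r) for r in resources) if f]
    return (not findings, findings)

ok, findings = gate([{"type": "rds_instance", "name": "orders-db"}])
```

In a real pipeline, a failed gate is where the three options above branch: fix the template, suppress the check with a recorded justification, or open a ticket for someone else to resolve.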
The automation is the critical part of this process because without it, problems either slip past checks or must be corrected manually. For organisations that are constantly tweaking and re-building their applications, this can end up creating a backlog or leave developers on their own to sort the problem without the necessary tools to hand.
“Most enterprises track misconfigurations in a ticketing and alerting system, like PagerDuty, OpsGenie or alternately using Jira or ServiceNow,” says Eisenkot. “A DevOps team would usually get their misconfiguration alerts from their cloud provider, open source tools or cloud security platforms and route it to the right person on a specific development team.”
Bridgecrew’s aim is to go beyond using misconfiguration checking as an extra layer that slows down development by turning it into just another automated element of an engineer’s workflow. Bridgecrew Cloud can also use existing APIs to check an organisation’s cloud infrastructure against security and compliance policies and best practice.
Hence the choice of Bridgecrew as the company’s name. “The origin was that we saw DevSecOps as a fragmented domain that needed technology to bridge the different crews taking part in securing the organisation. In that sense, our platform helps teams bridge the gaps between them.”
The irony is that by turning cloud configuration into a coding job, IaC should make security and compliance an analytical job, which benefits security. The problem was that the cloud rush led some organisations into a dead end where getting the service up was all that mattered. That ethos must now be unwound, argues Eisenkot.
“At Bridgecrew, we believe that the combination of IaC with a strong foundation of systematic code analysis can drastically reduce the number of misconfigurations that can be caused due to the use of legacy provisioning protocols.”
Sponsored by Bridgecrew