Python Package Index found stuffed with AWS keys and malware
British developer uses homegrown scanning tool to check for risks
The Python Package Index, or PyPI, continues to surprise, and not in a good way.
Ideally a source of Python libraries that developers can include in their projects to save time, PyPI has again been caught hosting packages with live Amazon Web Services (AWS) keys and data-stealing malware.
Malicious packages are, sadly, nothing new for PyPI or for packaging systems like npm, RubyGems, crates.io, and the like. Supply chain attacks – via compromising software libraries or typosquatting – have been an issue for years, though one that has gotten more attention recently with incidents like the compromise of SolarWinds.
Despite enhanced vigilance, these incidents still occur with alarming frequency. Just before the New Year, the maintainers of machine learning framework PyTorch warned that PyTorch-nightly, if installed on Linux via pip, included a compromised dependency available through PyPI called
Less than a week later, security firm Phylum said that in December it had identified a remote access trojan in a PyPI package called pyrologin. Another security firm, ReversingLabs, also spotted a malicious PyPI package that month: The malware was masquerading as an SDK from security firm SentinelOne. And in November, dozens of newly published PyPI packages were found to include W4SP malware.
PyPI had a mass malware culling in March 2021 that resulted in the removal of 3,653 malicious code blocks. But the weeds have returned, to say nothing of the security issues that automated analysis identified a few months later in almost half of PyPI's libraries.
Apart from the subverted libraries and the half-decent code, what has PyPI ever given us? Lately, it's been offering keys that provide access to the AWS computing resources and data used by Amazon, Intel, various US universities, the Australian government, US energy firm Fusion Atomics, and Malaysia-based Top Glove, the world's largest glove maker, among others.
Brits find the keys, again
UK-based software developer Tom Forbes on Friday published a blog post outlining how he found 57 active API access keys for AWS services belonging to the above-mentioned organizations.
Forbes built a Rust tool to automatically scan all new packages released on PyPI for the inclusion of AWS API keys. And, well, it works.
Forbes in his post explains that his scanner runs periodically using GitHub Actions and looks for AWS keys in new releases from PyPI, HexPM, and RubyGems. If it finds any, it generates a report with the relevant details that gets committed to the aws-cred-scanner repo.
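For readers curious what such a scan involves, here is a minimal Python sketch of pattern-based key detection. It is not Forbes's code (his tool is written in Rust), and the pattern is an assumption: it relies on the widely documented convention that AWS access key IDs are 20 characters long and begin with a prefix such as AKIA (long-term keys) or ASIA (temporary keys). The function name find_candidate_keys is made up for illustration, and the sample key is the placeholder AWS uses in its own documentation.

```python
import re

# Assumed pattern: a known prefix followed by 16 uppercase alphanumerics.
# Real scanners may use stricter or broader expressions.
ACCESS_KEY_ID = re.compile(r"\b(?:AKIA|ASIA)[A-Z0-9]{16}\b")


def find_candidate_keys(text):
    """Return (line number, matched string) pairs for anything that looks
    like an AWS access key ID. Matches are candidates only; confirming
    that a key is live requires checking it against AWS."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in ACCESS_KEY_ID.finditer(line):
            findings.append((lineno, match.group(0)))
    return findings


# Hypothetical file contents using AWS's documented placeholder key.
sample = 'DEBUG = True\nAWS_ACCESS_KEY_ID = "AKIAIOSFODNN7EXAMPLE"\n'
print(find_candidate_keys(sample))
```

A match on its own proves little, which is presumably why the approach Forbes describes leans on AWS being notified so that it can act on keys that are genuinely leaked.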
"This report contains the keys that have been found, as well as a public link to the keys and other metadata about the release," Forbes said in his post. "Because these keys are committed to a public GitHub repository, Github’s Secret Scanning service kicks in and notifies AWS that the keys are leaked."
As a result, AWS opens a support ticket to notify the offending developer and applies a quarantine policy to limit the potential for misuse of the key.
The problem, of course, is that a less scrupulous person could create a similar scanning script for the purpose of exploitation and abuse. And it would be surprising if that weren't happening already.
Forbes in an email told The Register that AWS keys of this sort can be misused.
"It depends on the exact permissions given to the key itself," Forbes explained. "The key I found leaked by InfoSys [in November] had 'full admin access' which means it can do anything, and other keys I found in PyPI were ‘root keys’ which are also allowed to do anything. An attacker holding these keys would have full access to the AWS account it is linked to."
Other keys, he said, may have more limited but still excessive permissions. For example, he said it's common that a key intended to provide access to a single AWS S3 storage bucket has mistakenly been provisioned to provide access to all S3 buckets associated with that account.
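The difference between the two cases can be illustrated with a pair of IAM policy documents, written here as Python dictionaries. This is a sketch under stated assumptions: the bucket name example-bucket and the helper broad_s3_statements are invented for illustration, and the check only looks for the two most obvious wildcard resources rather than evaluating policies the way AWS does.

```python
# What was presumably intended: read access to objects in one bucket.
SCOPED_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": "arn:aws:s3:::example-bucket/*",  # hypothetical bucket
        }
    ],
}

# The kind of mistake described: every S3 action on every bucket.
BROAD_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": "s3:*", "Resource": "*"}
    ],
}


def _as_list(value):
    # IAM allows a single value or a list in these fields.
    return value if isinstance(value, list) else [value]


def broad_s3_statements(policy):
    """Return Allow statements whose Resource is a bare wildcard."""
    broad = []
    for stmt in _as_list(policy.get("Statement", [])):
        if stmt.get("Effect") != "Allow":
            continue
        resources = _as_list(stmt.get("Resource", []))
        if any(r in ("*", "arn:aws:s3:::*") for r in resources):
            broad.append(stmt)
    return broad


print(len(broad_s3_statements(SCOPED_POLICY)))
print(len(broad_s3_statements(BROAD_POLICY)))
```

A leaked key attached to the first policy exposes one bucket's objects; a leaked key attached to the second exposes everything in S3 for that account.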
Between a rock and a hard place
Forbes pointed to GitHub's automated key scanning, which also covers keys in npm packages, as an example of a useful defensive measure. But he said the company's approach has limitations.
"GitHub also cares a lot about supply chain security but they have dug themselves a hole: The way they scan for secrets involves a lot of collaboration with vendors who may disclose internal information about how keys are constructed to GitHub," he explained.
"This means that the regular expressions that GitHub uses to scan for secrets cannot be made public and are sensitive, which also means that third parties like PyPI are effectively unable to utilize this awesome infrastructure without sending every bit of code published on PyPI to GitHub."
Forbes said that's a shame because, while PyPI could do more to enhance supply chain security, it's a difficult job to do well.
"GitHub has a whole team working on this whereas PyPI simply doesn’t have those kinds of resources," he said. "I believe that there are improvements to be made in the Python ecosystem to help prevent keys (and code) being accidentally bundled and published to PyPI, and that might be a more effective use of resources."
A Python Software Foundation spokesperson didn't immediately respond to a request for comment.
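As for stopping keys from being bundled in the first place, one option a maintainer might try is checking a built package before uploading it. The sketch below is an assumption about how that could look, not an existing PyPI or packaging feature: it relies on the fact that wheels are ordinary zip archives, reuses the same assumed key pattern as any simple scanner would, and the function name scan_wheel and the demo file names are made up.

```python
import io
import re
import zipfile

# Assumed pattern for AWS access key IDs, matched against raw bytes.
KEY_PATTERN = re.compile(rb"\b(?:AKIA|ASIA)[A-Z0-9]{16}\b")


def scan_wheel(wheel):
    """Return (member name, matched key ID) pairs found inside a wheel.

    `wheel` may be a filesystem path or a binary file object.
    """
    hits = []
    with zipfile.ZipFile(wheel) as archive:
        for name in archive.namelist():
            data = archive.read(name)
            for match in KEY_PATTERN.finditer(data):
                hits.append((name, match.group(0).decode("ascii")))
    return hits


# Build a tiny stand-in archive in memory so the sketch runs on its own.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("demo/__init__.py", "")
    archive.writestr("demo/settings.py", 'KEY = "AKIAIOSFODNN7EXAMPLE"\n')
buffer.seek(0)

print(scan_wheel(buffer))
```

Run as a step before upload, a non-empty result would be a reason to stop and inspect the build rather than publish it.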
"I believe a fair bit of the blame can be laid at the feet of developers, but this sort of thing may not be part of their core competency – security is hard to get right at the best of times," Forbes said. "AWS has some blame to share here as well: IAM is notoriously difficult to debug and get right which leads to overly wide permissions being granted on keys."
Forbes also suggested companies should think more carefully about their security policies.
"Policies may enforce that 'nothing on S3 should be public,' and when something is required to be public it may be simpler to make the IAM credentials public instead of trying to work through the security policies and get an exception made. This is something I’ve heard of happening before." ®