This malicious PyPI package mixed source and compiled code to dodge detection
Oh cool, something else to scan for
Researchers recently uncovered the following novel attack on the Python Package Index (PyPI).
ReversingLabs detected a Python package in April that mixed malware with compiled code as a way to evade detection by security tools that only check source code files and not compiled output.
"It may be the first supply chain attack to take advantage of the fact that Python byte code (PYC) files can be directly executed, and it comes amid a spike in malicious submissions to the Python Package Index," Karlo Zanki, a reverse engineer at ReversingLabs, wrote in a report on Thursday.
"If so, it poses yet another supply chain risk going forward, since this type of attack is likely to be missed by most security tools, which only scan Python source code (PY) files."
It's a worrying threat given the increasing number of attacks not only on PyPI but also on other open source code repositories like GitHub, npm, and RubyGems. Miscreants are trying to slip malicious code into packages via these platforms in hopes that developers will grab one and inadvertently put the bad code into their software.
Developers who use PyPI now have to contend with a possible threat that is designed to go undetected by popular security tools.
A new attack technique
For now, the package unearthed by ReversingLabs – named fshec2 – is out of circulation. On April 17 the biz notified PyPI's security team about the threat, and it was promptly removed from the repository. However, PyPI's team told ReversingLabs that it had never seen this type of attack before, and other miscreants may well adopt similar techniques.
Zanki wrote that ReversingLabs routinely scans repositories for suspicious files, which tend to show themselves through unusual qualities and behaviors. The fshec2 package was no different, holding URLs that reference a mystery remote host by IP address, creating new processes, and executing files.
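One of the indicators mentioned above, URLs that point at a bare IP address rather than a domain name, is easy to approximate. The following is a minimal sketch of such a heuristic, not ReversingLabs' actual tooling; the function name, the regular expression, and the sample address (taken from a range reserved for documentation) are all assumptions made for illustration.

```python
import ipaddress
import re
from urllib.parse import urlsplit

# Rough URL matcher for illustration; it is not a full URL grammar.
URL_RE = re.compile(r"https?://[^\s'\"<>]+")


def urls_with_ip_hosts(text):
    """Return URLs in `text` whose host is a literal IP address."""
    hits = []
    for url in URL_RE.findall(text):
        try:
            host = urlsplit(url).hostname
            if host is None:
                continue
            ipaddress.ip_address(host)  # raises ValueError for domain names
        except ValueError:
            continue
        hits.append(url)
    return hits


sample = 'fetch("http://203.0.113.7/a") and docs at https://pypi.org/'
print(urls_with_ip_hosts(sample))
```

A hit is only a reason to look closer, since plenty of legitimate code talks to hosts by IP address.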
A deeper dive into the package found that it only held three files, two of which were benign. However, the third file – full.pyc – caught the researchers' interest. What they found was a malware loader that didn't follow the typical patterns of attacks seen in PyPI incidents.
Most such attacks use obfuscation measures to hide malware published to the repository from analysts or detection tools – and such techniques are getting better and more plentiful. However, fshec2 didn't bother with obfuscation. Instead, it placed all the malicious code and functions into a single file that held compiled Python bytecode.
Putting all that into the file and sending it over the web was also unusual. Typically, attackers will gain initial entry by compromising a system and then have tools within the code communicate with a command-and-control (C2) server, which will then send the malware to be executed.
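To see why a lone bytecode file is enough, here is a harmless sketch showing that CPython will import and run a .pyc file with no source file alongside it. The module name, the file names, and the greet function are hypothetical; this is not the loader used by fshec2, just one standard-library way to execute compiled bytecode.

```python
import importlib.machinery
import importlib.util
import py_compile
import tempfile
from pathlib import Path


def load_pyc_without_source(pyc_path, module_name):
    """Import a module straight from a .pyc file; no .py file is required."""
    loader = importlib.machinery.SourcelessFileLoader(module_name, str(pyc_path))
    spec = importlib.util.spec_from_loader(module_name, loader)
    module = importlib.util.module_from_spec(spec)
    loader.exec_module(module)  # runs the module's bytecode
    return module


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "harmless.py"
        src.write_text("def greet():\n    return 'hello from bytecode'\n")
        pyc = Path(tmp) / "harmless.pyc"
        py_compile.compile(str(src), cfile=str(pyc), doraise=True)
        src.unlink()  # only the compiled file is left on disk
        return load_pyc_without_source(pyc, "harmless").greet()


print(demo())
```

Once the source is deleted, a scanner that only reads .py files has nothing left to inspect, yet the code still runs.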
Malicious functions in bytecode
Looking more into the full.pyc file, ReversingLabs found a method called get_path, which performs the expected malicious functions seen in other malevolent PyPI packages – including collecting usernames, hostnames, and directory listings. However, get_path isn't found "in readable form" inside fshec2 because it's in the full.pyc file – which holds bytecode rather than source code.
Bytecode is a representation of Python code used as a set of instructions for the Python Virtual Machine. Unlike source code written by humans, bytecode is converted code that a machine can interpret easily but that humans find difficult to read. Thus, get_path couldn't be seen in readable form, and Inspector – the tool provided by the PyPI security team to scan PyPI packages – can't analyze binary files for malicious content or behavior.
"Compiled code from the .pyc file needed to be decompiled in order to analyze its content," Zanki wrote. "Once that was accomplished, the suspicious and malicious functionality was easy to see. The discovery of malicious code in the fshec2 package underscores why the ability to detect malicious functions such as get_path is becoming more important for both security and DevSecOps teams."
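Full decompilation needs third-party tools, but a fair amount can be recovered from a .pyc with the standard library alone. The sketch below assumes the CPython 3.7+ file layout (a 16-byte header followed by a marshalled code object) and a file compiled by the same Python version doing the reading. The helper names and the tiny sample module are hypothetical and benign; only the name get_path is borrowed from the report. Unmarshalling does not execute the code, though untrusted files are still best handled in a sandbox.

```python
import importlib.util
import marshal
import types

PYC_HEADER_SIZE = 16  # assumption: CPython 3.7+ header (magic, flags, 8 more bytes)


def load_code_from_pyc_bytes(data):
    """Return the top-level code object stored in .pyc bytes."""
    if data[:4] != importlib.util.MAGIC_NUMBER:
        raise ValueError("bytecode was compiled by a different Python version")
    return marshal.loads(data[PYC_HEADER_SIZE:])


def iter_code_objects(code):
    """Yield a code object and every code object nested in its constants."""
    yield code
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            yield from iter_code_objects(const)


def summarize(code):
    """Collect function and attribute names plus string constants."""
    names, strings = set(), set()
    for obj in iter_code_objects(code):
        names.add(obj.co_name)
        names.update(obj.co_names)
        strings.update(c for c in obj.co_consts if isinstance(c, str))
    return names, strings


def demo():
    # Hypothetical, harmless stand-in for a compiled module.
    source = "import os\n\ndef get_path():\n    return os.getcwd()\n"
    code = compile(source, "sample.py", "exec")
    data = importlib.util.MAGIC_NUMBER + bytes(12) + marshal.dumps(code)
    return summarize(load_code_from_pyc_bytes(data))


names, strings = demo()
print(sorted(names))
```

For an instruction-level listing, the recovered code object can also be passed to dis.dis from the standard library.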
Most security tools also typically run only source code analysis when inspecting packages, which is "why malware hidden inside the Python compiled byte code could slip under the radar of the traditional security solutions," according to Zanki.
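One cheap complement to source-only scanning is to flag compiled files that ship without any matching source. The following is a rough sketch under that assumption, not a description of any vendor's product; the function name and the directory layout in the demo are invented for illustration.

```python
import tempfile
from pathlib import Path


def find_orphan_pyc(root):
    """Return .pyc files under `root` that have no matching .py source file."""
    orphans = []
    for pyc in sorted(Path(root).rglob("*.pyc")):
        if pyc.parent.name == "__pycache__":
            # Cached form: pkg/__pycache__/mod.cpython-311.pyc maps to pkg/mod.py
            source = pyc.parent.parent / (pyc.name.split(".")[0] + ".py")
        else:
            source = pyc.with_suffix(".py")
        if not source.exists():
            orphans.append(pyc)
    return orphans


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "main.py").write_text("print('ok')\n")
        (root / "__pycache__").mkdir()
        (root / "__pycache__" / "main.cpython-311.pyc").write_bytes(b"")
        (root / "full.pyc").write_bytes(b"")  # compiled file with no source
        return [p.name for p in find_orphan_pyc(root)]


print(demo())
```

Flagged files would still need the kind of bytecode inspection described earlier, since a sourceless .pyc is unusual rather than proof of malice.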
Repositories under attack
PyPI, GitHub, and other repositories have been under steady attack for years. Last month, PyPI – which hosts more than 455,000 Python projects – saw so many attempts to create malicious accounts and code libraries that it stopped allowing new users and projects in for a while.
That said, PyPI is working to harden its security. Recent moves include removing automatic PGP signature support, announcing Amazon Web Services as the group's security sponsor – including a $144,000 investment to fund security projects – and creating a security engineer role.
Most recently, the organization said it is making two-factor authentication a requirement for all accounts by the end of the year. The group first talked about this in 2019 and last year made it mandatory for projects in the top 1 percent of downloads.
Other repositories are doing the same. In March, GitHub rolled out 2FA requirements for developers who contribute to public projects. Last year, it made 2FA mandatory for the maintainers of the top 100 npm packages, and later in 2022 expanded the requirement to cover all maintainers of packages with more than a million weekly downloads or more than 500 dependents.
RubyGems last year started requiring multi-factor authentication for owners of packages with more than 180 million downloads. ®