Warning: JavaScript registry npm vulnerable to 'manifest confusion' abuse
Failure to match metadata with packaged files is perfect for supply chain attacks
The npm Public Registry, a database of JavaScript packages, fails to compare npm package manifest data with the archive of files that data describes, creating an opportunity for the installation and execution of malicious files.
In a blog post published on Tuesday, Darcy Clarke, who was staff engineering manager for the npm CLI (command line interface) team from July 2019 through December 2022, calls this "manifest confusion" and says it represents a potential software supply chain vulnerability.
"The npm Public Registry does not validate manifest information with the contents of the package tarball, relying instead on npm-compatible clients to interpret and enforce validation/consistency," Clarke explains.
Clarke is not an entirely disinterested party with regard to npm. He's developing an alternative JavaScript registry and package manager called vlt.
According to Clarke, the npm Public Registry server has never done manifest validation. It's an issue that has the potential to affect a lot of developers – npm, acquired by Microsoft's GitHub in 2020, is used by more than 17 million developers and hosts more than three million packages. Last month, it served over 215 billion downloads.
The registry.npmjs.com endpoint, Clarke says, will let registered developers publish packages using a PUT request to the appropriate URI.
"The issue at hand is that the version metadata (a.k.a. 'manifest data') is submitted independent from the attached tarball which houses the package's package.json," he explains. "These two pieces of information are never validated against one another and [this] calls into question which one should be the canonical source of truth for data such as dependencies, scripts, license, and more."
The tarball – a compressed archive of files – gets signed, but the name and version fields declared in the package.json file can be different from the name and version fields in the manifest because they're not validated.
This lack of validation presents several risks, Clarke says, including cache poisoning, the installation of unanticipated dependencies, the execution of unanticipated scripts, and version downgrade attacks.
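To illustrate the kind of consistency check Clarke says the registry does not perform, here is a minimal Python sketch that compares a version's registry manifest with the package.json taken from its tarball. The function name, the list of compared fields, and the sample data are our own assumptions for illustration, not npm's code or API.

```python
from __future__ import annotations

from typing import Any

# Fields Clarke names as affected; treating exactly these as the set to
# compare is an assumption made for this sketch.
COMPARED_FIELDS = ("name", "version", "dependencies", "scripts", "license")


def find_manifest_mismatches(
    registry_manifest: dict[str, Any],
    tarball_package_json: dict[str, Any],
    fields: tuple[str, ...] = COMPARED_FIELDS,
) -> dict[str, tuple[Any, Any]]:
    """Return {field: (registry value, tarball value)} for fields that differ."""
    mismatches: dict[str, tuple[Any, Any]] = {}
    for field in fields:
        registry_value = registry_manifest.get(field)
        tarball_value = tarball_package_json.get(field)
        if registry_value != tarball_value:
            mismatches[field] = (registry_value, tarball_value)
    return mismatches


# Hypothetical data: the manifest lists no dependencies, the tarball does.
manifest = {"name": "example-pkg", "version": "1.0.0", "dependencies": {}}
packaged = {
    "name": "example-pkg",
    "version": "1.0.0",
    "dependencies": {"hypothetical-extra": "1.0.0"},
}
print(find_manifest_mismatches(manifest, packaged))
# {'dependencies': ({}, {'hypothetical-extra': '1.0.0'})}
```

Real registry manifests carry extra fields and may be normalized differently from the packaged file, so a production check would need something more careful than plain equality.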
The problem came up in a bug report last year, though we have no doubt others spotted it earlier.
According to that report, the published package @datadog/native-metrics declared an install script but the attached tarball of files included a package.json file without an install script. While this wasn't a security issue, it could have been.
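A mismatch like the one in that report can be spotted by reading the package.json out of the tarball and comparing its install-time scripts with those in the manifest. The Python sketch below does this under stated assumptions: it expects a gzipped tarball with package.json one directory below the archive root (the layout npm tarballs conventionally use), and every function name and sample value is hypothetical.

```python
import io
import json
import tarfile

# Script names npm runs around installation.
LIFECYCLE_SCRIPTS = ("preinstall", "install", "postinstall")


def read_package_json(tarball_bytes: bytes) -> dict:
    """Parse the package.json found one level below the archive root."""
    with tarfile.open(fileobj=io.BytesIO(tarball_bytes), mode="r:gz") as archive:
        for member in archive.getmembers():
            parts = member.name.split("/")
            if member.isfile() and len(parts) == 2 and parts[1] == "package.json":
                return json.loads(archive.extractfile(member).read().decode("utf-8"))
    raise FileNotFoundError("no top-level package.json in tarball")


def install_script_differences(manifest: dict, tarball_bytes: bytes) -> dict:
    """Return {script: (manifest value, tarball value)} for scripts that differ."""
    manifest_scripts = manifest.get("scripts") or {}
    tarball_scripts = read_package_json(tarball_bytes).get("scripts") or {}
    return {
        name: (manifest_scripts.get(name), tarball_scripts.get(name))
        for name in LIFECYCLE_SCRIPTS
        if manifest_scripts.get(name) != tarball_scripts.get(name)
    }


def build_tarball(package_json: dict) -> bytes:
    """Demo helper: pack a package.json into an in-memory .tgz."""
    payload = json.dumps(package_json).encode("utf-8")
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        info = tarfile.TarInfo(name="package/package.json")
        info.size = len(payload)
        archive.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical case: the manifest declares an install script, the tarball does not.
    manifest = {"name": "example-pkg", "version": "1.0.0",
                "scripts": {"install": "node-gyp rebuild"}}
    tarball = build_tarball({"name": "example-pkg", "version": "1.0.0"})
    print(install_script_differences(manifest, tarball))
```

The sketch works on bytes already in memory. Fetching the manifest and tarball from a registry is deliberately left out.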
Asked whether lack of resources for npm development under GitHub led to this state of affairs, Clarke told The Register that while he believes GitHub underinvested in npm, "I think this issue actually went unnoticed for so long because of the horrible lack of up-to-date registry documentation."
"Many consumers don't interact directly with the registry interface so they only know what the developer tools/package managers say about the published packages," he explained.
"I also think the initial reason this came to pass was because npm, in its infancy, had both the client and registry open sourced."
The Register understands that the npm Public Registry hasn't been fully open source since early 2014, about four years after its initial release. Clarke's suggestion is that since then, npm registry code hasn't received as much attention as it might have otherwise.
The potential for "manifest confusion," said Clarke, also affects various third-party tools and JavaScript package managers, though under different circumstances.
"The key point to make here is that the ecosystem is currently under the incorrect assumption that the manifest always contains the contents of the tarball's package.json," said Clarke, who again pointed to the lack of documentation about the need for npm client software to ensure manifest-tarball consistency.
In an email to The Register, Feross Aboukhadijeh, CEO of security biz Socket, said the issue raised by Darcy Clarke is valid and relevant to nearly all package managers and security tools in the space, with the exception of Socket, natch.
"The tldr of this issue is that it lets an attacker include a dependency in a package that won’t show up on the npm website, even though the CLI will actually install it," said Aboukhadijeh.
"The Socket research team independently discovered this so-called 'manifest confusion' issue and deployed a fix for it on September 5, 2022. Since that date, all dependency analysis on Socket has been using the correct manifest file – specifically, the package.json inside the tarball – which matches the installation behavior of every major package manager. That means that the 'manifest confusion' technique would not successfully hide dependencies from Socket’s analysis.
"However, public package pages on Socket, such as this page for left-pad, were using a different data source based on the registry metadata. We’ve resolved this issue today.
"Furthermore, we were already in the process of developing a new proactive detection for this technique as of last week, and we're rolling it out today. This means that any organization using Socket will receive a critical security alert if one of their dependencies attempts to use this technique in the wild (which is probably quite likely now that this technique is public)."
Aboukhadijeh said the broader issue of data quality in tooling should be considered because most software composition analysis (SCA) tools don't do a very good job generating accurate dependency graphs.
"Without throwing any specific security vendors under the bus, I’ll just say that every one of the dependency tools I’ve tested misses entire dependencies because of shortcuts taken, and a fundamental failure to understand the npm package installation process," he said.
"It’s like most security vendors just get to a 'minimum viable product' and ship it. For that reason, I’m grateful to Darcy for raising awareness of this issue."
GitHub did not respond to a request for comment. Socket published more information for developers about the manifest confusion issue today. ®