Sneaky Python package security fixes help no one – except miscreants
Good thing these eggheads have created a database of patches
Python security fixes often happen through "silent" code commits, without an associated Common Vulnerabilities and Exposures (CVE) identifier, according to a group of computer security researchers.
That's not ideal, they say, because attackers love to exploit undisclosed vulnerabilities in unpatched systems and because developers who are not security experts may not recognize that an upstream commit is targeting an exploitable flaw that's relevant to their code.
Ergo, a Python package could have a serious hole in it, yet application developers may not realize this because there's little or no announcement about it, and so may not incorporate a patched version into their code. Miscreants can then make the most of this by exploiting those unpublicized vulnerabilities.
In a preprint paper titled "Exploring Security Commits in Python," Shiyu Sun, Shu Wang, Xinda Wang, Yunlong Xing, and Kun Sun from George Mason University, and Elisa Zhang from Dougherty Valley High School, all in the United States, propose a remedy: a database of security commits called PySecDB to make Python code repairs more visible to the community.
"Since the CVE records on Python programs are limited, we observe that only 46 percent of them provide the corresponding security commits and more security commits fall in the wild silently, without being indexed by CVE," the group concluded in their paper, which was accepted for the 2023 ICSME conference.
PySecDB has three parts: a base dataset, a pilot dataset, and an augmented dataset. The base dataset consists of security commits associated with CVE identifiers. For example, the record for CVE-2021-27213 includes a link to the actual code change in the relevant project's GitHub repo, a fix for CWE-502, Deserialization of Untrusted Data.
The pilot dataset comes from identifying GitHub commit messages in Python projects that contain relevant keywords.
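As a rough illustration of what keyword-based filtering of commit messages can look like, here is a minimal Python sketch. The keyword list, the function names, and the sample commits are all hypothetical; the article does not say which keywords the researchers used, and this is not their code.

```python
import re

# Hypothetical keyword list for illustration only; the keywords used
# by the paper's authors are not given in this article.
SECURITY_KEYWORDS = (
    "vulnerab", "security", "cve-", "injection",
    "xss", "overflow", "sanitize", "traversal",
)

# One case-insensitive pattern that matches any of the keywords.
_PATTERN = re.compile(
    "|".join(re.escape(k) for k in SECURITY_KEYWORDS), re.IGNORECASE
)


def is_candidate_security_commit(message: str) -> bool:
    """Return True if the commit message contains a security keyword."""
    return _PATTERN.search(message) is not None


def filter_candidates(commits):
    """Given (sha, message) pairs, return the SHAs whose message matches."""
    return [sha for sha, msg in commits if is_candidate_security_commit(msg)]


# Made-up sample commits.
commits = [
    ("a1b2c3", "Fix path traversal in archive extraction"),
    ("d4e5f6", "Bump version to 2.0.1"),
    ("0718aa", "Sanitize user input before rendering template"),
]

print(filter_candidates(commits))
```

A filter like this only yields candidates: a message can mention "security" without fixing a flaw, and a real fix can land with a bland message, so matches still need checking.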
And the augmented dataset, designed to catch security commits without telltale commit messages, comes from a graph neural network model called SCOPY that spots security-related code changes through the sequence and structure of code semantics.
Together, these form PySecDB, which the academics say represents the first security commit dataset in Python. It contains 1,258 security commits and 2,791 non-security commits culled from more than 351 popular GitHub projects, covering 119 more CWEs.
By compiling PySecDB, the paper authors noticed four common security fix patterns, which they say can be generalized and turned into intermediate representations for use in automated program repair. These patterns include: adding or updating sanity checks; revising API usage; updating regular expressions; and restricting security properties.
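To make the first of those patterns concrete, here is a small Python sketch of a fix that adds a sanity check. The function names and the file-upload scenario are invented for illustration; this is not a commit from PySecDB, just one common shape such a patch can take.

```python
from pathlib import Path


def resolve_upload_path_unpatched(base_dir: str, filename: str) -> Path:
    """Before the fix: joins the name blindly, so '../' can escape base_dir."""
    return (Path(base_dir) / filename).resolve()


def resolve_upload_path(base_dir: str, filename: str) -> Path:
    """After the fix: same logic plus a sanity check on the resolved path."""
    base = Path(base_dir).resolve()
    target = (base / filename).resolve()
    # Added sanity check: the resolved target must sit inside base.
    if base not in target.parents:
        raise ValueError(f"refusing path outside {base}: {filename!r}")
    return target


# A benign name is accepted; a traversal attempt is rejected.
print(resolve_upload_path("/tmp/uploads", "report.txt").name)
try:
    resolve_upload_path("/tmp/uploads", "../../etc/passwd")
except ValueError as err:
    print("rejected:", err)
```

The patched version differs from the original by only a few lines and nothing about it has to mention security, which is why such commits are easy to miss when the message is not explicit.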
The boffins caution that their SCOPY model has the potential to identify undisclosed vulnerability fixes, which, while helpful, could also enable an attacker to find flaws in unpatched systems.
"Our objective in this paper is to prioritize the security of the users’ systems; that is why we only share detailed information on the security fixes, rather than the vulnerabilities," they state in their paper. "By taking this approach, attackers cannot leverage the SCOPY to gain additional details on the vulnerabilities. However, with the SCOPY, open-source software maintainers can quickly reveal vulnerabilities as soon as security fixes become public, improving the overall security of their software systems."
Dr. Kun Sun, a professor in the Department of Information Sciences and Technology at George Mason University and a co-author of the paper, told The Register in an email that one reason so many Python vulnerabilities are addressed silently is that "It is too complicated to get a CVE-ID for a Python vulnerability." He added that "developers may consider the vulnerability as a performance bug."
To improve the security situation, Sun argues for increasing the awareness of silent security patches, creating guidance to help developers identify and label vulnerabilities, and applying tools to spot silent security patches.
Seth Michael Larson, security developer-in-residence at the Python Software Foundation, told The Register that while silent security patches have some impact on security, he suspects that serious flaws with significant impact are being appropriately recorded in CVE notices.
"Right now there's a variety of reasons there may be a discrepancy between security fixes and CVEs like lack of time and resources for open source maintainers or mismatches between an automatically annotated security fix and a projects' security model which typically can't be processed automatically," Larson explained.
"From the perspective of software producers: what I'm seeing now is that there's a general 'lowering of barriers' for projects wanting to adopt a disclosure policy, to publish advisories, and have CVE IDs allocated for vulnerabilities. This means there will be more CVEs issued for security vulnerabilities and fixes in the future."
"To that end in my own role: I'm working on registering the PSF as a CVE Numbering Authority (CNA) and will be publishing materials for other open source organizations or projects looking to manage their own CVEs and advisories and how to offer those benefits to projects in their scope."
PySecDB is available on request from Sun Security Laboratory at George Mason University, for non-commercial research or personal use. ®