Attacking multicore CPUs
Get exploited on time
The world of multicore CPUs we have just entered is facing a serious threat.
A security researcher at Cambridge has disclosed a new class of vulnerabilities that takes advantage of concurrency to bypass security protections such as antivirus software.
The attack exploits the assumption that software interacting with the kernel can do so without interference. The researcher, Robert Watson, showed that a carefully written exploit can strike in the short window during which this exchange happens, and literally change the "words" that the two sides are exchanging.
Even if some of these dark aspects of concurrency were already known, Watson proved that real attacks can be developed, and showed that developers have to fix their code. Fast.
Watson presented the results of his research, entitled "Exploiting Concurrency Vulnerabilities in System Call Wrappers", at WOOT07, the USENIX Workshop on Offensive Technologies.
During the talk he showed how concurrency can be used to bypass security protections applied by so-called syscall wrappers.
A system call, or syscall for short, is a basic function in the kernel that a program calls. For example, when you open a file, the software you are using has very probably called the open() syscall to do it.
A syscall wrapper sits between the kernel and the program, and inspects which syscalls are made and with what arguments. A security wrapper might be configured to block access to certain files, so if the program in the previous example tried to open() the file "secrets.txt", the wrapper could stop the operation and return an error to the application.
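To make the idea concrete, here is a minimal user-space sketch of what such a wrapper does. It is an illustration only: real wrappers run inside the kernel, and the names policy_allows(), fake_kernel_open() and wrapped_open(), along with the policy itself, are hypothetical and not taken from any of the systems discussed.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical policy: deny any path whose last component is "secrets.txt". */
static int policy_allows(const char *path)
{
    const char *base = strrchr(path, '/');
    base = base ? base + 1 : path;
    return strcmp(base, "secrets.txt") != 0;
}

/* Stand-in for the real open() syscall; always "succeeds" with descriptor 3. */
static int fake_kernel_open(const char *path)
{
    (void)path;
    return 3;
}

/* The wrapper: inspect the argument first, then pass the call through. */
int wrapped_open(const char *path)
{
    assert(path != NULL);
    if (!policy_allows(path)) {
        errno = EACCES;   /* report "permission denied" to the application */
        return -1;
    }
    return fake_kernel_open(path);
}
```

Calling wrapped_open("notes.txt") passes through to the stand-in kernel function, while wrapped_open("/home/user/secrets.txt") is refused with EACCES.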
We contacted Robert to learn more...
How does the attack work?
System call wrapping is a widely-used technique for extending kernel security, found in anti-virus systems and security policy enhancement frameworks such as the GSWTK, Systrace, and CerbNG systems I examine in the paper. System call interposition allows code running in the kernel address space to "wrap" system calls, adding new security checks, replacing the values of arguments to virtualize name spaces, or to audit arguments for the purposes of logging or intrusion detection. It's a very flexible technique, and appealing to software authors because it doesn't require changing existing kernel code, and allows control at the very well-understood system call interface.
This attack targets a weakness in the system call wrapper architecture, in which system call arguments are separately copied by the system call wrapper and the kernel, allowing the attacker to "race" to replace the argument values between copies.
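The double copy can be modeled in a few lines of ordinary C. The sketch below is a deterministic simulation, not one of Watson's exploits: the "race" is an explicit callback invoked between the two copies, whereas a real attacker would use a second thread or process. All names (user_buf, set_user_path(), swap_argument(), vulnerable_open()) and the toy policy are assumptions made for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define DEMO_PATH_MAX 64

/* Memory owned by the "attacker process"; it can be rewritten at any time. */
static char user_buf[DEMO_PATH_MAX];

void set_user_path(const char *path)
{
    snprintf(user_buf, sizeof user_buf, "%s", path);
}

/* What the attacker does if it wins the race: swap in the forbidden path. */
void swap_argument(void)
{
    set_user_path("/etc/secrets.txt");
}

/* Hypothetical policy: refuse any path containing "secrets". */
static int policy_allows(const char *path)
{
    return strstr(path, "secrets") == NULL;
}

/*
 * Model of a wrapped open(). Returns -1 if the wrapper rejects the call,
 * otherwise 0, with the path the "kernel" actually used stored in opened.
 * The between callback stands in for the race window.
 */
int vulnerable_open(void (*between)(void), char *opened, size_t n)
{
    char wrapper_copy[DEMO_PATH_MAX];

    assert(opened != NULL && n > 0);

    /* First copy: the wrapper fetches the argument and checks it. */
    snprintf(wrapper_copy, sizeof wrapper_copy, "%s", user_buf);
    if (!policy_allows(wrapper_copy))
        return -1;

    /* Race window: user memory may change here. */
    if (between != NULL)
        between();

    /* Second copy: the kernel fetches the argument again from user memory. */
    snprintf(opened, n, "%s", user_buf);
    return 0;
}
```

With no callback, a harmless path is checked and opened unchanged. With swap_argument as the callback, the wrapper approves "/tmp/harmless.txt" but the second copy picks up "/etc/secrets.txt", which is exactly the gap the attack relies on.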
I was able to successfully bypass security in many system call wrappers by creating unmanaged concurrency between the attacking processes and the wrapper/kernel. This was possible on both uniprocessor systems and multiprocessor systems.
The existence of some of these vulnerabilities has been known for years (Ghormley 1998, Garfinkel 2003, Watson 2003), and I approached the authors of many of these wrapper systems as early as 2002 to report the problems. The contribution of this paper is in analyzing the vulnerability class, thoroughly exploring the attack space (I identify two previously undiscussed classes of race conditions, one of which is more broadly applicable), and exploring exploit strategies, allowing us to reason about the effectiveness of this attack approach. It turns out that the approach is very effective indeed.
The paper [PDF] provides both a detailed discussion of the general class of concurrency vulnerabilities, and more concrete discussion of these specific vulnerabilities. I'd refer readers especially to the pictures and code in the slides [PDF] associated with the talk, which should make both the attack approach and simplicity of the exploits clear. In less than 20 lines of C code, and using only standard OS calls for memory access and management, the wrapper protections were completely disabled.
What is needed to succeed?
When I started working on this project, I was sure that the vulnerabilities could be exploited easily on multiprocessor systems, but didn't know to what extent uniprocessor systems would be susceptible. I was also unsure of the software requirements -- were threads required, etc. As it turns out, the attacks are broadly applicable, working on uniprocessor OSes without threading. The attacker needs to be able to run code in a local process constrained by a system call wrapper, which he (or she) will then be able to bypass with relative ease.
On multiprocessor systems, we measure the size of the race window in cycles, and I found that the width of the race varied enormously by wrapper system. Most of the wrapper systems I looked at were kernel-only, so 30,000 cycles might not be an unusual length. However, Systrace performs control in user space, leading to race conditions of 500,000 cycles or more due to context switching. In the end, the size in cycles doesn't make much difference, as both of those numbers are very large compared to the cost of local memory access.
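A rough back-of-the-envelope calculation shows why the exact window size matters so little. The sketch below assumes a cost of 200 cycles per memory write, a deliberately pessimistic figure chosen for illustration and not one given by Watson; the function name writes_in_window() is likewise hypothetical.

```c
#include <assert.h>

/*
 * How many attacker memory writes fit inside a race window?
 * cycles_per_write is an assumed cost, not a measured one.
 */
unsigned long writes_in_window(unsigned long window_cycles,
                               unsigned long cycles_per_write)
{
    assert(cycles_per_write > 0);
    return window_cycles / cycles_per_write;
}
```

Under that assumption, writes_in_window(30000, 200) gives 150 and writes_in_window(500000, 200) gives 2,500. The attacker needs only one well-placed write, so either window leaves ample room.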
On uniprocessor systems, creating concurrency between the kernel and user space may be done using page faults, introduced where the kernel accesses user memory that has been paged to disk due to memory pressure. They can also be introduced through network delays or other IPC, which cause the kernel to yield. The key is that the user process is able to execute during critical windows between access to a system call argument by a wrapper and the kernel -- this turns out to be quite straightforward.
Could it be used in a remote exploit? Or does it require timing too precise to survive common internet latency?
These specific attacks require the attacker to be able to control a process on the system -- either legitimately (perhaps they have an unprivileged user account) or less legitimately (they have exploited a vulnerability in a service, such as Apache, BIND, MySQL, etc., to gain execution privilege). The attacker will then be able to escape from a sandbox placed around their user process or vulnerable service, gaining access to the remainder of the system.
The details vary based on the intended effects of the wrapper. For one GSWTK wrapper, I show how to bypass intrusion detection when exploiting a vulnerable IMAP daemon, preventing alarms from firing despite accessing files outside the expected execution profile of an IMAP daemon. For Sysjail, I show that access control limits on what IP address can be bound may be entirely bypassed. For Sudo monitor mode, I am able to prevent the arguments to commands from being properly audited.
How much does the hardware platform affect the attack?
Multiprocessor systems are marginally easier to exploit since they do not require forcing kernel context switches via paging or other techniques. However, I was able to successfully bypass the same wrappers on uniprocessor systems. I did my experimental work on Intel hardware, but the attacks should work across a range of hardware architectures and configurations.
And what about the OS?
These attack techniques target an architectural vulnerability in the wrapper approach, and readily apply across operating systems and hardware platforms. I was able to use the same C language exploits across several operating systems, including Linux, FreeBSD, NetBSD, and OpenBSD. They should apply equally well on other operating systems.
Is it something that might affect software written in any programming language?
The broader class of concurrency vulnerabilities is relevant to all concurrent systems, and is something all software developers need to be aware of. These specific races require shared memory between the two parties (processes and kernel/system call wrapper), so vulnerable software would necessarily involve shared memory between two mutually untrusting processes. You might find this construction in cases where server and client processes share memory in order to optimize inter-process communication, such as between databases and clients or in windowing systems.
While more rich language systems, such as scripting languages, often introduce opacity in memory access, in practice they behave fairly predictably and must do so to use shared memory. If languages support shared memory, improperly written programs might well be vulnerable. Likewise, they might well support attacks against system call wrappers using the techniques I've described.
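For programs that do share memory with an untrusted peer, one common defensive pattern is to copy the data once into private memory and then validate and use only that copy. The sketch below illustrates the pattern under assumed names; struct shared_msg and snapshot_msg() are hypothetical, and this is not a fix proposed in Watson's paper.

```c
#include <assert.h>
#include <stddef.h>

#define MSG_MAX 32

/* Hypothetical message layout placed in memory shared with an untrusted peer. */
struct shared_msg {
    size_t len;             /* number of valid bytes in data */
    char   data[MSG_MAX];
};

/*
 * Copy the message once into private memory and validate only the copy.
 * Returns 0 on success, -1 if the length read from shared memory is invalid.
 * The volatile qualifier keeps the compiler from re-reading shm->len.
 */
int snapshot_msg(const volatile struct shared_msg *shm, struct shared_msg *priv)
{
    assert(shm != NULL && priv != NULL);

    size_t len = shm->len;          /* single fetch of the length */
    if (len > MSG_MAX)
        return -1;

    for (size_t i = 0; i < len; i++)
        priv->data[i] = shm->data[i];
    priv->len = len;
    return 0;
}
```

After snapshot_msg() returns, the peer can rewrite the shared region as much as it likes; the private copy that was checked is the one that gets used, so there is no second fetch to race against.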
Robert Watson has been actively involved with FreeBSD since 1999 and started the TrustedBSD Project in 2000, with the goal of bringing more advanced security features to the platform. In October 2005, he returned to academia to work on a PhD at the University of Cambridge Computer Laboratory, after spending about six years in industry working in commercial and government-sponsored operating system and network security research and development. ®