This article is more than 1 year old
Attacking multicore CPUs
Get exploited on time
What is needed to succeed?
When I started working on this project, I was sure that the vulnerabilities could be exploited easily on multiprocessor systems, but didn't know to what extent uniprocessor systems would be susceptible. I was also unsure of the software requirements -- were threads required, etc. As it turns out, the attacks are broadly applicable, working on unprocessor OS's without threading. The attacker needs to be able to run code in a local process constrained by a system call wrapper, which he (or she) will then be able to bypass with relative ease.
On multiprocessor systems, we measure the size of the race window in cycles, and I found that the width of the race varied enourmously by wrapper system. Most of the wrapper systems I looked at were kernel-only, so 30,000 cycles might not be an unusual length. However, Systrace performs control in user space, leading to race conditions of 500,000 cycles or more due to context switching. In the end, the size in cycles doesn't make much difference, as both of those numbers are very large compared to the cost of local memory access.
On uniprocessor systems, creating concurrency between the kernel and user space may be done using page faults, introduced where the kernel accesses user memory that has been paged to disk due to memory pressure. They can also be introduced through network delays or other IPC, which cause the kernel to yield. The key is that the user process is able to execute during critical windows between access to a system call argument by a wrapper and the kernel -- this turns out to be quite straight forward.
Could it be used in a remote exploit? Or it requires too short/precise timing to work with common internet latency?
These specific attacks require the attacker to be able to control a process on the system -- either legitimately (perhaps they have an unprivileged user account) or less legitimately (they have exploited a vulnerability in a service, such as Apache, BIND, MySQL, etc to gain execution privilege). The attacker will then be able to escape from a sandbox placed around their user process or vulnerable service, gaining access to the remainder of the system.
The details vary based on the intended effects of the wrapper. For one GSWTK wrapper, I show how to bypass intrusion detection when exploiting a vulnerable IMAP daemon, preventing alarms from firing despite accessing files outside the expected execution profile of an IMAP daemon. For Sysjail, I show that access control limits on what IP address can be bound may be entirely bypassed. For Sudo monitor mode, I am able to prevent the arguments to commands from being properly audited.
How much does the hardware platform affect the attack?
Multiprocessor systems are marginally easier to exploit since they do not require forcing kernel context switches via paging or other techniques. However, I was able to successfully bypass the same wrappers on uniprocessor systems. I did my experimental work on Intel hardware, but they should work across a range of hardware architectures and configurations.
And what about the OS?
These attack techniques target an architectural vulnerability in the wrapper approach, and readily apply across operating systems and hardware platforms. I was able to use the same C language exploits across several operating systems, including Linux, FreeBSD, NetBSD, and OpenBSD. They should apply equally well on other operating systems.
Is it something that might affect software written in any programming language?
The broader class of concurrency vulnerabilities are relevant to all concurrent systems, and are something all software developers need to be aware of. These specific races require shared memory between the two parties (processes and kernel/system call wrapper), so vulnerable software would necessarily involve shared memory between two mutually untrusting processes. You might find this construction in cases where server and client processes share memory in order to optimize inter-process communication, such as between databases and clients or in windowing systems.
While more rich language systems, such as scripting languages, often introduce opacity in memory access, in practice they behave fairly predictably and must do so to use shared memory. If languages support shared memory, improperly written programs might well be vulnerable. Likewise, they might well support attacks against system call wrappers using the techniques I've described.
Robert Watson has been actively involved with FreeBSD since 1999 and started the TrustedBSD Project in 2000, with the goal of bringing more advanced security features to the platform. As of October, 2005, he returned to Academia to work on a PhD at the University of Cambridge Computer Laboratory, after spending about six years in industry working in commercial and government-sponsored operating system and network security research and development. ®