Intel is pushing a neat technique that could block malware infections on computers at the processor level.
That's the 40,000ft view of the new safety mechanism, the details of which were published on Thursday. What's really going on is this: Intel's so-called Control-flow Enforcement Technology (CET) [PDF] attempts to thwart exploit code that uses return-orientated programming (ROP) and jump-orientated programming (JOP).
CET works by introducing a shadow stack – which only contains return addresses, is held in system RAM, and is protected by the CPU's memory management unit. When a subroutine is called, the return address is stashed on the thread's stack, as per normal, and also in the shadow stack. When the processor reaches a return instruction, the processor ensures the return address on the thread stack matches the address on the shadow stack.
If they don't match, then an exception is raised, allowing the operating system to catch and stop execution. Therefore, if exploit code starts tampering with the stack to chain together malicious instructions to install malware or otherwise compromise a system, these alterations will be detected and the infiltration halted before any damage can be done.
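To make the call-and-return bookkeeping concrete, here is a minimal sketch in Python of the check described above. It is a toy model, not Intel's implementation: the class and exception names are invented for illustration, and the addresses are made up.

```python
class ShadowStackMismatch(Exception):
    """Hypothetical stand-in for the exception raised when the stacks disagree."""


class ToyCpu:
    def __init__(self):
        self.stack = []         # ordinary thread stack, writable by the program
        self.shadow_stack = []  # return addresses only, off limits to the program

    def call(self, return_address):
        # A call stashes the return address on both stacks.
        self.stack.append(return_address)
        self.shadow_stack.append(return_address)

    def ret(self):
        # A return pops both copies and compares them before jumping.
        address = self.stack.pop()
        expected = self.shadow_stack.pop()
        if address != expected:
            raise ShadowStackMismatch(
                f"return to {address:#x}, shadow stack says {expected:#x}"
            )
        return address


cpu = ToyCpu()
cpu.call(0x401000)          # made-up return address
cpu.stack[-1] = 0xDEADBEEF  # exploit code overwrites the copy it can reach
try:
    cpu.ret()
except ShadowStackMismatch as fault:
    print("blocked:", fault)
```

The exploit can scribble on the ordinary stack all it likes; the return only goes ahead if the untouched copy agrees.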
The shadow stack must sit in memory that has a new shadow stack bit set in the page tables. Any attempts by software to access the shadow stack – such as with a MOV instruction – are blocked by the memory management unit and a page fault is raised to alert the operating system that shenanigans are afoot. Any attempt to use a control-flow instruction – such as RET – when the memory the shadow stack pointer refers to is not marked as a shadow stack in the page tables will also raise a page fault.
The shadow stack pointer (SSP) for the running thread is stored in the task state segment. There are various control registers that hold SSPs for privilege rings 0 to 2 (non-usermode rings) and for interrupts. Read the above PDF if you're interested in the detail – this is only scratching the surface.
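The two page-table rules can be sketched as a pair of checks. This is a toy model under loud assumptions: the page numbers, function names and the PageFault class are all hypothetical, and real hardware performs these checks inside the memory management unit rather than in software.

```python
class PageFault(Exception):
    """Hypothetical stand-in for the page fault delivered to the operating system."""


PAGE_SIZE = 4096

# Made-up page table: page number -> is the shadow stack bit set?
shadow_stack_bit = {
    0x7F000: True,   # page holding the shadow stack
    0x7FFFF: False,  # page holding the ordinary thread stack
}


def is_shadow_page(address):
    # Unmapped pages are treated as ordinary memory in this toy model.
    return shadow_stack_bit.get(address // PAGE_SIZE, False)


def mov_store(address):
    # Ordinary software access, such as MOV, may not touch shadow stack pages.
    if is_shadow_page(address):
        raise PageFault(f"software write to shadow stack at {address:#x}")


def ret_check(ssp):
    # Control-flow instructions need the SSP to point into a shadow stack page.
    if not is_shadow_page(ssp):
        raise PageFault(f"SSP {ssp:#x} is not in a shadow stack page")


mov_store(0x7FFFF010)      # fine: ordinary stack memory
ret_check(0x7F000008)      # fine: SSP points at a shadow stack page
try:
    mov_store(0x7F000008)  # software pokes at the shadow stack
except PageFault as fault:
    print("page fault:", fault)
```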
Hold up, what does this stop exactly?
Once upon a time, you could – for example – find a memory buffer in some software and inject more data into it than the array could hold, thus spilling your extra bytes over other variables and pointers. Eventually you could smash the return address on the stack and make it point to a payload of malicious code you smuggled into the gatecrashing data. When the running function returned, the processor wouldn't jump back to somewhere legitimate in the software; instead it would jump to wherever you'd defined in the overwritten stack – ie: your malicious payload.
Voila, deliver this over a network, and you've gained arbitrary code execution in someone else's system. Their box is now your box.
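The classic overflow can be modelled with a byte array standing in for a stack frame. This is only a sketch: the frame layout, the buffer size and both addresses are invented for illustration, and real frames hold more than a buffer and a return address.

```python
import struct

BUFFER_SIZE = 16           # bytes reserved for the local buffer (made-up size)
LEGIT_RETURN = 0x08048ABC  # hypothetical address inside the program
PAYLOAD_ADDR = 0xBFFFF000  # hypothetical address of the attacker's bytes


def make_frame():
    # Toy frame layout: [ 16-byte buffer ][ 4-byte little-endian return address ]
    return bytearray(BUFFER_SIZE) + bytearray(struct.pack("<I", LEGIT_RETURN))


def unchecked_copy(frame, data):
    # Copies with no length check, like the buggy code being described.
    frame[0:len(data)] = data


def return_address(frame):
    return struct.unpack_from("<I", frame, BUFFER_SIZE)[0]


frame = make_frame()
unchecked_copy(frame, b"A" * BUFFER_SIZE + struct.pack("<I", PAYLOAD_ADDR))
print(hex(return_address(frame)))  # 0xbffff000
```

Sixteen bytes of filler use up the buffer, and the next four land squarely on the return address.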
Then operating systems and processors began implementing mechanisms to prevent this. The stack is stored in memory marked in the page tables as data, not executable code. It is therefore easy to trap these sorts of attack before any damage can be done: if the processor starts trying to execute code stored in the non-executable, data-only stack, an exception will be raised. That's the NX – no-execute – bit in the page tables; Intel, AMD, ARM etc have slightly different official names for the bit.
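As a rough illustration of the no-execute check, the sketch below looks up a page's permissions before allowing an instruction fetch. The page table contents, addresses and names are assumptions made up for this example, not any real operating system's data structures.

```python
class NoExecuteFault(Exception):
    """Hypothetical stand-in for the exception raised on a fetch from an NX page."""


PAGE_SIZE = 4096

# Made-up page table: page number -> permissions on that page
page_table = {
    0x08048: {"read", "execute"},  # program code
    0xBFFFF: {"read", "write"},    # stack: data only, so no execute
}


def fetch_instruction(address):
    # The processor refuses to fetch code from a page lacking execute permission.
    permissions = page_table.get(address // PAGE_SIZE, set())
    if "execute" not in permissions:
        raise NoExecuteFault(f"fetch from non-executable page at {address:#x}")
    return address


fetch_instruction(0x08048ABC)      # legitimate code runs as normal
try:
    fetch_instruction(0xBFFFF000)  # return address smashed to point at the stack
except NoExecuteFault as fault:
    print("trapped:", fault)
```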
Now, here comes the fun part: return-orientated programming (ROP). Essentially, you still overwrite the stack and populate it with values of your choosing, but you do so to build up a sequence of addresses all pointing to blocks of useful instructions within the running program, effectively stitching together scraps of the software to form your own malicious program. As far as the processor is concerned, it's still executing code as per normal and no exception is raised. It's just dancing to your tune rather than the software developer's.
Think of it as this: rather than read a book the way the author intended – sentence by sentence, page by page – you decide to skip to the third sentence on page 43, then the eighth sentence on page 3, then the twelfth sentence on page 122, and so on, effectively writing your own novel from someone else's work.
That's how ROP works: you fill the stack with locations of gadgets – useful code in the program; each gadget must end with a RET instruction or similar. When the processor jumps to a gadget, executes its instructions, and then hits RET, it pulls the next return address off the stack and jumps to it – jumps to another gadget, that is, because you control the chain now.
Here's an example of two useful gadgets:
pop %ebx; pop %eax; ret
mov %eax, (%ebx); ret
The first gadget pulls two values off the stack and stores them in the registers ebx and eax. Don't forget, you control the stack's contents so you can make sure these instructions obtain the values you want. Next, the second gadget writes the contents of eax into the memory address pointed to by ebx. Chaining these together allows you to edit the contents of any memory address the current running thread has permission to alter – this is called arbitrary write, and it's extremely powerful for bending applications and servers to your will from within.
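Here is a toy interpreter that walks a chain built from those two gadgets. It is a sketch under stated assumptions: the gadget addresses, the HALT marker and the target address are all invented, and a real chain would be raw bytes on a real stack rather than a Python list.

```python
GADGET_POP = 0x1000    # hypothetical address of: pop %ebx; pop %eax; ret
GADGET_STORE = 0x2000  # hypothetical address of: mov %eax, (%ebx); ret
HALT = 0x0             # made-up address at which the toy machine stops


def run_chain(chain, memory):
    stack = list(reversed(chain))  # last element is the top of the stack
    regs = {"eax": 0, "ebx": 0}
    pc = stack.pop()               # the first hijacked return
    while pc != HALT:
        if pc == GADGET_POP:
            regs["ebx"] = stack.pop()
            regs["eax"] = stack.pop()
        elif pc == GADGET_STORE:
            memory[regs["ebx"]] = regs["eax"]
        else:
            raise ValueError(f"no gadget at {pc:#x}")
        pc = stack.pop()           # each gadget's trailing ret
    return memory


TARGET = 0x0804A010  # hypothetical address the attacker wants to overwrite
chain = [GADGET_POP, TARGET, 0x1337, GADGET_STORE, HALT]
memory = run_chain(chain, {TARGET: 0})
print(hex(memory[TARGET]))  # 0x1337
```

Note that the chain mixes gadget addresses with plain data: the two values after GADGET_POP are never jumped to, they are simply scooped into ebx and eax.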
Eventually, you'll sew together enough little blocks of code and parameters to ask the operating system to mark a non-executable area of memory as executable, make sure it's filled with your malicious payload, and jump to it. Because you've labelled it as executable, the processor will run it just fine, and bingo: you've managed to get arbitrary execution.
In fact, ROP and JOP are pretty much what exploit and malware writers use to gain control of victims' computers.
What CET does here is ensure that, when returning from a subroutine, the stack hasn't been tampered with to hijack the flow of the software. No ROP, no working exploit, no malware infection.
The shadow stack can't be modified by normal program code. Of course, if you can somehow trick the kernel into unlocking the shadow stack, meddle with it so that it matches your ROP chain, and then reenable protection, you can sidestep CET. And if you can do that, I hope you're working for the Good Guys.
"The Control-flow Enforcement Technology specification published by Intel sets a direction of intent to leverage the fixed hardware architectures of the Central Processing Unit to establish controls to help prevent and interfere with code-reuse attacks," said Matthew Rosenquist, an Intel cybersecurity bod.
"Through the use of a shadow stack, pointers, and other mechanisms, CET puts structures in place designed to protect against misuse of legitimate code."
Right now, CET is in preview, and still has some work to be done on it. The specification – produced with the help of Microsoft – has been published for people to weigh in with their technical feedback. There may be some corner cases in the CET blueprints that can be exploited to evade the protection mechanism; it is a fairly complex system because it has to involve so many chip features, from privilege levels to interrupts and hypervisor entry and exit. It'll be some time yet before the tech appears in shipping silicon.
The idea of using a shadow stack [PDF] has been floated by computer scientists for some years now.
Until this technology hits the mainstream, the main defense against ROP exploits is ASLR – address space layout randomization. This operating-system-level feature randomly places components of a program, such as its libraries and executable, within the application's virtual space. Because the locations of the code change each time the software starts up, the positions of the gadgets change, and thus your beautifully crafted ROP stack no longer works: it's pointing in the wrong places, causing the app or server to crash.
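A small sketch shows why a chain built against one run goes stale on the next. The gadget offsets, the range of base addresses and the function names are assumptions for illustration; real ASLR randomizes several regions separately and with varying amounts of entropy.

```python
import random

PAGE_SIZE = 0x1000
# Hypothetical offsets of two gadgets from the start of the program image
GADGET_OFFSETS = {"pop_pop_ret": 0x1A2B, "store_ret": 0x3C4D}


def random_base(rng):
    # Pick a random page-aligned load address for this run (made-up range).
    return rng.randrange(0x10000, 0x7FFFF) * PAGE_SIZE


def load_program(base):
    # Gadget addresses are the load address plus fixed offsets.
    return {name: base + offset for name, offset in GADGET_OFFSETS.items()}


def chain_still_valid(chain, gadgets):
    # A chain only works if every address in it still lands on a gadget.
    return all(address in gadgets.values() for address in chain)


rng = random.Random()
first_run = load_program(random_base(rng))
chain = [first_run["pop_pop_ret"], first_run["store_ret"]]
second_run = load_program(random_base(rng))
print("works on first run:", chain_still_valid(chain, first_run))
print("works on second run:", chain_still_valid(chain, second_run))
```

The offsets inside the image never change, only the base does, which is why everything hinges on the attacker not knowing where the image was loaded.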
ASLR isn't foolproof, though; it is sometimes possible, by exploiting information-leak bugs, to obtain the base addresses of the program's components, allowing the gadgets' positions to be calculated, pinpointed and jumped to. ®