How Arm popped CHERI architecture into Morello Program hardware
Chip giant aims to adapt existing processor architectures to close off vulnerabilities in memory access
Arm used the Hot Chips conference to talk about its experimental Morello Program and how it implements the CHERI architecture, designed to address some of the memory access vulnerabilities underpinning attacks on computer systems.
CHERI stands for Capability Hardware Enhanced RISC Instructions, a research project from the University of Cambridge in the UK and US-based SRI International, while Morello is Arm's adaptation of CHERI into a prototype processor based on the Armv8.2-A architecture.
Arm shipped Morello evaluation system boards for testing purposes in January, and there is even a version of the FreeBSD operating system called CheriBSD that runs on the hardware.
The aim of CHERI and the Arm Morello implementation is to adapt existing processor architectures to improve system security, at least as far as memory accesses go.
According to Arm SVP, Chief Architect & Fellow Richard Grisenthwaite, memory safety issues such as buffer overflows or use-after-free errors are the cause of many reported vulnerabilities in a way that seems to be remarkably consistent across different computing ecosystems. While this has previously been seen as a software problem, CHERI aims to fix at least some of the issues in hardware.
"It is increasingly clear that people need to be able to trust technology, and the cost of cybercrime is extraordinarily high," Grisenthwaite said.
Morello does this by introducing the concept of capabilities, which operate like pointers and other data access registers, but alongside the address they also store permissions and bounds information that the hardware can use to check that operations being attempted by code are allowed within that memory space.
Like many ideas in computing, the concept of capabilities is not entirely new, and was previously implemented in some mainframe systems.
"These are held together as a single 128-bit unit, and a further metadata tag is added to the register file and to the memory system to distinguish a capability from data," explained Grisenthwaite. This metadata tag is important as it "allows us to make capabilities unforge-able," he added.
What this means is that the metadata tag can only be set by the kernel or some other privileged process, and any attempt by an attacker to manipulate a capability as if it were data and change its attributes causes the capability to lose its status so it is no longer valid.
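To make the tag rule concrete, here is a minimal software sketch in Python. It is not Arm's design: the `Capability` class, its field names, and the `overwrite_as_data` helper are hypothetical, and the only behavior modelled is the one described above, namely that treating a capability as plain data clears its tag.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Capability:
    """Toy model of a capability; all field names are illustrative assumptions."""
    address: int       # where the pointer currently points
    base: int          # lowest address the holder may touch
    limit: int         # one past the highest address the holder may touch
    perms: frozenset   # for example {"load", "store"}
    tag: bool = True   # the out-of-band validity bit (the 129th bit)


def overwrite_as_data(cap: Capability, **raw_fields) -> Capability:
    """Model an ordinary data write landing on a capability.

    The fields change as requested, but the tag is cleared, so the
    result can no longer be used to access memory.
    """
    return replace(cap, **raw_fields, tag=False)


cap = Capability(address=0x1000, base=0x1000, limit=0x1100,
                 perms=frozenset({"load"}))
# An attacker tries to widen the bounds by writing raw bytes over the capability.
forged = overwrite_as_data(cap, limit=0xFFFF_FFFF)
```

In this sketch `forged` carries the wider limit the attacker wanted, but its `tag` is `False`, which is the property that makes the forgery useless.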
"The main changes are that we have a full set of loads and stores that take their base address from the capability register and check the generated addresses, which might typically have an integer offset added to it against the bounds of the capability," Grisenthwaite explained.
Permissions for the capability are checked alongside the normal memory management checks, and any violation of the capability checks will result in a memory abort in the same way as memory management faults from the translation lookaside buffer (TLB).
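The checks on a capability-based load might be modelled roughly as follows. This is a sketch under stated assumptions: `checked_load`, `CapabilityFault`, and the field names are invented for illustration, and a Python exception stands in for the hardware memory abort.

```python
from dataclasses import dataclass


class CapabilityFault(Exception):
    """Stands in for the memory abort raised on a failed capability check."""


@dataclass(frozen=True)
class Capability:
    address: int       # address held in the capability register
    base: int          # lower bound (inclusive)
    limit: int         # upper bound (exclusive)
    perms: frozenset   # hypothetical permission names such as "load"
    tag: bool = True


def checked_load(memory: bytearray, cap: Capability, offset: int, size: int = 1) -> bytes:
    """Load `size` bytes at cap.address + offset, checking tag, permission, and bounds."""
    target = cap.address + offset
    if not cap.tag:
        raise CapabilityFault("capability is not valid (tag cleared)")
    if "load" not in cap.perms:
        raise CapabilityFault("capability lacks load permission")
    if target < cap.base or target + size > cap.limit:
        raise CapabilityFault(f"access at {target:#x} is outside the bounds")
    return bytes(memory[target:target + size])


memory = bytearray(range(64))
buf = Capability(address=16, base=16, limit=32, perms=frozenset({"load"}))
value = checked_load(memory, buf, offset=3)   # within bounds, so it succeeds
```

A call such as `checked_load(memory, buf, offset=16)` would raise `CapabilityFault` in this model, because address 32 lies just past the capability's limit.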
According to Grisenthwaite, the Morello architecture still allows ordinary load and store instructions, which take the address from traditional general-purpose registers, but Arm has added something called a default data capability which applies bounds and permissions to such accesses, effectively creating a sandbox for legacy code that is not capability-aware.
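As a rough illustration of the sandbox effect, the sketch below checks every integer-addressed access against one implicit set of bounds. The `LegacySandbox` class and its method names are hypothetical, and real hardware would also apply the default data capability's permissions, which are omitted here.

```python
class CapabilityFault(Exception):
    """Stands in for the memory abort raised on a failed check."""


class LegacySandbox:
    """Toy model: plain integer addresses are checked against one default capability."""

    def __init__(self, memory: bytearray, ddc_base: int, ddc_limit: int):
        self.memory = memory
        self.ddc_base = ddc_base     # lower bound of the default data capability
        self.ddc_limit = ddc_limit   # upper bound (exclusive)

    def _check(self, address: int) -> None:
        if not (self.ddc_base <= address < self.ddc_limit):
            raise CapabilityFault(f"legacy access at {address:#x} escapes the sandbox")

    def load(self, address: int) -> int:
        self._check(address)
        return self.memory[address]

    def store(self, address: int, value: int) -> None:
        self._check(address)
        self.memory[address] = value


memory = bytearray(256)
box = LegacySandbox(memory, ddc_base=64, ddc_limit=128)
box.store(100, 7)   # inside the sandbox, so the write goes through
```

The legacy code never sees a capability; it simply cannot reach addresses outside the range 64 to 127 in this model.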
Meanwhile, new data processing instructions are required in order to manipulate capabilities. These instructions follow rules that allow an address to be adjusted only within the limits defined by the bounds of the capability.
"Importantly, it is generally not possible to increase the bounds that are held within a capability. The instructions allow you to decrease the bounds or permissions of a capability to allow the creation of sub-objects from an original object. But you cannot increase the bounds or permissions without a capability that gives you that right," Grisenthwaite said.
One of the aims of the project is to change the program counter so that it becomes the program counter capability and thus has a set of bounds associated with it. Direct branches will be able to branch within the bounds, while indirect branches can change the capability bounds, giving the ability to call between different blocks of code.
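A toy model of the two kinds of branch might look like this. The `Core` and `CodeCapability` names are hypothetical, and checking the bounds at the moment of the branch is a simplification of whatever the hardware does at instruction fetch.

```python
from dataclasses import dataclass


class CapabilityFault(Exception):
    """Stands in for the fault raised when control flow leaves its bounds."""


@dataclass(frozen=True)
class CodeCapability:
    base: int          # first executable address (inclusive)
    limit: int         # end of the executable region (exclusive)
    tag: bool = True


class Core:
    """Toy CPU state: a program counter plus the bounds that constrain it."""

    def __init__(self, pcc: CodeCapability, pc: int):
        self.pcc = pcc
        self.pc = pc

    def direct_branch(self, target: int) -> None:
        """Direct branches keep the current bounds and must stay inside them."""
        if not (self.pcc.base <= target < self.pcc.limit):
            raise CapabilityFault(f"branch to {target:#x} leaves the current bounds")
        self.pc = target

    def indirect_branch(self, target_cap: CodeCapability, entry: int) -> None:
        """Indirect branches install a new capability, and with it new bounds."""
        if not target_cap.tag:
            raise CapabilityFault("target capability is not valid")
        if not (target_cap.base <= entry < target_cap.limit):
            raise CapabilityFault("entry point is outside the target capability")
        self.pcc, self.pc = target_cap, entry


core = Core(CodeCapability(0x1000, 0x2000), pc=0x1000)
core.direct_branch(0x1800)               # fine: still inside 0x1000..0x2000
library = CodeCapability(0x3000, 0x4000)
core.indirect_branch(library, 0x3000)    # switches to the library's bounds
```

After the indirect branch, a direct branch back to 0x1800 would fault in this model, since that address is outside the library's bounds.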
"Once we have this functionality, then we will replace some or all of the addresses that are usually held in general purpose registers with capabilities and we have a foundation for improved security," Grisenthwaite said.
One way of using these new capabilities could be to replace pretty much every address pointer in program code with a capability through recompilation, which may need some tweaks to the code as well. This has the advantage of providing better memory safety, with the metadata tag that distinguishes capabilities from data allowing a quarantine and garbage-collection-like approach to freeing memory in the C language, according to Grisenthwaite.
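One way such a quarantine scheme could work is sketched below. It is an illustrative assumption, not a description of CheriBSD's allocator: freed regions are held back from reuse until a sweep has found every capability that still points into them and cleared its tag. The sweep is only possible because tags let it tell capabilities apart from ordinary data.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Capability:
    base: int
    limit: int         # exclusive
    tag: bool = True


class QuarantineHeap:
    """Toy heap: `slots` models memory that may hold data (ints) or capabilities."""

    def __init__(self):
        self.slots = []        # everything stored in memory
        self.quarantine = []   # freed (base, limit) regions, not yet reusable
        self.free_list = []    # regions that are safe to hand out again

    def free(self, cap: Capability) -> None:
        """Freeing only quarantines the region; stale pointers may still exist."""
        self.quarantine.append((cap.base, cap.limit))

    def sweep(self) -> None:
        """Invalidate every capability into quarantined memory, then release it."""
        for i, value in enumerate(self.slots):
            if not (isinstance(value, Capability) and value.tag):
                continue   # plain data or an already-invalid capability
            if any(value.base < q_limit and q_base < value.limit
                   for q_base, q_limit in self.quarantine):
                self.slots[i] = replace(value, tag=False)
        self.free_list.extend(self.quarantine)
        self.quarantine.clear()


heap = QuarantineHeap()
heap.slots = [42, Capability(0x100, 0x140), Capability(0x200, 0x240)]
heap.free(heap.slots[1])   # the stale copy in slots[1] is still valid here
heap.sweep()               # now it is invalidated and the region can be reused
```

The integer 42 is left untouched by the sweep even if its value happens to look like an address, which is the advantage over conservative scanning in a conventional C environment.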
However, there are downsides to doing this, chiefly in the larger cache and memory footprint of the new capabilities. The Arm CPU cores in the Morello prototype system are customized and adapted to support a register file with a set of capabilities rather than the standard 64-bit general purpose registers. In practice, this means that all the registers are expanded to 129 bits, to include the metadata tag, as are the caches and the system buses.
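A back-of-the-envelope calculation shows where the extra footprint comes from. The figures below assume 8-byte conventional pointers, 16-byte capabilities, and one tag bit per 16 bytes of memory. The example structure is hypothetical and alignment padding is ignored, so real numbers would differ.

```python
POINTER_BYTES = 8        # a conventional 64-bit pointer
CAPABILITY_BYTES = 16    # a 128-bit capability, excluding its tag
TAG_BITS = 1             # one out-of-band tag bit per capability-sized granule


def footprint(pointer_count: int, payload_bytes: int, pointer_bytes: int) -> int:
    """Size of a structure holding some pointers plus non-pointer payload."""
    return pointer_count * pointer_bytes + payload_bytes


def tag_overhead() -> float:
    """Fraction of extra storage needed to hold the tag bits."""
    return TAG_BITS / (CAPABILITY_BYTES * 8)


# A hypothetical linked-list node: two pointers and 8 bytes of data.
node_legacy = footprint(2, 8, POINTER_BYTES)
node_capability = footprint(2, 8, CAPABILITY_BYTES)
growth = node_capability / node_legacy
```

Under these assumptions the node grows from 24 to 40 bytes, while the tag bits themselves add under 1 percent (1/128) on top. Pointer-heavy data structures therefore pay the most, and data with few pointers barely changes.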
For testing purposes, Arm made all 32 of the CPU registers able to hold either data or capabilities, although any future commercial implementation might choose to have fewer capability registers, Grisenthwaite said.
"Essentially, quite a lot of the micro architecture has expanded to have 129-bit capability, and that has quite a lot of impact on the overall data path design. In Morello, we chose not to simply double the width of all of the data paths to memory, partly to ensure the performance and area comparisons will be realistic. But in a production system, this would need to be looked at with more performance modelling," he explained.
Capabilities also offer the ability to construct much more fine-grained compartmentalization than can be achieved today, where application compartmentalization is typically constructed using multiple processes, Grisenthwaite said.
By delivering more fine-grained compartmentalization, software can be made more robust by limiting the damage that can be done by a single exploited vulnerability, thereby increasing the overall security of the system.
Another plus is that the CHERI architecture can also reduce the overhead of switching between compartments by orders of magnitude compared with a traditional process switch, he claimed.
One intriguing aspect of the way CHERI is implemented is that the bounds and permissions information of a capability is stored in a compressed form, so that it adds just an extra 64 bits of state information. But this means that when the address checks are performed, the base and bounds values have to be uncompressed to allow them to be compared with the address calculation result.
According to Grisenthwaite, a lot of work has gone into the compression to come up with a scheme that can be quickly decompressed when necessary, so as not to impact any critical parts of the device.
"In the memory access path, the decompression of the base and bounds is done as a pair of expansions in parallel with the address generation arithmetic. And that means that the bounds check is timed in a very similar way to a normal TLB hit," he said.
As for the impact on software, Grisenthwaite said that the X11 KDE-based desktop environment in CheriBSD was ported to Morello in three months by a single engineer, who had to change less than 0.03 percent of the 6 million lines of code (fewer than 1,800 lines), and that these changes delivered an assessed vulnerability mitigation rate of some 73.8 percent.
Arm's confidence in Morello is backed up by a detailed study conducted by the Microsoft Security Response Center (MSRC). This looked into all of the 2019 memory safety vulnerabilities that affected Microsoft products and required an update to fix, and Microsoft concluded that CHERI, when combined with other measures, would have mitigated at least two-thirds of those issues.
Only a relatively small number of Morello boards have been built as this is a pure prototyping system, not a commercial product. Arm has made these available to a variety of partners including Google, Microsoft and a number of universities and other companies under the guidance of the UK Research and Innovation (UKRI) organisation.
However, CHERI technologies are not patented and Arm is encouraging others in the computing industry to evaluate them, so Morello is serving as a showcase of the technology for other architectures, Grisenthwaite said.
"We are very excited by the prospects of this technology, we want to see how Morello can be used to properly explore this very promising approach to the vital area of security." ®