Nvidia DGX systems prone to side channel, covert attacks
Reverse engineering yields sticky microarchitectural vulnerabilities
Nvidia's ultra-dense GPU-driven AI training and inference systems are prone to covert and side channel attacks, according to research just published by a team led by Pacific Northwest National Laboratory (PNNL). This might be less concerning for those with on-prem DGX systems, but for cloud vendors selling time on the AI training boxes, the vulnerabilities are worth noting.
Let's start with the good news: the problems are most pressing for DGX machines built on pre-Ampere GPU generations, and luckily the major cloud operators have switched to Ampere-generation DGX machines. The bad news? Owners of Pascal- and Volta-based DGX boxes, read on.
Unlike more brute-force ways of compromising a system, the vulnerabilities cited by the PNNL-led group focus on microarchitectural gaps [PDF]. These can affect both on-prem and remotely hosted systems. The team executed a proof of concept attack to demonstrate the issues.
Specifically, the team reverse-engineered the cache hierarchy, showing how an attacker on one GPU can reach the L2 cache of a connected GPU (the accelerators are hooked together with Nvidia's proprietary NVLink) and create contention there.
They also developed a "prime and probe attack on a remote GPU allowing an attacker to recover the cache hit and miss behavior of another workload."
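For readers unfamiliar with the technique, the idea behind Prime+Probe can be sketched in a few lines of Python. This is a toy software model of a single LRU cache set, not the researchers' GPU code: the class `ToyCacheSet`, the helper names, and the four-way associativity are all assumptions made for illustration. The attacker fills the set with its own lines, lets the victim run, then re-accesses its lines and checks whether any of them were evicted.

```python
from collections import OrderedDict


class ToyCacheSet:
    """One set of a set-associative cache with LRU replacement (toy model)."""

    def __init__(self, ways):
        self.ways = ways
        self.lines = OrderedDict()  # tag -> None, least recently used first

    def access(self, tag):
        """Return True on a hit, False on a miss (the line is then filled)."""
        if tag in self.lines:
            self.lines.move_to_end(tag)
            return True
        if len(self.lines) >= self.ways:
            self.lines.popitem(last=False)  # evict the least recently used line
        self.lines[tag] = None
        return False


def prime(cache_set, attacker_tags):
    """Fill the set with the attacker's own lines."""
    for tag in attacker_tags:
        cache_set.access(tag)


def probe(cache_set, attacker_tags):
    """Re-access the attacker's lines and count how many miss."""
    return sum(0 if cache_set.access(tag) else 1 for tag in attacker_tags)


def victim_touched_set(victim_accesses, ways=4):
    """Return True if the probe phase reveals activity by the victim."""
    cache_set = ToyCacheSet(ways)
    attacker_tags = [f"attacker-{i}" for i in range(ways)]
    prime(cache_set, attacker_tags)
    for tag in victim_accesses:  # victim runs between prime and probe
        cache_set.access(tag)
    return probe(cache_set, attacker_tags) > 0
```

On real hardware the attacker cannot see hits and misses directly and has to infer them from access latency; the model above skips that step and simply reports the miss count.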
In reverse engineering the caches and poking around the shared Non-Uniform Memory Access (NUMA) configuration the team found "the L2 cache on each GPU caches the data for any memory pages mapped to that GPU's physical memory (even from a remote GPU)."
They add that "this observation enables us to create contention on remote caches by allocating memory on the target GPU, which is the essential ingredient enabling our covert and side channels. Specifically, we develop the first microarchitectural covert and side-channel attacks across GPUs in a multi-GPU servers (an Nvidia DGX-1 server)."
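To see why contention on a shared cache set is enough to build a covert channel, consider the following toy simulation. It assumes two cooperating processes that can both touch the same cache set (in the paper's setting, because both have allocated memory on the target GPU); the names `send_bit`, `receive_bit`, `transmit`, and the four-way set are hypothetical and stand in for the real GPU kernels. The sender encodes a 1 by evicting the receiver's lines and a 0 by staying idle.

```python
from collections import deque

WAYS = 4  # assumed associativity of the toy cache set


def make_set():
    """A toy LRU cache set: a full deque drops its oldest entry on append."""
    return deque(maxlen=WAYS)


def access(cache_set, tag):
    """Touch a line; return True on a hit, False on a miss."""
    if tag in cache_set:
        cache_set.remove(tag)
        cache_set.append(tag)  # move to most recently used position
        return True
    cache_set.append(tag)  # fills the line, evicting the oldest if full
    return False


def send_bit(cache_set, bit):
    """Sender: create contention for a 1, stay idle for a 0."""
    if bit:
        for i in range(WAYS):
            access(cache_set, ("sender", i))


def receive_bit(cache_set):
    """Receiver: probe its own lines; any miss is read as a 1.

    Probing also reloads the receiver's lines, priming the next round.
    """
    misses = sum(not access(cache_set, ("receiver", i)) for i in range(WAYS))
    return 1 if misses > 0 else 0


def transmit(bits):
    """Send a list of bits through the shared set and return what arrives."""
    shared_set = make_set()
    receive_bit(shared_set)  # initial prime; the result is discarded
    received = []
    for bit in bits:
        send_bit(shared_set, bit)
        received.append(receive_bit(shared_set))
    return received
```

The model is noise-free and perfectly synchronized, which a real channel is not; it only shows that no shared memory or explicit communication is needed, just a cache set both parties can fill.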
Aside from the obvious, especially in the cloud case, these vulnerabilities are noteworthy because they are likely going to be difficult to pinpoint.
As the team notes, because the attacks work across GPUs in a tightly interconnected system rather than within a single GPU on a node, attackers don't need to manipulate the scheduler to land on the same GPU as the victim's kernel.
They also "bypass isolation-based defenses, such as partition-based defense mechanisms that can be enabled for processes running within a single GPU," the team explains, adding:
The attacks we develop are first Prime+Probe based timing attacks on L2 cache on GPUs. Our attacks extract contention information at the granularity of a single cache set, providing high-resolution attacks with fine-grained access time measurements, reducing the noise, and achieving high quality channels. The attacks are conducted entirely from the user level without any special access (e.g. huge pages or flush instruction). As a result, we believe this attack model challenges assumptions from prior GPU based attacks and significantly expands our understanding of the threat model in Multi-GPU servers.
With all of this said, there are mitigations, including static or dynamic partitioning of shared resources. This is easier with the newest Nvidia A100 GPU-based DGX machines, which have this built in. In essence, each individual GPU can be sealed off into discrete GPU instances in multi-user environments, which means direct and isolated paths through the cache and memory.
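Why partitioning helps can be shown with another toy model. The sketch below is not how Nvidia's hardware implements isolation; `PartitionedSet`, the tenant names, and the two-ways-per-tenant split are assumptions chosen for illustration. Each tenant can only evict its own lines, so the attacker's probe phase sees no misses regardless of what the victim does.

```python
from collections import OrderedDict


class PartitionedSet:
    """Toy cache set whose ways are statically split between tenants."""

    def __init__(self, ways_per_tenant):
        self.ways_per_tenant = ways_per_tenant
        self.partitions = {}  # tenant -> OrderedDict of that tenant's lines

    def access(self, tenant, tag):
        """Return True on a hit; a miss evicts only from the tenant's own ways."""
        lines = self.partitions.setdefault(tenant, OrderedDict())
        if tag in lines:
            lines.move_to_end(tag)
            return True
        if len(lines) >= self.ways_per_tenant:
            lines.popitem(last=False)  # LRU eviction within the partition
        lines[tag] = None
        return False


def attacker_observes_victim(victim_access_count, ways_per_tenant=2):
    """Run Prime+Probe against a partitioned set; True if any probe misses."""
    cache_set = PartitionedSet(ways_per_tenant)
    attacker_tags = list(range(ways_per_tenant))
    for tag in attacker_tags:  # prime
        cache_set.access("attacker", tag)
    for tag in range(victim_access_count):  # victim activity
        cache_set.access("victim", tag)
    misses = sum(not cache_set.access("attacker", tag) for tag in attacker_tags)
    return misses > 0
```

The cost is visible in the model too: each tenant gets only a fraction of the ways, which is one reason partitioning schemes carry a performance overhead.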
The team proposes partitioning mechanisms, but these carry some performance overhead. Detection is another avenue, the researchers say: "Although inherent GPU-to-GPU communications cannot be completely eliminated in multi-GPU systems, making these cross-GPU data transfers more coarse-grained in normal applications will significantly increase the detection accuracy of high-bandwidth attacks, leading to more efficient defenses."
"Our work establishes for the first time the vulnerability of these machines to microarchitectural attacks, and we hope that it guides future research to improve their security," the team adds.
We have asked Nvidia to comment and will update with any responses. ®