Fujitsu says it can optimize CPU and GPU use to minimize execution time

Demos its Adaptive GPU Allocator as global shortage of geepies grinds on

SC23 Fujitsu will demonstrate tech aimed at optimizing GPU use and the switching of batch jobs in an HPC cluster at this week's SC23 high performance computing conference in Colorado.

The Japanese tech multinational said these technologies, known as Adaptive GPU Allocator and Interactive HPC, will in future be used in some of its own HPC products, which it is largely shifting to the cloud under an as-a-service business model.

According to Fujitsu, the Adaptive GPU Allocator is capable of distinguishing between programs that require a GPU accelerator and those that can be processed by a CPU, and can allocate resources accordingly.

It claims to have developed this to help address the global shortage of GPUs, driven by huge demand for training generative AI models, by providing customers with a way to optimize the use of their compute resources.

This relies on predicting the acceleration that a specific program would experience, and switching GPUs between programs depending on whether this would minimize the overall processing time.

Precise details have not been disclosed, but the technology makes use of a GPU allocation server that measures the performance of code as it executes. If a program requests a GPU, it is allocated one while its performance is measured. If the GPU is found to reduce the overall processing time, the program gets to keep it; otherwise it is re-allocated to run on the CPU.

This measurement compares the processing times for the same number of mini-batch steps (iterations) on both the GPU and the CPU at the start of training, from which the expected performance gain from using the GPU is calculated.
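Fujitsu has not published its implementation, but the decision logic described above can be sketched roughly as follows. All names here are hypothetical, and the real system communicates with a separate allocation server rather than deciding locally:

```python
import time

def mean_step_time(run_step, n_steps=5):
    """Time n_steps mini-batch iterations, return mean seconds per step."""
    start = time.perf_counter()
    for _ in range(n_steps):
        run_step()
    return (time.perf_counter() - start) / n_steps

def choose_device(cpu_step, gpu_step, remaining_steps, handover_cost=0.0):
    """Pick the device that minimizes total remaining training time.

    cpu_step / gpu_step are callables that run one mini-batch on each
    device; handover_cost models the overhead of giving the GPU back.
    Returns the chosen device and the measured GPU speedup.
    """
    t_cpu = mean_step_time(cpu_step)
    t_gpu = mean_step_time(gpu_step)
    speedup = t_cpu / t_gpu
    # Keep the GPU only if it would reduce overall processing time.
    if t_gpu * remaining_steps + handover_cost < t_cpu * remaining_steps:
        return "gpu", speedup
    return "cpu", speedup
```

In a real framework the step callables would be the TensorFlow or PyTorch training step pinned to each device; here they are stand-ins to show the shape of the comparison.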

Fujitsu told us that user programs must be developed using its framework, which has a function to communicate with the GPU allocation server and to switch between GPU and CPU. The framework makes use of TensorFlow and PyTorch.

The allocation server has been designed so that workloads can be switched every few mini-batch iterations, which occur on timescales longer than those of a typical OS scheduler, so the system does not currently integrate with other schedulers.

The company said it plans to use the Adaptive GPU Allocator in a future update to its AI platform codenamed Fujitsu Kozuchi, which is being developed to allow users to test advanced AI technologies. This is planned for sometime after the second half of the company's fiscal year 2024, ending March 31, 2025, Fujitsu told us.

Batch-switch crazy

The Interactive HPC technology is claimed by Fujitsu to enable real-time switching of execution of multiple programs on an HPC system.

The conventional control method uses unicast communication, whereby a controller sends the instruction to switch program execution to each node in an HPC cluster in turn, we're told.

Again, precise details have not been disclosed, but Fujitsu said that by adopting a broadcast communication method, this instruction can be sent to every node simultaneously, reducing the time taken to complete the processing switch from a few seconds to 100 milliseconds in a 256-node HPC environment.
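The claimed gain falls out of simple arithmetic: sending an instruction to each node in turn scales linearly with cluster size, while one broadcast reaches all nodes at roughly constant cost. This toy latency model, not Fujitsu's implementation, illustrates why 256 nodes turns seconds into milliseconds:

```python
def unicast_switch_time(nodes, per_message_latency):
    """Controller instructs each node in turn: time grows with cluster size."""
    return nodes * per_message_latency

def broadcast_switch_time(nodes, per_message_latency):
    """One broadcast reaches every node at once: roughly constant time."""
    return per_message_latency

# Assuming ~10 ms to deliver and act on each instruction (a made-up figure):
print(unicast_switch_time(256, 0.010))    # 2.56 (seconds)
print(broadcast_switch_time(256, 0.010))  # 0.01 (seconds)
```

In practice, as the article notes below, broadcast brings its own trade-offs around packet loss, which is why the choice of method is workload-dependent.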

Fujitsu conceded that the appropriate communication method will depend on application requirements and network quality. Picking the optimal method will involve consideration of the degree of performance improvement due to broadcast communication and performance degradation due to packet loss, it said.

To determine this, Interactive HPC monitors metrics such as the number of job switches and any difference in switch counts between nodes caused by packet drops, and decides whether to use broadcast communication by checking application performance.

The Interactive HPC technology will allow applications requiring real-time performance, such as generative AI and materials and drug discovery, to be executed more rapidly using HPC systems.

This will be applied to the operation of Fujitsu's quantum simulator, which at launch was capable of emulating 36-qubit quantum circuits but has since been upgraded to emulate 40 qubits.

The update is planned for the first half of its fiscal year 2024.

The original required the compute power of a 64-node cluster of PRIMEHPC FX700 servers, each based on the same 48-core A64FX Arm chip that features in the company's Fugaku supercomputer system. ®
