The role of the CPU in sustainable AI/ML
Why datacentres need to optimise workload running costs and power consumption, and GPUs are not always the answer
Advertorial As AI extends its reach across business computing environments, its impact is causing some unanticipated knock-on effects. IDC's latest FutureScape report, for instance, predicts that as companies race to introduce AI-enhanced products/services and assist their customers with AI implementations, the technology will become a key motivator for innovation.
Another AI-driven change pivots on the extent to which datacentres may have to balance CPUs with discrete AI Accelerators, such as GPUs or specialised architectures in order to provide the high-performance compute capabilities that AI developers want.
It's a debate that raises high-stakes issues for datacentre owners, both in terms of additional CAPEX investment and the probability that (while methods of measurement are imprecise) typical GPU-driven AI operations consume more power than conventional IT workloads.
Dealing with AI's higher power/carbon overhead is an additional pain-point for datacentre operations, which must also ensure that upgraded compute architectures optimised for AI can manage the increased power demands without risk of overloading existing tech or facilities.
So as extended regulation in sustainability governance and carbon management pushes operations to reduce energy usage across the gamut of IT hardware and software, AI represents both opportunity and obstacle.
Mitigating AI power consumption
Taken together, the increased power consumption and the architectural reconfiguration required to accommodate AI and Machine Learning workloads pose a formidable challenge for datacentres, explains Stephan Gillich, Director of Artificial Intelligence GTM in Intel's AI Center of Excellence.
"It's fairly clear across vertical sectors and industries, wherever AI/Machine Learning applications and services are being developed, trained and run, that on-prem and cloud-hosted IT facilities' capabilities will have to undergo upgrades to deal with increased volumes of data-intensive workloads," Gillich says. "It is also clear that those upgrades will have to entail more than just ramping-up compute capability."
Much can be done to enhance the sustainability of AI-focused datacentres, Gillich believes, beginning with re-evaluating some of the assumptions around the AI/Machine Learning landscape. Processing units are a good place to start, particularly when deciding whether CPUs or GPUs are better suited to the task.
While AI-specific compute-intensive workloads seem to be on the rise (no-one's quite sure at what pace), the bulk of datacentre work – the non-AI workloads – must continue to chug away day in, day out, delivering steady application and service revenue streams that can't be disturbed.
Most of these are currently handled by CPUs, and refitting a standard datacentre with more costly GPUs would, for very many facilities, be surplus to requirements. In general terms, a GPU consumes more wattage than a CPU to perform a similar task. Depending on the power supply to a given rack configuration, integrating GPUs into datacentre infrastructure requires upgrades to power distribution systems, for example, which are bound to incur extra upfront costs on top of higher energy bills once they're running.
What's more, Intel's CPU development continues to innovate. In multiple use-cases a CPU can be shown to achieve overall performance as good as – and sometimes better than – a GPU's, Gillich argues. And that performance can be augmented with breakthrough tech like Intel® AMX (Advanced Matrix Extensions), an accelerator built into 4th-generation Intel Xeon CPUs.
"Intel Xeon processors can enable a datacentre to scale its AI adoption through built-in AI acceleration that boosts CPU performance for Machine Learning, training, and inference," Gillich points out. "This way, they can forgo discrete accelerators to minimise CAPEX and maximise performance while leveraging existing Intel Xeon processing environments."
Need to mix AI and non-AI workloads
Intel AMX is a dedicated hardware block on the Intel Xeon Scalable processor core that enables AI workloads to run on the CPU instead of offloading them to a discrete accelerator, providing a significant performance boost. It's suited to AI workloads like Machine Learning recommender systems, image recognition and natural language processing, that rely on matrix mathematics.
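The matrix mathematics those workloads rely on reduces largely to tiled matrix multiplication with low-precision inputs and wider accumulators, which is the pattern AMX's tile registers and TMUL unit execute in hardware. The NumPy sketch below is purely illustrative – the function name and tile size are invented for this example, and it runs as ordinary software rather than on AMX – but it shows the int8-in, int32-accumulate tiling scheme in miniature:

```python
import numpy as np

def tiled_int8_matmul(a, b, tile=16):
    """Illustrative tiled matmul: int8 inputs, int32 accumulation.

    This mirrors the pattern AMX accelerates in silicon (tile registers
    feeding a matrix-multiply unit); the tile size here is arbitrary and
    chosen only so the loop structure is visible.
    """
    m, k = a.shape
    k2, n = b.shape
    assert k == k2, "inner dimensions must match"
    c = np.zeros((m, n), dtype=np.int32)
    for i in range(0, m, tile):
        for j in range(0, n, tile):
            for p in range(0, k, tile):
                # Widen to int32 before multiplying so products don't overflow
                c[i:i+tile, j:j+tile] += (
                    a[i:i+tile, p:p+tile].astype(np.int32)
                    @ b[p:p+tile, j:j+tile].astype(np.int32)
                )
    return c

rng = np.random.default_rng(0)
a = rng.integers(-128, 127, size=(32, 64), dtype=np.int8)
b = rng.integers(-128, 127, size=(64, 48), dtype=np.int8)
assert np.array_equal(tiled_int8_matmul(a, b),
                      a.astype(np.int32) @ b.astype(np.int32))
```

Doing this blocking in dedicated hardware, rather than in loops like these, is what lets the CPU keep such workloads local instead of shipping them to a discrete accelerator.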
Another argument in favour of augmented CPUs is that they provide a cost-effective route for datacentre operators to make more of existing CPU commitments, futureproof their assets so that they're able to take on mixed workloads, and place them in a position to better control overall power usage.
This, in turn, may help providers of datacentre services (and their customers) meet sustainability targets, and provides a selling point for software developers (enterprise or third-party) who are looking for an optimised platform to showcase the energy efficiency of their coding outputs.
"The reality is that, rather than rushing at the opportunities AI workloads may promise, datacentre operators are realising that they should consider a range of imperatives that are informed as much by commercial concerns as technological choices," Gillich says.
These imperatives could include: the integration of AI workloads with non-AI workloads; the integration of different hardware and software stacks; and the integration of different workstream types, so that a single architecture can suit multiple different workloads.
"These questions point to complex challenges, because getting them right has a bearing on optimal technological and energy efficiency – with energy efficiency now a core performance benchmark that will increasingly affect a datacentre's commercial viability," Gillich says. "So again, it's of utmost importance."
From Gillich's perspective, the key to adapting to this emergent reality is a step-process of what can be termed 'AI assimilation'. Point one here is that AI workloads are not segregated from other workload types – they will be integrated into conventional workloads, rather than run separately.
Gillich gives videoconferencing as an example of this phased integration: "Already while streaming standard audio/video traffic across standard applications, AI is integrated to perform concomitant tasks like summarisation, translation, and transcription. Such features are supported very well by AI."
End-to-end energy savings
Achieving energy efficiencies must be a truly end-to-end strategic undertaking, Gillich argues. "It spans the software side as well as the hardware architectures – the complete mechanism enabling a given workflow process. Where is data stored so that access is most efficient, compute-wise and therefore energy-wise? Is that the best place for energy efficiency?"
The other factor to bring into this evaluation is where the workload is running. For instance, is it running on clients (such as AI PCs equipped with Intel Core Ultra processors) rather than on servers in the datacentre? Could some of these AI workloads actually be run on clients, alongside servers?
Every option is worthy of consideration if it's going to help bring the AI-compute/power consumption balance into better alignment, Gillich argues: "It's almost like a return to the old-school notion of distributed computing."
Gillich adds: "Sometimes our customers ask, 'Where will AI play?' – the answer is that AI will play everywhere. So at Intel our ambition is focused on what could be termed the universal accommodation of AI, because we believe it will enter into all application fields."
At Intel this encompasses middleware such as APIs, which, as with any other part of the software stack, must be as efficient as possible. 'API sprawl' can result in unnecessary processing, an inflated infrastructure footprint, and a lack of monitoring and control.
"With Intel oneAPI, enterprises can realise their full hardware value, develop high-performance cross-architecture code, and make their applications ready for future needs," explains Gillich.
"Intel oneAPI is an open, cross-industry, standards-based, unified, multiarchitecture, multi-vendor programming model that delivers a common developer experience across accelerator architectures – for faster application performance, and improved productivity. The oneAPI initiative encourages collaboration on the oneAPI specification and compatible oneAPI implementations across the ecosystem."
Gillich adds: "oneAPI provides a middleware stack which takes standard AI frameworks – like PyTorch or TensorFlow [the open-source software platform for AI and Machine Learning] – and translates their operations to the machine level, and oneAPI enables an efficient way to do that. Users can use a common API at the AI-framework level, and we have an API (oneAPI) that addresses the different hardware flavours." A common API thus means users can create open software that can be supported on an open software stack.
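The idea of one framework-level call being routed to whichever hardware backend is available can be sketched in a few lines. The toy Python below is an illustration of the dispatch pattern only – the registry, function names, and "device" labels are invented for this example and are not oneAPI's actual interfaces:

```python
# Toy sketch of a unified, multi-architecture API: callers invoke one
# framework-level operation and name a device; a registry picks the
# backend implementation. All names here are invented for illustration.

_BACKENDS = {}

def register_backend(name):
    """Decorator that records a backend implementation under a device name."""
    def wrap(fn):
        _BACKENDS[name] = fn
        return fn
    return wrap

@register_backend("cpu")
def matmul_cpu(a, b):
    # Plain nested-comprehension matmul standing in for a CPU path
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

@register_backend("accelerator")
def matmul_accel(a, b):
    # A discrete accelerator would run the same operation; to keep the
    # sketch self-contained, this path simply reuses the CPU routine.
    return matmul_cpu(a, b)

def matmul(a, b, device="cpu"):
    """The 'common API': callers choose a device, not an implementation."""
    return _BACKENDS[device](a, b)

print(matmul([[1, 2]], [[3], [4]], device="cpu"))          # [[11]]
print(matmul([[1, 2]], [[3], [4]], device="accelerator"))  # [[11]]
```

The point of the pattern is that application code above the dispatch layer never changes when the hardware underneath does, which is the portability claim being made for the common-API approach.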
GPU-level performance at CPU-level price-points
Progress in IT is driven largely by an expectation of continuous technological advancement allied to insight-driven improvements in deployment strategies. It's a model based on finding the best achievable balance between budget expenditure and business ROI, and the expectation that there's always further innovation to strive for. AI represents the apogee of this ideal – it's smart enough to reinvent its own value proposition through perpetual self-improvement.
By building the AMX accelerator into its 4th-generation Intel Xeon CPUs, Intel shows how GPU-level performance can be achieved at CPU-level price-points. This not only allows datacentres to scale while maximising the return on their existing Intel Xeon-powered processing estates, but also provides a pricing model that lowers the cost of entry for customers with AI workloads but limited budgets.
And CPUs' lower power consumption means that energy efficiency can be achieved holistically across a datacentre facility's entire operations – including cooling and ventilation – which is another winning draw for sustainability-conscious software architects and developers of AI solutions.
Contributed by Intel.