AI engines, Arm brains, DSP brawn... Versal is Xilinx's Kitchen Sink Edition FPGA

Good news: It's 7nm. Sad news: It's shipping 2H 2019


XDF Xilinx has packed everything but the kitchen sink into its new Versal family of FPGAs (field programmable gate arrays).

These are chips that have electronic circuitry you can change on-the-fly as needed, so you can morph their internal logic to suit whatever needs doing. You usually describe how you want your chip to work using a design language like SystemVerilog, which is converted to a block of data fed into the gate array to configure the internal logic.

Typically, FPGAs are used to prototype custom chips before they are mass manufactured, or as glue between other chips by controlling their accesses to memory and peripherals. These days, engineers are eyeing FPGAs as specialist accelerators, performing work such as network packet inspection and machine-learning math, and taking the strain off the host CPU.

Well, Xilinx hopes to lure those engineers with its Versal family, which it launched this week at its developer forum in San Jose, USA. The FPGA designer previously teased the technology in March. The chips will be fabricated by TSMC using its 7nm process node. It's hoped the gate arrays are faster than general-purpose GPU and DSP accelerators, and more flexible and cheaper than manufacturing custom high-speed silicon.

Block diagram of the Versal family

The Versal clan combines a dual-core Arm Cortex-A72 cluster, used for running application code close to the offload circuitry, and a dual-core Arm Cortex-R5 cluster, for real-time code, with a big bunch of AI and DSP (digital signal processing) engines, plus the usual programmable logic, and a load of interfaces from 100GE to PCIe and CCIX. Both the AI Core and Prime series include a platform controller for secure boot, monitoring, and debug.

Any extra processing you want to do on top of the bundled math and signal coprocessor engines can be carried out in the reprogrammable logic array.

The Versal brand right now comes in two flavors: Versal AI Core and Versal Prime. The former, as you'd expect from the name, focuses on accelerating machine-learning math operations in hardware – think self-driving cars, and data-center neural-network workloads. The latter is a more typical super-FPGA with an emphasis on signal processing – think wireless and 5G. Previous Xilinx top-end gate arrays used Cortex-A53 and Cortex-R5 cores, for what it's worth.

In the above block diagram, adaptable engines is the fancy name for the reprogrammable logic arrays and on-die memory, which can be arranged in hierarchies to reduce latency and increase memory bandwidth to particular engines. The intelligent engines are very long instruction word (VLIW) and single instruction, multiple data (SIMD) processing units that crunch through data.
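If it helps to picture what that data-level parallelism buys you, here is a rough software analogy of the SIMD idea, sketched in plain NumPy – this is ordinary CPU code for illustration, not Xilinx's actual AI Engine instruction set:

import numpy as np

rng = np.random.default_rng(0)
a = rng.integers(-128, 128, size=256, dtype=np.int8)   # 256-lane vector of 8-bit inputs
b = rng.integers(-128, 128, size=256, dtype=np.int8)   # 256-lane vector of 8-bit weights
acc = np.zeros(256, dtype=np.int32)                    # wider accumulators, as MAC hardware uses

# One vectorized multiply-accumulate touches every lane at once,
# rather than looping over 256 elements one at a time:
acc += a.astype(np.int32) * b.astype(np.int32)
print(acc[:4])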

We're told that the aforementioned flavors will be eventually joined by: Versal AI Edge, for doing machine-learning stuff at the edge of the network down to 5W of power; Versal AI RF, for radio communications; Versal Premium, for serious high-performance applications; and Versal HBM, geared toward products that need high-bandwidth memory.

There will be software libraries and frameworks to program the engines, and hardware designers can still use the familiar Vivado tools to configure the FPGAs. It's hoped people will follow in Amazon Annapurna's footsteps, and produce smart network interfaces using the Versal family. These custom NICs can take on hypervisor networking functions, encryption, and similar workloads in silicon, freeing up the host CPU.

Some quick specs, according to Xilinx: the Versal Prime series can have up to 3,080 intelligent engines, 984,576 logic lookup tables, and 2.154 million system logic cells, topping out at 31 trillion 8-bit integer operations per second (via adaptable logic) or 5 TFLOPS using 32-bit floating-point in the DSP engines (21.3 TOPS for INT8).

The Versal AI Core series can have up to 400 AI engines, 1,968 intelligent engines, 899,840 logic lookup tables, and 1.968 million system logic cells, topping out at 133 trillion 8-bit integer operations per second (via the AI engines) or 3.2 TFLOPS using 32-bit floating-point in the DSP engines (13.6 TOPS for INT8).
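For a rough sense of where a headline number like 133 trillion operations per second comes from, here is a back-of-envelope calculation in Python. The 400-engine count is from the quoted spec; the per-engine MAC rate and clock speed below are illustrative assumptions, not figures from Xilinx's announcement:

ai_engines = 400        # from the quoted AI Core spec
macs_per_cycle = 128    # assumption: INT8 multiply-accumulates per engine per clock
ops_per_mac = 2         # count a MAC as a multiply plus an add
clock_hz = 1.3e9        # assumption: roughly 1.3GHz engine clock

tops = ai_engines * macs_per_cycle * ops_per_mac * clock_hz / 1e12
print(f"~{tops:.0f} trillion INT8 operations per second")   # prints ~133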


You can check out Timothy Prickett Morgan's analysis, here, of Versal over on our sister site, The Next Platform, along with Nicole Hemsoth's feature on FPGA performance.

Meanwhile, Xilinx has a gentle technical paper on its Versal family here, and specifications of its AI Core series, here, and Prime series, here.

The chips will be generally available in the second half of 2019, we're told, although if you ask nicely, and mean a lot to Xilinx, you can get into its early access program.

Finally, Xilinx announced Alveo, a pair of deep neural-network accelerator cards that use UltraScale+ FPGAs to perform stuff like AI math in hardware, offloading the work from a host processor. Each dual-slot, full-height card has 64GB of DDR4 RAM, sports two QSFP28 network ports and a 16-lane PCIe 3.0 interface, and draws up to 225W.

The Alveo U250 has 1,341K logic lookup tables, 2,749K registers, and 11,508 DSP slices, while the U200 has 892K lookup tables, 1,831K registers, and 5,867 DSP slices. Using machine-learning inference-friendly 8-bit integer math, the U250 can perform up to 33.3 trillion operations per second, and the U200 up to 18.6 trillion.
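Dividing those quoted peaks by the cards' 225W maximum draw gives a rough feel for efficiency – bear in mind these are peak throughput figures and nameplate power, not measurements from a real workload:

cards = {"U250": 33.3e12, "U200": 18.6e12}   # peak INT8 operations per second, as quoted
max_board_power_w = 225.0                    # maximum draw per card, as quoted

for name, peak_ops in cards.items():
    print(f"{name}: about {peak_ops / max_board_power_w / 1e9:.0f} billion INT8 ops per watt")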

Xilinx claims the U250 and U200 are particularly good for real-time inference in data center servers processing information in the backend, and smoke GPU-based accelerators in terms of performance and latency, and completely blow away host general-purpose CPUs. The hardware is available now, starting from $8,995 apiece. A technical overview is here.

AMD also joined forces with Xilinx to produce a box of eight Alveo U250 cards and two Epyc server processors, forming a high-speed neural-network-wrangling system that processed 30,000 pictures a second using the image-classification AI software GoogLeNet. This is, apparently, a world record. ®
