Broadcom's latest Trident switch silicon packs neural net processor to terminate congestion
Chip promises better telemetry, security, and traffic engineering, vendor claims
Broadcom's latest switch silicon boasts a built-in neural networking engine it says can be trained to combat network congestion on the fly, at line speed, without compromising latency or throughput.
The compute capability, which Broadcom calls NetGNT — short for networking general-purpose neural-network traffic analyzer — is baked into the company's all-new Trident 5-X12 chip.
As Broadcom explains it, your typical switch ASIC can only look at one packet on one path at a time as it traverses the chip's ports and buffers. With the introduction of NetGNT, it says, the chip can be trained to identify traffic patterns spanning the entire chip.
The semiconductor giant suggests the chip is particularly good at spotting traffic patterns associated with AI workloads, such as incast, in which many packets converge on a single port and buffer at once, and preventing the resulting congestion before it becomes a problem. The chipmaker notes the capability isn't limited to traffic management, and can also be used to improve telemetry and network security.
Critically, the chipmaker claims this is done entirely in hardware at full line rate, so there should be no impact on throughput or latency.
As far as the silicon is concerned, the Trident 5-X12 is a relatively speedy piece of kit, at least for a top-of-rack (ToR) switch with a high degree of programmability. Digging into the speeds and feeds, the chip is capable of pushing 16 Tbps of bandwidth.
That bandwidth is broken out over standard 100G PAM4 serializer/deserializers, allowing for a wide array of port configurations. However, as a ToR switch, Broadcom expects to see the 5-X12 deployed as a 1U pizza box with 48 QSFP-DD ports running at 200 Gbps, plus eight 800 Gbps ports for aggregation back to a spine switch.
This configuration strikes us as one that's tailored to AI compute clusters. In these environments, accelerators are so bandwidth hungry that networking can quickly become a bottleneck. As a result, it's not uncommon to see each GPU paired with a dedicated NIC running at anywhere from 200 to 400 Gbps. By our estimate, the configuration described by Broadcom could deliver 200 Gbps per GPU to six nodes, each with eight accelerators and NICs apiece.
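The back-of-the-envelope math above is easy to verify from the figures in the article. The sketch below only restates those numbers; the variable names are ours, not Broadcom's.

```python
# Sanity-check the Trident 5-X12 port math described above.
# All figures come from the article; names are illustrative.

DOWNLINK_PORTS = 48   # 200 Gbps server-facing ports
DOWNLINK_GBPS = 200
UPLINK_PORTS = 8      # 800 Gbps ports back to the spine
UPLINK_GBPS = 800

total_gbps = DOWNLINK_PORTS * DOWNLINK_GBPS + UPLINK_PORTS * UPLINK_GBPS
print(f"Aggregate bandwidth: {total_gbps / 1000:.0f} Tbps")  # 16 Tbps

# With a dedicated 200 Gbps NIC per GPU and eight GPUs per node,
# the 48 downlinks cover six nodes at full rate.
GPUS_PER_NODE = 8
nodes_served = DOWNLINK_PORTS // GPUS_PER_NODE
print(f"GPU nodes served: {nodes_served}")  # 6
```

Note the downlinks (9.6 Tbps) outweigh the uplinks (6.4 Tbps), a common 1.5:1 oversubscription toward the spine.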
In addition to the switch's higher throughput, which is twice that of its older Trident 4-X9 parts, Broadcom is also talking up the chip's power efficiency, which it says is roughly 25 percent better per 400 Gbps port this generation.
The new chip is currently shipping to "qualified customers," but it'll likely be a while before we see the first switches powered by the silicon make it to market.
The AI networking arms race
While networking switches and ASICs may not be as sexy as accelerators like Nvidia's H100, AMD's MI300A/X, or Intel's upcoming Habana Gaudi 3, networking is nonetheless a vital piece of the AI puzzle.
Networking vendors like Cisco, Broadcom, and Nvidia have been quick to push AI-optimized networking gear, including switches, DPUs, and now SuperNICs, to help combat congestion and latency, which can extend training times if left unchecked.
The Trident 5-X12 isn't Broadcom's first chip designed to capitalize on AI demand. Earlier this year we looked at the company's Jericho3-AI silicon, which is designed to mesh together massive arrays of GPUs. According to Broadcom, that chip's heavy emphasis on fabric connectivity allows it to scale to support up to 32,000 accelerators over an Ethernet fabric.
Cisco has made similar claims about its G200 switch ASIC, which is designed to support bandwidth-hungry web-scale networks or, as is increasingly the trend, large AI/ML compute clusters. The chip can also scale to support clusters of up to 32,000 GPUs, though it actually has more in common with Broadcom's 51.2 Tbps Tomahawk 5 or Nvidia's Spectrum-4 switches, which we looked at last year.
Speaking of Nvidia's Ethernet switches, the company has been pushing its Spectrum-X platform to companies that don't want to mess with its proprietary InfiniBand networking kit.
The platform combines its 51.2 Tbps Spectrum-4 switches with its BlueField-3 SuperNICs, a name it has only recently adopted, to actively mitigate congestion at both the switch and node level. Combined, the company says, it's able to achieve levels of latency and congestion more in line with its proprietary InfiniBand interconnect, using industry-standard Ethernet. Whether this is actually a new idea, however, is up for debate. ®