Our man pops the hood on Intel's v4 engine: Broadwell Xeons
Taking new chips for a test drive
Sysadmin Blog Recently, I reviewed Supermicro's Microblade system. One of the goals of this review was to compare the new Intel v4 (Broadwell) Xeons to their predecessor v3 (Haswell) Xeons. This was not as easy as it should have been.
My front-line tool for benchmarking CPUs is Prime95. Supermicro provided me with 2x Intel Xeon E5-2695 v4 CPUs in the Broadwell blade. Each of those is an 18-core, 36-thread part, which means the Broadwell blade has 72 logical cores. Prime95 cannot bench this because it can only address 64 logical cores.
Intel broke Prime95 through sheer core count. Achievement unlocked. I suppose turnabout is fair play given that Prime95 can break Intel's Skylake CPUs.
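For anyone who wants to check the sums, here is a minimal sketch of the core-count arithmetic. The 18 cores per socket and two threads per core are the published figures for the E5-2695 v4; the function and constant names are my own and purely illustrative.

```python
def logical_cores(sockets: int, cores_per_socket: int, threads_per_core: int) -> int:
    """Logical cores the operating system sees for a multi-socket box."""
    return sockets * cores_per_socket * threads_per_core


# The 64-logical-core ceiling is the Prime95 limit described above.
PRIME95_LIMIT = 64

# Two E5-2695 v4 CPUs: 18 cores each, Hyper-Threading gives 2 threads per core.
total = logical_cores(sockets=2, cores_per_socket=18, threads_per_core=2)

print(f"{total} logical cores, over the limit: {total > PRIME95_LIMIT}")
```

Eight logical cores too many, in other words.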
The other challenge was that the Haswell Xeons provided weren't directly comparable to the Broadwell Xeons. They weren't the same core count or frequency, though they were the same TDP. I tried to overheat the chassis so that I could play silly buggers with thermal failure, but with only two blades in a chassis designed for 28, and possessed of possibly the most aggressive fans in existence, I did not succeed.
Fortunately, I have a lab with lots of different CPUs and the ability to do lovely things like overclock and underclock them. After a great deal of benchmarking, I essentially verified the incomparable coverage of the Broadwell Xeons done by Timothy Prickett Morgan at The Register's sister site The Next Platform.
When not using AVX2, Broadwell seems to provide a 4 to 5 per cent increase in performance, clock for clock, over Haswell. The gain seems slightly higher with AVX2, with Broadwell showing an 8 per cent increase over Haswell. If this doesn't seem like much, that's perfectly normal. Broadwell is a die shrink of Haswell and not much really changed in terms of microarchitecture.
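"Clock for clock" can mean a few things. One reading, sketched below, is to divide each benchmark score by core count and clock speed before comparing. This is an assumption about how such a comparison could be normalised, not a record of how the lab runs were done, and the function names and the numbers in the example are placeholders rather than measurements.

```python
def per_core_per_ghz(score: float, cores: int, ghz: float) -> float:
    """Normalise a throughput score to a single core running at 1GHz."""
    return score / (cores * ghz)


def clock_for_clock_gain(new: float, old: float) -> float:
    """Percentage gain of one normalised score over another."""
    return (new / old - 1.0) * 100.0


# Placeholder inputs only: (score, cores, GHz) for two hypothetical chips.
old_chip = per_core_per_ghz(score=1000.0, cores=10, ghz=2.0)
new_chip = per_core_per_ghz(score=1575.0, cores=15, ghz=2.0)

print(f"clock-for-clock gain: {clock_for_clock_gain(new_chip, old_chip):.1f} per cent")
```

Normalising like this assumes the workload scales linearly with cores and frequency, which real benchmarks rarely do. That is why pinning the chips to the same clock in the lab is the more trustworthy route.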
One set of benchmarks did stand out from the rest: crypto. Some performance tests were showing an 80 per cent improvement, with many showing 20 to 25 per cent. It turns out that one of the architecture changes Intel made with the Broadwell line was to make the PCLMULQDQ (aka carry-less multiplication) instruction suck less. The result is faster crypto.
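If carry-less multiplication is new to you, it is ordinary long multiplication in binary with the partial products combined by XOR instead of addition, so no carries propagate. The sketch below is a plain software model of that operation for illustration. It is not how the hardware does it, and the function name is my own.

```python
def clmul(a: int, b: int) -> int:
    """Carry-less product of two non-negative integers.

    Each set bit of b contributes a shifted copy of a, and the copies are
    combined with XOR rather than added, so no carries occur.
    """
    result = 0
    shift = 0
    while b:
        if b & 1:
            result ^= a << shift
        b >>= 1
        shift += 1
    return result


# 0b11 * 0b11 is 9 with carries, but 0b101 (5) without them.
print(bin(clmul(0b11, 0b11)))
```

PCLMULQDQ does this for a pair of 64-bit operands in one instruction, producing a 128-bit result. It is the workhorse behind AES-GCM's GHASH and behind fast CRC implementations, which is why speeding it up shows up in crypto benchmarks.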
Value for dollar
If Broadwell versus Haswell seems a little mediocre, let's compare Broadwell to Intel's v1 (Sandy Bridge) Xeons. A lot of organisations are looking at upgrading v1 Xeons to v4 Xeons and the jump in speed is actually worth it. Clock for clock, the Broadwells seem to be a little under 20 per cent faster for non-AVX workloads. AVX workloads were considerably faster.
What I hadn't known before was that the introduction of AVX2 with the Haswell Xeons doubled the speed of the AVX instructions. Sandy Bridge and v2 (Ivy Bridge) Xeons are capable of 8 double precision Floating Point Operations (FLOPs) per core per cycle. Haswell and Broadwell Xeons can do 16 double precision FLOPs/core/cycle. There were a huge number of other improvements that came along with AVX2 as well.
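Those per-cycle figures can be rebuilt from first principles. The sketch below assumes the usual explanation: both generations have 256-bit vectors holding four doubles, Sandy Bridge and Ivy Bridge can issue one vector add and one vector multiply per cycle, and Haswell and Broadwell can issue two fused multiply-adds (each counted as two operations). The function and variable names are illustrative, and the peak is theoretical, not a measured result.

```python
def dp_flops_per_cycle(vector_bits: int, ops_per_lane_per_cycle: int) -> int:
    """Peak double precision FLOPs per core per cycle."""
    lanes = vector_bits // 64  # a double is 64 bits wide
    return lanes * ops_per_lane_per_cycle


def peak_gflops(flops_per_cycle: int, cores: int, ghz: float) -> float:
    """Theoretical peak for a whole chip at a given sustained clock."""
    return flops_per_cycle * cores * ghz


# Sandy Bridge / Ivy Bridge: 1 add + 1 multiply per lane per cycle.
sandy_bridge = dp_flops_per_cycle(vector_bits=256, ops_per_lane_per_cycle=2)

# Haswell / Broadwell: 2 FMA units, each FMA counts as 2 operations.
haswell = dp_flops_per_cycle(vector_bits=256, ops_per_lane_per_cycle=4)

print(sandy_bridge, haswell)
```

Feed `peak_gflops` a core count and whatever clock the chip actually sustains under AVX load to get a whole-chip ceiling. Real workloads will land below it.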
The net result is that many AVX workloads will easily more than double in performance, clock for clock, on Broadwell compared with Sandy Bridge. Now, the biggest gains are restricted to the few workloads that make use of all the enhancements, but some workloads – CRC and crypto, for example – show a quadrupling of clock-for-clock performance.
Considering that Intel has more or less kept the prices steady across the lines, the value for dollar has risen significantly with each successive generation. A lot of that value is in getting more cores for your dollar – great for virtualisation – but single-threaded performance isn't being neglected.
If you're running Haswell, there isn't a huge incentive to upgrade to Broadwell unless you do rather a lot of crypto. If, however, you're running Sandy Bridge or Ivy Bridge Xeons, Broadwell is probably worth your time. ®