Haswell Xeons bring brawn to microservers, media servers, more

Integrated graphics engine dual-purposed for media processing

Computex There are a lot of different ways that Intel could have deployed its 22-nanometer wafer-baking process to cook up the "Haswell" variants of the Xeon E3-1200 v3 processors, but the tactic it chose was to bring the low-power benefits inherent in the Haswell design to bear on entry servers, workstations, and the emerging media-processing system market.

Intel could have ramped up the core count and goosed the throughput of the chip, but as it announced on Tuesday at the Computex shindig in Taipei, Taiwan, it has kept the core count constant and the memory addressing the same, focusing on those key markets in which the Xeon E3-1200 v3 – as the server variant of the single-socket Haswell chip is called – will be deployed.

"We do not see a lot of requests for a big change in this segment," Dylan Larson, Xeon platform marketing director, told El Reg when asked about busting past the four-core and two-memory-channel limit that the Xeon E3s have been at for the past three generations.

"We are watching it and we will respond to requests," Larson said, "but for now, it is rightsized for the power envelope and its targeted workstation, microserver, SMB, and media workloads."

At the moment, Xeon E3 chips, with their relatively brawny cores, are not getting huge uptake among hyperscale data center operators such as Google, Facebook, and Yahoo. They are similarly not being adopted by the HPC community, even though supercomputer builders are notorious cheapskates – as are the hyperscale operators.

The issue is not cheap computing and coprocessing, which the Xeon E3-1200 v3 chips certainly offer. It's that their codes have been tuned to run on two-socket x86 servers, and despite the potential density and cost savings they might enjoy – something AMD has, ironically, demonstrated quite well with its Xeon E3-based SeaMicro microservers – the grief of retuning those apps and building out new infrastructure outweighs the possible benefits.

On new workloads, however, a different story is potentially unfolding, one aptly demonstrated by the vastly better computational power of the integrated graphics processing units – and by the companies doing media transcoding that are looking for a better solution than racks of two-socket Xeon E5 servers tricked out with Nvidia or AMD discrete GPU cards.

The Haswell Xeon E3-1200 v3 chip is tall and skinny, like your Reg reporter

Media companies in Asia – in China in particular – are keen on building transcoding engines using the Haswell Xeon E3s, says Larson. It is far easier, and takes less storage capacity, to transcode on the fly to a zillion different formats and devices than to pre-render every possible option and store them all. In addition, for live feeds such as breaking news and sports events, you have to transcode on the fly anyway. So companies are looking for cheaper options.
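Larson's storage argument is easy to sanity-check with back-of-the-envelope arithmetic. The catalog size, variant count, and file sizes below are made-up numbers for illustration, not figures from Intel or any media firm:

```python
def pre_rendered_tb(titles, variants, gb_per_variant):
    """Storage (in TB) to keep every transcoded variant of every title on disk."""
    return titles * variants * gb_per_variant / 1000.0

def mezzanine_tb(titles, gb_per_master):
    """Storage (in TB) to keep one high-quality master per title and
    transcode everything else on the fly."""
    return titles * gb_per_master / 1000.0

# Hypothetical catalog: 10,000 titles, 40 device/bitrate variants apiece
print(pre_rendered_tb(10_000, 40, 2.0))  # 800.0 TB if every variant is stored
print(mezzanine_tb(10_000, 8.0))         # 80.0 TB if only masters are kept
```

An order of magnitude in disk saved, in exchange for compute at delivery time – which is exactly the compute the on-chip GPU is being pitched at.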

"For media transcoding, we think we have some unique benefits with the Xeon E3-1200 v3 because of the way we built the GPU and done the transcoding," Larson says. "There is a benefit from having the graphics close by the CPU. The media players and software providers in the media industry don't just want to do some transcode that a digital signal processor might do. They want to do ad inserts in streams and disassemble things, and they can do it all close to the CPU and have stunning transcoding performance."

El Reg will do a more in-depth analysis of the performance of the Xeon E3-1200 v3 compared to its predecessor, the "Ivy Bridge" Xeon E3-1200 v2 that came out last March. But generally speaking, the performance bump on integer work for the top-bin Xeon E3 v3 without the integrated HD P4700 graphics is 9.8 per cent, the improvement for floating point and Java work is about 3.6 per cent, and for the low-voltage parts the performance-per-watt increase is 18.6 per cent.

But the performance of the integrated graphics in the server version of the four-core Haswell part is 38 per cent higher than on the Ivy Bridge part it replaces. The new 25-watt Haswell part has 52 per cent better performance per watt than the 45-watt Ivy Bridge part it replaces, and there is a new 13-watt Haswell Xeon E3 that is going to see lots of action in microservers against x86 chips from AMD (specifically the "Kyoto" Opteron X chips announced last week) and the slew of ARM chips coming to market early next year.
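For the curious, the performance-per-watt arithmetic behind claims like that works as follows. The throughput ratio below is a hypothetical illustration, not Intel's measured data:

```python
def perf_per_watt_gain(new_perf, new_watts, old_perf, old_watts):
    """Percentage improvement in performance per watt of a new part over an old one."""
    return ((new_perf / new_watts) / (old_perf / old_watts) - 1.0) * 100.0

# If a hypothetical 25-watt part delivered 85 per cent of a 45-watt part's
# throughput, its performance per watt would come out 53 per cent better:
gain = perf_per_watt_gain(new_perf=0.85, new_watts=25, old_perf=1.0, old_watts=45)
print(round(gain, 1))  # 53.0
```

Note that the part does not need to match the old one on raw throughput to post a big per-watt win; it just needs to lose less performance than it sheds in watts.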

The Xeon E3-1200 v3 is also expected to get some traction in online gaming and virtual-desktop infrastructure, where a mix of low-cost chippery and reasonably high graphics performance is required. Ditto for video conferencing and various cloud-based media services.

Single socket to me – and AMD

El Reg has already dived deeply into the guts of the Haswell architecture, which Chipzilla first talked about in detail at Intel Developer Forum last fall. You can review that and a follow-up we published on Monday for the nitty-gritty detail on how the chip has new low-power states that allow it to burn a lot less juice than the Ivy Bridge chips did.

Block diagram of the 'Haswell' Xeon E3-1200 v3 processor

Suffice it to say that the entry server and workstation Xeon E3 chips benefit from Intel's absolute need to press its process technology lead to best advantage, mixing performance and power efficiency to keep its x86 competitors and the many impending ARM usurpers at bay.

The Xeon E3-1200 v3 processor has 1.4 billion transistors on a die that is 177mm² in area. The die is "hot", meaning it doesn't slide into a socket like other Xeon chips, but rather is soldered directly onto circuit boards like those used in PCs and tablets. We were not able to get transistor counts for the cores, the graphics unit, and the uncore regions as we went to press, but we will update this article as that data becomes available.

Each Haswell core has 32KB of L1 instruction cache and 32KB of L1 data cache, as well as an L2 cache memory that weighs in at 256KB. The four cores share an 8MB L3 cache. The Haswell cores sport Advanced Vector Extensions 2 (AVX2) vector math units that have twice the floating point oomph of the Ivy Bridge AVX units. The chip has a fourteen-stage pipeline.
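Those cache figures total up as below, and the AVX2 doubling falls straight out of the execution-port arithmetic: Haswell has two 256-bit fused multiply-add (FMA) units per core, where the Ivy Bridge AVX design had separate 256-bit multiply and add pipes. These are well-documented microarchitecture figures, though the per-cycle model here is our own simplification:

```python
KB = 1024

# Cache on the four-core die, per the figures above
cores = 4
l1_total = cores * (32 + 32) * KB   # 32KB instruction + 32KB data L1 per core
l2_total = cores * 256 * KB         # 256KB L2 per core
l3_total = 8 * KB * KB              # 8MB L3 shared across all four cores
print((l1_total + l2_total + l3_total) // KB)  # 9472 KB of cache in total

# Peak double-precision FLOPs per core per clock cycle
haswell = 2 * 4 * 2      # two 256-bit FMA units x 4 doubles x 2 ops per FMA
ivy     = 1 * 4 + 1 * 4  # one 256-bit multiply pipe + one 256-bit add pipe
print(haswell, ivy)      # 16 vs 8 -- the "twice the oomph" claim
```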

The Haswell Xeon E3 chip has a single memory controller that can drive two DDR3 memory channels, with memory tapped out at 16GB per stick. The chip has on-die support for a single PCI-Express 3.0 x16 link as well, and hooks into the "Lynx Point" C220 chipset through the Direct Media Interface (DMI) link – with an additional Flexible Display Interface (FDI) in those models of the chip that have an integrated GPU fired up. The Haswell Xeon E3 can drive up to three displays, and the C220 chipset can have up to six SATA ports running at 6Gb/sec and up to six USB 3.0 ports hanging off of it.
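Two DDR3 channels also put a hard ceiling on memory bandwidth. The speed grade below is our assumption for illustration (DDR3-1600 was a common top grade for this class of part); the arithmetic is the standard 64-bit-channel calculation:

```python
def peak_mem_bw_gb_s(channels, megatransfers, bytes_per_transfer=8):
    """Theoretical peak DDR bandwidth in GB/s (decimal) for 64-bit channels."""
    return channels * megatransfers * bytes_per_transfer / 1000.0

print(peak_mem_bw_gb_s(2, 1600))  # 25.6 GB/s across both channels
```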

That extra bandwidth on the SATA ports and the integrated PCI-Express 3.0 slot is going to be particularly useful for certain microserver workloads, and those USB ports are necessary for workstation users who can't seem to plug enough peripherals into their machines. (My own workstation could use about five more USB ports.)

The integrated graphics chip on the Haswell Xeon E3 is up to ten times faster when encoding video using the H.264 codec at 1080p and 30 frames per second than the GPU in the Ivy Bridge Xeon E3 chip was. The GPU in the Haswell Xeon E3 can do full hardware encoding for the MVC format used by Blu-ray 3D, as well as for the MPEG-2 format. The GPU also supports JPEG and MJPEG hardware decoding.

By the way, the Haswell Xeon E3 chip does not use the high-end "Iris" GT3 GPU that some 4th Generation Core i7 chips are getting, but rather the GT2 GPU that is branded the HD P4600 and P4700.

The Haswell Xeon E3 chips are aimed at a lot of different targets

To help software developers make use of these encoding and decoding functions in the Haswell Xeon E3 chips, Intel has cooked up a media software development kit that provides a hardware abstraction layer for the GPU to unify coding for CPU and GPU units.

This SDK supports applications coded for both Windows and Linux, and Intel is promising that if you use this SDK your resulting ceepie-geepie applications will be "future proven" because they will be forward compatible with future Xeon processors with integrated graphics. Larson says that 130 software development companies are using the SDK already, and half of them are building applications specifically for the Haswell Xeon E3 chips.
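The Media SDK itself is a C/C++ API, so as a rough stand-in for script-driven transcoding on the same Quick Sync hardware, here is how one might assemble a command for ffmpeg's h264_qsv hardware encoder. The file names and bitrate are hypothetical, and ffmpeg's QSV path is a separate project, not Intel's SDK itself:

```python
import shlex

def qsv_transcode_cmd(src, dst, bitrate="4M"):
    """Build (but don't run) an ffmpeg command line that offloads H.264
    encoding to Intel Quick Sync via ffmpeg's h264_qsv encoder."""
    return ["ffmpeg", "-i", src, "-c:v", "h264_qsv", "-b:v", bitrate, dst]

cmd = qsv_transcode_cmd("master.mov", "stream_1080p.mp4")
print(shlex.join(cmd))
# ffmpeg -i master.mov -c:v h264_qsv -b:v 4M stream_1080p.mp4
```

A transcoding farm would fan such commands out across boxes, one stream per spare GPU, which is precisely the rack-density pitch Intel is making for these chips.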

This is not aimed at HPC – yet

This SDK is not, by the way, designed to turn that graphics unit into a generic floating-point coprocessor. But that could happen. "We are definitely not positioning it that way," says Larson, "but computationally it is a big vector engine to paint pixels."

So, make your mischief where you will, techies.

There's a Xeon E3-1200 v3 for just about any single-socket box you can imagine

There are thirteen Xeon E3-1200 v3 processors, and generally speaking, the standard SKUs are priced about the same as their Ivy Bridge predecessors. As usual, Intel is charging a premium for the top-bin and low-voltage parts in each of the four slices of the product line it is addressing. Some of the chips have Hyper-Threading – Intel's implementation of simultaneous multithreading – on the cores, and some do not.

The E3-1265L is a low-volt part with one graphics core, while the remaining parts aimed at data-center graphics (VDI, media transcoding, and so forth) and workstations have two cores. All of the chips have Turbo Boost enabled on the Xeon cores, which lets them jump up their clock speeds if there is thermal headroom to do so.

Twelve of the chips are available now, but the 13-watt E3-1220L that will be particularly interesting for microservers is not going to be available until the third quarter. That chip has only two cores and four threads enabled, and half of the 8MB of L3 cache memory is disabled as well, which is how it gets its wattage down.

Anyone ready for a four-core 15-watt Xeon E3-1200 v4 implemented in 14-nanometer tech? ®
