Greased graphics turn a tick into a tock
But it's the improvements to Ivy Bridge's graphics that both Mooley and Piazza cited as the main reason that Ivy Bridge should be considered a "tick-plus".
The graphics elements in Ivy Bridge have been reshuffled and enhanced for better performance
Ivy Bridge's integrated graphics, Piazza said, "is not a tick, it's a tock," citing such improvements as support for DirectX 11 – currently in AMD's APUs, but new for Intel – plus improved media performance and the ability to drive three displays from one Ivy Bridge chip, or two when a laptop is docked with its lid closed.
"So now you can dock with your lid down," Piazza said, "and those people with three eyes or three heads, you can also have three displays if you keep your lid up."
In addition to enabling Cerberus to better enjoy his Ivy Bridge Ultrabook™, the new graphics architecture, Piazza said, is able to "fix some things that were less efficient on Sandy Bridge, and create a system that's much more scalable, much more optimal" when used in different chips in the Ivy Bridge series.
The Ivy Bridge design team reordered the graphics pipeline, partially to make it more scalable through the addition of more or fewer graphics processing units, and partially to simply make it faster and more efficient.
Tom Piazza at IDF 2011
They also added an L3 cache deep inside the graphics goodness. "In Sandy Bridge," Piazza said, "we were going to put an L3 cache in. We did not do it because we couldn't find any real performance reason to do it." Asked why not, he answered, speaking of the Sandy Bridge development process: "If you look into Sandy Bridge, you can almost turn the L3 off – you'll see very few applications that suffer, and they suffer in the range of 5 to 10 per cent."
The bottom line: "There was no performance to be gained from the L3, at the time, so we just killed it."
In the new, rearranged Ivy Bridge graphics architecture, however, the L3 is closer to the units that it needs to feed, and it conserves power "because you don't have to go out and light up the whole [cache communications] ring," which would draw more power and eat up bandwidth. The addition of the L3 in Ivy Bridge "just floats all boats," Piazza said.
Piazza also ran down a laundry list of architectural changes and improvements in Ivy Bridge, including shared local memory. "You take a look at what we did here – the amount of scatter-gathers per clock – is 32 times more than Sandy Bridge. If you're running GPGPU workloads ... don't be surprised if performance is extremely higher," he said, citing 20X improvements they've seen in some workloads.
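For readers unfamiliar with the term, a scatter-gather is a read from, or a write to, memory locations picked by a list of indices rather than one contiguous run. The sketch below is a plain-Python illustration of the concept only; the function names `gather` and `scatter` are ours, and nothing here reflects how Ivy Bridge's hardware actually implements the operation.

```python
def gather(src, idx):
    """Gather: read src at arbitrary positions, so out[i] = src[idx[i]]."""
    return [src[i] for i in idx]


def scatter(vals, idx, size, fill=0):
    """Scatter: write vals to arbitrary positions, so out[idx[i]] = vals[i]."""
    out = [fill] * size
    for v, i in zip(vals, idx):
        out[i] = v
    return out


data = [10, 20, 30, 40, 50]
picked = gather(data, [4, 0, 2])         # [50, 10, 30]
placed = scatter(picked, [1, 3, 0], 5)   # [30, 50, 0, 10, 0]
```

GPGPU code leans on this access pattern heavily, which is why the number of such operations a chip can issue per clock matters for those workloads.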
He also pointed out other improvements, such as better geometry performance, buffer-clearing optimizations using scoreboarding, higher sampler throughput, higher – and more honest – peak gigaflops, and improved anisotropic performance. "For those people who have been looking at our anisotropic angle thing," he said, "we now draw circles instead of flower petals."
Toss in a few more upgrades – such as the ability to support both encoding and decoding stereoscopic 3D – and the tick-tock, tick-plus cadence packed inside Ivy Bridge's 1.48bn transistors might be the sound of a stopwatch timing AMD and Intel in the on-die, integrated-graphics race. ®