There's more to performance than just 'performance'
With all the talk about HSA, CPU tweaks, GPU upgrades, performance comparisons, and the like, we'd be remiss if we didn't mention some of the other goodies resident on the new mobile Kaveri's die – namely, media processing for video and audio.
"When I talk about media playback, it's not about decoding a video stream and just presenting it," Macri said. "That's what everybody does. What we like to do at AMD is post-process it." What he was referring to are such post-processing niceties as removal of jitter, improvement of color gamut, edge sharpening, and the like.
Speaking of the 4K video–resolution capabilities of the new mobile Kaveri, Macri noted that although 4K is all well and good, there's very little 4K content available at present. "The best way to experience 4K today," he said, "is to take 1080p content and upscale it. Well, if you upscale it, you better have some great post-processing. Otherwise you're going to end up with a pretty ugly image on your beautiful monitor."
And guess what he says that his new APU has? Yup: great post-processing, enhanced by its HSA capabilities. "The ability for us to work on the right parts of the problem with the right hardware without moving the data improves performance, lowers power, gives us more accuracy," he said.
The new Kaveri mobile APUs also include three accelerators – coprocessors – that aid in media processing; one exists to enable AMD's TrueAudio technology.
The TrueAudio coprocessor consists of multiple digital signal processors (DSPs) with onboard data stores totaling 384KB of shared memory, and with its own direct memory access (DMA) engine for streaming.
"Basically, if you're watching a movie," Macri said, "the only thing the CPU needs to do in this case is take the audio/visual stream, crack it, ship the video to our decoder, ship the audio to our [audio coprocessor], and that's all the CPU is doing."
This frees the CPU from having to waste its time and energy on audio matters, and since the TrueAudio coprocessor is a dedicated piece of hardware, the quality of service (QoS) can be enhanced – no skips or glitches that might occur when the CPU is called away for other tasks, Macri said.
The TrueAudio coprocessor is also programmable, so developers can take advantage of capabilities such as directional audio, noise cancelation, beam forming, and the like. "We already have some games that are starting to take advantage of this now," Macri said.
The new Kaveri mobile APUs also have a video coding engine (VCE) and a unified video decoder (UVD). The biggest change in the VCE from Kaveri's predecessors Trinity and Richland, Macri said, is the addition of the YUV444 color encoding, which he said "allows you to create perfect text."
Kaveri offers one important upgrade from its predecessors: crisp text over wireless
This is important because of the emerging 60GHz WiGig tech from Wilocity and others, which enables wireless docking to displays. "If you want to have a wireless dock, you want a wireless monitor with great text, you need to have the right encoding for it," Macri said – and YUV444 is that encoding.
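To see why the encoding matters for text, here is a minimal sketch that compares full-resolution chroma (4:4:4) with chroma averaged over 2x2 pixel blocks. It assumes the alternative to YUV444 is a 4:2:0-style subsampled format, which Macri did not specify, and it uses a crude box average with nearest-neighbour upsampling rather than anything resembling AMD's VCE. The function names and the sample values are hypothetical.

```python
def subsample_420(plane):
    """Average each 2x2 block of a chroma plane, then replicate the
    average back to full size (nearest-neighbour upsampling)."""
    h, w = len(plane), len(plane[0])
    out = [[0] * w for _ in range(h)]
    for y in range(0, h, 2):
        for x in range(0, w, 2):
            ys = range(y, min(y + 2, h))
            xs = range(x, min(x + 2, w))
            block = [plane[yy][xx] for yy in ys for xx in xs]
            avg = sum(block) / len(block)
            for yy in ys:
                for xx in xs:
                    out[yy][xx] = avg
    return out


def max_error(a, b):
    """Largest absolute per-sample difference between two planes."""
    return max(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


# Hypothetical 4x4 chroma plane: a one-pixel-wide coloured stroke in
# column 1, like the stem of a letter, on a flat background.
stroke = [[16, 240, 16, 16] for _ in range(4)]

yuv444 = [row[:] for row in stroke]   # 4:4:4 keeps every chroma sample
yuv420 = subsample_420(stroke)        # 4:2:0-style: one sample per 2x2 block

err_444 = max_error(stroke, yuv444)
err_420 = max_error(stroke, yuv420)
print("4:4:4 max chroma error:", err_444)
print("4:2:0 max chroma error:", err_420)
```

In this toy case the subsampled stroke's colour bleeds into the neighbouring column, which is the kind of fringing that makes small text look soft on a wirelessly docked display.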
Speaking of wireless connections, he claimed that "We have the lowest wireless latency out there: sub-40 milliseconds" – an obvious boon to gamers. And why is wireless technology important? Macri was clear on that point. "Wires are evil," he said. "I think they're just the worst thing in the universe."
There were also improvements made to the UVD between Trinity/Richland and Kaveri, the biggest being error resiliency. "When you're decoding," he said, "if you get an error, many times it can affect multiple frames. And what we've done with this design is limit it to just one frame."
Errors are essentially inevitable, and can come from multiple sources: memory glitches or the quality of the encoding itself, for example. "You're really at the mercy of the encoder," Macri said – but if you can limit their effect to just one frame, they may be barely noticeable.
He also touted the many power optimizations in all areas of the new chips, saying that there are "thousands" of monitors scattered around the Kaveri die, some keeping track of temperature, and many more tracking activity. "What we're trying to do is understand what's happening out in the silicon in all the different blocks," he said – CPU and GPU cores and their subsystems, I/O, memory subsystems, data paths, caches, coprocessors and their subsystems, whatever – and use all that activity data as a proxy for actual temperature readings.
All that information is brought back to what he described as "basically a central processor," which is programmed differently for different members of the Kaveri family. That processor keeps track of what's happening throughout the die, then boosts, throttles, shuts down, or maintains processes, cores, accelerators, or whatever to run the whole chip at maximum efficiency.
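The control loop Macri described can be sketched roughly as follows. This is purely an illustration of the idea of using activity counts as a stand-in for power and heat; the block names, weights, budget, and decision rules are all hypothetical and do not come from AMD.

```python
# Hypothetical cost per activity event for each block (arbitrary units).
WEIGHTS = {"cpu": 0.004, "gpu": 0.006, "uvd": 0.001}


def estimate_power(activity):
    """Turn raw activity counts into a per-block power estimate."""
    return {blk: activity.get(blk, 0) * w for blk, w in WEIGHTS.items()}


def decide(activity, budget=15.0, boost_margin=0.8):
    """Pick one action for this control interval.

    Over budget: throttle the block with the highest estimate.
    Well under budget: boost the busiest block.
    Otherwise: leave everything as it is.
    """
    est = estimate_power(activity)
    total = sum(est.values())
    top = max(est, key=est.get)
    if total > budget:
        return ("throttle", top)
    if total < boost_margin * budget:
        return ("boost", top)
    return ("hold", None)


light = decide({"cpu": 1000, "gpu": 500, "uvd": 100})
heavy = decide({"cpu": 1000, "gpu": 2500})
steady = decide({"cpu": 2000, "gpu": 1000})
print(light, heavy, steady)
```

A real controller would weigh far more inputs and act per block rather than choosing a single action, but the shape is the same: counters in, an estimate in the middle, a boost or throttle decision out.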
Briefly and selectively boosting a clock makes a mockery of stated TDPs, Macri said. "TDP is probably the worst way to describe anything anymore – we do it because it's easier," he argued.
"When you describe things to people, one number sticks, right? If I give you a transfer function, you look at me with googly eyes, right? But if I give you 15 watts it'll stick in your brain. Trust me, 15-watt systems are not 15-watt systems – we go boosting way above, we're moving all over the place, very quickly."
Since it takes time for heat to move around on a piece of silicon, he said, creative control of how one area on the die can act as a heat sink for another area makes it possible to squeeze every bit of performance out of Kaveri without blowing the power budget.
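The point about heat taking time to spread can be illustrated with a first-order thermal model, in which temperature drifts toward a power-dependent target with a time constant. The sketch below is not AMD's model: the thermal resistance, time constant, ambient temperature, and power levels are all assumed values chosen only to show the effect.

```python
def simulate(power_trace, r_th=3.0, tau=10.0, t_amb=35.0, dt=0.1):
    """Euler-integrate dT/dt = (t_amb + P * r_th - T) / tau.

    power_trace: power in watts for each time step of length dt seconds.
    r_th: assumed thermal resistance in degrees C per watt.
    tau: assumed thermal time constant in seconds.
    Returns the temperature after each step.
    """
    temp = t_amb
    temps = []
    for p in power_trace:
        target = t_amb + p * r_th          # where T would settle at power p
        temp += (target - temp) * dt / tau
        temps.append(temp)
    return temps


STEPS_PER_S = 10                            # matches dt = 0.1 s
burst = [25.0] * (2 * STEPS_PER_S)          # 2 s boosted to 25 W
rest = [10.0] * (8 * STEPS_PER_S)           # 8 s backed off to 10 W
trace = (burst + rest) * 6                  # 60 s of repeated bursts

temps = simulate(trace)
avg_power = sum(trace) / len(trace)
peak = max(temps)
steady_15w = 35.0 + 15.0 * 3.0              # settling point at a constant 15 W
steady_25w = 35.0 + 25.0 * 3.0              # settling point at a constant 25 W

print("average power (W):", avg_power)
print("peak temperature (C):", round(peak, 1))
print("constant 15 W would settle at (C):", steady_15w)
```

With these assumed numbers the chip spends a fifth of its time at 25 W, yet its peak temperature stays below where a constant 15 W load would eventually settle, because each burst ends before the silicon has time to heat up fully.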
"Most importantly," he emphasized, "it takes into account what you're doing, when you're doing it, and reacts to it in a very unique way – because only you are going to do what you do the way you do it when you do it how you do it. And we will react to that dynamically. We won't react to it statically."
Despite that selective, dynamic boosting, however, battery life in a Kaveri-equipped system should be impressive, he claimed – including idle power. "One of the key things in life is to learn how to do 'nothing' well. Whether it's real life or computers, it's very important – or you'll burn yourself out for no reason."
Wrapping up, Macri returned to the importance of the heterogeneous system architecture – HSA – which underpins not just the new Kaveri mobile APU announced on Wednesday but now reaches across AMD's line.
"'Big A' architecture changes happen once every maybe 15, 20 years," AMD's CTO said. "HSA is a big one." ®