IDF Intel's unnamed next-generation microarchitecture (NGMA) will combine the Pentium 4's bus, the Pentium M's power optimisations and some "new innovations", Intel said today.
However, HyperThreading will not be among those features, at least not in the NGMA's initial versions, the chip giant admitted.
At NGMA's heart is a 14-stage instruction pipeline - around half the length of the 'Prescott' pipeline, the same as the old Pentium Pro's, and probably in line with the 'Dothan' pipeline. Prescott's pipeline was extended to around 30 stages to support clock frequencies of 4GHz and beyond. Now that Intel is no longer targeting such high clock speeds - thanks to the heat dissipation problem - out goes the need for such a long pipeline, whose main job was to split each instruction's work into smaller steps so the core could be clocked faster.
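A common first-order textbook model, not an Intel figure, shows why pipeline depth and clock speed go together. If T_logic is the total logic delay needed to process one instruction, N the number of pipeline stages and t_latch the fixed overhead each stage's latches add, then roughly:

```latex
T_{\text{cycle}} \approx \frac{T_{\text{logic}}}{N} + t_{\text{latch}},
\qquad
f_{\max} = \frac{1}{T_{\text{cycle}}}
```

Doubling N roughly halves the logic in each stage, so the clock can run faster, but t_latch doesn't shrink, and a mispredicted branch throws away work already in flight across up to N stages. That's why a very deep pipeline only pays off at very high clock speeds.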
A 14-stage pipeline suggests that 'Conroe' and 'Merom', the NGMA-based desktop and mobile chips, will run at clock frequencies well below those of today's Pentium 4 - probably 2-3GHz, though Intel didn't provide any guidance on clocking. The lower clock speed, plus the smaller 65nm fabrication process, will help keep power consumption down. So too will the efficiency techniques derived from Dothan.
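The usual first-order model for a chip's dynamic (switching) power, again a textbook approximation rather than anything Intel disclosed, is:

```latex
P_{\text{dyn}} \approx \alpha \, C \, V^{2} f
```

Here α is the fraction of the circuit switching each cycle, C the switched capacitance, V the supply voltage and f the clock frequency. A lower f cuts power directly and usually lets the chip run at a lower V, which counts twice; a smaller process shrinks C. Leakage power, a growing concern at 65nm, isn't captured by this formula.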
The NGMA core can execute four instructions per clock, Intel said, running them out of the order in which the program 'expects' them to be run. Interestingly, NGMA's "first implementation" won't have HyperThreading, in order to make software easier to compile, suggested David Perlmutter, VP and General Manager of Intel's Mobility Group.
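A toy model, and nothing like Intel's actual scheduler, illustrates what four-wide out-of-order issue means. The instruction names, latencies and dependency format below are hypothetical: each clock, up to four instructions whose inputs are ready are issued, even if older instructions are still waiting.

```python
from dataclasses import dataclass

ISSUE_WIDTH = 4  # instructions per clock, the figure Intel quoted


@dataclass
class Instr:
    name: str
    srcs: tuple = ()  # names of earlier instructions whose results this one needs
    latency: int = 1  # cycles until the result can be used (assumed values)


def schedule(program, width=ISSUE_WIDTH):
    """Return {name: issue cycle} for a toy out-of-order core.

    Assumes every entry in srcs names an earlier instruction in the program.
    """
    ready_at = {}  # name -> cycle at which its result can be used
    issued = {}
    pending = list(program)  # kept in program order
    cycle = 0
    while pending:
        slots = width
        waiting = []
        for ins in pending:
            inputs_ready = all(s in ready_at and ready_at[s] <= cycle for s in ins.srcs)
            if inputs_ready and slots > 0:
                issued[ins.name] = cycle
                ready_at[ins.name] = cycle + ins.latency
                slots -= 1
            else:
                waiting.append(ins)  # stalled, but doesn't block younger instructions
        pending = waiting
        cycle += 1
    return issued


program = [
    Instr("load", (), latency=3),
    Instr("add", ("load",)),
    Instr("mul1"),
    Instr("mul2"),
    Instr("sub"),
    Instr("store", ("add",)),
]
print(schedule(program))
# {'load': 0, 'mul1': 0, 'mul2': 0, 'sub': 0, 'add': 3, 'store': 4}
```

Here the three independent instructions slip past the add, which has to wait for the three-cycle load; a strictly in-order core would have held them up behind it.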
However, expect cores supporting up to eight threads over time, said Stephen Smith, VP of Intel's Digital Enterprise Group. By then, some CPUs will be single-threaded and others multi-threaded, he added.
The NGMA will support direct connections between each core's L1 cache. The L2 cache can be shared too, and is scalable - Intel will offer versions of the same core with different cache sizes, not unlike what AMD does today. Desktop processors - i.e. 'Conroe' - will have more cache than the mobile version, 'Merom', and the server versions, 'Woodcrest' and 'Whitefield', will offer more still. Again, Intel provided no guidance on cache sizes. The bus connecting the L2 cache to the execution core has also been widened.
Perlmutter also said the NGMA dynamically adjusts the cache space allotted to each core, depending on load. Run a single-threaded app, he said, and that app will have access to all the cache. Run more apps to bring the second core into action, and cache will be assigned to that core too.
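Intel didn't say how the NGMA divides its shared L2 cache, so the sketch below assumes a simple policy: cache is split between active cores in proportion to their demand, and idle cores get none. The function name, the per-core demand figures and the 2,048KB total are all hypothetical.

```python
def allocate_cache(total_kb: int, demand_kb: dict) -> dict:
    """Split a shared cache between cores in proportion to demand (hypothetical policy).

    demand_kb maps each core to the working-set size it is asking for; 0 means idle.
    Integer division may leave a few KB unassigned.
    """
    alloc = {core: 0 for core in demand_kb}  # idle cores get nothing
    active = {core: d for core, d in demand_kb.items() if d > 0}
    total_demand = sum(active.values())
    for core, d in active.items():
        alloc[core] = total_kb * d // total_demand
    return alloc


# One single-threaded app: its core gets the whole (hypothetical) 2,048KB cache
print(allocate_cache(2048, {"core0": 1500, "core1": 0}))     # {'core0': 2048, 'core1': 0}
# Both cores busy: the cache is divided between them
print(allocate_cache(2048, {"core0": 3000, "core1": 1000}))  # {'core0': 1536, 'core1': 512}
```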
Finally, Intel said it has improved the NGMA's memory pre-fetch system and has added memory disambiguation, a technique which essentially lets the CPU read data from memory before earlier instructions have finished writing their results back to memory. That's a risk: one of those pending writes could change the very data you've just read, leaving the CPU working with stale values that must be caught and the work redone. Intel claims its system is smart enough to predict which data isn't going to be changed - only real-world testing will reveal whether it's right. ®
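The sketch below models the general idea behind memory disambiguation: letting a load run ahead of an older store whose address isn't yet known, then checking afterwards. The predictor, the 'pc' labels used to identify load sites and the addresses are hypothetical, and Intel hasn't described its real mechanism. A wrong guess is caught and the load replayed, so the cost of a misprediction is lost time rather than a wrong answer.

```python
from dataclasses import dataclass, field


@dataclass
class DisambiguationPredictor:
    """Hypothetical predictor: remembers load sites that have clashed with a store."""
    conflicted: set = field(default_factory=set)

    def may_hoist(self, pc: str) -> bool:
        # Speculate unless this load has been caught out before
        return pc not in self.conflicted

    def record_conflict(self, pc: str) -> None:
        self.conflicted.add(pc)


def execute(pred, memory, pc, store_addr, store_value, load_addr):
    """Run an older store followed by a younger load.

    Returns (value the load finally produces, whether it had to be replayed).
    The caller's memory dict is left untouched.
    """
    mem = dict(memory)
    if pred.may_hoist(pc):
        early = mem[load_addr]           # load runs before the store's address is known
        mem[store_addr] = store_value    # store address resolves; store completes
        if store_addr == load_addr:      # the early value was stale
            pred.record_conflict(pc)     # be conservative with this load next time
            return mem[load_addr], True  # replay the load to get the right value
        return early, False              # guess was right: the load finished early
    mem[store_addr] = store_value        # conservative path: wait for the store first
    return mem[load_addr], False


pred = DisambiguationPredictor()
memory = {0x100: 1, 0x200: 2}
print(execute(pred, memory, "pc_a", 0x100, 99, 0x200))  # (2, False): no clash
print(execute(pred, memory, "pc_b", 0x200, 99, 0x200))  # (99, True): clash, replayed
print(execute(pred, memory, "pc_b", 0x200, 7, 0x200))   # (7, False): now waits
```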