In a 1959 address to the American Physical Society at Caltech titled "There's Plenty of Room at the Bottom," physicist Richard Feynman laid out the opportunity ahead to manipulate matter at the atomic scale.
The semiconductor industry since then has made much of miniaturization, increasing the density of transistors on chips at a steady rate for decades. In 1965, Gordon Moore, co-founder of Fairchild Semiconductor and Intel, predicted the rate of semiconductor improvement would continue for at least a decade.
That prediction, which came to be known fallaciously as Moore's Law, held up until it began to break down a few years ago. Moore's idea may not be quite played out yet but a group of MIT boffins are ready to put it to rest and look beyond the bottom of the tech stack.
Chip miniaturization, they say, looks like it will end at 5nm fabrication, due to the diminishing returns and the anticipated costs of trying to reduce transistors further. Absent gains from shrinking semiconductors, they want the computing industry to focus on the "Top" of the tech stack.
In a paper [paywall] published Friday in the journal Science, MIT professors Charles Leiserson and Daniel Sanchez, adjunct professor Butler Lampson, professor of the practice Joel Emer, and research scientists Bradley Kuszmaul, Tao Schardl, and Neil Thompson argue that the tech industry needs software performance engineering, better algorithmic approaches to problem solving, and streamlined hardware interaction.
These three areas at the top of the stack will yield less reliable gains than semiconductor density improvements of the past because they're interrelated.
"Unlike Moore’s law, which has driven up performance predictably by 'lifting all boats,' working at the Top to obtain performance will yield opportunistic, uneven, and sporadic gains, typically improving just one aspect of a particular computation at a time," the authors state.
Nonetheless, that's the present opportunity now that further miniaturization no longer looks practical.
Bottom to top
This future demands better programming techniques to write faster code. To illustrate that point, the MIT researchers wrote a simple Python 2 program that multiplies two 4,096-by-4,096 matrices. They used an Intel Xeon system with a 2.9-GHz, 18-core CPU and a shared 25-mebibyte L3 cache, running Fedora 22 and version 4.0.4 of the Linux kernel.
    for i in xrange(4096):
        for j in xrange(4096):
            for k in xrange(4096):
                C[i][j] += A[i][k] * B[k][j]
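For anyone who wants to try the loop at home, here is a minimal Python 3 sketch of the same triple loop. The function name `matmul_naive`, the test data, and the small size `n = 64` are assumptions for illustration, not taken from the paper; at the full 4,096 the researchers' figures suggest a pure-Python run takes hours.

```python
import time


def matmul_naive(A, B):
    """Multiply two square matrices (lists of lists) with the classic triple loop."""
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]
    return C


if __name__ == "__main__":
    n = 64  # assumed small size so the run finishes quickly; the paper used 4,096
    A = [[float(i + j) for j in range(n)] for i in range(n)]
    B = [[float(i - j) for j in range(n)] for i in range(n)]
    start = time.perf_counter()
    C = matmul_naive(A, B)
    elapsed = time.perf_counter() - start
    print(f"{n}x{n} multiply took {elapsed:.3f}s")
```

The work grows with the cube of the matrix size, so doubling `n` should take roughly eight times as long.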
The code, they say, takes seven hours to compute the matrix product, or nine hours if you use Python 3. Better performance can be achieved by using a more efficient programming language, with Java delivering a 10.8x speedup and C (the third version tested) adding a further 4.4x, for a 47x improvement in execution time overall.
Beyond programming language gains, exploiting specific hardware features can make the code run 1,300x faster still. By parallelizing the code to run on all 18 of the available processing cores, optimizing for the processor's memory hierarchy, vectorizing the code, and using Intel's Advanced Vector Extensions, the seven-hour number-crunching task can be reduced to 0.41s, or 60,000x faster than the original Python code.
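To give a flavor of what "optimizing for the memory hierarchy" can mean, here is a rough sketch of loop tiling (also called blocking), a common cache-oriented technique. It is a swapped-in illustration, not the researchers' code: the function name `matmul_tiled` and the tile size of 32 are assumptions. The payoff comes in compiled languages, where a tile's operands can stay in cache while they are reused; in CPython, interpreter overhead swamps any such benefit, so treat this as a picture of the loop structure only.

```python
def matmul_tiled(A, B, tile=32):
    """Blocked matrix multiply: process tile-by-tile sub-blocks of the matrices.

    `tile` is an assumed value; real code would tune it to the cache sizes.
    """
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, tile):
        i_end = min(ii + tile, n)
        for kk in range(0, n, tile):
            k_end = min(kk + tile, n)
            for jj in range(0, n, tile):
                j_end = min(jj + tile, n)
                # Multiply one tile of A by one tile of B into one tile of C.
                for i in range(ii, i_end):
                    row_a = A[i]
                    row_c = C[i]
                    for k in range(kk, k_end):
                        a = row_a[k]
                        row_b = B[k]
                        for j in range(jj, j_end):
                            row_c[j] += a * row_b[j]
    return C
```

The arithmetic is the same as in the plain triple loop; only the order in which elements are visited changes.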
Not every application can be improved by five orders of magnitude with better programming, the authors say, but most can benefit from performance engineering.
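The headline multipliers can be sanity-checked with a few lines of arithmetic. This sketch uses only the rounded figures quoted above (seven hours, 0.41s, 10.8x and 4.4x), so the results are approximate; the variable names are ours.

```python
import math

python_seconds = 7 * 3600  # "seven hours" for the original Python 2 run (rounded)
tuned_seconds = 0.41       # best reported time on the Xeon

language_speedup = 10.8 * 4.4                      # Java step compounded with the C step
overall_speedup = python_seconds / tuned_seconds   # Python versus fully tuned code
orders_of_magnitude = math.log10(overall_speedup)

print(f"Python to C: about {language_speedup:.1f}x")
print(f"Python to tuned code: about {overall_speedup:,.0f}x")
print(f"Orders of magnitude: {orders_of_magnitude:.2f}")
```

The speedups compound multiplicatively, which is how a 47x language gain and a roughly 1,300x hardware-aware gain land in the neighborhood of 60,000x, a little under five orders of magnitude.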
Improvements in algorithms can also speed things up, the authors say, pointing to four advances since 1978 in how the maximum flow in a network is calculated. Algorithmic refinement has kept pace with Moore's Law in terms of performance gains over time, they say, though these advancements are irregular and can't be expected to go on forever.
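For readers unfamiliar with the maximum-flow problem, here is a minimal sketch of one textbook approach, the Edmonds-Karp algorithm. It dates from 1972, so it is a baseline rather than one of the post-1978 advances the authors have in mind. The function name `max_flow` and the dict-of-dicts graph format are assumptions made for this illustration.

```python
from collections import deque


def max_flow(capacity, source, sink):
    """Edmonds-Karp: push flow along shortest augmenting paths found by BFS.

    capacity[u][v] is the capacity of the directed edge u -> v.
    """
    if source == sink:
        raise ValueError("source and sink must differ")

    # Residual graph: copy capacities and add zero-capacity reverse edges.
    residual = {source: {}, sink: {}}
    for u, edges in capacity.items():
        residual.setdefault(u, {})
        for v, cap in edges.items():
            residual[u][v] = residual[u].get(v, 0) + cap
            residual.setdefault(v, {}).setdefault(u, 0)

    total = 0
    while True:
        # Breadth-first search for a path with spare capacity.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total  # no augmenting path left

        # Find the bottleneck capacity along the path.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            u = parent[v]
            bottleneck = min(bottleneck, residual[u][v])
            v = u

        # Push flow: reduce forward capacity, increase reverse capacity.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        total += bottleneck
```

Given a small network such as `{"s": {"a": 3, "b": 2}, "a": {"t": 2}, "b": {"t": 3}}`, calling `max_flow(graph, "s", "t")` returns the most flow that can be routed from `s` to `t` without exceeding any edge's capacity.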
They also note that some hardware features used by specific algorithms to maximize performance – e.g. simultaneous multithreading, dynamic voltage and frequency scaling – make optimization difficult because they cause variability that isn't easily modeled.
The authors stress the need for hardware makers to focus on less rather than Moore.
"We argue that in the post-Moore era, architects will need to adopt the opposite strategy and focus on hardware streamlining: implementing hardware functions using fewer transistors and less silicon area," they say.
That means processor simplification – fewer transistors to allow room for more cores and parallelism – and domain specialization – chips tailored for specific tasks.
The possibilities of specialization can be seen in the context of GPUs. The boffins ran their matrix multiplication test on an Advanced Micro Devices (AMD) FirePro S9150 GPU. It delivered results in 70ms, 5.4x faster than the best Xeon result and 360,000x faster than the original Python code.
In the past, the authors observe, general purpose processors tended to limit the market for specialized processors because they advanced rapidly. Now without that competition, and with cloud providers to aggregate demand for specialized applications, they expect more purpose-built chips.
The boffins conclude by noting that the bottom hasn't necessarily been sounded and that advances in materials like graphene and research areas like quantum computing and superconducting circuits may change the picture. But such advances aren't near at hand, they say, and so we should lift our eyes to the top of the stack. ®