MIT boffins have devised a software-based tool for predicting how processors will perform when executing code for specific applications.
In three papers released over the past seven months, ten computer scientists describe Ithemal (Instruction THroughput Estimator using MAchine Learning), a tool for predicting the number processor clock cycles necessary to execute an instruction sequence when looped in steady state, and include a supporting benchmark and algorithm.
Throughput stats matter to compiler designers and performance engineers, but it isn't practical to make such measurements on-demand, according to MIT computer scientists Saman Amarasinghe, Eric Atkinson, Ajay Brahmakshatriya, Michael Carbin, Yishen Chen, Charith Mendis, Yewen Pu, Alex Renda, Ondˇrej Sykora, and Cambridge Yang.
So most systems rely on analytical models for their predictions. LLVM offers a command-line tool called llvm-mca that can presents a model for throughput estimation, and Intel offers a closed-source machine code analyzer called IACA (Intel Architecture Code Analyzer), which takes advantage of the company's internal knowledge about its processors.
Michael Carbin, a co-author of the research and an assistant professor and AI researcher at MIT, told the MIT News Service on Monday that performance model design is something of a black art, made more difficult by Intel's omission of certain proprietary details from its processor documentation.
The Ithemal paper [PDF], presented in June at the International Conference on Machine Learning, explains that these hand-crafted models tend to be an order of magnitude faster than measuring basic block throughput – sequences of instructions without branches or jumps. But building these models is a tedious, manual process that's prone to errors, particularly when processor details aren't entirely disclosed.
Using a neural network, Ithemal can learn to predict throughout using a set of labelled data. It relies on what the researchers describe as "a hierarchical multiscale recurrent neural network" to create its prediction model.
"We show that Ithemal’s learned model is significantly more accurate than the analytical models, dropping the mean absolute percent error by more than 50 per cent across all benchmarks, while still delivering fast estimation speeds," the paper explains.
Exploring AWS CodeGuru: New automated code review has smart features – but Java-onlyREAD MORE
A second paper presented in November at the IEEE International Symposium on Workload Characterization, "BHive: A Benchmark Suite and Measurement Framework for Validating x86-64 Basic Block Performance Models," describes the BHive benchmark for evaluating Ithemal and competing models, IACAm llvm-mca, and OSACA (Open Source Architecture Code Analyzer). It found Ithemal outperformed other models except on vectorized basic blocks.
And in December at the NeurIPS conference, the boffins presented a third paper titled Compiler Auto-Vectorization with Imitation Learning that describes a way to automatically generate compiler optimizations in a way that outperforms LLVM’s SLP vectorizer.
The academics argue that their work shows the value of machine learning in the context of performance analysis.
"Ithemal demonstrates that future compilation and performance engineering tools can be augmented with data–driven approaches to improve their performance and portability, while minimizing developer effort," the paper concludes. ®