Facebook has open sourced a binary optimization and layout tool, itself optimized into the acronym BOLT, in the hope it can make large applications faster.
The compiled code for large applications – the sort Facebook runs – tends to be too big to cram into the instruction cache of a modern CPU, explains Maxim Panchenko, software engineer at the social ad empire, in a blog post.
Consequently, the chip spends a significant amount of processing time – sometimes as much as 30 per cent – moving instructions from memory to the CPU.
"To address this issue, which is commonly known as instruction starvation, we have developed and deployed BOLT, a binary optimization and layout tool," says Panchenko. "BOLT optimizes placement of instructions in memory, thereby reducing CPU execution time by 2 per cent to 15 per cent."
The Linux command-line tool works with applications created using various compilers, such as Clang or GCC. It's designed to complement AutoFDO, a feedback-directed optimization tool built by Google several years ago.
To keep the CPU well-fed and prevent instruction starvation – which can result from branch misprediction as well as burdensome binaries – compiler profile-guided optimizations, or PGOs, may be employed, says Panchenko.
Such profile data can be used to recompile applications so they make better use of the CPU cache architecture and make better decisions about when to employ techniques like inlining.
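To make the idea concrete, here is a toy Python sketch of how profile counts might steer code layout. It is not BOLT's algorithm, nor what any compiler's PGO pass actually does: the function names, sizes and call counts are invented, the 64-byte cache line is an assumption typical of x86-64 parts, and "hot" is arbitrarily defined as 1,000 or more sampled calls. It simply packs functions back to back and counts how many cache lines the hot ones touch.

```python
# Toy model of profile-guided code layout. Illustrative only: this is
# NOT BOLT's algorithm, and all names and numbers below are made up.
CACHE_LINE = 64  # bytes; assumed, typical for x86-64


def layout(functions, order):
    """Pack functions back to back in the given order.

    Returns a dict mapping name -> (start, end) byte addresses.
    """
    addr, placed = 0, {}
    for name in order:
        size = functions[name]["size"]
        placed[name] = (addr, addr + size)
        addr += size
    return placed


def hot_lines(placed, hot):
    """Count the distinct cache lines touched by the hot functions."""
    lines = set()
    for name in hot:
        start, end = placed[name]
        lines.update(range(start // CACHE_LINE, (end - 1) // CACHE_LINE + 1))
    return len(lines)


# Hypothetical profile: size in bytes and sampled call count per function.
functions = {
    "parse":    {"size": 96,   "calls": 9000},
    "log_err":  {"size": 512,  "calls": 3},
    "dispatch": {"size": 64,   "calls": 8000},
    "dump":     {"size": 1024, "calls": 1},
    "render":   {"size": 96,   "calls": 7000},
}

# Layout 1: the order the functions happen to appear in the source.
source_order = list(functions)
# Layout 2: hottest functions first, so they sit next to each other.
hot_first = sorted(functions, key=lambda f: functions[f]["calls"], reverse=True)

# Arbitrary threshold for "hot" in this sketch.
hot = [f for f in functions if functions[f]["calls"] >= 1000]

before = hot_lines(layout(functions, source_order), hot)
after = hot_lines(layout(functions, hot_first), hot)
print(f"hot cache lines: {before} in source order, {after} hot-first")
```

With these made-up numbers, the three hot functions are scattered across six cache lines in source order and fit into four once grouped together, with the rarely called code pushed out of the way. Real tools work at a much finer grain, reordering basic blocks as well as whole functions, but the payoff is the same: fewer cache lines to fetch for the code that actually runs.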
PGOs have some limitations, which AutoFDO attempts to address. But according to Panchenko, AutoFDO doesn't play nice with the C++ exception mechanism used by HHVM, the HipHop Virtual Machine developed by Facebook for its Hack and PHP code.
(HHVM compiles to intermediate bytecode, which can then be translated into x64 machine code on the fly by the just-in-time (JIT) compiler, a strategy that allows optimizations that aren't possible with statically compiled binaries.)
BOLT has improved HHVM performance by 8 per cent and other services by anywhere from 2 to 15 per cent, Panchenko says.
"If you are running a large application that is CPU front-end bound — i.e., it experiences a significant amount of instruction cache and TLB misses — then BOLT will help address these bottlenecks," he says. ®