Latest update for 'extremely fast' compression algorithm LZ4 sprints past old versions

New release does something you might have thought it already did

The new version of the high-speed compression algorithm LZ4 gets a big speed boost – nearly an order of magnitude.

LZ4 is one of the faster compression algorithms in Linux, but the newly released LZ4 version 1.10 significantly raises the bar on its own forerunners. On some hardware, LZ4 1.10 compresses data over five and up to nearly ten times faster than previous releases by using multiple CPU cores in parallel.

As the release notes explain:

Multithreading is less critical for decompression, as modern NVMe drives can still be saturated with a single decompression thread. Nonetheless, the new version enhances performance by overlapping I/O operations with decompression processes.

Tested on a x64 Linux platform, decompressing a 5GB text file locally takes 5 seconds with v1.9.4; this is reduced to 3 seconds in v1.10.0, corresponding to > +60% performance improvement.
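A quick check of the arithmetic behind that last figure, as a minimal Python sketch. The function name is ours for illustration and is not part of LZ4; the only inputs are the file size and timings quoted above.

```python
def throughput_gain(old_seconds: float, new_seconds: float) -> float:
    """Relative throughput improvement when the same job finishes sooner."""
    return old_seconds / new_seconds - 1.0


size_gb = 5.0               # file size quoted in the release notes
old_s, new_s = 5.0, 3.0     # v1.9.4 and v1.10.0 timings quoted above

print(f"v1.9.4 : {size_gb / old_s:.2f} GB/s")
print(f"v1.10.0: {size_gb / new_s:.2f} GB/s")
print(f"gain   : {throughput_gain(old_s, new_s):+.0%}")   # prints "gain   : +67%"
```

Going from 5 seconds to 3 is a 40 percent cut in time but roughly a 67 percent rise in throughput, which is consistent with the "> +60%" in the notes.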

There are multiple compression algorithms in Linux and other FOSS OSes, such as the recently infamous xz. There's no single "best": they are all optimized for different uses, some for big files, some for certain types of data, some for the smallest possible compressed file size, some for the smallest memory usage, and so on. The LZ4 algorithm is one of the speed-optimized ones. Its self-description on GitHub is "Extremely Fast Compression algorithm."

It's been around for a while. As far as the FOSS desk can tell, The Register first mentioned it in 2012 and it was incorporated into the Linux kernel in version 3.11 the following year. It has been available for compressing the SquashFS images found on many Linux boot media since kernel 3.19.

The curious can read a short but dense explanation of how LZ4 works from author Yann Collet, who works for Facebook and is also the creator of Zstd and xxHash. The US sitcom Silicon Valley fictionalized his work via a character called Richard Hendricks.

For a speed-focused compression scheme that's over a decade old, such a big performance jump is unexpected. The new release achieves it by spreading compression over multiple CPU cores, as previously done in the lz4mt C++ implementation. (The author of that variant, Takayuki Matsuoka, contributed multiple changes to the new LZ4 release.)

LZ4 could already do over half a gigabyte per second on each core, but now, if you have lots of cores to throw at it, it can do substantially more. The table in the announcement shows an AMD 7850HS – an octo-core chip – getting seven to eight times faster, and an Intel i7-9700K, also with eight cores, getting nearly six times as quick.
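To give a flavor of how spreading compression across cores can work, here is a minimal Python sketch of block-parallel compression. It rests on loud assumptions: it uses the standard library's zlib as a stand-in rather than LZ4 itself, the 4 MB block size and all function names are ours, and it shows the general split-compress-reassemble idea rather than what LZ4 1.10 actually does internally.

```python
import os
import zlib
from concurrent.futures import ThreadPoolExecutor
from typing import List, Optional

BLOCK_SIZE = 4 * 1024 * 1024  # illustrative value, not LZ4's real setting


def split_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> List[bytes]:
    """Cut the input into fixed-size blocks that can be compressed independently."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def compress_parallel(data: bytes,
                      workers: Optional[int] = None,
                      block_size: int = BLOCK_SIZE) -> List[bytes]:
    """Compress each block on a worker thread; results come back in input order."""
    blocks = split_blocks(data, block_size)
    with ThreadPoolExecutor(max_workers=workers or os.cpu_count()) as pool:
        # Executor.map() yields results in the order of its inputs,
        # so the compressed blocks line up with the original data.
        return list(pool.map(zlib.compress, blocks))


def decompress_blocks(blocks: List[bytes]) -> bytes:
    """Reverse the process: decompress each block and stitch them together."""
    return b"".join(zlib.decompress(block) for block in blocks)


if __name__ == "__main__":
    sample = b"The quick brown fox jumps over the lazy dog. " * 200_000
    packed = compress_parallel(sample, block_size=1024 * 1024)
    assert decompress_blocks(packed) == sample
    print(f"{len(sample)} bytes in, {sum(map(len, packed))} bytes out, "
          f"{len(packed)} blocks")
```

The trade-off in a scheme like this is that each block is compressed without knowledge of the others, so the compression ratio can be slightly worse than a single pass over the whole input, in exchange for keeping every core busy.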

For us, this release illustrates several important points. First, writing efficient code to exploit the parallelism of multiple processor cores is very hard. The parallelized lz4mt implementation appeared ten years ago, and it's remarkable that it has taken a whole decade for this change to make it into what is a speed-focused algorithm. That, in turn, is why more parts of modern OSes and apps can't and don't make effective use of multiple CPU cores… and that's why the number of cores in desktop CPUs is increasing much more slowly than in server CPUs. More cores can't make a single-threaded process run any more quickly, and in general, most common apps tend to use only a small number of threads. There's still no way to automatically parallelize algorithms – only very smart humans can do that.

As we noted earlier this year when discussing code bloat, the late great Gene Amdahl formalized Amdahl's Law, which notes that the performance gains from making code more parallel are capped by the part that must still run serially, and usually top out at about 20 processors. We also highly recommend "The Future of Microprocessors" talk by Arm co-creator Sophie Wilson, in which she notes that the silicon-chip industry is unique in successfully selling high-volume products where the purchasers can't use most of them. In fact, in any modern CPU, if it were possible to turn on all of any processor die at once, it would burn itself out in seconds.
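For the curious, Amdahl's Law in its usual form says that if a fraction p of a job can be parallelized across n cores, the overall speedup is 1 / ((1 - p) + p / n). Below is a minimal Python sketch of that formula. The 95 percent parallel fraction is purely an illustrative assumption of ours, not a measurement of LZ4 or any other program.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Overall speedup when `parallel_fraction` of the work scales across `cores`."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    p = 0.95  # assumed for illustration: 95% of the work parallelizes
    for n in (1, 2, 8, 20, 64, 1024):
        print(f"{n:>5} cores -> {amdahl_speedup(p, n):.2f}x")
    # As n grows without bound, the speedup approaches 1 / (1 - p).
    print(f"ceiling    -> {1.0 / (1.0 - p):.0f}x")
```

With that assumed 95 percent figure, the ceiling is 20 times no matter how many cores are added, because the remaining 5 percent of serial work never gets any faster.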

In the meantime, though, LZ4 1.10 means you can use a bit more of your processor, at least occasionally. Alongside LZ4, another thing that made it into the Linux kernel in version 3.11, humorously nicknamed Linux for Workgroups, was zswap, which can compress pages in memory before they are swapped out to disk. As we described a couple of years ago, turning on zswap can really help the performance of any Linux box that uses swap heavily. When version 1.10 of LZ4 makes it into the kernel, that will get faster still, but in the meantime, you can easily turn zswap on and enjoy the result today. ®
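If you want to see whether zswap is already active on a given box, the kernel exposes its settings as files under /sys/module/zswap/parameters. Here is a minimal, read-only Python sketch that lists them. The function name is ours, and it assumes a kernel built with zswap support; on anything else it simply reports that nothing was found.

```python
from pathlib import Path
from typing import Dict, Optional

# Standard sysfs location for zswap's module parameters on Linux.
ZSWAP_PARAMS = Path("/sys/module/zswap/parameters")


def read_zswap_params(base: Path = ZSWAP_PARAMS) -> Optional[Dict[str, str]]:
    """Return zswap's parameters as a dict, or None if the directory is absent."""
    if not base.is_dir():
        return None
    params: Dict[str, str] = {}
    for entry in sorted(base.iterdir()):
        try:
            params[entry.name] = entry.read_text().strip()
        except OSError:
            # Skip anything we are not allowed to read.
            continue
    return params


if __name__ == "__main__":
    params = read_zswap_params()
    if params is None:
        print("zswap parameters not found on this system")
    else:
        for name, value in params.items():
            print(f"{name} = {value}")
```

The entries of most interest are `enabled` and `compressor`. Changing them means writing to those files as root, or setting `zswap.enabled=1` and `zswap.compressor=lz4` on the kernel command line; check your distribution's documentation before doing either.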
