Facebook engineers today emitted a bunch of open-source compression and database tools during the company's @Scale conference.
This software is used actively at hyper-scale within the sprawling social network. The California giant wants to share some of its source code with the world and hopefully get some useful patches in return. Here's what's new:
In US telly sitcom Silicon Valley, Richard Hendricks creates a brilliant data compression algorithm called Pied Piper that pits him and his startup against psychotic tech giant Hooli, a thinly veiled parody of Google.
In real life, Hendricks works for Facebook and has created Zstandard: a lossless data squeezing algorithm that is faster than zlib and xz at compression and decompression, and has a better compression ratio than lz4 and zlib. In practical terms, the zstd command-line tool has a slightly better compression ratio than gzip, and way faster compression and decompression speeds.
The idea, basically, is to replace zlib with code that is mostly branchless and is optimized for parallel execution, which boosts performance. The library also benefits from finite state entropy, which is a state-of-the-art probability compressor design, among other improvements over zlib. Zlib is used in pretty much everything, including compressing zip and gzip archives and in-transit web traffic, so Facebook is keen to get its superior library adopted.
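If you want to check the speed and ratio claims on your own data, a rough harness like the one below will do. It is a minimal sketch, not Facebook's benchmark: it times zlib and xz from Python's standard library, and adds zstd only if the third-party `zstandard` package happens to be installed, which is an assumption. The function names `measure` and `run_benchmark`, the compression levels, and the sample data are all made up for illustration.

```python
import lzma
import time
import zlib


def measure(name, compress, decompress, data):
    """Return (name, ratio, compress_seconds, decompress_seconds) for one codec."""
    start = time.perf_counter()
    packed = compress(data)
    middle = time.perf_counter()
    restored = decompress(packed)
    end = time.perf_counter()
    if restored != data:
        raise ValueError(f"{name} round trip failed")
    return name, len(data) / len(packed), middle - start, end - middle


def run_benchmark(data):
    """Run every available codec over the same bytes and collect the results."""
    codecs = [
        ("zlib", lambda d: zlib.compress(d, 6), zlib.decompress),
        ("xz", lambda d: lzma.compress(d, preset=6), lzma.decompress),
    ]
    try:
        # Assumption: the third-party "zstandard" bindings are installed.
        import zstandard

        codecs.append((
            "zstd",
            zstandard.ZstdCompressor(level=3).compress,
            zstandard.ZstdDecompressor().decompress,
        ))
    except ImportError:
        pass  # No zstd bindings available; compare the stdlib codecs only.
    return [measure(name, c, d, data) for name, c, d in codecs]


if __name__ == "__main__":
    # Hypothetical sample: repetitive log-like text. Use your own files instead.
    sample = b"GET /index.html HTTP/1.1 200 user=alice region=us-west\n" * 50_000
    for name, ratio, c_secs, d_secs in run_benchmark(sample):
        print(f"{name:5s} ratio={ratio:8.1f} "
              f"compress={c_secs:.4f}s decompress={d_secs:.4f}s")
```

Results swing wildly with the input and the chosen levels, so treat any single run as a hint, not a verdict.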
From the project's README:
Zstandard is currently deployed within Facebook. It is used daily to compress and decompress very large amounts of data in multiple formats and use cases. Zstandard is considered safe for production environments.
"We're inspired by zlib's 20 years of success and hope Zstandard will carry the industry further where zlib leaves off," a Facebook spokesperson added.
Heads up, though. The three-clause BSD license on the Zstandard source includes an extra condition: that you won't sue Facebook for patent infringement, nor countersue if Facebook sues you first, while using the software in your own projects and products.
Facebook has taken RocksDB – its fork of Google's LevelDB – combined it with a branch of Oracle's MySQL 5.6, thrown in some extensions, and called it MyRocks.
RocksDB is a fast key-value storage system with efficient data compression. However, it neither supports replication nor provides an SQL interface, so cue an integration with open-source MySQL and, bam, you've got MyRocks.
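To get a feel for what putting SQL tables on top of a key-value store involves, here is a toy sketch. It is not MyRocks's actual on-disk format or code: `SortedKV` is an in-memory stand-in for an ordered key-value store, and the key layout (a table ID followed by the primary key), the JSON-encoded values, and every class and function name are assumptions invented for illustration.

```python
import bisect
import json
import struct


class SortedKV:
    """Tiny in-memory stand-in for an ordered key-value store (not RocksDB)."""

    def __init__(self):
        self._keys = []      # keys kept in sorted byte order
        self._values = {}    # key -> value

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, key):
        return self._values.get(key)

    def scan_prefix(self, prefix):
        """Yield (key, value) pairs whose key starts with prefix, in key order."""
        i = bisect.bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            yield self._keys[i], self._values[self._keys[i]]
            i += 1


def row_key(table_id, pk):
    # Big-endian packing makes byte order match numeric order for
    # non-negative integers, so scans come back sorted by primary key.
    return struct.pack(">IQ", table_id, pk)


class ToyTable:
    """Hypothetical table layer: one key-value pair per row."""

    def __init__(self, store, table_id, pk_column):
        self.store = store
        self.table_id = table_id
        self.pk_column = pk_column

    def insert(self, row):
        pk = row[self.pk_column]
        rest = {k: v for k, v in row.items() if k != self.pk_column}
        self.store.put(row_key(self.table_id, pk), json.dumps(rest).encode())

    def select_by_pk(self, pk):
        raw = self.store.get(row_key(self.table_id, pk))
        if raw is None:
            return None
        row = json.loads(raw)
        row[self.pk_column] = pk
        return row

    def select_all(self):
        prefix = struct.pack(">I", self.table_id)
        for key, raw in self.store.scan_prefix(prefix):
            _, pk = struct.unpack(">IQ", key)
            row = json.loads(raw)
            row[self.pk_column] = pk
            yield row


if __name__ == "__main__":
    store = SortedKV()
    users = ToyTable(store, table_id=1, pk_column="id")
    users.insert({"id": 42, "name": "Richard"})
    users.insert({"id": 7, "name": "Gavin"})
    print(users.select_by_pk(42))
    print(list(users.select_all()))
```

A primary-key lookup becomes a single `get`, and a table scan becomes a prefix scan over one shared keyspace. Everything a real database needs beyond that, such as secondary indexes, transactions and replication, is left out here.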
Facebook is a massive user of MySQL and its InnoDB storage engine. Right now, though, FB's engineers are migrating their ginormous user database from InnoDB to MyRocks, we're told.
"After deploying MyRocks to this database tier in one of our data center regions, we were able to use 50 percent less storage for the same amount of data compared with compressed InnoDB," said Facebook techie Yoshinori Matsunobu.
The database code is pretty new and kinda specific to Facebook's hyper-scale needs: it lacks things like foreign keys, automatic deadlock detection, native partitioning and support for FULLTEXT and SPATIAL indexes. Facebook hopes people will muck in and help improve the software over time.
Finally, Facebook has developed a new algorithm for stabilizing 360-degree video. There's a technical outline here if you're interested. This is an internal effort; there's no code to share for this one.
You can find other open-source Facebook projects here. ®