Facebook promised to open up its log storage system

LogDevice: how to make sense of 10 hyperscale data centres

Sysadmins struggling to manage lots of logs may want to Like a new "friend", after Facebook last week decided to share its distributed log management system.

If you're just running one site, Zuck's "LogDevice" code might not be for you: it's how Facebook makes sense of its 10 data centres, including how The Social Network™ brings those logs back into sync when something goes wrong.

Perhaps the most impressive number is in that operation: Facebook claims that after a failure, LogDevice can rebuild logs to “fully restore the replication factor of all records affected” at between 5 Gbps and 10 Gbps.

As the post explains, logging at scale presents two particularly wicked problems: making the record storage highly available and durable, while maintaining a “repeatable total order on those records”.

The specs needed to achieve this are:

  • LogDevice is record-oriented, meaning rather than bytes, the smallest indivisible unit written to the log is a full record, which the company says provides “better write availability in the presence of failures”;
  • Logs are append-only – log records can't be modified;
  • To manage log size, logs are trimmable according to either time-based or space-based retention policies.
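Those three properties are easy to picture in miniature. What follows is a toy in-memory sketch, not LogDevice's code or API: the names `Record`, `AppendOnlyLog`, `max_age_s` and `max_bytes` are all invented for illustration. It only shows whole records being appended, never modified, and trimmed from the old end by age or by size.

```python
import time
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: a record cannot be modified once written
class Record:
    seq: int
    timestamp: float
    payload: bytes


class AppendOnlyLog:
    """Toy log: whole records go in, nothing is rewritten, old data is trimmed."""

    def __init__(self, max_age_s=None, max_bytes=None):
        self._records = deque()
        self._next_seq = 0
        self._size = 0
        self.max_age_s = max_age_s  # time-based retention (hypothetical knob)
        self.max_bytes = max_bytes  # space-based retention (hypothetical knob)

    def append(self, payload, now=None):
        """Append one indivisible record and return its sequence number."""
        now = time.time() if now is None else now
        rec = Record(self._next_seq, now, bytes(payload))
        self._records.append(rec)
        self._next_seq += 1
        self._size += len(rec.payload)
        self.trim(now)
        return rec.seq

    def trim(self, now=None):
        """Drop the oldest records until both retention limits are satisfied."""
        now = time.time() if now is None else now
        while self._records:
            oldest = self._records[0]
            too_old = (self.max_age_s is not None
                       and now - oldest.timestamp > self.max_age_s)
            too_big = self.max_bytes is not None and self._size > self.max_bytes
            if not (too_old or too_big):
                break
            self._records.popleft()
            self._size -= len(oldest.payload)

    def read(self):
        """Return the surviving records, oldest first."""
        return list(self._records)
```

There is deliberately no update method: the only ways to change the log are to add a record at the tail or trim records from the head.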

One key to getting the scale Facebook needs is decoupling the sequencing of records from their storage: the sequencer runs as a separate process, either on a storage node or on its own node.

The sequence numbers themselves aren't a single datum, but a tuple containing an epoch and an offset within the epoch. “The epoch store acts as a repository of durable counters, one per log, that are seldom incremented and are guaranteed to never regress. Today we use Apache Zookeeper as the epoch store for LogDevice.”
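To see why a two-part sequence number helps, here is a minimal sketch. It assumes that a replacement sequencer fetches a fresh, higher epoch before issuing any numbers, which the article does not spell out. `LSN`, `EpochStore` and `Sequencer` are illustrative names, and the epoch store here is a plain dictionary standing in for a durable service such as Zookeeper.

```python
from typing import NamedTuple


class LSN(NamedTuple):
    """Sequence number as an (epoch, offset) pair; tuples compare epoch first."""
    epoch: int
    offset: int


class EpochStore:
    """In-memory stand-in for durable per-log counters that never go backwards."""

    def __init__(self):
        self._epochs = {}

    def next_epoch(self, log_id):
        # Only ever increments, so a later caller always sees a larger epoch.
        epoch = self._epochs.get(log_id, 0) + 1
        self._epochs[log_id] = epoch
        return epoch


class Sequencer:
    """Issues sequence numbers for one log within a single epoch."""

    def __init__(self, log_id, epoch_store):
        self.log_id = log_id
        self.epoch = epoch_store.next_epoch(log_id)  # assumed start-up step
        self._offset = 0

    def next_lsn(self):
        self._offset += 1
        return LSN(self.epoch, self._offset)


store = EpochStore()
first = Sequencer("log-a", store)
a, b = first.next_lsn(), first.next_lsn()

# Simulate a failure: a replacement sequencer starts with no memory of offsets.
second = Sequencer("log-a", store)
c = second.next_lsn()
```

Under that assumption `a < b < c` holds even though the replacement restarts its offset at 1, because the comparison looks at the epoch before the offset.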

LogDevice separates sequencing from object storage

As for log object storage, LogDevice randomly assigns a record to a storage node – hence, for example, you don't have all of the logs from a particular server landing on the same disk, and you don't lose the whole thing if the disk fails.
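The effect of random placement can be sketched in a few lines. This is an illustration only: the function `place_record`, the node names, the replication factor of three and the record count are all assumptions, and the article does not describe LogDevice's actual placement logic beyond saying it is random.

```python
import random


def place_record(nodes, replication_factor, rng=random):
    """Pick distinct storage nodes at random to hold copies of one record."""
    if replication_factor > len(nodes):
        raise ValueError("not enough nodes for the requested replication factor")
    return rng.sample(nodes, replication_factor)


rng = random.Random(0)                     # seeded so the run is repeatable
nodes = [f"node-{i}" for i in range(10)]   # hypothetical cluster
placements = [place_record(nodes, 3, rng) for _ in range(1000)]

# If one node dies, only the records that had a copy there need rebuilding,
# and each of those still has copies on other nodes.
failed = "node-4"
affected = [p for p in placements if failed in p]
print(f"{len(affected)} of {len(placements)} records lost one copy")
```

Because the copies are scattered, a single node failure touches only a fraction of the records, and the surviving copies of those records are spread over many nodes that can all take part in the rebuild.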

That's where the fast rebuilding is important: what if a second failure takes place while a record is still waiting to be restored? This is what the 5 Gbps to 10 Gbps rebuild is designed to avoid.
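For a feel of the exposure window, a back-of-envelope calculation helps. The 4 TB figure below is a made-up example, not a number from Facebook, and the sketch takes the quoted rates at face value as gigabits per second.

```python
def rebuild_seconds(data_terabytes, rate_gbps):
    """Time to re-replicate a given amount of data at a steady rebuild rate."""
    bits = data_terabytes * 1e12 * 8       # decimal terabytes to bits
    return bits / (rate_gbps * 1e9)        # gigabits per second to bits per second


lost_tb = 4  # hypothetical amount of data on the failed disk
for rate in (5, 10):
    minutes = rebuild_seconds(lost_tb, rate) / 60
    print(f"{lost_tb} TB at {rate} Gbps: about {minutes:.0f} minutes")
```

On those assumptions the window in which a second failure could catch an under-replicated record is roughly one to two hours, and it halves when the rebuild rate doubles.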

All of the centralised logging naturally comes from local logs in the first instance, and for this LogDevice introduces a write-optimised store called LogsDB. It's “designed to keep the number of disk seeks small and controlled, and the write and read IO patterns on the storage device mostly sequential”, the post says.

Facebook says its ultimate goal is to open source LogDevice, hopefully this year. ®

Biting the hand that feeds IT © 1998–2022