Facebook storage techies: We sift through your family snaps to find 'warm BLOBs'

Hey, that's no way to talk about Uncle Gary


Facebook has ditched RAID and replication for the nearline storage of what it calls "warm BLOBs", switching to distributed erasure coding instead.

Translation please:

  • BLOB — Binary Large OBject — Facebook users’ photos, videos and so on.
  • Warm — data that has to be kept and is accessed at a lower rate than hot data, but more than archived or cold data. Typically, it’s more than a week old. Hot BLOBs, of course, are accessed more frequently.
  • Erasure coding — adding calculated parity values (for example, Reed-Solomon codes) to a string of bytes, so that the string can be recovered if an error deletes or corrupts part of it. It typically protects data using less raw storage than RAID-plus-replication schemes.
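To make the erasure-coding definition concrete, here is a minimal Python sketch of a systematic Reed-Solomon erasure code over GF(2^8), in its polynomial-evaluation form: the bytes at each position of the k data blocks are treated as values of a polynomial at points 0..k-1, the parity blocks hold that polynomial evaluated at further points, and any k surviving blocks are enough to rebuild the rest. The 10-data-plus-4-parity layout, the field polynomial 0x11D and every function name are illustrative assumptions; this is a textbook construction, not Facebook's code.

```python
# Minimal systematic Reed-Solomon erasure code over GF(2^8), evaluation form.
# Illustrative only: not Facebook's f4 implementation.

# Log/antilog tables for GF(2^8) built from the primitive polynomial 0x11D.
EXP = [0] * 512
LOG = [0] * 256
_x = 1
for _i in range(255):
    EXP[_i] = _x
    LOG[_x] = _i
    _x <<= 1
    if _x & 0x100:
        _x ^= 0x11D
for _i in range(255, 512):
    EXP[_i] = EXP[_i - 255]  # doubled table avoids a modulo in gf_mul


def gf_mul(a: int, b: int) -> int:
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def gf_div(a: int, b: int) -> int:
    if b == 0:
        raise ZeroDivisionError("division by zero in GF(2^8)")
    if a == 0:
        return 0
    return EXP[(LOG[a] - LOG[b]) % 255]


def interpolate_at(points, x: int) -> int:
    """Evaluate at x the unique polynomial of degree < len(points) passing
    through points [(xi, yi), ...] (Lagrange form). In GF(2^8), subtraction is XOR."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = gf_mul(num, x ^ xj)
                den = gf_mul(den, xi ^ xj)
        total ^= gf_mul(yi, gf_div(num, den))
    return total


def encode(data_blocks, m: int):
    """Return the stripe: the k data blocks unchanged, then m parity blocks.
    Byte `pos` of data block i is the polynomial's value at x = i; parity
    block p holds its value at x = k + p."""
    k = len(data_blocks)
    if k + m > 256:
        raise ValueError("GF(2^8) has only 256 distinct evaluation points")
    size = len(data_blocks[0])
    parity = []
    for p in range(m):
        out = bytearray(size)
        for pos in range(size):
            pts = [(i, data_blocks[i][pos]) for i in range(k)]
            out[pos] = interpolate_at(pts, k + p)
        parity.append(bytes(out))
    return list(data_blocks) + parity


def decode(stripe, k: int):
    """Rebuild the k data blocks from a stripe in which lost blocks are None.
    Any k surviving blocks suffice."""
    survivors = [(x, blk) for x, blk in enumerate(stripe) if blk is not None]
    if len(survivors) < k:
        raise ValueError("too many blocks lost to recover the stripe")
    survivors = survivors[:k]
    size = len(survivors[0][1])
    data = []
    for x in range(k):
        if stripe[x] is not None:
            data.append(stripe[x])  # data block survived: nothing to rebuild
            continue
        out = bytearray(size)
        for pos in range(size):
            pts = [(xi, blk[pos]) for xi, blk in survivors]
            out[pos] = interpolate_at(pts, x)
        data.append(bytes(out))
    return data


if __name__ == "__main__":
    # Illustrative 10 data + 4 parity stripe of tiny 4-byte blocks.
    data = [bytes([(7 * i + j) % 256 for j in range(4)]) for i in range(10)]
    stripe = encode(data, m=4)
    for lost in (0, 3, 7, 12):  # lose any four of the fourteen blocks
        stripe[lost] = None
    assert decode(stripe, k=10) == data
    print("recovered all 10 data blocks after losing 4 of 14")
```

Storing 14 blocks for every 10 blocks of data is a 1.4x overhead, yet any four blocks can vanish. The per-byte Lagrange interpolation keeps the maths visible but is far slower than an optimised encoder.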

Facebook’s special problem is that it has three main types of user data, with associated metadata, and all three need huge amounts of storage. Its main and most-accessed data are the recent postings on a user’s timeline, less than a week old, which get accessed a lot by the user’s "Friends".

Facebook stores this data in its Haystack storage system, which uses triple replication to protect it and to ensure it can always be accessed, and accessed quickly, with as close to a single disk access per read as possible (once the in-memory metadata lookup has been done).
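The "single disk access" idea can be sketched as an append-only file plus an in-memory index: finding a BLOB's offset costs no disk I/O, so fetching it is one seek and one read. The toy below writes every BLOB to three replica files and reads from the first one that still works. The class names and layout are hypothetical, and Haystack's real on-disk format and replica placement (on separate machines, not three local files) are more involved.

```python
import os
import tempfile


class NeedleStore:
    """Toy append-only BLOB file with an in-memory index (hypothetical design,
    loosely modelled on the Haystack idea described above)."""

    def __init__(self, path: str):
        self.path = path
        self.index = {}  # blob_id -> (offset, length); lives entirely in RAM
        open(path, "ab").close()  # create the file if it does not exist

    def put(self, blob_id: str, payload: bytes) -> None:
        with open(self.path, "ab") as f:
            f.seek(0, os.SEEK_END)
            offset = f.tell()
            f.write(payload)
        self.index[blob_id] = (offset, len(payload))

    def get(self, blob_id: str) -> bytes:
        offset, length = self.index[blob_id]  # metadata lookup: no disk I/O
        with open(self.path, "rb") as f:
            f.seek(offset)  # one seek and one read fetch the BLOB itself
            return f.read(length)


class TripleReplicatedStore:
    """Writes every BLOB to three replicas; reads from the first one that works."""

    def __init__(self, paths):
        if len(paths) != 3:
            raise ValueError("triple replication needs exactly three replicas")
        self.replicas = [NeedleStore(p) for p in paths]

    def put(self, blob_id: str, payload: bytes) -> None:
        for replica in self.replicas:
            replica.put(blob_id, payload)

    def get(self, blob_id: str) -> bytes:
        for replica in self.replicas:
            try:
                return replica.get(blob_id)
            except OSError:  # replica's file is gone or unreadable: try the next
                continue
        raise KeyError(blob_id)


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    paths = [os.path.join(workdir, f"replica{i}.dat") for i in range(3)]
    store = TripleReplicatedStore(paths)
    store.put("photo-1", b"uncle gary at the barbecue")
    os.remove(paths[0])  # simulate losing one replica
    print(store.get("photo-1"))
```

The price of this simplicity is that every logical byte is stored three times, which is exactly the overhead that becomes hard to justify once data cools.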

As this data ages, it is accessed less often, cooling from hot to warm, and yet still requires fast access when it is actually called upon. Trouble is, the damn stuff just keeps on growing. For example, at the end of January this year, Facebook was storing more than 400 billion photos.

[Figure: BLOB request rate by age]

Relative request rates by age. Each line is relative only to itself; absolute values have been denormalized to increase readability, and points mark an order-of-magnitude decrease in request rate.

Measuring IOs per terabyte shows that warm data's IO density is much lower than that of hot data. That means it can be stored without triple replication and still be accessed acceptably quickly, while remaining protected against disk, host and rack failures.
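IO density is simply request rate divided by capacity. The sketch below computes it per volume and applies a hypothetical hot/warm rule: a volume counts as warm once all its BLOBs are more than a week old (the article's rule of thumb) and its IO density is at or below a threshold. The threshold value, field names and the rule itself are assumptions for illustration, not Facebook's actual policy.

```python
from dataclasses import dataclass

ONE_WEEK_S = 7 * 24 * 3600  # the article's "typically more than a week old"


@dataclass
class VolumeStats:
    size_tb: float            # logical data stored in the volume, in TB
    reads_per_s: float        # observed read requests per second
    newest_blob_age_s: float  # age of the youngest BLOB in the volume, in seconds

    @property
    def io_density(self) -> float:
        """IO density: requests per second per terabyte stored."""
        return self.reads_per_s / self.size_tb


def classify(v: VolumeStats, max_warm_density: float) -> str:
    """Hypothetical policy: 'warm' once every BLOB is over a week old AND the
    volume's IO density is at or below a chosen threshold; otherwise 'hot'."""
    if v.newest_blob_age_s > ONE_WEEK_S and v.io_density <= max_warm_density:
        return "warm"
    return "hot"
```

Using the youngest BLOB's age means a single fresh upload keeps a whole volume hot, which errs on the side of fast access.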

Facebook engineers have set up a new storage system, f4, to store this set of warm BLOBs. A paper by the engineers explains: “f4 is a new system that lowers the effective-replication-factor of warm BLOBs while remaining fault-tolerant and able to support the lower throughput demands.”
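The effective replication factor in that quote is, in essence, raw storage consumed per byte of logical data. A minimal sketch of the arithmetic follows; the 10-data-plus-4-parity Reed-Solomon stripe is an illustrative assumption, since the article does not give f4's stripe parameters.

```python
def replication_erf(copies: int) -> float:
    """Full copies: every logical byte is stored `copies` times."""
    return float(copies)


def reed_solomon_erf(k: int, m: int) -> float:
    """A stripe of k data blocks plus m parity blocks uses (k + m) / k bytes
    of raw capacity per logical byte."""
    return (k + m) / k


def losses_tolerated_replication(copies: int) -> int:
    return copies - 1  # any one surviving copy is enough


def losses_tolerated_rs(m: int) -> int:
    return m  # any k of the k + m blocks are enough


if __name__ == "__main__":
    print("3 replicas:", replication_erf(3), "x, survives", losses_tolerated_replication(3))
    print("RS(10, 4): ", reed_solomon_erf(10, 4), "x, survives", losses_tolerated_rs(4))
```

With these illustrative parameters, three full copies cost 3x and survive two losses, while a 10+4 stripe costs 1.4x and survives any four.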

[Figure: Facebook f4 schematic]

Facebook’s engineers say:

[f4] uses Reed-Solomon coding and lays blocks out on different racks to ensure resilience to disk, machine, and rack failures within a single data center. It uses XOR coding in the wide-area to ensure resilience to data center failures. f4 has been running in production at Facebook for over 19 months. f4 currently stores over 65PB of logical data and saves over 53PB of storage.
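Laying the blocks of a stripe out on different racks means a whole-rack failure costs each stripe at most one block, well within what the parity can repair. Here is a sketch of a simple rack-aware placement; the round-robin policy, the host naming and the 14-blocks-per-stripe figure are assumptions, not f4's actual placement logic.

```python
from typing import List


def make_cell(n_racks: int, hosts_per_rack: int) -> List[List[str]]:
    """Host names for a cell, grouped by rack (names are hypothetical)."""
    return [[f"rack{r}-host{h}" for h in range(hosts_per_rack)] for r in range(n_racks)]


def place_stripe(stripe_id: int, n_blocks: int, racks: List[List[str]]) -> List[str]:
    """Return one host per block, every block on a different rack.
    Rotating the starting rack by stripe_id spreads load across the cell."""
    if n_blocks > len(racks):
        raise ValueError("need at least one rack per block to survive a rack failure")
    placement = []
    for b in range(n_blocks):
        rack = racks[(stripe_id + b) % len(racks)]
        placement.append(rack[stripe_id % len(rack)])
    return placement
```

Because the block offsets 0..n-1 are all smaller than the rack count, `(stripe_id + b) % len(racks)` never repeats within one stripe, so no rack ever holds two blocks of it.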

BLOBs are aggregated into logical volumes of around 100GB, with aggregated filesystem metadata. Each volume consists of a data file, an index file and a journal file. The index file is a snapshot of the storage machines’ in-memory lookup structure. When a volume is full, it is locked and no new BLOBs can be created in it.
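The volume lifecycle described above (append BLOBs, keep an index, lock when full) can be sketched as a small class. The names are hypothetical, a tiny byte count stands in for the roughly 100GB volume size, and treating the journal as a log of delete tombstones is an assumption made for illustration.

```python
from typing import Dict, List, Tuple


class WarmVolume:
    """Toy volume with the three parts the article lists: a data file (here a
    bytearray), an index (blob_id -> offset, length) and a journal. Treating
    the journal as a log of delete tombstones is an assumption."""

    def __init__(self, target_size: int):
        self.target_size = target_size  # stands in for the ~100GB volume size
        self.data = bytearray()
        self.index: Dict[str, Tuple[int, int]] = {}
        self.journal: List[str] = []  # blob_ids deleted, in order
        self.locked = False

    def create(self, blob_id: str, payload: bytes) -> None:
        if self.locked:
            raise PermissionError("volume is locked: creates are not allowed")
        self.index[blob_id] = (len(self.data), len(payload))
        self.data += payload
        if len(self.data) >= self.target_size:
            self.locked = True  # full: no more creates from now on

    def delete(self, blob_id: str) -> None:
        if blob_id in self.index and blob_id not in self.journal:
            self.journal.append(blob_id)  # bytes stay put; the tombstone hides them

    def read(self, blob_id: str) -> bytes:
        if blob_id in self.journal:
            raise KeyError(f"{blob_id} has been deleted")
        offset, length = self.index[blob_id]
        return bytes(self.data[offset:offset + length])

    def index_snapshot(self) -> Dict[str, Tuple[int, int]]:
        """What would be written out as the index file: a copy of the in-memory map."""
        return dict(self.index)
```

Once locked, a volume's data never changes, which is what makes it safe to erasure-code as a unit.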

Each volume is stored within a single data centre, in a cell, where a cell is 14 racks of 15 hosts with 30 x 4TB drives per host. Each volume/stripe/block is paired with a buddy volume/stripe/block in a different geographic region, and Facebook stores an XOR of the two buddies in a third region. This scheme protects against the failure of any one of the three regions.
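The buddy-plus-XOR trick works because XOR is its own inverse: if A ⊕ B is stored in a third region, losing either A or B leaves enough to rebuild it. Three volumes are stored for every two volumes of data, so the per-region overhead is multiplied by 1.5. The sketch assumes equal-length buddies, and the per-region factor of 1.4 used in the usage note corresponds to an illustrative 10+4 Reed-Solomon stripe, not a figure given in the article.

```python
def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    if len(a) != len(b):
        raise ValueError("buddy blocks must be the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def geo_replication_factor(per_region_factor: float) -> float:
    """Two buddy volumes plus one XOR volume: 3 stored volumes per 2 logical ones."""
    return per_region_factor * 3 / 2


if __name__ == "__main__":
    region_a = b"volume from region A"
    region_b = b"volume from region B"
    region_c = xor_blocks(region_a, region_b)  # XOR copy kept in a third region
    # Region B fails: rebuild its volume from region A and the XOR in region C.
    assert xor_blocks(region_a, region_c) == region_b
```

With a per-region factor of 1.4, the geo-replicated factor is 1.4 × 3/2 = 2.1, compared with 2.8 for simply keeping a full second copy of each erasure-coded volume in another region.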

Will enterprises in general need to move to such a storage scheme for their nearline data? It's unlikely, as they won’t necessarily have as much data as Facebook, nor its growth rate, nor data that is so largely immutable.

Read more about Facebook’s Mr BLOBby f4 scheme here (17-page PDF). ®



Biting the hand that feeds IT © 1998–2021