Facebook simulated entire data center prior to launch

Zuckerberg's Project Triforce


Before turning on its new custom-built data center in Prineville, Oregon, Facebook simulated the facility inside one of its two existing data center regions. Known as Project Triforce, the simulation was designed to pinpoint places where engineers had unknowingly fashioned the company's back-end services under the assumption that they would run across only two regions, not three or more.

"We now have hundreds of back-end services, and in going to a third data center, we needed to make sure all of them worked," Facebook's Sanjeev Kumar tells The Register. "These services are designed to work with many different data centers. But the trick is that if you haven't tested them on more than two data centers, you may not catch some practices that subtly crept into the system that would cause it to not work."

In the beginning, Facebook served up its site from leased data center space in Northern California. Then, in 2007, it leased additional space in Northern Virginia, spreading the load from the West Coast of the United States to the East. This move, Kumar says, was relatively simple because at the time, the Facebook back-end was relatively simple. Facebook's software stack consisted of a web tier, a caching tier, a MySQL tier, and just a handful of other services. But today, its infrastructure is significantly more complex, and this created additional worries prior to the launch of the Prineville facility, the first data center designed, built, and owned by Facebook itself – and "open sourced" to the rest of the world.

According to Kumar, Facebook didn't have the option of individually testing each service for use across a third data center region. It needed to test all services concurrently, with real user data, before the third region actually went live. "The number of components in our infrastructure meant that testing each independently would be inadequate: it would be difficult to have confidence that we had full test coverage of all components, and unexpected interactions between components wouldn’t be tested," Kumar writes in a post to the Facebook engineering blog.

"This required a more macro approach – we needed to test the entire infrastructure in an environment that resembled the Oregon data center as closely as possible."

In an effort to improve throughput, the company was also moving to a new MySQL setup that used Flashcache – the open source Linux block cache – and since this required running two MySQL instances on each machine, the company needed to test changes to its software stack as well.
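The article doesn't describe how the two instances were kept apart on each box, but the usual recipe is per-instance ports, data directories, and sockets. A minimal sketch of that idea – all paths and names here are illustrative assumptions, not Facebook's actual configuration:

```python
# Hypothetical sketch: two mysqld instances coexisting on one host, each with
# its own port, data directory, socket, and cache-backed block device.
# All values below are illustrative, not Facebook's real settings.

def instance_config(instance_id, base_port=3306):
    """Generate per-instance settings so two mysqld processes don't collide."""
    return {
        "port": base_port + instance_id,                 # 3306, 3307, ...
        "datadir": f"/data/mysql{instance_id}",          # separate data dirs
        "socket": f"/var/run/mysqld/mysqld{instance_id}.sock",
        "device": f"/dev/mapper/cachedev{instance_id}",  # Flashcache device
    }

configs = [instance_config(i) for i in range(2)]

# Nothing may be shared between the two instances.
for key in ("port", "datadir", "socket", "device"):
    assert configs[0][key] != configs[1][key]
```

The point of doubling up is to get more parallelism out of a flash-backed box than a single mysqld process would deliver; the isolation above is what makes that safe.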

So, Kumar and his team commandeered a cluster within one of its Virginia data centers and used it to simulate the new Prineville facility. This data center simulation spanned tens of thousands of machines, and it was tested with live Facebook traffic. "A cluster of thousands of machines was the smallest thing we could use to serve production traffic, and production traffic had to be used to ensure it hit all the use cases," Kumar tells us.

To facilitate the creation of its simulated data center, the company built a new software suite known as Kobold, which automated the configuration of each machine. "Kobold gives our cluster deployment team the ability to build up and tear down clusters quickly, conduct synthetic load and power tests without impacting user traffic, and audit our steps along the way," Kumar says. The entire cluster was online within 30 days, and it started serving production traffic within 60 days.
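Kobold itself has not been published, so the sketch below is only a guess at its shape based on Kumar's description: build up a cluster, run synthetic load against it without touching user traffic, audit each step, and tear it all down again. Every class and method name here is an assumption.

```python
# Illustrative sketch only: the real Kobold tool is not public. This models
# the workflow described above – build, synthetic load test, audit, teardown.

class ClusterDeployer:
    def __init__(self):
        self.audit_log = []          # every step is recorded for later audit

    def build(self, hosts, role):
        """Configure each machine for its role and record the step."""
        cluster = {host: role for host in hosts}
        self.audit_log.append(("build", role, len(hosts)))
        return cluster

    def synthetic_load_test(self, cluster):
        """Exercise the cluster with generated load, not user traffic."""
        self.audit_log.append(("load_test", len(cluster)))
        return all(role is not None for role in cluster.values())

    def tear_down(self, cluster):
        """Release the machines and record the step."""
        cluster.clear()
        self.audit_log.append(("tear_down",))

deployer = ClusterDeployer()
web_tier = deployer.build([f"host{i}" for i in range(4)], role="web")
assert deployer.synthetic_load_test(web_tier)
deployer.tear_down(web_tier)
```

The audit log is the detail worth noting: when you can stand up and destroy clusters this quickly, a trail of what was done to which machines is what keeps the process repeatable.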

The only thing the company didn't replicate was Prineville's MySQL database layer. This would have meant buying a whole new set of physical machines – the company uses a different breed of machines for MySQL than for other pieces of its back-end – and it couldn't justify the cost. The machines in the simulation cluster will eventually be folded back into the everyday operations of the Virginia data center region, but at this point, the region has all the MySQL machines it needs.

The simulation began in October, about three months before the Prineville data center was turned on. Since the simulation was even further from the company's Northern California data center than Prineville is, it duplicated the inter-data center latency the company would experience – and then some. "The latency we needed to worry about was between Prineville and Northern California," Kumar says. "Any issue you might see in Prineville, you would definitely see in Virginia." The latency between Prineville and Northern California is about 10 to 20 milliseconds (one way), while the latency between Virginia and Northern California is roughly 70 milliseconds.
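Working through the quoted one-way figures shows why Virginia made a conservative stand-in: its round-trip time to Northern California is several times worse than anything Prineville would see.

```python
# Round-trip arithmetic from the one-way latencies quoted above.

prineville_one_way_ms = (10, 20)   # Prineville <-> Northern California
virginia_one_way_ms = 70           # Virginia   <-> Northern California

prineville_rtt_ms = tuple(2 * t for t in prineville_one_way_ms)  # (20, 40)
virginia_rtt_ms = 2 * virginia_one_way_ms                        # 140

# Any latency-sensitive bug that surfaces at a 20-40 ms round trip will
# certainly surface at 140 ms – hence "and then some".
assert virginia_rtt_ms > max(prineville_rtt_ms)
```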

And yes, there were cases when the company's services were ill-suited for a trio of data centers, but these were relatively minor problems. "None of the problems we discovered involved a software component that was fundamentally coded for only two data centers and needed major changes," Kumar explains. "These were all situations where, say, someone just didn't realize that service A depended on service B."
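The failure mode Kumar describes amounts to a hole in a dependency graph: service A quietly calls service B, and B was never brought up in the new region. A minimal sketch of the kind of check that catches this – the service names and dependency map are entirely hypothetical:

```python
# Hypothetical dependency map: each service lists the services it calls.
deps = {
    "newsfeed": {"cache", "ads"},
    "ads": {"cache"},
    "cache": set(),
}

def missing_deps(service, deployed_in_region):
    """Return dependencies of `service` not yet present in a region."""
    return deps[service] - deployed_in_region

# Region 3 has the cache tier up, but nobody remembered the ads service –
# newsfeed would break there even though newsfeed itself deployed cleanly.
region3 = {"cache"}
assert missing_deps("newsfeed", region3) == {"ads"}
```

Running the whole infrastructure together against production traffic, rather than testing services one at a time, is what surfaces exactly these unrecorded edges.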

This sort of thing shouldn't crop up again as the company moves from three data centers to four and beyond. "One to two is a big change, and two to three is a big change," Kumar says. "But after that, it tends not to be a big deal."

Plus, the Virginia simulation is still up and running. In essence, Facebook has already tested its infrastructure across a fourth data center. ®
