USENIX Facebook has revealed details about Tao, its multi-petabyte data store for the company's social graph.
Though Facebook's social network may have little relevance for IT pros, its internal infrastructure does, because here the social network is dealing with quantities of information so vast that it has to come up with new ways to store, compute, and manage the data.
So the publication of details about Tao at the USENIX conference on Wednesday is novel for two reasons: one, it shows off the scale at which future enterprises are going to have to operate, and two, it highlights some of the design methodologies that modern data systems have brought about within sophisticated tech companies.
"A system like TAO is likely to be useful for any application domain that needs to efﬁciently generate ﬁne-grained customized content from highly interconnected data," Facebook's employees write in the paper. "The application should not expect the data to be stale in the common case, but should be able to tolerate it. Many social networks ﬁt in this category."
Other applications of a system like Tao could be large data sets relating to wildlife populations over time, or other complex systems with many agents whose relationship to one another is defined by a variety of actions. For the tin foil hat-aficionados, Tao would also seem to deal with the problems an intelligence agency would run into when trying to keep tabs on all its citizens.
Tao is a read-optimized data store that is deployed at Facebook as a single geographically distributed instance. It lets Facebook engineers access and write information across the company's "social graph" which stores all information about objects on Facebook (people, brands, comments and such), and associations (likes, pokes, tags).
It has been built to deal with over a billion reads per second across a data set "of many petabytes," Facebook said. Tao was designed by Facebook to better link together data kept in its main data store (MySQL) and caching layer (memcache), while being able to deal with unpredictable queries on objects.
"The fact Tao is using MySQL is completely hidden away from the client," Facebook director of engineering Venkat Venkataramani, tells The Register. "We haven't found anything that is better than MySQL, we are constantly looking at that."
Its API is mapped to a small amount of SQL queries, which ease communication with the underlying MySQL database. As Facebook's dataset is too large for a single database it has instead split data into logical shards which are handled by database servers.
TAO also has an eventually consistent caching layer which is built via a similar principle and filled with objects, associations, and association counts. The caching layer is crucial for allowing Facebook to speedily load the hundreds of objects and associations that populate any one page on the site.
Because Facebook's dataset is so large, the cache is split into a two-level hierarchy of a few "leader" caches which deal with writes and a subsidiary "follower" cache that helps with reads, which dramatically outnumber writes – Tao typically experiences a billion reads per second versus "millions of writes".
Data is cached in such a way that objects and associations have proximity to one another, Venkataramani says. "An important design decision is to keep the locality the system tries to exploit similar to the locality the workload has," he says. "That was one of the fundamental decisions that allowed us to scale."
Barack Obama's Facebook page, for instance, will generate vast numbers of reads at unpredictable times, and so many of Tao's design considerations revolve around guaranteeing read access to objects – hence its adoption of eventual consistency and high availability, over strong consistency and higher latency.
"Nothing before Facebook has seen this kind of workload," Venkataramani says. "When people think of web-scale apps people think of email, but the workload was very different because everyone checks their own email - you're not looking at other emails. The problem is very different when you take a social network because there are extremely high fan-outs."
Though the number of companies likely to deal with data in this way is quite small for now, studying Tao gives insights into the problems a company will run into when it gets really big, and shows that behind the blue and white bazaar of Facebook there's a rather sophisticated underlay.
"As the world moves more and more to cloud and a lot of data is being managed in bigger data centers I think this may be the starting of an era for new backend architectures," Venkataramani says. ®