Facebook has unveiled a software toolkit to help programmers plug AI – in the form of reinforcement-learning models – into their applications.
Those achievements are arguably impressive, although they are mostly research efforts. The algorithms are crafted to seek victory in computer games, where the environments are relatively simple and controlled. Application software out in the real world, in the hands of normal users, is a different, uh, ball game.
In other words, it's one thing to develop a model for an academic paper where the environment is heavily controlled, and it's quite another to produce a piece of engineered software that survives real-world use. That's not to throw academic work under a bus: production code is typically based on lab experiments and studies. It's a matter of priorities. It's pioneering proof-of-concept work versus tested and deployed software.
Facebook’s platform, dubbed Horizon, is aimed at using RL in such production environments, setting it apart from OpenAI’s Gym and DeepMind Lab, which are simulated arenas in which research-focused models are trained and tested.
“We deployed Horizon at Facebook over the past year, improving the platform's ability to adapt RL's decision-based approach to large-scale applications, making it the first publicly available RL platform for production,” the antisocial network spokespeople announced on Thursday.
The Silicon Valley giant uses Horizon in its own apps to perform small tasks that improve user experience, such as deciding whether a video shared by a friend should be played at a high or low bit rate, whether it’s worth sending a push notification when a pal has posted on Facebook, or which stickers and GIFs to suggest in Messenger conversations.
Good software, have a bonus
RL systems learn actions by chasing rewards. Points are given when an agent gets closer to performing a task correctly and taken away when it fails. For example, a system realizes it should play a video at a low bit rate when it is taking longer to buffer. Facebook collected petabytes worth of user data to train and deploy models on thousands of machines using Horizon.
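To make the reward-chasing idea concrete, here is a minimal sketch of an epsilon-greedy learner picking a bit rate. It is not Horizon code: the reward numbers, the stall probabilities, and the function names simulate_reward and train are all made up for illustration.

```python
import random

def simulate_reward(bitrate, network_is_slow, rng):
    """Hypothetical reward: positive for smooth playback, -1 if the video stalls."""
    if bitrate == "high":
        stall_chance = 0.8 if network_is_slow else 0.05
    else:
        stall_chance = 0.1 if network_is_slow else 0.05
    if rng.random() < stall_chance:
        return -1.0
    # Assumption: smooth high-bit-rate playback is worth more than smooth low.
    return 1.0 if bitrate == "high" else 0.6

def train(network_is_slow, steps=5000, epsilon=0.1, seed=0):
    """Estimate the average reward of each action by trial and error."""
    rng = random.Random(seed)
    actions = ["low", "high"]
    value = {a: 0.0 for a in actions}   # running average reward per action
    count = {a: 0 for a in actions}
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.choice(actions)                      # explore
        else:
            action = max(actions, key=lambda a: value[a])     # exploit
        reward = simulate_reward(action, network_is_slow, rng)
        count[action] += 1
        value[action] += (reward - value[action]) / count[action]
    return value

slow = train(network_is_slow=True)
fast = train(network_is_slow=False)
print("slow network:", slow)
print("fast network:", fast)
```

Under these invented numbers the learner ends up preferring the low bit rate on a slow connection and the high bit rate on a fast one, purely from the points it collects.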
“Going from a simulation to a real-world system requires more than a good model,” Jason Gauci, software engineer at Facebook, told The Register.
"Horizon includes a preprocessing pipeline to handle messy real-world data, and something called counterfactual policy evaluation for evaluating a model before deploying it. We also built tools to help people define the right reward. These are necessary to do RL in production, and Horizon is the first open-source platform that puts all of this together."
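The article does not say how Horizon implements counterfactual policy evaluation. One standard estimator for the job is inverse propensity scoring, which reweights logged rewards by how much more (or less) likely the new policy is to take the logged action. The sketch below assumes a hypothetical log format of (context, action, reward, logging probability) and uses invented data.

```python
def ips_estimate(logged, new_policy_prob):
    """Estimate a new policy's average reward from logs of an old policy.

    logged: list of (context, action, reward, logging_prob) tuples, where
            logging_prob is the chance the old policy had of picking action.
    new_policy_prob: function (context, action) -> probability under the new policy.
    """
    total = 0.0
    for context, action, reward, logging_prob in logged:
        weight = new_policy_prob(context, action) / logging_prob
        total += weight * reward
    return total / len(logged)

# Invented logs from an old policy that picked each bit rate with probability 0.5.
logged = [
    ("slow", "low", 1.0, 0.5),
    ("slow", "high", -1.0, 0.5),
    ("fast", "low", 0.6, 0.5),
    ("fast", "high", 1.0, 0.5),
]

def adaptive_policy(context, action):
    """Candidate policy: low bit rate on slow networks, high otherwise."""
    best = "low" if context == "slow" else "high"
    return 1.0 if action == best else 0.0

def always_high_policy(context, action):
    """Candidate policy: always play at the high bit rate."""
    return 1.0 if action == "high" else 0.0

adaptive_score = ips_estimate(logged, adaptive_policy)
always_high_score = ips_estimate(logged, always_high_policy)
print(adaptive_score, always_high_score)
```

The point is that both candidates get a score without either of them ever being shown to a real user, which is the "before deploying it" part of Gauci's description.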
Horizon includes simulated environments, supports distributed training, and exports models for production. Apache Spark is used to preprocess the data, and models are trained on GPUs using PyTorch, then exported in ONNX format for deployment.
It ships with a range of models, among them Deep Q-Network (DQN), parametric DQN, and deep deterministic policy gradient (DDPG), of the kind traditionally used by researchers to develop bots to play Atari games.
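For readers unfamiliar with DQN, its core is the Q-learning update: nudge the estimated value of an action toward the reward received plus the discounted value of the best follow-up action. The sketch below is a tabular version under assumed values for the learning rate alpha and discount gamma; DQN proper swaps the table for a neural network, and none of this is taken from Horizon's code.

```python
def td_target(reward, next_q_values, gamma=0.9, terminal=False):
    """Reward now plus the discounted value of the best next action."""
    if terminal:
        return reward
    return reward + gamma * max(next_q_values)

def q_update(q, state, action, reward, next_state,
             alpha=0.5, gamma=0.9, terminal=False):
    """Move q[state][action] a fraction alpha of the way toward the target."""
    target = td_target(reward, q[next_state].values(), gamma, terminal)
    q[state][action] += alpha * (target - q[state][action])
    return q[state][action]

# Tiny hypothetical table: two states, two actions each.
q = {
    "s0": {"a": 0.0, "b": 0.0},
    "s1": {"a": 1.0, "b": 2.0},
}
updated = q_update(q, "s0", "a", reward=1.0, next_state="s1")
print(updated)
```

The discount factor is what lets the method optimize for long-term value rather than just the next reward.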
“Horizon excels in environments where it is impractical to build a simulator and where offline evaluation and large-scale deployment are vital, but Horizon can be used in other domains as well,” Gauci said.
"In addition to job scheduling, learned index structures, and other infrastructure tasks, Horizon can be used by anyone who is optimizing for long-term value. For example, a ride sharing company could use RL to decide whether to wait a moment for a better route or start a driver on the current route. It will be interesting to see what people use it for."
Developers can have a go at trying out Horizon right here. It is BSD licensed, although it has the usual patent caveats: your right to use the software is terminated if you kick off a squabble over intellectual property that involves Facebook, its friends, or its code. ®