The US Army Research Laboratory says it is experimenting with reinforcement learning algorithms to control swarms of drones and autonomous vehicles to overwhelm and dominate America's enemies.
“Finding optimal guidance policies for these swarming vehicles in real-time is a key requirement for enhancing warfighters' tactical situational awareness, allowing the US Army to dominate in a contested environment,” said Dr Jemin George, a scientist at the US Army Combat Capabilities Development Command, a boffinry nerve center of the US Army.
Dr George and his colleagues developed a method to control large swarms of agents by collecting them into groups using hierarchical reinforcement learning (HRL). By shifting drone control from a centralized approach to a hierarchical design, learning time for the software was cut by 80 per cent, we're told.
Crucially, it means swarms of trained, unmanned equipment can be sent to particular areas with a set of instructions, and each collective automatically maintains formation among its members to carry out those orders. Human controllers thus won't have to worry about individual drones and vehicles; they just point the groups at particular positions on a map. The machines will have learned to figure out their positioning for themselves and, as a team, go where they are ordered and work together as intended, like a combat unit.
A US Army graphic depicting a hierarchical approach to ground and air autonomous vehicle coordination
“Our current HRL efforts will allow us to develop control policies for swarms of unmanned aerial and ground vehicles so that they can optimally accomplish different mission sets even though the individual dynamics for the swarming agents are unknown,” Dr George said.
The team envisions a future where self-driving robo-tanks and flying drones can work together autonomously to survey the land and the skies. “Swarms can be used for persistent surveillance and reconnaissance in dense urban terrain and perimeter defense of a forward operating base or a high-value asset,” he told The Register earlier today.
The reinforcement learning technique – described in a paper distributed via arXiv – provides a way to train multiple agents in different states. “Each hierarchy has its own learning loop with respective local and global reward functions,” Dr George said. “We were able to significantly reduce the learning time by running these learning loops in parallel.”
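For the curious, here is a toy Python sketch of that structure, and emphatically not the Army's algorithm. It assumes each lower-level loop scores a group on how well its members hold formation (a local reward), while one upper-level loop scores how close the group centers get to their waypoints (a global reward). Everything in it is hypothetical: the names `rollout`, `learn_gain` and `train_hierarchy`, the simple proportional-gain dynamics, and the hill-climbing search standing in for a real reinforcement learner.

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def rollout(gain, start, goal, steps=20, dt=0.1):
    """Move points toward their goals under a proportional gain.
    Returns the accumulated reward: minus the mean distance to goal per step."""
    pos = start.copy()
    reward = 0.0
    for _ in range(steps):
        pos = pos + dt * gain * (goal - pos)
        reward -= float(np.linalg.norm(goal - pos, axis=-1).mean())
    return reward

def learn_gain(start, goal, seed, iters=50):
    """One learning loop (assumed stand-in): random hill climbing on the gain,
    keeping a perturbed gain only when it improves the reward."""
    rng = np.random.default_rng(seed)
    gain, best = 0.1, rollout(0.1, start, goal)
    for _ in range(iters):
        cand = float(np.clip(gain + rng.normal(scale=0.5), 0.0, 5.0))
        r = rollout(cand, start, goal)
        if r > best:
            gain, best = cand, r
    return gain, best

def train_hierarchy(groups, offsets, waypoints):
    """Submit one local loop per group plus one global loop, all concurrently."""
    centroids = np.array([g.mean(axis=0) for g in groups])
    with ThreadPoolExecutor() as pool:
        # Lower level: agents, relative to their group center, seek formation slots.
        lower = [pool.submit(learn_gain, g - c, offsets, seed)
                 for seed, (g, c) in enumerate(zip(groups, centroids))]
        # Upper level: group centers seek the waypoints chosen by the operator.
        upper = pool.submit(learn_gain, centroids, waypoints, len(groups))
        return [f.result() for f in lower], upper.result()

# Three groups of four agents, a diamond formation, one waypoint per group.
rng = np.random.default_rng(0)
offsets = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
groups = [rng.uniform(-5, 5, size=(4, 2)) + 10 * i for i in range(3)]
waypoints = np.array([[0.0, 20.0], [20.0, 20.0], [40.0, 20.0]])
local_results, global_result = train_hierarchy(groups, offsets, waypoints)
```

The point is only the shape of the thing: each loop sees a small problem of its own, so none has to search over every vehicle at once. Python threads will not deliver a real speed-up here, and nothing in this sketch says anything about the 80 per cent figure.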
The algorithm has mainly been tested in simulation. Dr George told El Reg that tens to hundreds of machines constitute a swarm, though the team had only physically tested their method on four quadrotors in a room so far.
“Extensive test and evaluation, both in simulation and in real work using physical assets, is needed before the algorithm can be applied in real world settings,” he concluded. ®