
US Army develops natural-language voice-command AI for robots, tanks, etc. For search'n'rescue. For now

Judge, JUDI, and executioner?

The US Army is experimenting with machine-learning software that could be used to give tanks, trucks, and robots the ability to follow verbal orders from soldiers and communicate in natural language on the battlefield.

Researchers at the Army's Combat Capabilities Development Command (aka DEVCOM) and the University of Southern California have been working on JUDI: a Joint Understanding and Dialogue Interface. This is a conversational AI system for military machines.

As you might expect, JUDI differs from chatty digital assistants such as Apple’s Siri or Amazon’s Alexa: it isn’t designed for small talk about birthdays and the weather. It's trained to respond to spoken orders and handle situations military personnel are likely to find themselves in.

“It contrasts directly with current text-based chatbots and intelligent personal assistants in that it involves task-oriented dialogue with robots that are situated in the physical world and reason over their real-time sensory perceptions,” Felix Gervits, a computer scientist at DEVCOM, told El Reg on Tuesday.

“We employed a statistical classification technique for enabling conversational AI using state-of-the-art natural language understanding and dialogue management technologies. The statistical language classifier enables autonomous systems to interpret the intent of a soldier by recognizing the purpose of the communication and performing actions to realize the underlying intent.”


An AI-driven robot on a search'n'rescue exercise ... Source: 1st Lt Angelo Mejia / US Army

We're told the system was trained to recognize speech using dialogue recorded from soldiers during a search-and-rescue task. It was then tested by transferring it to a robot deployed in a similar exercise, with the expectation it would understand and carry out verbal natural-language commands. To achieve this, the classifier must be able to map recognized commands to the necessary control signals.

For example, the system should be able to understand the sentence, “turn 45 degrees and send a picture,” and duly rotate itself to snap and transmit a photo from its attached camera.
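To make the idea of mapping an utterance to robot actions concrete, here is a minimal sketch of a statistical intent classifier, assuming a toy bag-of-words naive Bayes model. It is not JUDI's actual method, which has not been published in detail. The training phrases, intent labels, and the names `NaiveBayesIntent` and `interpret` are all hypothetical and chosen purely for illustration.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical labelled utterances; these are NOT from the Army's dialogue corpus.
TRAINING = [
    ("turn left 90 degrees", "turn"),
    ("rotate 45 degrees", "turn"),
    ("turn right a little", "turn"),
    ("send a picture", "send_image"),
    ("take a photo", "send_image"),
    ("send me an image", "send_image"),
    ("move forward two meters", "move"),
    ("go forward", "move"),
    ("drive ahead three meters", "move"),
]


def tokenize(text):
    """Lowercase the text and keep alphabetic words only."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesIntent:
    """Toy multinomial naive Bayes classifier with add-one smoothing."""

    def __init__(self, examples):
        self.word_counts = defaultdict(Counter)
        self.label_counts = Counter()
        for text, label in examples:
            self.label_counts[label] += 1
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}

    def scores(self, text):
        """Return an unnormalized log-probability for each intent."""
        total = sum(self.label_counts.values())
        result = {}
        for label, n in self.label_counts.items():
            logp = math.log(n / total)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for word in tokenize(text):
                if word in self.vocab:  # ignore words never seen in training
                    logp += math.log((self.word_counts[label][word] + 1) / denom)
            result[label] = logp
        return result

    def predict(self, text):
        s = self.scores(text)
        return max(s, key=s.get)


def interpret(model, utterance):
    """Split a compound order into clauses and map each to an intent plus a number."""
    commands = []
    for clause in re.split(r"\band\b|,", utterance.lower()):
        clause = clause.strip()
        if not clause:
            continue
        number = re.search(r"\d+", clause)
        commands.append({
            "intent": model.predict(clause),
            "value": int(number.group()) if number else None,
        })
    return commands


model = NaiveBayesIntent(TRAINING)
print(interpret(model, "turn 45 degrees and send a picture"))
```

In a real system each intent would then be translated into control signals for the vehicle; that step is omitted here because the article gives no detail on it.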

“The task involved the robot navigating an urban environment and communicating with a remotely-located human, as in an urban search-and-rescue or disaster relief scenario,” Gervits said.

“We used a small, man-portable ground vehicle platform equipped with a LiDAR scanner and RGB camera. It was capable of carrying out movement instructions, sending pictures upon request, providing status updates, and requesting clarification for unclear instructions.”
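One plausible way a dialogue manager could decide when to ask for clarification is to check how confident the classifier is in its top guess. The sketch below assumes the classifier emits a log-score per intent and uses an arbitrary confidence threshold of 0.6. Both the threshold and the function names `softmax` and `respond` are illustrative assumptions, not details of JUDI.

```python
import math


def softmax(log_scores):
    """Convert per-intent log-scores into probabilities that sum to 1."""
    top = max(log_scores.values())
    exps = {k: math.exp(v - top) for k, v in log_scores.items()}
    total = sum(exps.values())
    return {k: v / total for k, v in exps.items()}


def respond(log_scores, threshold=0.6):
    """Execute the best intent if confident; otherwise ask which was meant.

    The threshold value is an assumption for illustration only.
    """
    probs = softmax(log_scores)
    ranked = sorted(probs, key=probs.get, reverse=True)
    best = ranked[0]
    if probs[best] < threshold and len(ranked) > 1:
        return ("clarify", f"Did you mean '{ranked[0]}' or '{ranked[1]}'?")
    return ("execute", best)


# A clear-cut order versus an ambiguous one (scores are made up).
print(respond({"turn": -1.0, "move": -5.0, "send_image": -6.0}))
print(respond({"turn": -2.0, "move": -2.1, "send_image": -6.0}))
```

The design choice here is simply to defer to the human whenever the top two interpretations are too close to call, which matches the behavior Gervits describes at a high level.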

The team hopes to improve JUDI by mapping more natural-language commands to a machine’s capabilities, and getting the software to learn from more examples of real-world dialogue. And they want to integrate computer vision with natural-language processing, such as getting robots to analyze gestures or even facial expressions.

Future machines employing this technology will need to be equipped with a microphone to pick up speech, a computer to process that information, and a display screen. The goal is to have robots and vehicles capable of communicating fast and instinctively.

“The current focus is on search-and-rescue tasks, but other collaborative, remote tasks such as reconnaissance, transport, and surveillance are also possible,” Gervits concluded.

The researchers are planning to describe their conversational AI interface in an upcoming paper. ®
