Pentagon enlists Scale AI to help military get smarter

The technology may be all the rage right now, but it's still too risky for armed forces

The US Department of Defense is reportedly working with startup Scale AI to test generative AI models for military use.

Scale AI will build a framework of tools and datasets that the Pentagon can deploy to evaluate large language models. The framework will involve "measuring model performance, offering real-time feedback for warfighters, and creating specialized public sector evaluation sets to test AI models for military support applications, such as organizing the findings from after action reports," the San Francisco-based biz told DefenseScoop.

Large language models' ability to analyze and generate text could boost the Pentagon's capacity to gather intelligence, plan operations, and thus guide decision making on the battlefield.

"Imagine a world where combatant commanders can see everything they need to see to make strategic decisions," said Craig Martell, the DoD's chief digital and artificial intelligence officer.

"Imagine a world where those combatant commanders aren't getting that information via PowerPoint or via emails from across the [organization] – the turnaround time for situational awareness shrinks from a day or two to ten minutes," he declared during the Advantage DoD 2024: Defense Data and AI Symposium.

AI can quickly process large amounts of information. Military data, however, is often highly sensitive – and officials worry that if it reaches large language models, prompt injection attacks or API abuse could see it leak.

The largest barrier to military implementation of LLMs is their tendency to generate inaccurate or false information – dubbed hallucination. By bringing in Scale AI, the Pentagon believes that it can test the performance of different models to identify potential risks before it considers using them to support warfighting or intelligence.

The startup will reportedly compile "holdout datasets" that contain examples of effective responses to input prompts that would be useful for the military. Officials at the DoD can then compare different models' responses to the same prompts and assess their utility.
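Scale AI's actual framework is not public, but the holdout-set approach described above can be sketched in a few lines. The following is a minimal, hypothetical example: each holdout entry pairs a prompt with a reference response, and candidate models are ranked by how closely their outputs match the references. Token-overlap F1 is used here purely as a stand-in metric; real evaluations would use far more sophisticated scoring.

```python
from collections import Counter


def token_f1(candidate: str, reference: str) -> float:
    """F1 overlap between candidate and reference token multisets."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def evaluate(model_fn, holdout: list[dict]) -> float:
    """Average score of a model's responses over a holdout set."""
    scores = [token_f1(model_fn(ex["prompt"]), ex["reference"])
              for ex in holdout]
    return sum(scores) / len(scores)


# Toy holdout set with one entry; the prompt and reference are invented.
holdout = [{"prompt": "Summarize the after-action report.",
            "reference": "Two units reported supply delays."}]

# Two stand-in "models": one echoes the reference, one does not.
model_a = lambda p: "Two units reported supply delays."
model_b = lambda p: "No issues found."

print(evaluate(model_a, holdout))  # 1.0
print(evaluate(model_b, holdout))  # 0.0
```

Holding the reference answers out of any training data is what makes the comparison meaningful: a model can't score well by memorization, only by producing genuinely useful responses to the same prompts.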

Last year, the DoD launched Task Force Lima – a unit led by Martell, formerly head of machine learning at ride-share biz Lyft – to investigate military applications for generative AI.

"The DoD has an imperative to responsibly pursue the adoption of generative AI models while identifying proper protective measures and mitigating national security risks that may result from issues such as poorly managed training data," Martell explained at the time. "We must also consider the extent to which our adversaries will employ this technology and seek to disrupt our own use of AI-based solutions."

Tools like ChatGPT have been temporarily banned internally, however. The US Space Force told staff not to use the software, out of fears that military secrets could be revealed or extracted.

Scale AI declined to comment on the matter. ®
