Microsoft rolls out safety tools for Azure AI. Hint: More models

Defenses against prompt injection, hallucination arrive as Feds eye ML risks

Microsoft has introduced a set of tools allegedly to help make AI models safer to use in Azure.

Since the cloud-and-code biz started shoveling funds into OpenAI and infusing its software empire with chatbot capabilities – a drama enacted with equal fervor by rivals amid grandiose promises about productivity – Microsoft has had to acknowledge that generative AI comes with risks.

The dangers are widely known and sometimes blithely brushed aside. A decade ago, Elon Musk warned that AI might just destroy humanity. Yet that concern didn't stop him from making AI available in cars, on his social media megaphone, and perhaps soon in robots.

The emergence of large language models that hallucinate and offer incorrect or harmful responses has not led to a return to the drawing board, but to the boardroom for further funding. Rather than produce a safe, ethical product, the tech industry is trying to tame feral models, or at least keep them far enough from customers that can run amok without hurting anyone.

And if that doesn't work, there's always indemnification from legal claims, subject to certain terms, from suppliers.

Industry commitments to AI safety coincide with corresponding government demands. In the US on Thursday, the White House Office of Management and Budget (OMB) issued its first government wide policy to address AI risks.

The policy requires federal agencies "to implement concrete safeguards when using AI in a way that could impact Americans’ rights or safety," by December 1. That means risk assessments, testing, and monitoring, efforts to limit discrimination and bias, and to promote transparency for AI applications touching health, education, housing, and employment.

Thus Microsoft brings word of its latest AI safety measures through Sarah Bird, chief product officer of responsible AI, a title that implies the existence of irresponsible AI – if you can imagine that.

Bird says that business leaders are trying to balance innovation and risk management, to allow them to use generative AI without being bitten by it.

"Prompt injection attacks have emerged as a significant challenge, where malicious actors try to manipulate an AI system into doing something outside its intended purpose, such as producing harmful content or exfiltrating confidential data," Bird explains in a blog post.

"In addition to mitigating these security risks, organizations are also concerned about quality and reliability. They want to ensure that their AI systems are not generating errors or adding information that isn’t substantiated in the application’s data sources, which can erode user trust."

Since safety and accuracy are not included in the AI subscription fee, Microsoft sees an opportunity to sell them as an add-on.

Customers using Azure AI Studio to help them create generative AI apps can look forward to four new tools.

First, there's Prompt Shields, which promise to help defend against prompt injection attacks. Previously known as Jailbreak Risk Detection and now in public preview, it's a way to mitigate the risk of both direct and indirect prompt meddling in foundation models.

Direct attacks involve prompts (inputs) designed to make the model ignore its safety training. Indirect attacks refer to efforts to sneak input into a model. One way to do this might be to include hidden text in an email with the knowledge that an AI model acting on behalf of the recipient through, say, Copilot in Outlook, will parse the message, interpret the hidden text as a command, and hopefully act on the instructions, doing something like silently replying with sensitive data.

Second is Groundedness Detection, a system for catching when AI models hallucinate, or make things up. It provides customers with several options when a false claim is detected, including sending the response back to be revised prior to display. Microsoft says it has accomplished this by building a custom language model that evaluates unsubstantiated claims based on source documents. So the answer to AI model safety is, you guessed it, another model.

Though this is a wonderful step towards trustworthy AI, the problem is still unsolved

Third, we have AI-assisted safety evaluations in AI Studio, which provide a testing framework for presenting prompt templates and parameters to model that tests various adversarial interactions with the customer's application. Again, it's AI to test AI.

And finally, there's "risks and safety monitoring", a feature for the Azure OpenAI Service that provides harmful content metrics.

Vinu Sankar Sadasivan, a doctoral student at the University of Maryland who helped develop the BEAST attack on LLMs, told The Register that while it's exciting to see Azure building tools to make AI more secure, adding more models into the mix expands the potential attack surface.

"Azure's safety evaluations and risk and safety monitoring tools are important for investigating the reliability of AI models," he said. "Though this is a wonderful step towards trustworthy AI, the problem is still unsolved. For instance, the Prompt Shields they introduce presumably use another AI model to detect and block indirect prompt attacks. This AI model can be vulnerable to threats such as adversarial attacks.

"Adversaries could leverage these vulnerabilities to bypass Prompt Shields. Though safety system messages have shown to be effective in some cases, existing attacks such as BEAST can adversarially attack AI models to jailbreak them in no time. While it is beneficial to implement defenses for AI systems, it's essential to remain cognizant of their potential drawbacks." ®

More about

TIP US OFF

Send us news


Other stories you might like