AI models may not yet be safe, but at least we can make them affordable … ish

Boffins devise query language for LLMs to make them more civil and less expensive

Scientists at ETH Zurich in Switzerland believe one way to make large language models (LLMs) more affordable – and perhaps a bit safer – is not to address them directly in a natural language like English.

Rather, they propose making LLMs more programmable, via language model programming (LMP).

ETH Zurich doctoral students Luca Beurer-Kellner and Marc Fischer, together with professor Martin Vechev, have developed a programming language and runtime called LMQL, which stands for Language Model Query Language.

It's a bit like SQL for LLMs.

Language model programming is intended to complement, not replace, text-based prompts. The aim is to simplify interaction with language models when working toward a specific task – a challenge that related projects like PromptChainer, langchain, OpenPrompt, and PromptSource also attempt to address.

"LMP generalizes language model prompting from pure text prompts to an intuitive combination of text prompting and scripting," the authors explain in a research paper [PDF]. "Additionally, LMP allows constraints to be specified over the language model output."

And when LLMs can be coaxed to say awful things, constraining their output has a certain appeal.

Others, like Nvidia, also appear to be convinced that taming LLM output is a goal worth pursuing. The chipmaker's NeMo Guardrails project promises "specific ways of controlling the output of a large language model, such as not talking about politics…"

LMQL allows model developers to declare logical constraints governing model output. These get turned into "token-level prediction masks" – tokens being the word fragments LLMs actually read and emit.

Here's an example of an LMQL query from the project's documentation:

argmax
   """Review: We had a great stay. Hiking in the mountains was fabulous and the food is really good.
   Q: What is the underlying sentiment of this review and why?
   A:[ANALYSIS]
   Based on this, the overall sentiment of the message can be considered to be[CLASSIFICATION]"""
from
   "openai/text-davinci-003"
where
   not "\n" in ANALYSIS and CLASSIFICATION in [" positive", " neutral", " negative"]

The model's output fills in the bracketed template variables – free-form reasoning for ANALYSIS, and one of the three permitted strings for CLASSIFICATION:

Review: We had a great stay. Hiking in the mountains was fabulous and the food is really good.⏎
Q: What is the underlying sentiment of this review and why?⏎
A: ANALYSIS⏎
Based on this, the overall sentiment of the message can be considered to be CLASSIFICATION positive

In the absence of the three constraints – positive, neutral, negative – the model might have gone off the rails and proposed some crazy sentiment descriptor like "good" or "bad". You get the idea.
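For a sense of how a constraint becomes a "token-level prediction mask", here's a toy Python sketch – purely illustrative, with an invented six-word vocabulary, and not LMQL's actual code – in which any candidate next token that cannot begin one of the permitted strings has its score forced to negative infinity before sampling:

import math

def mask_next_token_logits(logits, vocab, allowed_strings):
    # Set to -inf every candidate token that could not start one of the
    # allowed output strings; leave the rest untouched.
    masked = {}
    for token in vocab:
        ok = any(allowed.startswith(token) or token.startswith(allowed)
                 for allowed in allowed_strings)
        masked[token] = logits[token] if ok else -math.inf
    return masked

# Toy vocabulary with uniform scores; only the three permitted labels survive.
vocab = [" positive", " neutral", " negative", " good", " bad", " amazing"]
logits = {t: 1.0 for t in vocab}
print(mask_next_token_logits(logits, vocab, [" positive", " neutral", " negative"]))
# {' positive': 1.0, ' neutral': 1.0, ' negative': 1.0, ' good': -inf, ' bad': -inf, ' amazing': -inf}

With a real tokenizer the permitted strings span several tokens, so the mask has to be recomputed at every decoding step – but the principle is the same.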

"Using LMQL, you can restrict your language model to strictly follow a specific framework you designed," Luca Beurer-​Kellner explained in an ETH Zurich press statement. "This allows you to better control how the language model behaves. Of course, full guaranteed prevention of bad behavior is still very hard to achieve, but LMQL is one step in this direction."

Constraining output is a major issue for LLMs, which are notorious for regurgitating toxic content picked up from unvetted training data, though it's not the only one. There's also the problem of manipulative input – specifically prompt injection attacks.

Simon Willison, who spoke to The Register recently about this issue, expressed skepticism that LMQL can fully mitigate prompt trickery. "I need them to stand up and say 'specifically regarding prompt injection … this is why our technique solves it where previous efforts have failed,'" he said.

While LMQL claims to have some utility for improving LLM safety, its primary purpose appears to be saving money. The language cuts the number of model queries and billable tokens by up to 41 percent and 31 percent respectively, which in turn means fewer computational resources are needed.

"LMQL leverages user constraints and scripted prompts to prune the search space of an LM by masking, resulting in an up to 80 percent reduction of inference cost," the boffins state, noting that latency is also reduced.

For pay-to-use APIs like those offered by OpenAI, they project cost savings in the range of 26 to 85 percent, based on the GPT-3 davinci model's pricing of $0.02 per 1,000 tokens. ®
