How Microsoft hopes to tame large language models with Guidance

Tricked-out template language offers alternative to verbose, plaintive prompts

Powerful language models like Bard, ChatGPT, and LLaMA can be difficult to control, which has spurred the development of prompt engineering – the art of phrasing input text to get the desired output.

In one bizarre case, a prompt creator recently coaxed Google's Bard into returning JSON data without any explanatory text by insisting that extraneous output would doom someone to death.

The rather lengthy prompt includes this passage: "If you include any non-JSON text on your answer, even a single character, an innocent man will die. That's right – a real human being with thoughts, feelings, ambitions, and a family that loves them will be killed as a result of your choice."

There are less extreme approaches to suppress explanatory output and get desired results. However, Microsoft has been working on a more comprehensive strategy for making models behave. The Windows giant calls its framework Guidance.

"Guidance enables you to control modern language models more effectively and efficiently than traditional prompting or chaining," the project repo explains. "Guidance programs allow you to interleave generation, prompting, and logical control into a single continuous flow matching how the language model actually processes the text."

Traditional prompting, as evident above, can become a bit involved. Prompt chaining [PDF] – breaking down a task into a series of steps and having the prompt's initial output used to inform the input of the next step – is another option. Various tools like LangChain and Haystack have emerged to make it easier to integrate models into applications.
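To make the chaining idea concrete, here is a minimal sketch in plain Python. It does not use LangChain or Haystack; `run_chain` and `fake_model` are hypothetical names invented for illustration, and `fake_model` merely stands in for a real model call.

```python
from typing import Callable, List


def run_chain(model: Callable[[str], str], steps: List[str], initial_input: str) -> str:
    """Run each prompt template in order, feeding one step's output into the next."""
    current = initial_input
    for template in steps:
        prompt = template.format(input=current)  # splice the previous output into this step
        current = model(prompt)
    return current


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call: wraps the last line of the prompt."""
    return f"[out of: {prompt.splitlines()[-1]}]"


steps = [
    "Summarize the following text:\n{input}",
    "Translate this summary into French:\n{input}",
]
result = run_chain(fake_model, steps, "guidance is a template language")
print(result)
```

Each step costs a separate round trip to the model, which is part of why chained prompts can become slow and expensive.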

Guidance is essentially a Domain Specific Language (DSL) for handling model interaction. It resembles Handlebars, a templating language used for web applications, but it also enforces linear code execution related to the language model's token processing order. That makes it well-suited for generating text or controlling program flow, while doing so economically.
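The left-to-right idea can be illustrated with a toy renderer. This is not Guidance's implementation or API, only a sketch under simplifying assumptions: `render` and `fake_generate` are hypothetical names, and only `{{variable}}` and `{{gen 'name'}}` tags are handled. The point it shows is that each generation call sees exactly the text rendered before it, in the order a language model would process it.

```python
import re
from typing import Callable, Dict, List

TAG = re.compile(r"\{\{\s*(.*?)\s*\}\}")


def render(template: str, variables: Dict[str, str],
           generate: Callable[[str], str]) -> Dict[str, str]:
    """Walk the template once, left to right, filling variables and generated fields."""
    pieces: List[str] = []
    captured = dict(variables)
    pos = 0
    for match in TAG.finditer(template):
        pieces.append(template[pos:match.start()])  # literal text before the tag
        tag = match.group(1)
        if tag.startswith("gen "):
            name = tag[4:].strip().strip("'\"")
            value = generate("".join(pieces))  # the model sees only the text so far
            captured[name] = value
        else:
            value = captured[tag]  # plain variable substitution
        pieces.append(value)
        pos = match.end()
    pieces.append(template[pos:])
    captured["_text"] = "".join(pieces)
    return captured


seen: List[str] = []


def fake_generate(prefix: str) -> str:
    """Hypothetical stand-in for a model: records the prefix and returns a dummy value."""
    seen.append(prefix)
    return f"value{len(seen)}"


profile = render("Name: {{name}}\nJob: {{gen 'job'}}\nMotto: {{gen 'motto'}}",
                 {"name": "Ada"}, fake_generate)
print(profile["_text"])
```

Because the second generation call receives the first call's output as part of its prefix, later fields can depend on earlier ones without a separate prompt for each.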

Like Language Model Query Language (LMQL), Guidance aims to reduce the cost of LLM interaction, which can quickly become expensive if prompts are unnecessarily repetitive, verbose, or lengthy.

And with prompt efficiency comes improved performance: one of the sample Guidance code snippets generates a character template for a role-playing game. With a bit of setup code…

import guidance

# we use LLaMA here, but any GPT-style model will do
llama = guidance.llms.Transformers("your_path/llama-7b", device=0)

# we can pre-define valid option sets
valid_weapons = ["sword", "axe", "mace", "spear", "bow", "crossbow"]

# define the prompt
character_maker = guidance("""The following is a character profile for an RPG game in JSON format.
{
    "id": "{{id}}",
    "description": "{{description}}",
    "name": "{{gen 'name'}}",
    "age": {{gen 'age' pattern='[0-9]+' stop=','}},
    "armor": "{{#select 'armor'}}leather{{or}}chainmail{{or}}plate{{/select}}",
    "weapon": "{{select 'weapon' options=valid_weapons}}",
    "class": "{{gen 'class'}}",
    "mantra": "{{gen 'mantra' temperature=0.7}}",
    "strength": {{gen 'strength' pattern='[0-9]+' stop=','}},
    "items": [{{#geneach 'items' num_iterations=5 join=', '}}"{{gen 'this' temperature=0.7}}"{{/geneach}}]
}""")

# generate a character (the id value here is a placeholder)
character_maker(
    id="example-id",
    description="A quick and nimble fighter.",
    valid_weapons=valid_weapons, llm=llama
)

…the result is a character profile for the game in JSON format, generated twice as fast on an Nvidia RTX A6000 GPU with LLaMA 7B as with the standard prompt approach, and hence at lower cost.

Guidance code also outperforms a two-shot prompt approach in terms of accuracy, as measured on a BigBench test, scoring 76.01 percent compared to 63.04 percent.

In fact, Guidance can help with issues like data formatting. As the contributors Scott Lundberg, Marco Tulio Correia Ribeiro, and Ikko Eltociear Ashimine acknowledge, LLMs are not great at guaranteeing that output follows a specific data format.

"With Guidance we can both accelerate inference speed and ensure that generated JSON is always valid," they explain in the repo.
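A rough way to see why a template can guarantee valid JSON: the program writes the braces, keys, and quoting itself, and the model only supplies field values, each of which is constrained before it is inserted. The sketch below is an illustration, not Guidance's mechanism. It filters a model's free-text replies after the fact, whereas a framework like Guidance can apply such constraints during generation. All function names here are hypothetical.

```python
import json
import re
from typing import Callable, List


def pick_option(raw: str, options: List[str]) -> str:
    """Return the first allowed option the reply starts with, else the first option."""
    cleaned = raw.strip().lower()
    for option in options:
        if cleaned.startswith(option):
            return option
    return options[0]


def match_pattern(raw: str, pattern: str, default: str) -> str:
    """Keep only the leading part of the reply that matches the pattern."""
    found = re.match(pattern, raw.strip())
    return found.group(0) if found else default


def make_character(model: Callable[[str], str], weapons: List[str]) -> str:
    """Assemble the JSON skeleton in code; the model fills in values only."""
    name = model("name")
    age = int(match_pattern(model("age"), r"[0-9]+", "0"))  # int() drops leading zeros
    weapon = pick_option(model("weapon"), weapons)
    return (
        "{"
        f'"name": {json.dumps(name)}, '  # json.dumps escapes quotes in free text
        f'"age": {age}, '
        f'"weapon": {json.dumps(weapon)}'
        "}"
    )


def chatty_model(field: str) -> str:
    """Hypothetical stand-in for a model that pads its answers with extra text."""
    replies = {
        "name": 'Sure! "Robin"',
        "age": "27 years old, roughly",
        "weapon": "Bow, of course!",
    }
    return replies[field]


character_json = make_character(chatty_model, ["sword", "axe", "bow"])
character = json.loads(character_json)  # parses without error
print(character)
```

However chatty the replies are, the structural characters never come from the model, so the result always parses.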

And no one had to be threatened to make it so. ®
