Sure, Microsoft, let's put ChatGPT in control of robots

Doesn't the world have enough problems?

Microsoft, having committed to a "multi-year, multi-billion dollar" investment in OpenAI, is so besotted with large language models like ChatGPT that it sees such savvy software simplifying how we communicate with robots.

ChatGPT is a large language model (LLM) built on OpenAI's GPT (Generative Pre-trained Transformer) family of models, which are trained on text scraped from the web and other sources. Wedded with a chat interface, the model's ability to respond to questions semi-coherently, though not always accurately, won it a place in Microsoft's Bing search engine, and set tongues wagging that the dominance of ad-festooned, SEO-gamed, payment-propped Google Search may finally be coming to an end.

Insufficiently busy putting out fires from Bing's AI mind meld, Microsoft is now proposing ChatGPT as a way to help people direct robots in the physical world.

"Our goal with this research is to see if ChatGPT can think beyond text, and reason about the physical world to help with robotics tasks," the company said in a post on Monday. "We want to help people interact with robots more easily, without needing to learn complex programming languages or details about robotic systems."

Toward that end, Redmond's researchers have released PromptCraft, which is described as a collaborative open-source platform for sharing how to best word LLM queries and commands to robots.

It turns out you can't go straight to "Open the pod bay doors, please, Hal," if you're interacting with ChatGPT as a voice control channel for a drone. You have to set the scene for the model. It begins something like this:

Imagine you are helping me interact with the AirSim simulator for drones. At any given point of time, you have the following abilities, each identified by a unique tag. You are also required to output code for some of the requests.

Question: You can ask me a clarification question, as long as you specifically identify it saying "Question".

Code: Output a code command that achieves the desired goal.

Reason: After you output code, you should provide an explanation why you did what you did.

The simulator contains a drone, along with several objects. Apart from the drone, none of the objects are movable. Within the code, we have the following commands available to us. You are not to use any other hypothetical functions.


And there are important navigational parameters that need to be specified. But after some preparation, you may get to the point where you can converse with ChatGPT and have it direct a drone to find you a drink in the surrounding environment. Or it may produce the Python code that, if there are no errors, will allow the drone to do your bidding.
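The prompt's rule against "other hypothetical functions" is the sort of thing that can be checked mechanically before any generated code goes near a drone. What follows is a minimal sketch of such a check, not anything Microsoft describes: the function names in `ALLOWED_CALLS` are hypothetical stand-ins, since the prompt's actual command list is not reproduced here, and the checker itself is our assumption about how one might do it with Python's standard `ast` module.

```python
import ast

# Hypothetical allowlist: the real command list from the prompt is not
# reproduced in the article, so these names are illustrative only.
ALLOWED_CALLS = {"takeoff", "land", "fly_to", "get_position"}


def called_names(source: str) -> set[str]:
    """Return the names of every function or method called in the source."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            func = node.func
            if isinstance(func, ast.Name):  # plain call: fly_to(...)
                names.add(func.id)
            elif isinstance(func, ast.Attribute):  # method call: drone.fly_to(...)
                names.add(func.attr)
    return names


def disallowed_calls(source: str) -> set[str]:
    """Return the called names that are not on the allowlist."""
    return called_names(source) - ALLOWED_CALLS


generated = "takeoff()\nfly_to([10, 0, 5])\nopen_pod_bay_doors()\n"
print(disallowed_calls(generated))  # {'open_pod_bay_doors'}
```

This is a crude filter. It matches on names alone, raises `SyntaxError` if the model's output does not parse, and says nothing about whether permitted calls are used sensibly, so it supplements human review rather than replacing it.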

"ChatGPT unlocks a new robotics paradigm, and allows a (potentially non-technical) user to sit on the loop, providing high-level feedback to the large language model (LLM) while monitoring the robot’s performance," Microsoft explains. "By following our set of design principles, ChatGPT can generate code for robotics scenarios."

In other words, the same sort of not-necessarily-correct code produced by GitHub Copilot could be fed directly to a robot via ChatGPT to help it accomplish a specific mission.

Sai Vemprala, Rogerio Bonatti, Arthur Bucker, and Ashish Kapoor, from Microsoft Autonomous Systems and Robots Research Group, describe their attempt to direct robots via ChatGPT in a research paper [PDF] titled "ChatGPT for Robotics: Design Principles and Model Abilities."

The project involved defining a high-level API that ChatGPT can understand and mapping it to lower-level robot functions. The researchers then wrote text prompts for ChatGPT describing task goals, specifying available functions, and setting task constraints.
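To make that arrangement concrete, here is a rough sketch of a high-level wrapper over lower-level robot functions, plus a prompt assembled from the wrapper's own docstrings. Every name in it (`SimulatedDrone`, `DroneAPI`, `fly_to`, `get_position`, `build_prompt`) is hypothetical and chosen for illustration; the researchers' actual API and prompt wording may look quite different.

```python
class SimulatedDrone:
    """Stand-in for a low-level client; a real one would talk to a simulator."""

    def __init__(self):
        self.position = [0.0, 0.0, 0.0]

    def move_to_position(self, x, y, z):
        self.position = [float(x), float(y), float(z)]


class DroneAPI:
    """High-level functions with descriptive names for the model to call."""

    def __init__(self, client):
        self._client = client

    def get_position(self):
        """Return the drone's current position as [x, y, z]."""
        return list(self._client.position)

    def fly_to(self, point):
        """Fly the drone to the [x, y, z] point given."""
        self._client.move_to_position(*point)


def build_prompt(api_cls, task, constraints):
    """Assemble a prompt listing available functions, constraints, and the task."""
    lines = ["You may use only these functions:"]
    for name in sorted(vars(api_cls)):
        attr = getattr(api_cls, name)
        # Skip private and dunder attributes; list only public callables.
        if callable(attr) and not name.startswith("_"):
            lines.append(f"- {name}: {attr.__doc__}")
    lines.append("Constraints: " + "; ".join(constraints))
    lines.append("Task: " + task)
    return "\n".join(lines)


print(build_prompt(DroneAPI, "find me a drink", ["do not use any other functions"]))
```

Generating the function list from docstrings keeps the prompt and the code from drifting apart, which is a design choice of this sketch rather than something the paper prescribes.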

ChatGPT then responded by generating device-applicable code to accomplish whatever simulation goal had been set. The idea is that a person conversing with ChatGPT can bug test robot directives until they work properly.

The Microsoft boffins make it sound as if ChatGPT is capable of "spatio-temporal reasoning," based on its ability to control a robot with a camera, so it can use visual sensors to catch a basketball.

"We see that ChatGPT is able to appropriately use the provided API functions, reason about the ball’s appearance and call relevant OpenCV functions, and command the robot’s velocity based on a proportional controller," they explain in the paper.
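For readers unfamiliar with the term, a proportional controller simply commands a velocity in proportion to the current error. The sketch below is our own illustration, not code from the paper: it assumes the ball's horizontal pixel position has already been found (in the paper's setup that is the job of OpenCV), and the function name, gain, and speed limit are all hypothetical.

```python
def proportional_velocity(ball_x, image_width, gain=0.005, max_speed=1.0):
    """Return a sideways velocity proportional to the ball's offset from image center.

    ball_x and image_width are in pixels. The gain and max_speed values are
    illustrative assumptions, not figures from the paper.
    """
    error = ball_x - image_width / 2  # positive when the ball is right of center
    velocity = gain * error
    # Clamp so a large error cannot command an arbitrarily high speed.
    return max(-max_speed, min(max_speed, velocity))


# Ball dead center in a 640-pixel-wide frame: no correction needed.
print(proportional_velocity(320, 640))  # 0.0
```

The farther the ball drifts from the center of the frame, the harder the robot is told to move toward it, up to the clamp.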

Reasoning of that sort – having some common sense model of the world – makes it a lot easier for robots to operate effectively in a physical environment, it's argued. The autonomous vehicle industry isn't there yet and neither is ChatGPT, it seems.

Just this week, a pair of researchers from the University of Southern California, Zhisheng Tang and Mayank Kejriwal, released a paper via arXiv challenging the ability of ChatGPT and DALL-E 2 to make sensible inferences about the world.

The paper, titled "A Pilot Evaluation of ChatGPT and DALL-E 2 on Decision Making and Spatial Reasoning," concludes that the two models reason inconsistently.

With regard to ChatGPT, they found that, "although it demonstrates some level of rational decision-making, many of its decisions violate at least one of the axioms even under reasonable constructions of preferences, bets, and decision-making prompts." And sometimes, they said, ChatGPT makes the right decision for the wrong reasons.

Microsoft's boffins acknowledge that ChatGPT has limitations and they note that the model's output should not be applied to a robot unchecked.

"We emphasize that these tools should not be given full control of the robotics pipeline, especially for safety critical applications," they state in their paper. "Given the propensity of LLMs to eventually generate incorrect responses, it is fairly important to ensure solution quality and safety of the code with human supervision before executing it on the robot." ®
