Boston Dynamics teaches robo-dog to recognise speech, respond using ChatGPT
The result is a very odd mechanical tour guide that thinks it has parents – sure, no problem
Video Totally non-evil robot-maker Boston Dynamics has taught one of its "Spot" robo-dogs to talk, by using ChatGPT.
As explained last week in a blog post, Boston Dynamics (BD) folks observed with considerable interest the advent of foundation models (FMs) and their use powering chatbots like ChatGPT. The firm therefore became interested in developing a demo of Spot using FMs to make decisions in real time.
"Large Language Models (LLMs) like ChatGPT are basically very big, very capable autocomplete algorithms; they take in a stream of text and predict the next bit of text," the post states. "We were inspired by the apparent ability of LLMs to roleplay, replicate culture and nuance, form plans, and maintain coherence over time, as well as by recently released Visual Question Answering (VQA) models that can caption images and answer simple questions about them."
A robot tour guide was chosen as a good test case. "The robot could walk around, look at objects in the environment, use a VQA or captioning model to describe them, and then elaborate on those descriptions using an LLM," the droid-maker's post states. "Additionally, the LLM could answer questions from the tour audience, and plan what actions the robot should take next. In this way, the LLM can be thought of as an improv actor – we provide a broad strokes script and the LLM fills in the blanks on the fly."
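The post doesn't reproduce its exact prompt, but the "broad strokes script" approach might look something like the sketch below – a system prompt carrying the personality and the list of tour sites, with the LLM left to improvise the rest. All names, wording, and the site list here are illustrative assumptions, not Boston Dynamics' actual prompt.

```python
# Hypothetical sketch of the "improv actor" prompt setup.
# The persona text, site names, and function names are assumptions.

def build_tour_messages(personality, sites, visitor_question):
    """Assemble a chat-completion message list: a broad-strokes 'script'
    in the system prompt, with the LLM filling in the blanks."""
    system_prompt = (
        f"You are Spot, a robot tour guide. Personality: {personality}. "
        "You can walk to these sites and describe what you see there: "
        + ", ".join(sites) + ". "
        "Answer audience questions briefly and decide which site to visit next."
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": visitor_question},
    ]

messages = build_tour_messages(
    personality="snarky",
    sites=["lobby", "IT help desk", "old Spots display"],
    visitor_question="Who is Marc Raibert?",
)
```

Swapping the `personality` string is all it takes to re-skin the guide – which is consistent with how easily the team later switched to "sarcastic" and "bigfoot hunter" personas.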
A Spot-bot was therefore equipped with a speaker and microphone, and hooked up to ChatGPT and OpenAI's Whisper speech recognition API. Spot has a software development kit that makes this sort of thing possible. The post includes code fragments that show how the bot was built.
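The control flow that wiring implies – speech in, transcription, LLM reply, speech out – can be sketched as below. This is a minimal sketch, not the blog's code: the real pipeline calls OpenAI's Whisper and chat APIs, but here those backends are injected as plain callables so the loop runs standalone, and all names are illustrative assumptions.

```python
# Sketch of the robot's listen -> transcribe -> reply loop.
# transcribe() and complete() stand in for the Whisper API and a
# ChatGPT completion call; both are hypothetical placeholders.

def conversation_turn(audio, history, transcribe, complete):
    """One turn: speech audio in, reply text out, history updated in place."""
    question = transcribe(audio)           # e.g. Whisper speech-to-text
    history.append({"role": "user", "content": question})
    reply = complete(history)              # e.g. ChatGPT chat completion
    history.append({"role": "assistant", "content": reply})
    return reply                           # handed to text-to-speech

# Stand-in backends so the loop can be exercised without a network:
def fake_transcribe(audio):
    return "Who is Marc Raibert?"

def fake_complete(history):
    return "I don't know. Let's ask at the IT help desk!"

history = [{"role": "system", "content": "You are Spot, a tour guide."}]
reply = conversation_turn(b"<audio bytes>", history, fake_transcribe, fake_complete)
```

Keeping the running `history` list is what lets the LLM "maintain coherence over time," as the post puts it – each turn sees the whole conversation so far.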
Boston Dynamics developers "wanted our robot tour guide to look like it was in conversation with the audience," so they analyzed its speech and translated that into movements of Spot’s gripping tool – "sort of like the mouth of a puppet."
"This illusion was enhanced by adding silly costumes to the gripper and googly eyes."
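One plausible way to drive that puppet-mouth effect is to map the loudness of each chunk of the robot's outgoing speech audio to how far the gripper opens. The sketch below is a guess at that mapping, not the post's code; the thresholds and the function name are assumptions, and the actual Spot SDK command to move the gripper is omitted.

```python
# Hypothetical "puppet mouth" mapping: louder speech -> wider gripper.
# Thresholds are illustrative; the real gripper command is not shown.

def gripper_openness(samples, floor=0.02, ceiling=0.5):
    """Return a 0.0-1.0 gripper-open fraction for one chunk of speech
    audio (samples normalised to [-1, 1]), based on RMS loudness."""
    if not samples:
        return 0.0
    rms = (sum(s * s for s in samples) / len(samples)) ** 0.5
    # Clamp quiet noise to fully closed and loud peaks to fully open.
    return max(0.0, min(1.0, (rms - floor) / (ceiling - floor)))
```

Running this per audio chunk as the speech plays would make the gripper flap roughly in time with the words – close enough for a puppet.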
You can be the judge of the effectiveness of that illusion by gazing upon the image below.
And here, dear reader, is video of the robo-dog chatting – and trying to interact – with humans.
While the above is impressive, the BD team encountered some weirdness as it worked.
"For example, we asked the robot 'who is Marc Raibert?'" – the founder, former CEO and now chair of BD. "It responded 'I don't know. Let's go to the IT help desk and ask!'. And then it did so."
"We didn't prompt the LLM to ask for help. It drew the association between the location 'IT help desk' and the action of asking for help independently," the BD post explains.
BD developers also asked Spot to identify its parents.
"It went to the 'old Spots' where Spot V1 and Big Dog are displayed in our office and told us that these were its 'elders'," the post reveals, not at all creepily.
"We were also surprised at just how good the LLM was at staying 'in character' even as we gave it ever more absurd 'personalities'," the post continues. "We learned right away that 'snarky' or 'sarcastic' personalities worked really well, and we even got the robot to go on a 'bigfoot hunt' around the office, asking random passersby whether they'd seen any cryptids around."
The bot also highlighted some of ChatGPT's known flaws. Prompts for info about BD's "Stretch" logistics bot produced a response claiming its purpose is yoga. A lag of six seconds or more between question and answer made for stilted conversation. "It's also susceptible to OpenAI being overwhelmed or the internet connection going down," the post states.
BD folk are nonetheless enthusiastic about the results.
"Being able to assign a task to a robot just by talking to it would help reduce the learning curve for using these systems," the post states, adding "A world in which robots can generally understand what you say and turn that into useful action is probably not that far off.
"That kind of skill would enable robots to perform better when working with and around people – whether as a tool, a guide, a companion, or an entertainer." ®