ChatGPT will soon accept speech and images in its prompts, and be able to talk back to you

Yakety Yak - AI talks back

Update Following an upgrade, ChatGPT will allow users to upload images, speak to the chatbot, and hear it talk back.

The latest features will be rolled out to paid subscribers and enterprise customers over the next two weeks on ChatGPT's web, iOS, and Android apps, and later to the free version, OpenAI announced on Monday.

With new capabilities come new avenues for misuse, of course. To that end, OpenAI said it has restricted ChatGPT's ability to comment on certain types of images, to prevent it from making inappropriate, biased, or offensive personal remarks.

"Vision-based models also present new challenges, ranging from hallucinations about people to relying on the model's interpretation of images in high-stakes domains. Prior to broader deployment, we tested the model with red teamers for risk in domains such as extremism and scientific proficiency, and a diverse set of alpha testers. Our research enabled us to align on a few key details for responsible usage," OpenAI said.

"We've also taken technical measures to significantly limit ChatGPT's ability to analyze and make direct statements about people since ChatGPT is not always accurate and these systems should respect individuals' privacy."

Processing data types beyond text expands ChatGPT's capabilities significantly. For instance, users could upload images of objects like historical landmarks to learn more about them, or pictures of the inside of their fridges to ask the chatbot what they could make with the ingredients they have. They can also direct ChatGPT to focus on specific parts of an image by highlighting a section manually.

OpenAI has integrated its speech recognition model, Whisper, to give ChatGPT the ability to transcribe voice to text, and has added a new system to convert text to speech. Users can choose how they want the chatbot to sound from five different AI-generated voices.

Spotify is using the new generative audio model to translate podcasts into different languages whilst retaining the sound of the speakers' voices, it's claimed.

For now, ChatGPT can only transcribe speech in English and isn't effective with other languages, especially those that don't use a Latin-based script, OpenAI explained.

Large language models are a powerful technology but they're not perfect, and are still prone to generating false information. It's probably best not to rely on the chatbot to make risky decisions, like identifying mushrooms to eat, for example. As Sir Terry Pratchett put it: "All Fungi are edible. Some fungi are only edible once."

The Register has asked OpenAI for clarification on whether it would be collecting users' voices and images at all. The company previously said it wouldn't train on data from its enterprise customers or from people's conversations if they disabled their chat histories. ®

Updated to add

OpenAI has confirmed it will use data from "non-API consumer services ChatGPT or DALL-E" to train its models, unless the user opts out. The same seems to be true for Whisper.
