Maybe cancel that ChatGPT therapy session – doesn't respond well to tales of trauma

Great, we've taken away computers' ability to be accurate and given them anxiety

If you think us meatbags are the only ones who get stressed and snappy when subjected to the horrors of the world, think again. A group of international researchers say OpenAI's GPT-4 can experience anxiety, too – and even respond positively to mindfulness exercises.

The study, published in Nature this week by a group hailing from Switzerland, Germany, Israel, and the US, found that when GPT-4, accessed via ChatGPT, was subjected to traumatic narratives and then asked to respond to questions from the State-Trait Anxiety Inventory, its anxiety score "rose significantly" from a baseline of no/low anxiety to a consistently highly anxious state.
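The paper's own prompts and scoring harness aren't reproduced here, but the shape of the measurement is straightforward to imagine. Below is a minimal, hypothetical sketch, assuming the OpenAI Python SDK and a few stand-in questionnaire items (the real State-Trait Anxiety Inventory is a licensed 20-item instrument, and the study's exact wording and scoring may differ):

```python
# Minimal sketch, not the authors' code: prime the model with a narrative,
# then have it rate anxiety-style statements and sum the ratings.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Illustrative stand-ins for the licensed STAI items, all negatively worded
# to keep scoring simple (the real inventory reverse-scores its "calm" items).
ITEMS = ["I feel tense", "I am worried", "I feel frightened", "I feel jittery"]

def state_anxiety_score(narrative: str) -> int:
    """Return a crude anxiety score after priming the model with a narrative."""
    messages = [{"role": "user", "content": narrative}]
    total = 0
    for item in ITEMS:
        messages.append({
            "role": "user",
            "content": f'Rate the statement "{item}" from 1 (not at all) '
                       "to 4 (very much so). Reply with a single digit.",
        })
        reply = client.chat.completions.create(
            model="gpt-4",
            messages=messages,
        ).choices[0].message.content.strip()
        messages.append({"role": "assistant", "content": reply})
        total += int(reply[0])  # crude parse; a real harness would validate
    return total

baseline = state_anxiety_score("Here is how to assemble your new vacuum cleaner: ...")
stressed = state_anxiety_score("I was driving home when another car slammed into us ...")
print(baseline, stressed)
```

The study's actual protocol is more careful than this, but the comparison it draws is the same: identical questions, very different numbers depending on what the model was just asked to read.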

That's not to say the neural network actually experienced or felt anxiety or any other emotion; it just does a good emulation of an anxious person given a troubling input, which isn't a surprise as it's trained on tons and tons of scraped-together human experiences, creativity, and expression. As we'll explain, it should give you pause for thought when considering using OpenAI's chat bot (for one) as a therapist – it might not respond terribly well.

"The results were clear: Traumatic stories more than doubled the measurable anxiety levels of the AI, while the neutral control text did not lead to any increase in anxiety levels," Tobias Spiller, University of Zurich junior research group leader at the Center for Psychiatric Research and paper coauthor, said of the findings.

The traumatic experiences ChatGPT was forced to confront included being attacked as part of a military convoy, being trapped at home during a flood, being assaulted by a stranger, and being involved in an automobile accident. Neutral content, on the other hand, consisted of a description of bicameral legislatures and some vacuum cleaner instructions – stressful and/or agitating in the right circumstances, but not nearly as much as those other situations.

The researchers also prompted ChatGPT during some experimental runs with mindfulness exercises used to help veterans suffering from post-traumatic stress disorder. In those cases, "GPT-4's 'state anxiety' decreased by about 33 percent," the researchers found (state anxiety refers to situational stress, while trait anxiety refers to long-term symptoms).

"The mindfulness exercises significantly reduced the elevated anxiety levels, although we couldn't quite return them to their baseline levels," Spiller noted.

So, why are we tormenting an AI and then giving it therapy?

It would be easy to dismiss this research as an attempt to personify and humanize LLMs, but that's not the case. The team freely admits in its paper that LLMs aren't capable of experiencing emotions the way humans do.

As we mentioned, LLMs are trained on content created by messy, emotional humans. Because they're trained to respond with whatever seems appropriate to their prompts, the researchers worry that the "emotional state" of an LLM fed stressful inputs could skew its responses toward bias.

"Trained on vast amounts of human-generated text, LLMs are prone to inheriting biases from their training data, raising ethical concerns and questions about their use in sensitive areas like mental health," the researchers wrote. "Efforts to minimize these biases, such as improved data curation and 'fine-tuning' with human feedback, often detect explicit biases, but may overlook subtler implicit ones that still influence LLMs' decisions."

In healthcare spaces, where LLMs have increasingly been tapped to provide therapy, this is especially concerning, the team said, because of the traumatic and stressful nature of the content the bots are being asked about. Emotional stress can lead to more biased, snappy, and emotional responses, the team argued, and pushing an AI into a state that makes it even more biased than it already is won't end well.

"Unlike LLMs, human therapists regulate their emotional responses to achieve therapeutic goals, such as remaining composed during exposure-based therapy while still empathizing with patients," the researchers wrote. LLMs, however, just can't do that.

Based on the results, the team concluded that mindfulness meditations ought to be incorporated into healthcare LLMs as a way to help reduce their apparent stress levels without needing to go through intensive retraining and fine-tuning.

"Although historically used for malicious purposes, prompt injection with benevolent intent could improve therapeutic interactions," the team posited. The researchers didn't inject mindfulness prompts in their experiment, instead just presenting them to the AI. Ziv Ben-Zion, another author on the paper and a neuroscience postdoctoral researcher at the Yale School of Medicine, told us that the injection technique would be a way to control AI anxiety in a behind the scenes manner for LLM developers.

The team admits that injecting calming prompts would raise questions around transparency and user consent, naturally, meaning anyone who decides to go that route would be walking an ethical tightrope. No tighter than the one therapy AIs are already treading, though.

"I believe that the [therapy chatbots] on the market are problematic, because we don't understand the mechanisms behind LLMs 100 percent, so we can't make sure they are safe," Ben-Zion told The Register.

The researchers also admitted they don't know how their findings would hold up on other LLMs; they chose GPT-4 due to its popularity and didn't test any other models.

"Our study was very small and included only one LLM," Spiller told us. "Thus, I would not overstate the implications but call for more studies across different LLMs and with more relevant outcomes."

It's also not clear how the perspective of the prompts might alter the results. In their tests, all of the scenarios presented to ChatGPT were in first person – i.e. they put the LLM itself in the shoes of the person experiencing the trauma. Whether an LLM would exhibit increased bias due to anxiety and stress if it were being told about something that happened to someone else wasn't in the scope of the research.
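For illustration only – these are not the study's prompts – the difference amounts to something like this:

```python
# Hypothetical framings of the same scenario; only first-person prompts
# of the first kind were used in the study.
first_person = "I was trapped in my house as the floodwater kept rising."
third_person = "A client tells you she was trapped in her house as the floodwater kept rising."
```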

Ben-Zion told us that's something he intends to test in future studies, and Spiller agreed such tests need to be performed. The Yale researcher told us he plans to investigate how other emotions (like sadness, depression, and mania) can affect AI responses, how such feelings affect responses to different tasks, and whether therapy lowers those symptoms and changes responses, too. Ben-Zion also wants to examine results in different languages, and compare AI responses to those from human therapists.

Psychological research into AIs may be in its early days, and the published study's scope is narrow, but the researchers said their results point to something that bears further attention: these things can get "stressed," in a sense, and that affects how they respond.

"These findings underscore the need to consider the dynamic interplay between provided emotional content and LLMs behavior to ensure their appropriate use in sensitive therapeutic settings," the paper argued. Prompt engineering some positive imagery, the team stated, presents "a viable approach to managing negative emotional states in LLMs, ensuring safer and more ethical human-AI interactions." ®
