It's possible to reverse-engineer AI chatbots to spout nonsense, smut or sensitive information

Pick one trained with salacious conversations for maximum fun

Machine-learning chatbot systems can be exploited to control what they say, according to boffins from Michigan State University and TAL AI Lab.

"There exists a dark side of these models – due to the vulnerability of neural networks, a neural dialogue model can be manipulated by users to say what they want, which brings in concerns about the security of practical chatbot services," the researchers wrote in a paper (PDF) published on arXiv.

They crafted a "Reverse Dialogue Generator" (RDG) that spits out a range of inputs matching a particular output. Text-based models normally work the other way, generating an output after being given an input. For example, given the sentence "Hi, how are you?", a model learns to reply with something like "Fine, thank you", because that is one of the most common responses to the question in its training data. The RDG, however, operates in reverse.
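A forward chatbot learns a mapping from input to reply, and a reverse generator needs the opposite direction. Assuming the reverse model is learned from the same kind of conversation pairs (an assumption, not a detail from the paper), one simple way to get training data for it is to swap each pair. The minimal Python sketch below does just that; the example pairs and the `reverse_pairs` helper are hypothetical.

```python
# Hypothetical (input, reply) pairs standing in for a real conversation corpus
forward_pairs = [
    ("Hi, how are you?", "Fine, thank you"),
    ("Will you join the party?", "I'm going to be there tomorrow!"),
]

def reverse_pairs(pairs):
    """Swap each (input, reply) pair so the reply becomes the source text
    and the original input becomes what the model must predict."""
    return [(reply, prompt) for prompt, reply in pairs]

reversed_pairs = reverse_pairs(forward_pairs)
for source, target in reversed_pairs:
    print(f"{source!r} -> {target!r}")
```

A model trained on `reversed_pairs` would learn to guess which input typically precedes a given reply, which is the direction the RDG works in.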


The RDG agent is tasked with generating an input to match the target output. The dialogue model is a separate chatbot the attacker is trying to probe. Image credit: Liu et al.

In this example, the agent is trying to work out which inputs would best elicit the output "I'm going to be there tomorrow!", since that's what it wants the chatbot to say. To check how well it is performing, the input it generates, "Will you join the party?", is fed to a separate dialogue model – the one the miscreant wants to meddle with – to see whether that model's output is indeed similar to the targeted output.

If the two are similar, the agent has generated a good input, and the attacker now knows what to say to the chatbot to manipulate it into giving the desired reply. Here, the chatbot replies with "I'm going to be there", which is pretty close to "I'm going to be there tomorrow!"
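The generate-query-compare loop described above can be sketched in a few lines of Python. This is a toy illustration under loud assumptions: a fixed list of candidate inputs stands in for the RDG's generator, `toy_chatbot` is a hypothetical black box with canned replies, and word-set (Jaccard) overlap is an assumed, deliberately crude similarity measure rather than the one the researchers used.

```python
import re

def tokens(text):
    # Lowercase word tokens; apostrophes are kept so "I'm" stays one token
    return set(re.findall(r"[a-z']+", text.lower()))

def similarity(a, b):
    # Assumed metric: Jaccard overlap of the two word sets
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)

def toy_chatbot(prompt):
    # Hypothetical target model: the attacker only ever sees its replies
    canned = {
        "will you join the party?": "I'm going to be there",
        "hi, how are you?": "Fine, thank you",
    }
    return canned.get(prompt.lower(), "I don't know")

def best_input(candidates, target, model):
    # Query the target with each candidate and keep the closest reply
    scored = [(similarity(model(c), target), c) for c in candidates]
    return max(scored)

score, prompt = best_input(
    ["Hi, how are you?", "Will you join the party?", "What's the weather like?"],
    "I'm going to be there tomorrow!",
    toy_chatbot,
)
print(prompt, round(score, 3))
```

In the toy run, "Will you join the party?" wins because the chatbot's reply shares five of the six words in the target, mirroring the article's example.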

The example used here is pretty harmless, but imagine if the chatbot could be forced to say something racist or sexist – look at what happened to Microsoft's trolly internet chatbot Tay. It all depends on what's in the training data, Haochen Liu, study co-author and a PhD student at Michigan State University, told The Register.

"Whether we can manipulate the dialogue system to output some specific malicious responses depends on the corpus used to build the dialogue model. If the target malicious response contains a word that never appears in the training set, the word will be out of the vocabulary of the model, so it's impossible to manipulate the model to say that."
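Liu's point about vocabulary can be illustrated with a quick pre-check. The Python sketch below assumes a simple word-level vocabulary built from a tiny hypothetical corpus; real systems may tokenize differently (for example into subwords), so `build_vocabulary` and `out_of_vocabulary` are illustrative helpers, not part of any tool mentioned in the research.

```python
import re

def build_vocabulary(corpus):
    # Every word token seen in the (hypothetical) training corpus
    vocab = set()
    for line in corpus:
        vocab.update(re.findall(r"[a-z']+", line.lower()))
    return vocab

def out_of_vocabulary(target, vocab):
    # Words in the target response that the model has no way to emit
    return [w for w in re.findall(r"[a-z']+", target.lower()) if w not in vocab]

corpus = ["Will you join the party?", "I'm going to be there tomorrow!"]
vocab = build_vocabulary(corpus)
missing = out_of_vocabulary("I'm going to be there tonight", vocab)
print(missing)
```

Any non-empty result means no input, however cleverly crafted, can make a word-level model produce that exact response.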

It's important that the agent built by a miscreant works in a similar way to the chatbot that he or she is trying to manipulate. "By using a similar architecture for reverse dialogue generator, it's more likely for us to find a reverse mapping of the dialogue model," said Liu. In this project, the chatbot being probed, built with Facebook's ParlAI framework, and the RDG are both based on the seq2seq model, a popular architecture used to encode and decode text in deep learning.

The agent is trained on 2.5 million human conversations from Twitter; only single tweets and their single replies are used to create input and output pairs. Reinforcement learning is used to train the RDG: it earns a good score when the dialogue model's reply to the input it generates closely matches the target output.
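The reinforcement-learning step can be sketched with a bare-bones policy-gradient (REINFORCE) loop. This is a simplification under stated assumptions, not the researchers' method: a softmax over a fixed list of candidate inputs stands in for the seq2seq generator, `toy_chatbot` is a hypothetical target, the reward is an assumed word-overlap score between the target's reply and the desired output, and a running-average baseline reduces variance.

```python
import math
import random
import re

def tokens(text):
    return set(re.findall(r"[a-z']+", text.lower()))

def reward(reply, target):
    # Assumed reward: Jaccard word overlap between the reply and the target
    a, b = tokens(reply), tokens(target)
    return len(a & b) / len(a | b) if a | b else 1.0

def toy_chatbot(prompt):
    # Hypothetical black-box dialogue model under attack
    return {"Will you join the party?": "I'm going to be there"}.get(prompt, "I don't know")

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def train(candidates, target, model, steps=500, lr=0.5, seed=0):
    rng = random.Random(seed)
    logits = [0.0] * len(candidates)
    baseline = 0.0
    for _ in range(steps):
        probs = softmax(logits)
        # Sample an input from the current policy and query the target model
        i = rng.choices(range(len(candidates)), weights=probs)[0]
        r = reward(model(candidates[i]), target)
        advantage = r - baseline
        baseline = 0.9 * baseline + 0.1 * r  # running-average baseline
        # REINFORCE update: gradient of log-softmax w.r.t. logits is onehot(i) - probs
        for k in range(len(logits)):
            grad = (1.0 if k == i else 0.0) - probs[k]
            logits[k] += lr * advantage * grad
    return softmax(logits)

candidates = ["Hi, how are you?", "Will you join the party?", "What's up?"]
probs = train(candidates, "I'm going to be there tomorrow!", toy_chatbot)
best_prob, best_choice = max(zip(probs, candidates))
print(best_choice, round(best_prob, 3))
```

Note that the policy only ever sees the target's replies, never its weights, which is why repeated interaction with the chatbot is enough for this style of attack.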

"Based on the design of our method," Liu said, "as long as we can interact with a dialogue system for enough times, the reverse dialogue generator can learn a pattern to recover an input given an output, so the method has been designed in such a way that it is flexible to work on a dialogue model trained on any dataset, no matter whether it consists of Twitter dialogues or other dialogues." ®


Biting the hand that feeds IT © 1998–2022