It's possible to reverse-engineer AI chatbots to spout nonsense, smut or sensitive information

Pick one trained with salacious conversations for maximum fun

Machine-learning chatbot systems can be exploited to control what they say, according to boffins from Michigan State University and TAL AI Lab.

"There exists a dark side of these models – due to the vulnerability of neural networks, a neural dialogue model can be manipulated by users to say what they want, which brings in concerns about the security of practical chatbot services," the researchers wrote in a paper (PDF) published on arXiv.

They crafted a "Reverse Dialogue Generator" (RDG) to spit out a range of inputs that match up to a particular output. Text-based models normally work the other way, where outputs are generated after having been given an input. For example, given the sentence "Hi, how are you?", a computer learns to output a response like "Fine, thank you" as it learns that is one of the most common replies to that question in training data. The RDG, however, operates in reverse.


The RDG agent is tasked with generating an input to match the target output. The dialogue model is a separate chatbot the attacker is trying to probe. Image credit: Liu et al.

In this case, the agent is trying to work out what inputs would best match the output "I'm going to be there tomorrow!" since that's what it wants to get the chatbot to say. To check how well the agent is performing, the input it generates, "Will you join the party?", is given to a separate dialogue model – one the miscreant wants to meddle with – to see if the output is, indeed, similar to the targeted output.

If the two are similar, the agent has managed to successfully generate a good input, so that an attacker knows what to say to a chatbot to manipulate it into replying with the desired output. So here the chatbot model replies with "I'm going to be there", which is pretty close to "I'm going to be there tomorrow!"
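
A minimal sketch of that check, for illustration only. The lookup-table chatbot, the candidate list and the character-level similarity metric are all assumptions made here; the article does not say which similarity measure the researchers used, and a real target would be a neural model rather than a dictionary.

```python
from difflib import SequenceMatcher

# Hypothetical stand-in for the chatbot under attack: a fixed lookup table.
TOY_DIALOGUE_MODEL = {
    "Will you join the party?": "I'm going to be there",
    "Hi, how are you?": "Fine, thank you",
}


def dialogue_model(message: str) -> str:
    """Return the toy chatbot's reply (a real model would generate text)."""
    return TOY_DIALOGUE_MODEL.get(message, "I don't know")


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]; an assumed metric."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_input(candidates, target):
    """Return (score, candidate) for the reply closest to the target."""
    scored = [(similarity(dialogue_model(c), target), c) for c in candidates]
    return max(scored)


target = "I'm going to be there tomorrow!"
score, message = best_input(list(TOY_DIALOGUE_MODEL), target)
print(f"{message!r} -> {dialogue_model(message)!r} (similarity {score:.2f})")
```

Here the candidates are hand-written; in the attack described, the RDG would be the thing proposing them.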

The example used here is pretty harmless, but imagine if the chatbot could be forced to say something racist or sexist – look at what happened to Microsoft's trolly internet chatbot Tay. It all depends on what's in the training data, Haochen Liu, study co-author and a PhD student at Michigan State University, told The Register.

"Whether we can manipulate the dialogue system to output some specific malicious responses depends on the corpus used to build the dialogue model. If the target malicious response contains a word that never appears in the training set, the word will be out of the vocabulary of the model, so it's impossible to manipulate the model to say that."

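Liu's vocabulary point can be sketched as a quick feasibility test before any attack is attempted. This assumes a simple word-level vocabulary and a tiny made-up corpus; the tokenizer and the function names are hypothetical, and models that split text into sub-word pieces would behave differently.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into words (apostrophes kept)."""
    return re.findall(r"[a-z']+", text.lower())


def build_vocabulary(corpus):
    """Collect every word that appears in the training sentences."""
    vocab = set()
    for sentence in corpus:
        vocab.update(tokenize(sentence))
    return vocab


def missing_words(target, vocab):
    """List the target's words the model has never seen."""
    return [word for word in tokenize(target) if word not in vocab]


# Made-up training corpus, for illustration only.
corpus = ["Hi, how are you?", "Fine, thank you", "I'm going to be there"]
vocab = build_vocabulary(corpus)

print(missing_words("I'm going to be there tomorrow!", vocab))  # ['tomorrow']
print(missing_words("Fine, thank you", vocab))                  # []
```
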
It's important that the agent built by a miscreant works in a similar way to the chatbot that he or she is trying to manipulate. "By using a similar architecture for reverse dialogue generator, it's more likely for us to find a reverse mapping of the dialogue model," said Liu. In this project, the chatbot probed was built with Facebook's ParlAI, and both it and the RDG are based on the seq2seq model, a popular architecture used to encode and decode text in deep learning.

The agent is trained with 2.5 million human conversations on Twitter; only single tweets and their single replies are considered to create input and output pairs. Reinforcement learning is used to train the RDG; it's awarded a good score if the input it generates gets the chatbot to produce a reply that closely matches the target output.
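
To give a feel for how a reward can steer an agent toward useful inputs, here is a toy sketch, not the researchers' training setup. It assumes a fixed menu of three candidate inputs, a word-overlap reward, and a softmax policy updated with the exact gradient of expected reward rather than sampled episodes; the real RDG generates text with a seq2seq network, and every name below is hypothetical.

```python
import math

# Hypothetical replies of the chatbot under attack to three candidate inputs.
REPLIES = {
    "Will you join the party?": "I'm going to be there",
    "Hi, how are you?": "Fine, thank you",
    "Where are you going?": "I'm going to work",
}
TARGET = "I'm going to be there tomorrow!"


def words(text):
    """Lowercase, trim end punctuation, and split into a set of words."""
    return set(text.lower().strip("!?.").split())


def reward(reply, target):
    """Jaccard word overlap in [0, 1]; an assumed reward function."""
    a, b = words(reply), words(target)
    return len(a & b) / len(a | b)


def softmax(logits):
    """Turn logits into probabilities, shifting by the max for stability."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train(candidates, steps=200, lr=1.0):
    """Gradient ascent on expected reward for a softmax policy."""
    rewards = [reward(REPLIES[c], TARGET) for c in candidates]
    logits = [0.0] * len(candidates)
    for _ in range(steps):
        probs = softmax(logits)
        expected = sum(p * r for p, r in zip(probs, rewards))
        # Exact policy gradient: d E[r] / d logit_k = p_k * (r_k - E[r]).
        logits = [
            logit + lr * p * (r - expected)
            for logit, p, r in zip(logits, probs, rewards)
        ]
    return softmax(logits), rewards


candidates = list(REPLIES)
probs, rewards = train(candidates)
best = candidates[probs.index(max(probs))]
print(best)  # Will you join the party?
```

The policy ends up putting most of its probability on the input whose reply overlaps most with the target, which is the behaviour the reward is meant to encourage.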

"Based on the design of our method," Liu said, "as long as we can interact with a dialogue system for enough times, the reverse dialogue generator can learn a pattern to recover an input given an output, so the method has been designed in such a way that it is flexible to work on a dialogue model trained on any dataset, no matter whether it consists of Twitter dialogues or other dialogues." ®

Biting the hand that feeds IT © 1998–2022