We just don't get enough time, contractor tasked with fact-checking Google Bard tells us
There's an old saying in business: Time, money, quality – you can have any two
Workers tasked with improving the output of Google's Bard chatbot say they've been told to focus on working fast at the expense of quality. Bard sometimes generates inaccurate information simply because there isn't enough time for these fact checkers to verify the software's output, one of those workers told The Register.
Large language models like Bard learn what words to generate next from a given prompt by ingesting mountains of text from various sources – like the web, books, and papers. But this information is complex, and sentence-predicting AI chatbots cannot tell fact from fiction. They just try their best to emulate us humans from our own work.
To make large language models like Bard more accurate, companies hire crowdsourced workers to assess the accuracy of the bot's responses; that feedback is then passed back into the pipeline so that future answers from the bot are of a higher quality. Google and others put humans in the loop to bump up the apparent abilities of the trained models.
Ed Stackhouse – a long-time contractor hired by data services provider Appen, working on behalf of Google to improve Bard – claims workers aren't given adequate time to analyze the accuracy of Bard's outputs.
They have to read an input prompt and Bard's responses, search the internet for the relevant information, and write up notes commenting on the quality of the text. "You can be given just two minutes for something that would actually take 15 minutes to verify," he told us. That doesn't bode well for improving the chatbot.
An example could be looking at a blurb generated by Bard describing a particular company. "You would have to check that a business was started at such and such date, that it manufactured such and such project, that the CEO is such and such," he said. There are multiple facts to check, and often not enough time to verify them thoroughly.
The input prompts are ones submitted by actual human users; the Appen contractors are, in effect, reviewing the bot's performance.
Stackhouse is part of a group of contract workers raising the alarm over how their working conditions can make Bard inaccurate and potentially harmful. "Bard could be asked 'can you tell me the side effects of a certain prescription?' and I would have to go through and verify each one [Bard listed]. What if I get one wrong?" he asked. "Every prompt and answer we see in our environment is one that could go out to customers – to end users."
It's not just medical issues – other topics can be risky, too. Bard spewing incorrect information on politicians, for example, could sway people's opinions on elections and undermine democracy.
Stackhouse's concerns aren't far-fetched. OpenAI's ChatGPT, notably, wrongly accused a mayor in Australia of being found guilty in a financial bribery case dating back to the early 2000s.
If workers like Stackhouse are unable to catch these errors and correct them, AI will continue to spread falsehoods. Chatbots like Bard could fuel a shift in the narrative threads of history or human culture – important truths could be erased over time, he argued. "The biggest danger is that they can mislead and sound so good that people will be convinced that AI is correct."
Appen contractors are penalized, we're told, if they don't complete tasks within an allotted time, and attempts to persuade managers to give them more time to assess Bard's responses haven't been successful, it is claimed.
Stackhouse is one of a group of six workers who said they were fired for speaking out, and have filed an unfair labor practice complaint with America's labor watchdog – the National Labor Relations Board – the Washington Post first reported.
The workers accuse Appen and Google of unlawful termination and interfering with their efforts to unionize. They were reportedly told they were axed on account of business conditions. Stackhouse said he found this hard to believe, since Appen had previously sent emails to workers stating that there was "a significant spike in jobs available" for Project Yukon – a program aimed at evaluating text for search engines, which includes Bard.
Appen was offering contractors an additional $81 on top of base pay for working 27 hours per week. Workers are reportedly normally limited to working 26 hours per week for up to $14.50 per hour. The company has active job postings looking for Search Engine Evaluations specifically to work on Project Yukon. Appen did not respond to The Register's questions.
The group also tried to reach out to Google, contacting senior vice president Prabhakar Raghavan – who leads the tech behemoth's search business – but were ignored.
Courtenay Mencini, a spokesperson from Google, did not address the workers' concerns that Bard could be harmful. "As we've shared, Appen is responsible for the working conditions of their employees – including pay, benefits, employment changes, and the tasks they're assigned. We, of course, respect the right of these workers to join a union or participate in organizing activity, but it's a matter between the workers and their employer, Appen," she told us in a statement.
Stackhouse, however, said: "It's their product. If they want a flawed product, that's on them." ®