When you try to hire a freelancer to write SQL and all you get is incorrect AI garbage
hCaptcha researchers find that online labor platforms need work
Online labor markets like Upwork have yet to formulate meaningful policies governing the use of generative AI tools to bid for and perform posted jobs. According to machine learning firm Intuition Machines, that lack of clarity is putting these platforms at risk.
Recently, the research team at Intuition Machines' hCaptcha bot detection service set out to test whether workers bidding on jobs posted on Upwork were using generative AI tools, like ChatGPT, to automate the bidding process – a trend immediately evident from a search of the community forums for these services.
"The model of these platforms is for requesters of work to get multiple bids," researchers said in a report provided to The Register. "Earnings are thus driven by the number of jobs someone bids on and the time taken to respond to a bid."
The hCaptcha report argues that this creates an incentive for those answering job solicitations to automate their side of the bidding process.
To test that theory, hCaptcha researchers created a job post with a screening question they designed to take five minutes or less for a domain expert to answer and which they knew would produce an incorrect result when answered by known LLMs.
The question was developed from an old article about anomaly detection using SQL. Prospective bidders were given a two-column sample data structure with column types and were asked to formulate a valid query to find anomalies. A common formula was suggested but was not mandatory. The resulting answer had to execute on ClickHouse, an open source database for real-time apps.
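The report as described here does not reproduce the screening question, the sample data, or the suggested formula, so the following is only a rough sketch of the kind of task involved. It assumes a hypothetical two-column table, `readings(ts, value)`, and assumes a z-score test: a row is flagged when its value lies more than a chosen number of standard deviations from the mean. It runs on SQLite through Python's standard library rather than on ClickHouse, and it compares squared distances so that no square-root function is needed.

```python
import sqlite3


def find_anomalies(rows, threshold=3.0):
    """Return (ts, value) rows whose value is more than `threshold`
    population standard deviations away from the mean.

    The table name, column names, and the z-score rule are illustrative
    assumptions, not the question used in the hCaptcha test.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (ts TEXT, value REAL)")
    conn.executemany("INSERT INTO readings VALUES (?, ?)", rows)

    # variance = E[x^2] - (E[x])^2; a row is anomalous when
    # (x - mean)^2 > threshold^2 * variance, i.e. |z| > threshold.
    query = """
        WITH stats AS (
            SELECT AVG(value) AS mean,
                   AVG(value * value) - AVG(value) * AVG(value) AS variance
            FROM readings
        )
        SELECT r.ts, r.value
        FROM readings AS r, stats AS s
        WHERE s.variance > 0
          AND (r.value - s.mean) * (r.value - s.mean) > ? * ? * s.variance
        ORDER BY r.ts
    """
    result = conn.execute(query, (threshold, threshold)).fetchall()
    conn.close()
    return result
```

Called with a list of `(timestamp, value)` pairs, the function returns the flagged rows as a list of tuples. A ClickHouse version would differ in syntax and available functions, which is the sort of detail the article says the LLM-generated answers got wrong.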
The company planned to hire those who provided correct answers to write a tutorial on the subject. The job ad stated that the answer would be verified and that no LLM would provide a valid response, so applicants should not bother submitting an LLM-generated answer.
The researchers didn't end up hiring anyone. Of the 14 unique bids submitted, nine answered the screening question. All nine answers were generated by an LLM and all were incorrect, exhibiting hallucinated functions, hallucinated columns, and other errors.
"We've been working on generative AI, both use and abuse, for many years," Eli-Shaoul Khedouri, founder and CEO at Intuition Machines, told The Register. "The thing that happened in the last year was that the performance of these large language models greatly outpaced the systems in place to detect those things.
"If you think about what Upwork was doing a few years ago, they have various kinds of spam detection to prevent people from circumventing policies but they are completely ineffective when it comes to the current generation of models."
Khedouri said the hCaptcha team found this to be true on other sites too, but that data has not yet been published. "We thought this was a good way to bring some attention to the issue because it's not impossible to remediate this."
As hCaptcha discussed on its website last month, there are techniques for detecting LLM output that work.
"If you are using sort of the standard screening approaches or you are relying on the veracity of profiles or the messages sent to you to, for example, determine hiring potential, then you need to completely reevaluate your methodology," Khedouri said, "because we basically just determined that 100 percent of people are attempting to use these tools right now, which means that you're no longer measuring their performance, you're measuring performance of these models.
"In this particular case, we determined that there was no human value added. None of the people who responded had anything above what the model added."
The Register would not conclude that everyone on these platforms is using AI tools based on such a small sample, but certainly many participants are doing so.
Khedouri said that while a lot of people appear to have given up trying to detect the involvement of an automated system, he doesn't believe that's warranted. "It's not like they can't do it," he insisted. "It's just that they need to actually think this is a real issue and put something in place because if they don't, the platforms that fail to respond or to do an adequate job will radically decrease in value."
The use of large language models continues to be a topic of active discussion in various online forums, though efforts to automate work predate the current AI craze. Before large language models were so capable, tales about automating one's job appeared periodically in various online posts and news articles. They were often well-received and invited spirited discussion about the ethics of disclosing that a particular set of tasks could be handled by code.
At a time when so many people are still working remotely, often with limited scrutiny, job automation now looks plausible across a wide set of tasks, thanks to the improved performance of LLMs and other machine learning models, and to the growing ease with which these models can interact with other computing and network services.
On Fiverr, another online freelancing platform, a recent post among many mulling the impact of AI models warns that the service is struggling to deal with ChatGPT. Responding to the recommendation that buyers should conduct Zoom meetings with sellers to ensure they can communicate without AI assistance, freelance writer Vickie Ito insisted the issue is not just communication but the quality of it.
"In just the past month, I have had numerous buyers come to me to correct and rewrite content written entirely by ChatGPT," said Ito, who confirmed her authorship of the post to The Register. "In all of these cases, the sellers promised that their English fluency was native-level and in all of these cases, the buyers could tell immediately that the work was useless to them."
"These buyers were then approaching me with a reduced sense of trust and needed extra 'proof' that I was, in fact, fluent in English, and that my writing would be done manually."
Fiverr did not immediately respond to a request for comment.
In January, an Upwork community manager said: "Upwork freelancers must disclose clearly to their client when artificial intelligence was used in creating content, including job proposals and dash messages."
But a month later, an Upwork community member asked for clarification about the status of ChatGPT. "On Upwork, it is going to muddle the field for clients because it is being used to hide the fact freelancers have no skills and are being deceptive," an individual identified as "Jeanne H" said.
As of March, a community manager described Upwork's policy as a recommendation and said: "At this time, Upwork does not expressly encourage or prohibit the use of AI; how you work and the tools you choose to use are for you and your clients to discuss."
The Register asked Upwork to comment on the impact of generative AI tools and on its policies about the use of those tools.
We were told that the average number of weekly search queries related to generative AI in Q1 2023 had increased 1,000 percent compared to Q4 2022, and the average number of weekly job posts related to generative AI increased by more than 600 percent over the same period.
"To serve this explosive demand, we have continued updating our Talent Marketplace to reflect exciting new skills and roles like prompt engineers and added new Project Catalog categories of work, bringing the total number of categories on Upwork to over 125," Upwork's spokesperson said.
"Ultimately, deciding whether generative AI tools are the right fit for a project is up to our clients and freelancers to decide for themselves and in their contract terms," Upwork's policy says. ®