AI + ML

ChatGPT can't pass these medical exams – yet

Maybe letting AI models loose on patients ain't a great idea, prof tells us

Wed 24 May 2023 // 07:29 UTC

ChatGPT has failed to pass the American College of Gastroenterology exams and is not capable of generating accurate medical information for patients, doctors have warned.

A study led by physicians at the Feinstein Institutes for Medical Research tested both variants of ChatGPT – powered by OpenAI's older GPT-3.5 model and the latest GPT-4 system. The academic team copied and pasted the multiple-choice questions from the 2021 and 2022 American College of Gastroenterology (ACG) Self-Assessment Tests into the bot, and analyzed the software's responses.

Interestingly, the less advanced version based on GPT-3.5 answered 65.1 percent of the 455 questions correctly while the more powerful GPT-4 scored 62.4 percent. How that happened is hard to explain, as OpenAI is secretive about the way it trains its models. Its spokespeople did tell us, at least, that both models were trained on data dated up to September 2021.

In any case, neither result was good enough to reach the 70 percent threshold to pass the exams.
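For a rough sense of how far short each model fell, the reported percentages can be turned back into question counts. The sketch below assumes that a pass means at least 70 percent of all 455 questions answered correctly, and that the correct-answer counts can be recovered by rounding; the names `implied_correct` and `shortfall` are ours, not the study's.

```python
import math

TOTAL_QUESTIONS = 455   # question count reported in the article
PASS_PERCENT = 70       # pass mark reported in the article

# Scores as reported; the raw counts below are inferred, not quoted.
reported_percent = {"GPT-3.5": 65.1, "GPT-4": 62.4}


def implied_correct(percent, total=TOTAL_QUESTIONS):
    """Estimate the number of correct answers from a rounded percentage."""
    return round(percent * total / 100)


def shortfall(percent, total=TOTAL_QUESTIONS, pass_percent=PASS_PERCENT):
    """Questions missing to reach the pass mark (assumed to be >= 70 percent)."""
    needed = math.ceil(pass_percent * total / 100)
    return needed - implied_correct(percent, total)


for model, percent in reported_percent.items():
    print(model, implied_correct(percent), "correct,", shortfall(percent), "short")
```

Under those assumptions, GPT-3.5 got roughly 296 questions right and GPT-4 roughly 284, against about 319 needed to pass.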

Arvind Trindade, an associate professor at The Feinstein Institutes for Medical Research and senior author of the study published in the American Journal of Gastroenterology, told The Register: "Although the score is not far away from passing or obtaining a 70 percent, I would argue that for medical advice or medical education, the score should be over 95."

"I don't think a patient would be comfortable with a doctor that only knows 70 percent of his or her medical field. If we demand this high standard for our doctors, we should demand this high standard from medical chatbots," he added.

The American College of Gastroenterology trains physicians, and its tests are used as practice for official exams. To become a board-certified gastroenterologist, doctors need to pass the American Board of Internal Medicine Gastroenterology examination. That takes knowledge and study – not just gut feeling.

ChatGPT generates responses by predicting the next word in a sequence of text. The model learns common patterns in its training data to work out which word should come next, and is only partially effective at recalling information. Although the technology has improved rapidly, it's not perfect and is prone to hallucinating false facts – especially when it's quizzed on niche subjects that may not be well represented in its training data.
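A toy illustration of next-word prediction, and emphatically not how ChatGPT itself is built: the sketch below counts which word follows which in a tiny made-up corpus and always returns the most frequent continuation. The corpus and the function names `train_bigrams` and `predict_next` are invented for this example; real systems use large neural networks over far bigger datasets.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count, for every word, which words follow it in the training text."""
    words = text.lower().split()
    following = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        following[current][nxt] += 1
    return following


def predict_next(following, word):
    """Return the most frequent follower of `word`, or None if never seen."""
    candidates = following.get(word.lower())
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]


# Hypothetical training text: frequency, not accuracy, decides the answer.
corpus = "the patient has reflux . the patient has ulcers . the patient has reflux ."
model = train_bigrams(corpus)

print(predict_next(model, "has"))          # most common continuation in the corpus
print(predict_next(model, "colonoscopy"))  # word absent from the training text
```

The predictor returns whatever followed "has" most often, whether or not that is medically right for a given patient, and has nothing to offer for a word it never saw in training.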

"ChatGPT's basic function is to predict the next word in a string of text to produce an expected response based on available information, regardless of whether such a response is factually correct or not. It does not have any intrinsic understanding of a topic or issue," the paper explains.

Trindade told us that it's possible that the gastroenterology-related information on webpages used to train the software is not accurate, and that the best resources like medical journals or databases should be used.

These resources, however, are not readily available and can be locked up behind paywalls. In that case, ChatGPT may not have been sufficiently exposed to the expert knowledge.

"The results are only applicable to ChatGPT – other chatbots need to be validated. The crux of the issue is where these chatbots are obtaining the information. In its current form ChatGPT should not be used for medical advice or medical education," Trindade concluded. ®
