Hope for nerds! ChatGPT's still a below-average math student

Detection algorithms also fail to distinguish between answers from real people and large language models

OpenAI's ChatGPT outperforms the average university student in computer science, according to researchers, but interestingly, not in math.

The large language model's performance also beats the average student in other subjects, including political studies, engineering, and psychology, a paper published in Scientific Reports said. In mathematics and economics, however, the chatbot technology was below par. The authors argue the findings can help inform policy on the introduction of AI tools in education.

The team led by Talal Rahwan and Yasir Zaki, both associate professors in computer science at New York University Abu Dhabi (NYUAD), found that ChatGPT outperformed students most markedly in a course called "Introduction to Public Policy," in which its average grade was 9.56 compared to an average of 4.39 for students.

The study asked faculty members at NYUAD to provide 10 questions from a course that they have taught at the university, along with three randomly chosen student answers to each question. Meanwhile, ChatGPT was used to generate three distinct answers to each of the 10 questions provided for each course.

Both students' and ChatGPT's answers were then compiled into a single document in random order, and graded by assessors who did not know which came from a person, and which from the LLM.

"We find that ChatGPT's performance is comparable, or even superior, to that of students on nine out of the 32 courses. Further, we find that current detection algorithms tend to misclassify human answers as AI-generated, and misclassify ChatGPT answers as human-generated," the study said.

However: "Again, we find that the largest performance gap between ChatGPT and students was for math-related questions, followed by trick questions. For the time being, humans seem to outperform ChatGPT in these areas."

The research also found that an obfuscation attack designed to evade AI filters caused detection algorithms to miss 95 percent of ChatGPT answers.

To understand how AI might be perceived in educational environments, the researchers surveyed students and staff in Brazil, India, Japan, the UK, and the US.

They found 57 percent of students plan to use ChatGPT to assist with their assignments and 64 percent expect their peers to do the same. However, 69 percent of professors said they planned to treat the use of ChatGPT as plagiarism.

While students and their professors seem to disagree on the use of ChatGPT in college, both groups expect it to level the playing field for non-native English speakers.

"There is a general consensus between educators and students that the use of ChatGPT in school work should be acknowledged, and that it will increase the competitiveness of students who are non-native English speakers," the paper said. ®
