ChattyG takes a college freshman C/C++ programming exam

Compiles learning and code to pass – but not necessarily with flying colors

ChatGPT was put to the test via a series of humdrum freshman C/C++ programming tasks and it passed – though not with honors.

According to a Croatian research team, while first-year students can struggle with some of the assignments, the results [PDF] showed ChatGPT performing at a level somewhere between that of an average student and that of an experienced programmer. And naturally, as with all college exams, outcomes can be determined by how questions are worded.

The University North crew designed a set of college freshman-level programming challenges, first penned in English and later, to see if cross-language nuances would affect outcomes, in Croatian. They wanted to see not just how ChatGPT codes, but also whether it could cope when the same task was posed in a different human language.

The first quiz focused on a basic programming task: calculating the greatest common divisor (GCD) of two numbers. At the outset, the bot showed some limitations in how it decided to tackle the problem, with the researchers saying it lacked the finesse expected from a seasoned programmer. But, like any student, it learns, and over subsequent tries, especially in the Croatian version, it showed some improvement and notable adaptability.
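The article doesn't reproduce ChatGPT's code or the exam's reference answer, so the following is only a sketch of what a compact GCD might look like, using Euclid's algorithm. The function name `gcd_euclid` is made up for illustration.

```cpp
#include <cstdlib>  // std::abs

// Greatest common divisor via Euclid's algorithm.
// Hypothetical helper, not taken from the study.
int gcd_euclid(int a, int b) {
    a = std::abs(a);
    b = std::abs(b);
    while (b != 0) {
        int r = a % b;  // remainder becomes the next divisor
        a = b;
        b = r;
    }
    return a;  // gcd(0, 0) comes out as 0 by convention
}
```

Since C++17 the standard library also ships `std::gcd` in `<numeric>`, which is what production code would normally reach for.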

For example, in one particular task it was challenged to program a basic statistical function in C++. Initially, it made an oversight, using a function that didn't produce the "corrected" standard deviation as required. But, when the same task was presented in Croatian, the chatbot not only recognized its previous error but worked out a refined solution.
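For readers wondering what "corrected" means here: the corrected (sample) standard deviation divides the sum of squared deviations by n - 1 rather than n. A minimal sketch follows, assuming the values arrive in a `std::vector<double>`; the name `corrected_stddev` is ours, not the researchers'.

```cpp
#include <cmath>
#include <stdexcept>
#include <vector>

// Corrected (sample) standard deviation: divide by n - 1, not n.
// Hypothetical helper for illustration only.
double corrected_stddev(const std::vector<double>& xs) {
    if (xs.size() < 2) {
        throw std::invalid_argument("need at least two values");
    }
    double mean = 0.0;
    for (double x : xs) mean += x;
    mean /= static_cast<double>(xs.size());

    double sum_sq = 0.0;  // sum of squared deviations from the mean
    for (double x : xs) sum_sq += (x - mean) * (x - mean);

    return std::sqrt(sum_sq / static_cast<double>(xs.size() - 1));
}
```

Dividing by n instead would give the uncorrected population figure, which is the kind of slip the article describes.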

The researchers note that this adaptability mirrors a freshman's journey: starting with mistakes but showing an ability to learn and enhance their skills with repeated practice and feedback. Awww.

Another task involved a more nuanced problem: identifying numbers within a range based on specific divisibility rules. This was where ChatGPT's Achilles' heel became evident. Regardless of language, English or Croatian, ChattyG struggled with negative numbers. Each attempt by ChatGPT led to similar results, pointing towards a consistent issue in its programming logic for this task.
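The article doesn't spell out the divisibility rule or exactly how the negative numbers went wrong, so what follows is an assumption rather than a diagnosis. One well-known C++ trap in this territory is that `%` yields a remainder with the sign of the left operand, so `-5 % 3` is `-2`, not `1`. The sketch below uses a made-up rule (numbers that leave remainder 1 when divided by 3) and the hypothetical helpers `floor_mod` and `select_in_range`.

```cpp
#include <vector>

// Non-negative remainder of n divided by d, assuming d > 0.
int floor_mod(int n, int d) {
    int r = n % d;             // takes the sign of n in C++
    return r < 0 ? r + d : r;  // shift negative remainders into [0, d)
}

// Made-up rule for illustration: numbers in [lo, hi] that leave
// remainder 1 when divided by 3.
std::vector<int> select_in_range(int lo, int hi) {
    std::vector<int> out;
    for (int n = lo; n <= hi; ++n) {
        if (floor_mod(n, 3) == 1) out.push_back(n);
    }
    return out;
}
```

A naive `n % 3 == 1` check would silently skip every negative candidate, which is the sort of consistent, repeatable failure the researchers describe.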

A bonus question demanded precision. ChatGPT was required to craft an input filter, specifically for a defined range of decimal numbers. The AI's initial solution, when presented in English, was on point. The next attempts, especially when the task was given in Croatian, revealed some inconsistencies, and in some instances ChatGPT used unnecessary programming constructs. While these didn't hinder the program's functionality, they did indicate a lack of optimization. It was as if ChatGPT sometimes took the longer route to a destination, even when a shortcut was available.
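As a rough idea of what such a filter involves, here is a sketch that accepts a line of text only if it is a decimal number inside a given range. The range bounds, the function name `parse_in_range`, and the choice of `std::stod` are all our assumptions; the exam's actual wording isn't given in the article.

```cpp
#include <cstddef>
#include <exception>
#include <string>

// Returns true and stores the value in `out` only if `text` is a decimal
// number within [lo, hi]. Hypothetical helper for illustration.
bool parse_in_range(const std::string& text, double lo, double hi, double& out) {
    try {
        std::size_t used = 0;
        double value = std::stod(text, &used);
        if (used != text.size()) return false;          // trailing junk, e.g. "3.5abc"
        if (!(value >= lo && value <= hi)) return false; // also rejects NaN
        out = value;
        return true;
    } catch (const std::exception&) {
        return false;  // not a number at all, or out of double's range
    }
}
```

A freshman-style program would typically call this in a loop, reading with `std::getline` and re-prompting until it returns true.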

Things got more intricate with a task related to arrays. Here, ChatGPT was asked to store numbers and then compute certain statistics like mean value, standard deviation, and identify minimum and maximum values. ChattyG's performance on this challenge was particularly interesting. Across different tests, it showcased varying strategies. Sometimes, it elegantly solved the problem, offering straightforward solutions. In other attempts, it leaned towards more convoluted methods, even bundling multiple operations into one function.

All of this raises an important question: Does ChatGPT always choose the best strategy, or does it sometimes default to learned but inefficient methods?

The final hurdle for ChatGPT involved basic text processing. It was tasked with removing extra spaces from user input. In its initial English test, ChatGPT's solution was spot on. However, the Croatian test threw a curveball. Instead of adhering to its effective single-input solution, the AI, for some reason, opted for a more complex approach, demanding multiple inputs. Yet, when researchers revisited this challenge in English, ChatGPT seemed to have learned from its previous misstep, returning to the simpler method.
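A single-input solution of the kind described can be quite short. The sketch below assumes "extra" spaces means leading, trailing, and repeated ones, which the article doesn't confirm, and `squeeze_spaces` is a name of our choosing.

```cpp
#include <string>

// Collapse runs of spaces to one and drop leading/trailing spaces.
// Hypothetical helper; the definition of "extra" is our assumption.
std::string squeeze_spaces(const std::string& in) {
    std::string out;
    bool pending_space = false;
    for (char c : in) {
        if (c == ' ') {
            pending_space = !out.empty();  // ignore spaces before the first word
        } else {
            if (pending_space) out += ' ';  // emit one space between words
            out += c;
            pending_space = false;
        }
    }
    return out;  // a pending trailing space is never emitted
}
```

Fed a whole line read with `std::getline`, it needs only that one input, which is the simpler route the researchers say ChatGPT eventually returned to.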

Overall, the researchers found the responses had a lot in common with those of human freshman programming students. ChatGPT's solutions often echoed the strategies of experienced programmers but, like any student, it wasn't infallible. There were moments of brilliance, but also instances where it seemed to miss the mark entirely.

The real takeaway here is its human freshman-like adaptability: It wasn't just about getting the right solution; it was about refining, learning, and iterating.

So what's ChattyG's final grade?

From the researchers:

"ChatGPT passes the exam with very good grades, outperforming most of our students in the quality of solutions. Furthermore, it solves each task within 20 to 30 seconds and shows the general ability to adapt or change its solutions according to additional demands. However, in some, often simple tasks, it showed the inability to comprehend the logical and mathematical essence of the problem, even after being prompted about its errors several times." ®
