Google teases AlphaCode 2 – a code-generating AI revamped with Gemini
Don't worry, your developer jobs are safe … for now
Google's latest code-generating model – AlphaCode 2, powered by its Gemini Pro system and making its public debut on Wednesday – reportedly outperformed 99.5 percent of participants in some online programming contests.
Researchers at Google DeepMind created AlphaCode 2 by fine-tuning Gemini Pro on a dataset designed to beef up its problem-solving skills. The dataset contained about 15,000 problems taken from CodeForces – a competitive programming site – and 30 million samples of code written by humans.
The model was fine-tuned further on an additional dataset of "higher quality," though the scant details in the technical report [PDF] don't make clear what kind of data was used or how much. When AlphaCode 2 was tested on 77 problems across 12 CodeForces contests – where it competed against more than 8,000 programmers in total – it managed to solve 43 percent of them, submitting its answers in C++.
For comparison, the previous AlphaCode system solved 25 percent of a different set of problems also set by CodeForces.
"Mapping this to competition rankings, we estimate that AlphaCode 2 sits at the 85th percentile on average – ie it performs better than 85 [percent of entrants], ranking just between the 'Expert' and 'Candidate Master' categories on Codeforces," the researchers claimed.
Your jobs are safe … for now
In two of the twelve contests in which it competed, AlphaCode 2 outperformed 99.5 percent of participants. Impressive as that is, the competition conditions were not the same for the machine as for the humans.
AlphaCode 2 can submit up to ten different solutions for each problem and score points if one of them is correct – unlike the human candidates, who have one go at cracking the challenge.
AlphaCode 2 also operates very differently from biological programmers. Given a problem, it generates about a million different code samples, which are then filtered down. Samples that are irrelevant and don't match the problem's description – or that produce the wrong answers on the sample tests, or don't compile at all – are removed.
"Each competitive programming problem contains at least one public input/output test indicating how code samples should behave. We execute each code sample on the corresponding test input, and filter out all which do not produce the expected output and therefore could not have been correct," the researchers explained.
Filtering gets rid of 95 percent of the code samples AlphaCode 2 generates. Next, a clustering algorithm groups the roughly 50,000 remaining programs by similarity. The ten biggest clusters are then scored by a separate Gemini Pro model trained to predict their accuracy, the candidates across those ten clusters are ranked from best to worst, and the top one from each group is submitted.
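The clustering and ranking stage can be sketched in the same loose way. In the snippet below, run_on_inputs and score_sample are stand-ins for the behaviour-probing and Gemini Pro scoring components described in the report, not real APIs: programs that behave identically on a set of extra inputs fall into the same cluster, the ten largest clusters survive, and the best-scoring candidate from each is submitted.

```python
from collections import defaultdict


def cluster_and_pick(samples, run_on_inputs, score_sample, extra_inputs, max_clusters=10):
    """Group surviving samples by behaviour, keep the biggest clusters,
    and pick one candidate per cluster to submit.

    run_on_inputs(sample, inputs) -> tuple of outputs  (hypothetical helper)
    score_sample(sample) -> float                      (stand-in for the scoring model)
    """
    # Programs that produce identical outputs on the extra inputs are
    # treated as behaviourally equivalent and land in the same cluster.
    clusters = defaultdict(list)
    for sample in samples:
        signature = run_on_inputs(sample, extra_inputs)
        clusters[signature].append(sample)

    # Keep only the largest clusters, on the theory that correct
    # solutions tend to agree with one another.
    biggest = sorted(clusters.values(), key=len, reverse=True)[:max_clusters]

    # Take the best-scoring sample from each cluster...
    submissions = [max(cluster, key=score_sample) for cluster in biggest]

    # ...and rank the submissions so the most promising goes in first.
    return sorted(submissions, key=score_sample, reverse=True)
```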
- Google's DeepMind says its AI coding bot is 'competitive' with humans
- Microsoft touts Visual Studio Code as a Java juggernaut
- Microsoft reportedly runs GitHub's AI Copilot at a loss
Human coders usually think of different strategies to solve a problem, then home in on the most promising idea and write that up, instead of trying out millions of different solutions. Success depends on understanding the problems and coming up with clever mathematical tricks to solve them.
AlphaCode 2's brute-force approach – generating and filtering a million code samples, then running separate models to score and rank the survivors – is computationally intensive, and probably too expensive to release until it becomes more efficient.
"Despite AlphaCode 2's impressive results, a lot more remains to be done before we see systems that can reliably reach the performance of the best human coders. Our system requires a lot of trial and error, and remains too costly to operate at scale. Further, it relies heavily on being able to filter out obviously bad code samples," the researchers admitted.
Still, AlphaCode 2 is a big improvement over the old AlphaCode, and Google claims it is more than 10,000 times more sample-efficient: it needs only about 100 generated samples to reach the performance the original AlphaCode achieved with a million.
Google DeepMind believes it could build an even better code-writing model using Gemini Ultra – a larger and more powerful large language model than Gemini Pro – and said it was working to make AlphaCode 2's capabilities available to developers.
"We hope this kind of interactive coding will be the future of programming, where programmers make use of highly-capable AI models as collaborative tools that can help them reason about the problems, propose code designs, and assist with implementation," the team concluded.
"We are working towards bringing AlphaCode 2's unique capabilities to our foundation Gemini models as a first step to make this new programming paradigm available to everyone." ®