Programmers! Close the Stack Overflow tabs. This AI robot will write your source code for you

Think Java code-completion on steroids


Code boffins at Rice University in Texas have developed a system called Bayou to partially automate the writing of Java code with the help of deep-learning algorithms and training data sampled from GitHub.

Much of modern programming is already automated in one way or another. Anyone including a code library or copying and pasting from Stack Overflow is essentially replaying stored keystrokes. Integrated development environments and text editors often include code completion, akin to the text autocompletion in messaging apps. Then there are low-code and no-code applications that translate basic intentions into specific programming instructions.

Bayou is a bit more ambitious. It fleshes out a skeleton Java program by generating API patterns or idioms, based on a programmer-supplied query consisting of API method names and variable types.

The project, available in an online demo, is described in a recently published paper, "Neural Sketch Learning for Conditional Program Generation," scheduled to be presented next month at the Sixth International Conference on Learning Representations, a deep learning conference being held in Canada.

Bayou's creators – Vijayaraghavan Murali, Letao Qi, Swarat Chaudhuri, and Chris Jermaine – describe the system as an assistive tool that allows a human programmer to specify a limited amount of information (a label) in order to produce a functioning program.

Complex

"I see Bayou as a smarter version of the kind of code-completion that's supported by IDEs," said Murali, a computer science researcher at Rice and one of the paper's co-authors, in an email to The Register.

"Bayou can generate more complex pieces of code, such as API calls, loops, and exception handling blocks, and it does this by learning common patterns from data. Our vision for Bayou would be to have it integrated within an IDE, running in the background suggesting snippets of code as the programmer is typing in their program."

What makes this approach noteworthy is that the label is not just a stub that gets replaced with a single correct answer. Rather, the system relies on a technique the researchers call "neural sketch learning," in conjunction with type-aware combinatorial search, to come up with possible answers.

Neural sketch learning is used to train a novel neural network called a Gaussian Encoder-Decoder on a data set of sample source code. It abstracts the source code into "tree-structured syntactic models," called sketches, which remove low-level names and operations but retain the code's control structure, the order in which API methods are invoked, and the types of data supplied and returned by these methods.
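To make the idea of a sketch more concrete, here is a minimal illustration in Java of a tree that keeps control structure, call order, and types while dropping variable names. The SketchNode, ApiCall, TryCatch, and SketchDemo names are hypothetical, invented for this example; this is not Bayou's actual representation, only a rough picture of what such an abstraction might retain for a file-reading snippet.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical node types, invented for illustration; not Bayou's real classes.
interface SketchNode {
    String render();
}

// An API call reduced to types only: no variable names, no literal values.
final class ApiCall implements SketchNode {
    private final String signature;   // e.g. "BufferedReader.readLine()"
    private final String returnType;  // e.g. "String"

    ApiCall(String signature, String returnType) {
        this.signature = signature;
        this.returnType = returnType;
    }

    @Override
    public String render() {
        return signature + " -> " + returnType;
    }
}

// Control structure is kept: a try block plus the exception types it handles.
final class TryCatch implements SketchNode {
    private final List<SketchNode> body;
    private final List<String> caught;

    TryCatch(List<SketchNode> body, List<String> caught) {
        this.body = body;
        this.caught = caught;
    }

    @Override
    public String render() {
        // Children are rendered in order, preserving the sequence of API calls.
        String calls = body.stream()
                .map(SketchNode::render)
                .collect(Collectors.joining("; "));
        return "try { " + calls + " } catch (" + String.join(" | ", caught) + ")";
    }
}

final class SketchDemo {
    // A hand-built sketch for "open a file and read one line".
    static SketchNode readLineSketch() {
        return new TryCatch(
                Arrays.asList(
                        new ApiCall("new FileReader(File)", "FileReader"),
                        new ApiCall("new BufferedReader(FileReader)", "BufferedReader"),
                        new ApiCall("BufferedReader.readLine()", "String")),
                Arrays.asList("FileNotFoundException", "IOException"));
    }

    public static void main(String[] args) {
        System.out.println(readLineSketch().render());
    }
}
```

Nothing in the rendered tree mentions a variable name; only the try/catch shape, the call order, and the types survive, which is the kind of information the article says a sketch retains.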

The neural network uses this information to match learned models to the supplied query and returns the best matching results.

Here's an example. In this bit of Java code to read from a given file, Bayou takes the query – /// call:readLine – and fills in the calls to the appropriate API methods.

Bayou input:

import java.io.File;
public class Test {
    void read(File file) {
        {
            /// call:readLine
        }
    }  
}

Bayou output:

import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.FileNotFoundException;
import java.io.FileReader;
public class TestIO {
  void read(File file) {
    {
      FileReader fr1;
      BufferedReader br1;
      String s1;
      try {
        fr1 = new FileReader(file);
        br1 = new BufferedReader(fr1);
        s1 = br1.readLine();
      } catch (FileNotFoundException _e) {
      } catch (IOException _e) {
      }
      return;
    }
  }
}
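
The generated snippet wires up the right API calls, but it discards the line it reads, leaves the catch blocks empty, and never closes the reader. As a rough sketch of how a programmer might finish the job by hand, the version below uses try-with-resources and returns the result. It is not Bayou output; the FirstLineReader class and readFirstLine method are hypothetical names chosen for this example.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hand-written refinement for illustration; not generated by Bayou.
final class FirstLineReader {
    // Returns the first line of the file, or null if the file is empty.
    static String readFirstLine(File file) {
        // try-with-resources closes the reader chain even if readLine() throws.
        try (BufferedReader br = new BufferedReader(new FileReader(file))) {
            return br.readLine();
        } catch (IOException e) {
            // FileNotFoundException extends IOException, so one catch covers both cases.
            throw new UncheckedIOException(e);
        }
    }
}
```

Rethrowing as an unchecked exception is one design choice among several; a caller that can recover from a missing file might prefer to declare IOException instead.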

Murali sees room for further automation. "In the short term, we are working on supporting natural language queries in Bayou, and also providing an interactive user experience," he said. "In the longer term, we are interested in generating larger pieces of code such as a group of methods, or classes, after further research into this technology."

Bayou still has limitations. Presently, it can only handle a limited number of APIs: java.lang, java.io, and java.util. Also, it cannot manage wildcard types. And because the system is based on real-world code examples, it may miss obscure APIs not present in the training set.

"The advantage of using open-source projects in GitHub is that the patterns that Bayou learns from that data are the most common ones across a wide variety of programmers," Murali explained. "Having said that, we had to be meticulous with the quality of data that Bayou is trained on, as not all GitHub projects are of the same quality. We also had to be careful with forks and duplicates, as they would bias the patterns that Bayou ends up learning. An officially vetted corpus would mitigate such problems."
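
The article does not say how the team filtered forks and duplicates. As one simple illustration of the idea, the sketch below drops source texts that are identical after whitespace normalization, keeping the first copy of each. The CorpusDedup class and its methods are hypothetical, and a real pipeline would likely need something more robust than exact matching.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative only: an assumed approach, not the Rice team's actual pipeline.
final class CorpusDedup {
    // Collapse runs of whitespace so trivially reformatted copies compare equal.
    static String normalize(String source) {
        return source.replaceAll("\\s+", " ").trim();
    }

    // Keeps the first occurrence of each distinct normalized source, in input order.
    static List<String> dropDuplicates(List<String> sources) {
        Set<String> seen = new HashSet<>();
        List<String> unique = new ArrayList<>();
        for (String source : sources) {
            // Set.add returns false when the normalized text was already seen.
            if (seen.add(normalize(source))) {
                unique.add(source);
            }
        }
        return unique;
    }
}
```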

Murali said he sees automation tools as a way to make programming available to a wider set of people.

"With further advances in this technology, such as the natural language-based interface that will soon be supported by Bayou, we envision programming to be made accessible to even non-programmers," he said.

The research was funded by grants from a DARPA MUSE award and a Google Research Award. ®
