OpenAI opens doors to DALL-E after the horse has bolted to Midjourney and others
Ironic that an ML lab with so many accelerators is such a slowpoke
OpenAI on Wednesday made DALL-E, its cloud service for generating images from text prompts, available to the public without any waitlist. But the crowd that had gathered outside its gate may have moved on.
The original DALL-E debuted in January 2021 and was superseded by DALL-E 2 this April. The latest release, which offers much improved text-to-image capabilities, allowed people to sign up to use the service but placed aspiring AI artists on a waitlist – one that didn't move in the past five months for this Reg reporter. The newly public service is called DALL-E, although it's still version 2 of the technology.
OpenAI justified the closed list by citing the need to be cautious. The org wanted to prevent users from generating violent, hateful, or pornographic imagery, and to prevent the creation of photorealistic images of public figures. And it created policies to that effect, because abuse and misinformation are genuine concerns with machine-learning image creation technology.
"To ensure responsible use and a great experience, we'll be sending invites gradually over time," OpenAI advised beta registrants in April via email. "We'll let you know when we're ready for you."
While OpenAI was doling out access at 1,000 users per week (as of May), Midjourney – a rival AI-based text-to-image service – entered public beta in July. Midjourney's Discord server, through which users interact with the service, reportedly reached about one million users by the end of July.
That was about the number of invitations extended by OpenAI at the time, following a transition to beta testing. Midjourney's Discord server currently lists 2.7 million members, while OpenAI presently claims to have 1.5 million users.
In August, another AI image generation company called Stability.ai released its own text-to-image model called Stable Diffusion, under a permissive CreativeML Open RAIL-M license.
The result was a surge of interest in Stable Diffusion because people can run the code on a local computer, without concern for fees – OpenAI and Midjourney require payment when users have exceeded their free tier allowances.
Also, Stable Diffusion is seen as a way to create explicit images without concern for censorious cloud gatekeepers – whether or not those images comply with the limited (and unlikely to be enforced) restrictions in the Stable Diffusion license.
"In just a few days, there has been an explosion of innovation around it," wrote Simon Willison, an open source software developer, in a blog post about a week after Stable Diffusion's public release. "The things people are building are absolutely astonishing."
Late to the party
Just one month on, it looks like OpenAI is late out of the starting gate.
"DALL-E has been opened up to everyone (no waitlist)!" quipped Brendan Dolan-Gavitt, assistant professor in the computer science and engineering department at NYU Tandon, via Twitter. "It's amazing what a few weeks of competition from open source can do ;)"
"The challenge OpenAI are facing is that they're not just competing against the team behind Stable Diffusion, they're competing against thousands of researchers and engineers who are building new tools on top of Stable Diffusion," Willison told The Register.
"The rate of innovation there in just the last five weeks has been extraordinary. DALL-E is a powerful piece of software but it's only being improved by OpenAI themselves. It's hard to see how they'll be able to keep up."
Artist Ryan Murdock (@advadnoun), who helped jumpstart text-to-image AI by flipping OpenAI's CLIP prompt evaluation model around and connecting it to VQGAN, expressed a similar sentiment.
"I think OpenAI is still relevant but DALL-E is not," he said in a discussion with The Register. "I see very few people using DALL-E in the scene because it costs money, is gated in terms of what it can or will produce, and can't be used with interesting new research."
Murdock also observed that the texture of DALL-E images "looks really bad because the superresolution isn't conditioned on the text."
That's one area where open source innovation has helped: among the first additions to the Stable Diffusion image generation process were two code libraries, GFPGAN and Real-ESRGAN, which handle the repair of AI face rendering errors and image upscaling respectively.
Citing the ongoing debate about image ownership – many artists are not thrilled their work was used without their consent to train these models – Murdock said that ship seems to have sailed because Stable Diffusion's models now live on people's computers. He anticipates even more pushback as these AI models evolve to generate video.
Undaunted by external developments that have commodified AI image generation, and touting more robust filtering to ensure image safety, OpenAI sees a business opportunity.
"We are currently testing a DALL-E API with several customers and are excited to soon offer it more broadly to developers and businesses so they can build apps on this powerful system," the company said. ®