OpenAI gets to the Point•E with open source text-to-3D model

Designers' jobs to crash or GLIDE

OpenAI has extended the capabilities of its text-to-image software from two dimensions into three with the release of Point•E, an open source project that produces 3D images from text prompts.

The AI research firm has attracted considerable attention for its DALL•E software, which, like rival projects Stable Diffusion and Midjourney, can generate realistic or fantastical images from descriptive text.

While Point•E shares the bullet point symbol used in OpenAI's DALL•E branding, it relies on a different machine learning model called GLIDE. And presently, it's not nearly as capable. Given a text directive like "a traffic cone," Point•E produces a low-resolution point cloud – a set of points in space – that resembles a traffic cone.

Sample Point•E images

The result is nowhere near the quality of a commercial 3D rendering in a film or video game. But it's not supposed to be. Point clouds represent an intermediate step – once fed into a 3D application like Blender, they can be turned into textured meshes that look more like familiar 3D imagery.

Sample Point•E images converted to mesh

"While our method still falls short of the state-of-the-art in terms of sample quality, it is one to two orders of magnitude faster to sample from, offering a practical trade-off for some use cases," explain OpenAI researchers Alex Nichol, Heewoo Jun, Prafulla Dhariwal, Pamela Mishkin, and Mark Chen in a paper [PDF] describing the project.

The point of Point•E is that it "generates point clouds efficiently" – that's where the "E" comes from in this case. It can produce 3D models using only one to two minutes of GPU time, compared to state-of-the-art methods that require multiple GPU hours to create a finished rendering. It's significantly faster than Google's DreamFusion text-to-3D model – 600x by one estimate.

But Point•E is not a commercially ready project. It's foundational research that may, eventually, lead to on-demand, rapid 3D model creation. With further work, it may make virtual world creation easier and more accessible to those without professional 3D graphics skills. Or perhaps it will help simplify the process of creating 3D printed objects – Point•E supports the creation of point clouds for use in product fabrication.

Fabrication brings risks of its own. "This has implications both when the models are used to create blueprints for dangerous objects and when the blueprints are trusted to be safe despite no empirical validation," the authors observe.

There are other potential problems that need to be ironed out. For example, like DALL•E, Point•E is expected to contain biases inherited from its training dataset.

And that dataset – several million 3D models and associated metadata of unspecified provenance – comes without any guarantee that the source models were used with permission or according to any applicable licensing terms. That could prove to be a big headache, legally.

There's already an issue posted to the Point•E GitHub repo asking for more information about the dataset. South Korean AI developer Doyup Lee observes, "I think that many researchers are also curious about the details of training data & data gathering process."

The AI community's cavalier attitude about training machine learning models on the work of others without explicit permission has already fueled an infringement claim against GitHub Copilot, a service that suggests programming code to developers using OpenAI's Codex model. Text-to-image models may be similarly tested as they get commercialized. ®

