
Instant NeRF turns 2D photos into 3D scenes in seconds

Nvidia says it has sped up NeRF rendering to mere tens of milliseconds

Nvidia has hashed out a new approach to neural radiance field (NeRF) technology that will generate a fully rendered 3D scene from just a few still photos, all in a matter of seconds, including model training time.

NeRFs themselves were created in 2020 as a method "for synthesizing novel views of complex scenes" based on only a few still photos tagged with 5D coordinates including spatial location and viewing direction. 

Nvidia's Instant NeRF doesn't change the underlying NeRF algorithms; rather, it takes the existing concept and speeds it up with a new way of encoding the model's inputs, which Nvidia says yields dramatic gains in both training and inferencing when run on one of the company's top-end GPUs.

Nvidia describes NeRF as an inverse rendering process: it uses AI to model the real-world behavior of light from images captured at various angles, then uses that model to fill in the rest of the 3D scene.

NeRFs work best with still photos taken in close succession, and without any moving elements, which NeRFs still render as blurry. "A NeRF essentially fills in the blanks, training a small neural network to reconstruct the scene by predicting the color of light radiating in any direction, from any point in 3D space," Nvidia said, noting that it even works when objects are occluded by pillars or other objects. 
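To make that description concrete, here is a minimal sketch of how a pixel can be produced once something can report the light at any point and direction. It is not Nvidia's code. The `toy_field` function is a hypothetical stand-in for the trained network (a hard-coded red sphere), and the density value it returns, the sample count, and the ray bounds are all assumptions made for illustration. The compositing loop follows the standard volume-rendering sum used in NeRF-style methods.

```python
import math

def toy_field(point, direction):
    """Hypothetical stand-in for the trained network.

    Returns (rgb, density) for a 3D point and viewing direction.
    A real NeRF learns this mapping; here it is a hard-coded red sphere.
    """
    x, y, z = point
    inside = x * x + y * y + z * z <= 1.0
    density = 5.0 if inside else 0.0
    # Mild view dependence: brightest when looking along -z.
    shade = 0.5 + 0.5 * max(0.0, -direction[2])
    return (shade, 0.1, 0.1), density

def render_ray(origin, direction, near=0.0, far=6.0, n_samples=64):
    """Composite colour along one ray by sampling the field at even steps."""
    step = (far - near) / n_samples
    transmittance = 1.0  # fraction of light not yet absorbed
    pixel = [0.0, 0.0, 0.0]
    for i in range(n_samples):
        t = near + (i + 0.5) * step
        point = tuple(o + t * d for o, d in zip(origin, direction))
        rgb, density = toy_field(point, direction)
        alpha = 1.0 - math.exp(-density * step)  # opacity of this segment
        weight = transmittance * alpha
        for c in range(3):
            pixel[c] += weight * rgb[c]
        transmittance *= 1.0 - alpha
    return tuple(pixel), transmittance

if __name__ == "__main__":
    print(render_ray((0.0, 0.0, 3.0), (0.0, 0.0, -1.0)))  # ray through the sphere
    print(render_ray((5.0, 5.0, 3.0), (0.0, 0.0, -1.0)))  # ray that misses it
```

Rendering a full image means repeating this for every pixel's ray, which is why the speed of each field lookup matters so much.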

Nvidia min-maxes NeRF

Early NeRF rendering was already relatively fast, generating a 3D scene in a few minutes rather than seconds, but that was just the inference step. The real delay came from training the NeRF neural network, which Nvidia said could take days. Now, Nvidia says training has been crunched down to just a few seconds on a few dozen still photos, with mere tens of milliseconds of inferencing needed to render the scene.

Nvidia's big NeRF breakthrough came via its development of what it refers to as multi-resolution hash grid encoding, a new input method optimized for Nvidia GPUs. Nvidia built the model using CUDA, its proprietary GPU computing platform, and says it is so lightweight that it can run on a single Nvidia GPU, though it works best on the company's higher-end GPUs, including its Ampere and, more recently, Hopper devices.
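The article does not spell out how the encoding works, so the following is only a rough sketch of the general idea as it is usually described: look a 3D point up in several grids of increasing resolution, hash each surrounding grid corner into a small table of feature vectors, blend the eight corners by distance, and concatenate the results for a small network to consume. The table size, level count, growth factor, and function names here are toy assumptions, and this pure-Python version hashes every level for simplicity; it makes no attempt to reproduce Nvidia's CUDA implementation.

```python
import math
import random

# Per-axis constants for the spatial hash (assumed; large primes are typical).
PRIMES = (1, 2654435761, 805459861)

def hash_corner(ix, iy, iz, table_size):
    """XOR the integer grid coordinates scaled by primes, then wrap to the table."""
    return ((ix * PRIMES[0]) ^ (iy * PRIMES[1]) ^ (iz * PRIMES[2])) % table_size

def make_tables(n_levels=4, table_size=2 ** 10, n_features=2, seed=0):
    """One table of small feature vectors per level (these are trained in practice)."""
    rng = random.Random(seed)
    return [[[rng.uniform(-1e-4, 1e-4) for _ in range(n_features)]
             for _ in range(table_size)]
            for _ in range(n_levels)]

def encode(point, tables, base_res=4, growth=2.0):
    """Encode a point in [0, 1]^3 as concatenated, trilinearly blended features."""
    features = []
    for level, table in enumerate(tables):
        res = int(base_res * growth ** level)  # grid resolution at this level
        scaled = [p * res for p in point]
        lo = [int(math.floor(s)) for s in scaled]
        frac = [s - l for s, l in zip(scaled, lo)]
        out = [0.0] * len(table[0])
        for corner in range(8):  # the 8 corners of the enclosing grid cell
            offs = [(corner >> axis) & 1 for axis in range(3)]
            weight = 1.0
            for axis in range(3):
                weight *= frac[axis] if offs[axis] else 1.0 - frac[axis]
            index = hash_corner(lo[0] + offs[0], lo[1] + offs[1],
                                lo[2] + offs[2], len(table))
            for k in range(len(out)):
                out[k] += weight * table[index][k]
        features.extend(out)
    return features

if __name__ == "__main__":
    tables = make_tables()
    print(len(encode((0.3, 0.6, 0.9), tables)))  # levels x features per level
```

Because each lookup touches only a handful of table entries, most of the scene detail can live in the tables rather than in a large network, which is one plausible reading of why the approach trains quickly.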

Youtube Video

Nvidia sees a lot of potential applications for Instant NeRF, many of which are made more apparent in the (above) demo video Nvidia showed off at GTC. Relatively simple applications could involve designing video game scenes, quickly rendering architectural designs, and creating near-instant digital twins. In addition, Nvidia said Instant NeRF could be used to train robots and self-driving cars to better understand the size and shape of objects based on 2D images or video.

Going beyond NeRFs, Nvidia said that its multi-resolution hash grid encoding technique can be used to accelerate a variety of AI tasks, like reinforcement learning, language translation, and general-purpose deep learning. Its primary use may be 3D rendering, but Nvidia said that estimating depth and appearance is generally a demanding task for AI.

"Instant NeRF could be as important to 3D as digital cameras and JPEG compression have been to 2D photography – vastly increasing the speed, ease and reach of 3D capture and sharing," claimed David Luebke, Nvidia VP for Graphics Research. ®
