LIDAR in iPhones is not about better photos – it's about the future of low-cost augmented reality
The cost of kit to capture the real world is plummeting, thanks to Apple
Column For the past six months I've been staring at the backside of my iPhone 13 Pro wondering what possessed Apple to build a Light Detection and Ranging (LIDAR) camera into its flagship smartphone.
It's not as though you need a time-of-flight depth camera, sensitive enough to chart the reflection time of individual photons, to create a great portrait. That's like swatting a fly with a flamethrower – fun, but ridiculous overkill. There are more than enough cameras on the back of my mobile to be able to map the depth of a scene – that's how Google does it on its Pixel phones. So what is Apple's intention here? Why go to all this trouble?
The answer lies beyond the iPhone, and points to what comes next.
In the earliest days of virtual reality, thirty years ago, the biggest barrier to entry was the compute capacity needed to render three-dimensional graphics in real time. Back in 1992, systems capable of real-time 3D looked like supercomputers and cost hundreds of thousands of dollars.
Then two British software engineers – one at Canon Research, another at startup RenderMorphics, both raised on the famed BBC Micro and its outré graphics capabilities – created tight, highly performant libraries to do real-time 3D rendering in software. When the first announcement of Canon's RenderWare made it onto the Usenet newsgroup sci.virtual-worlds it was greeted with disbelief and disdain. Someone even quipped that a Canon researcher must have accidentally remained logged in over the weekend, so that somebody else could send off an obvious prank post.
But RenderWare was real. Along with RenderMorphics' Reality Lab (which hundreds of millions use today under its other name: Direct3D) it transformed the entire landscape of real-time 3D graphics. No-one needed a half-million-dollar Silicon Graphics workstation for virtual reality anymore – a body blow from which the firm never recovered. Reflecting on SGI's unexpected collapse, one of my colleagues – who'd seen the future coming – delivered a quick eulogy: "Rendering happens," he said, "get used to it."
Yet it took virtual reality twenty years to catch up to the quantum leap in real-time 3D, because virtual reality is more than just drawing pretty pictures at thirty frames a second. It deeply involves the body – head and hand tracking are table stakes for any VR system. Tracking the body thirty years ago required expensive and fiddly sensors moving within a magnetic field. (For that reason, installing VR tracking systems in a building with a lot of metal components – such as a convention center held up by steel beams – was always a nightmare.)
An obvious solution for tracking was to point a camera at a person, then use computer vision techniques to calculate the orientation and position of the various body parts. While that sounds straightforward, computers in the 1990s were about a hundred times too slow to take on that task. Fortunately, by the mid-2010s, Moore's Law had given us computers a thousand times faster – more than enough horsepower to track a body, with plenty left over to run a decent VR simulation.
That's why I found myself in Intel's private demo suite at the 2017 Consumer Electronics Show in Las Vegas, wearing what was effectively a PC strapped to my forehead. This VR system had a pair of outward-facing cameras that digested a continuous stream of video data, using it to track the position and orientation of my head as I moved through a virtual world – and through the demo suite. Although not yet quite perfect, the device proved that a PC had more than enough horsepower to enable sourceless, self-contained tracking. I emerged from that demo convinced that I'd seen the next great leap forward in virtual reality, which I summarized in two words: "Tracking happens."
Half a decade later, with multiple trillion-dollar companies working hard on augmented reality spectacles, we're ready to breach the next barrier. Yes, we can render any object in real time, and yes, we can track our heads and hands and bodies. But what about the world? It needs to be seen, interpreted and understood in order to be meaningfully incorporated into augmented reality. Otherwise, the augmented and the real will interpenetrate in ways reminiscent of a bad transporter accident from Star Trek.
For the computer to see the world, it must be able to capture the world. This has always been hard and expensive. It requires supercomputer-class capabilities, and sensors that cost tens of thousands of dollars … Wait a minute. This is sounding oddly familiar, isn't it?
Until just two years ago, LIDAR systems cost hundreds to thousands of dollars. Then Apple added a LIDAR camera to the back of its iPad Pro and iPhone 12 Pro. Almost overnight, a technology that had been rare and expensive became cheap and almost commonplace. The component cost for LIDAR dropped by two orders of magnitude – from hundreds of dollars per unit to a few dollars apiece.
Apple needed to do this because the company's much-rumored AR spectacles will necessarily sport several LIDAR cameras, feeding their M1-class SoC with a continuous stream of depth data so that the mixed reality environment managed by the device maps neatly and precisely onto the real world. As far as Apple is concerned, the LIDAR on my iPhone doesn't need to do much beyond drive component costs down for its next generation of hardware devices.
Capturing the real world is essential for augmented reality. We can't augment the real world until we've mapped it. That has always been both difficult and expensive. Today, I can look at the back of my iPhone and hear it whisper words I've long waited to hear: "Capture happens." ®