Machine learning research in acoustics could open up multimodal metaverse

Jury still out on whether any kind of metaverse is strictly necessary

Researchers at MIT and the IBM Watson AI Lab have created a machine learning model to predict what a listener would hear in a variety of locations within a 3D space.

The researchers first used the ML model to understand how any sound in a room will propagate through the space, building up a picture of a 3D room in the same way people use sound to understand their environment.

In a paper co-authored by Yilun Du, an MIT grad student in the Department of Electrical Engineering and Computer Science (EECS), the researchers show how techniques similar to visual 3D modeling can be applied to acoustics.

But they also struggled with elements where sound and light diverge. For example, changing the location of the listener in a room can create a very different impression of the sound due to obstacles, the shape of the room, and the nature of the sound, making the outcome difficult to predict.

To overcome this problem, the researchers built two features of acoustics into their model. First, the source of the sound and the listener can swap places without changing what the listener hears, all other things being equal. Second, sound depends on local features, such as obstacles between the listener and the source.
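
The swap property can be illustrated with a toy sketch. This is not the researchers' code: `toy_response` is a made-up stand-in for a learned model, and averaging over both orderings is just one simple way (an assumption here) to force a predictor to give the same answer when emitter and listener trade places.

```python
import numpy as np

def toy_response(emitter, listener):
    """Hypothetical stand-in for a learned model; NOT symmetric on its own."""
    emitter = np.asarray(emitter, dtype=float)
    listener = np.asarray(listener, dtype=float)
    return np.sin(emitter.sum()) + 2.0 * np.cos(listener.sum())

def reciprocal(predict):
    """Wrap a pairwise predictor so swapping the two positions changes nothing."""
    def wrapped(emitter, listener):
        # Average the prediction over both orderings of the pair.
        return 0.5 * (predict(emitter, listener) + predict(listener, emitter))
    return wrapped

symmetric_response = reciprocal(toy_response)

a, b = (1.0, 2.0), (4.0, 0.5)    # two positions on a floor plan
forward = symmetric_response(a, b)
swapped = symmetric_response(b, a)
```

Here `forward` and `swapped` agree even though the raw `toy_response` gives different values for the two orderings.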

"Most researchers have only focused on modeling vision so far. But as humans, we have multimodal perception. Not only is vision important, sound is also important. I think this work opens up an exciting research direction on better-utilizing sound to model the world," Du said.

Using this approach, the resulting neural acoustic field (NAF) model randomly samples points on a grid of locations in the room to learn the features at each one. For example, proximity to a doorway affects what a listener hears far more strongly than geometric features on the other side of the room.
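
As a rough illustration of location-specific features, the sketch below stores a small vector of values at each point of a grid over a 2D floor plan and blends the four nearest grid points to get features at any position. The grid size, the feature count, and the use of bilinear interpolation are all assumptions made for this example; the article does not say how the NAF model queries its grid.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 8x8 grid over the floor plan, 4 feature values per grid point.
# In a real model these values would be learned; here they are random.
grid = rng.normal(size=(8, 8, 4))

def local_features(grid, x, y):
    """Bilinearly interpolate grid features at (x, y), with x and y in [0, 1]."""
    h, w, _ = grid.shape
    gx, gy = x * (w - 1), y * (h - 1)              # position in grid units
    x0, y0 = int(np.floor(gx)), int(np.floor(gy))  # lower grid corner
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    tx, ty = gx - x0, gy - y0                      # fractional offsets
    near_row = (1 - tx) * grid[y0, x0] + tx * grid[y0, x1]
    far_row = (1 - tx) * grid[y1, x0] + tx * grid[y1, x1]
    return (1 - ty) * near_row + ty * far_row

# Features at a random listener position: dominated by the nearest grid points.
listener_xy = rng.uniform(0.0, 1.0, size=2)
features = local_features(grid, listener_xy[0], listener_xy[1])
```

Because only the four surrounding grid points contribute, features far across the room have no direct effect on the values returned for a given position.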

The model was then able to predict what the listener might hear from a specific acoustic stimulus, based on where the listener and the sound source are located in the room.

"By modeling acoustic propagation in a scene as a linear time-invariant system, NAFs learn to continuously map all emitter and listener location pairs to a neural impulse response function that can then be applied to arbitrary sounds," the paper said [PDF]. "We demonstrate that the continuous nature of NAFs enables us to render spatial acoustics for a listener at an arbitrary location, and can predict sound propagation at novel locations."

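Treating a room as a linear time-invariant system means one impulse response per emitter and listener pair is enough to render any sound: the dry signal is convolved with that response. The sketch below uses a made-up four-sample impulse response (direct sound plus one echo) rather than anything predicted by the NAF model, and `render` is a name chosen for this example.

```python
import numpy as np

def render(dry_sound, impulse_response):
    """Apply a room impulse response to a dry signal by convolution."""
    return np.convolve(dry_sound, impulse_response)

# Hypothetical impulse response: direct sound, then a half-volume echo
# arriving 3 samples later.
ir = np.array([1.0, 0.0, 0.0, 0.5])

click = np.array([1.0, 0.0, 0.0])
tone = np.sin(2 * np.pi * 0.1 * np.arange(20))

heard_click = render(click, ir)  # the click followed by its echo
heard_tone = render(tone, ir)    # the same room applied to a different sound
```

The same `ir` works for both the click and the tone, which is why a model that predicts impulse responses can be applied to arbitrary sounds.
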
Chuang Gan, a principal research staff member at the MIT-IBM Watson AI Lab who also worked on the project, said: "This new technique might open up new opportunities to create a multimodal immersive experience in the metaverse application."

We understand not all Reg readers will be excited about the above use case. ®
