VR Audio: What you need to know about Ambisonics

By Claudio Santos

The explosion of virtual reality as a new entertainment medium has been widely discussed in the filmmaking community over the past year, and there is still no consensus about what the future will hold for the technology. But regardless of the predictions, it is a fact that more and more virtual reality content is being created, and various producers are experimenting to find just how the technology fits into the current market.

Out of the vast possibilities of virtual reality, there is one segment that is particularly close to us filmmakers, and that is 360 videos. They are becoming more and more popular on platforms such as YouTube and Facebook and present a distinct advantage: besides playing in VR headsets, such as the Gear VR or the Daydream, these videos can also be played on mobile phones, tablets and desktop computers. This considerably expands the potential audience when compared to the relatively small group of people who own virtual reality headsets.

But simply making the image immerse the viewer in a 360 environment is not enough. Without accompanying spatial audio the illusion is very easily broken, and it becomes very difficult to cue the audience to look in the direction in which the main action of each moment is happening. While there are technically a few ways to design and implement spatial audio in a 360 video, I will share some thoughts and tips on how to work with Ambisonics, the spatial audio format chosen as the standard for platforms such as YouTube.

VR shoot in Bryce Canyon with Google for the Hidden Worlds of the National Parks project. Credit: Hunt Beaty

First, what is Ambisonics and why are we talking about it?
Ambisonics is a sound format that is slightly different from your usual stereo/surround paradigm because its channels are not attached to speakers. Instead, an Ambisonics recording actually represents the whole spherical soundfield around a point. In practice, it means that you can represent sound coming from all directions around a listening position and, using an appropriate decoder, you can play back the same recording on any set of speakers with any number of channels arranged around the listener horizontally or vertically. That is exactly why it is so interesting to us when we are working with spatial sound for VR.

The biggest challenge of VR audio is that you can’t predict which direction the viewer will be looking in at any given time. Using Ambisonics we can design the whole sound sphere, and the VR player rotates the soundfield to match the direction of the video in real time, decoding it into binaural for accurate headphone playback. The best part is that the decoding process is relatively light on processing power, which makes this a suitable option for platforms with limited resources, such as smartphones.
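To make the "match the direction of the video" step more concrete, here is a minimal sketch of how a player might counter-rotate a first-order soundfield when the viewer turns their head. It assumes the AmbiX channel order (W, Y, Z, X) with azimuth measured counterclockwise from the front; the function names are made up for illustration, and a real player would follow this step with binaural filtering, which is not shown.

```python
import math

def rotate_yaw(frame, angle_deg):
    """Rotate one first-order frame (W, Y, Z, X) about the vertical axis.

    Positive angles move sources counterclockwise (toward the left).
    Assumed convention: AmbiX channel order, azimuth 0 = front.
    """
    w, y, z, x = frame
    a = math.radians(angle_deg)
    x_rot = x * math.cos(a) - y * math.sin(a)
    y_rot = x * math.sin(a) + y * math.cos(a)
    # W (omnidirectional) and Z (up-down) are unaffected by a yaw rotation.
    return (w, y_rot, z, x_rot)

def compensate_head_yaw(frame, head_yaw_deg):
    """Counter-rotate the soundfield so sources stay fixed in the scene."""
    return rotate_yaw(frame, -head_yaw_deg)

# A source straight ahead, heard after the viewer turns 90 degrees left:
front_source = (1.0, 0.0, 0.0, 1.0)
heard = compensate_head_yaw(front_source, 90.0)
# The source now sits on the viewer's right (negative Y, X close to zero).
```

The rotation is only a handful of multiplications per sample, which is part of why this step is cheap enough for phones.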

In order to work with Ambisonics we have two options. We can record the sound on location with an Ambisonics microphone, which gives us a very realistic representation of the sound in the location and is very well suited to ambiance recordings, for example. Or we can encode other sound formats such as mono and stereo into Ambisonics and then manipulate the sound in the sphere from there, which gives us great flexibility in post production to use sound libraries and create interesting effects by carefully adjusting the positioning and width of a sound in the sphere.
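As a rough illustration of the second option, the sketch below encodes a single mono sample into four first-order channels for a chosen direction. It assumes the AmbiX convention (channel order W, Y, Z, X with SN3D normalization), with azimuth counterclockwise from the front and elevation upward; other conventions order and scale the channels differently, and the function name is hypothetical.

```python
import math

def encode_first_order(sample, azimuth_deg, elevation_deg=0.0):
    """Encode one mono sample as a first-order frame (W, Y, Z, X).

    Assumed convention: AmbiX (ACN order, SN3D normalization).
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = sample                                 # omnidirectional level
    y = sample * math.sin(az) * math.cos(el)   # left-right
    z = sample * math.sin(el)                  # up-down
    x = sample * math.cos(az) * math.cos(el)   # front-back
    return (w, y, z, x)

# A sound placed hard left, at ear height:
left = encode_first_order(1.0, 90.0)
```

In practice a panner plug-in does this for every sample of the track and lets you automate the angles over time.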

Example: Mono “voice of God” placement. The left shows the soundfield completely filled, which gives the “in-head” illusion.

There are plenty of resources online explaining the technical nature of Ambisonics, and I definitely recommend reading them so you can better understand how to work with it and how the spatiality is achieved. But there aren’t many discussions yet about the creative decisions and techniques used in sound for 360 videos with Ambisonics, so that’s what we will be focusing on from now on.

What to do with mono “in-head” sources such as VO?
That was one of the first tricky challenges we found with Ambisonics. It is not exactly intuitive to place a sound source equally in all directions of the soundfield. The easiest solution comes more naturally once you understand how the four channels of the Ambisonics audio track interact with each other.

The first channel of the Ambisonics audio, named W, is omnidirectional and contains the level information of the sound. The other three channels describe the position of the sound in the soundfield through their level and polarity relative to W. Each one of these channels represents one dimension, which enables the positioning of sounds in three dimensions.

Now, if we want the sound to play at the same level and centered from every direction, what we want is for the sound source to be at the center of the soundfield “sphere,” where the listener’s head is. In practice, that means that if you play the sound out of the first channel only, with no information in any of the other three channels, the sound will play “in-head.”
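One way to convince yourself of this is to "listen" to a W-only frame from several directions. The sketch below uses a virtual cardioid microphone as a simple stand-in for a real decoder, again assuming the AmbiX channel order (W, Y, Z, X) and SN3D scaling; the function name is invented for the example.

```python
import math

def virtual_cardioid(frame, azimuth_deg, elevation_deg=0.0):
    """Point a virtual cardioid microphone into a first-order soundfield.

    A simplified stand-in for a decoder, assuming AmbiX order (W, Y, Z, X).
    """
    w, y, z, x = frame
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    directional = (x * math.cos(az) * math.cos(el)
                   + y * math.sin(az) * math.cos(el)
                   + z * math.sin(el))
    return 0.5 * (w + directional)

# Voice-over routed to W only, with nothing in the other three channels:
in_head = (1.0, 0.0, 0.0, 0.0)
levels = [virtual_cardioid(in_head, az) for az in (0, 90, 180, 270)]
# Every direction picks up the same level, so head turns change nothing.
```

Because the three directional channels are silent, no amount of head rotation can change what the decoder produces.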

What to do with stereo non-diegetic music?
This is the natural question that follows the one of knowing what to do with mono sources. And the answer is a bit trickier. The mono first-channel trick doesn’t work perfectly with stereo sources because for that to work you would have to first sum the stereo to mono, which might be undesirable depending on your track.

If you want to maintain the stereo width of the source, one good option we found was to mirror the sound in two directions. Some plug-in suites, such as the Ambix VST, offer the functionality to mirror hemispheres of the soundfield. That could also be accomplished with careful positioning of a copy of the source, but the plug-in makes things easier.
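As a sketch of what mirroring means at the channel level, the code below flips the sign of the front-back or up-down channel of a first-order frame and then averages the original with its mirrored copies. This is one possible reading of the technique, not a description of how any particular plug-in works; it assumes the AmbiX channel order (W, Y, Z, X), and all function names are hypothetical.

```python
def mirror_front_back(frame):
    """Reflect a first-order frame (W, Y, Z, X) across the left-right plane."""
    w, y, z, x = frame
    return (w, y, z, -x)

def mirror_top_bottom(frame):
    """Reflect a first-order frame (W, Y, Z, X) across the horizontal plane."""
    w, y, z, x = frame
    return (w, y, -z, x)

def mirror_mix(frame):
    """Average a frame with its front-back and top-bottom mirrored copies."""
    copies = [
        frame,
        mirror_front_back(frame),
        mirror_top_bottom(frame),
        mirror_top_bottom(mirror_front_back(frame)),
    ]
    return tuple(sum(c[i] for c in copies) / len(copies) for i in range(4))

# Front-back and up-down cues cancel; only level (W) and left-right (Y) remain.
mixed = mirror_mix((1.0, 0.5, 0.2, 0.8))
```

With only the level and left-right information left, the music keeps its width but no longer favors the front, back, top or bottom of the sphere.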

Example of sound placed in the “left” of the soundfield in Ambisonics.

Generally, what you want is to place the center of the stereo source at the focus of the action your audience will be looking at and mirror the top-bottom and the front-back. This will keep the music playing at the same level regardless of the direction the viewer looks in, but will keep the spatiality of the source. The downside is that the sound is not anchored to the viewer, so changes in the direction of the sources will be noticed as the viewer turns around, most notably the sides inverting when looking toward the back. I usually find this to be an interesting effect nonetheless, and it doesn’t distract the audience too much. If the directionality is too noticeable you can always mix a bit of the mono sum of the music into both channels in order to reduce the perceived width of the track.
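The width reduction mentioned at the end can be sketched as a simple blend of each stereo channel with the mono sum, applied before the music is placed in the soundfield. The `amount` parameter and the function name are illustrative assumptions, not settings from any specific tool.

```python
def narrow_stereo(left, right, amount):
    """Blend each stereo channel toward the mono sum.

    amount = 0.0 leaves the track untouched; amount = 1.0 makes it fully mono.
    """
    out_left, out_right = [], []
    for l, r in zip(left, right):
        mono = 0.5 * (l + r)
        out_left.append((1.0 - amount) * l + amount * mono)
        out_right.append((1.0 - amount) * r + amount * mono)
    return out_left, out_right

# A sound that exists only in the left channel, narrowed halfway:
narrow_l, narrow_r = narrow_stereo([1.0], [0.0], 0.5)
```

Small amounts are usually a sensible starting point, since a fully mono track loses the spatiality you were trying to keep.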

How to creatively use reverberation in Ambisonics?
There is a lot you can do with reverberation in Ambisonics and this is only a single trick I find very useful when dealing with scenes in which you have one big obstacle in one direction (such as a wall), and no obstacles in the opposite direction.

In this situation, the sound would reflect from the barrier and return to the listener from one direction, while on the opposite side there would be no significant reflections because of the open field. You can simulate that by placing a slightly delayed reverb coming from the direction of the barrier only. You can adjust the width of the reflection sound to match the perceived size of the barrier and the delay based on the distance the barrier is from the viewer. In this case the effect usually works better with drier reverbs with defined early reflections but not a lot of late reflections.
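For a starting value for the delay, you can estimate how long the sound takes to reach the barrier and come back. The sketch below assumes the sound originates close to the viewer, so the reflection travels roughly twice the barrier distance, and it uses 343 m/s as the speed of sound in air at room temperature; the function names are made up for illustration.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed: dry air at about 20 degrees Celsius

def reflection_delay_ms(barrier_distance_m):
    """Approximate delay of a reflection off a barrier, in milliseconds.

    Assumes the source is near the viewer, so the path is out and back.
    """
    round_trip_m = 2.0 * barrier_distance_m
    return round_trip_m / SPEED_OF_SOUND_M_S * 1000.0

def delay_in_samples(delay_ms, sample_rate_hz=48000):
    """Convert a delay in milliseconds to a whole number of samples."""
    return round(delay_ms / 1000.0 * sample_rate_hz)

# A wall about 10 meters away gives a pre-delay of roughly 58 ms.
wall_delay_ms = reflection_delay_ms(10.0)
wall_delay_samples = delay_in_samples(wall_delay_ms)
```

Treat the result as a rough starting point and adjust by ear, since the source is rarely right at the viewer's position.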

Once you experiment with this technique you can use variations of it to simulate a variety of spaces and achieve even more realistic mixes that will fool anyone into believing the sounds you placed in post production were recorded on location.

Main image: VR shoot in Hawaii with Google for the Hidden Worlds of the National Parks project. Credit: Hunt Beaty.

Claudio Santos is a sound editor at Silver Sound/SilVR in New York.

2 thoughts on “VR Audio: What you need to know about Ambisonics”

  1. Wylie Stateman

    Thank you Claudio, very much enjoyed your article.
    As a fellow sound designer, I share your enthusiasm for B-Format recordings and the ambisonic effect. Some, if not all, of our colleagues are experimenting with immersive sound design, mostly with “mixed” results.
    I am still constantly amazed by how different the picture side capture process is from that of conventional location sound capture.
    While immersive audio is the leading edge of our creative future, we are in its infancy, working on basic common language in the discussion of sound in VR space. Our friends at Dolby are informing this discussion by describing it this way: head relative (as you stated fixed “in head”), scene relative, and gaze relative audio experiences.
    It’s reassuring to see a good variety of audio design success, as well as software plug-ins that deal with the manipulation of ambisonic sound. Acoustical space has never been more desired or more relevant to the basic enjoyment of truly compelling immersive content.
    Again thank you for your insight.
    Wylie Stateman

    1. Claudio Santos

      Thank you for your comment Wylie, it’s always great to hear of others who share the enthusiasm for this new creative tool in sound. It’s true the lack of a common language is still a problem. I always find myself juggling weird descriptions and gesticulations when trying to explain what I’m trying to do in immersive sound. Hopefully soon we’ll all adopt a common vocabulary, which will surely make sharing ideas and techniques a lot easier. I find it especially problematic that there is little distinction in the terms used to describe VR sound design for linear media using fixed formats such as Ambisonics and interactive experiences that use object audio to its full potential.
      I am especially excited about the “gaze relative” audio experience. I believe that really opens up a lot of creative possibilities and allows us to add details and guiding sounds that would clutter the scene if played regardless of the viewer’s gaze. I am especially curious about how that would develop when the technology embraces a functional level of eye tracking. But that is still a bit far-fetched as far as I know.
      Thanks again,
      Claudio Santos


