VR audio terms: Gaze Activation v. Focus

By Claudio Santos

Virtual reality brings a lot of new terminology to the post process, and we’re all having a hard time agreeing on the meaning of everything. It’s tricky because clients and technicians sometimes have different understandings of the same term, which is a guaranteed recipe for headaches in post.

Two terms that I've seen confused a few times in the spatial audio realm are Gaze Activation and Focus. They are similar enough to be put in the same category, but different enough that most of the time you have to choose completely different tools and distribution platforms depending on which technology you want to use.

Field of view

Focus
Focus
Focus is what the Facebook Spatial Workstation calls this technology, but it is a tricky one to name. As you may know, ambisonics represents a full sphere of audio around the listener. Players like YouTube and Facebook (which uses ambisonics inside its own proprietary .tbe format) can dynamically rotate this sphere so the relative positions of the audio elements stay accurate to the direction the audience is looking. But the sounds don't change noticeably in level depending on where you are looking.
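Under the hood, that dynamic rotation is just a small matrix applied to the B-format channels every time the head orientation updates. As a rough illustration (a minimal NumPy sketch, assuming a W, X, Y, Z channel ordering and not tied to any particular player), a pure yaw rotation only mixes the two horizontal components:

```python
import numpy as np

def rotate_foa_yaw(wxyz, yaw_rad):
    """Rotate a first-order ambisonic (B-format) block around the vertical axis.

    wxyz: array of shape (4, n_samples), ordered W, X, Y, Z.
    yaw_rad: head yaw in radians (positive = counter-clockwise).
    W (omni) and Z (height) are untouched by a pure yaw rotation;
    only the horizontal X/Y pair is mixed.
    """
    w, x, y, z = wxyz
    c, s = np.cos(yaw_rad), np.sin(yaw_rad)
    return np.vstack([w, c * x - s * y, s * x + c * y, z])
```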

If we take a step back and think about "surround sound" in the real world, it actually makes perfect sense. A hair clipper isn't particularly louder when it's in front of our eyes as opposed to when it's trimming the back of our head. Nor can we ignore the annoying person who is loudly talking on their phone on the bus by simply looking away.

But for narrative construction, it can be very effective to emphasize what your audience is looking at. That opens up possibilities, such as presenting the viewer with simultaneous yet completely unrelated situations and letting them choose which one to pay attention to simply by looking in the direction of the chosen event. Keep in mind that in this case, all events are happening simultaneously and will carry on even if the viewer never looks at them.

This technology is not currently supported by YouTube, but it is possible in the Facebook Spatial Workstation with the use of high Focus Values.
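The exact curve behind a Focus Value is Facebook's own, but conceptually it boils down to weighting each element by how far it sits from the centre of the viewer's gaze. A purely illustrative sketch (the cone width, attenuation depth and smoothing below are made-up parameters, not the Spatial Workstation's):

```python
import numpy as np

def focus_gain(source_dir, gaze_dir, cone_deg=60.0, off_focus_db=-12.0):
    """Illustrative gaze-focus weighting (not Facebook's actual algorithm).

    source_dir, gaze_dir: unit vectors (3,) in the same coordinate frame.
    Sources inside the focus cone keep full level; sources outside are
    smoothly attenuated toward off_focus_db.
    """
    cos_angle = float(np.clip(np.dot(source_dir, gaze_dir), -1.0, 1.0))
    angle_deg = np.degrees(np.arccos(cos_angle))
    # 0.0 inside the half-angle, ramping to 1.0 at the full cone angle
    t = np.clip((angle_deg - cone_deg / 2.0) / (cone_deg / 2.0), 0.0, 1.0)
    return 10.0 ** (t * off_focus_db / 20.0)  # linear gain factor
```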

Gaze Activation
When we talk about Focus, the key thing to keep in mind is that all the events happen whether the viewer is looking at them or not. If instead you want a certain sound to play only when the viewer looks at a certain prop, regardless of when that happens, then you are looking for Gaze Activation.

This concept is much more akin to game audio than to film sound because of the interactivity element it presents. Essentially, you are using the direction of the gaze, and potentially the length of the gaze (if you want your viewer to look in a direction for x amount of seconds before something happens), as a trigger for a sound/video playback.
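In a game engine this usually reduces to a small piece of per-frame trigger logic: measure the angle between the gaze and the target, and fire the sound once the viewer has held it inside a cone for long enough. A hypothetical sketch (the class, cone size and dwell time are my own, not any engine's API):

```python
import time

class GazeTrigger:
    """Fire a one-shot event when the viewer has looked at a target
    for a minimum dwell time (hypothetical trigger logic)."""

    def __init__(self, dwell_seconds=1.5, cone_deg=15.0):
        self.dwell_seconds = dwell_seconds
        self.cone_deg = cone_deg
        self._gaze_started = None
        self.fired = False

    def update(self, angle_to_target_deg, now=None):
        """Call every frame with the angle between gaze and target (degrees).
        Returns True on the frame the trigger fires."""
        now = time.monotonic() if now is None else now
        if self.fired:
            return False
        if angle_to_target_deg <= self.cone_deg:
            if self._gaze_started is None:
                self._gaze_started = now          # viewer just looked at it
            elif now - self._gaze_started >= self.dwell_seconds:
                self.fired = True                 # held long enough: trigger
                return True
        else:
            self._gaze_started = None             # looked away: reset dwell
        return False
```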

This is very useful if you want to make it impossible for your audience to miss something because they were looking in the "wrong" direction. Think of a jump scare in a horror experience. It's not very scary if you're looking in the opposite direction, is it?

This is currently only supported if you build your experience in a game engine or as an independent app with tools such as InstaVR.

Both concepts are closely related, and I expect many implementations will make use of both. We should all keep an eye on the VR content distribution platforms to see how these tools will be supported, so we can use them to make 360 videos even more immersive.


Claudio Santos is a sound editor and spatial audio mixer at Silver Sound. Slightly too interested in technology and workflow hacks, he spends most of his waking hours tweaking, fiddling and tinkering away on his computer.

IBC: Surrounded by sound

By Simon Ray

I came to the 2016 IBC Show in Amsterdam at the start of a period of consolidation at Goldcrest in London. We had just gone through three years of expansion, upgrading, building and installing. Our flagship Dolby Atmos sound mixing theatre had finished its first feature, Jason Bourne, and the DI department had recently been upgraded to offer 4K and HDR.

I didn’t have a particular area to research at the show, but there were two things that struck me almost immediately on arrival: the lack of drones and the abundance of VR headsets.

Goldcrest’s Atmos mixing stage.

360 audio is an area I knew a little about, and we did provide a binaural DTS Headphone X mix at the end of Jason Bourne, but there was so much more to learn.

Happily, my first IBC meeting was with Fraunhofer, where I was updated on some of the developments they have made in production, delivery and playback of immersive and 360 sound. Of particular interest was their Cingo technology. This is a playback solution that lives in devices such as phones and tablets and can already be found in products from Google, Samsung and LG. It renders 3D audio content to headphones and can incorporate head movements. That means a binaural render with enough spatial information that the sound appears to originate outside the head rather than inside it, as can be the case when listening to traditionally mixed stereo material.

For feature films, for example, this might mean taking the 5.1 home theatrical mix and rendering it into a binaural signal to be played back on headphones, giving the listener the experience of always sitting in the sweet spot of a surround sound speaker set-up. Cingo can also support content with a height component, such as 9.1 and 11.1 formats, and add that into the headphone stream as well to make it truly 3D. I had a great demo of this and it worked very well.
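I don't know how Cingo implements it internally, but the basic idea of folding a speaker mix down to binaural is well established: treat each channel as a virtual loudspeaker and convolve it with the head-related impulse responses (HRIRs) for that position. A bare-bones sketch, assuming you already have an HRIR pair per channel and ignoring head tracking, LFE handling and equalisation:

```python
import numpy as np
from scipy.signal import fftconvolve

def speakers_to_binaural(channels, hrirs):
    """Fold a multichannel mix to binaural via per-channel HRIR convolution.

    channels: dict of channel name -> mono signal (1-D arrays, equal length)
    hrirs:    dict of channel name -> (left_hrir, right_hrir), equal lengths
    Returns a (2, n) stereo array for headphones. Conceptual sketch only.
    """
    left = right = None
    for name, sig in channels.items():
        hl, hr = hrirs[name]
        l, r = fftconvolve(sig, hl), fftconvolve(sig, hr)
        left = l if left is None else left + l
        right = r if right is None else right + r
    return np.vstack([left, right])
```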

I was impressed that Fraunhofer had also created a tool for creating immersive content: a plug-in called Cingo Composer, available as both VST and AAX. It could run in Pro Tools, Nuendo and other DAWs to aid the creation of 3D content. For example, content could be mixed and automated in an immersive soundscape and then rendered into a four-channel FOA (First Order Ambisonics, or B-format) file to accompany a 360 video for playback on VR headsets with headtracking.

After Fraunhofer, I went straight to DTS to catch up with what they were doing. We had recently completed some immersive DTS:X theatrical, home theatrical and, as mentioned above, headphone mixes using the DTS tools, so I wanted to see what was new. There were some nice updates to the content creation tools, players and renderers and a great demo of the DTS decoder doing some live binaural decoding and headtracking.

With immersive and 3D audio being the exciting new things, there were other interesting products on display that related to this area. In the Future Zone, Sennheiser was showing their Ambeo VR mic (see picture, right). This is an ambisonic microphone with four capsules arranged in a tetrahedron, whose outputs make up the A-format. They also provide a proprietary A-to-B format encoder that can run as a VST or AAX plug-in on Mac and Windows to convert the four microphone outputs into the W, X, Y, Z signals (the B-format).
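The A-to-B conversion itself is, at its core, a simple sum-and-difference matrix on the four capsule signals; real encoders like Sennheiser's add frequency-dependent correction for the capsule spacing on top. A sketch of the idealised matrix, assuming the usual left-front-up / right-front-down / left-back-down / right-back-up capsule layout:

```python
import numpy as np

def a_to_b_format(lfu, rfd, lbd, rbu):
    """Idealised A-format -> B-format conversion for a tetrahedral mic.

    lfu, rfd, lbd, rbu: capsule signals (left-front-up, right-front-down,
    left-back-down, right-back-up) as 1-D arrays of equal length.
    Returns a (4, n) array ordered W, X, Y, Z. Real encoders also apply
    spacing/diffuse-field correction filters.
    """
    w = lfu + rfd + lbd + rbu   # omnidirectional pressure
    x = lfu + rfd - lbd - rbu   # front-back figure-of-eight
    y = lfu - rfd + lbd - rbu   # left-right figure-of-eight
    z = lfu - rfd - lbd + rbu   # up-down figure-of-eight
    return np.vstack([w, x, y, z])
```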

From the B-Format it is possible to recreate the 3D soundfield, but you can also derive any number of first-order microphones pointing in any direction in post! The demo (with headtracking and 360 video) of a man speaking by the fireplace was recorded just using this mic and was the most convincing of all the binaural demos I saw (heard!).
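Deriving those virtual microphones is just a weighted sum of the four B-format channels; the weights set the pointing direction and the polar pattern. A small sketch, assuming the traditional B-format convention in which W carries a -3 dB scaling:

```python
import numpy as np

def virtual_mic(wxyz, azimuth_rad, elevation_rad, pattern=0.5):
    """Point a first-order virtual microphone anywhere in a B-format recording.

    wxyz: (4, n) array ordered W, X, Y, Z (traditional/FuMa scaling).
    pattern: 0.0 = figure-of-eight, 0.5 = cardioid, 1.0 = omni.
    """
    w, x, y, z = wxyz
    dx = np.cos(azimuth_rad) * np.cos(elevation_rad)
    dy = np.sin(azimuth_rad) * np.cos(elevation_rad)
    dz = np.sin(elevation_rad)
    return pattern * np.sqrt(2.0) * w + (1.0 - pattern) * (dx * x + dy * y + dz * z)
```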

Still in the Future Zone, for creating brand-new content I visited the makers of the Spatial Audio Toolbox, which is similar to Fraunhofer's Cingo Composer. B-Com's Spatial Audio Toolbox contains VST plug-ins (soon to be AAX) that let you create an HOA (higher order ambisonics) encoded 3D sound scene from standard mono, stereo or surround sources (using HOA Pan) and then listen to this sound scene on headphones (using Render Spk2Bin).
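I haven't seen inside B-Com's plug-ins, but an "HOA pan" conceptually amounts to multiplying the mono source by a set of spherical-harmonic gains for the chosen direction; higher orders add more channels and sharper spatial resolution. A horizontal-only sketch, with normalisation conventions (N3D, SN3D, FuMa) deliberately glossed over:

```python
import numpy as np

def hoa_pan_horizontal(mono, azimuth_rad, order=3):
    """Encode a mono signal into a horizontal-only HOA scene (illustrative).

    mono: 1-D numpy array. Returns (2 * order + 1, n): the omni component
    plus a cosine/sine pair per order. Real tools pick a specific
    normalisation; this sketch does not.
    """
    comps = [mono.copy()]                      # order 0 (omni)
    for m in range(1, order + 1):
        comps.append(np.cos(m * azimuth_rad) * mono)
        comps.append(np.sin(m * azimuth_rad) * mono)
    return np.vstack(comps)
```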

The demo we saw at the stand was impressive and included headtracking. The plug-ins themselves were running on a Pyramix on the Merging Technologies stand in Hall 8. It was great to get my hands on some “live” material and play with the 3D panning and hear the effect. It was generally quite effective, particularly in the horizontal plane.

I found all this binaural and VR stuff exciting. I am not sure exactly if or how it might fit into a film workflow, but it was a lot of fun playing! The idea of rendering a 3D soundfield into a binaural signal has been around for a long time (I even dedicated months of my final year at university to writing a project on that very subject quite a long time ago), but with mixed success. It is exciting to see that today's mobile devices now contain the processing power to render the binaural signal on the fly. Combine that with VR video and headtracking, and the ability to feed that information into the rendering process, and you have an offering that is very impressive when demonstrated.

I will be interested to see how content creators, specifically in the film area, use this (or don’t). The recreation of the 3D surround sound mix over 2-channel headphones works well, but whether headtracking gets added to this or not remains to be seen. If the sound is matched to video that’s designed for an immersive experience, then it makes sense to track the head movements with the sound. If not, then I think it would be off-putting. Exciting times ahead anyway.

Simon Ray is head of operations and engineering at Goldcrest Post Production in London.