Review: Audionamix IDC for cleaning dialogue

By Claudio Santos

Sound editing has many different faces. It is part of big-budget blockbuster movies and also an integral part of small hobby podcasting projects. Every project has its own sound needs. Some edit thousands upon thousands of sound effects. Others have to edit hundreds of hours of interviews. What most projects have in common, though, is that they circle around dialogue, whether in the form of character lines, interviews, narrators or any other format by which the spoken word guides the experience.

Now let’s be honest, dialogue is not always well recorded. Archival footage needs to be understood, even if the original recording was made with a microphone that was 20 feet away from the speaker in a basement full of machines. Interviews are often quickly recorded in the five minutes an artist has between two events while driving from point A to point B. And until electric cars are the norm, the engine sound will always be married to that recording.

The fact is, recordings are sometimes a little bit noisier than ideal, and it falls upon the sound editor to make it a little bit clearer.

To help with that endeavor, Audionamix has come out with the newest version of their IDC (Instant Dialogue Cleaner). I have been testing it on different kinds of material and must say that overall I'm very impressed with it.

Let's first get the awkward part of this conversation out of the way and look at what the IDC is not.

– It is not a full-featured restoration workstation, such as Izotope RX.
– It does not depend on the cloud like other Audionamix plugins.
– It is not magic.

Honestly, all that is fine because what it does do, it does very well and in a very straightforward manner.

IDC aims to keep it simple. You get three controls plus output level and bypass. This makes trying out the plugin on different samples of audio a very quick task, which means you don’t waste time on clips that are beyond salvation.
The three controls you get are:
– Strength: The aggressiveness of the algorithm
– Background: Level of the separated background noise
– Speech: Level of the separated speech

Like all digital processing tools, things sound a bit techno-glitchy toward the extremes of the scales, but within reasonable parameters the plugin does a very good job of reducing background levels without garbling the speech too noticeably. I personally had fairly good results with strengths around 40% to 60% and background reductions of up to -24 dB. Anything more radical than that sounded heavily processed.

Now, it's important to note that not all noise is the same. In fact, there are entirely different kinds of audio muck that obscure dialogue, and the IDC is more effective against some than others.

Noise reduction comparison between the original clip (1), Cedar DNS Two VST (2), Audionamix IDC (3) and Izotope RX 7 Voice Denoise (4). The clip presents loud air conditioner noise in the background of close-mic'd dialogue. All plugins had their level boosted by +4 dB after processing.

– Constant broadband background noise (air conditioners, waterfalls, freezers): Here the IDC does fairly well. I couldn’t notice a lot of pumping at the beginning and end of phrases, and the background didn’t sound crippled either.

– Varying broadband background noise (distant cars passing, engines from inside cars): Here again, the IDC does a good job of increasing the dialogue/background ratio. It does introduce artifacts when the background noise spikes or varies very abruptly, but if the goal is to increase intelligibility then it is definitely a success in that area.

– Wind: On this kind of noise the IDC needs a helping hand from other processes. I tried to clean up some dialogue buried under heavy wind, and even though the wind was indeed lowered significantly, so was the speech under it, resulting in a clip that pumped up and down following the shadow of the removed wind. I believe that with some pre-processing using high-pass filters and a little bit of limiting the results could have been better, but if you are buying this in an emergency to rescue windy audio, I'd definitely keep that in mind. It does work well on light wind reduction, but even in those cases it seems to benefit from some pre-processing.

Summing Up
I am happily impressed by the plugin. It does not work miracles, but no one should really expect any tool to do so. It is great at improving the signal-to-noise ratio of your sound and does so with a very easy-to-use interface, which allows you to quickly decide whether you like the results or not. That alone is a plus that should be taken into consideration.

Claudio Santos is a sound mixer and tech aficionado who works at Silver Sound in NYC. He has worked on a wide range of sound projects, ranging from traditional shows like I Was Prey for Animal Planet to VR experiences like The Mile-Long Opera.

Lenovo’s ‘Transform’ event: IT subscriptions and AR

By Claudio Santos

Last week I had the opportunity to attend Lenovo’s “Transform” event, in which the company unveiled its newest releases as well as its plans for the near future. I must say they had quite the lineup ready.

The whole event was divided into two tracks: "Datacenters" and "PC and Smart Devices." Each focused on its own products and markets, but a single idea permeated all the day's announcements: what Lenovo calls the "Fourth Revolution," the next step in integration between devices and the cloud. Their vision is that 5G mobile Internet will soon be available, allowing devices to seamlessly connect to the cloud on the go and, more importantly, to always stay connected.

While there were many interesting announcements throughout the day, I will focus on two that seem most relevant to post facilities.

The first is what Lenovo is calling "PC as a service." They want to sell the bulk of the IT hardware and support needs for companies as subscription-based deals, and that would be awesome! Why? Well, it's simply a fact of life now that post production happens almost exclusively with the aid of computer software. (Sorry if you're still one of the few cutting film by hand; this article won't be that interesting for you.)

Having to choose, buy and maintain computers for our daily work takes a lot of research and, most notably, time. Between software updates, managing different licenses and subscriptions, and hunting down weird quirks of the system, a lot of time is taken away from more important tasks such as editing or client relationships. When you throw a server and a local network into the mix, it becomes a hefty job that requires constant maintenance.

That’s why bigger facilities employ IT specialists to deal with all that. But many post facilities aren’t big enough to employ a full-time IT person, nor are their needs complex enough to warrant the investment.

Lenovo sees this as an opportunity to simplify the role of the IT department by selling subscriptions that include the hardware, the software and all the necessary support (including a help desk) to keep the systems running without a large IT department. More importantly, the subscription would be flexible: during periods when you need more stations/support, you can increase the scope of the subscription, then shrink it once again when demand drops, freeing you from absorbing the cost of machines/software that would just sit around unused.

I see one big problem with this vision: Lenovo plans to start the service with a minimum of 1,000 seats per deal. That is far, far more staff than most post facilities have, and at that point it would probably be worth simply hiring a specialist who can also help you automate your workflow and develop customized tools for your projects. It is nonetheless an interesting approach, and I hope to see it trickle down to smaller clients as it solidifies into a feasible model.

The other announcement that should interest post facilities is Lenovo's interest in the AR market. As many of you might know, augmented reality is projected to be an even bigger market than its more popular cousin, virtual reality, largely due to its more professional application possibilities.

Lenovo has been investing in AR and has partnered with Metavision to experiment and start working toward real work-environment offerings of the technology. Besides the hand gestures that are always emphasized in AR promo videos, one very simple use case seems to be in Lenovo's sights, and it's one I hope to see become marketable very soon: workspace expansion. Instead of needing three or four different monitors to accommodate our ever-growing number of windows and displays while working, with AR we will be able to place windows anywhere around us, essentially giving us a giant spherical display. A very simple problem with a very simple solution, but one that I believe would increase the productivity of editors considerably.

We should definitely keep an eye on Lenovo as they embark on this new quest for high-efficiency solutions for businesses, because that's exactly what the post production industry finds itself in need of right now.

Claudio Santos is a sound editor and spatial audio mixer at Silver Sound. Slightly too interested in technology and workflow hacks, he spends most of his waking hours tweaking, fiddling and tinkering away on his computer.

VR Audio: What you need to know about Ambisonics

By Claudio Santos

The explosion of virtual reality as a new entertainment medium has been largely discussed in the filmmaking community in the past year, and there is still no consensus about what the future will hold for the technology. But regardless of the predictions, it is a fact that more and more virtual reality content is being created and various producers are experimenting to find just how the technology fits into the current market.

Out of the vast possibilities of virtual reality, there is one segment that is particularly close to us filmmakers, and that is 360 videos. They are becoming more and more popular on platforms such as YouTube and Facebook, and they present a distinct advantage: besides playing in VR headsets such as the GearVR or the Daydream, these videos can also be played on standalone mobile phones, tablets and stationary desktops. This considerably expands the potential audience when compared to the relatively small group of people who own virtual reality headsets.

But simply making the image immerse the viewer into a 360 environment is not enough. Without accompanying spatial audio the illusion is very easily broken, and it becomes very difficult to cue the audience to look in the direction in which the main action of each moment is happening. While there are technically a few ways to design and implement spatial audio into a 360 video, I will share some thoughts and tips on how to work with Ambisonics, the spatial audio format chosen as the standard for platforms such as YouTube.

VR shoot in Bryce Canyon with Google for the Hidden Worlds of the National Parks project. Credit: Hunt Beaty

First, what is Ambisonics and why are we talking about it?
Ambisonics is a sound format that is slightly different from your usual stereo/surround paradigm because its channels are not attached to speakers. Instead, an Ambisonics recording represents the whole spherical soundfield around a point. In practice, this means you can represent sound coming from all directions around a listening position and, using an appropriate decoder, play back the same recording on any set of speakers, with any number of channels arranged around the listener horizontally or vertically. That is exactly why it is so interesting to us when we are working with spatial sound for VR.

The biggest challenge of VR audio is that you can't predict in which direction the viewer will be looking at any given time. Using Ambisonics, we can design the whole sound sphere, and the VR player decodes the sound to match the direction of the video in realtime, rendering it to binaural for accurate headphone playback. The best part is that the decoding process is relatively light on processing power, which makes this a suitable option for platforms with limited resources, such as smartphones.

In order to work with Ambisonics we have two options. We can record the sound on location with an Ambisonics microphone, which gives us a very realistic representation of the sound in the location and is very well suited to ambience recordings, for example. Or we can encode other sound formats, such as mono and stereo, into Ambisonics and then manipulate the sound in the sphere from there, which gives us great flexibility in post production to use sound libraries and create interesting effects by carefully adjusting the positioning and width of a sound in the sphere.

Example: Mono “voice of God” placement. The left shows the soundfield completely filled, which gives the “in-head” illusion.

There are plenty of resources online explaining the technical nature of Ambisonics, and I definitely recommend reading them so you can better understand how to work with it and how the spatiality is achieved. But there aren’t many discussions yet about the creative decisions and techniques used in sound for 360 videos with Ambisonics, so that’s what we will be focusing on from now on.

What to do with mono “in-head” sources such as VO?
That was one of the first tricky challenges we found with Ambisonics. It is not exactly intuitive to place a sound source equally in all directions of the soundfield. The easiest solution comes more naturally once you understand how the four channels of the Ambisonics audio track interact with each other.

The first channel of the Ambisonics audio, named W, is omnidirectional and contains the level information of the sound. The other three channels describe the position of the sound in the soundfield through phase relationships. Each of these channels represents one axis, which enables the positioning of sounds in three dimensions.

Now, if we want the sound to play at the same level and centered from every direction, what we want is for the sound source to be at the center of the soundfield "sphere," where the listener's head is. In practice, that means that if you play the sound out of the first channel only, with no information in any of the other three channels, the sound will play "in-head."
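To make this concrete, here is a minimal sketch of a first-order encoder in Python, assuming the AmbiX convention used by YouTube (ACN channel order W, Y, Z, X with SN3D normalization). The function names are mine for illustration, not part of any plugin:

```python
import math

def encode_foa(sample, azimuth_deg, elevation_deg):
    """Encode one mono sample into first-order Ambisonics.

    Channel order follows the AmbiX convention (W, Y, Z, X).
    Azimuth is measured counterclockwise from the front;
    elevation is positive upward.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = sample                                 # omnidirectional level
    y = sample * math.sin(az) * math.cos(el)   # left-right component
    z = sample * math.sin(el)                  # up-down component
    x = sample * math.cos(az) * math.cos(el)   # front-back component
    return (w, y, z, x)

def encode_in_head(sample):
    """The "voice of God" trick: route the source to W only, leaving
    the three directional channels silent, so it plays "in-head"."""
    return (sample, 0.0, 0.0, 0.0)
```

A source encoded at azimuth 0 lands in W and X (straight ahead), one at azimuth 90 lands in W and Y (hard left), while the in-head version carries no directional information at all.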

What to do with stereo non-diegetic music?
This is the natural question that follows knowing what to do with mono sources, and the answer is a bit trickier. The mono, first-channel trick doesn't work perfectly with stereo sources because you would have to first sum the stereo to mono, which might be undesirable depending on your track.

If you want to maintain the stereo width of the source, one good option we found was to mirror the sound in two directions. Some plugin suites, such as the Ambix VST, offer the functionality to mirror hemispheres of the soundfield. The same could be accomplished with careful positioning of a copy of the source, but the plugin makes things easier.

Example of a sound placed in the "left" of the soundfield in Ambisonics.

Generally, what you want is to place the center of the stereo source at the focus of the action your audience will be looking at and mirror the top-bottom and the front-back. This keeps the music playing at the same level regardless of the direction the viewer looks in, while preserving the spatiality of the source. The downside is that the sound is not anchored to the viewer, so changes in the direction of the sources will be noticeable as the viewer turns around, notably inverting the sides when looking toward the back. I usually find this to be an interesting effect nonetheless, and it doesn't distract the audience too much. If the directionality is too noticeable, you can always mix a bit of the mono sum of the music into both channels to reduce the perceived width of the track.
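As a rough illustration of why the mirroring works, here is a Python sketch of what a front-back plus top-bottom mirror does to a stereo pair in first-order AmbiX (W, Y, Z, X). The function name and the ±spread placement are my own assumptions for the example, not the Ambix VST's actual processing:

```python
import math

def mirrored_stereo_foa(left, right, spread_deg=30.0):
    """Place the left channel at +spread and the right at -spread
    (elevation 0), then average with a copy mirrored front-back and
    top-bottom. The mirrored copy flips the sign of X and Z, so those
    channels cancel while W and Y survive: the music keeps its
    left-right width, but its level no longer depends on whether the
    viewer faces front, back, up or down."""
    az = math.radians(spread_deg)
    w = left + right                    # omnidirectional sum survives
    y = (left - right) * math.sin(az)   # the stereo width lives here
    z = 0.0                             # top-bottom mirrored away
    x = 0.0                             # front-back mirrored away
    return (w, y, z, x)
```

With identical left and right channels the Y component vanishes (no width), while an out-of-phase pair puts all its energy into Y.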

How to creatively use reverberation in Ambisonics?
There is a lot you can do with reverberation in Ambisonics and this is only a single trick I find very useful when dealing with scenes in which you have one big obstacle in one direction (such as a wall), and no obstacles in the opposite direction.

In this situation, the sound would reflect from the barrier and return to the listener from one direction, while on the opposite side there would be no significant reflections because of the open field. You can simulate that by placing a slightly delayed reverb coming from the direction of the barrier only. You can adjust the width of the reflection sound to match the perceived size of the barrier and the delay based on the distance the barrier is from the viewer. In this case the effect usually works better with drier reverbs with defined early reflections but not a lot of late reflections.
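As a toy sketch of the idea, the Python below builds the reflection as a delayed, attenuated copy of a dry mono signal and encodes it into first-order AmbiX from the barrier's direction only. In practice you would feed a proper reverb with defined early reflections; the function and parameter names here are my own:

```python
import math

def barrier_reflection_foa(dry, delay_samples, gain, barrier_azimuth_deg):
    """Simulate a single reflecting surface: delay and attenuate the
    dry signal, then encode the result into first-order Ambisonics
    (AmbiX order W, Y, Z, X) arriving from the barrier's azimuth only.
    The delay stands in for the extra travel time to the barrier and
    back; the open side of the scene gets no reflection at all."""
    az = math.radians(barrier_azimuth_deg)
    frames = []
    for i in range(len(dry) + delay_samples):
        s = gain * dry[i - delay_samples] if i >= delay_samples else 0.0
        frames.append((s,                  # W: omnidirectional
                       s * math.sin(az),   # Y: left-right
                       0.0,                # Z: keep reflection at ear level
                       s * math.cos(az)))  # X: front-back
    return frames
```

With the barrier at azimuth 90 (hard left), the delayed copy shows up only in W and Y, which is exactly the one-sided return described above.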

Once you experiment with this technique, you can use variations of it to simulate a variety of spaces and achieve even more realistic mixes that will fool anyone into believing the sounds you placed in post production were recorded on location.

Main Caption: VR shoot in Hawaii with Google for the Hidden Worlds of the National Parks project. Credit: Hunt Beaty.

Claudio Santos is a sound editor at Silver Sound/SilVR in New York.