Category Archives: 360

VR audio terms: Gaze Activation v. Focus

By Claudio Santos

Virtual reality brings a lot of new terminology to the post process, and we’re all having a hard time agreeing on the meaning of everything. It’s tricky because clients and technicians sometimes have different understandings of the same term, which is a guaranteed recipe for headaches in post.

Two terms that I’ve seen confused a few times in the spatial audio realm are Gaze Activation and Focus. They are similar enough to be put in the same category, but different enough that most of the time you have to choose completely different tools and distribution platforms depending on which technology you want to use.

Field of view

Focus
Focus is what the Facebook Spatial Workstation calls this technology, but it is a tricky one to name. As you may know, ambisonics represents a full sphere of audio around the listener. Players like YouTube and Facebook (which uses ambisonics inside its own proprietary .tbe format) can dynamically rotate this sphere so the relative positions of the audio elements are accurate to the direction the audience is looking at. But the sounds don’t change noticeably in level depending on where you are looking.
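
To make the rotation part concrete, here is a minimal sketch of what a player conceptually does with a first-order ambisonic frame when the viewer turns their head. It assumes AmbiX channel ordering (W, Y, Z, X) and a pure yaw turn; it is an illustration of the math, not the code YouTube or Facebook actually run.

```python
import numpy as np

def rotate_foa_yaw(w, y, z, x, yaw_radians):
    """Rotate one first-order ambisonic sample (AmbiX order W, Y, Z, X)
    about the vertical axis by the listener's yaw angle.

    Only the horizontal dipoles (X, Y) change; W (omni) and Z (height)
    are untouched by a pure yaw turn. Sign conventions differ between
    tools, so treat this as illustrative rather than definitive.
    """
    cos_t, sin_t = np.cos(yaw_radians), np.sin(yaw_radians)
    x_rot = cos_t * x - sin_t * y   # front/back component
    y_rot = sin_t * x + cos_t * y   # left/right component
    return w, y_rot, z, x_rot

# A source dead ahead (all energy on X) after the listener turns 90 degrees:
print(rotate_foa_yaw(1.0, 0.0, 0.0, 1.0, np.deg2rad(90)))  # energy moves from X to Y
```

Note that the rotation only redistributes energy between channels; nothing gets louder or quieter overall, which is why the gaze-dependent emphasis described below is a separate feature.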

If we take a step back and think about “surround sound” in the real world, it actually makes perfect sense. A hair clipper isn’t particularly louder when it’s in front of our eyes than when it’s trimming the back of our head. Nor can we ignore the annoying person who is loudly talking on their phone on the bus by simply looking away.

But for narrative construction, it can be very effective to emphasize what your audience is looking at. That opens up possibilities, such as presenting the viewer with simultaneous yet completely unrelated situations and letting them choose which one to pay attention to simply by looking in the direction of the chosen event. Keep in mind that in this case, all events are happening simultaneously and will carry on even if the viewer never looks at them.

This technology is not currently supported by YouTube, but it is possible in the Facebook Spatial Workstation with the use of high Focus Values.
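
As a purely conceptual illustration of what a focus-style emphasis does (the Facebook Spatial Workstation’s actual algorithm and parameters are not documented here, and focus_strength below is a made-up stand-in for a tool’s focus value), a gaze-dependent gain might look something like this:

```python
import numpy as np

def focus_gain(source_direction, gaze_direction, focus_strength=0.0):
    """Hypothetical gaze-dependent emphasis: boost sources the viewer is
    facing and attenuate those behind, scaled by focus_strength (0 = off).
    Both directions are unit vectors. A conceptual sketch only.
    """
    alignment = float(np.dot(source_direction, gaze_direction))  # 1 ahead, -1 behind
    return 1.0 + 0.5 * focus_strength * alignment

gaze = np.array([1.0, 0.0, 0.0])      # viewer looking down +X
ahead = np.array([1.0, 0.0, 0.0])     # source in front of the viewer
behind = np.array([-1.0, 0.0, 0.0])   # source behind the viewer
print(focus_gain(ahead, gaze, focus_strength=1.0))   # 1.5: emphasized
print(focus_gain(behind, gaze, focus_strength=1.0))  # 0.5: de-emphasized
```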

Gaze Activation
When we talk about focus, the key thing to keep in mind is that all the events happen whether or not the viewer is looking at them. If instead you want a certain sound to play only when the viewer looks at a certain prop, regardless of when that happens, then you are looking for Gaze Activation.

This concept is much more akin to game audio than to film sound because of the interactivity element it presents. Essentially, you are using the direction of the gaze, and potentially the length of the gaze (if you want your viewer to look in a direction for x amount of seconds before something happens), as a trigger for sound/video playback.
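
The trigger logic itself is straightforward game-engine territory. Here is a minimal sketch assuming a hypothetical per-frame update loop; the names (GazeTrigger, DWELL_SECONDS, on_trigger) are made up for illustration and do not come from any particular engine or tool.

```python
import numpy as np

DWELL_SECONDS = 2.0      # hypothetical: how long the viewer must look before triggering
ANGLE_THRESHOLD = 15.0   # degrees between gaze and prop that count as "looking at it"

class GazeTrigger:
    """Fires a callback once the viewer has gazed at a prop long enough."""

    def __init__(self, prop_direction, on_trigger):
        self.prop_direction = prop_direction / np.linalg.norm(prop_direction)
        self.on_trigger = on_trigger
        self.dwell = 0.0
        self.fired = False

    def update(self, gaze_direction, dt):
        """Call once per frame with the current gaze unit vector and frame time."""
        if self.fired:
            return
        angle = np.degrees(np.arccos(np.clip(
            np.dot(gaze_direction, self.prop_direction), -1.0, 1.0)))
        self.dwell = self.dwell + dt if angle < ANGLE_THRESHOLD else 0.0
        if self.dwell >= DWELL_SECONDS:
            self.fired = True
            self.on_trigger()   # e.g. start the jump-scare sound

# Usage: trigger = GazeTrigger(np.array([0.0, 0.0, -1.0]), lambda: print("boo!"))
# then call trigger.update(current_gaze_vector, frame_dt) once per frame.
```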

This is very useful if you want to make it impossible for your audience to miss something because they were looking in the “wrong” direction. Think of a jump scare in a horror experience. It’s not very scary if you’re looking in the opposite direction, is it?

This is currently only supported if you build your experience in a game engine or as an independent app with tools such as InstaVR.

Both concepts are very closely related and I expect many implementations will make use of both. We should all keep an eye on the VR content distribution platforms to see how these tools will be supported and make the best use of them in order to make 360 videos even more immersive.


Claudio Santos is a sound editor and spatial audio mixer at Silver Sound. Slightly too interested in technology and workflow hacks, he spends most of his waking hours tweaking, fiddling and tinkering away on his computer.

John Hughes, Helena Packer, Kevin Donovan open post collective

Three industry vets have combined to launch PHD, a Los Angeles-based full-service post collective. Led by John Hughes (founder of Rhythm & Hues), Helena Packer (VFX supervisor/producer) and Kevin Donovan (film/TV/commercials director), PHD works across VR/AR, independent films, documentaries, TV (including limited series) and commercials. In addition to post production, including color grading, offline and online editorial, visual effects and final delivery, they offer live-action production services. Beyond Los Angeles, PHD has locations in India, Malaysia and South Africa.

Hughes was the co-founder of the legendary VFX shop Rhythm & Hues (R&H) and led that studio for 26 years, earning three Academy Awards for “Best Visual Effects” (Babe, The Golden Compass, Life of Pi) as well as four scientific and engineering Academy Awards.

Packer was inducted into the Academy of Motion Picture Arts and Sciences (AMPAS) in 2008 for her creative contributions to filmmaking as an accomplished VFX artist, supervisor and producer. Her expertise extends beyond feature films to episodic TV, stereoscopic 3D and animation. Packer has been the VFX supervisor and Flame artist for hundreds of commercials and over 20 films, including 21 Jump Street and Charlie Wilson’s War.

Director Kevin Donovan is particularly well-versed in action and visual effects. He directed the feature film The Tuxedo and is currently producing the TV series What Would Trejo Do? He has shot over 700 commercials over the course of his career and is the winner of six Cannes Lions.

Since the company’s launch, PHD has worked on a number of projects — two PSAs for the climate change organization 5 To Do Today featuring Arnold Schwarzenegger and James Cameron, called Don’t Buy It and Precipice; a PSA for the international animal advocacy group WildAid, shot in Tanzania and Oregon, called Talking Elephant; another for WildAid, shot in Cape Town, South Africa, called Talking Rhino; and two additional WildAid PSAs featuring actor Josh Duhamel called Souvenir and Situation.

“In a sense, our new company is a reconfigured version of R&H, but now we are much smarter, much more nimble and much more results driven,” says Hughes about PHD. “We have very little overhead to deal with. Our team has worked on hundreds of award-winning films and commercials…”

Main Photo: L-R:  John Hughes, Helena Packer and Kevin Donovan.


Liron Ashkenazi-Eldar joins The Artery as design director  

Creative studio The Artery has brought on Liron Ashkenazi-Eldar as lead design director. In her new role, she will spearhead the formation of a department focused on design and branding. Ashkenazi-Eldar and her team are also developing in-house design capabilities to support the company’s VFX, experiential and VR/AR content, as well as website development, including providing motion graphics, print and social campaigns.

“While we’ve been well established for many years in the areas of production and VFX, our design team can now bring a new dimension to our company,” says Ashkenazi-Eldar, who is based in The Artery’s NYC office. “We are seeking brand clients with strong identities so that we can offer them exciting, new and even weird creative solutions that are not part of the traditional branding process. We will be taking a completely new approach to branding — providing imagery that is more emotional and more personal, instead of just following an existing protocol. Our goal is to provide a highly immersive experience for our new brand clients.”

Originally from Israel, the 27-year-old Ashkenazi-Eldar is a recent graduate of New York’s School of Visual Arts with a BFA in design. She won a 2017 ADC Silver Cube Award from The One Club, in the 2017 Design: Typography category, for her contributions to a project titled Asa Wife Zine, leading the creative team that submitted the project via the School of Visual Arts.

 


Recording live musicians in 360

By Luke Allen

I’ve had the opportunity to record live musicians in a couple of different in-the-field scenarios for 360 video content. In some situations — such as the ubiquitous 360 rock concert video — simply having access to the board feed is all one needs to create a pretty decent spatial mix (although the finer points of that type of mix would probably fill up a whole different article).

But what if you’re shooting in an acoustically interesting space where intimacy and immersion are the goal? What if you’re in the field in the middle of a rainstorm without access to AC power? It’s clear that in most cases, some combination of ambisonic capture and close micing is the right approach.

What I’ve found is that in all but a few elaborate set-ups, a mobile ambisonic recording rig (in my case, built around the Zaxcom Nomad and Soundfield SPS-200) — in addition to three to four omni-directional lavs for close micing — is more than sufficient to achieve excellent results. Last year, I had the pleasure of recording a four-piece country ensemble in a few different locations around Ireland.

Micing a Pub
For this particular job, I had the SPS and four lavs. For most of the day I had planted one Sanken COS-11 on the guitar, one on the mandolin, one on the lead singer and a DPA 4061 inside the upright bass (which sounded great!). Then, for the final song, the band wanted to add a fiddle to the mix — yet I was out of mics to cover everything. We had moved into the partially enclosed porch area of a pub, with the musicians perched in a corner about six feet from the camera. I decided to roll the dice and trust the SPS to pick up the fiddle, which I figured would be loud enough in the small space that a lav wouldn’t be used much in the mix anyway. In post, the gamble paid off.

I was glad to have kept the quieter instruments mic’d up (especially the singer and the bass), while the fiddle lead parts sounded fantastic on the ambisonic recordings alone. This is one huge reason why it’s worth using higher-end ambisonic mics: you can trust them to provide fidelity for more than just ambient recordings.

An Orchestra
In another recent job, I was mixing for a 360 video of an orchestra. During production we moved the camera/sound rig around to different locations in a large rehearsal stage in London. Luckily, on this job we were also able to run small condensers into a board for each orchestra section, providing flexibility in the mix. Still, in post, the director wanted the spatial effect to be very perceptible and dynamic as we jump around the room during the lively performance. The SPS came in handy once again; not only does it offer good first-order spatial fidelity, but also a wide enough dynamic range and frequency response to be relied on heavily in the mix in situations where the close-mic recordings sounded flat. It was amazing opening up those recordings and listening to the SPS alone through a decent HRTF — it definitely exceeded my expectations.

It’s always good to be as prepared as possible when going into the field, but you don’t always have the budget or space for tons of equipment. In my experience, one high-quality, reliable ambisonic mic, along with some auxiliary lavs and maybe a long shotgun, is a good starting point for any 360 video field recording project involving musicians.


Sound designer and composer Luke Allen is a veteran spatial audio designer and engineer, and a principal at SilVR in New York City. He can be reached at luke@silversound.us.


What was new at GTC 2017

By Mike McCarthy

I once again had the opportunity to attend Nvidia’s GPU Technology Conference (GTC) in San Jose last week. The event has become much more focused on AI supercomputing and deep learning as those industries mature, but there was also a concentration on VR for those of us from the visual world.

The big news was that Nvidia released the details of its next-generation GPU architecture, code-named Volta. The flagship chip will be the Tesla V100, with 5,120 CUDA cores and 15 teraflops of computing power. It is a huge 815mm² chip, created with a 12nm manufacturing process for better energy efficiency. Most of its unique architectural improvements are focused on AI and deep learning, with specialized execution units for Tensor calculations, which are foundational to those processes.

Tesla V100

Similar to last year’s GP100, the new Volta chip will initially be available in Nvidia’s SXM2 form factor for dedicated GPU servers like their DGX-1, which uses the NVLink bus, now running at 300GB/s. The new GPUs will be a direct swap-in replacement for the current Pascal-based GP100 chips. There will also be a 150W version of the chip on a PCIe card similar to their existing Tesla lineup, but only requiring a single half-length slot.

Assuming that Nvidia puts similar processing cores into their next generation of graphics cards, we should be looking at a 33% increase in maximum performance at the top end. The intermediate stages are more difficult to predict, since that depends on how they choose to tier their cards. But the increased efficiency should allow more significant increases in performance for laptops, within existing thermal limitations.
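
For the curious, that 33% figure is just core counting, assuming the baseline is the 3,840 CUDA cores of a full Pascal GP102 and that clocks and per-core throughput stay roughly where they are:

```python
volta_cores = 5120    # Tesla V100
pascal_cores = 3840   # full GP102 (e.g. Titan Xp), the assumed baseline
print(f"{(volta_cores / pascal_cores - 1) * 100:.0f}% more CUDA cores")  # ~33%
```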

Nvidia is continuing its pursuit of GPU-enabled autonomous cars with its DrivePX2 and Xavier systems for vehicles. The newest version will have a 512-core Volta GPU and a dedicated deep learning accelerator chip that they are going to open source for other devices. They are targeting larger vehicles now, specifically the trucking industry this year, with an AI-enabled semi-truck in their booth.

They also had a tractor showing off Blue River’s AI-enabled spraying rig, targeting individual plants for fertilizer or herbicide. It seems like farm equipment would be an optimal place to implement autonomous driving, allowing perfectly straight rows and smooth grades, all in a flat controlled environment with few pedestrians or other dynamic obstructions to be concerned about (think Interstellar). But I didn’t see any reference to them looking in that direction, even with a giant tractor in their AI booth.

On the software and application front, software company SAP showed an interesting implementation of deep learning that analyzes broadcast footage and other content looking to identify logos and branding, in order to provide quantifiable measurements of the effectiveness of various forms of brand advertising. I expect we will continue to see more machine learning implementations of video analysis, for things like automated captioning and descriptive video tracks, as AI becomes more mature.

Nvidia also released an “AI-enabled” version of Iray that uses image prediction to increase the speed of interactive ray tracing renders. I am hopeful that similar technology could be used to effectively increase the resolution of video footage as well. Basically, a computer sees a low-res image of a car and says, “I know what that car should look like,” and fills in the rest of the visual data. The possibilities are pretty incredible, especially in regard to VFX.

Iray AI

On the VR front, Nvidia announced a new SDK that allows live GPU-accelerated image stitching for stereoscopic VR processing and streaming. It scales from HD to 5K output, splitting the workload across one to four GPUs. The stereoscopic version is doing much more than basic stitching, processing for depth information and using that to filter the output to remove visual anomalies and improve the perception of depth. The output was much cleaner than any other live solution I have seen.

I also got to try my first VR experience recorded with a light field camera. This not only gives the user a 360 stereo look-around capability, but also the ability to move their head around to shift their perspective within a limited range (based on the size of the recording array). The project they were using to demo the technology didn’t highlight the amazing results until the very end of the piece, but when it did, it was the most impressive VR implementation I have had the opportunity to experience yet.
———-
Mike McCarthy is an online editor/workflow consultant with 10 years of experience on feature films and commercials. He has been working on new solutions for tapeless workflows, DSLR filmmaking and multi-screen and surround video experiences. Check out his site.


VR Workflows: The Studio | B&H panel during NAB

At this year’s NAB Show in Las Vegas, The Studio B&H hosted a series of panels at their booth. One of those panels addressed workflows for virtual reality, including shooting, posting, best practices, hiccups and trends.

The panel, moderated by postPerspective editor-in-chief Randi Altman, was made up of SuperSphere’s Lucas Wilson, ReDesign’s Greg Ciaccio, Local Hero Post’s Steve Bannerman and Jaunt’s Koji Gardner.

While the panel was streamed live, it also lives on YouTube. Enjoy…


New AMD Radeon Pro Duo graphics card for pro workflows

AMD was at NAB this year with its dual-GPU graphics card designed for pros — the Polaris-architecture-based Radeon Pro Duo. Built on the capabilities of the Radeon Pro WX 7100, the Radeon Pro Duo graphics card is designed for media and entertainment, broadcast and design workflows.

The Radeon Pro Duo is equipped with 32GB of ultra-fast GDDR5 memory to handle larger data sets, more intricate 3D models, higher-resolution videos and complex assemblies. Operating at a max power of 250W, the Radeon Pro Duo uses a total of 72 compute units (4,608 stream processors) for a combined performance of up to 11.45 TFLOPS of single-precision compute performance on one board, and twice the geometry throughput of the Radeon Pro WX 7100.
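
Those numbers hang together with the usual peak-FP32 math (two FLOPs per stream processor per clock via fused multiply-add); the boost clock below is inferred from the quoted figures rather than taken from AMD’s spec sheet:

```python
stream_processors = 72 * 64   # 72 compute units x 64 stream processors each = 4,608
quoted_tflops = 11.45         # AMD's combined single-precision figure

# Peak FP32 = 2 FLOPs (one fused multiply-add) per stream processor per clock.
implied_clock_ghz = quoted_tflops * 1e12 / (2 * stream_processors) / 1e9
print(f"{stream_processors} SPs imply a boost clock of ~{implied_clock_ghz:.2f} GHz")
```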

The Radeon Pro Duo enables pros to work on up to four 4K monitors at 60Hz, drive the latest 8K single monitor display at 30Hz using a single cable or drive an 8K display at 60Hz using a dual cable solution.

The Radeon Pro Duo’s distinct dual-GPU design allows pros the flexibility to divide their workloads, enabling smooth multi-tasking between applications by committing GPU resources to each. This will allow users to focus on their creativity and get more done faster, allowing for a greater number of design iterations in the same time.

On select pro apps (including DaVinci Resolve, Nuke/Cara VR, Blender Cycles and VRed), the Radeon Pro Duo offers up to two times faster performance compared with the Radeon Pro WX 7100.

For those working in VR, the Radeon Pro Duo graphics card uses the power of two GPUs to render out separate images for each eye, increasing VR performance over single GPU solutions by up to 50% in the SteamVR test. AMD’s LiquidVR technologies are also supported by the industry’s leading realtime engines, including Unity and Unreal, to help ensure smooth, comfortable and responsive VR experiences on Radeon Pro Duo.

The Radeon Pro Duo’s planned availability is the end of May at an expected price of US $999.


Timecode and GoPro partner to make posting VR easier

Timecode Systems and GoPro’s Kolor team recently worked together to create a new timecode sync feature for Kolor’s Autopano Video Pro stitching software. By combining their technologies, the two companies have developed a VR workflow solution that offers the efficiency benefits of professional standard timecode synchronization to VR and 360 filming.

Time-aligning files from the multiple cameras in a 360° VR rig can be a manual and time-consuming process if there is no easy synchronization point, especially when synchronizing with separate audio. Visually timecode-slating cameras is a disruptive manual process, and using the clap of a slate (or another visual or audio cue) as a sync marker can be unreliable when it comes to the edit process.

The new sync feature, included in the Version 3.0 update to Autopano Video Pro, incorporates full support for MP4 timecode generated by Timecode’s products. The solution is compatible with a range of custom, multi-camera VR rigs, including rigs using GoPro’s Hero 4 cameras with SyncBac Pro for timecode and also other camera models using alternative Timecode Systems products. This allows VR filmmakers to focus on the creative and not worry about whether every camera in the rig is shooting in frame-level synchronization. Whether filming using a two-camera GoPro Hero 4 rig or 24 cameras in a 360° array creating resolutions as high as 32K, the solution syncs with the same efficiency. The end results are media files that can be automatically timecode-aligned in Autopano Video Pro with the push of a button.
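
The alignment itself is simple arithmetic once every file carries a trustworthy start timecode. A simplified sketch of the general idea (not Autopano Video Pro’s or Timecode Systems’ actual code, and ignoring drop-frame timecode, mixed frame rates and sub-frame sync):

```python
def timecode_to_frames(tc, fps):
    """Convert an 'HH:MM:SS:FF' timecode string to an absolute frame count."""
    hours, minutes, seconds, frames = (int(part) for part in tc.split(":"))
    return ((hours * 60 + minutes) * 60 + seconds) * fps + frames

def alignment_offsets(start_timecodes, fps=30):
    """Frame offsets to trim from each clip so every camera lines up with the
    latest-starting one. Illustrative only: drop-frame timecode, mixed frame
    rates and sub-frame sync are ignored.
    """
    starts = {cam: timecode_to_frames(tc, fps) for cam, tc in start_timecodes.items()}
    latest = max(starts.values())
    return {cam: latest - start for cam, start in starts.items()}

print(alignment_offsets({"cam_A": "10:15:20:00", "cam_B": "10:15:21:12"}, fps=30))
# cam_A needs 42 frames trimmed; cam_B already starts at the common sync point.
```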

“We’re giving VR camera operators the confidence that they can start and stop recording all day long without the hassle of having to disturb filming to manually slate cameras; that’s the understated benefit of timecode,” says Paul Bannister, chief science officer of Timecode Systems.

“To create high-quality VR output using multiple cameras to capture high-quality spherical video isn’t enough; the footage that is captured needs to be stitched together as simply as possible — with ease, speed and accuracy, whatever the camera rig,” explains Alexandre Jenny, senior director of Immersive Media Solutions at GoPro. “Anyone who has produced 360 video will understand the difficulties involved in relying on a clap or visual cue to mark when all the cameras start recording to match up video for stitching. To solve that issue, either you use an integrated solution like GoPro Omni with a pixel-level synchronization, or now you have the alternative to use accurate timecode metadata from SyncBac Pro in a custom, scalable multicamera rig. It makes the workflow much easier for professional VR content producers.”


Hobo’s Howard Bowler and Jon Mackey on embracing full-service VR

By Randi Altman

New York-based audio post house Hobo, which offers sound design, original music composition and audio mixing, recently embraced virtual reality by launching a 360 VR division. Wanting to offer clients a full-service solution, they partnered with New York production/post production studios East Coast Digital and Hidden Content, allowing them to provide concepting through production, post, music and final audio mix in an immersive 360 format.

The studio is already working on some VR projects, using their “object-oriented audio mix” skills to enhance the 360 viewing experience.

We touched base with Hobo’s founder/president, Howard Bowler, and post production producer Jon Mackey to get more info on their foray into VR.

Why was now the right time to embrace 360 VR?
Bowler: We saw the opportunity stemming from the advancement of the technology not only in the headsets but also in the tools necessary to mix and sound design in a 360-degree environment. The great thing about VR is that we have many innovative companies trying to establish what the workflow norm will be in the years to come. We want to be on the cusp of those discoveries to test and deploy these tools as the ecosystem of VR expands.

As an audio shop you could have just offered audio-for-VR services only, but instead aligned with two other companies to provide a full-service experience. Why was that important?
Bowler: This partnership provides our clients with added security when venturing out into VR production. Since the medium is relatively new in the advertising and film world, partnering with experienced production companies gives us the opportunity to better understand the nuances of filming in VR.

How does that relationship work? Will you be collaborating remotely? Same location?
Bowler: Thankfully, we are all based in West Midtown, so the collaboration will be seamless.

Can you talk a bit about object-based audio mixing and its challenges?
Mackey: The challenge of object-based mixing is not only mixing in a 360-degree environment, or converting traditional audio into something that moves with the viewer, but also determining which objects will lead the viewer, with their sound cues, into another part of the environment.

Bowler: It’s the creative challenge that inspires us in our sound design. With traditional 2D film, the editor controls what you see with their cuts. With VR, the partnership between sight and sound becomes much more important.

Howard Bowler pictured embracing VR.

How different is your workflow — traditional broadcast or spot work versus VR/360?
Mackey: The VR/360 workflow isn’t much different than traditional spot work. It’s the testing and review that is a game changer. Things generally can’t be reviewed live unless you have a custom rig that runs its own headset. It’s a lot of trial and error in checking the mixes, sound design and spatial mixes. You also have to take into account the extra time and instruction for your clients to review a project.

What has surprised you the most about working in this new realm?
Bowler: The great thing about the VR/360 space is the amount of opportunity there is. What surprised us the most is the passion of all the companies that are venturing into this area. It’s different than talking about conventional film or advertising; there’s a new spark, and it’s fueling the rise of the industry and allowing larger companies to connect with smaller ones to create an atmosphere where passion is the only thing that counts.

What tools are you using for this type of work?
Mackey: The audio tools we use are the ones that best fit into our Avid ProTools workflow. This includes plug-ins from G-Audio and others that we are experimenting with.

Can you talk about some recent projects?
Bowler: We’ve completed projects for Samsung with East Coast Digital, and there are more on the way.

Main Image: Howard Bowler and Jon Mackey

The importance of audio in VR

By Anne Jimkes

While some might not be aware, sound is 50 percent of the experience in VR, as well as in film, television and games. Because we can’t physically see the audio, it might not get as much attention as the visual side of the medium. But the balance and collaboration between visual and aural is what creates the most effective, immersive and successful experience.

More specifically, sound in VR can be used to ease people into the experience, what we also call “onboarding.” It can be used subtly and subconsciously to guide viewers by motivating them to look in a specific direction of the virtual world, which completely surrounds them.

In every production process, it is important to discuss how sound can be used to benefit the storytelling and the overall experience of the final project. In VR, especially the many low-budget independent projects, it is crucial to keep the importance and use of audio in mind from the start to save time and money in the end. Oftentimes, there are no real opportunities or means to record ADR after a live-action VR shoot, so it is important to give the production mixer ample opportunity to capture the best production sound possible.

Anne Jimkes at work.

This involves capturing wild lines, making sure there is time to plant and check the mics, and recording room tone. These things are already required, albeit not always granted, on regular shoots, but they are even more important on a set where a boom operator cannot be used due to the camera’s 360-degree view. The post process is also very similar to that for TV or film up to the point of actual spatialization. We come across similar issues of having to clean up dialogue and fill in the world through sound. What producers must be aware of, however, is that after all the necessary elements of the soundtrack have been prepared, we have to manually and meticulously place and move around all the “audio objects” and various audio sources throughout the space. Whenever people decide to re-orient the video — meaning when they change what is considered the initial point of facing forward, or “north” — we have to rewrite all of the information that established the location and movement of the sound, which takes time.
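
To make concrete why re-orienting “north” is costly: every placed object’s position (and every keyframe of its movement) is defined relative to that forward direction, so the change amounts to applying a yaw offset to all of it. A toy illustration with made-up object data:

```python
def reorient_objects(objects, new_north_azimuth_deg):
    """Shift every audio object's azimuth when the video's 'forward' direction
    is redefined. Toy example: real spatializers also carry elevation,
    distance and per-keyframe automation that must all be rewritten.
    """
    return {
        name: (azimuth - new_north_azimuth_deg) % 360
        for name, azimuth in objects.items()
    }

objects = {"dialogue": 0, "waterfall": 90, "birds": 200}   # azimuths in degrees
print(reorient_objects(objects, new_north_azimuth_deg=90))
# dialogue moves to 270, waterfall becomes the new front at 0, birds to 110
```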

Capturing Audio for VR
To capture audio for virtual reality, we have learned a lot about planting and hiding mics as efficiently as possible. Unlike regular productions, it is not possible to use a boom mic, which tends to be the primary and most natural-sounding microphone. Aside from the more common lavalier mics, we also use ambisonic mics, which capture a full sphere of audio and match the 360 picture — if the mic is placed correctly, on axis with the camera. Most of the time we work with Sennheiser and use their Ambeo microphone to capture 360 audio on set, after which we add the rest of the spatialized audio during post production. Playing back the spatialized audio has become easier lately, because more and more platforms and VR apps accept some form of 360 audio playback. There is still a difference between the file formats to which we can encode our audio outputs, meaning that some are more precise and others are a little blurrier regarding spatialization. With VR, there is not yet a standard for deliverables and specs, unlike the film/television workflow.

What matters most in the end is that people are aware of how the creative use of sound can enhance their experience, and how important it is to spend time on capturing good dialogue on set.


Anne Jimkes is a composer, sound designer, scholar and visual artist from the Netherlands. Her work includes VR sound design at EccoVR and work with the IMAX VR Centre. With a Master’s Degree from Chapman University, Jimkes previously served as a sound intern for the Academy of Television Arts & Sciences.