Category Archives: A.I.

Quick Chat: AI-based audio mastering

Antoine Rotondo is an audio engineer by trade who has been in the business for the past 17 years. Throughout his career he’s worked in audio across music, film and broadcast, focusing on sound reproduction. After completing college studies in sound design, undergraduate studies in music and music technology, as well as graduate studies in sound recording at McGill University in Montreal, Rotondo went on to work in recording, mixing, producing and mastering.

He is currently an audio engineer at Landr.com, which has released Landr Audio Mastering for Video, a tool that gives professional video editors AI-based audio mastering capabilities in Adobe Premiere Pro CC.

As an audio engineer how do you feel about AI tools to shortcut the mastering process?
Well first, there’s a myth about how AI and machines can’t possibly make valid decisions in the creative process in a consistent way. There’s actually a huge intersection between artistic intentions and technical solutions where we find many patterns, where people tend to agree and go about things very similarly, often unknowingly. We’ve been building technology around that.

Truth be told there are many tasks in audio mastering that are repetitive and that people don’t necessarily like spending a lot of time on, tasks such as leveling dialogue, music and background elements across multiple segments, or dealing with noise. Everyone’s job gets easier when those tasks become automated.

I see innovation in AI-driven audio mastering as a way to make creators more productive and efficient — not to replace them. It’s now more accessible than ever for amateur and aspiring producers and musicians to learn about mastering and have the resources to professionally polish their work. I think the same will apply to videographers.

What’s the key to making video content sound great?
Great sound quality is effortless and sounds as natural as possible. It’s about creating an experience that keeps the viewer engaged and entertained. It’s also about great communication — delivering a message to your audience and even conveying your artistic vision — all this to impact your audience in the way you intended.

More specifically, audio shouldn’t unintentionally sound muffled, distorted, noisy or erratic. Dialogue and music should shine through. Viewers should never need to change the volume or rewind the content to play something back during the program.

When are the times you’d want to hire an audio mastering engineer and when are the times that projects could solely use an AI-engine for audio mastering?
Mastering engineers are especially important for extremely intricate artistic projects that require direct communication with a producer or artist, including long-form narrative, feature films, television series and also TV commercials. Any project with conceptual sound design will almost always require an engineer to perfect the final master.

Users can truly benefit from AI-driven mastering in short-form, non-fiction projects that require clean dialogue, reduced background noise and overall leveling. Quick turnaround projects can also use AI mastering to elevate the audio to a more professional level, even when deadlines are tight. AI mastering can now insert itself in the offline creation process, where multiple revisions of a project are sent back and forth, making great sound accessible throughout the entire production cycle.

The other thing to consider is that AI mastering is a great option for video editors who don’t have technical audio expertise themselves, and where lower budgets translate into them having to work on their own. These editors could purchase purpose-built mastering plugins, but they don’t necessarily have the time to learn how to really take advantage of these tools. And even if they did have the time, some would prefer to focus more on all the other aspects of the work that they have to juggle.

AI for M&E: Should you take the leap?

By Nick Gold

In Hollywood, the promise of artificial intelligence is all the rage. Who wouldn’t want a technology that adds the magic of AI to smarter computers for an instant solution to tedious, time-intensive problems? With artificial intelligence, anyone with abundant rich media assets can easily churn out more revenue or cut costs, while simplifying operations … or so we’re told.

If you attended IBC, you probably already heard the pitch: “It’s an ‘easy’ button that’s simple to add to the workflow and foolproof to operate, turning your massive amounts of uncategorized footage into metadata.”

But should you take the leap? Before you sign on the dotted line, take a closer look at the technology behind AI and what it can — and can’t — do for you.

First, it’s important to understand the bigger picture of artificial intelligence in today’s marketplace. Taking unstructured data and generating relevant metadata from it is something that other industries have been doing for some time. In fact, many of the tools we embrace today started off in other industries. But unlike banking, finance or healthcare, our industry prioritizes creativity, which is why we have always shied away from tools that automate. The idea that we can rely on the same technology as a hedge fund manager just doesn’t sit well with many people in our industry, and for good reason.

Nick Gold talks AI for a UCLA Annex panel.

In the media and entertainment industry, we’re looking for various types of metadata that could include a transcript of spoken words, important events within a period of time or information about the production (e.g., people, location, props), and currently there’s no single machine-learning algorithm that will solve for all these types of metadata parameters. For that reason, the best starting point is to define your problems and identify which machine learning tools may be able to solve them. Expecting to parse reams of untagged, uncategorized and unstructured media data is unrealistic until you know what you’re looking for.

What works for M&E?
AI has become pretty good at solving some specific problems for our industry. Speech-to-text is one of them. With AI, extracting data from a generally accurate transcription offers an automated solution that saves time. However, it’s important to note that AI tools still have limitations. An AI tool, known as “sentiment analysis,” could theoretically look for the emotional undertones described in spoken word, but it first requires another tool to generate a transcript for analysis.

But no matter how good the algorithms are, they won’t give you the qualitative data that a human observer would provide, such as the emotions expressed through body language. They won’t tell you the facial expressions of the people being spoken to, or the tone of voice, pacing and volume level of the speaker, or what is conveyed by a sarcastic tone or a wry expression. There are sentiment analysis engines that try to do this, but breaking down the components ensures the parameters you need will be addressed and solved.

Another task at which machine learning has progressed significantly is logo recognition. Certain engines are good at finding, for example, all the images with a Coke logo in 10,000 hours of video. That’s impressive and quite useful, but it’s another story if you want to also find footage of two people drinking what are clearly Coke-shaped bottles where the logo is obscured. That’s because machine-learning engines tend to have a narrow focus, which goes back to the need to define very specifically what you hope to get from it.

There are a bevy of algorithms and engines out there. If you license a service that will find a specific logo, then you haven’t solved your problem for finding objects that represent the product as well. Even with the right engine, you’ve got to think about how this information fits in your pipeline, and there are a lot of workflow questions to be explored.

Let’s say you’ve generated speech-to-text with audio media, but have you figured out how someone can search the results? There are several options. Sometimes vendors have their own front end for searching. Others may offer an export option from one engine into a MAM that you either already have on-premise or plan to purchase. There are also vendors that don’t provide machine learning themselves but act as a third-party service organizing the engines.

It’s important to remember that none of these AI solutions are accurate all the time. You might get a nudity detection filter, for example, but these vendors rely on probabilistic results. If having one nude image slip through is a huge problem for your company, then machine learning alone isn’t the right solution for you. It’s important to understand whether occasional inaccuracies will be acceptable or deal breakers for your company. Testing samples of your core content in different scenarios for which you need to solve becomes another crucial step. And many vendors are happy to test footage in their systems.
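To make the "probabilistic results" point concrete, here is a rough arithmetic sketch in Python. It assumes a hypothetical filter with a known recall (the fraction of truly objectionable images it flags) and assumes each image is judged independently. Both are simplifications, and the function names and numbers are illustrative rather than taken from any vendor.

```python
def expected_misses(num_positive_items: int, recall: float) -> float:
    """Expected number of truly objectionable items the filter fails to flag."""
    if not 0.0 <= recall <= 1.0:
        raise ValueError("recall must be between 0 and 1")
    return num_positive_items * (1.0 - recall)


def probability_of_any_miss(num_positive_items: int, recall: float) -> float:
    """Chance that at least one item slips through, assuming independent decisions."""
    if not 0.0 <= recall <= 1.0:
        raise ValueError("recall must be between 0 and 1")
    return 1.0 - recall ** num_positive_items


# Hypothetical library: 1,000 objectionable frames, filter recall of 99%.
misses = expected_misses(1000, 0.99)
any_miss = probability_of_any_miss(1000, 0.99)
```

Under these assumptions, even a filter that catches 99% of cases is expected to miss about 10 of those 1,000 frames, and the chance that at least one gets through is close to certain. That is the kind of number to weigh against your own tolerance before relying on machine learning alone.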

Although machine learning is still in its nascent stages, there is a lot of interest in learning how to make it work in the media workflow. It can do some magical things, but it’s not a magic “easy” button (yet, anyway). Exploring the options and understanding in detail what you need goes hand-in-hand with finding the right solution to integrate with your workflow.


Nick Gold is lead technologist for Baltimore’s Chesapeake Systems, which specializes in M&E workflows and solutions for the creation, distribution and preservation of content. Active in both SMPTE and the Association of Moving Image Archivists (AMIA), Gold speaks on a range of topics. He also co-hosts the Workflow Show Podcast.
 


Adobe updates Creative Cloud

By Brady Betzel

You know it’s almost fall when pumpkin spice lattes are back and Adobe announces its annual updates. At this year’s IBC, Adobe had a variety of updates to its Creative Cloud line of apps. From more info on its new editing platform, Project Rush, to the addition of Characterizer to Character Animator, there are a lot of updates, so I’m going to focus on a select few that I think really stand out.

Project Rush

I use Adobe Premiere quite a lot these days; it’s quick and relatively easy to use and will work with pretty much every codec in the universe. In addition, the Dynamic Link between Adobe Premiere Pro and Adobe After Effects is an indispensable feature in my world.

With the 2018 fall updates, Adobe Premiere will be closer to a color tool like Blackmagic’s Resolve with the addition of new hue saturation curves in the Lumetri Color toolset. In Resolve these are some of the most important aspects of the color corrector, and I think that will be the same for Premiere. From Hue vs. Sat, which can help isolate a specific color and desaturate it, to Hue vs. Luma, which can help add or subtract brightness values from specific hues and hue ranges, these new color correcting tools further Premiere’s venture into true professional color correction. These new curves will also be available inside of After Effects.
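For readers who want to see the idea behind a Hue vs. Sat adjustment, here is a minimal Python sketch using the standard library’s colorsys module. It is not how Lumetri or Resolve implement their curves. The function name, the linear falloff and the parameter names are all assumptions made for illustration.

```python
import colorsys


def desaturate_hue_range(rgb, target_hue, width, amount):
    """Reduce saturation for colors whose hue is near target_hue.

    rgb        -- (r, g, b) floats in 0..1
    target_hue -- hue to isolate, 0..1 (0 is red in colorsys)
    width      -- how far from target_hue the effect reaches, 0..0.5
    amount     -- 0 leaves the color alone, 1 fully desaturates at the target hue
    """
    r, g, b = rgb
    h, s, v = colorsys.rgb_to_hsv(r, g, b)

    # Hue is circular, so measure the shorter way around the wheel.
    distance = abs(h - target_hue)
    distance = min(distance, 1.0 - distance)
    if distance >= width:
        return rgb  # outside the selected hue range: untouched

    # Linear falloff: strongest at the target hue, fading to zero at the edge.
    weight = 1.0 - distance / width
    new_saturation = s * (1.0 - amount * weight)
    return colorsys.hsv_to_rgb(h, new_saturation, v)
```

Fed pure red with a target hue of 0 and an amount of 1, this turns the pixel white (fully desaturated at full brightness), while pure blue passes through unchanged because its hue is outside the selected range.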

After Effects features many updates, but my favorites are the ability to access depth matte data of 3D elements and the addition of the new JavaScript engine for building expressions.

There is one update that runs across both Premiere and After Effects that seems to be a sleeper update. The improvements to motion graphics templates, if implemented correctly, could be a time and creativity saver for both artists and editors.

AI
Adobe, like many other companies, seems to be diving heavily into the “AI” pool, which is amazing, but… with great power comes great responsibility. While I feel this way and realize others might not, sometimes I don’t want all the work done for me. With new features like Auto Lip Sync and Color Match, editors and creators of all kinds should not lose the forest for the trees. I’m not telling people to ignore these features, but asking that they put a few minutes into discovering how the color of a shot was matched, so you can fix something if it goes wrong. You don’t want to be the editor who says, “Premiere did it” and not have a great solution to fix something when it goes wrong.

What Else?
I would love to see Adobe take a stab at digging up the bones of SpeedGrade and integrating that into the Premiere Pro world as a new tab. Call it Lumetri Grade, or whatever? A page with a more traditional colorist layout and clip organization would go a long way.

In the end, there are plenty of other updates to Adobe’s 2018 Creative Cloud apps, and you can read their blog to find out about other updates.


IBC 2018: Convergence and deep learning

By David Cox

In the 20 years I’ve been traveling to IBC, I’ve tried to seek out new technology, work practices and trends that could benefit my clients and help them be more competitive. One thing that is perennially exciting about this industry is the rapid pace of change. Certainly, from a post production point of view, there is a mini revolution every three years or so. In the past, those revolutions have increased image quality or the efficiency of making those images. The current revolution is to leverage the power and flexibility of cloud computing. But those revolutions haven’t fundamentally changed what we do. The images might have gotten sharper, brighter and easier to produce, but TV is still TV. This year though, there are some fascinating undercurrents that could herald a fundamental shift in the sort of content we create and how we create it.

Games and Media Collide
There is a new convergence on the horizon in our industry. A few years ago, all the talk was about the merge between telecommunications companies and broadcasters, as well as the joining of creative hardware and software for broadcast and film, as both moved to digital.

The new convergence is between media content creation as we know it and the games industry. It was subtle, but technology from gaming was present in many applications around the halls of IBC 2018.

One of the drivers for this is a giant leap forward in the quality of realtime rendering by the two main game engine providers: Unreal and Unity. I program with Unity for interactive applications, and their new HDSRP rendering allows for incredible realism, even when being rendered fast enough for 60+ frames per second. In order to create such high-quality images, those game engines must start with reasonably detailed models. This is a departure from the past, where less detailed models were used for games than were used for film CGI shots, to protect for realtime performance. So, the first clear advantage created by the new realtime renderers is that a film and its inevitable related game can use the same or similar model data.

NCam

Being able to use the same scene data between final CGI and a realtime game engine allows for some interesting applications. Habib Zargarpour from Digital Monarch Media showed a system based on Unity that allows a camera operator to control a virtual camera in realtime within a complex CGI scene. The resulting camera moves feel significantly more real than if they had been keyframed by an animator. The camera operator chases high-speed action, jumps at surprises and reacts to unfolding scenes. The subtleties that these human reactions deliver via minor deviations in the movement of the camera can convey the mood of a scene as much as the design of the scene itself.

NCam was showing the possibilities of augmenting scenes with digital assets, using their system based on the Unreal game engine. The NCam system provides realtime tracking data to specify the position and angle of a freely moving physical camera. This data was being fed to an Unreal game engine, which was then adding in animated digital objects. They were also using an additional ultra-wide-angle camera to capture realtime lighting information from the scene, which was then being passed back to Unreal to be used as a dynamic reflection and lighting map. This ensured that digitally added objects were lit by the physical lights in the realworld scene.

Even a seemingly unrelated (but very enlightening) chat with StreamGuys president Kiriki Delany about all things related to content streaming still referenced gaming technology. Delany talked about their tests to build applications with Unity to provide streaming services in VR headsets.

Unity itself has further aspirations to move into storytelling rather than just gaming. The latest version of Unity features an editing timeline and color grading. This allows scenes to be built and animated, then played out through various virtual cameras to create a linear story. Since those scenes are being rendered in realtime, tweaks to scenes such as positions of objects, lights and material properties are instantly updated.

Game engines not only offer us new ways to create our content, but they are a pathway to create a new type of hybrid entertainment, which sits between a game and a film.

Deep Learning
Other undercurrents at IBC 2018 were the possibilities offered by machine learning and deep learning software. Essentially, a normal computer program is hard wired to give a particular output for a given input. Machine learning allows an algorithm to compare its output to a set of data and adjust itself if the output is not correct. Deep learning extends that principle by using neural network structures to make a vast number of assessments of input data, then draw conclusions and predictions from that data.
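That "compare and adjust" loop can be shown in a few lines of Python. This is a deliberately tiny sketch, assuming a model with a single adjustable number (a slope) and plain gradient descent. Real systems adjust millions of parameters, and the function name and settings here are illustrative only.

```python
def fit_slope(xs, ys, learning_rate=0.01, steps=500):
    """Learn w so that w * x approximates y, by repeatedly correcting w."""
    w = 0.0  # start with a guess that is simply wrong
    n = len(xs)
    for _ in range(steps):
        # Compare the current outputs (w * x) to the known answers (y).
        # This is the gradient of the mean squared error with respect to w.
        gradient = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        # Adjust w a little in the direction that reduces the error.
        w -= learning_rate * gradient
    return w


# Training data generated by the rule y = 2x; the algorithm is never told the rule.
learned = fit_slope([1, 2, 3, 4], [2, 4, 6, 8])
```

After enough passes, the learned value settles very close to 2. Nobody hard wired that number in; it came out of the repeated comparison with the data.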

Real-world applications are already prevalent and are largely related in our industry to processing viewing metrics. For example, Netflix suggests what we might want to watch next by comparing our viewing habits to others with a similar viewing pattern.
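As a rough illustration of "comparing our viewing habits to others," here is a small Python sketch of similarity-based recommendation. It is not Netflix’s actual system, which is far more elaborate. The Jaccard similarity measure, the function names and the sample data are all assumptions chosen to keep the example short.

```python
def jaccard(a: set, b: set) -> float:
    """Overlap between two sets of watched titles, from 0 (none) to 1 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(target: set, others: dict) -> list:
    """Rank titles the target has not seen, weighted by how similar each viewer is."""
    scores = {}
    for watched in others.values():
        similarity = jaccard(target, watched)
        if similarity == 0.0:
            continue  # ignore viewers with nothing in common
        for title in watched - target:
            scores[title] = scores.get(title, 0.0) + similarity
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda title: (-scores[title], title))


# Hypothetical viewing histories.
suggestions = recommend(
    {"A", "B"},
    {"viewer1": {"A", "B", "C"}, "viewer2": {"D"}},
)
```

Here the only suggestion is title C, because viewer1 shares two titles with the target and viewer2 shares none.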

But deep learning offers — indeed threatens — much more. Of course, it is understandable to think that, say, delivery drivers might be redundant in a world where autonomous vehicles rule, but surely creative jobs are safe, right? Think again!

IBM was showing how its Watson Studio has used deep learning to provide automated editing highlights packages for sporting events. The process is relatively simple to comprehend, although considerably more complicated in practice. A DL algorithm is trained to scan a video file and “listen” for a cheering crowd. This finds the highlight moment. Another algorithm rewinds back from that to find the logical beginning of that moment, such as the pass forward, the beginning of the volley etc. Taking the score into account helps decide whether that highlight was pivotal to the outcome of the game. Joining all that up creates a highlight package without the services of an editor. This isn’t future stuff. This has been happening over the last year.
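The two-step logic described above (find the cheer, then rewind to the start of the play) can be sketched in Python. This is only a stand-in. It swaps the trained deep learning models for simple loudness thresholds on a hypothetical per-second crowd level, and it is not IBM’s method. All names and threshold values are assumptions.

```python
def find_highlights(loudness, cheer_threshold, calm_threshold, max_lookback=30):
    """Return (start, end) index pairs for moments that end in a crowd cheer.

    loudness        -- one crowd-level reading per second
    cheer_threshold -- level at or above which we assume the crowd is cheering
    calm_threshold  -- level at or below which we assume nothing is building
    max_lookback    -- never rewind more than this many seconds
    """
    highlights = []
    i = 0
    n = len(loudness)
    while i < n:
        if loudness[i] >= cheer_threshold:
            # Step 2: rewind from the cheer to where the build-up began.
            start = i
            while (start > 0
                   and i - start < max_lookback
                   and loudness[start - 1] > calm_threshold):
                start -= 1
            # Extend forward to the last second of continuous cheering.
            end = i
            while end + 1 < n and loudness[end + 1] >= cheer_threshold:
                end += 1
            highlights.append((start, end))
            i = end + 1
        else:
            i += 1
    return highlights


# Hypothetical readings: quiet, a build-up, two seconds of cheering, then quiet.
clips = find_highlights([1, 1, 2, 4, 5, 9, 9, 3, 1, 1],
                        cheer_threshold=8, calm_threshold=1.5)
```

With these sample readings the function returns one clip running from second 2 (where the build-up starts) to second 6 (the end of the cheer). A production system would replace both thresholds with trained models and add the score-based weighting the paragraph mentions.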

BBC R&D was talking about their trials to have DL systems control cameras at sporting events, as they could be trained to follow the “two thirds” framing rule and to spot moments of excitement that justified close-ups.

In post production, manual tasks such as rotoscoping and color matching in color grading could be automated. Even styles for graphics, color and compositing could be “learned” from other projects.

It’s certainly possible to see that deep learning systems could provide a great deal of assistance in the creation of day-to-day media. Tasks that are based on repetitiveness or formula would be the obvious targets. The truth is, much of our industry is repetitive and formulaic. Investors prefer content that is more likely to be a hit, and this leads to replication over innovation.

So, are we heading for “Skynet” and need Arnold to save us? I thought it was very telling that IBM occupied the central stand position in Hall 7 — traditionally the home of the tech companies that have driven creativity in post. Clearly, IBM and its peers are staking their claim. I have no doubt that DL and ML will make massive changes to this industry in the years ahead. Creativity is probably, but not necessarily, the only defence for mere humans to keep a hand in.

That said, at IBC2018 the most popular place for us mere humans to visit was a bar area called The Beach, where we largely drank Heineken. If the ultimate deep learning system is tasked to emulate media people, surely it would create digital alcohol and spend hours talking nonsense, rather than try and take over the media world? So perhaps we have a few years left yet.


David Cox is a VFX compositor and colorist with 20-plus years of experience. He started his career with MPC and The Mill before forming his own London-based post facility. Cox recently created interactive projects with full body motion sensors and 4D/AR experiences.


Our SIGGRAPH 2018 video coverage

SIGGRAPH is always a great place to wander around and learn about new and future technology. You can see amazing visual effects reels and learn how the work was created by the artists themselves. You can get demos of new products, and you can immerse yourself in a completely digital environment. In short, SIGGRAPH is educational and fun.

If you weren’t able to make it this year, or attended but couldn’t see it all, we would like to invite you to watch our video coverage from the show.

SIGGRAPH 2018


postPerspective Impact Award winners from SIGGRAPH 2018

postPerspective has announced the winners of our Impact Awards from SIGGRAPH 2018 in Vancouver. Seeking to recognize debut products with real-world applications, the postPerspective Impact Awards are voted on by an anonymous judging body made up of respected industry artists and professionals. It’s working pros who are going to be using new tools — so we let them make the call.

The awards honor innovative products and technologies for the visual effects, post production and production industries that will influence the way people work. They celebrate companies that push the boundaries of technology to produce tools that accelerate artistry and actually make users’ working lives easier.

While SIGGRAPH’s focus is on VFX, animation, VR/AR, AI and the like, the types of gear they have on display vary. Some are suited for graphics and animation, while others have uses that slide into post production, which makes these SIGGRAPH Impact Awards doubly interesting.

The winners are as follows:

postPerspective Impact Award — SIGGRAPH 2018 MVP Winner:

They generated a lot of buzz at the show, as well as a lot of votes from our team of judges, so our MVP Impact Award goes to Nvidia for its Quadro RTX raytracing GPU.

postPerspective Impact Awards — SIGGRAPH 2018 Winners:

  • Maxon for its Cinema 4D R20 3D design and animation software.
  • StarVR for its StarVR One headset with integrated eye tracking.

postPerspective Impact Awards — SIGGRAPH 2018 Horizon Winners:

This year we have started a new Impact Award category. Our Horizon Award celebrates the next wave of impactful products being previewed at a particular show. At SIGGRAPH, the winners were:

  • Allegorithmic for its Substance Alchemist tool powered by AI.
  • OTOY and Epic Games for their OctaneRender 2019 integration with Unreal Engine 4.

And while these products and companies didn’t win enough votes for an award, our voters believe they do deserve a mention and your attention: Wrnch, Google Lightfields, Microsoft Mixed Reality Capture and Microsoft Cognitive Services integration with PixStor.

 


DeepMotion’s Neuron cloud app trains digital characters using AI

DeepMotion has launched DeepMotion Neuron, the first tool for completely procedural, physical character animation, for presale. The cloud application trains digital characters to develop physical intelligence using advanced artificial intelligence (AI), physics and deep learning. With guidance and practice, digital characters can now achieve adaptive motor control just as humans do, in turn allowing animators and developers to create more lifelike and responsive animations than those possible using traditional methods.

DeepMotion Neuron is a behavior-as-a-service platform that developers can use to upload and train their own 3D characters, choosing from hundreds of interactive motions available via an online library. Neuron will enable content creators to tell more immersive stories by adding responsive actors to games and experiences. By handling large portions of technical animation automatically, the service also will free up time for artists to focus on expressive details.

DeepMotion Neuron is built on techniques identified by researchers from DeepMotion and Carnegie Mellon University who studied the application of reinforcement learning to the growing domain of sports simulation, specifically basketball, where real-world human motor intelligence is at its peak. After training and optimization, the researchers’ characters were able to perform interactive ball-handling skills in real-time simulation. The same technology used to teach digital actors how to dribble can be applied to any physical movement using Neuron.

DeepMotion Neuron’s cloud platform is slated for release in Q4 of 2018. During the DeepMotion Neuron prelaunch, developers and animators can register on the DeepMotion website for early access and discounts.


Epic Games launches Unreal Engine 4.20

Epic Games has introduced Unreal Engine 4.20, which allows developers to build even more realistic characters and immersive environments across games, film and TV, VR/AR/MR and enterprise applications. The Unreal Engine 4.20 release combines the latest realtime rendering advancements with improved creative tools, making it even easier to ship games across all platforms. With hundreds of optimizations, especially for iOS, Android and Nintendo Switch — which have been built for Fortnite and are now rolled into Unreal Engine 4.20 and released to all users — Epic is providing developers with the scalable tools they need for these types of projects.

Artists working in visual effects, animation, broadcast and virtual production will find enhancements for digital humans, VFX and cinematic depth of field, allowing them to create realistic images across all forms of media and entertainment. In the enterprise space, Unreal Studio 4.20 includes upgrades to the UE4 Datasmith plugin suite, such as SketchUp support, which make it easier to get CAD data prepped, imported and working in Unreal Engine.

Here are some key features of Unreal Engine 4.20:

A new proxy LOD system: Users can handle sprawling worlds via UE4’s production-ready Proxy LOD system for the easy reduction of rendering cost due to poly count, draw calls and material complexity. Proxy LOD offers big gains when developing for mobile and console platforms.

A smoother mobile experience: Over 100 mobile optimizations developed for Fortnite come to all 4.20 users, marking a major shift for easy “shippability” and seamless gameplay optimization across platforms. Major enhancements include improved Android debugging, mobile Landscape improvements, RHI thread on Android and occlusion queries on mobile.

Works better with Switch: Epic has improved Nintendo Switch development by releasing tons of performance and memory improvements built for Fortnite on Nintendo Switch to 4.20 users as well.

Niagara VFX (early access): Unreal Engine’s new programmable VFX editor, Niagara, is now available in early access and will help developers take their VFX to the next level. This new suite of tools is built from the ground up to give artists unprecedented control over particle simulation, rendering and performance for more sophisticated visuals. This tool will eventually replace the Unreal Cascade particle editor.

Cinematic depth of field: Unreal Engine 4.20 delivers tools for achieving depth of field at true cinematic quality in any scene. This brand-new implementation replaces the Circle DOF method. It’s faster, cleaner and provides a cinematic appearance through the use of a procedural bokeh simulation. Cinematic DOF also supports alpha channel and dynamic resolution stability, and has multiple settings for scaling up or down on console platforms based on project requirements. This feature debuted at GDC this year as part of the Star Wars “Reflections” demo by Epic, ILMxLAB and Nvidia.

Digital human improvements: In-engine tools now include dual-lobe specular/double Beckmann specular models, backscatter transmission in lights, boundary bleed color subsurface scattering, iris normal slot for eyes and screen space irradiance to build the most cutting-edge digital humans in games and beyond.

Live record and replay: All developers now have access to code from Epic’s Fortnite Replay system. Content creators can easily use footage of recorded gameplay sessions to create incredible replay videos.

Sequencer cinematic updates: New features include frame accuracy, media tracking, curve editor/evaluation and Final Cut Pro 7 XML import/export.

Shotgun integration: Shotgun, a production management and asset tracking solution, is now supported. This will streamline workflows for Shotgun users in game development who are leveraging Unreal’s realtime performance. Shotgun users can assign tasks to specific assets within Unreal Engine.

Mixed reality capture support (early access): Users with virtual production workflows will now have mixed reality capture support that includes video input, calibration and in-game compositing. Supported webcams and HDMI capture devices allow users to pull real-world greenscreened video into the engine, and supported tracking devices can match your camera location to the in-game camera for more dynamic shots.

AR support: Unreal Engine 4.20 ships with native support for ARKit 2, which includes features for creating shared, collaborative AR experiences. Also included is the latest support for Magic Leap One and Google ARCore 1.2.

Metadata control: Import metadata from 3ds Max, SketchUp and other common CAD tools for the opportunity to batch process objects by property, or expose metadata via scripts. Metadata enables more creative uses of Unreal Studio, such as Python script commands for updating all meshes of a certain type, or displaying relevant information in interactive experiences.

Mesh editing tools: Unreal Engine now includes a basic mesh editing toolset for quick, simple fixes to imported geometry without having to fix them in the source package and re-import. These tools are ideal for simple touch-ups without having to go to another application. Datasmith also now includes a base Python script that can generate Level of Detail (LOD) meshes automatically.

Non-destructive re-import: Achieve faster iteration through the new parameter tracking system, which monitors updates in both the source data and Unreal Editor, and only imports changed elements. Previous changes to the scene within Unreal Editor are retained and reapplied when source data updates.


GTC embraces machine learning and AI

By Mike McCarthy

I had the opportunity to attend GTC 2018, Nvidia’s 9th annual technology conference in San Jose this week. GTC stands for GPU Technology Conference, and GPU stands for graphics processing unit, but graphics makes up a relatively small portion of the show at this point. The majority of the sessions and exhibitors are focused on machine learning and artificial intelligence.

And the majority of the graphics developments are centered around analyzing imagery, not generating it. Whether that is classifying photos on Pinterest or giving autonomous vehicles machine vision, it is based on the capability of computers to understand the content of an image. Now DriveSim, Nvidia’s new simulator for virtually testing autonomous drive software, dynamically creates imagery for the other system in the Constellation pair of servers to analyze and respond to, but that is entirely machine-to-machine imagery communication.

The main exception to this non-visual usage trend is Nvidia RTX, which allows raytracing to be rendered in realtime on GPUs. RTX can be used through Nvidia’s OptiX API, as well as Microsoft’s DirectX RayTracing API, and eventually through the open source Vulkan cross-platform graphics solution. It integrates with Nvidia’s AI Denoiser to use predictive rendering to further accelerate performance, and can be used in VR applications as well.

Nvidia RTX was first announced at the Game Developers Conference last week, but the first hardware to run it was just announced here at GTC, in the form of the new Quadro GV100. This $9,000 card replaces the existing Pascal-based GP100 with a Volta-based solution. It retains the same PCIe form factor, the quad DisplayPort 1.4 outputs and the NV-Link bridge to pair two cards at 200GB/s, but it jumps the GPU RAM per card from 16GB to 32GB of HBM2 memory. The GP100 was the first Quadro offering since the K6000 to support double-precision compute processing at full speed, and the increase from 3,584 to 5,120 CUDA cores should provide a 40% increase in performance, before you even look at the benefits of the 640 Tensor Cores.
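The 40% figure can be checked with quick arithmetic on the two core counts quoted above. The sketch assumes performance scales linearly with CUDA core count at equal clock speeds, which is a simplification rather than a benchmark.

```python
# CUDA core counts quoted in the article.
gp100_cores = 3584
gv100_cores = 5120

# Relative increase in core count, used here as a rough proxy for performance.
increase = (gv100_cores - gp100_cores) / gp100_cores
print(f"{increase:.1%}")  # prints 42.9%
```

The raw ratio is slightly above the rounded 40% quoted here, and it ignores both clock speed differences and the Tensor Cores.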

Hopefully, we will see simpler versions of the Volta chip making their way into a broader array of more budget-conscious GPU options in the near future. The fact that the new Nvidia RTX technology is stated to require Volta architecture GPUs leads me to believe that they must be right on the horizon.

Nvidia also announced a new all-in-one GPU supercomputer, the DGX-2, which supports twice as many Tesla V100 GPUs (16) with twice as much RAM each (32GB) compared to the existing DGX-1. This provides 81,920 CUDA cores addressing 512GB of HBM2 memory, over a fabric of new NV-Link switches, as well as dual Xeon CPUs, Infiniband or 100GbE connectivity, and 32TB of SSD storage. This $400K supercomputer is marketed as the world’s largest GPU.
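The DGX-2 totals follow directly from the per-GPU figures. In the short sketch below, the per-GPU core count of 5,120 is the Tesla V100's published specification rather than a number stated in this paragraph, and the variable names are illustrative only.

```python
gpus = 16                # Tesla V100 GPUs in a DGX-2
cores_per_gpu = 5120     # CUDA cores per V100 (assumed from Nvidia's V100 spec)
memory_per_gpu_gb = 32   # HBM2 per GPU

total_cores = gpus * cores_per_gpu
total_memory_gb = gpus * memory_per_gpu_gb
print(total_cores, total_memory_gb)  # prints 81920 512
```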

Nvidia and their partners had a number of cars and trucks on display throughout the show, showcasing various pieces of technology that are being developed to aid in the pursuit of autonomous vehicles.

Also on display in the category of “actually graphics related” was the new Max-Q version of the mobile Quadro P4000, which is integrated into PNY’s first mobile workstation, the Prevail Pro. Besides supporting professional VR applications, the HDMI and dual DisplayPort outputs allow a total of three external displays up to 4K each. It isn’t the smallest or lightest 15-inch laptop, but it is the only system under 17 inches I am aware of that supports the P4000, which is considered the minimum spec for professional VR implementation.

There are, of course, lots of other vendors exhibiting their products at GTC. I had the opportunity to watch 8K stereo 360 video playing off of a laptop with an external GPU. I also tried out the VRHero 5K Plus enterprise-level HMD, which brings the VR experience to a whole other level. Much more affordable is TP-Cast’s $300 wireless upgrade for Vive and Rift HMDs, the first of many untethered VR solutions. HTC has also recently announced the Vive Pro, which will be available in April for $800. It increases the resolution by 1/3 in both dimensions to 2880×1600 total, and moves from HDMI to DisplayPort 1.2 and USB-C. Besides VR products, exhibitors also had all sorts of robots in various forms on display.
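For the Vive Pro numbers mentioned above, a one-third increase in each dimension adds up to considerably more than a third more pixels. The sketch below assumes the original Vive's combined resolution of 2160×1200, which is not stated in the text.

```python
from fractions import Fraction

# Combined (both eyes) resolution of the original Vive (assumed: 2160x1200).
old_w, old_h = 2160, 1200

# "Increases the resolution by 1/3 in both dimensions."
scale = 1 + Fraction(1, 3)
new_w, new_h = old_w * scale, old_h * scale
print(int(new_w), int(new_h))  # prints 2880 1600

# Total pixel count grows with the square of the per-axis scale.
pixel_ratio = scale ** 2
print(pixel_ratio)  # prints 16/9
```

A ratio of 16/9 means the Vive Pro drives roughly 78% more pixels than the original headset.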

Clearly the world of GPUs has extended far beyond the scope of accelerating computer graphics generation, and Nvidia is leading the way in bringing massive information processing to a variety of new and innovative applications. And if that leads us to hardware that can someday raytrace in realtime at 8K in VR, then I suppose everyone wins.


Mike McCarthy is an online editor/workflow consultant with 10 years of experience on feature films and commercials. He has been involved in pioneering new solutions for tapeless workflows, DSLR filmmaking and multi-screen and surround video experiences. Check out his site.

Axle Video rebrands as Axle AI

Media management company Axle Video has rebranded as Axle AI. The company has also launched their new Axle AI software, allowing users to automatically index and search large amounts of video, image and audio content.

Axle AI is available either as software, which runs on standard Mac hardware, or as a self-contained software/hardware appliance. Both options provide integrations with leading cloud AI engines. The appliance also includes embedded processing power that supports direct visual search for thousands of hours of footage with no cloud connectivity required. Axle AI has an open architecture, so new third-party capabilities can be added at any time.

Axle has also launched Axle Media Cloud with Wasabi, a 100% cloud-based option for simple media management. The offering is available now and is priced at $400 per month for 10 terabytes of managed storage, 10 user accounts and up to 10 terabytes of downloaded media per month.

In addition, Axle Embedded is a new version of Axle software that can be run directly on storage solutions from a range of industry partners, including G-Technology and Panasas. As with Axle Media Cloud, all of Axle AI’s automated tagging and search capabilities are simple add-ons to the system.