Tag Archives: machine learning

AI for M&E: Should you take the leap?

By Nick Gold

In Hollywood, the promise of artificial intelligence is all the rage. Who wouldn’t want a technology that adds the magic of AI to smarter computers for an instant solution to tedious, time-intensive problems? With artificial intelligence, anyone with abundant rich media assets can easily churn out more revenue or cut costs, while simplifying operations … or so we’re told.

If you attended IBC, you probably already heard the pitch: “It’s an ‘easy’ button that’s simple to add to the workflow and foolproof to operate, turning your massive amounts of uncategorized footage into metadata.”

But should you take the leap? Before you sign on the dotted line, take a closer look at the technology behind AI and what it can — and can’t — do for you.

First, it’s important to understand the bigger picture of artificial intelligence in today’s marketplace. Taking unstructured data and generating relevant metadata from it is something that other industries have been doing for some time. In fact, many of the tools we embrace today started off in other industries. But unlike banking, finance or healthcare, our industry prioritizes creativity, which is why we have always shied away from tools that automate. The idea that we can rely on the same technology as a hedge fund manager just doesn’t sit well with many people in our industry, and for good reason.

Nick Gold talks AI for a UCLA Annex panel.

In the media and entertainment industry, we’re looking for various types of metadata that could include a transcript of spoken words, important events within a period of time or information about the production (e.g., people, location, props), and currently there’s no single machine-learning algorithm that will solve for all these types of metadata parameters. For that reason, the best starting point is to define your problems and identify which machine learning tools may be able to solve them. Expecting to parse reams of untagged, uncategorized and unstructured media data is unrealistic until you know what you’re looking for.

What works for M&E?
AI has become pretty good at solving some specific problems for our industry. Speech-to-text is one of them. With AI, extracting data from a generally accurate transcription offers an automated solution that saves time. However, it’s important to note that AI tools still have limitations. An AI tool, known as “sentiment analysis,” could theoretically look for the emotional undertones described in spoken word, but it first requires another tool to generate a transcript for analysis.

But no matter how good the algorithms are, they won’t give you the qualitative data that a human observer would provide, such as the emotions expressed through body language. They won’t tell you the facial expressions of the people being spoken to, or the tone of voice, pacing and volume level of the speaker, or what is conveyed by a sarcastic tone or a wry expression. There are sentiment analysis engines that try to do this, but breaking down the components ensures the parameters you need will be addressed and solved.

Another task at which machine learning has progressed significantly is logo recognition. Certain engines are good at finding, for example, all the images with a Coke logo in 10,000 hours of video. That’s impressive and quite useful, but it’s another story if you want to also find footage of two people drinking what are clearly Coke-shaped bottles where the logo is obscured. That’s because machine-learning engines tend to have a narrow focus, which goes back to the need to define very specifically what you hope to get from it.

There are a bevy of algorithms and engines out there. If you license a service that will find a specific logo, then you haven’t solved your problem for finding objects that represent the product as well. Even with the right engine, you’ve got to think about how this information fits in your pipeline, and there are a lot of workflow questions to be explored.

Let’s say you’ve generated speech-to-text with audio media, but have you figured out how someone can search the results? There are several options. Sometimes vendors have their own front end for searching. Others may offer an export option from one engine into a MAM that you either already have on-premise or plan to purchase. There are also vendors that don’t provide machine learning themselves but act as a third-party service organizing the engines.

It’s important to remember that none of these AI solutions are accurate all the time. You might get a nudity detection filter, for example, but these vendors rely on probabilistic results. If having one nude image slip through is a huge problem for your company, then machine learning alone isn’t the right solution for you. It’s important to understand whether occasional inaccuracies will be acceptable or deal breakers for your company. Testing samples of your core content in different scenarios for which you need to solve becomes another crucial step. And many vendors are happy to test footage in their systems.

Although machine learning is still in its nascent stages, there is a lot of interest in learning how to make it work in the media workflow. It can do some magical things, but it’s not a magic “easy” button (yet, anyway). Exploring the options and understanding in detail what you need goes hand-in-hand with finding the right solution to integrate with your workflow.


Nick Gold is lead technologist for Baltimore’s Chesapeake Systems, which specializes in M&E workflows and solutions for the creation, distribution and preservation of content. Active in both SMPTE and the Association of Moving Image Archivists (AMIA), Gold speaks on a range of topics. He also co-hosts the Workflow Show Podcast.
 

Dell EMC’s ‘Ready Solutions for AI’ now available

Dell EMC has made available its new Ready Solutions for AI, with specialized designs for Machine Learning with Hadoop and Deep Learning with Nvidia.

Dell EMC Ready Solutions for AI eliminate the need for organizations to individually source and piece together their own solutions. They offer a Dell EMC-designed and validated set of best-of-breed technologies for software — including AI frameworks and libraries — with compute, networking and storage. Dell EMC’s portfolio of services include consulting, deployment, support and education.

Dell EMC’s Data Science Provisioning Portal offers an intuitive GUI that provides self-service access to hardware resources and a comprehensive set of AI libraries and frameworks, such as Caffe and TensorFlow. This reduces the steps it takes to configure a data scientist’s workspace to five clicks. Ready Solutions for AI’s distributed, scalable architecture offers the capacity and throughput of Dell EMC Isilon’s All-Flash scale-out design, which can improve model accuracy with fast access to larger data sets.

Dell EMC Ready Solutions for AI: Deep Learning with Nvidia solutions are built around Dell EMC PowerEdge servers with Nvidia Tesla V100 Tensor Core GPUs. Key features include Dell EMC PowerEdge R740xd and C4140 servers with four Nvidia Tesla V100 SXM2 Tensor Core GPUs; Dell EMC Isilon F800 All-Flash Scale-out NAS storage; and Bright Cluster Manager for Data Science in combination with the Dell EMC Data Science Provisioning Portal.

Dell EMC Ready Solutions for AI: Machine Learning with Hadoop includes an optimized solution stack, along with data science and framework optimization to get up and running quickly, and it allows expansion of existing Hadoop environments for machine learning.

Key features include Dell EMC PowerEdge R640 and R740xd servers; Cloudera Data Science Workbench for self-service data science for the enterprise; the Apache Spark open source unified data analytics engine; and the Dell EMC Data Science Provisioning Engine, which provides preconfigured containers that give data scientists access to the Intel BigDL distributed deep learning library on the Spark framework.

New Dell EMC Consulting services are available to help customers implement and operationalize the Ready Solution technologies and AI libraries, and scale their data engineering and data science capabilities. Dell EMC Education Services offers courses and certifications on data science and advanced analytics and workshops on machine learning in collaboration with Nvidia.

Dell makes updates to its Precision mobile workstation line

Recently, Dell made updates to its line of Precision mobile workstations targeting the media and entertainment industries. The Dell Precision 7730 and 7530 mobile workstations feature the latest eighth-generation IntelCore and Xeon processors, AMD Radeon WX and Nvidia Quadro professional graphics, 3200MHz SuperSpeed memory and memory capacity up to 128GB.

The Dell Precision 7530 is a 15-inch VR-ready mobile workstation with large PCIe SSD storage capacity, especially for a 15-inch mobile workstation — up to 6TB. Dell says the 7730 enables new uses such as AI and machine learning development and edge inference systems.

Also new is the 15-inch Dell Precision 5530 two-in-one, which targets content creation and editing and features a very thin design. A flexible 360-degree hinge enables multiple modes of interaction, including support for touch and pen. It features the next-generation InfinityEdge 4K Ultra HD display. The Dell Premium pen offers precise pressure sensitivity (4,096 pressure points), tilt functionality and low latency for an experience that is reminiscent of drawing on paper. The new MagLev keyboard design reduces keyboard thickness “without compromising critical keyboard shortcuts in content creation workflows,” and ultra-thin GORE Thermal Insulation keeps the system cool.

This workstation weighs 3.9 pounds and delivers next-generation professional graphics up to Nvidia Quadro P2000. With enhanced 2666MHz memory speeds up to 32GB, users can accelerate their complicated workflows. And with up to 4TB of SSD storage, users can access, transfer and store large 3D, video and multimedia files quickly and easily.

The fully customizable 15-inch Dell Precision 3530 mobile workstation features eighth-generation Intel Core and next-generation Xeon processors, memory speeds up to 2666MHz and Nvidia Quadro P600 professional graphics. It also features a 92WHr battery and wide range of ports, including HDMI 2.0, Thunderbolt and VGA.

GTC embraces machine learning and AI

By Mike McCarthy

I had the opportunity to attend GTC 2018, Nvidia‘s 9th annual technology conference in San Jose this week. GTC stands for GPU Technology Conference, and GPU stands for graphics processing unit, but graphics makes up a relatively small portion of the show at this point. The majority of the sessions and exhibitors are focused on machine learning and artificial intelligence.

And the majority of the graphics developments are centered around analyzing imagery, not generating it. Whether that is classifying photos on Pinterest or giving autonomous vehicles machine vision, it is based on the capability of computers to understand the content of an image. Now DriveSim, Nvidia’s new simulator for virtually testing autonomous drive software, dynamically creates imagery for the other system in the Constellation pair of servers to analyze and respond to, but that is entirely machine-to-machine imagery communication.

The main exception to this non-visual usage trend is Nvidia RTX, which allows raytracing to be rendered in realtime on GPUs. RTX can be used through Nvidia’s OptiX API, as well as Microsoft’s DirectX RayTracing API, and eventually through the open source Vulkan cross-platform graphics solution. It integrates with Nvidia’s AI Denoiser to use predictive rendering to further accelerate performance, and can be used in VR applications as well.

Nvidia RTX was first announced at the Game Developers Conference last week, but the first hardware to run it was just announced here at GTC, in the form of the new Quadro GV100. This $9,000 card replaces the existing Pascal-based GP100 with a Volta-based solution. It retains the same PCIe form factor, the quad DisplayPort 1.4 outputs and the NV-Link bridge to pair two cards at 200GB/s, but it jumps the GPU RAM per card from 16GB to 32GB of HBM2 memory. The GP100 was the first Quadro offering since the K6000 to support double-precision compute processing at full speed, and the increase from 3,584 to 5,120 CUDA cores should provide a 40% increase in performance, before you even look at the benefits of the 640 Tensor Cores.

Hopefully, we will see simpler versions of the Volta chip making their way into a broader array of more budget-conscious GPU options in the near future. The fact that the new Nvidia RTX technology is stated to require Volta architecture CPUs leads me to believe that they must be right on the horizon.

Nvidia also announced a new all-in-one GPU supercomputer — the DGX-2 supports twice as many Tesla V100 GPUs (16) with twice as much RAM each (32GB) compared to the existing DGX-1. This provides 81920 CUDA cores addressing 512GB of HBM2 memory, over a fabric of new NV-Link switches, as well as dual Xeon CPUs, Infiniband or 100GbE connectivity, and 32TB of SSD storage. This $400K supercomputer is marketed as the world’s largest GPU.

Nvidia and their partners had a number of cars and trucks on display throughout the show, showcasing various pieces of technology that are being developed to aid in the pursuit of autonomous vehicles.

Also on display in the category of “actually graphics related” was the new Max-Q version of the mobile Quadro P4000, which is integrated into PNY’s first mobile workstation, the Prevail Pro. Besides supporting professional VR applications, the HDMI and dual DisplayPort outputs allow a total of three external displays up to 4K each. It isn’t the smallest or lightest 15-inch laptop, but it is the only system under 17 inches I am aware of that supports the P4000, which is considered the minimum spec for professional VR implementation.

There are, of course, lots of other vendors exhibiting their products at GTC. I had the opportunity to watch 8K stereo 360 video playing off of a laptop with an external GPU. I also tried out the VRHero 5K Plus enterprise-level HMD, which brings the VR experience to whole other level. Much more affordable is TP-Cast’s $300 wireless upgrade Vive and Rift HMDs, the first of many untethered VR solutions. HTC has also recently announced the Vive Pro, which will be available in April for $800. It increases the resolution by 1/3 in both dimensions to 2880×1600 total, and moves from HDMI to DisplayPort 1.2 and USB-C. Besides VR products, they also had all sorts of robots in various forms on display.

Clearly the world of GPUs has extended far beyond the scope of accelerating computer graphics generation, and Nvidia is leading the way in bringing massive information processing to a variety of new and innovative applications. And if that leads us to hardware that can someday raytrace in realtime at 8K in VR, then I suppose everyone wins.


Mike McCarthy is an online editor/workflow consultant with 10 years of experience on feature films and commercials. He has been involved in pioneering new solutions for tapeless workflows, DSLR filmmaking and multi-screen and surround video experiences. Check out his site.