Wednesday 7 December 2016

Rise of the machines

Broadcast 

Using powerful pattern-recognition algorithms, artificial intelligence software is increasingly able to take on the heavy lifting of post-production, freeing up editors to concentrate on creative decisions.

p30-34

A science-fiction horror film about a bio-engineered being who exceeds the expectations of her creators provided fitting material for an experiment in editing using AI.

IBM’s AI system Watson was used as a proof of concept to whittle down the 90-minute Fox feature Morgan to six minutes of footage, which was then used by an editor to cut the official trailer. The entire process was claimed to take just 24 hours.

Watson, which is described by IBM as ‘a technology platform that uses natural language processing and machine learning to reveal insights from large amounts of unstructured data’, was fed hundreds of suspense and horror film trailers and taught to categorise them by type of location, framing, lighting and audio, such as ambient sounds, characters’ tone of voice and musical score. 

“We looked at patterns in horror movie trailers to teach Watson the horror movie domain, and it was able to cluster different kinds of clips into categories, such as tender or scary moments,” says IBM manager of multimedia and vision John Smith.
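IBM has not published how Watson’s clustering works, but the idea of sorting clips into mood categories from measured features can be sketched very simply. Everything here is invented for illustration: the feature vectors, the “tender”/“scary” centroids and the clip names are placeholders, and a real system would learn these from the trailer corpus rather than hard-code them.

```python
# Illustrative sketch only -- not IBM's model. Each clip is reduced to
# invented (brightness, music tempo, ambient noise) features on a 0-1
# scale, and a nearest-centroid rule assigns a mood label.

from math import dist

CENTROIDS = {
    "tender": (0.7, 0.2, 0.1),   # bright, slow, quiet (assumed profile)
    "scary":  (0.2, 0.8, 0.9),   # dark, fast, noisy (assumed profile)
}

def classify_clip(features):
    """Label a clip with the mood whose centroid is nearest (Euclidean)."""
    return min(CENTROIDS, key=lambda mood: dist(features, CENTROIDS[mood]))

clips = {
    "lab_reveal": (0.15, 0.9, 0.85),
    "bedtime":    (0.80, 0.1, 0.05),
}
moods = {name: classify_clip(f) for name, f in clips.items()}
# moods == {"lab_reveal": "scary", "bedtime": "tender"}
```

A production system would derive the centroids by clustering hundreds of trailers, as the article describes; the classification step, though, reduces to this kind of distance comparison.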

When fed the full-length feature Morgan, the system identified 10 moments that would be the best candidates for a trailer, nine of which were used by a craft editor for the final cut. 

“We decided not to use one clip because it was felt that it might convey information that could potentially be a spoiler. If we were working with a comedy, it would have a different set of parameters to select different types of moments.”

Software which automates part or all of the editing process is on the verge of breaking into professional post. Proponents argue that the technology can save time, and therefore cost, on anything from YouTube promos to fixed rig docs but does not yet threaten the role of the craft editor. Nonetheless, predictions of job losses will alarm Soho.

“There are two main and conflicting groups,” says Philip Hodgetts, co-founder of Lumberjack Systems. “One approach is to use Intelligent Augmentation to enhance human performance. The other uses Artificial Intelligence to replace the human editor, and ultimately all of us, with machines.”

If that sounds a little dramatic, then listen to the co-founder and CEO of Magisto, an AI-based consumer editing software developer with more than 80 million users.

“There are hundreds of tricks an editor uses every day to tell a story or evoke emotion,” says Dr. Oren Boiman. “Magisto employs those as algorithms.”

Cognitive computing in the production process boils down to two principles. The first is the use of mathematical formulae to scan audiovisual content and organise material according to a set of predefined parameters. The second is machine learning, which means that over time the capabilities of the software grow ever more powerful.
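The two principles can be sketched in a few lines, with heavy caveats: the feature names, weights and update rule below are all invented stand-ins, and real systems use far richer models. The point is only the shape of the loop: score shots against predefined parameters, then adjust those parameters from editor feedback.

```python
# Toy sketch of the two principles: (1) scan shots against predefined
# parameters, (2) "learn" by nudging those parameters from feedback.
# All numbers are illustrative, not from any real product.

PARAMS = {"faces": 0.5, "motion": 0.3, "speech": 0.2}  # starting weights

def score(shot):
    """Weighted sum of a shot's measured features (each 0-1)."""
    return sum(PARAMS[k] * shot.get(k, 0.0) for k in PARAMS)

def learn(shot, editor_kept, rate=0.1):
    """Crude update rule: boost the weights of features present in
    shots the editor kept; dampen them for shots that were cut."""
    sign = 1 if editor_kept else -1
    for k in PARAMS:
        PARAMS[k] = max(0.0, PARAMS[k] + sign * rate * shot.get(k, 0.0))

shot = {"faces": 1.0, "motion": 0.2, "speech": 0.8}
before = score(shot)
learn(shot, editor_kept=True)   # editor kept this shot
after = score(shot)             # after > before: the system adapts
```

The “exponential” improvement the vendors claim comes from running this loop over millions of edits, not from the arithmetic itself.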

“There is no doubt that computer algorithms will be involved in production’s future,” declares Hodgetts. “Today’s AI is very good at recognising patterns in a large dataset, but it is hard for AI to be creative enough to create good content. Smart people will work out how to master it.”

While auto-edit developers like Magisto have developed their own algorithms, Google, Microsoft and IBM are also developing APIs (application programming interfaces) to generate the metadata building blocks for AI. At a basic level this includes analysis of in-frame action and camera motion, speech-to-text transcription, and facial, object and even emotion detection. 
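Whatever service produces them, those analyses become useful once fused into a per-shot metadata record an editor can query. The sketch below shows that fusion step with hand-written stand-in data; the field names and log entries are invented, not output from any of the vendors’ real APIs.

```python
# Hedged sketch: fuse the kinds of per-shot analysis listed above
# (transcript, detected faces, camera motion) into one searchable log.
# The entries are hand-written stand-ins, not real API responses.

from dataclasses import dataclass, field

@dataclass
class ShotMetadata:
    timecode: str
    transcript: str = ""
    faces: list = field(default_factory=list)
    camera_motion: str = "static"

def search(log, term):
    """Return timecodes of shots whose transcript mentions the term."""
    return [s.timecode for s in log if term.lower() in s.transcript.lower()]

log = [
    ShotMetadata("00:00:10", "Morgan, stay calm", ["morgan", "lee"], "pan"),
    ShotMetadata("00:01:42", "she's learning fast", ["cheng"], "static"),
]
search(log, "learning")   # -> ["00:01:42"]
```

This is the “team of trained loggers” replacement in miniature: the expensive part is generating the metadata, but once it exists, finding footage is a lookup rather than a viewing session.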

Combinations of such data are claimed to serve up results as good as, if not better than, those of a team of trained loggers and assistants, and in a fraction of the time. 

“AI can definitely help in identifying and marking up the content to give the editor and director more time and space to make creative decisions,” says Boiman. “It is automating all the manual heavy lifting.”

The current market for auto-edit packages is consumers and action sports enthusiasts who lack the skill, time or inclination to edit video shot on GoPros or smartphones. Instead, they can upload raw footage to an application like Magisto or Antix and have it returned to them - with degrees of customisation - as a selection of highlights cut together into a storyline and packaged with effects, grading filters, even a music track.

The same techniques are also targeted at corporate communications and promos. “Making one cut of a promo was fine when TV was the only distribution medium but today that’s not good enough,” argues Boiman. “If you are creating a trailer for a newspaper website, Facebook, Snapchat, YouTube and Instagram, each one should be formatted differently. You might target by gender, age or by country. With so many variants of the same source media required to optimise every impression online, doing so manually is extremely inefficient.”
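The scale problem Boiman describes is combinatorial: one source cut multiplied by platforms and audience segments. The sketch below makes that multiplication concrete; the aspect ratios and duration limits are illustrative placeholders, not the platforms’ published requirements.

```python
# Sketch of the multi-variant problem: one source promo, one render job
# per (platform, audience) pair. Specs below are invented placeholders.

PLATFORM_SPECS = {
    "facebook":  {"aspect": "1:1",  "max_secs": 60},
    "snapchat":  {"aspect": "9:16", "max_secs": 10},
    "youtube":   {"aspect": "16:9", "max_secs": 120},
    "instagram": {"aspect": "4:5",  "max_secs": 30},
}

def plan_variants(source, audiences):
    """Expand one source file into a render job per platform/audience."""
    return [
        {"source": source, "platform": p, "audience": a, **spec}
        for p, spec in PLATFORM_SPECS.items()
        for a in audiences
    ]

jobs = plan_variants("promo_master.mov", ["18-24 UK", "25-34 US"])
len(jobs)   # 4 platforms x 2 audiences = 8 variants from one edit
```

Add a few more segments or territories and the job count quickly outruns what a human editor can cut by hand, which is the opening automated tools are aiming at.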

The exact same pressures, it’s argued, will see cognitive computing applied to reality or documentary content where shoot ratios are increasing and turnaround times decreasing.

“If sixty hours have been shot in a day, even a week, there is no way an editor will be expected to see the rushes,” says Hodgetts. “The only way to do this is to use modern technologies to automate the task. It’s not about replacing an editor but making life easier for them by pre-assembling the material in a coherent way.

“Lumberjack automates the logging and pre-editing stages - all the really boring stuff up to the point where an editor takes over,” he says.

Lumberjack has been used on O.J. Speaks: The Hidden Tapes for A&E, and by Denmark’s STV to assemble 69 ten-minute episodes of semi-scripted kids series Klassen. “STV found the basic scene organisation saved them an enormous amount of time such that they would not have been able to do this show without it,” says Hodgetts. 

His goal is to present a documentary fully constructed by Lumberjack to SMPTE-backed trade association HPA in 2018: “We’re on track to do it.” 

Post house Evolutions facilitates fixed rig shows like One Born.., Educating.. and Eden. “If an AI could track one subject’s time in hospital over many months and provide that as a sequence I’d be interested,” says head of operations Ricky Martin. “I’d be surprised, though, if it could keep a story arc going, particularly when we are following a dozen other subjects simultaneously. Would an AI understand what key moments would prove useful on larger scale projects, or would it miss storytelling moments when so much is happening in a frame?”

“There may be some advantage to being able to transcribe, but the best transcriptions are only 80% accurate,” says Jason Cowen, business development director, Forbidden Technologies. “It’s also useful to discount content where nothing is happening - therefore selecting material where something is happening - but the ability to recognise a ‘smile’ from a ‘wry smile’ for example, is the sort of nuance which creates valuable content and is very much a human trait.”

Forbidden has no plans to incorporate AI into its Forscene edit package but does have AI at the core of its encoding technology Blackbird. “Every compression tech is based on algorithms but ours takes this further by using AI to examine every frame a thousand times a second and to define the best compression technique based on that frame,” explains Cowen. “This means we can offer online access to content without latency anywhere in the world.”

Martin also sees a place for automation on more formulaic material such as quiz shows. “You could almost auto-cut these as things stand by punching in the timecode of when events [such as questions being asked] are going to happen,” he says. “When four shows are shot a day, the turnaround is very tight.”

So far, AI is not being considered for scripted content, which is an area “with very set workflows and high budgets to deal with so people are unwilling to take a risk,” according to Hodgetts. 

“AI is an assistive tool,” says IBM’s Smith. “We see a combination of computer and human expertise as the sweet spot. A computer is incredibly powerful at parsing large repositories of data or ‘watching’ hundreds of hours of video and distilling that down to a smaller set of things, while the editor brings a unique set of skills that AI cannot replicate at this time.”

Automated systems could, however, replace loggers, runners or edit assistants. “Jobs will be lost,” believes Hodgetts. “Those assistant editing roles where material is organised around a manual transcript will be reduced and probably eradicated completely. If you can teach any job in one or two days then it will probably be automated out of existence in five to ten years.”

He suggests that this will happen faster in Europe than Hollywood. “Europe has less union involvement [than the U.S] and tends to be more willing to look at new solutions.”

Adam Theobald, founder of auto-edit software Antix, admits that trying to sell the app into the video editing community is a challenge. “They put up a very big barrier because they are afraid of it replacing their jobs,” he says. “Yet we can demonstrate how our technology does the logistics and sourcing of content so that editors can concentrate on applying their creative process.”

While dismissing the idea of AI replacement as unlikely, Cowen agrees that it could raise the bar on quality. “Should it ever happen that an AI can take the grunt work out of post, the art of editing and grading will simply advance,” he says. “If we think about the quality of reality TV compared to drama there is more headroom for improvement but this would be a human role.”

Boiman insists that AI offers “editing superpowers” that let craft editors do much more. “Professional video production tools like Avid are extremely sophisticated but dumb,” he contends. “They give the user ultimate control over the video, but such control forces you to do everything. 

“There’s a huge gulf between conventional professional video production which is one hundred per cent manual labour and software like ours which can produce entirely automated packages. There is no doubt the gap will close.”



AI edit tools 

Antix
Prosumer application with an emphasis on action sports. Its AI captures data from wearables such as heart-rate sensors, GPS, accelerometers and gyroscopes. “Combined with other contextual data, this can be used to create storylines with emotion,” says founder Adam Theobald. “For example, we can find clips where the subject has generated a faster heartbeat and those are more likely to represent a physical reaction.”
Users can customise the AI’s output. “Rather than just pumping video in and out we’re keen on letting the user put their own stamp on content because this will encourage others to share it more on social media.”

Magisto
Prosumer application that incorporates professional effects and transitions such as zooming in on important actions, panning to give movement to static shots and lowering the volume of the music when someone is talking. It applies colour correction, image stabilisation and other image enhancement technologies.
“We understand that AI cannot read the mind of a producer or marketer, not least because they may not know what they want from the process at the beginning,” explains Boiman. “We invite them to refine the initial draft by telling the AI, for example, to be more dramatic, or add highlights, incorporate stock footage or apply a new grade. In less than an hour we can have a very polished video.”

Lumberjack
Professional application that works with Final Cut Pro X and uses a Keyword Extraction API from Monkey Learn. On location, one or more people log keyword ranges via a web-enabled device. These keywords are tied to the media and allow a producer or editor to set the AI to look for keywords, concepts and emotions relevant to where they think the story will unfold. “From the analysis of rushes you could immediately see that you had, for example, 18 minutes of a certain keyword and two minutes of another,” outlines Hodgetts. “That could suggest that you look for the story in the 18 minutes selection, or, if the keyword with two minutes is important to you, that indicates that you need to find more of that material.”
A Transcript mode extracts speech to text to save time searching for footage. 
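The duration summary Hodgetts outlines (18 minutes of one keyword versus two of another) is, at heart, a sum over the logged keyword ranges. The sketch below shows that step with invented log entries; Lumberjack’s actual data model is not public.

```python
# Sketch of the keyword-duration summary described above: sum logged
# keyword ranges so an editor sees at a glance where the material lies.
# Log entries are invented examples, not Lumberjack's real format.

from collections import defaultdict

def keyword_durations(ranges):
    """ranges: (keyword, start_secs, end_secs) tuples from on-set logging.
    Returns total seconds per keyword, largest first."""
    totals = defaultdict(float)
    for keyword, start, end in ranges:
        totals[keyword] += end - start
    return dict(sorted(totals.items(), key=lambda kv: -kv[1]))

log = [
    ("interview", 0, 540), ("b-roll", 540, 660),
    ("interview", 660, 1200), ("argument", 1200, 1320),
]
keyword_durations(log)
# -> {"interview": 1080.0, "b-roll": 120.0, "argument": 120.0}
```

An 18-minute total suggests the story lives there; a two-minute total for a keyword the producer cares about is a flag to shoot more, exactly the reading Hodgetts describes.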
