NAB
https://amplify.nabshow.com/articles/ai-compression-enters-the-frame/
In a world where online video use is soaring and bandwidth
remains at a premium, video compression is essential to keep the gears running
smoothly.
But conventional techniques have reached the end of the
line. The coding algorithm on which all major video compression schemes have
been based for 30+ years has been refined and refined, but it is still based on
the same original concept.
Even Versatile Video Coding (VVC), which MPEG is targeting at
‘next-gen’ immersive applications like 8K VR, is only an evolutionary step
forward from HEVC, itself a distant descendant of H.261, which dates from 1988.
What’s more, the physical capacity of the silicon chip is
reaching its limit too. Codecs are at an evolutionary cul-de-sac. What we need
is a new species.
AI codecs developed
As this article for RedShark News makes clear, the smarts of codec development are
being trained on artificial intelligence, machine learning, and neural
networks. They have the benefit of being software-based and therefore more
suited for an environment in which applications will run on generic hardware or
virtualised in the cloud.
Among companies with AI-based codecs is V-Nova. Its VC-6
codec, standardised as SMPTE ST 2117, can calculate bitrate to optimise bandwidth usage while maintaining an
appropriate level of quality at superspeed.
Nvidia’s Maxine system uses an AI to compress video for virtual collaborations like video
conferencing.
Haivision offers Lightflow Encode, which uses ML to analyse video content (per
title or per scene) to determine the optimal bitrate ladder and encoding
configuration for video.
This also uses a video quality metric called LQI, which
represents how well the human visual system perceives video content at
different bitrates and resolutions.
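To make the idea of a per-title bitrate ladder concrete, here is a minimal sketch in Python. It is not Haivision's algorithm, which the article does not describe. It assumes a set of test encodes has already been scored by some perceptual quality metric, and for each target bitrate it keeps the resolution that scored best. The function name, the data layout and all the numbers are hypothetical, and the scores are not LQI values.

```python
def build_ladder(measurements, target_bitrates_kbps):
    """For each target bitrate, keep the resolution with the best quality score."""
    ladder = []
    for bitrate in target_bitrates_kbps:
        # All test encodes produced at this bitrate, one per resolution.
        candidates = [m for m in measurements if m["bitrate_kbps"] == bitrate]
        if candidates:
            best = max(candidates, key=lambda m: m["quality"])
            ladder.append((bitrate, best["height"], best["quality"]))
    return ladder


# Hypothetical test-encode results for one title (illustrative numbers only).
MEASUREMENTS = [
    {"height": 540, "bitrate_kbps": 1000, "quality": 72.0},
    {"height": 1080, "bitrate_kbps": 1000, "quality": 61.0},
    {"height": 540, "bitrate_kbps": 4000, "quality": 84.0},
    {"height": 1080, "bitrate_kbps": 4000, "quality": 93.0},
]

if __name__ == "__main__":
    for bitrate, height, quality in build_ladder(MEASUREMENTS, [1000, 4000]):
        print(f"{bitrate} kbps -> {height}p (score {quality})")
```

In this made-up data the lower resolution wins at the low bitrate and the higher resolution wins at the high bitrate, which is the kind of content-dependent trade-off a per-title approach is meant to capture.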
Perceptual quality rather than ‘broadcast quality’ is
increasingly being used to rate video codecs and automate bit rate tuning.
Metrics like VMAF (Video Multimethod Assessment Fusion) combine human vision
modelling with machine learning and seek to understand how viewers perceive
content when streamed on a laptop, connected TV or smartphone.
It was originated by Netflix and is now open sourced.
Perceptual quality and VMAF
“VMAF can capture larger differences between codecs, as well
as scaling artifacts, in a way that’s better correlated with perceptual
quality,” Netflix explains. “It enables us to compare codecs
in the regions which are truly relevant.”
iSize Technologies has developed an encoder to capitalise on the trend for perceptual quality
metrics. Its bitrate saving and quality improvements are achieved by
incorporating a proprietary deep perceptual optimisation and
precoding technology as a pre-processing stage of a standard codec pipeline.
This ‘precoder’ stage enhances details of the areas of each
frame that affect the perceptual quality score of the content after encoding
and dials down details that are less important.
“Our perceptual optimisation algorithm seeks to understand
what part of the picture triggers our eyes and what we don’t notice at all,”
explains CEO Sergio Grce.
This not only keeps an organisation’s existing codec
infrastructure and workflow unchanged but is claimed to save 30 to 50 percent
on bitrate at the cost in latency of just 1 frame – making it suitable for live
as well as VOD.
The company has tested its technology against AVC, HEVC and VVC with
“substantial savings” in each case.
“Companies with planet-scale streaming services like YouTube
and Netflix have started to talk about hitting the tech walls,” says Grce.
“Their content is generating millions and millions of views but they cannot
adopt a new codec or build new data centres fast enough to cope with such an
increase in streaming demand.”
Using CNN
ML techniques which have been used heavily in image
recognition will be key to meeting the growing demand for video streaming that
we are seeing, according to Christian Timmerer, a co-founder of streaming
technology company Bitmovin and a member of the research project
Athena Christian Doppler Pilot Laboratory. The lab is currently preparing for
large-scale testing of a convolutional neural network (CNN) integrated into
production-style video coding solutions.
Timmerer’s team have proposed the use of CNNs to speed the encoding of ‘multiple representations’ of video.
In layperson’s terms, videos are stored in versions or ‘representations’ of
multiple sizes and qualities. The player, which is requesting the video content
from the server on which it resides, chooses the most suitable representation
based on whatever the network conditions are at the time.
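The selection step described above can be sketched in a few lines of Python. This is a simplified throughput-based rule, not the logic of any particular player: it picks the highest-bitrate representation that fits within a fraction of the measured throughput. The `Representation` type, the `safety` margin of 0.8 and the ladder values are all assumptions made for illustration.

```python
from typing import NamedTuple


class Representation(NamedTuple):
    height: int        # vertical resolution in pixels
    bitrate_kbps: int  # average encoded bitrate


def choose_representation(ladder, throughput_kbps, safety=0.8):
    """Pick the highest-bitrate representation that fits the measured throughput."""
    # Leave headroom so short dips in throughput do not stall playback.
    budget = throughput_kbps * safety
    affordable = [r for r in ladder if r.bitrate_kbps <= budget]
    if not affordable:
        # Nothing fits: fall back to the lowest-bitrate representation.
        return min(ladder, key=lambda r: r.bitrate_kbps)
    return max(affordable, key=lambda r: r.bitrate_kbps)


# Hypothetical ladder with three representations of the same video.
LADDER = [
    Representation(360, 800),
    Representation(720, 3000),
    Representation(1080, 6000),
]

if __name__ == "__main__":
    for throughput in (500, 5000, 10000):
        chosen = choose_representation(LADDER, throughput)
        print(f"{throughput} kbps available -> {chosen.height}p")
```

Real players typically also weigh buffer level and recent switching history, but the principle is the same: every representation in the ladder has to exist on the server before the player can choose it, which is why encoding them all quickly matters.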
In theory, this adds efficiency to the encoding and
streaming process. In practice, however, the most common approach for
delivering video over the Internet, HTTP Adaptive Streaming, is limited in its
ability to encode the same content at different quality levels.
“Fast multirate encoding approaches leveraging CNNs, we
found, may offer the ability to speed the process by referencing information
from previously encoded representations,” he explains. “Basing
performance on the fastest, not the slowest element in the process.”
IP protection and standards
There’s a body looking to wrap a framework around these and
future developments in media as well as applications in other industries. MPAI
(Moving Picture, Audio and Data Coding by Artificial Intelligence) was
founded by MPEG co-founder Leonardo Chiariglione.
He blogs about the 1997 match between IBM Deep Blue and Garry Kasparov, which made
headlines when the machine beat the man.
“As with IBM Deep Blue, old coding tools had a priori
statistical knowledge modelled and hardwired in the tools, but in AI, knowledge
is acquired by learning the statistics.
“This is the reason why AI tools are more promising than
traditional data processing tools. For a new age you need new tools and a new
organisation tuned to use those new tools.”