Artificial intelligence for video compression is a
technology that is coming to a streaming service near you, and it can't arrive
quickly enough.
https://www.redsharknews.com/ai-video-compression-is-arriving-not-a-moment-too-soon
A year ago, everyone decamped home overnight and
overheated global demand for the internet. In a selfless act, Netflix,
YouTube and Disney+ dialled down their bitrates to ease bandwidth consumption,
in the process deliberately compromising the ultimate quality of their service
(for about a month).
That immediate crisis may have subsided, but in a world where
online video use is soaring and bandwidth remains at a premium, some longer
term solution is required. Even in a world with universal 5G, bandwidth is not
an infinite resource. Not when 5G promises bandwidth-hogging, video-centric
applications like 8K VR.
New video compression technologies are the conventional
answer, but the ‘Moore’s Law’ of codec development has reached the end of the
line. The coding algorithm has been tweaked over and over, but it is still
based on the same original scheme.
Even the great new hope, Versatile Video Coding (VVC), which MPEG
is targeting at ‘next-gen’ immersive applications, is only an evolutionary step
forward from HEVC, itself generations on from the neanderthal H.261 of 1988.
It’s not only the concept which has reached its limit. So
too has physical capacity on a silicon chip. Codecs are at an evolutionary
cul-de-sac. What we need is a new species.
AI compression enters the frame
The smarts of codec development are being trained on
artificial intelligence, machine learning, and neural networks.
AI/ML techniques fundamentally differ from traditional
methods because they can solve multi-dimensional issues that are difficult to
model mathematically. They are also software-based and therefore more suited
for an environment in which applications will run on generic hardware or
virtualised in the cloud.
“We think that you could use AI to retain essentially the
same schema as currently but using some AI modules,” says Lionel Oisel,
director, Imaging Science Lab, InterDigital, which owns patents in HEVC and VVC.
“This would be quite conservative and be pushed by the more cost-conscious
manufacturers. We also think that we could throw the existing schema away and
start again using a complete end-to-end chain for AI - a neural network design.”
Some vendors have used ML to optimise the selection of encoding
parameters, and others have incorporated techniques at a much deeper level, for
example, to assist with the prediction of elements of output frames.
First AI-driven solutions
V-Nova lays claim to being the first company to have
standardised an AI-based codec. It teamed with Metaliquid, a video analysis
provider, to build V-Nova’s codec Perseus Pro into an AI solution for
contribution workflows, now enshrined as VC-6 (SMPTE standard 2117).
AI algorithms can calculate bitrate to optimise bandwidth
usage while maintaining an appropriate level of quality, at speed.
Nvidia’s Maxine system uses AI to compress video
for very low bandwidth video conferencing.
Haivision offers Lightflow Encode which uses ML to analyse
video content (per title or per scene), to determine the optimal bitrate ladder
and encoding configuration for video.
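Haivision does not publish Lightflow’s internals, but the general per-title idea can be sketched in a few lines: score a title’s visual complexity, then scale a reference bitrate ladder accordingly. The complexity mapping and the ladder values below are illustrative assumptions, not Lightflow’s actual model.

```python
# Illustrative per-title ladder selection (NOT Haivision's actual algorithm).
# A content-complexity score in [0, 1] scales a reference bitrate ladder:
# simple content (e.g. animation) needs fewer bits than complex content (e.g. sport).

REFERENCE_LADDER = [  # (height, kbps) for content of average complexity
    (1080, 5000),
    (720, 3000),
    (480, 1500),
    (360, 800),
]

def per_title_ladder(complexity, ladder=REFERENCE_LADDER):
    """Scale each rung's bitrate by content complexity (0 = trivial, 1 = very complex)."""
    factor = 0.5 + complexity  # map 0..1 complexity to a 0.5x..1.5x bitrate multiplier
    return [(height, round(kbps * factor)) for height, kbps in ladder]
```

A real per-title system would derive the complexity score from encoder statistics or a trained model (per title or per scene) rather than take it as an input.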
Perceptual optimisation
Lightflow uses a video quality metric called LQI, which represents
how well the human visual system perceives video content at different bitrates
and resolutions. Haivision claims this results in “significant”
bitrate reductions and “perceptual quality improvements, ensuring that an
optimised cost-quality value is realised.”
Perceptual quality rather than ‘broadcast quality’ is
increasingly being used to rate video codecs and automate bitrate tuning.
Metrics like VMAF (Video Multi-method Assessment Fusion) combine human vision
modelling with machine learning and seek to understand how viewers perceive
content when streamed on a laptop, connected TV or smartphone.
VMAF was originated by Netflix and is now open source.
“VMAF can capture larger differences between codecs, as well
as scaling artifacts, in a way that’s better correlated with perceptual
quality,” Netflix explains. “It enables us to compare codecs in the
regions which are truly relevant.”
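The ‘fusion’ in VMAF’s name refers to combining several elementary quality features (in the real metric: VIF, DLM and a motion feature) with a regressor trained on subjective scores. A toy linear stand-in, with made-up weights and feature values, shows the shape of the idea:

```python
# Toy illustration of VMAF-style metric fusion. The real VMAF fuses its
# features with a trained support-vector regression; the weights and
# feature values here are invented purely to show the structure.

def fuse_quality(features, weights, bias=0.0):
    """Linear stand-in for VMAF's learned fusion; clamps to the 0-100 scale."""
    score = bias + sum(weights[name] * value for name, value in features.items())
    return max(0.0, min(100.0, score))

# Hypothetical per-frame feature values, each pre-scaled to 0..1:
frame_features = {"vif": 0.92, "dlm": 0.88, "motion": 0.35}
made_up_weights = {"vif": 55.0, "dlm": 45.0, "motion": -5.0}

score = fuse_quality(frame_features, made_up_weights)  # one 0-100 quality score
```

In practice VMAF is computed with Netflix’s open-source library or FFmpeg’s `libvmaf` filter rather than reimplemented.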
ML techniques, which have been used heavily in image
recognition, will be key to meeting the growing demand for video streaming,
according to Christian Timmerer, a co-founder of streaming
technology company Bitmovin and a member of the Athena Christian Doppler
Pilot Laboratory research project. The lab is currently preparing for
large-scale testing of a convolutional neural network (CNN) integrated into production-style
video coding solutions.
In a paper recently presented to the IEEE,
Timmerer’s team proposed the use of CNNs to speed up the encoding of ‘multiple
representations’ of video. In layperson’s terms, videos are stored in versions
or ‘representations’ of multiple sizes and qualities. The player, which is
requesting the video content from the server on which it resides, chooses the
most suitable representation based on whatever the network conditions are at
the time.
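The player-side choice described above can be sketched as a simple rule: pick the highest advertised bitrate that fits within a safety-discounted throughput estimate. The bitrates and safety factor here are illustrative; real players such as dash.js or hls.js add buffer-based smoothing and quality-switch damping on top.

```python
# Minimal sketch of adaptive-streaming representation selection.
# The ladder values and the 0.8 safety margin are illustrative assumptions.

REPRESENTATIONS_KBPS = [800, 1500, 3000, 5000]  # one entry per encoded version

def choose_representation(throughput_kbps, safety=0.8):
    """Return the bitrate of the best representation that fits the network."""
    budget = throughput_kbps * safety  # keep headroom for throughput variance
    fitting = [r for r in sorted(REPRESENTATIONS_KBPS) if r <= budget]
    return fitting[-1] if fitting else min(REPRESENTATIONS_KBPS)
```

When nothing fits the budget, the player falls back to the lowest rung rather than stall with no stream at all.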
In theory, this adds efficiency to the encoding and
streaming process. In practice, however, the most common approach for
delivering video over the internet - HTTP Adaptive Streaming - is limited by
the need to encode the same content at multiple quality levels.
“Fast multirate encoding approaches leveraging CNNs, we
found, may offer the ability to speed the process by referencing information
from previously encoded representations,” he explains. “Basing
performance on the fastest, not the slowest element in the process.”
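The speed-up Timmerer describes can be illustrated with toy numbers (the cost units and the reuse factor below are assumptions, not figures from the paper): fully analyse one reference representation, then let every other representation reuse its block-level decisions instead of repeating the full search.

```python
# Toy cost model for fast multirate encoding. Units are arbitrary;
# the point is that reuse makes total cost grow at the cheap rate.

FULL_SEARCH_COST = 100  # full block-partition / mode search per representation
REUSE_COST = 30         # cheaper: search bounded by the reference's decisions

def multirate_cost(n_representations, reuse):
    """Total analysis cost for encoding n representations of one title."""
    if not reuse or n_representations == 0:
        return FULL_SEARCH_COST * n_representations
    # One full analysis pass; the remaining representations reuse its decisions.
    return FULL_SEARCH_COST + REUSE_COST * (n_representations - 1)
```

With four rungs on the ladder, reuse roughly halves the total analysis work in this model, and the saving grows with every additional representation.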
iSIZE steps up
London-based startup iSIZE Technologies has
developed an encoder to capitalise on the trend for perceptual quality metrics
such as VMAF. Its bitrate saving and quality improvements are achieved by
incorporating a proprietary deep perceptual optimisation and
precoding technology as a preprocessing stage of a standard codec
pipeline.
This ‘precoder’ stage enhances details of the areas of each
frame that affect the perceptual quality score of the content after encoding
and dials down details that are less important.
“Our perceptual optimisation algorithm seeks to understand
what part of the picture triggers our eyes and what we don’t notice at
all,” explains Sergio Grce, company CEO.
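iSIZE’s model is proprietary, but the precoding idea itself is easy to sketch: given a per-block saliency estimate, boost detail where viewers look and attenuate it where they don’t, before the frame reaches an unmodified standard encoder. Everything below (the block representation, the saliency values, the scaling rule) is an illustrative assumption.

```python
# Toy perceptual precoder (not iSIZE's actual model). Each block's detail
# level is scaled by its saliency: boosted where the eye is drawn,
# attenuated where it is not, so the downstream encoder spends bits
# where viewers will notice them.

def precode(block_detail, saliency, strength=0.5):
    """Scale each block's detail level by its saliency (both lists of values in 0..1)."""
    return [
        detail * (1.0 + strength * (2.0 * s - 1.0))  # boost if s > 0.5, attenuate if s < 0.5
        for detail, s in zip(block_detail, saliency)
    ]
```

Because this runs purely as a preprocessing stage, the output is still fed to a standard AVC, HEVC or VVC encoder, which is what lets existing infrastructure stay unchanged.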
This not only keeps an organisation’s existing codec
infrastructure and workflow unchanged, but is claimed to save 30 to 50 percent
on bitrate at a latency cost of just one frame, making it suitable for live
as well as VOD.
The company has tested its technology against AVC, HEVC and
VVC with “substantial savings” in each case.
“Companies with planet-scale streaming services like YouTube
and Netflix have started to talk about hitting the tech walls,” says Grce. “Their
content is generating millions and millions of views, but they cannot adopt a
new codec or build new data centres fast enough to cope with such an increase
in streaming demand.”
Old problem, new tools
Even MPEG co-founder Leonardo Chiariglione saw the writing
on the wall. He left the body in 2019 to found MPAI – Moving Picture, Audio
and Data Coding by Artificial Intelligence.
MPAI is an international non-profit organisation whose
mission is to develop AI-enabled digital data compression specifications
with clear Intellectual Property Rights (IPR) licensing frameworks – that is,
unlike MPEG in its latter days.
In 1997 the match between IBM Deep Blue and Garry Kasparov
made headlines. Machine beat man.
“As with IBM Deep Blue, old coding tools had a priori
statistical knowledge modelled and hardwired in the tools, but in AI, knowledge
is acquired by learning the statistics,” Chiariglione says.
“This is the reason why AI tools are more promising than
traditional data processing tools. For a new age you need new tools and a new
organisation tuned to use those new tools.”