NAB News
Artificial Intelligence and Machine Learning are the great
hope for video compression if data-hungry applications such as 8K and true
freedom-to-roam Virtual Reality are to come to pass. However, such techniques
burn through computer processing and power, so one area of investigation is
making those compute costs more sustainable.
https://amplify.nabshow.com/articles/ai-encoding-is-evolving-everywhere/
Headway is being made, according to Thierry Fautier, VP of
video strategy at Harmonic. A first phase focuses on AI/ML techniques applied
to existing codecs such as AVC, HEVC, AV1, and AVS3; a second phase will
target newer codecs like VVC and AV2.
Explaining progress to the estimable Chris Chinnock of
the 8K Association, Fautier says Harmonic has already deployed the first
version of such an AI-assisted encoding scheme, which it calls Content Aware
Encoding (CAE).
The idea is to use AI and the mechanics of the human visual
system to “continuously assess video quality in real-time and focus bits where
and when they matter most for the viewer.” Exactly how the algorithm works
remains confidential, but Fautier says their operator customers see up to a 40%
bit rate reduction for comparable quality when implementing CAE.
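To make the content-aware idea concrete, here is a minimal Python sketch of how a per-segment bit allocator might work. Since Harmonic keeps the actual algorithm confidential, the complexity metric, the bitrate range, and the numbers below are purely illustrative assumptions, not CAE itself:

```python
import numpy as np

def segment_complexity(frames: np.ndarray) -> float:
    """Crude spatio-temporal complexity proxy: mean inter-frame difference
    plus mean spatial gradient (a stand-in for the human-visual-system
    model the article says CAE uses)."""
    f = frames.astype(float)
    temporal = np.abs(np.diff(f, axis=0)).mean()
    spatial = np.abs(np.gradient(f, axis=2)).mean()
    return float(temporal + spatial)

def allocate_bitrates(segments, floor_kbps=800, cap_kbps=8000):
    """Give easy scenes fewer bits and busy scenes more, instead of one
    fixed bitrate for the whole title. Bitrate bounds are illustrative."""
    scores = np.array([segment_complexity(s) for s in segments])
    norm = scores / scores.max()  # 0..1 relative complexity
    return floor_kbps + norm * (cap_kbps - floor_kbps)

# Two toy 10-frame grayscale "segments": a static ramp and random motion.
rng = np.random.default_rng(0)
static = np.tile(np.linspace(0, 255, 128)[None, None, :], (10, 72, 1))
busy = rng.integers(0, 255, (10, 72, 128))
print(allocate_bitrates([static, busy]))  # the busy segment gets more bits
```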
“There are now over 100 CAE deployments worldwide using AVC
and HEVC mostly for OTT services,” noted Fautier, “and we have shown it can
reduce bit rates for 8K using HEVC during the French Open trial we did in 2019
with France Televisions.”
Other AI techniques using existing codecs fall into two
categories: implementations that require a big increase in CPU usage, and
techniques like Convolutional Neural Networks (CNNs) that are being studied in
groups like MPEG.
According to Chinnock, the focus with CNN solutions is to
shift more of the compute power to the client side to save bandwidth.
Researchers are therefore trying to figure out how to balance the load between
AI-based algorithms that run on a neural network vs. the GPU/CPU processing
needed for the raw encoding.
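As a rough illustration of what that client-side load means in practice, the sketch below applies a single hand-written 3x3 convolution to a decoded frame, standing in for a learned CNN post-filter; every pixel processed this way is compute the client, not the head-end, must supply. The kernel and frame sizes are hypothetical:

```python
import numpy as np

def cnn_post_filter(decoded: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """One 3x3 convolution over a grayscale frame (valid padding). A real
    learned post-filter would stack many such layers with trained weights."""
    h, w = decoded.shape
    kh, kw = kernel.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(decoded[i:i + kh, j:j + kw] * kernel)
    return np.clip(out, 0, 255)

# Hypothetical sharpening kernel standing in for learned weights.
sharpen = np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]], dtype=float)
frame = np.random.default_rng(1).integers(0, 255, (72, 128)).astype(float)
enhanced = cnn_post_filter(frame, sharpen)  # each pixel costs client compute
```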
“It is important to understand that AI techniques are based
on a learning process (supervised or not) where a considerable CPU budget is
used,” Chinnock reports. “One must also consider the CPU power used at run time
to try to limit its impact when using an AI-based technique. Netflix and some others
are using AI to make exhaustive encodes of all the parameter combinations and
deduce the best set of resolution-bit rate combinations. This is very accurate
but is also very CPU intensive and therefore not applicable to live
applications. It is also not very green in terms of carbon footprint or in
terms of dollars spent.”
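The exhaustive approach Chinnock describes can be sketched as a brute-force search over an encoding ladder. In the hedged example below, encode_and_score() is a hypothetical placeholder (a real pipeline would run an encoder and a quality metric such as VMAF); the point is that every resolution/bitrate pair must be encoded and scored, which is exactly the CPU cost he flags:

```python
from itertools import product

RESOLUTIONS = [(1920, 1080), (1280, 720), (960, 540)]
BITRATES_KBPS = [800, 1500, 3000, 6000]

def encode_and_score(title: str, resolution, bitrate_kbps) -> float:
    """Hypothetical stand-in for a real encode plus a metric such as VMAF.
    Toy model: quality rises with bits per pixel but is capped by an
    upscaling penalty at lower resolutions."""
    pixels = resolution[0] * resolution[1]
    bpp = bitrate_kbps * 1000 / (pixels * 30)        # assume a 30 fps source
    ceiling = 100 * (pixels / (1920 * 1080)) ** 0.3  # upscaling penalty
    return ceiling * bpp / (bpp + 0.05)              # saturating curve

def build_ladder(title: str):
    """Exhaustive search: score every resolution/bitrate pair (this is the
    CPU-hungry part), then keep the best resolution at each bitrate."""
    best = {}
    for res, br in product(RESOLUTIONS, BITRATES_KBPS):
        score = encode_and_score(title, res, br)
        if br not in best or score > best[br][1]:
            best[br] = (res, score)
    return {br: res for br, (res, _) in sorted(best.items())}

print(build_ladder("some_title"))
# e.g. {800: (960, 540), 1500: (960, 540), 3000: (1280, 720), 6000: (1920, 1080)}
```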
AI Encoding on Existing Codecs
As for directions in AI-assisted encoding being deployed on
existing codecs, Fautier says there are three main areas of development:
dynamic resolution encoding; dynamic frame rate encoding; and layering.
Dynamic Resolution Encoding (DRE) is an extension of the
encoding ladders that OTT content providers use today, allowing resolution to
vary dynamically with the content. With dynamic frame rate encoding, the idea
is to encode only at the frame rate the content actually requires: talking
heads can likely be encoded at 30 fps or lower without perceptible loss,
whereas live sports will probably need to be encoded at the frame rate at
which they are captured. The objective is to reduce the compute load for the encoding
process — by up to 30%, depending on the content.
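A minimal sketch of the dynamic frame rate idea, assuming a crude inter-frame-difference motion measure and illustrative thresholds (neither is from the article):

```python
import numpy as np

def pick_frame_rate(frames: np.ndarray, capture_fps: int = 60) -> int:
    """Mean absolute inter-frame difference as a crude motion measure;
    the thresholds below are illustrative assumptions."""
    motion = np.abs(np.diff(frames.astype(float), axis=0)).mean()
    if motion < 2.0:       # near-static: talking heads, slides
        return min(30, capture_fps)
    if motion < 10.0:      # moderate motion
        return min(50, capture_fps)
    return capture_fps     # fast motion: keep the capture rate

rng = np.random.default_rng(2)
talking_head = np.tile(rng.integers(0, 255, (1, 72, 128)), (10, 1, 1))
sports = rng.integers(0, 255, (10, 72, 128))
print(pick_frame_rate(talking_head), pick_frame_rate(sports))  # 30 60
```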
Scalable HEVC, LCEVC, and pre/post-processing pairing are all
examples of layering. With this approach, you encode a base layer at 4K
resolution along with an enhancement layer that conveys the extra 8K detail.
These two layers may or may not be transmitted over the same transport system.
For example, a 4K signal could be broadcast with an enhancement layer sent over
an IP connection. If the receiving TV is 4K, it ignores the enhancement data.
But if an 8K TV receives these signals, it can use the enhancement data to
decode and reconstruct an 8K signal.
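The base-plus-enhancement structure can be sketched in a few lines. The toy example below uses nearest-neighbour scaling and a simple residual; real Scalable HEVC or LCEVC coding is far more sophisticated, but the split and reconstruction follow the same shape:

```python
import numpy as np

def upsample2x(frame: np.ndarray) -> np.ndarray:
    """Nearest-neighbour 2x upscale (stand-in for a proper upsampler)."""
    return frame.repeat(2, axis=0).repeat(2, axis=1)

def make_layers(frame_8k: np.ndarray):
    """Split a full-resolution frame into a base layer plus a residual
    enhancement layer carrying the extra detail."""
    base_4k = frame_8k[::2, ::2]                  # toy downsample to "4K"
    enhancement = frame_8k - upsample2x(base_4k)  # residual detail layer
    return base_4k, enhancement

def reconstruct_8k(base_4k, enhancement):
    """What an 8K set would do; a 4K set just ignores the enhancement."""
    return upsample2x(base_4k) + enhancement

# 8x8 toy frame in place of real 8K dimensions.
frame_8k = np.random.default_rng(3).integers(0, 255, (8, 8)).astype(float)
base, enh = make_layers(frame_8k)
assert np.array_equal(reconstruct_8k(base, enh), frame_8k)  # lossless here
```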
Chinnock says that this layering approach can be done today
using Scalable HEVC, deployed in the US in ATSC 3.0 with a base layer in HD for
mobile and an extension layer for 4K TVs. Scalable VVC and VVC-based LCEVC have
been proposed to the TV 3.0 consortium. Also under investigation is the use of
LCEVC with a base layer of legacy AVC-encoded HD content and a UHD enhancement
layer.
One additional challenge with the use of neural networks is
establishing standards for the interchange of encoding/processing data.
MPEG is currently looking into this for its next generation of video standards.