A new international video coding standard, known as VVC (ISO/IEC MPEG) or H.266 (ITU-T), has been released. I am sharing some related articles.
Most of HEVC's technologies were finalized before deep learning took off. With VVC, by contrast, deep learning-based technologies tried to make their way into the standard. However, no technique was mature enough to replace the entire prediction + transform hybrid framework, which has been refined for decades, so the efforts focused on improving individual components of the existing framework.
First, an intra picture, which is used when neighboring frames cannot be referenced, can only be predicted from surrounding pixels. One proposal fed these pixels into a 3-layer FCN (fully connected network) and produced the pixels of the block as output. In addition, a CNN based on VDSR, which is well known for super-resolution and JPEG artifact reduction, was proposed as an in-loop filter, a tool that improves the quality of the reconstructed image after coding.
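To make the intra prediction idea concrete, here is a minimal sketch of a 3-layer fully connected predictor. It is only an illustration, not the proposal itself: the 4x4 block size, the 8 reference pixels (4 above, 4 to the left), the hidden width of 16, the ReLU activation, and the random untrained weights are all assumptions made for this example.

```python
import random

BLOCK = 4           # hypothetical block size (4x4)
N_REF = 2 * BLOCK   # assumed reference: 4 pixels above + 4 pixels to the left
HIDDEN = 16         # hypothetical hidden-layer width


def init_layer(n_in, n_out, rng):
    """Random placeholder weights; a real predictor would be trained offline."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    bias = [0.0] * n_out
    return weights, bias


def dense(x, weights, bias):
    """One fully connected layer: y = W x + b."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def predict_block(ref_pixels, layers):
    """Map neighboring pixels to a predicted block, with ReLU between layers."""
    h = [p / 255.0 for p in ref_pixels]           # normalize 8-bit samples
    for i, (weights, bias) in enumerate(layers):
        h = dense(h, weights, bias)
        if i < len(layers) - 1:
            h = [max(0.0, v) for v in h]          # nonlinearity on hidden layers only
    # Return to the 8-bit range, then reshape the flat output into rows.
    flat = [min(255, max(0, round(v * 255))) for v in h]
    return [flat[r * BLOCK:(r + 1) * BLOCK] for r in range(BLOCK)]


rng = random.Random(0)
layers = [
    init_layer(N_REF, HIDDEN, rng),
    init_layer(HIDDEN, HIDDEN, rng),
    init_layer(HIDDEN, BLOCK * BLOCK, rng),
]
above = [10, 20, 30, 40]
left = [50, 60, 70, 80]
block = predict_block(above + left, layers)
```

With untrained weights the output is meaningless as a prediction; the point is only the shape of the computation: 8 neighboring samples in, 16 block samples out, with nonlinear layers in between.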
Unfortunately, no deep learning-based technology was adopted in the final VVC standard. The 3-layer FCN was simplified to a single layer, which makes it a plain matrix multiplication rather than a neural network. The CNN-based in-loop filter was excluded because of its implementation complexity, and ALF (adaptive loop filter), a tool that estimates a convolution filter for every frame without any nonlinear elements, was adopted instead.
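The single-layer case can be sketched as follows: once the hidden layers and their nonlinearities are gone, each predicted pixel is just a weighted sum of the reference pixels. This is a simplified illustration under stated assumptions, not the matrix or the exact arithmetic used in the standard: the 4x4 block, the 8 reference pixels, the 6-bit fixed-point shift, and the averaging matrix are all hypothetical.

```python
BLOCK = 4   # hypothetical block size (4x4)
SHIFT = 6   # assumed fixed-point precision: weights are scaled by 2**6 = 64


def matrix_intra_predict(ref_pixels, matrix, shift=SHIFT):
    """Predict a block as one integer matrix-vector product with rounding."""
    offset = 1 << (shift - 1)                       # rounding offset
    flat = []
    for row in matrix:                              # one matrix row per output pixel
        acc = sum(w * r for w, r in zip(row, ref_pixels))
        value = (acc + offset) >> shift
        flat.append(min(255, max(0, value)))        # clip to the 8-bit range
    return [flat[r * BLOCK:(r + 1) * BLOCK] for r in range(BLOCK)]


# Hypothetical matrix: every output pixel is the mean of the 8 reference
# pixels (weight 8/64 each), which reproduces a DC-like prediction.
DC_MATRIX = [[8] * 8 for _ in range(BLOCK * BLOCK)]

above = [10, 20, 30, 40]
left = [50, 60, 70, 80]
pred = matrix_intra_predict(above + left, DC_MATRIX)
# Every entry of pred is 45, the mean of the eight reference pixels.
```

A trained matrix would hold different weights per output pixel, but the cost stays at one multiply-accumulate per weight, with no activation functions to evaluate.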
The main reason is that the efficiency gain is small relative to the added complexity. Most proposals were more than twice as complex yet improved coding efficiency by less than 5%, and by that measure there was no reason to include deep learning-based technology. This bar may not matter at the research stage, but it is a gate that any technology must pass before it reaches the actual market.
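One rough way to see how complexity adds up is to count multiply-accumulate operations (MACs) for fully connected layers. The layer sizes below are hypothetical and chosen only for illustration; they are not taken from any actual proposal, and real complexity assessments also consider memory, latency, and hardware cost.

```python
def dense_macs(layer_sizes):
    """Multiply-accumulates for a chain of fully connected layers."""
    return sum(n_in * n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


# Hypothetical sizes: 8 reference pixels in, 16 block pixels out.
three_layer = dense_macs([8, 16, 16, 16])   # 8*16 + 16*16 + 16*16 = 640
one_layer = dense_macs([8, 16])             # 8*16 = 128
complexity_ratio = three_layer / one_layer  # 5.0
```

Under these assumed sizes the 3-layer network costs five times as many MACs per block as the single matrix, which is the kind of ratio that has to be justified by a corresponding gain in coding efficiency.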
International video compression standards serve a mass market of billions of devices. (As of 2019: 1.5 billion smartphones, 200 million TVs, 300 million PCs, and 140 million tablets, of which almost 100% support H.264 and about 60% support HEVC.) Had any deep learning technology been included this time, it would probably have been the first deep learning-based technology deployed in the global international standard market, but that did not happen.
The next international video coding standard is not expected for about eight years. Until then, I hope that deep learning makes a lot of progress in both efficiency and complexity reduction, so that it can compete with the existing technology that has been in use for more than 20 years.