Video compression has its own counterpart to Moore's Law (transistor counts doubling every two years): MPEG-1 in 1993, MPEG-4/AVC (H.264) in 2003, and MPEG-H/HEVC (H.265) in 2013. For reference, in image compression, JPEG appeared in 1992 and JPEG 2000 in 2000.
In fact, however, today's H.265 still shares a very similar structure with the MPEG-1 of 1993, and the image codecs HEIF and BPG derived from it differ little beyond a block-based technique called directional intra prediction. In short, restricted to image compression, the recipe is: predict the block being coded from the values of neighboring pixels that have already been coded, subtract that prediction, and compress the remaining residual with the DCT so that the important information is concentrated into a few coefficients.
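The recipe above can be sketched in a few lines. This is a minimal illustration, not any codec's actual implementation: the 4x4 block, the choice of horizontal prediction, and the naive DCT are all assumptions made for the example.

```python
import numpy as np

def dct2(block):
    """Orthonormal 2-D DCT-II, the transform family used for residual coding."""
    N = block.shape[0]
    n = np.arange(N)
    C = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * n[None, :] + 1) * n[:, None] / (2 * N))
    C[0, :] = np.sqrt(1.0 / N)
    return C @ block @ C.T

# hypothetical 4x4 block: pixel values vary smoothly, as natural images tend to
block = np.array([[52, 54, 55, 57],
                  [53, 55, 56, 58],
                  [54, 56, 57, 59],
                  [55, 57, 58, 60]], dtype=float)

left = block[:, 0]                     # the "already coded" neighboring column
pred = np.tile(left[:, None], (1, 4))  # horizontal directional intra prediction
residual = block - pred                # small values remain after subtraction
coeffs = dct2(residual)
# prediction shrinks the signal: the residual is far smaller than the pixels,
# and the DCT packs what is left into a few low-frequency coefficients
```

The same structure, with far more prediction directions and block sizes, is what H.264/H.265 intra coding elaborates on.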
Among these technologies, the only part tied to human visual-perception characteristics is the frequency representation given by the DCT. The visual system is exploited by allowing different error tolerances in the low-frequency region (smooth 'surfaces') and the high-frequency region (sharp 'edges'), but even that has been relied on less and less since directional prediction was introduced.
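The differing error tolerances amount to quantizing high-frequency DCT coefficients more coarsely than low-frequency ones. A minimal sketch, assuming an illustrative coefficient block and a made-up step-size rule (real codecs use tuned quantization matrices):

```python
import numpy as np

# hypothetical DCT coefficients of a block: energy concentrated at low frequency
coeffs = np.array([[80.0, 12.0, 4.0, 1.5],
                   [10.0,  6.0, 2.0, 0.8],
                   [ 3.0,  2.0, 1.0, 0.4],
                   [ 1.0,  0.5, 0.3, 0.2]])

u, v = np.indices(coeffs.shape)
qstep = 2.0 + 3.0 * (u + v)        # assumed rule: step grows with frequency
quantised = np.round(coeffs / qstep)
reconstructed = quantised * qstep
# low-frequency terms survive almost intact, while high-frequency detail
# rounds to zero: the error the eye tolerates least is kept smallest
```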
In the early stages of HEVC/H.265 standardization (about ten years ago), 'generation'-based technologies were discussed in international standardization bodies. Specifically, for regions such as water, forests, and sand, the idea was to improve the compression ratio by about 30% while preserving visual similarity by replacing the exact original texture with a generated one. At the time, however, a separate generation model was required for each of 'water', 'forest', and 'sand', and the approach was not adopted into the standard for lack of generality.
The paper at the link below deals with using GANs for image compression. With the advent of GANs, generation of general textures has improved dramatically, and models trained on large amounts of data far exceed the performance of existing signal-processing-based techniques. This could be seen as the biggest paradigm shift (analysis-synthesis -> generation) since JPEG appeared in 1992. Yet even technologies more than twice as efficient as JPEG, such as JPEG 2000, HEIF, and BPG, have failed to take hold in the market because of legacy dependence and complexity. Given that, it remains unclear how widely GAN-based technology (tens to hundreds of times more complex than BPG) can be used in the future market.