On Pre-chewing Compression Degradation for Learned Video Compression

Man M. Ho1 Heming Sun2,3 Zhiqiang Zhang1 Jinjia Zhou1

1. Hosei University, Tokyo, Japan 2. Waseda University, Tokyo, Japan 3. JST PRESTO, Tokyo, Japan

[Paper] [Source Code]

In IEEE International Conference on Visual Communications and Image Processing (VCIP), 2022

Abstract


Artificial Intelligence (AI) needs huge amounts of data, and so does Learned Restoration for Video Compression. There are two main problems regarding training data. 1) Preparing training compression degradation with a video codec (e.g., Versatile Video Coding - VVC) consumes considerable resources; in particular, the more Quantization Parameters (QPs) we compress with, the more coding time and storage are required. 2) The common approach of training a newly initialized Restoration Network on pure compression degradation from the start is not effective. To solve these problems, we propose a Degradation Network that pre-chews (generalizes and learns to synthesize) the real compression degradation, then present a hybrid training scheme that allows a Restoration Network to be trained on unlimited videos without compression. Concretely, we propose a QP-wise Degradation Network that learns to compress video frames like VVC in real time and can transform the degradation output between QPs linearly. The real compression degradation is thus pre-chewed: our Degradation Network synthesizes a more generalized degradation that a newly initialized Restoration Network can learn from more easily. To diversify training video content without compression and avoid overfitting, we design a Training Framework for Semi-Compression Degradation (TF-SCD) to train our model on many fake compressed videos together with real compressed videos. As a result, the Restoration Network quickly jumps to a near-best optimum at the beginning of training, demonstrating the promise of using pre-chewed data for the very first steps of training. In other words, a newly initialized Learned Video Compression model can be warmed up efficiently yet effectively with our pre-trained Degradation Network. Besides, our proposed TF-SCD further enhances restoration performance in a specific range of QPs and generalizes better across QPs compared with the common way of training a restoration model.
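The abstract states that the QP-wise Degradation Network can transform its degradation output between QPs linearly. As a rough illustration only (the actual transformation is performed inside the network described in the paper and source code), linearly blending degradation residuals synthesized at two anchor QPs toward a target QP might look like this; the function name and per-pixel residual representation are assumptions:

```python
def interpolate_degradation(res_lo, res_hi, qp_lo, qp_hi, qp_target):
    """Hypothetical sketch: linearly blend per-pixel degradation residuals
    synthesized at two anchor QPs to approximate an intermediate QP.

    This standalone function only illustrates the linearity idea from the
    abstract; it is not the paper's actual mechanism.
    """
    t = (qp_target - qp_lo) / (qp_hi - qp_lo)  # 0 at qp_lo, 1 at qp_hi
    return [(1.0 - t) * lo + t * hi for lo, hi in zip(res_lo, res_hi)]
```

For example, a target QP halfway between the anchors yields the element-wise average of the two residuals.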

Overall Concept


Figure: Our Training Framework for Semi-Compression Degradation (TF-SCD). We train a QP-wise Degradation Network to learn and synthesize compression degradation for an unlimited number of videos from the Internet. As a result, the Restoration Network can be trained on more video frames without compression, diversifying training video content and QPs.
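The framework above mixes real compressed pairs with pairs whose degradation is synthesized by the pre-trained Degradation Network. A minimal sketch of such a sampler, assuming a simple Bernoulli mixing ratio (the names `real_pairs`, `fake_pairs`, and `p_real` are illustrative, not the paper's exact procedure):

```python
import random

def sample_training_pair(real_pairs, fake_pairs, p_real, rng):
    """Hypothetical TF-SCD-style sampler: with probability p_real draw a
    (degraded, original) pair from the real compressed set (e.g., S-Set),
    otherwise draw a pair whose degradation was synthesized by the
    Degradation Network (e.g., U-Set, uncompressed Internet videos).
    """
    source = real_pairs if rng.random() < p_real else fake_pairs
    return rng.choice(source)
```

Setting `p_real=1.0` recovers the common way of training on real compression degradation only.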


Network Architecture



Ablation Study on Loss Functions for Learning Compression Degradation


Table: Ablation study on loss functions (L2, L1, and L1+LPIPS) for generating compression degradation. The combination of L1 and LPIPS gives the best average performance in PSNR, LPIPS, and MS-SSIM.

| QP | L2 PSNR↑ | L2 LPIPS↓ | L2 MS-SSIM↑ | L1 PSNR↑ | L1 LPIPS↓ | L1 MS-SSIM↑ | L1+LPIPS PSNR↑ | L1+LPIPS LPIPS↓ | L1+LPIPS MS-SSIM↑ |
|------|-------|--------|--------|-------|--------|--------|-------|--------|--------|
| 32 | 36.31 | 0.0910 | 0.9668 | 37.92 | 0.0476 | 0.9777 | 38.42 | 0.0368 | 0.9799 |
| 35 | 36.03 | 0.0958 | 0.9647 | 37.49 | 0.0570 | 0.9757 | 37.99 | 0.0428 | 0.9781 |
| 37 | 35.95 | 0.0976 | 0.9657 | 37.25 | 0.0630 | 0.9763 | 37.71 | 0.0476 | 0.9782 |
| 40 | 35.57 | 0.1001 | 0.9628 | 36.59 | 0.0721 | 0.9728 | 36.93 | 0.0566 | 0.9747 |
| 42 | 35.27 | 0.1026 | 0.9606 | 36.09 | 0.0790 | 0.9701 | 36.33 | 0.0640 | 0.9718 |
| 45 | 34.74 | 0.1084 | 0.9560 | 35.22 | 0.0936 | 0.9645 | 35.33 | 0.0785 | 0.9658 |
| 47 | 34.27 | 0.1157 | 0.9519 | 34.53 | 0.1086 | 0.9592 | 34.57 | 0.0941 | 0.9603 |
| Avg. | 35.45 | 0.1016 | 0.9612 | 36.44 | 0.0744 | 0.9709 | 36.76 | 0.0601 | 0.9727 |
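The winning objective in the ablation combines an L1 pixel term with an LPIPS perceptual term. A minimal sketch of such a combined loss, where `perceptual_fn` stands in for an LPIPS model (e.g., from the `lpips` package) and the balancing `weight` is an assumption, not the paper's value:

```python
def l1_loss(pred, target):
    # Mean absolute error over flattened pixel values.
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)

def combined_loss(pred, target, perceptual_fn, weight=1.0):
    """Sketch of an L1 + LPIPS-style objective as in the ablation above.
    `perceptual_fn` is a pluggable perceptual distance; `weight` balances
    the two terms and is purely illustrative here.
    """
    return l1_loss(pred, target) + weight * perceptual_fn(pred, target)
```

With a zero perceptual term this reduces to plain L1, matching the middle column of the ablation.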


Comparison


Table: Quantitative comparison of restoration performance for DnCNN trained the common way (S-Set with real QPs={32,37,42,47}) versus our way (S-Set with real QPs={32,37,42,47} combined with U-Set with fake QPs={32,35,37,40,42,45,47}), evaluated on videos from the Internet using PSNR, LPIPS, and MS-SSIM. Our method outperforms the common way across the QP range [22..51].

| QP | Common PSNR↑ | Common LPIPS↓ | Common MS-SSIM↑ | Ours PSNR↑ | Ours LPIPS↓ | Ours MS-SSIM↑ |
|----|-------|--------|--------|-------|--------|--------|
| 17 | 42.69 | 0.0189 | 0.9880 | 42.53 | 0.0177 | 0.9875 |
| 22 | 40.87 | 0.0312 | 0.9837 | 40.92 | 0.0243 | 0.9839 |
| 27 | 39.60 | 0.0390 | 0.9788 | 39.68 | 0.0312 | 0.9793 |
| 32 | 38.23 | 0.0494 | 0.9718 | 38.32 | 0.0421 | 0.9725 |
| 35 | 37.31 | 0.0592 | 0.9662 | 37.40 | 0.0522 | 0.9670 |
| 37 | 36.54 | 0.0677 | 0.9591 | 36.59 | 0.0621 | 0.9597 |
| 40 | 35.52 | 0.0833 | 0.9508 | 35.55 | 0.0785 | 0.9514 |
| 42 | 34.80 | 0.0960 | 0.9444 | 34.81 | 0.0918 | 0.9451 |
| 45 | 33.72 | 0.1195 | 0.9340 | 33.73 | 0.1170 | 0.9347 |
| 47 | 32.92 | 0.1398 | 0.9245 | 32.92 | 0.1383 | 0.9250 |
| 51 | 31.22 | 0.1984 | 0.8997 | 31.22 | 0.1978 | 0.9003 |
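The fidelity metric reported above, PSNR, is a standard function of the mean squared error; a minimal reference implementation (with an assumed 8-bit `data_range` default):

```python
import math

def psnr(pred, target, data_range=255.0):
    """Peak Signal-to-Noise Ratio in dB (higher is better), as reported
    in the comparison tables above."""
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0.0:
        return math.inf  # identical signals
    return 10.0 * math.log10(data_range ** 2 / mse)
```

Equivalent routines exist in common libraries (e.g., scikit-image's `peak_signal_noise_ratio`).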

Consider citing our work


@inproceedings{ho2022pre,
title={On Pre-chewing Compression Degradation for Learned Video Compression},
author={Ho, Man Minh and Sun, Heming and Zhang, Zhiqiang and Zhou, Jinjia},
booktitle={2022 IEEE International Conference on Visual Communications and Image Processing (VCIP)},
pages={1--5},
year={2022},
organization={IEEE}
}

License


This work (as well as its materials) is for non-commercial use and research purposes only.