In IEEE Transactions on Image Processing: a publication of the IEEE Signal Processing Society
Recently, many deep-learning-based studies have explored the potential for improving the quality of compressed videos. These methods mostly use either spatial or temporal information to perform frame-level video enhancement. However, they fail to combine different spatial-temporal information so that adjacent patches are adaptively used to enhance the current patch, and they therefore achieve limited enhancement performance, especially on videos with scene changes and strong motion. To overcome these limitations, we propose a patch-wise spatial-temporal quality enhancement network that first extracts spatial and temporal features, then recalibrates and fuses them. Specifically, we design a temporal and spatial-wise attention-based feature distillation structure that adaptively uses the adjacent patches to distill patch-wise temporal features. To adaptively enhance different patches with spatial and temporal information, we propose a channel and spatial-wise attention fusion block that performs patch-wise recalibration and fusion of the spatial and temporal features. Experimental results demonstrate that our network achieves a peak signal-to-noise ratio (PSNR) improvement of 0.55 to 0.69 dB over the compressed videos at different quantization parameters, outperforming state-of-the-art approaches.
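The reported gain is a difference in peak signal-to-noise ratio, so a short sketch of the standard PSNR definition may help in reading the numbers. It is not the authors' evaluation code: the function names (`mse`, `psnr`, `delta_psnr`) are hypothetical, frames are assumed to be 8-bit (peak value 255) and stored as plain lists of rows, and the abstract does not state which color channel or which frames are averaged.

```python
import math


def mse(reference, distorted):
    """Mean squared error between two equally sized 2-D frames (lists of rows)."""
    total = 0.0
    count = 0
    for ref_row, dis_row in zip(reference, distorted):
        for r, d in zip(ref_row, dis_row):
            total += (r - d) ** 2
            count += 1
    return total / count


def psnr(reference, distorted, peak=255.0):
    """PSNR in dB: 10 * log10(peak^2 / MSE). Assumes 8-bit samples by default."""
    err = mse(reference, distorted)
    if err == 0:
        return math.inf  # identical frames: PSNR is unbounded
    return 10.0 * math.log10(peak ** 2 / err)


def delta_psnr(raw, compressed, enhanced, peak=255.0):
    """PSNR gain of the enhanced frame over the compressed frame, both measured against the raw frame."""
    return psnr(raw, enhanced, peak) - psnr(raw, compressed, peak)
```

Under these assumptions, a positive `delta_psnr` means the enhanced frame is closer to the uncompressed original than the compressed frame is; the 0.55 to 0.69 dB figures in the abstract are gains of this kind.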
Ding Qing, Shen Liquan, Yu Liangwei, Yang Hao, Xu Mai