In Physiological measurement ; h5-index 36.0
OBJECTIVE : Efficient non-contact heart rate (HR) measurement from facial video has received much attention in health monitoring. Past methods relied on prior knowledge and unproven hypotheses to extract rPPG signals, e.g., manually designed regions of interest (ROIs) and a skin reflection model.
APPROACH : This paper presents a short-time end-to-end HR estimation framework based on facial features and the temporal relationships between video frames. In the proposed method, a deep 3D multi-scale network with a cross-layer residual structure is designed to construct an autoencoder and extract robust remote photoplethysmography (rPPG) features. Then, a spatial-temporal fusion mechanism is proposed to help the network focus on features related to rPPG signals. Both shallow and fused 3D spatial-temporal features are distilled to suppress redundant information in complex environments. Finally, a data augmentation strategy is presented to address the uneven distribution of HR values in existing datasets.
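The abstract does not say how the data augmentation strategy works. As a purely illustrative sketch, one common way to rebalance HR labels in rPPG datasets is temporal resampling: speeding a clip up or slowing it down scales the apparent pulse rate by the same factor. The function name `resample_clip`, the `speed` parameter, and the linear interpolation between frames are all assumptions made for this example and should not be read as the authors' method.

```python
import numpy as np

def resample_clip(frames, hr_bpm, speed):
    """Hypothetical augmentation: temporally resample a clip by `speed`.

    frames : float array of shape (T, H, W, C)
    hr_bpm : ground-truth heart rate of the original clip
    speed  : > 1 speeds the clip up (higher apparent HR), < 1 slows it down

    Returns the resampled clip and the rescaled HR label, assuming the
    clip is still played back at the original frame rate.
    """
    t = frames.shape[0]
    # Number of output frames that fit when stepping through time by `speed`.
    new_t = int(np.floor((t - 1) / speed)) + 1
    src = np.arange(new_t) * speed           # fractional source positions
    lo = np.floor(src).astype(int)           # frame before each position
    hi = np.minimum(lo + 1, t - 1)           # frame after (clamped at the end)
    w = (src - lo).reshape(-1, 1, 1, 1)      # interpolation weights
    out = (1.0 - w) * frames[lo] + w * frames[hi]
    return out, hr_bpm * speed

# Synthetic clip: 10 s at 30 fps with a 1 Hz (60 bpm) brightness pulse.
fps, t = 30, 300
pulse = np.sin(2 * np.pi * 1.0 * np.arange(t) / fps)
clip = np.ones((t, 2, 2, 3)) * pulse.reshape(-1, 1, 1, 1)

fast_clip, fast_hr = resample_clip(clip, hr_bpm=60.0, speed=1.5)
```

Sampling `speed` so that under-represented HR ranges are generated more often would flatten the label distribution; how the factors are chosen in the paper is not stated in the abstract.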
MAIN RESULTS : The experimental results on four face-rPPG datasets show that our method outperforms the state-of-the-art methods and requires fewer video frames. Compared with the previous best results, the proposed method improves the RMSE by 5.9%, 3.4%, and 21.4% on the OBF dataset (intra-test), COHFACE dataset (intra-test), and UBFC dataset (cross-test), respectively.
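For readers unfamiliar with the metric, the following sketch shows how RMSE is computed over HR predictions and how a percentage improvement can be derived from two RMSE values. It assumes the reported percentages are relative reductions in RMSE against the previous best result, which the abstract does not state explicitly. The numbers in the example are made up for illustration and are not taken from the paper.

```python
import numpy as np

def rmse(pred_bpm, true_bpm):
    """Root-mean-square error between predicted and reference HR (bpm)."""
    pred = np.asarray(pred_bpm, dtype=float)
    true = np.asarray(true_bpm, dtype=float)
    return float(np.sqrt(np.mean((pred - true) ** 2)))

def relative_improvement(baseline_rmse, new_rmse):
    """Percentage reduction in RMSE relative to a baseline (assumed definition)."""
    return 100.0 * (baseline_rmse - new_rmse) / baseline_rmse

# Illustrative values only.
example_rmse = rmse([70.0, 80.0], [72.0, 78.0])       # errors of -2 and +2 bpm
example_gain = relative_improvement(2.0, 1.5)          # baseline 2.0, new 1.5
```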
SIGNIFICANCE : Our method achieves good results on diverse datasets (including highly compressed video, low-resolution video, and illumination variation), demonstrating that it can extract stable rPPG signals in a short time.
Li Bin, Jiang Wei, Peng Jinye, Li Xiaobai
Remote photoplethysmography, fusion attention mechanism, heart rate estimation, short-time monitoring, spatial-temporal convolution