In Computers in biology and medicine
BACKGROUND: Subject-wise modeling using machine learning is useful in many applications requiring low error and complexity, such as wearable medical devices. However, regression accuracy depends highly on the data available to train the model and on the model's generalization ability. The prediction error may increase severely when the model is tested on data patterns it has not seen; such a model is said to be overfitted. In medicine-related applications, such as Non-Invasive Blood Pressure (NIBP) estimation, a high error renders the estimation model useless and even dangerous.
METHODS: This paper presents a novel algorithm that handles overfitting by editing the training data to achieve generalization for subject-wise models. The pooling and patching (PaP) algorithm uses a relatively short record segment of a subject as a Key-Segment (KS) to search through a larger dataset for similar subjects. Samples taken from the records of the matched subjects (the pool) are then used to patch the original subject's KS. Due to the significance of systolic blood pressure (SBP) and the complexity of its variability, non-invasive estimation of SBP from electrocardiography (ECG) and photoplethysmography (PPG) is introduced as an application to assess the algorithm. The study was performed on 2051 subjects with a wide range of age, height, weight, length, and health status. The subjects' records were taken from a large public dataset, VitalDB, which was acquired from subjects undergoing different surgeries. All results were obtained without using other model generalization techniques.
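The pooling and patching steps described in METHODS can be illustrated with a minimal sketch. The abstract does not state how similarity between subjects is measured or how patch samples are chosen, so everything below is an assumption made for illustration: similarity is taken as the Euclidean distance between mean feature vectors, patch samples are drawn at random, and the names `pool_similar_subjects`, `patch_key_segment`, `k`, and `per_subject` are hypothetical.

```python
import math
import random


def mean_vector(samples):
    """Column-wise mean of the feature vectors in a list of (features, sbp) samples."""
    n = len(samples)
    dim = len(samples[0][0])
    return [sum(s[0][j] for s in samples) / n for j in range(dim)]


def pool_similar_subjects(key_segment, dataset, k):
    """Pooling step (assumed metric): return the k subjects whose records
    have the mean feature vector closest to that of the Key-Segment."""
    ks_mean = mean_vector(key_segment)
    scored = sorted(
        (math.dist(ks_mean, mean_vector(record)), subject_id)
        for subject_id, record in dataset.items()
    )
    return [subject_id for _, subject_id in scored[:k]]


def patch_key_segment(key_segment, dataset, pool_ids, per_subject, seed=0):
    """Patching step (assumed sampling): append randomly drawn samples from
    each pooled subject's record to the original Key-Segment."""
    rng = random.Random(seed)
    patched = list(key_segment)
    for subject_id in pool_ids:
        record = dataset[subject_id]
        patched.extend(rng.sample(record, min(per_subject, len(record))))
    return patched


# Toy data: each sample is (feature vector, SBP in mmHg); values are made up.
dataset = {
    "a": [([1.0, 1.0], 120.0)] * 5,
    "b": [([10.0, 10.0], 150.0)] * 5,
    "c": [([1.2, 0.9], 118.0)] * 5,
}
key_segment = [([1.1, 1.0], 119.0)] * 3

pool = pool_similar_subjects(key_segment, dataset, k=2)
training_set = patch_key_segment(key_segment, dataset, pool, per_subject=2)
```

The patched `training_set` would then be used to fit the subject-wise regression model in place of the Key-Segment alone.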
RESULTS: The generalization effect of the proposed algorithm, PaP, significantly outperformed cross-validation, which is widely used in regression model generalization. Moreover, the testing results show that a KS of 200 to 2000 samples is sufficient for providing high accuracy on much longer testing data of about 12000 to 24000 samples, which is less than 10% of the record length on average. Furthermore, compared to other works based on the same dataset, PaP provides a significantly lower mean error of -0.75 ± 5.51 mmHg, with a small training data portion of 15% over 2051 subjects.
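For readers unfamiliar with the "mean ± value" error format used above, the following sketch shows one common way such a figure is computed. It assumes that the second number is the sample standard deviation of the per-sample error and that the error is defined as estimated minus reference; the abstract confirms neither, and the function name and input values are hypothetical.

```python
import statistics


def mean_error_and_sd(estimated, reference):
    """Return (mean error, sample standard deviation) of estimated - reference, in mmHg."""
    errors = [e - r for e, r in zip(estimated, reference)]
    return statistics.mean(errors), statistics.stdev(errors)


# Made-up SBP values in mmHg, for illustration only.
me, sd = mean_error_and_sd([120.0, 130.0, 110.0], [121.0, 128.0, 112.0])
```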
Mohammed Hazem, Wang Kai, Wu Hao, Wang Guoxing
2022-Nov-11
Electrocardiography, Generalization, Machine learning, Overfitting, Photoplethysmography, Regression, Systolic blood pressure