In IEEE Transactions on Neural Networks and Learning Systems
Good generalization performance is the fundamental goal of any machine learning algorithm. Using the concept of uniform stability, this article theoretically proves that the choice of loss function impacts the generalization performance of a trained deep neural network (DNN). The adopted stability-based framework provides an effective tool for comparing generalization error bounds across loss functions. The main result of our analysis is that an effective loss function makes stochastic gradient descent more stable, which in turn leads to a tighter generalization error bound and hence better generalization performance. To validate our analysis, we study learning problems in which the classes are semantically correlated. To capture the semantic similarity of neighboring classes, we adopt the well-known semantics-preserving learning framework of label distribution learning (LDL). We propose two novel loss functions for the LDL framework and theoretically show that they provide stronger stability than other loss functions widely used for training DNNs. Experimental results on three applications with semantically correlated classes, namely facial age estimation, head pose estimation, and image esthetic assessment, validate the theoretical insights gained from our analysis and demonstrate the usefulness of the proposed loss functions in practical applications.
Akbari Ali, Awais Muhammad, Bashar Manijeh, Kittler Josef
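
The label distribution learning (LDL) setting named in the abstract can be illustrated with a minimal sketch. It assumes a commonly used encoding in which a scalar label (for example an age) is spread over neighboring classes with a discretized Gaussian, and it uses a Kullback-Leibler divergence as the training loss. The function names, the `sigma` parameter, and the choice of KL divergence are illustrative assumptions; the abstract does not specify the two proposed loss functions, and this is not them.

```python
import math


def gaussian_label_distribution(true_label, num_classes, sigma=2.0):
    """Encode a scalar label as a distribution over classes 0..num_classes-1.

    Assumption: a discretized Gaussian centered on the true label, so that
    neighboring classes receive more probability mass than distant ones.
    """
    weights = [
        math.exp(-((k - true_label) ** 2) / (2.0 * sigma ** 2))
        for k in range(num_classes)
    ]
    total = sum(weights)
    return [w / total for w in weights]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(target, predicted, eps=1e-12):
    """KL(target || predicted) for two discrete distributions.

    Assumption: used here as a generic LDL loss, not the paper's losses.
    """
    return sum(
        t * math.log((t + eps) / (p + eps))
        for t, p in zip(target, predicted)
        if t > 0.0
    )


# Usage: a hypothetical age-estimation target with 101 age classes.
target = gaussian_label_distribution(true_label=30, num_classes=101)
uniform_prediction = softmax([0.0] * 101)
loss = kl_divergence(target, uniform_prediction)
```

Under this encoding a prediction that is off by one class is penalized less than one that is off by twenty, which is the sense in which the target preserves the semantic similarity of neighboring classes.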