In IEEE Transactions on Neural Networks and Learning Systems
This article explores the utilization of the effective degrees of freedom (DoF) of a deep learning model to regularize its stochastic gradient descent (SGD)-based training. The effective DoF of a deep learning model is defined by only a subset of its total parameters. This subset is highly sensitive to the training loss, and its cardinality can be used to govern the effective DoF of the model during training. To this end, the incremental trainable parameter selection (ITPS) algorithm is introduced in this article. The proposed ITPS algorithm acts as a wrapper over SGD and incrementally selects for updating the parameters that exhibit the maximum sensitivity to the training loss, thereby gradually increasing the DoF of the model during training. In ideal cases, the proposed algorithm arrives at a model configuration (i.e., DoF) optimal for the task at hand. This process results in regularization-like behavior induced by the gradual increase of the DoF. Since the selection and updating of parameters are functions of the training loss, the proposed algorithm can be seen as a task- and data-dependent regularization mechanism. This article demonstrates the general utility of ITPS by evaluating it on prominent neural network architectures such as convolutional neural networks (CNNs), transformers, recurrent neural networks (RNNs), and multilayer perceptrons. These models are trained for image classification and healthcare tasks using the publicly available CIFAR-10, STL-10, and MIMIC-III datasets.
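The sketch below is a minimal illustration of the idea described in the abstract, not the authors' implementation: it wraps SGD, approximates parameter sensitivity by gradient magnitude (a hypothetical proxy for the loss-based sensitivity the paper uses), and grows the trainable fraction on a fixed schedule. All names (ITPSWrapper, initial_frac, growth) are assumptions introduced for illustration.

# Illustrative ITPS-style wrapper over SGD (PyTorch); sensitivity is
# approximated here by |gradient|, and the active-parameter budget grows
# by a fixed amount per epoch -- both are hypothetical simplifications.
import torch
import torch.nn as nn

class ITPSWrapper:
    """Masks SGD updates so only the top-k most loss-sensitive parameters move."""

    def __init__(self, params, lr=0.01, initial_frac=0.1, growth=0.1):
        self.params = [p for p in params if p.requires_grad]
        self.opt = torch.optim.SGD(self.params, lr=lr)
        self.frac = initial_frac      # fraction of parameters currently trainable
        self.growth = growth          # how much that fraction grows per epoch

    def step(self):
        # Gather per-parameter sensitivities (|gradient| as a proxy).
        flat = torch.cat([p.grad.abs().flatten() for p in self.params])
        k = max(1, int(self.frac * flat.numel()))
        threshold = torch.topk(flat, k).values.min()

        # Zero the gradients of parameters below the sensitivity threshold,
        # so SGD leaves them untouched (restricting the effective DoF).
        for p in self.params:
            p.grad.mul_((p.grad.abs() >= threshold).float())
        self.opt.step()

    def grow_dof(self):
        # Incrementally enlarge the trainable subset (gradual DoF increase).
        self.frac = min(1.0, self.frac + self.growth)

    def zero_grad(self):
        self.opt.zero_grad()

# Usage on a toy regression task.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
wrapper = ITPSWrapper(model.parameters(), lr=0.05)
x, y = torch.randn(32, 8), torch.randn(32, 1)
for epoch in range(5):
    wrapper.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    wrapper.step()
    wrapper.grow_dof()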
Thakur Anshul, Abrol Vinayak, Sharma Pulkit, Zhu Tingting, Clifton David A
2022-Oct-11