
In Neural Networks: The Official Journal of the International Neural Network Society

Recurrent Neural Network (RNN) models have been applied in many domains, producing high accuracies on time-dependent data. However, RNNs have long suffered from exploding gradients during training, mainly due to their recurrent process. In this context, we propose a variant of the scalar-gated FastRNN architecture, called Scalar Gated Orthogonal Recurrent Neural Networks (SGORNN). SGORNN utilizes orthogonal matrices at the recurrent step. Our experiments evaluate SGORNN using two recently proposed orthogonal parametrizations for the recurrent weights of an RNN. We present a constraint on the scalar gates of SGORNN, which is easily enforced at training time and provides a probabilistic generalization gap that grows linearly with the length of the sequences processed. Next, we provide bounds on the gradients of SGORNN to show the impossibility of exponentially exploding gradients through time. Our experimental results on the addition problem confirm that our combination of orthogonal and scalar-gated RNNs is able to outperform other orthogonal RNNs and LSTM on long sequences. We further evaluate SGORNN on the HAR-2 classification task, where it improves upon the accuracy of several models while using far fewer parameters than standard RNNs. Finally, we evaluate SGORNN on the Penn Treebank word-level language modeling task, where it again outperforms related architectures and shows performance comparable to LSTM using far fewer parameters. Overall, SGORNN shows higher representational capacity than the other orthogonal RNNs tested, suffers from less overfitting than the other models in our experiments, benefits from a reduced parameter count, and alleviates exploding gradients during backpropagation through time.
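To make the idea concrete, below is a minimal sketch of a scalar-gated recurrent cell with an orthogonal recurrent matrix. It assumes a FastRNN-style update h_t = α·tanh(W x_t + U h_{t-1} + b) + β·h_{t-1}, with learned scalars α, β and U constrained to be orthogonal via PyTorch's built-in parametrization; the class name, the sigmoid constraint on the gates, and other details are illustrative assumptions, not the authors' exact SGORNN formulation or their proposed gate constraint.

```python
import torch
import torch.nn as nn
from torch.nn.utils.parametrizations import orthogonal


class ScalarGatedOrthogonalCell(nn.Module):
    """Illustrative FastRNN-style cell with an orthogonal recurrent matrix.

    Assumed update: h_t = alpha * tanh(W x_t + U h_{t-1} + b) + beta * h_{t-1},
    where alpha and beta are learned scalar gates and U is kept orthogonal.
    """

    def __init__(self, input_size: int, hidden_size: int):
        super().__init__()
        self.input_proj = nn.Linear(input_size, hidden_size)
        # Recurrent weight constrained to the orthogonal manifold by PyTorch's
        # parametrization utility (one of several possible parametrizations).
        self.recurrent = orthogonal(nn.Linear(hidden_size, hidden_size, bias=False))
        # Scalar gates stored as raw parameters; sigmoids keep them in (0, 1).
        # (The paper's actual training-time constraint may differ.)
        self.raw_alpha = nn.Parameter(torch.tensor(0.0))
        self.raw_beta = nn.Parameter(torch.tensor(1.0))

    def forward(self, x: torch.Tensor, h: torch.Tensor) -> torch.Tensor:
        alpha = torch.sigmoid(self.raw_alpha)
        beta = torch.sigmoid(self.raw_beta)
        candidate = torch.tanh(self.input_proj(x) + self.recurrent(h))
        return alpha * candidate + beta * h


# Usage: unroll the cell over a (batch, time, features) sequence.
cell = ScalarGatedOrthogonalCell(input_size=8, hidden_size=64)
x = torch.randn(4, 100, 8)
h = torch.zeros(4, 64)
for t in range(x.size(1)):
    h = cell(x[:, t, :], h)
print(h.shape)  # torch.Size([4, 64])
```

Because the recurrent matrix is orthogonal, repeated multiplication by it preserves the norm of the hidden state, which is the intuition behind the paper's claim that gradients cannot explode exponentially through time.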

Will Taylor-Melanson, Martha Dais Ferreira, Stan Matwin

2022-Nov-25

Deep learning, Exploding gradient problem, Recurrent Neural Networks, Unitary matrices