ArXiv Preprint
A fundamental challenge in multi-task learning is that different tasks may
conflict with each other when they are solved jointly, and one cause of this
phenomenon is conflicting gradients during optimization. Recent works attempt
to mitigate the influence of conflicting gradients by directly altering the
gradients according to certain criteria. However, our empirical study shows that
such "gradient surgery" cannot effectively reduce the occurrence of conflicting
gradients. In this paper, we take a different approach and reduce conflicting
gradients at the root. In essence, we investigate the task gradients w.r.t.
each shared network layer, select the layers with high conflict scores, and
turn them into task-specific layers. Our experiments show that this simple
approach can greatly reduce the occurrence of conflicting gradients in the
remaining shared layers and achieve better performance, in many cases with only
a slight increase in model parameters. Our approach can be easily applied
to improve various state-of-the-art methods, including gradient manipulation
methods and branched architecture search methods. Given a network architecture
(e.g., ResNet18), the conflict layers need to be identified only once, and the
modified network can then be combined with different methods on the same or even
different datasets to gain performance improvements. The source code is
available at https://github.com/moukamisama/Recon.
Guangyuan Shi, Qimai Li, Wenlong Zhang, Jiaxin Chen, Xiao-Ming Wu
2023-02-22
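
To make the layer-selection idea concrete, below is a minimal PyTorch sketch of per-layer conflict scoring. It is an illustrative reading of the abstract, not the paper's implementation: `layer_conflict_scores` is a hypothetical helper, and the score used here (the fraction of task pairs whose gradients w.r.t. a parameter tensor have negative cosine similarity) is a simple proxy for the conflict scores described above.

```python
import itertools
import torch
import torch.nn as nn

def layer_conflict_scores(model: nn.Module, task_losses: list) -> dict:
    """For each trainable parameter tensor, return the fraction of task
    pairs whose gradients w.r.t. it point in opposing directions
    (negative cosine similarity). This is an illustrative proxy score."""
    assert len(task_losses) >= 2, "need at least two tasks to measure conflict"
    named = [(n, p) for n, p in model.named_parameters() if p.requires_grad]
    names = [n for n, _ in named]
    params = [p for _, p in named]

    # One gradient computation per task; retain the graph so the shared
    # forward pass can be differentiated for every task loss.
    per_task = [
        torch.autograd.grad(loss, params, retain_graph=True, allow_unused=True)
        for loss in task_losses
    ]

    pairs = list(itertools.combinations(range(len(task_losses)), 2))
    scores = {}
    for k, name in enumerate(names):
        conflicts = 0
        for i, j in pairs:
            gi, gj = per_task[i][k], per_task[j][k]
            if gi is None or gj is None:
                continue  # parameter unused by one of the tasks
            cos = torch.nn.functional.cosine_similarity(
                gi.flatten(), gj.flatten(), dim=0
            )
            conflicts += int(cos.item() < 0)  # count conflicting pairs
        scores[name] = conflicts / len(pairs)
    return scores
```

In this reading, the layers whose parameters receive the highest scores would be duplicated into task-specific copies while the rest stay shared. Per the abstract, this search needs to be run only once for a given architecture; the resulting modified network can then be reused with different methods and datasets.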