In IEEE Transactions on Neural Networks and Learning Systems
Knowledge distillation (KD), as an efficient and effective model compression technique, has received considerable attention in deep learning. The key to its success lies in transferring knowledge from a large teacher network to a small student network. However, most existing KD methods consider only one type of knowledge, learned from either instance features or instance relations via a specific distillation strategy, and fail to explore transferring different types of knowledge with different distillation strategies. Moreover, the widely used offline distillation suffers from a limited learning capacity due to the fixed large-to-small teacher-student architecture. In this article, we devise a collaborative KD via multiknowledge transfer (CKD-MKT) that promotes both self-learning and collaborative learning in a unified framework. Specifically, CKD-MKT utilizes a multiple knowledge transfer framework that assembles self and online distillation strategies to effectively: 1) fuse different kinds of knowledge, allowing multiple students to learn from both individual instances and instance relations, and 2) enable students to guide each other and learn from themselves through collaborative and self-learning. Experiments and ablation studies on six image datasets demonstrate that the proposed CKD-MKT significantly outperforms recent state-of-the-art KD methods.
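As background for the logit-based knowledge transfer the abstract refers to, the classic KD objective matches temperature-softened teacher and student distributions via a KL divergence. The sketch below is a generic illustration of that standard loss (Hinton-style), not the paper's CKD-MKT objective; the temperature value is an assumed example.

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax: a higher temperature yields a
    # softer (more uniform) probability distribution.
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(student_logits, teacher_logits, temperature=4.0):
    # KL(teacher || student) on temperature-softened distributions,
    # scaled by T^2 so gradient magnitudes stay comparable across T.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl
```

When the student's logits match the teacher's, the loss is zero; the further its softened distribution drifts from the teacher's, the larger the penalty. In practice this term is combined with the usual cross-entropy on ground-truth labels.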
Jianping Gou, Liyuan Sun, Baosheng Yu, Lan Du, Kotagiri Ramamohanarao, Dacheng Tao