ArXiv Preprint
Food recognition has a wide range of applications, such as health-aware recommendation and self-service restaurants. Most previous food recognition methods first locate informative regions in a weakly supervised manner and then aggregate their features. However, localization errors in these informative regions limit the effectiveness of such methods. Instead of locating multiple regions, we propose a Progressive Self-Distillation (PSD) method, which progressively enhances the network's ability to mine more details for food recognition. PSD training simultaneously performs multiple self-distillations, in each of which a teacher network and a student network share the same embedding network. Because the student network receives a modified image, produced by masking some informative regions identified by its teacher network, the teacher network outputs stronger semantic representations than the student network. Guided by the teacher network's stronger semantics, the student network is encouraged to mine more useful regions from the modified image by enhancing its own ability. Since the embedding network is shared, the teacher network's ability is enhanced as well. Through progressive training, the teacher network incrementally improves its ability to mine discriminative regions. In the inference phase, only the teacher network is used, without the help of the student network. Extensive experiments on three datasets demonstrate the effectiveness of our method and its state-of-the-art performance.
Yaohui Zhu, Linhu Liu, Jiang Tian
2023-03-09
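
The abstract describes the method only at a high level. Below is a minimal, hypothetical PyTorch sketch of one PSD-style self-distillation step, written under assumptions not stated above: informative regions are approximated by the teacher's averaged feature-map activations, the modified image is obtained by zeroing the most activated pixels, and the distillation term is a KL divergence between the student's predictions and the detached teacher predictions. The Embedding module, the mask_informative_regions helper, and all hyperparameters are illustrative and are not the authors' implementation.

    # Hypothetical sketch of one PSD-style self-distillation step (PyTorch).
    # The masking heuristic and hyperparameters are assumptions, not the
    # authors' exact procedure.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class Embedding(nn.Module):
        """Shared embedding network used by both the teacher and the student."""
        def __init__(self, num_classes=101):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            )
            self.pool = nn.AdaptiveAvgPool2d(1)
            self.classifier = nn.Linear(64, num_classes)

        def forward(self, x):
            fmap = self.features(x)                            # B x C x h x w
            logits = self.classifier(self.pool(fmap).flatten(1))
            return fmap, logits

    def mask_informative_regions(images, fmap, keep_ratio=0.7):
        """Zero out the most activated pixels (an assumed proxy for the
        teacher's informative regions) so the student must mine other details."""
        attn = fmap.mean(dim=1, keepdim=True)                  # B x 1 x h x w
        attn = F.interpolate(attn, size=images.shape[-2:], mode="bilinear",
                             align_corners=False)
        # Per-image activation threshold; roughly keep_ratio of pixels survive.
        thresh = torch.quantile(attn.flatten(1), keep_ratio, dim=1)
        keep = (attn <= thresh.view(-1, 1, 1, 1)).float()
        return images * keep

    embed = Embedding()
    optimizer = torch.optim.SGD(embed.parameters(), lr=0.01)

    images = torch.randn(4, 3, 64, 64)           # dummy batch for illustration
    labels = torch.randint(0, 101, (4,))

    # Teacher pass on the original image; student pass on the masked image.
    # Both passes go through the same (shared) embedding network.
    t_fmap, t_logits = embed(images)
    masked = mask_informative_regions(images, t_fmap.detach())
    _, s_logits = embed(masked)

    # Classification losses for both views plus a distillation term pulling
    # the student's predictions toward the detached teacher predictions.
    loss = (F.cross_entropy(t_logits, labels)
            + F.cross_entropy(s_logits, labels)
            + F.kl_div(F.log_softmax(s_logits, dim=1),
                       F.softmax(t_logits.detach(), dim=1),
                       reduction="batchmean"))
    loss.backward()
    optimizer.step()

In this sketch, progressive training would repeat the step above while masking increasingly many regions, and inference would run only the (shared) network on the unmodified image, consistent with the abstract's description of using only the teacher at test time.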