Restructuring the Teacher and Student in Self-Distillation.


Journal

IEEE transactions on image processing : a publication of the IEEE Signal Processing Society
ISSN: 1941-0042
Titre abrégé: IEEE Trans Image Process
Pays: United States
ID NLM: 9886191

Informations de publication

Date de publication:
24 Sep 2024
Historique:
medline: 24 9 2024
pubmed: 24 9 2024
entrez: 24 9 2024
Statut: aheadofprint

Résumé

Knowledge distillation aims to achieve model compression by transferring knowledge from complex teacher models to lightweight student models. To reduce reliance on pre-trained teacher models, self-distillation methods utilize knowledge from the model itself as additional supervision. However, their performance is limited by the same or similar network architecture between the teacher and student. In order to increase architecture variety, we propose a new self-distillation framework called restructured self-distillation (RSD), which involves restructuring both the teacher and student networks. The self-distilled model is expanded into a multi-branch topology to create a more powerful teacher. During training, diverse student sub-networks are generated by randomly discarding the teacher's branches. Additionally, the teacher and student models are linked by a randomly inserted feature mixture block, introducing additional knowledge distillation in the mixed feature space. To avoid extra inference costs, the branches of the teacher model are then converted back to its original structure equivalently. Comprehensive experiments have demonstrated the effectiveness of our proposed framework for most architectures on CIFAR-10/100 and ImageNet datasets. Code is available at https://github.com/YujieZheng99/RSD.

Identifiants

pubmed: 39316482
doi: 10.1109/TIP.2024.3463421
doi:

Types de publication

Journal Article

Langues

eng

Sous-ensembles de citation

IM

Auteurs

Classifications MeSH