Blog: Model Distillation
mutual learning
teacher-student learning
Hinton et al. 2015
assistant teaching
lifelong learning
self-learning
Knowledge types and distillation
response-based distillation
response-based knowledge = the neural response of the last output layer, i.e. the model's predictions -> the student mimics the final prediction of the teacher model.
soft targets ≈ label smoothing (both soften the one-hot labels and act as a regularizer) -> relies on class probabilities, so limited to supervised learning
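
A minimal sketch of the response-based idea above, in plain Python. It assumes the temperature-softened softmax and the T^2 scaling from Hinton et al. 2015. The function names (softmax, distillation_loss) and the example logits are hypothetical, chosen for illustration only, and this is not the blog's own implementation.

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-softened softmax: a higher temperature gives a flatter distribution.
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # KL(teacher || student) on the softened outputs of the last layer.
    p = softmax(teacher_logits, temperature)  # soft targets from the teacher
    q = softmax(student_logits, temperature)  # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    # T^2 keeps gradient magnitudes comparable across temperatures (assumed convention).
    return temperature ** 2 * kl

# Hypothetical logits for a 3-class problem.
teacher_logits = [4.0, 1.0, 0.5]
student_logits = [2.5, 1.5, 0.2]
loss = distillation_loss(student_logits, teacher_logits)
print(loss)
```

The loss is zero when the student reproduces the teacher's logits exactly and positive otherwise. In practice it is usually combined with the ordinary cross-entropy on the ground-truth labels.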