mutual learning

teacher-student learning

Hinton et al. 2015

assistant teaching

lifelong learning

self-learning

Knowledge types and distillation

response-based distillation

response-based knowledge refers to the neural response of the last output layer, i.e. the model's predictions -> the student mimics the final prediction of the teacher model.
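A minimal sketch of how the student can be trained to mimic the teacher's final prediction, assuming the temperature-softened formulation from Hinton et al. 2015: KL divergence between teacher and student softmax outputs, scaled by T^2. The function names, the logit values, and the temperature of 4.0 are illustrative assumptions, not taken from these notes.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def response_distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) on softened outputs, scaled by T^2."""
    p = softmax(teacher_logits, temperature)  # teacher soft targets
    q = softmax(student_logits, temperature)  # student soft predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


# Hypothetical logits for a 3-class problem.
teacher_logits = [5.0, 2.0, 0.5]
student_logits = [3.0, 2.5, 1.0]
loss = response_distillation_loss(student_logits, teacher_logits)
print(loss)
```

In practice this term is usually combined with the ordinary cross-entropy on the ground-truth labels; the weighting between the two is a hyperparameter.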

soft targets are similar to label smoothing (both act as regularizers) -> relies on class outputs of the last layer, so limited to supervised learning
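A small sketch comparing the two kinds of target, under assumed values (epsilon = 0.1, temperature = 4.0, hypothetical teacher logits). Label smoothing spreads the same mass over every wrong class, while teacher soft targets keep the teacher's ranking of the wrong classes.

```python
import math


def smoothed_targets(true_class, num_classes, epsilon=0.1):
    """Label smoothing: (1 - epsilon) on the true class plus epsilon spread uniformly."""
    uniform = epsilon / num_classes
    return [
        (1.0 - epsilon) + uniform if k == true_class else uniform
        for k in range(num_classes)
    ]


def teacher_soft_targets(teacher_logits, temperature=4.0):
    """Soft targets: temperature-scaled softmax of the teacher's logits."""
    scaled = [z / temperature for z in teacher_logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical 3-class example where class 0 is the true class.
smooth = smoothed_targets(true_class=0, num_classes=3)
soft = teacher_soft_targets([5.0, 2.0, 0.5])

print(smooth)  # wrong classes 1 and 2 receive identical mass
print(soft)    # wrong class 1 receives more mass than wrong class 2
```

Both targets need a known set of classes, which is why this form of knowledge is tied to supervised learning.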