Knowledge Distillation Plan

Distill a 70B teacher into a 7–13B student. Cover loss mixing (logits, features, and policies), curriculum design, and temperature tuning, and report downstream eval deltas.
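
To make the loss-mixing and temperature terms concrete, here is a minimal sketch in plain Python of one common formulation: a temperature-scaled KL term on logits combined with a mean-squared-error term on features. The function names, the weights `w_logit` and `w_feat`, and the assumption that teacher and student features already share a dimension (in practice a learned projection would usually be needed) are illustrative choices, not part of the plan itself. The policy term is left out because the plan does not define it.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, stabilized by subtracting the max."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions; q is clamped to avoid log(0)."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)

def logit_kd_loss(teacher_logits, student_logits, temperature):
    """KL between softened teacher and student distributions.

    The T^2 factor keeps gradient magnitudes comparable across temperatures.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return temperature ** 2 * kl_divergence(p, q)

def feature_loss(teacher_feats, student_feats):
    """Mean squared error; assumes both vectors have the same length."""
    n = len(teacher_feats)
    return sum((t - s) ** 2 for t, s in zip(teacher_feats, student_feats)) / n

def mixed_loss(teacher_logits, student_logits, teacher_feats, student_feats,
               temperature=2.0, w_logit=0.7, w_feat=0.3):
    """Weighted sum of the logit and feature terms (weights are hypothetical)."""
    return (w_logit * logit_kd_loss(teacher_logits, student_logits, temperature)
            + w_feat * feature_loss(teacher_feats, student_feats))

if __name__ == "__main__":
    loss = mixed_loss([2.0, 1.0, 0.1], [1.5, 1.2, 0.3],
                      [0.5, -0.2], [0.4, 0.0])
    print(f"mixed loss: {loss:.6f}")
```

Raising `temperature` flattens both distributions, so the student sees more of the teacher's relative preferences among lower-ranked tokens. Temperature and the two weights are the knobs a tuning sweep would vary.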

Author: Assistant

Model: gpt-4o

Category: model-compression-training

Tags: LLM, distillation, teacher-student, curriculum, losses
