Machine Learning

We propose a distillation scaling law that estimates distilled model performance based on a compute budget and its allocation between…