Define the prompt distribution
The student will only generalise where the prompts cover. Sample from real production traffic if you can. Synthesize otherwise with Evol-Instruct or similar.
Train a small student on a large teacher's outputs. DeepSeek-R1 (Jan 2025) used 800k verified reasoning trajectories to SFT smaller students (Qwen 1.5B to Llama 70B) into frontier-grade reasoning at a fraction of training cost. The 2025 breakout pattern.
Most production traffic does not need a 100B+ frontier model. Distillation captures the teacher's behaviour on the specific task into a 7B to 14B student you can serve cheaply. Reasoning distillation (R1 lineage) extended this from outputs-only to full reasoning traces.
Generate, filter, train, iterate. Verifier in the loop for reasoning tasks.
The student will only generalise where the prompts cover. Sample from real production traffic if you can. Synthesize otherwise with Evol-Instruct or similar.
N=1 to 8 per prompt. For reasoning, capture the full chain of thought, not just the final answer.
Verifier where possible (correct answer, valid output). LLM-judge with a calibration set otherwise. Bad teacher data poisons the student.
SFT the student on the filtered teacher data. Add DPO if you have preference data, GRPO if the task has a verifier.
Latency-critical serving, cost-pressured workloads, reasoning behaviour the base does not have. Specialist tasks where you can ship a small task-tuned model instead of a generalist.
When the teacher does not materially outperform the student on the task. When you cannot legally use the teacher's outputs (check terms of service).
We distil aggressively when latency, cost, or vendor lock-in is the constraint. R1-style reasoning distillation is in every serious project plan we make in 2026.
Sourcing, PII redaction (Presidio), synthetic data (Distilabel, Nemotron), DEITA quality scoring, MinHash + SemDedup, labeling vendors, feedback loops.
Read moreOpenAI RFT, Anthropic on Bedrock, Vertex, Azure Foundry, Databricks Mosaic, Together, Predibase, NeMo Customizer, Modal, Lambda. Side-by-side with our take.
Read moreUnder 1k examples to over 1M, single A10G to 128 B200. Indicative cost, recommended method, hardware tier.
Read moreContinued pretraining, SFT, preference optimization (DPO, SimPO, ORPO), reasoning distillation (R1 lineage), model merging (TIES, DARE). The full build pipeline.
Read moreEU AI Act (Article 25 substantial-modification trap), GDPR, HIPAA, FedRAMP, Colorado AI Act, India DPDP, China GenAI Measures. Region-by-region for tuned LLMs.
Read more