SFT base + preference data
Same starting point as DPO. SimPO does not change the data layer.
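To make the shared data layer concrete, here is a minimal sketch of one preference record. The field names `prompt`, `chosen`, and `rejected` follow the convention TRL uses for preference datasets. The `is_valid_pair` helper and the example row are hypothetical and only illustrate the shape.

```python
from typing import TypedDict


class PreferencePair(TypedDict):
    prompt: str    # the instruction or conversation context
    chosen: str    # the response the annotator preferred
    rejected: str  # the response the annotator ranked lower


def is_valid_pair(row: dict) -> bool:
    """Return True if all three fields are non-empty strings and the two responses differ."""
    keys = ("prompt", "chosen", "rejected")
    if not all(isinstance(row.get(k), str) and row[k].strip() for k in keys):
        return False
    return row["chosen"] != row["rejected"]


# Hypothetical example row, for illustration only.
example: PreferencePair = {
    "prompt": "Explain what a reference model is in DPO.",
    "chosen": "It is the frozen SFT checkpoint that anchors the DPO loss.",
    "rejected": "It is a model.",
}
```

The same rows can feed either a DPO run or a SimPO run, so both methods can be compared on identical data.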
Reference-free preference optimization with a length-normalized log-probability reward. The paper reports gains of up to 6.4 points on AlpacaEval 2 and up to 7.5 points on Arena-Hard over DPO on the same training data (Meng et al., NeurIPS 2024). The strongest DPO challenger as of 2026.
DPO needs the SFT reference model in memory for the loss anchor. SimPO drops it: the reward is just the length-normalized log-probability of the response. Same preference data, one less model in memory, length bias controlled.
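The reward and loss described above can be sketched in plain Python. This follows the objective in Meng et al.: the reward is beta times the mean per-token log-probability, and the loss is a logistic loss on the reward gap minus a target margin gamma. The function names and the default values for `beta` and `gamma` are illustrative assumptions, not tuned recommendations.

```python
import math


def simpo_reward(token_logps: list[float], beta: float) -> float:
    """Length-normalized reward: beta times the mean per-token log-probability."""
    return beta * sum(token_logps) / len(token_logps)


def simpo_loss(
    chosen_logps: list[float],
    rejected_logps: list[float],
    beta: float = 2.0,   # assumed illustrative value
    gamma: float = 0.5,  # assumed illustrative value
) -> float:
    """-log(sigmoid(r_chosen - r_rejected - gamma)) for a single preference pair."""
    margin = (
        simpo_reward(chosen_logps, beta)
        - simpo_reward(rejected_logps, beta)
        - gamma
    )
    # -log(sigmoid(m)) equals softplus(-m); this form avoids overflow for large |m|.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
```

Only the policy's own token log-probabilities appear in the loss, so no frozen reference model is needed. Dividing by the response length means a long response does not accumulate a lower reward simply by having more tokens.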
Same data shape as DPO. No reference model in memory. Train.
TRL supports SimPO via CPOTrainer with loss_type='simpo'. Defaults are reasonable. Tune gamma (the target margin) and beta (the scale on the length-normalized reward) if results look off.
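A minimal configuration sketch for the TRL route, assuming a TRL release that exports `CPOConfig` from the top-level package (the import path may differ in other releases). The TRL documentation describes setting `cpo_alpha=0.0` alongside `loss_type="simpo"` to get the SimPO loss without CPO's extra likelihood term. The numeric values, the `SIMPO_KWARGS` name, and the `make_simpo_config` helper are assumptions for illustration.

```python
# Illustrative values only; not tuned recommendations.
SIMPO_KWARGS = {
    "loss_type": "simpo",  # switch CPOTrainer to the SimPO objective
    "cpo_alpha": 0.0,      # drop CPO's NLL term so the loss is SimPO only
    "beta": 2.0,           # reward scale (assumed value)
    "simpo_gamma": 1.0,    # target reward margin (assumed value)
}


def make_simpo_config(output_dir: str):
    """Build a CPOConfig for a SimPO run. Requires `trl`, imported lazily."""
    # Assumption: CPOConfig is importable from the top-level `trl` package.
    from trl import CPOConfig

    return CPOConfig(output_dir=output_dir, **SIMPO_KWARGS)
```

The resulting config is passed to `CPOTrainer` together with the SFT model and the preference dataset. Keeping the hyperparameters in one dictionary makes it easy to log them next to the held-out results when sweeping `beta` and `simpo_gamma`.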
When the preference data is high quality, when training memory is tight, or when DPO has already been benchmarked and the upgrade is worth the extra hyperparameter discipline.
When there is no SFT base or preference data, or when noisy preference labels make the length-normalized reward unstable.
On clean preference data SimPO often wins by the paper's numbers. On noisy data DPO can be more robust. We run both on the same data and pick by held-out evals.