Custom models · Direct LLM · sovereign weights · Production grade

FINE-TUNING · CUSTOM MODELS

Building a custom enterprise LLM. The full pipeline, named and sequenced.

Continued pretraining, SFT, preference optimization, reasoning distillation, model merging. The recipe that produced Llama-Nemotron and the R1 distillation family. We run it end to end and ship signed weights with a model card you can defend in audit.

Llama 4 · Qwen 3 · Mistral · DeepSeek · Nemotron · PyTorch FSDP

Pipeline: 5 training stages (CPT · SFT · Preference · Distill · Merge)

Base: Llama, Mistral, Qwen, Gemma, Phi, DeepSeek

Output: Open-weight signed model + model card

Cycle: Quarterly program · sized to corpus

[THE PIPELINE]

From open base to signed custom model.

Eight stages, each with a named output. The first four cover vocabulary and behaviour. The next two add reasoning and consolidation. The last two close the loop with evaluation and a model card.

The custom-model recipe.

Base + CPT + SFT + Preference + Distill + Merge + Eval + Sign.

Base: Llama / Qwen / Mistral

CPT: Domain corpus

SFT: Labelled (prompt, response)

DPO: Preference pairs

Distill + sign: R1 lineage + Sigstore

[STAGE BY STAGE]

Eight stages, eight signed outputs.

Pick the base

Llama 4 Maverick (85.5% MMLU as reported by Meta, released April 2025), Qwen 3 (broad multilingual coverage), Mistral Small 4 (Apache 2.0, function calling native), Phi-4 (small + strong), DeepSeek V3 (MoE). Pick by license posture, target size, and the language coverage that matches your data. Always start from base, never from Instruct, when you intend to retrain alignment.

Continued pretraining (CPT) on domain corpus

1B to 100B tokens of curated domain text. Extend the vocabulary if the base tokenizer splits domain terms inefficiently. Mix in 5 to 20% general-domain replay data, approximating the base model's pretraining mix, to limit catastrophic forgetting. Multi-node FSDP or Megatron-LM, checkpoint every 1B tokens. This is the stage that teaches vocabulary the base never saw (legal Latin, ICD codes, SMILES strings, regional scripts).
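The replay mix can be sketched in a few lines. This is a minimal illustration, not our training loader: `mix_with_replay` and its parameters are hypothetical names, the ratio is applied per document rather than per token, and both pools are assumed to be non-empty.

```python
import random
from itertools import islice
from typing import Iterator, Sequence


def mix_with_replay(
    domain_docs: Sequence[str],
    replay_docs: Sequence[str],
    replay_fraction: float = 0.1,
    seed: int = 0,
) -> Iterator[str]:
    """Yield an endless stream in which roughly `replay_fraction` of the
    documents come from the replay pool and the rest from the domain corpus."""
    # Because this is a generator, the check runs on the first next() call.
    if not 0.0 <= replay_fraction < 1.0:
        raise ValueError("replay_fraction must be in [0, 1)")
    rng = random.Random(seed)  # seeded so a run can be reproduced
    while True:
        pool = replay_docs if rng.random() < replay_fraction else domain_docs
        yield rng.choice(pool)


# Usage: draw a batch and measure the share of replay documents in it.
batch = list(islice(mix_with_replay(["icd-note", "contract"], ["general"], 0.2), 1000))
replay_share = sum(doc == "general" for doc in batch) / len(batch)
```

A production loader would weight by token count and shard across nodes; the point here is only that the replay ratio is an explicit, seeded parameter.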

SFT on labelled (prompt, response) data

Full SFT for deep re-tasking, LoRA at rank 64 for cheaper iteration. LR 1e-5 to 5e-5 for full SFT (LoRA adapters usually tolerate a higher rate), cosine decay, short warmup. The transition from CPT to SFT is delicate: too high an LR destroys the CPT capabilities, too low and SFT does not stick.
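For reference, a cosine schedule with a short linear warmup looks like the sketch below. The function name, the 100-step warmup, and the 2e-5 peak are illustrative assumptions; only the LR range comes from the text above.

```python
import math


def lr_at_step(
    step: int,
    total_steps: int,
    peak_lr: float = 2e-5,
    warmup_steps: int = 100,
    min_lr: float = 0.0,
) -> float:
    """Learning rate at `step`: linear warmup to `peak_lr`, then cosine decay to `min_lr`."""
    if step < warmup_steps:
        # Linear ramp that reaches peak_lr on the last warmup step.
        return peak_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, clamped to 1.0 past the end.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(1.0, progress)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

Sweeping `peak_lr` across the stated range while watching a held-out slice of the CPT domain is one way to find the point where SFT sticks without erasing what CPT taught.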

Preference optimization (DPO, SimPO, ORPO, KTO)

DPO is our default. SimPO reports gains of up to 6.4 points on AlpacaEval 2 over DPO and is worth benchmarking. ORPO collapses SFT and preference into one stage if budget is tight. KTO fits when production feedback arrives as thumbs up/down rather than pairs. For DPO, beta 0.1 to 0.5, 1 to 2 epochs.
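To make the role of beta concrete, here is the standard DPO objective for a single preference pair, written in plain Python. The inputs are summed sequence log-probabilities under the policy and the frozen reference model; the function and argument names are ours, chosen for illustration.

```python
import math


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,
) -> float:
    """DPO loss for one pair: -log(sigmoid(beta * margin)), where the margin is
    how much more the policy favours the chosen response than the reference does."""
    margin = (policy_chosen_logp - ref_chosen_logp) - (
        policy_rejected_logp - ref_rejected_logp
    )
    z = beta * margin
    # Numerically stable form of -log(sigmoid(z)) = log(1 + exp(-z)).
    if z > 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))
```

When the policy equals the reference the margin is zero and the loss is log 2. A larger beta penalises the same margin more sharply, which is why it is tuned together with the number of epochs.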

Reasoning distillation (R1 lineage)

DeepSeek-R1 (Jan 2025) used about 800k curated samples, most of them verified reasoning trajectories, to SFT smaller students (Qwen 1.5B up to Llama 70B), lifting them well above same-size baselines on reasoning benchmarks. This became the breakout pattern of 2025. Distill from R1, GPT-4.1, Claude Sonnet 4.5 (where the provider's terms allow it), or a custom reasoning teacher into the production-target student. Keep a verifier in the loop for math and code.
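A verifier in the loop amounts to a filter between teacher generation and student SFT. The sketch below assumes an exact-match check against known answers, which suits math; for code the verifier would run unit tests instead. `Trajectory`, `keep_verified`, and `make_exact_match_verifier` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass
class Trajectory:
    prompt: str
    reasoning: str      # the teacher's chain of thought
    final_answer: str   # the answer extracted from the teacher's output


def make_exact_match_verifier(
    reference_answers: Dict[str, str],
) -> Callable[[Trajectory], bool]:
    """Build a verifier that accepts a trajectory only if its final answer
    matches the known reference answer for that prompt."""
    def verify(t: Trajectory) -> bool:
        expected = reference_answers.get(t.prompt)
        return expected is not None and t.final_answer.strip() == expected.strip()
    return verify


def keep_verified(
    trajectories: Iterable[Trajectory],
    verifier: Callable[[Trajectory], bool],
) -> List[Trajectory]:
    """Drop every teacher trajectory the verifier rejects."""
    return [t for t in trajectories if verifier(t)]
```

Only the surviving trajectories go into the student's SFT set, so the student never trains on a confident but wrong chain of reasoning.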

Model merging (TIES, DARE, SLERP)

Combine task-specific fine-tunes into a single deployable with mergekit. TIES for general consolidation, DARE for noisy LoRAs, SLERP for two-model blends. Multi-skill consolidation in minutes, no GPU training needed.
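For intuition, SLERP interpolates along the arc between two weight vectors instead of along the straight line. The sketch below works on flat Python lists and is our own illustration of the formula, not mergekit's code; in practice the operation is applied tensor by tensor.

```python
import math
from typing import List, Sequence


def slerp(a: Sequence[float], b: Sequence[float], t: float, eps: float = 1e-8) -> List[float]:
    """Spherical linear interpolation between vectors a and b at fraction t in [0, 1]."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    linear = [(1.0 - t) * x + t * y for x, y in zip(a, b)]
    if norm_a < eps or norm_b < eps:
        return linear  # the angle is undefined for a zero vector
    cos_omega = sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)
    cos_omega = max(-1.0, min(1.0, cos_omega))  # guard against rounding error
    omega = math.acos(cos_omega)
    sin_omega = math.sin(omega)
    if abs(sin_omega) < eps:
        return linear  # nearly parallel vectors: fall back to plain interpolation
    weight_a = math.sin((1.0 - t) * omega) / sin_omega
    weight_b = math.sin(t * omega) / sin_omega
    return [weight_a * x + weight_b * y for x, y in zip(a, b)]
```

At `t = 0` the result is the first model, at `t = 1` the second. The blend ratio is the one knob worth sweeping against the evaluation suite.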

Full evaluation suite + safety pass

MMLU-Pro, IFEval, MT-Bench, AlpacaEval 2, Arena-Hard, domain golden set. HarmBench + JailbreakBench for safety. Bias audit. LLM-as-judge with calibration. Block the deploy on any regression.
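"Block the deploy on any regression" reduces to a comparison between the candidate's scores and the last shipped baseline. A minimal sketch, assuming scores arrive as plain dictionaries; the metric names and the `lower_is_better` convention (for example an attack success rate) are illustrative assumptions.

```python
from typing import Dict, FrozenSet, Optional, Tuple


def find_regressions(
    baseline: Dict[str, float],
    candidate: Dict[str, float],
    tolerance: float = 0.0,
    lower_is_better: FrozenSet[str] = frozenset(),
) -> Dict[str, Tuple[float, Optional[float]]]:
    """Return {metric: (baseline_score, candidate_score)} for every metric that
    got worse by more than `tolerance`, or that the candidate did not report."""
    regressions: Dict[str, Tuple[float, Optional[float]]] = {}
    for metric, base_score in baseline.items():
        if metric not in candidate:
            regressions[metric] = (base_score, None)  # a missing score counts as a failure
            continue
        delta = candidate[metric] - base_score
        if metric in lower_is_better:
            delta = -delta  # flip the sign so a positive delta always means improvement
        if delta < -tolerance:
            regressions[metric] = (base_score, candidate[metric])
    return regressions


def deploy_allowed(regressions: Dict[str, Tuple[float, Optional[float]]]) -> bool:
    """The gate opens only when nothing regressed."""
    return not regressions
```

A small non-zero `tolerance` is a judgment call: it absorbs benchmark noise, at the price of letting small real regressions through.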

Sign + ship + model card

Sigstore + in-toto attestation per checkpoint. The provenance chain runs from dataset hash to base model hash to checkpoint hash. Model card per OECD format with intended use, training data summary, evaluation results, known limitations, copyright posture. Documentation of this kind is required of EU AI Act GPAI providers.
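The hash chain itself is simple to picture. The sketch below builds a provenance record with SHA-256 from the standard library; it stops short of signing, which is where Sigstore and in-toto come in. The field names and the `build_provenance` helper are hypothetical, and real artifacts would be hashed by streaming files rather than held in memory.

```python
import hashlib
import json
from typing import Dict


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def build_provenance(
    dataset_bytes: bytes,
    base_model_bytes: bytes,
    checkpoint_bytes: bytes,
) -> Dict[str, str]:
    """Link dataset, base model, and checkpoint digests in one record,
    then add a digest over the record so it can be signed as a unit."""
    record = {
        "dataset_sha256": sha256_hex(dataset_bytes),
        "base_model_sha256": sha256_hex(base_model_bytes),
        "checkpoint_sha256": sha256_hex(checkpoint_bytes),
    }
    # Canonical JSON (sorted keys, fixed separators) keeps the digest stable.
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    record["record_sha256"] = sha256_hex(canonical.encode("utf-8"))
    return record
```

Changing a single byte of the dataset changes `record_sha256`, which is what lets an auditor tie a shipped checkpoint back to the exact data it was trained on.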

[OUR TAKE]

Custom models earn the build in narrow shapes. Distillation earns it almost everywhere.

The 2025 R1 lineage made reasoning teachable to a small student. Most enterprise reasoning use cases can now be served by a 7B-14B specialist that costs an order of magnitude less per call than a frontier API. Custom models from scratch remain a labs-and-frontier game; custom models from distillation are a board-room decision we ship from week one.

[WHAT YOU GET]

What's signed at handoff.

Base: Licensed, audited, sovereign

Weights: Sigstore-signed, EU residency optional

Model card: OECD format, audit-ready

Pipeline: Reproducible from dataset hash

[COMMON QUESTIONS]

What buyers ask before they sign.

When does a custom model actually beat fine-tuning a hosted frontier model? When you need sovereign weights (defense, regulated), when the domain has structurally new vocabulary that CPT solves and SFT cannot, when you need to ship a small fast specialist (distillation), or when vendor lock-in is a real cost. Otherwise, hosted fine-tuning of GPT-4.1, Claude, or Gemini, where the vendor offers it, is cheaper and ships sooner.
What did R1 actually prove? That reasoning is teachable through distillation. DeepSeek-R1 generated about 800k curated samples, most of them verified reasoning trajectories, and used them to SFT smaller students. The Qwen 1.5B distill outperformed GPT-4o and Claude 3.5 Sonnet on math benchmarks such as AIME 2024 and MATH-500, according to the DeepSeek-R1 report. The implication: most enterprise reasoning use cases can be served by a small distilled specialist instead of a frontier API call.
Can we use frontier model outputs (Claude, GPT-4.1) as teacher data? Check the ToS. OpenAI prohibits using outputs to train competing models. Anthropic has similar terms. Open-weight teachers (Llama, R1, Mistral, Qwen) are usually the safer choice for redistribution, though each license still needs checking. For internal-only models you have more latitude but read the terms carefully.
How long is a custom-model program? Audit + base selection (week 1). CPT (4 to 12 weeks depending on token volume). SFT + DPO (2 to 4 weeks). Distillation + merging (2 to 4 weeks). Eval + safety + sign + model card (2 weeks). Run back to back, that is roughly 11 to 23 weeks. Most enterprise custom-model programs run as quarterly phases with monthly checkpoints.
What does a custom-model engagement cost? Compute scales from $50k for a small specialist (8B base, CPT on 5B tokens, LoRA SFT, DPO) to $500k+ for a large multi-stage program (70B base, 50B token CPT, full SFT, reasoning distillation). Engineering scope is comparable. The ROI calculation is: cost of training versus the cost of frontier API calls over the model's lifetime, plus the value of sovereign weights and brand control.
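The ROI comparison can be put into a back-of-envelope form. In this sketch only the $50k training figure comes from the answer above; the call volume and per-call prices are hypothetical placeholders to be replaced with your own numbers, and the value of sovereign weights is deliberately left out.

```python
from typing import Optional


def breakeven_months(
    training_cost: float,
    monthly_calls: float,
    api_cost_per_call: float,
    self_host_cost_per_call: float,
    monthly_fixed_hosting: float = 0.0,
) -> Optional[float]:
    """Months until the custom model's per-call savings repay the training cost.
    Returns None when self-hosting never comes out cheaper than the API."""
    monthly_saving = (
        monthly_calls * (api_cost_per_call - self_host_cost_per_call)
        - monthly_fixed_hosting
    )
    if monthly_saving <= 0:
        return None
    return training_cost / monthly_saving


# Hypothetical inputs: 1M calls a month, $0.02 per API call, $0.002 self-hosted.
months = breakeven_months(50_000, 1_000_000, 0.02, 0.002)
```

With those placeholder inputs the saving is about $18k a month, so the $50k program pays back in under three months. Halve the call volume and the payback period doubles.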

01 · FINE-TUNING

Bring the use case. We will build the model.

From open base to signed weights to model card. Quarterly program, monthly checkpoints, eval gates at every stage, exit-friendly artifacts. We do not lock you in.
