12 · WEIGHT ARITHMETIC · SPECIALISED

★ Model merging · Specialised

METHOD · TIES / DARE / MODEL SOUP

Model merging. Combine fine-tunes by weight arithmetic, with no extra training.

Merging combines multiple fine-tunes into one model by averaging their weights or by trimming and doing arithmetic on their deltas (each fine-tune's weights minus the shared base). Model soup averages, TIES trims small deltas and resolves sign conflicts, and DARE randomly drops delta parameters and rescales the rest. Production use: stitch task-specific LoRAs into a single deployable model.
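As a rough illustration of the two simplest operations, here is a minimal pure-Python sketch of model soup (a plain average of weights) and task arithmetic (the base plus scaled deltas). It assumes a toy representation where each model is a dict of flat float lists; the function names are hypothetical, and real tooling such as mergekit works on full tensors, layer by layer.

```python
from typing import Dict, List

# Toy representation (assumption): parameter name -> flat list of floats.
# A real checkpoint holds tensors, but the arithmetic is element-wise either way.
Weights = Dict[str, List[float]]


def model_soup(models: List[Weights]) -> Weights:
    """Uniform model soup: element-wise mean of every parameter."""
    n = len(models)
    return {
        name: [sum(vals) / n for vals in zip(*(m[name] for m in models))]
        for name in models[0]
    }


def task_vector(base: Weights, finetuned: Weights) -> Weights:
    """Task vector (delta): fine-tuned weights minus base weights."""
    return {
        name: [f - b for f, b in zip(finetuned[name], base[name])]
        for name in base
    }


def task_arithmetic(base: Weights, deltas: List[Weights], scales: List[float]) -> Weights:
    """Add scaled task vectors to the base; a negative scale subtracts a behaviour."""
    merged = {name: list(vals) for name, vals in base.items()}
    for delta, lam in zip(deltas, scales):
        for name, vals in delta.items():
            merged[name] = [m + lam * d for m, d in zip(merged[name], vals)]
    return merged


if __name__ == "__main__":
    base = {"w": [0.0, 1.0]}
    legal = {"w": [0.5, 1.0]}     # hypothetical fine-tune A
    support = {"w": [0.0, 2.0]}   # hypothetical fine-tune B
    print(model_soup([legal, support]))
    deltas = [task_vector(base, legal), task_vector(base, support)]
    print(task_arithmetic(base, deltas, [1.0, 1.0]))
```

Soup only needs the fine-tunes themselves; task arithmetic needs the shared base too, which is what lets it subtract a behaviour as well as add one.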

mergekit · PEFT


Cost

Minutes, not days

Methods

Model soup, TIES, DARE, SLERP, task arithmetic

Use

Multi-skill consolidation

[WHY THIS EXISTS]

Serving N adapters for N tasks adds an operational tax.

If three task-specific fine-tunes (legal analysis, customer support, structured extraction) need to coexist in one model, retraining a unified model is expensive. Merging combines the three into one set of weights in minutes, with no GPU training.

Model soup: simple parameter average
TIES: trim small deltas, elect signs by magnitude, average winners
DARE: randomly drop a fraction of the deltas and rescale the survivors before merging
Task arithmetic: add or subtract task vectors to compose behaviours
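The TIES and DARE steps listed above can be sketched on a single flattened delta per fine-tune. This is a toy illustration, not mergekit's implementation: the helper names, the `density` and `drop_rate` values, and the flat-list representation are assumptions. In both methods the merged delta is added back onto the shared base weights afterwards.

```python
import random
from typing import List

# Toy representation (assumption): one flattened delta per fine-tune,
# i.e. fine-tuned weights minus the shared base weights.
Vector = List[float]


def trim_top_k(delta: Vector, density: float) -> Vector:
    """TIES step 1: keep the `density` fraction of entries with the largest |value|."""
    k = max(1, int(round(density * len(delta))))
    ranked = sorted(range(len(delta)), key=lambda i: abs(delta[i]), reverse=True)
    keep = set(ranked[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(delta)]


def ties_merge(deltas: List[Vector], density: float) -> Vector:
    """TIES: trim each delta, elect a sign per parameter, average the agreeing entries."""
    trimmed = [trim_top_k(d, density) for d in deltas]
    merged = []
    for column in zip(*trimmed):
        total = sum(column)
        if total == 0.0:
            merged.append(0.0)  # no net signal survives for this parameter
            continue
        # Elected sign: whichever sign carries more total magnitude, i.e. the sign of the sum.
        sign = 1.0 if total > 0 else -1.0
        agreeing = [v for v in column if v * sign > 0]  # drops zeros and conflicting signs
        merged.append(sum(agreeing) / len(agreeing))
    return merged


def dare(delta: Vector, drop_rate: float, rng: random.Random) -> Vector:
    """DARE: zero each entry with probability `drop_rate`, rescale survivors by 1/(1 - p)."""
    scale = 1.0 / (1.0 - drop_rate)
    return [0.0 if rng.random() < drop_rate else v * scale for v in delta]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the random drop is reproducible
    deltas = [[0.5, -0.25, 0.0, 1.0], [0.25, 0.75, -0.5, 0.0]]
    sparse = [dare(d, drop_rate=0.5, rng=rng) for d in deltas]
    merged_delta = ties_merge(sparse, density=0.5)
    print(merged_delta)  # add this (optionally scaled) to the base weights
```

The demo runs DARE before TIES, which mirrors the combination mergekit exposes as `dare_ties`; DARE can equally be followed by a plain average.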

[THE PIPELINE]

Model merging, end to end.

mergekit YAML config, one command, eval the merged checkpoint.

Fine-tune A (LoRA)

Fine-tune B (LoRA)

Fine-tune C (LoRA)

mergekit YAML config

TIES or DARE merge

Eval merged model

Ship

Pick the merge method

TIES for general consolidation. DARE for noisy LoRAs. SLERP for interpolating between exactly two models. Task arithmetic for adding or subtracting behaviours.
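SLERP interpolates along the arc between two weight tensors rather than along a straight line, which is why it takes exactly two inputs. A minimal sketch for one flattened tensor follows. Applying it tensor by tensor, measuring the angle on normalised copies, and falling back to linear interpolation for nearly parallel tensors are assumptions modelled on common implementations; the 0.9995 threshold is illustrative.

```python
import math
from typing import List


def slerp(t: float, v0: List[float], v1: List[float],
          parallel_threshold: float = 0.9995) -> List[float]:
    """Spherical linear interpolation between two flattened weight tensors.

    t = 0 returns v0 and t = 1 returns v1. The angle is measured between
    normalised copies, but the original vectors are what get interpolated.
    """
    n0 = math.sqrt(sum(a * a for a in v0))
    n1 = math.sqrt(sum(b * b for b in v1))
    cos_omega = sum(a * b for a, b in zip(v0, v1)) / max(n0 * n1, 1e-12)
    if abs(cos_omega) > parallel_threshold:
        # Nearly (anti)parallel: the arc is ill-defined, so fall back to linear interpolation.
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    omega = math.acos(cos_omega)
    s0 = math.sin((1 - t) * omega) / math.sin(omega)
    s1 = math.sin(t * omega) / math.sin(omega)
    return [s0 * a + s1 * b for a, b in zip(v0, v1)]


if __name__ == "__main__":
    # Halfway between two orthogonal unit vectors stays on the unit circle.
    print(slerp(0.5, [1.0, 0.0], [0.0, 1.0]))
```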

mergekit config, one command

The YAML file lists the source models, their weights, and the merge method. Run mergekit-yaml with the config path and an output directory; the output is a merged model checkpoint.
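A sketch of generating the YAML for a three-way TIES merge from Python. The keys (`models`, `parameters`, `density`, `weight`, `merge_method`, `base_model`, `dtype`) follow mergekit's documented config format, but the paths and the density and weight values are hypothetical. It also assumes each LoRA has first been folded into a full checkpoint, for example with PEFT's `merge_and_unload()`.

```python
from typing import List


def ties_config(base_model: str, sources: List[str], weight: float = 1.0,
                density: float = 0.5, dtype: str = "bfloat16") -> str:
    """Render a mergekit-style YAML config for a TIES merge of full checkpoints."""
    lines = ["models:"]
    for path in sources:
        lines += [
            f"  - model: {path}",
            "    parameters:",
            f"      density: {density}",  # fraction of each delta TIES keeps
            f"      weight: {weight}",    # relative weight of this source
        ]
    lines += [
        "merge_method: ties",
        f"base_model: {base_model}",
        "parameters:",
        "  normalize: true",
        f"dtype: {dtype}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical local paths: each LoRA already merged into its own full checkpoint.
    print(ties_config(
        base_model="./base-model",
        sources=["./ft-legal", "./ft-support", "./ft-extraction"],
    ))
```

Save the printed text as merge.yaml and run `mergekit-yaml merge.yaml ./merged-model`; the output directory holds the merged checkpoint.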

Eval every task the merge was supposed to cover

Merging can degrade one task even while the others hold up. Eval every one of them against its original golden set.
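One way to make this check mechanical is a small gate that compares each task's golden-set score before and after the merge and blocks the release if any task drops too far. The function name, the absolute-drop rule, and the 0.01 tolerance are hypothetical choices for illustration, and the scores in the demo are made up.

```python
from typing import Dict, List


def failing_tasks(baseline: Dict[str, float], merged: Dict[str, float],
                  max_drop: float = 0.01) -> List[str]:
    """Return tasks whose score dropped by more than `max_drop` after merging.

    baseline: each original fine-tune's score on its own golden set.
    merged:   the merged model's score on the same golden sets.
    An empty result means the merge passes the gate.
    """
    missing = sorted(set(baseline) - set(merged))
    if missing:
        # Refuse to pass a merge that skipped any task it was meant to cover.
        raise ValueError(f"merged model was not evaluated on: {missing}")
    return sorted(t for t in baseline if baseline[t] - merged[t] > max_drop)


if __name__ == "__main__":
    # Illustrative scores only, not measured results.
    baseline = {"legal": 0.82, "support": 0.90, "extraction": 0.95}
    merged = {"legal": 0.815, "support": 0.86, "extraction": 0.95}
    failed = failing_tasks(baseline, merged)
    print("ship" if not failed else f"blocked by: {failed}")
```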

[THE STACK WE'D DEPLOY]

What we run in production for model merging.

mergekit · PEFT (LoRA add) · Hugging Face Transformers

[ACCURACY · COST · TRADE]

The numbers we measure model merging on.

Time to merge

Minutes

GPU needed

None for the merge itself

Risk

Per-task degradation

When it earns the build

Multi-skill consolidation, serving cost reduction (one model vs N adapters), behaviour composition (combining a refusal-tuned model with a code-tuned model).

When it doesn't

When the source fine-tunes are incompatible (different bases or different vocabularies), or when maximum per-task accuracy is the project's whole goal.
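A cheap pre-flight check for the first failure mode is to compare parameter names and shapes against the base before merging; a different vocabulary shows up as a mismatched embedding shape. This is a hypothetical sketch over name-to-shape dicts (in PyTorch these could be built as `{k: tuple(v.shape) for k, v in model.state_dict().items()}`). Passing it is necessary but not sufficient: two different bases with the same architecture also pass, so the sources must still share one base checkpoint.

```python
from typing import Dict, List, Tuple

# Hypothetical input format: parameter name -> tensor shape.
Shapes = Dict[str, Tuple[int, ...]]


def merge_blockers(base: Shapes, sources: Dict[str, Shapes]) -> List[str]:
    """Return reasons the sources cannot be merged element-wise with `base`.

    An empty list means names and shapes line up. That is necessary but not
    sufficient: sources must also be fine-tuned from this same base checkpoint.
    """
    problems: List[str] = []
    for label, shapes in sources.items():
        if shapes.keys() != base.keys():
            problems.append(f"{label}: parameter names differ from base")
            continue
        for param, shape in base.items():
            if shapes[param] != shape:
                # e.g. a resized embedding matrix signals a different vocabulary
                problems.append(f"{label}: {param} is {shapes[param]}, base is {shape}")
    return problems


if __name__ == "__main__":
    # Toy shapes for illustration only.
    base = {"embed.weight": (32000, 64), "lm_head.weight": (32000, 64)}
    sources = {
        "ft-legal": {"embed.weight": (32000, 64), "lm_head.weight": (32000, 64)},
        "ft-other": {"embed.weight": (32064, 64), "lm_head.weight": (32064, 64)},
    }
    for problem in merge_blockers(base, sources):
        print(problem)
```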

[OUR TAKE]

Free wins when the source fine-tunes are compatible. Eval-gate aggressively.

We merge when serving one model instead of N adapters saves more than the merge costs any single task in accuracy. TIES is our default method.


[COMMON QUESTIONS]

What buyers ask before they sign.

Can we merge models with different bases? No. All sources must be fine-tuned from the same base model, so they share its architecture and tokenizer.
Does merging beat multi-task fine-tuning? Sometimes, when the multi-task data is hard to balance. Often a clean multi-task LoRA wins. Benchmark both.


02 · FINE-TUNING

Considering model merging? Let's pressure-test it first.

We benchmark the cheap method first, name the trade, and only deploy the expensive one when the numbers force it. Sized to your data, your evals, your residency.
