Kensink Labs
Model merging · Specialised
METHOD · TIES / DARE / MODEL SOUP

Model merging. Combine fine-tunes by weight arithmetic, no extra training.

Merging combines multiple fine-tunes into one model by averaging, trimming, or arithmetic on the deltas (each fine-tune's weights minus the shared base weights). Model soup averages the weights; TIES trims small deltas and resolves sign conflicts; DARE randomly drops delta parameters and rescales the ones that remain. Production use: stitch task-specific LoRAs into a single deployable.

mergekit · PEFT
Cost: Minutes, not days
Methods: Model soup, TIES, DARE, SLERP, task arithmetic
Use: Multi-skill consolidation
[WHY THIS EXISTS]

Serving N adapters for N tasks adds operational tax.

If three task-specific fine-tunes need to coexist in one model (legal analysis + customer support + structured extraction), retraining a unified model is expensive. Merging combines the three into one set of weights in minutes, no GPU training needed.

  • Model soup: simple parameter average
  • TIES: trim small deltas, elect a sign per parameter by total magnitude, average the deltas that agree with it
  • DARE: randomly drop deltas, rescale the survivors, then average
  • Task arithmetic: add or subtract task vectors to compose behaviours
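
The four methods above can be sketched on toy parameter vectors. This is a minimal NumPy illustration, not mergekit's implementation: it assumes every model is a flat 1-D array sharing one base, and the function names, `density`, `drop_rate`, and `seed` are illustrative choices rather than anything the tools mandate.

```python
import numpy as np

def model_soup(weights):
    """Uniform average of full parameter vectors."""
    return np.mean(weights, axis=0)

def task_arithmetic(base, finetuned, scales):
    """base + sum_i scale_i * (finetuned_i - base). A negative scale subtracts a behaviour."""
    deltas = [s * (w - base) for w, s in zip(finetuned, scales)]
    return base + np.sum(deltas, axis=0)

def ties(base, finetuned, density=0.5):
    """Trim, elect sign, and average agreeing deltas (assumes 1-D arrays)."""
    deltas = np.stack([w - base for w in finetuned])
    # Trim: keep only the top `density` fraction of each delta by magnitude.
    k = max(1, int(round(density * deltas.shape[1])))
    trimmed = np.zeros_like(deltas)
    for i, d in enumerate(deltas):
        keep = np.argsort(np.abs(d))[-k:]
        trimmed[i, keep] = d[keep]
    # Elect: per parameter, the sign carrying the larger total magnitude wins.
    elected = np.sign(trimmed.sum(axis=0))
    # Merge: average only the surviving deltas that agree with the elected sign.
    agree = (np.sign(trimmed) == elected) & (trimmed != 0)
    counts = np.maximum(agree.sum(axis=0), 1)
    return base + (trimmed * agree).sum(axis=0) / counts

def dare(base, finetuned, drop_rate=0.5, seed=0):
    """Randomly drop deltas, rescale survivors by 1 / (1 - drop_rate), then average.
    Requires drop_rate < 1."""
    rng = np.random.default_rng(seed)
    merged = np.zeros_like(base)
    for w in finetuned:
        delta = w - base
        keep = rng.random(delta.shape) >= drop_rate
        merged += (delta * keep) / (1.0 - drop_rate)
    return base + merged / len(finetuned)
```

With a zero base and two toy fine-tunes `[1.0, 0.1, -2.0, 0.0]` and `[-0.2, 0.3, -1.0, 0.5]`, `ties(..., density=0.5)` keeps the two largest deltas from each and returns `[1.0, 0.0, -1.5, 0.5]`: only the third parameter survives trimming in both models, so it is the only one averaged.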
[THE PIPELINE]

Model merging, end to end.

mergekit YAML config, one command, eval the merged checkpoint.

Fine-tune A (LoRA) + Fine-tune B (LoRA) + Fine-tune C (LoRA) → mergekit YAML config → TIES or DARE merge → Eval merged model → Ship
01 · Pick the merge method

TIES for general consolidation. DARE for noisy LoRAs. SLERP for interpolating between exactly two models. Task arithmetic for adding or subtracting behaviours.
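
SLERP is the one method in this list not sketched elsewhere on the page, so here is a minimal version on flattened weight vectors. It is an illustration under assumptions: real tools typically apply the interpolation per tensor, and the `eps` fallback threshold is an arbitrary choice.

```python
import numpy as np

def slerp(w0, w1, t, eps=1e-8):
    """Spherical linear interpolation between two weight vectors, with t in [0, 1]."""
    w0 = np.asarray(w0, dtype=float)
    w1 = np.asarray(w1, dtype=float)
    # Angle between the two vectors, computed on normalised copies.
    n0 = w0 / (np.linalg.norm(w0) + eps)
    n1 = w1 / (np.linalg.norm(w1) + eps)
    omega = np.arccos(np.clip(np.dot(n0, n1), -1.0, 1.0))
    s = np.sin(omega)
    if s < eps:
        # Vectors are (anti)parallel: fall back to plain linear interpolation.
        return (1.0 - t) * w0 + t * w1
    return (np.sin((1.0 - t) * omega) / s) * w0 + (np.sin(t * omega) / s) * w1
```

Unlike a plain average, the midpoint between two orthogonal unit vectors keeps unit norm, which is the usual argument for SLERP when exactly two models are being combined.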

02 · mergekit config, one command

The YAML lists the source models, their weights, and the merge method. Run mergekit-yaml against it; the output is a merged model checkpoint.
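
As a rough sketch of what such a config can look like, the helper below renders a TIES config as YAML text. The field names (`models`, `merge_method`, `base_model`, `density`, `weight`, `normalize`, `dtype`) follow the mergekit README as we understand it and should be checked against the installed version; the helper name, model paths, and the 0.5 values are hypothetical placeholders.

```python
def build_ties_config(base_model, sources, dtype="float16"):
    """Render a mergekit-style TIES config as YAML text.

    `sources` is a list of (model_path, weight, density) tuples.
    Field names are assumed from the mergekit README; verify before use.
    """
    lines = ["models:"]
    for path, weight, density in sources:
        lines += [
            f"  - model: {path}",
            "    parameters:",
            f"      weight: {weight}",
            f"      density: {density}",
        ]
    lines += [
        "merge_method: ties",
        f"base_model: {base_model}",
        "parameters:",
        "  normalize: true",
        f"dtype: {dtype}",
    ]
    return "\n".join(lines) + "\n"

# Hypothetical usage: write the config, then run the CLI on it.
#   cfg = build_ties_config("org/base-model",
#                           [("org/legal-ft", 0.5, 0.5), ("org/support-ft", 0.5, 0.5)])
#   open("merge-config.yml", "w").write(cfg)
#   $ mergekit-yaml merge-config.yml ./merged-model
```

Whether LoRA adapters can be referenced directly or must first be merged into full checkpoints depends on the mergekit version, so treat the paths here as stand-ins for whatever form your sources take.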

03 · Eval every task the merge was supposed to cover

Merging can degrade one task while preserving the others. Evaluate every task against its original golden set.
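
One way to make that check mechanical is a per-task gate that compares the merged model's scores with each source fine-tune's score on its own golden set. A minimal sketch, assuming higher-is-better metrics; the function name, the task names, and the 0.02 tolerance are hypothetical.

```python
def eval_gate(baseline_scores, merged_scores, max_drop=0.02):
    """Return (passed, regressions).

    baseline_scores: task -> score of the original fine-tune on its golden set.
    merged_scores:   task -> score of the merged model on the same golden set.
    A task regresses when its score falls by more than `max_drop`.
    """
    regressions = {}
    for task, baseline in baseline_scores.items():
        if task not in merged_scores:
            raise KeyError(f"merged model was not evaluated on task: {task}")
        drop = baseline - merged_scores[task]
        if drop > max_drop:
            regressions[task] = drop
    return (not regressions), regressions

# Hypothetical usage:
#   passed, regressions = eval_gate(
#       {"legal": 0.90, "support": 0.85, "extraction": 0.95},
#       {"legal": 0.89, "support": 0.80, "extraction": 0.95},
#   )
#   # `regressions` then names the tasks that block the ship step.
```

Raising on a missing task is deliberate: a merge that was never evaluated on one of its source tasks should not pass by omission.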

[THE STACK WE'D DEPLOY]

What we run in production for model merging.

mergekit · PEFT (LoRA add) · HuggingFace Transformers
[ACCURACY · COST · TRADE]

The numbers we measure model merging on.

Time to merge: Minutes
GPU needed: None for the merge itself
Risk: Per-task degradation
When it earns the build

Multi-skill consolidation, serving cost reduction (one model vs N adapters), behaviour composition (combining a refusal-tuned model with a code-tuned model).

When it doesn't

When the source fine-tunes are deeply incompatible (different bases, different vocabularies), or when per-task accuracy is the project's whole goal.

[OUR TAKE]

Free wins when the source fine-tunes are compatible. Eval-gate aggressively.

We merge when the cost of serving N adapters outweighs the accuracy any single task loses in the merge. TIES is our default method.

[COMMON QUESTIONS]

What buyers ask before they sign.

Can we merge models with different bases?
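No. All sources must share the same base architecture and tokenizer. Delta-based methods (TIES, DARE, task arithmetic) also assume the same base checkpoint, because every delta is measured against it.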
Does merging beat multi-task fine-tuning?
Sometimes, when the multi-task data is hard to balance. Often a clean multi-task LoRA wins. Benchmark both.
FINE-TUNING · KENSINK LABS

Considering model merging? Let's pressure-test it first.

We benchmark the cheap method first, name the trade, and only deploy the expensive one when the numbers force it. Sized to your data, your evals, your residency.