Kensink Labs
METHOD · WEIGHT-DECOMPOSED LORA

DoRA. LoRA with magnitude and direction split.

Decomposes each pretrained weight matrix into a magnitude (one scalar per column) and a direction (a unit-norm column vector). LoRA updates only the direction; the magnitude is trained separately. The DoRA paper (Liu et al., ICML 2024) reports +1.0 to +4.4 points over LoRA on commonsense benchmarks (LLaMA-7B/13B, LLaMA3-8B). Our default replacement for plain LoRA in 2026.

PyTorch · PEFT · TRL · Unsloth
Gain vs LoRA: +1 to +4.4 points
Cost: Same as LoRA
Code change: One flag in PEFT (use_dora=True)
When: Drop-in for LoRA
[WHY THIS EXISTS]

LoRA conflates magnitude and direction.

A weight update has two parts: how much (magnitude) and which way (direction). LoRA's low-rank update changes both at once, so it cannot adjust one without moving the other. DoRA decomposes the pretrained weight first, then trains magnitude and direction separately. That lets it make the kind of update full fine-tuning makes naturally (a small change in direction with a larger change in magnitude, or the reverse), which plain LoRA can only approximate.

  • Weight = magnitude * direction (per output column)
  • LoRA on direction, scalar on magnitude
  • Same trainable param budget, better accuracy
  • Toggle on existing LoRA configs (use_dora=True in PEFT)
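The decomposition in the list above can be sketched with NumPy. This is a toy illustration, not PEFT's implementation: the shapes, the random matrices, and the helper names `column_norm` and `dora_weight` are assumptions made for the example, and the per-column convention follows the paper's notation.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, r = 6, 4, 2                      # toy sizes: W0 is (d, k), LoRA rank is r
W0 = rng.normal(size=(d, k))           # stand-in for a pretrained weight matrix

def column_norm(W):
    # L2 norm of each column, kept as a (1, k) row so it broadcasts over rows.
    return np.linalg.norm(W, axis=0, keepdims=True)

# Decompose: magnitude m (one scalar per column) times a unit-norm direction.
m = column_norm(W0)
direction = W0 / m
assert np.allclose(m * direction, W0)

# Trainable pieces: m, plus the LoRA factors B (d, r) and A (r, k).
B = np.zeros((d, r))                   # B starts at zero, as in LoRA
A = rng.normal(size=(r, k))

def dora_weight(W0, m, B, A):
    V = W0 + B @ A                     # LoRA updates the direction component
    return m * V / column_norm(V)      # renormalise, then rescale by magnitude

W_init = dora_weight(W0, m, B, A)      # equals W0 before any training step

# After a hypothetical update to B, the column norms are still set by m alone.
B_step = 0.1 * rng.normal(size=(d, r))
W_new = dora_weight(W0, m, B_step, A)
```

Because the direction is renormalised on every forward pass, the low-rank factors can only rotate each column, and any change in scale has to come from `m`.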
[THE PIPELINE]

DoRA: the LoRA pipeline with one extra config flag.

Use the same data, the same hyperparameters, the same eval. Set use_dora=True and the trainer handles the decomposition.

Same data as LoRA → PEFT config (use_dora=True) → Train → Eval → Ship adapter
01 · Reuse the LoRA config, add use_dora

Rank 16, alpha 32, all-linear, use_dora=True. PEFT 0.10+ supports DoRA natively. No data or LR change needed.
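A minimal sketch of that config in PEFT, assuming a causal language model. The `task_type` value and the names `DORA_KWARGS` and `build_dora_config` are illustrative choices, not something the pipeline above prescribes.

```python
# Settings from the step above; task_type is an assumption for a causal LM.
DORA_KWARGS = dict(
    r=16,                          # LoRA rank
    lora_alpha=32,                 # LoRA scaling factor
    target_modules="all-linear",   # adapt every linear layer
    use_dora=True,                 # the one change from a plain LoRA config
    task_type="CAUSAL_LM",
)

def build_dora_config():
    # Imported lazily so the settings can be inspected without PEFT installed.
    from peft import LoraConfig
    return LoraConfig(**DORA_KWARGS)
```

The returned object is used exactly where a LoRA config would be, for example `get_peft_model(model, build_dora_config())` or the `peft_config` argument of a TRL trainer.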

02 · Train, eval, ship

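Same pipeline as LoRA. The adapter file is nearly the same size: DoRA adds one magnitude vector per adapted layer. vLLM and LoRAX serve it the same way, provided the installed version supports DoRA adapters.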
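If a serving runtime cannot load a DoRA adapter directly, one fallback is to merge the adapter into the base model and ship the merged weights. The sketch below uses PEFT's `merge_and_unload`; the function name `merge_dora_adapter` and its three arguments are hypothetical placeholders for your own model ID and paths.

```python
def merge_dora_adapter(base_model_id: str, adapter_dir: str, output_dir: str) -> None:
    """Fold a trained DoRA adapter into its base model and save the result."""
    # Imported lazily so this helper can be defined without the heavy dependencies.
    from transformers import AutoModelForCausalLM
    from peft import PeftModel

    base = AutoModelForCausalLM.from_pretrained(base_model_id)
    model = PeftModel.from_pretrained(base, adapter_dir)   # attach the trained adapter
    merged = model.merge_and_unload()                      # plain model, adapter folded in
    merged.save_pretrained(output_dir)                     # loads without PEFT afterwards
```

Merging trades the small adapter file for a full-size checkpoint, so it suits single-adapter deployments better than multi-adapter serving.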

[THE STACK WE'D DEPLOY]

What we run in production for DoRA.

PEFT (use_dora=True) · TRL · Unsloth (DoRA support) · vLLM (multi-LoRA)
[ACCURACY · COST · TRADE]

The numbers we measure DoRA on.

LLaMA-7B commonsense: +3.7 points vs LoRA (per Liu et al., ICML 2024)
LLaMA-13B commonsense: +1.0 points vs LoRA
LLaMA3-8B (multi-task): +4.4 points vs LoRA
Trainable params: Nearly the same as LoRA (plus one magnitude vector per adapted layer)
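To put the trainable-parameter comparison in numbers, here is a back-of-envelope count for a single adapted layer. The 4096 x 4096 layer size is an assumed example, and the count follows the one-scalar-per-column convention used above.

```python
def lora_params(d: int, k: int, r: int) -> int:
    # LoRA trains B (d x r) and A (r x k) for a weight of shape (d, k).
    return r * (d + k)

def dora_params(d: int, k: int, r: int) -> int:
    # DoRA trains the same two factors plus one magnitude scalar per column.
    return lora_params(d, k, r) + k

d = k = 4096   # assumed layer size, for illustration only
r = 16         # rank used in the config above

lora = lora_params(d, k, r)          # 131072
dora = dora_params(d, k, r)          # 135168
extra = dora - lora                  # 4096 magnitude scalars
overhead_pct = 100 * extra / lora    # 3.125 percent more than LoRA for this layer
```

Measured against the full 16.8M weights in that layer, the extra 4096 scalars are negligible, which is why the trainable budget is treated as matching LoRA.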
When it earns the build

Anywhere you would use LoRA. The gain is a flag flip.

When it doesn't

When the framework does not support it (older PEFT, custom training stacks).

[OUR TAKE]

Our 2026 default LoRA variant.

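We turn on use_dora unless the runtime does not support it. The accuracy gain is consistent across the reported benchmarks, and the only costs are a config flag and some extra training-time compute for the decomposition.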

[READ AT THE SOURCE]

Papers, docs, and primary sources.

[COMMON QUESTIONS]

What buyers ask before they sign.

Why is DoRA not just always the default in libraries?
It is becoming so. PEFT, Unsloth, and Axolotl all support it. Some custom training stacks have not caught up. If your stack supports it, use it.
Does DoRA add inference cost?
Negligible once merged. The learned magnitude and the direction update fold back into the base weight, so a merged DoRA model runs at the same cost as the base model.