
★ DoRA · Specialised

METHOD · WEIGHT-DECOMPOSED LORA

DoRA. LoRA with magnitude and direction split.

Decomposes each pretrained weight matrix into a magnitude (one scalar per column) and a direction (the column-normalised matrix). LoRA updates only the direction; the magnitude is trained separately as its own small vector. Liu et al. (ICML 2024) report gains of +1.0 to +4.4 points in average accuracy over LoRA on commonsense-reasoning benchmarks (LLaMA-7B/13B, LLaMA3-8B). Our default replacement for plain LoRA in 2026.

PyTorch · PEFT · TRL · Unsloth


Gain vs LoRA

+1 to +4.4 points

Cost

Close to LoRA (small training overhead)

Code change

One flag in PEFT (use_dora=True)

When

Drop-in for LoRA

[WHY THIS EXISTS]

LoRA conflates magnitude and direction.

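A weight update has two parts: how much each column's length changes (magnitude) and which way it points (direction). LoRA's low-rank update moves both together: in the paper's analysis, LoRA's magnitude and direction changes rise and fall in proportion, whereas full fine-tuning often makes large directional changes with small magnitude changes, or the reverse. DoRA decomposes the pretrained weight first, then trains the magnitude directly and the direction through LoRA. The resulting learning pattern is closer to that of full fine-tuning.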
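For reference, the decomposition in Liu et al. (2024), written in the paper's notation: a frozen pretrained weight W_0 ∈ R^{d×k}, LoRA factors B ∈ R^{d×r} and A ∈ R^{r×k}, and ‖·‖_c the column-wise vector norm. Because B starts at zero and m starts at the pretrained column norms, W' equals W_0 at initialisation.

```latex
W' = m \,\frac{W_0 + BA}{\lVert W_0 + BA \rVert_c},
\qquad m \in \mathbb{R}^{1\times k},
\qquad m\big|_{\text{init}} = \lVert W_0 \rVert_c,
\qquad B\big|_{\text{init}} = 0
```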

Weight = magnitude * direction (one magnitude per weight column)
LoRA on the direction, a trainable scalar per column on the magnitude
Nearly the same trainable parameter budget, better reported accuracy
Toggle on existing LoRA configs (use_dora=True in PEFT)
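To make the split concrete, here is a minimal NumPy sketch of the recombination, assuming the paper's layout (W0 is d×k, magnitudes are per column). The helper names column_norm and dora_weight are ours, not a library API, and implementations such as PEFT may store the weight transposed and normalise along the other axis. The sketch checks two properties: at initialisation the adapter reproduces the pretrained weight, and once B has moved, the magnitude alone still sets each column's length.

```python
import numpy as np


def column_norm(mat: np.ndarray) -> np.ndarray:
    """L2 norm of each column, kept as shape (1, k) so it broadcasts over rows."""
    return np.linalg.norm(mat, axis=0, keepdims=True)


def dora_weight(w0: np.ndarray, b: np.ndarray, a: np.ndarray, m: np.ndarray) -> np.ndarray:
    """Recombine magnitude and direction: W' = m * (W0 + B A) / ||W0 + B A||_c."""
    v = w0 + b @ a                   # direction before normalisation, shape (d, k)
    return m * v / column_norm(v)    # unit-norm columns rescaled by the magnitudes m


rng = np.random.default_rng(0)
d, k, r = 8, 6, 2
w0 = rng.normal(size=(d, k))         # frozen pretrained weight
a = rng.normal(size=(r, k)) * 0.01   # LoRA A: small random init
b = np.zeros((d, r))                 # LoRA B: zero init
m = column_norm(w0)                  # magnitude starts at the pretrained column norms

# At initialisation the adapter is a no-op: W' == W0.
assert np.allclose(dora_weight(w0, b, a, m), w0)

# After B moves, the columns of W' still have norm m: LoRA only rotated them.
b = rng.normal(size=(d, r))
assert np.allclose(column_norm(dora_weight(w0, b, a, m)), m)
```

The division by the column norm is what keeps the two parts separate: anything LoRA does to a column's length is normalised away, so only m can change it.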

[THE PIPELINE]

DoRA: the LoRA pipeline plus one config flag.

Use the same data, the same hyperparameters, the same eval. Set use_dora=True and PEFT handles the decomposition.

Same data as LoRA

PEFT config

use_dora=True

Train

Eval

Ship adapter

Reuse the LoRA config, add use_dora

Rank 16, alpha 32, all-linear, use_dora=True. PEFT 0.10+ supports DoRA natively. No data or LR change needed.
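The config above, sketched as a PEFT LoraConfig. r, lora_alpha, target_modules and use_dora are LoraConfig options; task_type="CAUSAL_LM" is our assumption for a decoder-only model, and the variable name dora_config is illustrative.

```python
from peft import LoraConfig

dora_config = LoraConfig(
    r=16,                          # adapter rank
    lora_alpha=32,                 # LoRA scaling numerator (alpha / r = 2)
    target_modules="all-linear",   # adapt every linear layer except the output head
    use_dora=True,                 # split magnitude and direction (DoRA)
    task_type="CAUSAL_LM",         # assumption: decoder-only causal LM
)
```

Pass it to get_peft_model(model, dora_config), or to TRL's SFTTrainer through its peft_config argument; the rest of the training loop stays as it was for LoRA.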

Train, eval, ship

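Same pipeline as LoRA. The adapter file is marginally larger (one magnitude vector per adapted layer). Confirm DoRA adapter support in your serving runtime (vLLM, LoRAX) before relying on multi-adapter serving; merging the adapter into the base weights works with any runtime.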

[THE STACK WE'D DEPLOY]

What we run in production for DoRA.

PEFT (use_dora=True) · TRL · Unsloth (DoRA support) · vLLM (multi-LoRA)

[ACCURACY · COST · TRADE]

The reported numbers for DoRA.

LLaMA-7B commonsense

+3.7 points vs LoRA

Per Liu et al., ICML 2024

LLaMA-13B commonsense

+1.0 points vs LoRA

LLaMA3-8B commonsense

+4.4 points vs LoRA

Trainable params

LoRA plus one magnitude vector per adapted layer
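For a sense of scale, a short sketch of the per-layer arithmetic, assuming the paper's layout (LoRA factors B ∈ R^{d×r} and A ∈ R^{r×k}, one magnitude per column of a d×k weight). Implementations that keep one magnitude per output feature add d rather than k; for the square matrix below the two agree. The function names are illustrative.

```python
def lora_params(d: int, k: int, r: int) -> int:
    """Trainable params for one LoRA adapter on a d x k weight: B (d x r) + A (r x k)."""
    return r * (d + k)


def dora_params(d: int, k: int, r: int) -> int:
    """DoRA adds one trainable magnitude per column of the d x k weight."""
    return lora_params(d, k, r) + k


# Example: a square 4096 x 4096 projection at rank 16.
lora = lora_params(4096, 4096, 16)   # 131072
dora = dora_params(4096, 4096, 16)   # 135168
overhead = (dora - lora) / lora      # 0.03125, about 3% more adapter parameters
```

The extra vector grows with the layer width but not with the rank, so the relative overhead shrinks as r increases (it is 1 / (2r) for a square layer).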

When it earns the build

Anywhere you would use LoRA. The gain is a flag flip.

When it doesn't

When the framework does not support it (older PEFT, custom training stacks).
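If your training stack has to run against several PEFT versions, a feature check is more robust than pinning a version number. This sketch assumes only that LoraConfig is a dataclass (it is in current PEFT) and that DoRA support appears as a use_dora field; peft_supports_dora and lora_kwargs are hypothetical helper names.

```python
import dataclasses
import importlib


def peft_supports_dora() -> bool:
    """Return True if the installed PEFT exposes `use_dora` on LoraConfig."""
    try:
        peft = importlib.import_module("peft")
    except ImportError:
        return False  # PEFT (or one of its dependencies) is not installed
    # LoraConfig is a dataclass, so the options it accepts are its fields.
    return "use_dora" in {f.name for f in dataclasses.fields(peft.LoraConfig)}


def lora_kwargs(rank: int = 16, alpha: int = 32) -> dict:
    """Build LoraConfig kwargs, enabling DoRA only when the runtime supports it."""
    kwargs = {"r": rank, "lora_alpha": alpha, "target_modules": "all-linear"}
    if peft_supports_dora():
        kwargs["use_dora"] = True
    return kwargs
```

Falling back to plain LoRA keeps older environments working with the same data and hyperparameters.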

[OUR TAKE]

Our 2026 default LoRA variant.

We turn on use_dora unless the runtime does not support it. The reported gains hold across model sizes, and the extra cost is small: one magnitude vector per layer and some training overhead.


[COMMON QUESTIONS]

What buyers ask before they sign.

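Why isn't DoRA already the library default?: Support is widespread (PEFT, Unsloth, and Axolotl all support it), but it is still opt-in, and some custom training stacks have not caught up. If your stack supports it, use it.
Does DoRA add inference cost?: Not once merged. Magnitude and direction recombine into a single dense weight, so the merged model runs exactly like the base model. Served unmerged as a separate adapter, the per-column normalisation adds some work on every forward pass.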
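A minimal NumPy sketch of why merging removes the overhead, again in the paper's d×k layout with random stand-ins for the trained factors (none of this is a library API). The unmerged path has to normalise W0 + BA and run the adapter matmuls each time; the merged path folds everything into one matrix offline and produces the same output.

```python
import numpy as np

rng = np.random.default_rng(1)
d, k, r = 8, 6, 2
w0 = rng.normal(size=(d, k))             # frozen pretrained weight
b = rng.normal(size=(d, r))              # trained LoRA factors (random stand-ins)
a = rng.normal(size=(r, k))
m = rng.uniform(0.5, 2.0, size=(1, k))   # trained magnitudes (random stand-ins)
x = rng.normal(size=(k,))                # one input vector

# Unmerged serving: compute the column norms of W0 + BA and apply the
# adapter as extra matmuls on the forward pass.
col_norm = np.linalg.norm(w0 + b @ a, axis=0)   # shape (k,)
scale = m.ravel() / col_norm                    # per-column factor m / ||.||_c
xs = scale * x                                  # scaling columns of W == scaling inputs
y_unmerged = w0 @ xs + b @ (a @ xs)

# Merged serving: fold everything into one dense matrix once, offline.
w_merged = m * (w0 + b @ a) / col_norm          # same shape as w0, (d, k)
y_merged = w_merged @ x

assert np.allclose(y_unmerged, y_merged)
```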


02 · FINE-TUNING

Considering DoRA? Let's pressure-test it first.

We benchmark the cheap method first, name the trade, and only deploy the expensive one when the numbers force it. Sized to your data, your evals, your residency.
