★ PlatformsDirect LLM · vendor-neutralProduction grade

FINE-TUNING · VENDORS

Twelve fine-tuning platforms. Ranked by what each actually does well.

Managed APIs, serverless multi-LoRA, BYO GPU clusters, on-prem NIM. The 2026 vendor landscape with pricing, residency, audit, and the one we'd pick for each shape of project.

OpenAIAnthropicGoogle VertexAWS BedrockAzureTogetherPredibaseModal

Start a conversation →Fine-tuning hub →

Vendors compared

12 (managed + BYO)

Default managed

Together AI or Predibase serverless

Default RFT

OpenAI RFT (o4-mini)

On-prem default

NVIDIA NeMo Customizer

[SIDE BY SIDE]

The matrix.

Pricing snapshot is mid-2026. Numbers move quarterly; we re-validate every engagement. The qualitative columns (base models, methods, residency) are the durable parts of the comparison.

Twelve platforms, side by side.

Pricing is mid-2026 and moves quarterly; the qualitative columns (bases, methods, residency) are the durable parts.

Vendor	Base models	Methods	Pricing	Deployment	Residency
OpenAI fine-tuning + RFT	gpt-4.1, 4.1-mini, 4.1-nano, gpt-4o, o4-mini (RFT)	SFT + RFT (GRPO)	$25/1M tokens SFT (gpt-4.1); $100/hr RFT, $5k/job cap	Managed in-platform endpoints	US (default), EU via Azure
Anthropic Claude (via Bedrock)	Claude 3 Haiku only (GA Nov 2024)	SFT	Bedrock fine-tune pricing + Provisioned Throughput	AWS Bedrock, requires Provisioned Throughput	US, EU regions
Google Vertex AI	Gemini 2.5 Pro / Flash / Flash-Lite	SFT + preference tuning (DPO-style)	Per training token + 1.5x base inference for tuned	Vertex AI endpoints	US, EU, asia-* regions
AWS Bedrock	Bedrock-supported models + Custom Model Import (any HF model)	Managed SFT + Custom Model Import	Per token managed; $0.0785/min/CMU for imports	Bedrock endpoints, Provisioned Throughput for tuned	All AWS regions
Azure AI Foundry	GPT-4.1, 4.1-mini, o-series (RFT)	SFT + RFT (mirrors OpenAI)	Mirrors OpenAI; $100/hr RFT o4-mini	Standard, Global Standard, Provisioned Throughput	Azure global regions
Databricks Mosaic AI	Llama 3, Mistral, DBRX	Full SFT, LoRA, DPO	Serverless H100 with InfiniBand, ~10x lower than proprietary per Databricks	Unity Catalog governance + Model Serving	Databricks regions
Together AI Our default	Open models up to 100B	LoRA, full SFT, DPO	$0.48 / $1.20 (16B); $1.50 / $3.75 (17-69B); $2.90 / $7.25 (70-100B) per 1M tokens	Serverless multi-LoRA, dedicated endpoints	US (default), EU on request
Predibase Multi-tenant SaaS	Llama, Mistral, Qwen, Gemma, Phi	LoRA, RFT, DPO, KTO	Serverless at base-model per-token; Turbo LoRA add-on	LoRAX multi-adapter serving, VPC	US + VPC anywhere
NVIDIA NeMo Customizer On-prem default	Llama, Mistral, Nemotron, custom	LoRA, P-tuning, full SFT, DPO, GRPO	On-prem (compute is yours)	Kubernetes + NIM via NIM Operator 2.0	Your data center
Modal	Any (BYO training script)	Any (PyTorch, Unsloth, Axolotl)	A10G $1.10/hr, H100 ~$3.95/hr, B200 clusters available	Serverless Python, schedule 128 B200s in one line	US (default), EU on request
Lambda Labs	Any (BYO training script)	Any	1-Click Clusters: $4.49/GPU-hr, 1-week minimum, no egress	16 to 512 H100s, InfiniBand 400 Gb/s	US
HuggingFace TRL + AutoTrain	Any open base on the Hub	SFT, RM, DPO, GRPO (TRL v1.0)	Open source; AutoTrain managed pay-per-use	Inference Endpoints or self-hosted	Hub-hosted (US/EU) or self-hosted

Highlighted rows are our default picks for the most common project shapes.

[OUR DEFAULT PICKS]

What we recommend by shape of project.

Predibase or Together

Multi-tenant SaaS, hundreds of customer adapters

Serverless multi-LoRA at base-model per-token pricing. LoRAX-backed, Turbo LoRA for speedup. Cheapest path to ship per-customer fine-tunes.

OpenAI RFT (o4-mini)

Reasoning fine-tune, math + code + tool use

$100/hr, $5k cap per job. Verifier in the loop. The fastest path to a measurable reasoning lift if your data fits the o-series.

NVIDIA NeMo Customizer + NIM

On-prem, regulated industry, sovereign weights

Kubernetes-native, NIM Operator 2.0 for serving, full method coverage (LoRA, SFT, DPO, GRPO). The enterprise on-prem default.

Modal + Unsloth + Axolotl

70B+ fine-tune on a tight budget

QLoRA on a single 48GB GPU via Unsloth (2x faster, 70% less VRAM). Modal's per-second billing matches the iteration loop.

AWS Bedrock Frankfurt (Claude 3 Haiku)

EU residency + Claude voice

Only path to fine-tuned Claude. EU region, Provisioned Throughput. Good when the use case specifically needs Claude.

HuggingFace TRL + Lambda 1-Click

Open ecosystem, full control

TRL v1.0 unifies SFT, RM, DPO, GRPO. Lambda 1-Click for the GPUs. The recipe we recommend when no vendor lock-in is acceptable.

[WHAT YOU GET]

What we leave at handoff.

1 audit

Vendor selection memo, signed

1 setup

Account, BAA/DPA, residency configured

1 run

First fine-tune, eval-gated, deployed

Exit

Weights exportable, no lock-in

[COMMON QUESTIONS]

What buyers ask before they sign.

We need EU residency. Who's the right choice?: Vertex AI europe-west and Mistral La Plateforme are the most mature with both regional residency and customer-managed keys. AWS Bedrock Frankfurt is solid for Claude 3 Haiku fine-tunes and Custom Model Import. Azure West Europe mirrors OpenAI. For self-hosted on EU soil, Modal and Lambda both offer EU regions on request.
OpenAI RFT or self-hosted GRPO?: OpenAI RFT is faster to start, the bill is capped ($5k per job), and it runs on closed o-series. Self-hosted GRPO requires a rollout server and verifier infra, can run on any open base, and stays in your VPC. Pick by data sensitivity and base-model preference. For most regulated industries, self-hosted wins.
Why do you recommend Predibase or Together by default?: Serverless multi-LoRA at base-model per-token pricing is the cheapest path for SaaS products with hundreds of customer adapters. Predibase ships Turbo LoRA (speculative decoding) and LoRAX (open source) for thousands of adapters per GPU. Together has the broadest open-model coverage and the cleanest DPO pricing.
Anthropic fine-tuning: when is Claude 3 Haiku the right answer?: Narrowly. Claude 3 Haiku is the only Claude model with fine-tuning, available only through Bedrock, text-only, 32k context. It's a good choice when you specifically want Claude's voice in a tuned model and Provisioned Throughput cost is acceptable. For most projects we tune an open base instead.

[RELATED FINE-TUNING TOPICS]

Worth a look next.

01 · FINE-TUNING

Bring the use case. We will pick the vendor.

Independent benchmark, residency-aware, BAA/DPA negotiated, deployment hardened. Sized to the scope, scoped to the audit, signed at the artifact.

Start a conversation →All fine-tuning topics