Multi-tenant SaaS, hundreds of customer adapters
Serverless multi-LoRA at base-model per-token pricing. LoRAX-backed, Turbo LoRA for speedup. Cheapest path to ship per-customer fine-tunes.
Managed APIs, serverless multi-LoRA, BYO GPU clusters, on-prem NIM. The 2026 vendor landscape with pricing, residency, audit, and the one we'd pick for each shape of project.
Pricing snapshot is mid-2026. Numbers move quarterly; we re-validate every engagement. The qualitative columns (base models, methods, residency) are the durable parts of the comparison.
| Vendor | Base models | Methods | Pricing | Deployment | Residency |
|---|---|---|---|---|---|
| OpenAI fine-tuning + RFT | gpt-4.1, 4.1-mini, 4.1-nano, gpt-4o, o4-mini (RFT) | SFT + RFT (GRPO) | $25/1M tokens SFT (gpt-4.1); $100/hr RFT, $5k/job cap | Managed in-platform endpoints | US (default), EU via Azure |
| Anthropic Claude (via Bedrock) | Claude 3 Haiku only (GA Nov 2024) | SFT | Bedrock fine-tune pricing + Provisioned Throughput | AWS Bedrock, requires Provisioned Throughput | US, EU regions |
| Google Vertex AI | Gemini 2.5 Pro / Flash / Flash-Lite | SFT + preference tuning (DPO-style) | Per training token + 1.5x base inference for tuned | Vertex AI endpoints | US, EU, asia-* regions |
| AWS Bedrock | Bedrock-supported models + Custom Model Import (any HF model) | Managed SFT + Custom Model Import | Per token managed; $0.0785/min/CMU for imports | Bedrock endpoints, Provisioned Throughput for tuned | All AWS regions |
| Azure AI Foundry | GPT-4.1, 4.1-mini, o-series (RFT) | SFT + RFT (mirrors OpenAI) | Mirrors OpenAI; $100/hr RFT o4-mini | Standard, Global Standard, Provisioned Throughput | Azure global regions |
| Databricks Mosaic AI | Llama 3, Mistral, DBRX | Full SFT, LoRA, DPO | Serverless H100 with InfiniBand, ~10x lower than proprietary per Databricks | Unity Catalog governance + Model Serving | Databricks regions |
| Together AI (our default) | Open models up to 100B | LoRA, full SFT, DPO | $0.48 / $1.20 (16B); $1.50 / $3.75 (17-69B); $2.90 / $7.25 (70-100B) per 1M tokens | Serverless multi-LoRA, dedicated endpoints | US (default), EU on request |
| Predibase (multi-tenant SaaS) | Llama, Mistral, Qwen, Gemma, Phi | LoRA, RFT, DPO, KTO | Serverless at base-model per-token; Turbo LoRA add-on | LoRAX multi-adapter serving, VPC | US + VPC anywhere |
| NVIDIA NeMo Customizer (on-prem default) | Llama, Mistral, Nemotron, custom | LoRA, P-tuning, full SFT, DPO, GRPO | On-prem (compute is yours) | Kubernetes + NIM via NIM Operator 2.0 | Your data center |
| Modal | Any (BYO training script) | Any (PyTorch, Unsloth, Axolotl) | A10G $1.10/hr, H100 ~$3.95/hr, B200 clusters available | Serverless Python, schedule 128 B200s in one line | US (default), EU on request |
| Lambda Labs | Any (BYO training script) | Any | 1-Click Clusters: $4.49/GPU-hr, 1-week minimum, no egress | 16 to 512 H100s, InfiniBand 400 Gb/s | US |
| HuggingFace TRL + AutoTrain | Any open base on the Hub | SFT, RM, DPO, GRPO (TRL v1.0) | Open source; AutoTrain managed pay-per-use | Inference Endpoints or self-hosted | Hub-hosted (US/EU) or self-hosted |
Highlighted rows are our default picks for the most common project shapes.
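
The pricing column mixes units (per 1M tokens, per GPU-hour with a minimum term, per minute per CMU), so it can help to reduce each row to a dollar figure for a concrete workload. The sketch below does that for three rows using the rates quoted in the table. The function names and the example workload (a 5M-token dataset trained for 4 epochs, a 16-GPU cluster, 2 CMUs) are illustrative assumptions, not vendor figures, and the token-priced case assumes billing on dataset tokens times epochs.

```python
HOURS_PER_WEEK = 7 * 24  # 168 hours


def sft_token_cost(dataset_tokens: int, epochs: int, usd_per_million: float) -> float:
    """Training cost when billing is per 1M trained tokens.

    Assumption: billed tokens = dataset tokens x epochs.
    """
    trained_tokens = dataset_tokens * epochs
    return trained_tokens / 1_000_000 * usd_per_million


def cluster_min_commit(gpus: int, usd_per_gpu_hour: float, min_weeks: int = 1) -> float:
    """Smallest possible bill for a cluster rented with a minimum term in weeks."""
    return gpus * usd_per_gpu_hour * HOURS_PER_WEEK * min_weeks


def per_minute_to_hourly(units: int, usd_per_unit_minute: float) -> float:
    """Convert a per-minute, per-unit rate into an hourly rate for `units` units."""
    return units * usd_per_unit_minute * 60


if __name__ == "__main__":
    # Rates are taken from the table; workload sizes are hypothetical.
    print(f"SFT at $25/1M tokens:       ${sft_token_cost(5_000_000, 4, 25.0):,.2f}")
    print(f"16 GPUs at $4.49, 1 week:   ${cluster_min_commit(16, 4.49):,.2f}")
    print(f"2 CMUs at $0.0785/min:      ${per_minute_to_hourly(2, 0.0785):,.2f}/hr")
```

Under these assumptions the token-priced job comes to $500, the one-week minimum on a 16-GPU cluster to about $12,069, and two imported-model CMUs to about $9.42 per hour. The point is that the minimum term, not the hourly rate, dominates the cluster row for small jobs.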
$100/hr, $5k cap per job. Verifier in the loop. The fastest path to a measurable reasoning lift if your data fits the o-series.
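
As a rough budgeting aid, the hourly rate and the per-job cap above can be combined into a worst-case figure. This is a minimal sketch that assumes the cap behaves as a simple ceiling on the billed amount for one job; how the cap is actually enforced (for example, whether the job is paused at the limit) is not stated here, and the function names are hypothetical.

```python
def rft_job_cost(hours: float, usd_per_hour: float = 100.0, cap_usd: float = 5_000.0) -> float:
    """Billed amount for one RFT job, assuming the cap is a hard ceiling."""
    return min(hours * usd_per_hour, cap_usd)


def hours_until_cap(usd_per_hour: float = 100.0, cap_usd: float = 5_000.0) -> float:
    """Training hours after which the per-job cap is reached."""
    return cap_usd / usd_per_hour


if __name__ == "__main__":
    print(rft_job_cost(12))     # a 12-hour job, below the cap
    print(rft_job_cost(80))     # an 80-hour job, limited by the cap
    print(hours_until_cap())    # hours of training the cap covers
```

At the quoted rate, the cap is reached after 50 hours of training, so a job planned for longer than that should be budgeted at the cap rather than at hours times rate.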
Kubernetes-native, NIM Operator 2.0 for serving, full method coverage (LoRA, SFT, DPO, GRPO). The enterprise on-prem default.
QLoRA on a single 48GB GPU via Unsloth (2x faster, 70% less VRAM). Modal's per-second billing matches the iteration loop.
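
To see why per-second billing suits short iteration runs, compare it with a scheme that rounds each run up to a full hour. The sketch below uses the ~$3.95/hr H100 rate from the table purely as an illustrative number; the hour-rounded comparator and the 7-minute run length are hypothetical and do not describe any specific vendor.

```python
import math


def per_second_cost(run_seconds: int, usd_per_hour: float) -> float:
    """Cost when every second of the run is billed pro rata."""
    return run_seconds * usd_per_hour / 3600


def hour_rounded_cost(run_seconds: int, usd_per_hour: float) -> float:
    """Cost when each run is rounded up to whole hours (hypothetical comparator)."""
    return math.ceil(run_seconds / 3600) * usd_per_hour


if __name__ == "__main__":
    run = 7 * 60  # a 7-minute debugging run, in seconds
    rate = 3.95   # illustrative hourly rate taken from the table
    print(f"per-second:   ${per_second_cost(run, rate):.2f}")
    print(f"hour-rounded: ${hour_rounded_cost(run, rate):.2f}")
```

For a 7-minute run the per-second bill is roughly $0.46 against $3.95 when rounded up to the hour, and the gap repeats on every short experiment in the loop.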
Only path to fine-tuned Claude. EU region, Provisioned Throughput. Good when the use case specifically needs Claude.
TRL v1.0 unifies SFT, RM, DPO, GRPO. Lambda 1-Click for the GPUs. The recipe we recommend when no vendor lock-in is acceptable.
SFT, LoRA, QLoRA, DoRA, DPO, SimPO, ORPO, KTO, GRPO/RFT, distillation, model merging. Every named technique with when it earns the build.
Sourcing, PII redaction (Presidio), synthetic data (Distilabel, Nemotron), DEITA quality scoring, MinHash + SemDedup, labeling vendors, feedback loops.
Under 1k examples to over 1M, single A10G to 128 B200. Indicative cost, recommended method, hardware tier.
Continued pretraining, SFT, preference optimization (DPO, SimPO, ORPO), reasoning distillation (R1 lineage), model merging (TIES, DARE). The full build pipeline.
EU AI Act (Article 25 substantial-modification trap), GDPR, HIPAA, FedRAMP, Colorado AI Act, India DPDP, China GenAI Measures. Region-by-region for tuned LLMs.