★ Sakana Fugu · LLM Models · 8-week engagement

SAKANA FUGU · ORCHESTRATION MODEL

Direct Fugu integration. A multi-agent system, called like one model.

Sakana AI's Fugu delivers a full multi-agent orchestration system behind a single OpenAI-compatible endpoint. It selects, delegates, verifies, and synthesizes across a proprietary pool of expert models, so you get coordinated, frontier-class answers from one call. We integrate it directly, behind a vendor-neutral abstraction and an eval suite.

Orchestration · Multi-agent · LLM API · Eval pipelines

Cycle: 8 weeks · fixed price

Stack: Fugu API, direct

Output: Production code + eval suite

Handoff: Full source ownership

[THE SHORT VERSION]

Orchestration as the product, not a framework you wire.

Fugu's bet is that the best results come from coordinating many models, not from one monolith. You send one request; Fugu decides whether to answer directly or assemble a team of experts, handling routing, delegation, verification, and synthesis internally. The catch is that the pool and the routing are proprietary, so you trade visibility and control for not building the orchestration yourself. We integrate it the way we integrate any model: direct API, evals on your real tasks, and a fallback to a model you fully control.

When it fits

Hard, multi-step work where built-in verification earns its keep (Fugu Ultra)
Responsive, everyday tasks where low latency matters (Fugu)
Teams that want frontier results without betting on a single vendor or facing export-control risk

When it does not

Audited or regulated pipelines that must attest to exactly which model produced an output
Workloads where a single, cheaper model already clears your eval bar

[HOW WE BUILD IT]

How we build with Sakana Fugu.

Direct API, thin abstraction

Fugu speaks an OpenAI-compatible API, so it sits behind the same small provider interface as Claude and GPT. Adding it, or swapping it out, is a config change plus an eval pass, not a rewrite.
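
As a rough sketch of what that provider interface could look like: the registry, the `complete` function, the base URLs, and the model identifiers below are all illustrative placeholders, not Sakana's published values. The transport is injected so the example runs without network access; in a real integration it would wrap an HTTP POST to an OpenAI-compatible chat-completions route.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Message = Dict[str, str]
# A transport takes (base_url, payload) and returns the assistant text.
Transport = Callable[[str, dict], str]


@dataclass(frozen=True)
class Provider:
    name: str
    base_url: str  # hypothetical placeholder URL
    model: str     # hypothetical model identifier


# Hypothetical registry: swapping vendors means changing the key you pass in.
PROVIDERS = {
    "fugu": Provider("fugu", "https://fugu.example/v1", "fugu"),
    "pinned": Provider("pinned", "https://pinned.example/v1", "pinned-model-v1"),
}


def complete(provider_key: str, messages: List[Message], transport: Transport) -> str:
    """Send an OpenAI-style chat payload through whichever provider is configured."""
    provider = PROVIDERS[provider_key]
    payload = {"model": provider.model, "messages": messages}
    return transport(provider.base_url, payload)


def echo_transport(base_url: str, payload: dict) -> str:
    """Stand-in transport that reports where the request would have gone."""
    return f"{payload['model']}@{base_url}"
```

Application code only ever calls `complete`, so a vendor change touches the registry entry and the eval run, not the call sites.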

Route Fugu vs Ultra by task

We use standard Fugu for responsive, high-volume work and Fugu Ultra for the hard, high-stakes tail. The calling agent chooses between them at runtime, based on task difficulty and cost.
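
One way such a routing rule might look, as a minimal sketch. The difficulty score, the thresholds, the cost estimate, and the identifiers `fugu` and `fugu-ultra` are assumptions for illustration; real values would come from your own evals and invoices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    difficulty: float   # 0.0 (trivial) to 1.0 (hard), from a hypothetical classifier
    interactive: bool   # True when a user is waiting on the response
    budget_usd: float   # spend allowed for this call


# Illustrative numbers only; tune them against your own eval results and bills.
ULTRA_DIFFICULTY_THRESHOLD = 0.7
ULTRA_ESTIMATED_COST_USD = 0.50


def pick_variant(task: Task) -> str:
    """Return a hypothetical model identifier: 'fugu' or 'fugu-ultra'."""
    if task.interactive:
        # Latency-sensitive work stays on the low-latency variant.
        return "fugu"
    hard = task.difficulty >= ULTRA_DIFFICULTY_THRESHOLD
    affordable = task.budget_usd >= ULTRA_ESTIMATED_COST_USD
    return "fugu-ultra" if hard and affordable else "fugu"
```

Keeping the rule in one small function makes it easy to put under the same eval suite as the models it chooses between.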

Evals before you trust it

Sakana's benchmarks are a starting point, not a verdict. An eval set built from your real tasks gates the choice, and we probe the proprietary routing for content-policy and provenance concerns.
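
A minimal sketch of an eval gate, assuming each eval case is a prompt plus a checker function and that a model is any callable from prompt to answer. The 0.9 minimum pass rate is an arbitrary example, not a recommended bar.

```python
from typing import Callable, List, Tuple

EvalCase = Tuple[str, Callable[[str], bool]]  # (prompt, checker for the answer)


def pass_rate(model: Callable[[str], str], cases: List[EvalCase]) -> float:
    """Fraction of cases whose checker accepts the model's answer."""
    if not cases:
        raise ValueError("eval set is empty")
    passed = sum(1 for prompt, check in cases if check(model(prompt)))
    return passed / len(cases)


def gate(candidate_rate: float, baseline_rate: float, min_rate: float = 0.9) -> bool:
    """Allow the change only if it clears the bar and does not regress the baseline."""
    return candidate_rate >= min_rate and candidate_rate >= baseline_rate
```

In practice the cases come from your real tasks, and the gate runs on every prompt or model change before it ships.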

Fallback you control

Because the pool is hidden, we keep a fallback path to a model you can pin and audit, for the regions, regulations, or steps Fugu cannot serve. Observability on every call.
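
The fallback path can be sketched as a thin wrapper. `ProviderUnavailable` and `call_with_fallback` are hypothetical names, and the log here is a plain list standing in for whatever observability sink you use.

```python
from typing import Callable, List, Tuple

Model = Callable[[str], str]


class ProviderUnavailable(Exception):
    """Raised by a model callable when it cannot serve the request."""


def call_with_fallback(prompt: str, primary: Model, fallback: Model,
                       log: List[Tuple[str, str]]) -> str:
    """Try the primary model; on failure use the pinned fallback. Record which ran."""
    try:
        answer = primary(prompt)
        log.append(("primary", prompt))
        return answer
    except ProviderUnavailable:
        answer = fallback(prompt)
        log.append(("fallback", prompt))
        return answer
```

Recording which path served each call is what lets an auditor see when the pinned model, rather than the hidden pool, produced an output.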

[WHAT YOU GET]

What the engagement leaves behind.

1 endpoint

A whole agent team, one call

No lock-in

Vendor risk spread across a pool

Eval-gated

Quality measured, not assumed

1 swap

Vendor change is config

[VARIANTS]

Pick the variant that fits.

Fugu for low-latency everyday work, Fugu Ultra for maximum accuracy on hard problems. We integrate Fugu directly behind a vendor-neutral abstraction, then route by task and cost. Swapping variants, or swapping Fugu for Claude, is a config change, not a rewrite. Eval-gated, either way.

Flagship · Orchestration

22 Jun 2026

Fugu Ultra

Coordinates a deeper pool of expert agents for maximum accuracy on hard problems
Matches frontier closed models (Fable 5, Mythos Preview) on engineering and science benchmarks
Frontier capability with no single-vendor dependency and no export-control exposure

Everyday default · Orchestration

22 Jun 2026

Fugu

Balances strong performance with low latency, the default for everyday work
Built for responsive use: chatbots, code review, interactive services
One OpenAI-compatible endpoint orchestrates the right models for you

[METHODOLOGY · K-FRAMEWORK]

Integrated through the K-Framework.

Every model we integrate runs through the same operating system. Three pillars, sixteen layers, one Compound Growth Loop. The methodology that keeps AI work from rotting after the first ship.

Foundations

Direct API integration with the model. No LangChain, no orchestration vendor, no agent framework built on quicksand. Typed contracts, the same way we wire up Postgres.

Amplification

An eval suite built from your real tasks gates every prompt and model change. Quality is measured before it ships, not vibed in a demo.

Judgment

Governance, audit, and oversight wired in from day one. Who called what, with which prompt version, at what cost. Your auditors get answers, not screenshots.
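
To make "who called what, with which prompt version, at what cost" concrete, here is a sketch of one possible audit record. The field names are assumptions, and the prompt is stored as a SHA-256 digest so the audit line itself carries no prompt text.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class AuditRecord:
    caller: str
    model: str
    prompt_version: str
    prompt_sha256: str
    cost_usd: float
    timestamp: str  # ISO 8601, supplied by the caller


def make_record(caller: str, model: str, prompt_version: str,
                prompt: str, cost_usd: float, timestamp: str) -> AuditRecord:
    """Build a record that identifies the prompt by hash rather than by content."""
    digest = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    return AuditRecord(caller, model, prompt_version, digest, cost_usd, timestamp)


def to_json_line(record: AuditRecord) -> str:
    """Serialize with sorted keys so log lines are stable and easy to diff."""
    return json.dumps(asdict(record), sort_keys=True)
```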

[OBSERVABILITY]

Observability your team can read.

A model in production without observability is roulette. We instrument every integration so engineering and finance can see the same numbers, and so a regression at 3am surfaces before a customer opens a ticket.

Instrumented

Cost per call

Tokens in, tokens out, dollars spent. Sliced by feature, tenant, and route. Budgets enforced where it matters.
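
The arithmetic behind this can be sketched as follows. The per-million-token prices are made-up placeholders, not Sakana's pricing, and the `Call` fields are assumptions about what each call record carries.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Call:
    feature: str
    tenant: str
    route: str
    tokens_in: int
    tokens_out: int


# Hypothetical prices in USD per million tokens: (input, output) per route.
PRICES_PER_MILLION = {"fugu": (1.0, 4.0), "fugu-ultra": (10.0, 40.0)}


def call_cost(call: Call) -> float:
    """Dollars for one call: tokens times the route's per-million price."""
    price_in, price_out = PRICES_PER_MILLION[call.route]
    return (call.tokens_in * price_in + call.tokens_out * price_out) / 1_000_000


def spend_by(calls: Iterable[Call], dimension: str) -> Dict[str, float]:
    """Total spend grouped by 'feature', 'tenant', or 'route'."""
    totals: Dict[str, float] = defaultdict(float)
    for call in calls:
        totals[getattr(call, dimension)] += call_cost(call)
    return dict(totals)


def over_budget(totals: Dict[str, float], budgets: Dict[str, float]) -> List[str]:
    """Keys whose spend exceeds their budget; keys without a budget are ignored."""
    return sorted(key for key, spent in totals.items()
                  if key in budgets and spent > budgets[key])
```

For example, `over_budget(spend_by(calls, "tenant"), budgets)` gives the list of tenants to throttle or alert on.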

Instrumented

Latency p50 / p95 / p99

Real distributions, not averages. We know which routes are slow, and why.
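
A small sketch of why percentiles beat averages, using the nearest-rank definition of a percentile. This is one common definition among several; production metrics systems usually estimate percentiles from histograms instead.

```python
import math
from typing import Dict, Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% at or below it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def latency_summary(samples_ms: Sequence[float]) -> Dict[str, float]:
    """p50, p95, and p99 for one route's latency samples, in milliseconds."""
    return {name: percentile(samples_ms, pct)
            for name, pct in (("p50", 50), ("p95", 95), ("p99", 99))}
```

With 98 calls at 100 ms plus one at 2000 ms and one at 5000 ms, the mean is 168 ms, while p50 and p95 are both 100 ms and p99 is 2000 ms. The average describes no real request; the percentiles show a fast route with a slow tail.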

Instrumented

Eval pass rates

The same eval suite that gates a release runs continuously in production. A regression on real traffic surfaces fast.

Instrumented

Prompt + completion logs

PII scrubbed at the proxy, shipped to your SIEM. Retention controls match your compliance window.
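
As a deliberately narrow illustration of scrubbing at the proxy, the sketch below redacts email addresses and one phone-number shape with regular expressions. The patterns and placeholder tokens are assumptions; real PII handling needs far broader coverage (names, addresses, identifiers) and is usually done with a dedicated detection tool.

```python
import re

# Simple email pattern; good enough for a sketch, not RFC-complete.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Only matches numbers shaped like 555-123-4567 (dash, dot, or space separated).
PHONE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")


def scrub(text: str) -> str:
    """Replace matched emails and phone numbers with fixed placeholder tokens."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)
```

Running the scrubber before logs leave the proxy means the SIEM never receives the raw values.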

Dashboards your team owns, not ours. At handoff you get the queries, the alerts, and the runbook. We are not in the path to read your metrics.

[COMMON QUESTIONS]

Questions we get asked.

Is Fugu a model or a framework?

Both, by design. It is a learned coordinator over a pool of models, packaged so you call it like a single model. For a buyer the useful question is behaviour, not taxonomy: latency, cost, reliability, and whether the answer is right. We evaluate it as we would any model, on your real tasks.

Can we see which models Fugu uses?

No. The pool and routing are proprietary, and you cannot pin a specific underlying model or version. For most product work that is acceptable. For audited or regulated pipelines that need to attest to exactly which model ran, it can be a blocker, and we will say so up front.

Fugu or Fugu Ultra?

Fugu is the low-latency everyday default (chatbots, code review, interactive services). Fugu Ultra coordinates a deeper agent pool for maximum accuracy on hard problems (research reproduction, security analysis, competition-grade work). They share one API, so escalating is a model swap, not a rewrite.