Direct API, thin abstraction
Fugu speaks an OpenAI-compatible API, so it sits behind the same small provider interface as Claude and GPT. Adding it, or swapping it out, is a config change plus an eval pass, not a rewrite.
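A minimal sketch of what a small provider interface like this could look like. Everything named here is an assumption for illustration: the `ChatProvider` protocol, the `build_provider` helper, the config shape, the placeholder URLs, and the model identifiers are hypothetical, and the transport is a stand-in so the example runs offline.

```python
from dataclasses import dataclass
from typing import Callable, Protocol


class ChatProvider(Protocol):
    """The one method every backend must implement (hypothetical interface)."""

    def complete(self, prompt: str) -> str: ...


@dataclass
class OpenAICompatibleProvider:
    """Covers any backend that exposes an OpenAI-style chat endpoint."""

    base_url: str  # placeholder URLs below, not real endpoints
    model: str     # hypothetical model identifiers
    send: Callable[[str, str, str], str]  # injected transport: (base_url, model, prompt) -> text

    def complete(self, prompt: str) -> str:
        return self.send(self.base_url, self.model, prompt)


def build_provider(config: dict, send: Callable[[str, str, str], str]) -> ChatProvider:
    """Pick the active backend from config, so callers never name a vendor."""
    entry = config["providers"][config["active"]]
    return OpenAICompatibleProvider(entry["base_url"], entry["model"], send)


def fake_send(base_url: str, model: str, prompt: str) -> str:
    # Stand-in transport so the sketch runs without network access.
    return f"[{model}] {prompt}"


CONFIG = {
    "active": "fugu",
    "providers": {
        "fugu": {"base_url": "https://fugu.example/v1", "model": "fugu"},
        "other": {"base_url": "https://other.example/v1", "model": "other-model"},
    },
}

provider = build_provider(CONFIG, fake_send)
swapped = build_provider({**CONFIG, "active": "other"}, fake_send)
print(provider.complete("hello"))
print(swapped.complete("hello"))
```

In this sketch, swapping backends means changing the `active` key. The calling code is untouched, which is the property the eval pass then has to confirm in terms of quality.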
Sakana AI's Fugu delivers a full multi-agent orchestration system behind a single OpenAI-compatible endpoint. It selects, delegates, verifies, and synthesizes across a proprietary pool of expert models, so you get coordinated, frontier-class answers from one call. We integrate it directly, behind a vendor-neutral abstraction and an eval suite.
Fugu's bet is that the best results come from coordinating many models, not from one monolith. You send one request; Fugu decides whether to answer directly or assemble a team of experts, handling routing, delegation, verification, and synthesis internally. The catch is that the pool and the routing are proprietary, so you trade visibility and control for not building the orchestration yourself. We integrate it the way we integrate any model: direct API, evals on your real tasks, and a fallback to a model you fully control.
We use standard Fugu for responsive, high-volume work and Fugu Ultra for the hard, high-stakes tail. The agent picks between the two at runtime, based on task difficulty and cost.
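As a rough illustration of picking a variant by difficulty and cost, the sketch below assumes a difficulty score between 0 and 1 is already available from some upstream classifier. The tier names, cost units, and threshold are hypothetical and do not reflect real identifiers or pricing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    cost_per_call: float  # illustrative units, not real pricing


STANDARD = Tier("fugu", 1.0)
ULTRA = Tier("fugu-ultra", 10.0)


def pick_tier(difficulty: float, remaining_budget: float, threshold: float = 0.7) -> Tier:
    """Escalate to the larger tier only for hard tasks the budget can still cover."""
    if difficulty >= threshold and remaining_budget >= ULTRA.cost_per_call:
        return ULTRA
    return STANDARD


print(pick_tier(0.9, remaining_budget=50.0).name)  # hard task, budget available
print(pick_tier(0.9, remaining_budget=5.0).name)   # hard task, budget too low
print(pick_tier(0.2, remaining_budget=50.0).name)  # easy task
```

The threshold is the kind of value an eval suite would tune. It is a default here only so the sketch runs.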
Sakana's benchmarks are a starting point, not a verdict. An eval set built from your real tasks gates the choice, and we probe the proprietary routing for content-policy and provenance concerns.
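One way an eval gate of this kind might be expressed, as a sketch. It assumes exact-match scoring and a fixed minimum pass rate, both of which are simplifications. The function names and the two toy models are hypothetical stand-ins for real provider calls.

```python
from typing import Callable

EvalCase = tuple[str, str]  # (prompt, expected answer)


def pass_rate(model: Callable[[str], str], cases: list[EvalCase]) -> float:
    """Fraction of cases where the model's answer matches exactly."""
    passed = sum(1 for prompt, expected in cases if model(prompt).strip() == expected)
    return passed / len(cases)


def gate(
    candidate: Callable[[str], str],
    baseline: Callable[[str], str],
    cases: list[EvalCase],
    min_rate: float = 0.9,
) -> bool:
    """Allow the candidate only if it clears the bar and does not regress against the baseline."""
    candidate_rate = pass_rate(candidate, cases)
    return candidate_rate >= min_rate and candidate_rate >= pass_rate(baseline, cases)


CASES = [("2+2", "4"), ("capital of France", "Paris")]
ANSWERS = dict(CASES)


def strong_model(prompt: str) -> str:
    # Toy model that answers every case correctly.
    return ANSWERS[prompt]


def weak_model(prompt: str) -> str:
    # Toy model that always gives the same answer.
    return "4"


print(gate(strong_model, weak_model, CASES))
print(gate(weak_model, strong_model, CASES))
```

Real tasks rarely score by exact match. A graded rubric or a task-specific checker would replace the comparison inside `pass_rate` without changing the gate itself.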
Because the pool is hidden, we keep a fallback path to a model you can pin and audit, for the regions, regulations, or steps Fugu cannot serve. Observability on every call.
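A fallback path of this shape can be sketched in a few lines. The `ProviderUnavailable` exception, the two toy provider functions, and the log format are assumptions made for the example, not part of any real client library.

```python
from typing import Callable


class ProviderUnavailable(Exception):
    """Raised when a provider cannot serve a request (region, policy, outage)."""


def complete_with_fallback(
    prompt: str,
    primary: Callable[[str], str],
    fallback: Callable[[str], str],
    log: list[dict],
) -> str:
    """Try the primary provider, and record which path actually served the call."""
    try:
        answer = primary(prompt)
        log.append({"served_by": "primary"})
        return answer
    except ProviderUnavailable as exc:
        log.append({"served_by": "fallback", "reason": str(exc)})
        return fallback(prompt)


def fugu_unavailable(prompt: str) -> str:
    # Hypothetical primary that refuses this request.
    raise ProviderUnavailable("region not served")


def pinned_model(prompt: str) -> str:
    # Hypothetical stand-in for a model you can pin and audit.
    return f"pinned: {prompt}"


calls: list[dict] = []
answer = complete_with_fallback("summarize", fugu_unavailable, pinned_model, calls)
print(answer, calls)
```

Logging the serving path on every call is what keeps the fallback rate visible, so a quiet shift of traffic onto the fallback model does not go unnoticed.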
Fugu for low-latency everyday work, Fugu Ultra for maximum accuracy on hard problems. We integrate Fugu directly behind a vendor-neutral abstraction, then route by task and cost. Swapping variants, or swapping Fugu for Claude, is a config change, not a rewrite. Eval-gated, either way.
Every model we integrate runs through the same operating system. Three pillars, sixteen layers, one Compound Growth Loop. The methodology that keeps AI work from rotting after the first ship.
Read the K-Framework
Direct API integration with the model. No LangChain, no orchestration vendor, no agent framework built on quicksand. Typed contracts, the same way we wire up Postgres.
An eval suite built from your real tasks gates every prompt and model change. Quality is measured before it ships, not vibed in a demo.
Governance, audit, and oversight wired in from day one. Who called what, with which prompt version, at what cost. Your auditors get answers, not screenshots.
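To make "who called what, with which prompt version, at what cost" concrete, here is a sketch of a per-call audit record. The field names and the choice to store a hash of the prompt instead of its text are assumptions for illustration.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class AuditRecord:
    caller: str          # who made the call
    model: str           # which model served it
    prompt_version: str  # which prompt template version was used
    prompt_sha256: str   # fingerprint of the rendered prompt, not its text
    cost_usd: float      # what the call cost


def make_record(caller: str, model: str, prompt_version: str, prompt: str, cost_usd: float) -> AuditRecord:
    """Build an immutable record; hashing keeps raw prompt text out of the audit log."""
    digest = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    return AuditRecord(caller, model, prompt_version, digest, cost_usd)


def to_log_line(record: AuditRecord) -> str:
    """Serialize with sorted keys so log lines are stable and easy to diff."""
    return json.dumps(asdict(record), sort_keys=True)


record = make_record("billing-service", "fugu", "v3", "Summarize invoice 42", 0.0125)
print(to_log_line(record))
```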
A model in production without observability is roulette. We instrument every integration so engineering and finance can see the same numbers, and so a regression at 3am surfaces before a customer opens a ticket.
Tokens in, tokens out, dollars spent. Sliced by feature, tenant, and route. Budgets enforced where it matters.
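The accounting described here might look roughly like the sketch below. The `CostLedger` class, the per-1k-token prices, and the per-tenant budget policy are hypothetical; a real system would take prices from the provider's billing data.

```python
from collections import defaultdict


class BudgetExceeded(Exception):
    """Raised when recording a call would push a tenant past its budget."""


class CostLedger:
    def __init__(self, tenant_budgets_usd: dict[str, float]) -> None:
        self.budgets = tenant_budgets_usd
        self.spend: dict[tuple[str, str, str], float] = defaultdict(float)  # (feature, tenant, route)

    def tenant_total(self, tenant: str) -> float:
        """Total spend for one tenant across all features and routes."""
        return sum(cost for (_, t, _), cost in self.spend.items() if t == tenant)

    def record(self, feature: str, tenant: str, route: str,
               tokens_in: int, tokens_out: int,
               usd_per_1k_in: float, usd_per_1k_out: float) -> float:
        """Price the call, enforce the tenant budget, then book the spend."""
        cost = tokens_in / 1000 * usd_per_1k_in + tokens_out / 1000 * usd_per_1k_out
        if self.tenant_total(tenant) + cost > self.budgets.get(tenant, float("inf")):
            raise BudgetExceeded(tenant)
        self.spend[(feature, tenant, route)] += cost
        return cost


ledger = CostLedger({"acme": 3.0})
first = ledger.record("search", "acme", "fugu", 1000, 2000, 0.5, 1.0)
try:
    ledger.record("search", "acme", "fugu", 1000, 2000, 0.5, 1.0)
    blocked = False
except BudgetExceeded:
    blocked = True
print(first, blocked, ledger.tenant_total("acme"))
```

Keying spend by feature, tenant, and route is what allows the same ledger to answer both the engineering question (which route is expensive) and the finance question (which tenant is).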
Real distributions, not averages. We know which routes are slow, and why.
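A small sketch of why the distribution matters more than the average. It uses `statistics.quantiles` from the Python standard library, and the latency sample is made up for the example.

```python
import statistics


def latency_summary(latencies_ms: list[float]) -> dict[str, float]:
    """Report the mean alongside p50, p95, and p99 for one route."""
    # n=100 yields 99 cut points; index k-1 is the k-th percentile.
    cuts = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    return {
        "mean": statistics.mean(latencies_ms),
        "p50": cuts[49],
        "p95": cuts[94],
        "p99": cuts[98],
    }


# Made-up sample: most calls are fast, a few are very slow.
sample = [100] * 95 + [5000] * 5
summary = latency_summary(sample)
print(summary)
```

On this sample the typical call takes 100 ms and the slowest tail takes 5000 ms, while the mean lands in between and describes neither.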
The same eval suite that gates a release runs continuously in production. A regression on real traffic surfaces fast.
PII scrubbed at the proxy, shipped to your SIEM. Retention controls match your compliance window.
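As an illustration of scrubbing at the proxy, the sketch below redacts two kinds of identifiers before a log line leaves the process. The patterns are deliberately narrow assumptions (email addresses and US-style social security numbers). Production scrubbing would need a much broader and better tested rule set.

```python
import re

# Deliberately simple patterns, for illustration only.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def scrub(text: str) -> str:
    """Replace matched identifiers with fixed placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    return SSN.sub("[SSN]", text)


line = "mail jane.doe@example.com, ssn 123-45-6789"
print(scrub(line))
```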
Dashboards your team owns, not ours. At handoff you get the queries, the alerts, and the runbook. We are not in the path to read your metrics.