MOONSHOT KIMI · OPEN-WEIGHT INTEGRATION

Direct Kimi integration. Open weights, frontier coding, a fraction of the cost.

Moonshot AI's Kimi is an open-weight model family that competes at the frontier on agentic coding while costing a fraction of the closed leaders. We integrate it directly, hosted or self-run, behind a vendor-neutral abstraction and an eval suite.

Open weights · LLM API · Agentic coding · Eval pipelines

Cycle: 8 weeks · fixed price
Stack: Kimi API or self-hosted
Output: Production code + eval suite
Handoff: Full source ownership

[THE SHORT VERSION]

An open-weight model that competes on agentic coding.

Kimi is a large Mixture-of-Experts model from Moonshot AI, released open-weight and strong at agentic coding and tool use. The open weights mean you can call the hosted API for speed, or serve the model on your own GPUs for residency and cost control. The engineering that matters is the same as for any model: prompt design, evals, retries, cost control, and an abstraction that keeps you vendor-neutral. We integrate directly, no LangChain in the path.

When it fits

Agentic coding and tool-use workloads on a tight budget
Teams that want open weights they can host for residency or control
High-volume work where closed frontier per-token pricing hurts

When it does not

Tasks that need a specific closed model's ecosystem or features
Workloads where a smaller model already clears your eval bar

[HOW WE BUILD IT]

How we build with Kimi.

Hosted or self-hosted, your call

We run Kimi against Moonshot's API for speed, or serve the open weights on your own GPUs for residency and cost control. The decision is data-driven, not dogmatic.
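As a rough illustration of how that comparison could be framed, here is a minimal sketch that puts a month of hosted usage next to a month of rented GPUs. Every function name and every number in the usage example is a hypothetical placeholder (traffic volume, per-token prices, GPU count, hourly rate), and the model ignores real factors such as throughput limits, engineering time, and caching discounts.

```python
def hosted_monthly_cost(
    tokens_in: int, tokens_out: int, price_in: float, price_out: float
) -> float:
    """Monthly API spend in dollars; prices are dollars per 1M tokens."""
    return (tokens_in * price_in + tokens_out * price_out) / 1_000_000


def self_hosted_monthly_cost(
    gpu_count: int, dollars_per_gpu_hour: float, hours_per_month: float = 730.0
) -> float:
    """Monthly GPU rental only; staffing and idle capacity are not modelled."""
    return gpu_count * dollars_per_gpu_hour * hours_per_month


def cheaper_option(hosted: float, self_hosted: float) -> str:
    """Name the cheaper side of the trade on cost alone."""
    return "hosted" if hosted <= self_hosted else "self-hosted"


# Hypothetical workload: 1B input and 200M output tokens per month,
# priced at placeholder per-1M rates, against 8 GPUs at a placeholder rate.
hosted = hosted_monthly_cost(1_000_000_000, 200_000_000, 0.6, 2.5)
self_hosted = self_hosted_monthly_cost(8, 2.0)
print(cheaper_option(hosted, self_hosted))
```

Cost is only one input. A residency requirement can settle the question on its own, whichever side the arithmetic favours.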

Direct API, thin abstraction

Kimi sits behind the same small provider interface as every other model. Swapping it in, out, or alongside Claude is a config change, not a rewrite.
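A minimal sketch of what a small provider interface of this kind could look like. The names `Provider`, `Completion`, `EchoProvider`, `REGISTRY`, and the config keys are all hypothetical, and the adapter is a stand-in that echoes its input instead of calling any vendor API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Protocol


@dataclass(frozen=True)
class Completion:
    text: str
    input_tokens: int
    output_tokens: int


class Provider(Protocol):
    """The only surface application code depends on."""

    def complete(self, prompt: str) -> Completion: ...


class EchoProvider:
    """Stand-in adapter; a real one would make the vendor HTTP call."""

    def __init__(self, name: str) -> None:
        self.name = name

    def complete(self, prompt: str) -> Completion:
        # Token counts here are a whitespace estimate, not real usage data.
        return Completion(f"[{self.name}] {prompt}", len(prompt.split()), 1)


# Hypothetical registry: config names mapped to adapter factories.
REGISTRY: Dict[str, Callable[[], Provider]] = {
    "kimi-hosted": lambda: EchoProvider("kimi-hosted"),
    "kimi-selfhosted": lambda: EchoProvider("kimi-selfhosted"),
    "claude": lambda: EchoProvider("claude"),
}


def provider_from_config(config: Dict[str, str]) -> Provider:
    """Resolve the provider named in config; callers never import a vendor SDK."""
    return REGISTRY[config["provider"]]()


provider = provider_from_config({"provider": "kimi-hosted"})
print(provider.complete("summarise this diff").text)
```

Changing `"kimi-hosted"` to `"claude"` in the config is the whole swap. Nothing that calls `complete` has to change.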

Evals before you trust it

An eval set built from your real tasks gates the choice. We measure Kimi against the closed leaders on your inputs before routing real traffic to it.
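The gating idea can be sketched in a few lines, assuming each eval case carries its own pass/fail check and that a candidate may trail the baseline by a small tolerance. `EvalCase`, `pass_rate`, `gate`, and the 2-point tolerance are hypothetical choices for illustration, not a description of a specific harness.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class EvalCase:
    prompt: str
    check: Callable[[str], bool]  # True when the model output is acceptable


def pass_rate(model: Callable[[str], str], cases: List[EvalCase]) -> float:
    """Fraction of cases whose output passes its check."""
    if not cases:
        raise ValueError("eval set is empty")
    passed = sum(1 for case in cases if case.check(model(case.prompt)))
    return passed / len(cases)


def gate(candidate: float, baseline: float, max_regression: float = 0.02) -> bool:
    """Allow routing only if the candidate is within max_regression of baseline."""
    return candidate >= baseline - max_regression


# Toy cases and toy "models"; real cases come from your own task logs.
cases = [
    EvalCase("2+2", lambda out: "4" in out),
    EvalCase("capital of France", lambda out: "Paris" in out),
]
answers = {"2+2": "4", "capital of France": "Paris"}
baseline_rate = pass_rate(lambda p: answers[p], cases)
candidate_rate = pass_rate(lambda p: "4", cases)
print(gate(candidate_rate, baseline_rate))
```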

Cost, latency, and fallback

Token budgets, caching, streaming, and a fallback path to a closed model when a task needs it. Observability on every call.
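One way a budget check and a fallback path could fit together, as a sketch under stated assumptions: providers are plain callables tried in order, and the token count is a crude whitespace estimate instead of the model's tokenizer. `complete_with_fallback`, `BudgetExceeded`, and `rough_token_count` are hypothetical names.

```python
from typing import Callable, Optional, Sequence, Tuple


class BudgetExceeded(Exception):
    """Raised when a prompt is larger than the per-call token budget."""


def rough_token_count(text: str) -> int:
    # Crude whitespace estimate; a real integration would use the tokenizer.
    return len(text.split())


def complete_with_fallback(
    prompt: str,
    providers: Sequence[Tuple[str, Callable[[str], str]]],
    max_input_tokens: int,
) -> Tuple[str, str]:
    """Try providers in order; return (provider_name, output) from the first success."""
    if rough_token_count(prompt) > max_input_tokens:
        raise BudgetExceeded(f"prompt exceeds {max_input_tokens} tokens")
    last_error: Optional[Exception] = None
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # production code would catch specific errors
            last_error = exc
    raise RuntimeError("all providers failed") from last_error


def primary_down(prompt: str) -> str:
    raise TimeoutError("primary unavailable")


def closed_model(prompt: str) -> str:
    return "ok: " + prompt


print(complete_with_fallback(
    "fix the bug", [("kimi", primary_down), ("closed", closed_model)], 100
))
```

Returning the provider name alongside the output is what lets per-call observability record which path actually served the request.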

[WHAT YOU GET]

What the engagement leaves behind.

Open: Weights you can host yourself
1/10th: Output price vs Claude Opus
Eval-gated: Quality measured, not assumed
1 swap: Vendor change is config

[VERSIONS]

Pick the version that fits.

K3, K2.7, K2, K1.5. We integrate Kimi directly behind a vendor-neutral abstraction, hosted or self-run, then route by task and cost. Swapping versions, or swapping Kimi for Claude, is a config change, not a rewrite. Eval-gated, either way.

Latest · Open weights

17 Jul 2026

Kimi K3

The world's first open 3T-class model: 2.8T-param MoE, 16 of 896 experts active per token
1M-token context and native vision, with always-on reasoning and a 2.5x scaling-efficiency jump over K2
Frontier-contender coding: caching-first pricing drops effective cost far below the $3 / $15 headline

Current · Open weights

11 Jun 2026

Kimi K2.7

Frontier agentic coding at roughly a tenth of closed-leader output price
Open weights you can self-host: 1T-param Mixture-of-Experts, ~32B active
The value pick below K3: cheaper per token when the very hardest tasks are not in scope

Previous · Open weights

Jul 2025

Kimi K2

kimi-k2 · Input: $0.6 / 1M tokens · Output: $2.5 / 1M tokens

The release that put open-weight agentic coding on the map
Widely named as a candidate base for Cursor's Composer model
Still a strong, cheap option behind our vendor-neutral abstraction

Supported · brief not yet published

Jan 2025

Kimi K1.5

kimi-k1.5 · Input: $0.3 / 1M tokens · Output: $1.5 / 1M tokens

Moonshot's earlier long-context reasoning model
Superseded by the K2 line for agentic and coding work
Kept here for teams comparing against an older baseline

Supported · brief not yet published

[METHODOLOGY · K-FRAMEWORK]

Integrated through the K-Framework.

Every model we integrate runs through the same operating system. Three pillars, sixteen layers, one Compound Growth Loop. The methodology that keeps AI work from rotting after the first ship.

Foundations

Direct API integration with the model. No LangChain, no orchestration vendor, no agent framework built on quicksand. Typed contracts, the same way we wire up Postgres.

Amplification

An eval suite built from your real tasks gates every prompt and model change. Quality is measured before it ships, not vibed in a demo.

Judgment

Governance, audit, and oversight wired in from day one. Who called what, with which prompt version, at what cost. Your auditors get answers, not screenshots.

[OBSERVABILITY]

Observability your team can read.

A model in production without observability is roulette. We instrument every integration so engineering and finance can see the same numbers, and so a regression at 3am surfaces before a customer opens a ticket.

Instrumented

Cost per call

Tokens in, tokens out, dollars spent. Sliced by feature, tenant, and route. Budgets enforced where it matters.
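The arithmetic behind a per-call cost figure is simple enough to sketch. This example uses the kimi-k2 prices listed on this page ($0.6 input, $2.5 output per 1M tokens) and should be re-checked against current pricing before use. The function names and the record shape `(feature, model, tokens_in, tokens_out)` are hypothetical.

```python
from collections import defaultdict
from decimal import Decimal
from typing import Dict, Iterable, Tuple

# (input, output) dollars per 1M tokens, as listed on this page for kimi-k2.
PRICES_PER_MILLION: Dict[str, Tuple[Decimal, Decimal]] = {
    "kimi-k2": (Decimal("0.6"), Decimal("2.5")),
}
MILLION = Decimal(1_000_000)


def call_cost(model: str, tokens_in: int, tokens_out: int) -> Decimal:
    """Dollar cost of one call; Decimal avoids float drift when summing."""
    price_in, price_out = PRICES_PER_MILLION[model]
    return (price_in * tokens_in + price_out * tokens_out) / MILLION


def cost_by_feature(calls: Iterable[Tuple[str, str, int, int]]) -> Dict[str, Decimal]:
    """Sum cost per feature from (feature, model, tokens_in, tokens_out) records."""
    totals: Dict[str, Decimal] = defaultdict(Decimal)
    for feature, model, tokens_in, tokens_out in calls:
        totals[feature] += call_cost(model, tokens_in, tokens_out)
    return dict(totals)


# 10,000 tokens in and 2,000 out: (0.6 * 10,000 + 2.5 * 2,000) / 1,000,000
print(call_cost("kimi-k2", 10_000, 2_000))
```

The same grouping works for tenant or route by changing the key in the record.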

Instrumented

Latency p50 / p95 / p99

Real distributions, not averages. We know which routes are slow, and why.
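To illustrate why percentiles tell a different story from an average, here is a minimal sketch using the nearest-rank definition of a percentile. Production systems typically use histogram-based estimates instead, and the latency samples below are made up for the example.

```python
import math
from statistics import mean
from typing import Dict, Sequence


def percentile(samples: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile: the value at rank ceil(pct/100 * n) when sorted."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def latency_summary(samples_ms: Sequence[float]) -> Dict[str, float]:
    """p50, p95, and p99 for one route."""
    return {f"p{p}": percentile(samples_ms, p) for p in (50, 95, 99)}


# Made-up route: 98 fast calls and two slow outliers.
samples_ms = [10.0] * 98 + [500.0, 2000.0]
print(mean(samples_ms), latency_summary(samples_ms))
```

On this sample the mean sits well above what a typical request sees, the p50 and p95 show that most calls are fast, and only the p99 exposes the slow tail.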

Instrumented

Eval pass rates

The same eval suite that gates a release runs continuously in production. A regression on real traffic surfaces fast.

Instrumented

Prompt + completion logs

PII scrubbed at the proxy, shipped to your SIEM. Retention controls match your compliance window.
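A sketch of the simplest form proxy-side scrubbing could take, assuming regex redaction of email addresses and card-like digit runs before a log line leaves the proxy. The two patterns and the `scrub` function are illustrative only. Real PII detection needs a broader rule set and usually a dedicated library or service.

```python
import re

EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# 13 to 16 digits, optionally separated by single spaces or hyphens.
CARD = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")


def scrub(text: str) -> str:
    """Replace email addresses and card-like numbers with fixed placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    return CARD.sub("[CARD]", text)


print(scrub("mail jane.doe@example.com about 4111 1111 1111 1111"))
```

Short digit runs such as order numbers pass through unchanged, so logs stay useful for debugging.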

Dashboards your team owns, not ours. At handoff you get the queries, the alerts, and the runbook. We are not in the path to read your metrics.

[COMMON QUESTIONS]

Questions we get asked.

Is an open-weight model like Kimi production-ready?

Increasingly, yes, and the market agrees. Kimi competes with the closed leaders on agentic coding, and open-weight bases now sit underneath commercial coding products. Cursor's Composer model is widely believed to be a fine-tune of an open-weight base, with Kimi K2 among the names raised, though Cursor has not published which base it used. We still prove fit the same way for every model: an eval suite on your real tasks before any production traffic.

Hosted Kimi or self-hosted weights?

Both are on the table. The hosted Moonshot API is the fastest way to ship and is cheap per token: Kimi K2 lists at $0.6 input and $2.5 output per 1M tokens. Self-hosting the open weights buys data residency and cost control at high volume, at the price of running GPU inference yourself. We size that trade against your workload rather than defaulting either way.

[RELATED]