Kensink Labs
Kimi by Moonshot AI
OPEN WEIGHTS · LATESTMoonshot AIModel brief
MOONSHOT KIMI · VERSION K2.7 · 11 JUN 2026

Kimi K2.7. Frontier coding, open weights, a tenth of the price.

Moonshot AI's latest open-weight model pushes agentic coding toward the closed frontier while charging a fraction of the per-token price. It ships with published weights you can self-host. For cost-sensitive, high-volume coding work, this is the open option we benchmark first.

Open weightskimi-k2.7Agentic codingEval pipelines
Released
11 Jun 2026
Model ID
kimi-k2.7
Input
$0.6 / 1M tokens
Output
$2.5 / 1M tokens
[TL;DR FOR CEO + CTO]

Five things to know.

  • 01

    Open weights, not just an API.

    Moonshot publishes the weights under a permissive licence, so you can call the hosted API for speed or serve the model on your own GPUs for residency and cost control. Most closed leaders give you neither option.

  • 02

    Roughly a tenth of the output price of a closed flagship.

    Hosted pricing is $0.60 input and $2.50 output per million tokens. Against Claude Opus at $5 and $25, that is about an eighth of the input cost and a tenth of the output cost for frontier-class agentic coding.

  • 03

    Built for agentic coding.

    K2.7 continues the K2 line's focus: long tool-use chains, multi-file edits, and terminal work. On reported coding benchmarks it sits close to the closed leaders and ahead of every other open-weight model.

  • 04

    Large Mixture-of-Experts, sparse at inference.

    Around a trillion total parameters with roughly 32B active per token. You get frontier capacity without paying to activate the whole network on every call, which is part of why the hosted price is so low.

  • 05

    The open base the market already trusts.

    Cursor's Composer coding model is widely believed to be a fine-tune of an open-weight base, with Kimi K2 among the names raised. Whatever the truth, the debate is a signal: open-weight Kimi is now production-grade enough that a frontier tool may be built on it.

[BENCHMARKS]

How it stacks up.

Reported and illustrative numbers, framed the way we frame every model: a starting point, not a verdict. K2.7 leads the open-weight field on agentic coding and reasoning and closes much of the gap to the closed leaders, at a fraction of their price.

CapabilityKimi K2.7Kimi K2Claude Opus 4.8GPT-5.5
Agentic coding
SWE-Bench Verified
71.3%
+5.5 pts vs K2
65.8%
74.5%
72.1%
Agentic terminal coding
Terminal-Bench 2.1
Terminus-2 public harness
63.4%
+7.3 pts vs K2
56.1%
74.6%
78.2%
Tool use
Tau2-Bench
72.0%
+5.6 pts vs K2
66.4%
76.8%
73.5%
Reasoning
GPQA Diamond
78.6%
+3.5 pts vs K2
75.1%
83.2%
82.4%
Math
AIME 2025
89.4%
+4.7 pts vs K2
84.7%
91.0%
92.3%
Open-weight field
Best open model on coding
vs DeepSeek, Llama, Qwen
Leads
Prev. leader
Closed
Closed

Figures are reported or illustrative and used for orientation only. We re-run our own evals on customer tasks before recommending any model, open or closed, and the cost advantage has to survive a quality check on your workload.

[SOFTWARE DEVELOPMENT IMPACT]

What it changes for the team building with it.

What changes for the engineering team. Two comparisons that matter: K2.7 against the K2 it succeeds, and against a closed flagship (Claude Opus 4.8) on the dimensions a buyer actually weighs: capability, cost, control, and risk.

Dimensionvs Kimi K2vs Claude Opus 4.8
Coding workflows
+5.5 pts on SWE-Bench Verified and +7.3 pts on Terminal-Bench 2.1 vs K2. The agentic coding gap to the closed leaders is now small enough to matter on price.Opus still leads on the hardest terminal and reasoning tasks. K2.7 closes most of the everyday coding gap at a fraction of the cost, so it is a strong default for high-volume, well-scoped work.
Cost and latency
Same low hosted price as K2 ($0.60 / $2.50 per million). The capability went up, the price did not.About an eighth of Opus input cost and a tenth of output cost. On a coding agent that burns millions of tokens a day, that is the difference between a viable margin and an unviable one.
Control and residency
Open weights, same as K2. Self-hosting is a deployment decision, not a vendor negotiation.Opus is API-only. Kimi's open weights let you run inference in your own environment for data residency, air-gapped work, or fixed-cost GPU economics. That is a capability Opus cannot offer.
Risk and provenance
Same licence and origin questions as K2. Nothing new to diligence beyond the version bump.A China-origin open model carries different diligence: licence terms, content-policy behaviour, and supply-chain review. We treat those as eval and governance line items, not blockers.

Inside a Kensink build, Kimi is a routing option behind the same abstraction as Claude and GPT. The agent picks the model by task and cost at runtime, not by a vendor commitment frozen at design time.

[WHAT IS NEW]

The features that ship with it.

01

Sharper agentic coding

K2.7 improves on K2's headline strength: longer reliable tool-use chains, fewer derailments on multi-file edits, and better recovery when a step fails. This is where the version earns its number.

02

Open weights, permissive licence

Moonshot publishes the model weights for self-hosting. You can run K2.7 on your own GPUs for residency or fixed-cost economics, or call the hosted API when speed and zero ops matter more.

03

Large context window

A long context window suits codebase-scale tasks and long agent transcripts. As always, retrieval and context hygiene beat stuffing the whole window, and we build for that.

04

Cheap, cache-friendly hosted pricing

Hosted input at $0.60 and output at $2.50 per million, with prompt caching that makes shared system preambles cheaper still. The economics are the headline feature for high-volume work.

05

Drop-in behind a vendor-neutral abstraction

Kimi speaks an OpenAI-compatible API surface, so adding it next to Claude and GPT in our provider layer is a config change plus an eval pass, not a rewrite.

[VALUE FOR COST]

What it costs.

The headline is the economics. Frontier-class agentic coding at a fraction of the closed-leader per-token price, plus open weights you can run yourself.

Hosted (Moonshot API)
$0.6 input
$2.5 output
Hosted Moonshot API, per million tokens. Roughly an eighth of Claude Opus input cost and a tenth of its output cost. Prompt caching lowers the effective rate further on shared preambles.
Open weightsSelf-host
1T params (MoE)
~32B active
Published weights you can self-host on your own GPUs for data residency, air-gapped deployment, or fixed-cost economics at high volume. The trade is that you operate inference: serving, scaling, and updates.
[PROVENANCE + CONTROVERSY]

The Cursor Composer question.

Cursor's Composer and the open-weight base question.

When Cursor shipped its in-house Composer coding model, a widely-discussed theory held that it was not trained from scratch but fine-tuned from an open-weight base, with Kimi K2 among the names raised alongside other open models. Cursor has not published which base, if any, it used. Treat the specific claim as unconfirmed. The signal worth taking seriously is the direction: open-weight Kimi is now strong enough that a frontier commercial tool plausibly building on it is a debate at all.

Open weights make this both possible and legitimate.

A permissive licence is what lets anyone, Cursor included, fine-tune and ship on top of an open model. That is the point of open weights, not an abuse of them. The real governance question is transparency: buyers deserve to know what a product is built on, which is exactly the diligence we run before we put any model in a customer's path.

A China-origin open model needs real diligence, not a reflex.

Kimi's origin raises fair questions: licence terms, content-policy and refusal behaviour, and supply-chain review of weights you self-host. We handle these as concrete eval and governance items, content-policy probes in the eval suite, licence sign-off, and provenance checks, rather than as a blanket yes or no. For many workloads it clears the bar; for some regulated ones it will not, and we say so.

[OUR TAKE]

What this means for the build.

01

We benchmark Kimi first when cost is the constraint.

For high-volume, well-scoped coding and tool-use work, K2.7 is the open option we put on the eval suite before reaching for a closed flagship. If it clears the quality bar on your tasks, the cost difference is hard to argue with.

02

Open weights are a real capability, not a talking point.

The ability to self-host changes what is buildable: data residency, air-gapped deployments, and fixed-cost GPU economics that closed APIs cannot match. We weigh that against the operational cost of running inference, per workload.

03

The Cursor debate is a validation, read carefully.

Whether or not Composer is built on Kimi, the fact that it is a credible theory tells you open-weight models have crossed into production-grade for coding. We do not repeat the unconfirmed parts as fact, and we do not let the headline replace our own evals.

04

It runs behind the same abstraction as everything else.

Kimi is one more routing option in our provider layer. The agent picks Kimi, Claude, or GPT by task difficulty and cost at runtime. No lock-in, no rewrite, and a closed-model fallback for the steps that need it.

[METHODOLOGY · K-FRAMEWORK]

Integrated through the
K-Framework.

Every model we integrate runs through the same operating system. Three pillars, sixteen layers, one Compound Growth Loop. The methodology that keeps AI work from rotting after the first ship.

Read the K-Framework
01

Foundations

Direct API integration with the model. No LangChain, no orchestration vendor, no agent framework built on quicksand. Typed contracts, the same way we wire up Postgres.

02

Amplification

An eval suite built from your real tasks gates every prompt and model change. Quality is measured before it ships, not vibed in a demo.

03

Judgment

Governance, audit, and oversight wired in from day one. Who called what, with which prompt version, at what cost. Your auditors get answers, not screenshots.

[OBSERVABILITY]

Observability your team can read.

A model in production without observability is roulette. We instrument every integration so engineering and finance can see the same numbers, and so a regression at 3am surfaces before a customer opens a ticket.

Instrumented

Cost per call

Tokens in, tokens out, dollars spent. Sliced by feature, tenant, and route. Budgets enforced where it matters.

Instrumented

Latency p50 / p95 / p99

Real distributions, not averages. We know which routes are slow, and why.

Instrumented

Eval pass rates

The same eval suite that gates a release runs continuously in production. A regression on real traffic surfaces fast.

Instrumented

Prompt + completion logs

PII scrubbed at the proxy, shipped to your SIEM. Retention controls match your compliance window.

Dashboards your team owns, not ours. At handoff you get the queries, the alerts, and the runbook. We are not in the path to read your metrics.

[COMMON QUESTIONS]

Questions we are getting asked.

Is Cursor's Composer actually built on Kimi?
Cursor has not published Composer's base model, so the honest answer is that it is unconfirmed. There is a widely-discussed theory that Composer is a fine-tune of an open-weight base, and Kimi K2 is one of the open models named in that discussion alongside others. We treat it as speculation worth knowing, not as fact. What is solid is the underlying point: open-weight coding models are now strong enough for that theory to be plausible.
How much cheaper is Kimi than a closed flagship?
Hosted Kimi K2.7 is about $0.60 input and $2.50 output per million tokens. Against Claude Opus 4.8 at $5 and $25, that is roughly an eighth of the input cost and a tenth of the output cost. On a high-volume coding agent, the per-token gap compounds into a materially different cost structure. We still prove the quality holds on your tasks before routing real traffic.
Should we self-host the weights or use the hosted API?
Start on the hosted API: it is fast to ship and very cheap per token. Move to self-hosting when data residency, air-gapped requirements, or fixed-cost economics at scale justify operating GPU inference yourself. We size that trade against your actual volume and constraints rather than defaulting either way.
Is a China-origin open model safe to use?
It depends on the workload, and we make that call with evidence. We run content-policy and refusal probes in the eval suite, review the licence terms, and do provenance and supply-chain checks on weights you self-host. Many product workloads clear that bar comfortably. Some regulated or sensitive ones will not, and we will tell you plainly when Kimi is the wrong choice.
How hard is it to add Kimi to an existing build?
Behind a vendor-neutral abstraction, the way we build, it is a config change plus an eval pass. Kimi exposes an OpenAI-compatible API, so it slots into the provider layer next to Claude and GPT. If your team wired a model in directly with no abstraction, budget a day to add the seam, then the same eval pass.
DIRECT INTEGRATION · HOSTED OR SELF-HOSTED

Want Kimi K2.7
in your product?

Eval suite at handoff, full source ownership. We integrate against the model the same way we integrate against Postgres, hosted on the Moonshot API or self-hosted on your GPUs. Sized to your scope.