Kensink Labs
Kimi by Moonshot AI
KimiLLM Models8-week engagement
MOONSHOT KIMI · OPEN-WEIGHT INTEGRATION

Direct Kimi integration. Open weights, frontier coding, a fraction of the cost.

Moonshot AI's Kimi is an open-weight model family that competes at the frontier on agentic coding while costing a fraction of the closed leaders. We integrate it directly, hosted or self-run, behind a vendor-neutral abstraction and an eval suite.

Open weightsLLM APIAgentic codingEval pipelines
Cycle
8 weeks · fixed price
Stack
Kimi API or self-hosted
Output
Production code + eval suite
Handoff
Full source ownership
[THE SHORT VERSION]

An open-weight model that competes on agentic coding.

Kimi is a large Mixture-of-Experts model from Moonshot AI, released open-weight and strong at agentic coding and tool use. The open weights mean you can call the hosted API for speed, or serve the model on your own GPUs for residency and cost control. The engineering that matters is the same as for any model: prompt design, evals, retries, cost control, and an abstraction that keeps you vendor-neutral. We integrate directly, no LangChain in the path.

When it fits
  • Agentic coding and tool-use workloads on a tight budget
  • Teams that want open weights they can host for residency or control
  • High-volume work where closed frontier per-token pricing hurts
When it does not
  • Tasks that need a specific closed model's ecosystem or features
  • Workloads where a smaller model already clears your eval bar
[HOW WE BUILD IT]

How we build with Kimi.

01

Hosted or self-hosted, your call

We run Kimi against Moonshot's API for speed, or serve the open weights on your own GPUs for residency and cost control. The decision is data-driven, not dogmatic.

02

Direct API, thin abstraction

Kimi sits behind the same small provider interface as every other model. Swapping it in, out, or alongside Claude is a config change, not a rewrite.

03

Evals before you trust it

An eval set built from your real tasks gates the choice. We measure Kimi against the closed leaders on your inputs before routing real traffic to it.

04

Cost, latency, and fallback

Token budgets, caching, streaming, and a fallback path to a closed model when a task needs it. Observability on every call.

[WHAT YOU GET]

What the engagement leaves behind.

Open
Weights you can host yourself
1/10th
Output price vs Claude Opus
Eval-gated
Quality measured, not assumed
1 swap
Vendor change is config
[VERSIONS]

Pick the version that fits.

K2.7, K2, K1.5. We integrate Kimi directly behind a vendor-neutral abstraction, hosted or self-run, then route by task and cost. Swapping versions, or swapping Kimi for Claude, is a config change, not a rewrite. Eval-gated, either way.

LatestOpen weights
11 Jun 2026

Kimi K2.7

kimi-k2.7
Input
$0.6 / 1M tokens
Output
$2.5 / 1M tokens
  • Frontier agentic coding at roughly a tenth of closed-leader output price
  • Open weights you can self-host: 1T-param Mixture-of-Experts, ~32B active
  • The open base the market is taking seriously, see the Cursor Composer debate
Read the technical brief
PreviousOpen weights
Jul 2025

Kimi K2

kimi-k2
Input
$0.6 / 1M tokens
Output
$2.5 / 1M tokens
  • The release that put open-weight agentic coding on the map
  • Widely named as a candidate base for Cursor's Composer model
  • Still a strong, cheap option behind our vendor-neutral abstraction
Supported · brief not yet published
Previous
Jan 2025

Kimi K1.5

kimi-k1.5
Input
$0.3 / 1M tokens
Output
$1.5 / 1M tokens
  • Moonshot's earlier long-context reasoning model
  • Superseded by the K2 line for agentic and coding work
  • Kept here for teams comparing against an older baseline
Supported · brief not yet published
[METHODOLOGY · K-FRAMEWORK]

Integrated through the
K-Framework.

Every model we integrate runs through the same operating system. Three pillars, sixteen layers, one Compound Growth Loop. The methodology that keeps AI work from rotting after the first ship.

Read the K-Framework
01

Foundations

Direct API integration with the model. No LangChain, no orchestration vendor, no agent framework built on quicksand. Typed contracts, the same way we wire up Postgres.

02

Amplification

An eval suite built from your real tasks gates every prompt and model change. Quality is measured before it ships, not vibed in a demo.

03

Judgment

Governance, audit, and oversight wired in from day one. Who called what, with which prompt version, at what cost. Your auditors get answers, not screenshots.

[OBSERVABILITY]

Observability your team can read.

A model in production without observability is roulette. We instrument every integration so engineering and finance can see the same numbers, and so a regression at 3am surfaces before a customer opens a ticket.

Instrumented

Cost per call

Tokens in, tokens out, dollars spent. Sliced by feature, tenant, and route. Budgets enforced where it matters.

Instrumented

Latency p50 / p95 / p99

Real distributions, not averages. We know which routes are slow, and why.

Instrumented

Eval pass rates

The same eval suite that gates a release runs continuously in production. A regression on real traffic surfaces fast.

Instrumented

Prompt + completion logs

PII scrubbed at the proxy, shipped to your SIEM. Retention controls match your compliance window.

Dashboards your team owns, not ours. At handoff you get the queries, the alerts, and the runbook. We are not in the path to read your metrics.

[COMMON QUESTIONS]

Questions we get asked.

Is an open-weight model like Kimi production-ready?
Increasingly, yes, and the market agrees. Kimi competes with the closed leaders on agentic coding, and open-weight bases now sit underneath commercial coding products. Cursor's Composer model is widely believed to be a fine-tune of an open-weight base, with Kimi K2 among the names raised, though Cursor has not published which base it used. We still prove fit the same way for every model: an eval suite on your real tasks before any production traffic.
Hosted Kimi or self-hosted weights?
Both are on the table. The hosted Moonshot API is the fastest way to ship and is very cheap per token. Self-hosting the open weights buys data residency and cost control at high volume, at the price of running GPU inference. We size that trade against your workload rather than defaulting either way.
APPLIED K-FRAMEWORK

Bring the problem.
We’ll bring the build.

Senior engineers, eval suite at handoff, full source ownership. Sprint, program, or ongoing. We shape the engagement to the work.