Kensink Labs
LLM ObservabilityDirect LLM · no framework8-week engagement
OBSERVABILITY · COST TELEMETRY

Token telemetry, drift detection, cost caps.

Per-user, per-endpoint, per-prompt-version cost rollups. Drift alerts before users notice. Per-tenant caps with graceful degradation. What gets measured gets shipped — what doesn't gets a surprise invoice.

OpenTelemetryGrafanaPostgreSQLSentry
Cycle
8 weeks · instrumented from day one
Stack
OpenTelemetry · Grafana · Postgres
Output
Dashboards + alerts + cost caps
Discipline
Every prompt is a versioned, traced artifact
[WHY THIS EXISTS]

Most LLM bills are a mystery, not a forecast.

Teams discover their LLM spend at month-end, can't attribute it to a feature, can't see drift before users complain, and can't set a cap without engineering panic. Observability is the difference between a controllable line item and a runaway expense.

  • Token + cost telemetry per user, per endpoint, per prompt version
  • Drift detection — score distribution shifts trigger alerts
  • Per-tenant cost caps with graceful degradation, not 500s
  • Every prompt is versioned + traced — debug without re-running
[HOW WE BUILD IT]

Instrument at the proxy.

01

Inference proxy

Cloudflare Worker in front of every model call. Captures token counts, latency, cost, user identity, prompt version, model version, retries. Single source of truth.

02

OpenTelemetry traces

Every request emits a span tree to your existing APM (Datadog, Honeycomb, Grafana Tempo). Filter, group, alert on whatever you already do.

03

Cost dashboards

Per-user, per-endpoint, per-prompt-version rollups. Daily, weekly, monthly anomaly reports. Set a budget, get an alert before you blow through it.

04

Drift detection

Daily sample of production traffic re-runs through the eval suite. Score distribution shift triggers alert. Catch regressions before users open tickets.

[OUTCOMES AT HANDOFF]

What's live at week eight.

1 dashboard
Cost by user, endpoint, prompt version
Per-tenant
Budget caps with graceful degradation
Daily
Drift report against the golden eval
Replayable
Every prompt versioned, every call traced
DIRECT LLM · APPLIED K

Bring the problem.
We’ll bring the build.

Eight weeks, fixed price, eval suite at handoff. Direct LLM engineering on top of the K-Framework. Two Q3 slots remain.