Kensink Labs
★ Direct LLM Lab | Lab open: 2 slots Q3 | 8 services · one team | No framework lock-in
DIRECT LLM ENGINEERING · EST. 2024

Direct LLM. No frameworks on quicksand.

Senior engineers integrating against the model API the same way we integrate against Postgres. No LangChain, no LlamaIndex, no agent framework that needs a migration every six months. Eight weeks from problem to live, eval suite included, full source ownership at handoff.

Cycle
8 weeks · problem to live
Stack
Direct API · no LangChain, no LlamaIndex
Output
Code + eval suite + runbook
Framework
Applied K-Framework on every build
[THREE PRINCIPLES · ONE STUDIO]

How direct LLM work
actually stays shipped.

These three rules are why our LLM builds survive their first month in production. Skip any one and you’ve shipped AI slop on a timer.

01

Direct API integration

We call the model the same way we call Postgres. No wrapper SDK, no agent framework, no graph DSL. If you can read the OpenAI cookbook, you can read our code.

  • OpenAI, Anthropic, Google, or local — your model choice
  • TypeScript types from the API spec, not a third-party abstraction
  • Retry, fallback, timeout logic written by us, owned by you
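A minimal sketch of what hand-written retry, fallback, and timeout logic could look like. The names (`ModelCall`, `withTimeout`, `callWithFallback`) and the backoff policy are illustrative assumptions, not taken from a vendor SDK or from our production code.

```typescript
// Hypothetical helper names; nothing here comes from a vendor SDK.
type ModelCall<T> = (signal: AbortSignal) => Promise<T>;

interface RetryOptions {
  attempts: number;    // tries per provider before falling back
  timeoutMs: number;   // per-attempt timeout
  baseDelayMs: number; // first backoff delay, doubled after each failure
}

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Run one attempt; abort it and reject if it exceeds timeoutMs.
function withTimeout<T>(call: ModelCall<T>, timeoutMs: number): Promise<T> {
  const controller = new AbortController();
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => {
      controller.abort();
      reject(new Error(`timed out after ${timeoutMs} ms`));
    }, timeoutMs);
    call(controller.signal)
      .then(resolve, reject)
      .finally(() => clearTimeout(timer));
  });
}

// Try providers in order; retry each with exponential backoff before falling back.
async function callWithFallback<T>(
  providers: ModelCall<T>[],
  opts: RetryOptions,
): Promise<T> {
  let lastError: unknown = new Error("no providers configured");
  for (const provider of providers) {
    for (let attempt = 0; attempt < opts.attempts; attempt++) {
      try {
        return await withTimeout(provider, opts.timeoutMs);
      } catch (err) {
        lastError = err;
        if (attempt < opts.attempts - 1) {
          await sleep(opts.baseDelayMs * 2 ** attempt);
        }
      }
    }
  }
  throw lastError;
}
```

Each provider would typically be a thin function that passes `signal` to `fetch`, so an aborted attempt also cancels the HTTP request. A production version would probably retry only on retryable failures (rate limits, 5xx) rather than on every error.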
02

Evals before features

We write a golden eval set before we ship a prompt. Every regression closes the gate. Production traffic feeds the next eval cycle — that's the Compound Growth Loop.

  • 10–100 golden examples per task, version-controlled with the prompt
  • Hard assertions on must-pass cases, soft scoring on quality
  • Drift detection in production — alerts before users see it
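To make the gate concrete, here is a small sketch of a golden-set runner with hard assertions and soft scoring. The `GoldenCase` shape, the `runEval` name, and the soft-score threshold are assumptions for illustration. In practice the `score` function might wrap an LLM-as-judge call.

```typescript
type Model = (input: string) => Promise<string>;

interface GoldenCase {
  id: string;
  input: string;
  assert?: (output: string) => boolean; // hard: must hold or the gate closes
  score?: (output: string) => number;   // soft: quality in [0, 1]
}

interface EvalReport {
  gateOpen: boolean;
  hardFailures: string[]; // ids of failed must-pass cases
  softMean: number;       // mean soft score; 1 when no soft cases exist
}

async function runEval(
  cases: GoldenCase[],
  model: Model,
  minSoftMean: number,
): Promise<EvalReport> {
  const hardFailures: string[] = [];
  let softTotal = 0;
  let softCount = 0;

  for (const c of cases) {
    const output = await model(c.input);
    if (c.assert && !c.assert(output)) hardFailures.push(c.id);
    if (c.score) {
      // Clamp so a misbehaving scorer cannot push the mean out of range.
      softTotal += Math.min(1, Math.max(0, c.score(output)));
      softCount++;
    }
  }

  const softMean = softCount === 0 ? 1 : softTotal / softCount;
  return {
    gateOpen: hardFailures.length === 0 && softMean >= minSoftMean,
    hardFailures,
    softMean,
  };
}
```

A single hard failure closes the gate no matter how high the soft mean is, which is the behaviour the must-pass cases are meant to guarantee.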
03

Boring infrastructure

Postgres, pgvector, Cloudflare Workers. Tools your team already runs, your ops team already monitors, your CTO already approved. No new vendors to onboard.

  • Postgres + pgvector for retrieval — no separate vector DB
  • Cloudflare Workers for inference proxy + cost guardrails
  • Sentry + OpenTelemetry for traces — your existing observability
APPLIED FRAMEWORK

Every LLM build runs on the K-Framework.

Three pillars, sixteen layers, one feedback loop. The discipline that separates a system that survives production from a demo that dies in week three. Foundations · Amplification · Judgment — applied to every prompt, every retrieval, every eval gate.

Read the K-Framework
[EIGHT SERVICES · ONE STUDIO]

Pick the LLM problem.
We’ll bring the build.

Each service below is a focused eight-week sprint — fixed scope, eval suite at handoff. Bundle two or three if the problem warrants, or sequence them as a multi-phase program for regulated builds.

SERVICE · 01 / 08
Enterprise LLM
Security + governance

Production LLM that passes legal, security, and procurement on the same go-live date.

  • SSO + RBAC + audit trails baked in
  • Vendor-neutral abstraction — swap models without rewriting
  • Data-residency + PII policy enforced at the proxy
TypeScript · Postgres · OpenAI · Anthropic
See the engagement
SERVICE · 02 / 08
On-premise / private LLM
Self-hosted inference

Run your own weights in your own VPC. Latency, cost, and compliance — under your control.

  • vLLM + Triton for production-grade throughput
  • GPU sizing + autoscaling that doesn't melt your finance team
  • Air-gapped deployments where required
Python · vLLM · Triton · Llama
See the engagement
SERVICE · 03 / 08
Model evaluation
Eval-first development

A golden eval suite before a single prompt ships. Regressions are stopped at the gate, not discovered by the user.

  • Golden sets + hard assertions + soft LLM-as-judge
  • Drift detection on production traffic
  • A/B prompt tests with statistical significance gates
TypeScript · Promptfoo · OpenTelemetry · PostgreSQL
See the engagement
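One way a statistical significance gate for A/B prompt tests could be expressed is a pooled two-proportion z-test on eval pass rates. This is a sketch under assumptions: the `Arm` shape and function names are hypothetical, both arms need at least one trial, and the one-sided 5% threshold is a common default rather than a documented setting.

```typescript
interface Arm {
  passes: number; // eval cases passed
  trials: number; // eval cases run (assumed > 0)
}

// z statistic for the difference in pass rate (candidate minus baseline),
// using the pooled two-proportion z-test.
function twoProportionZ(baseline: Arm, candidate: Arm): number {
  const pA = baseline.passes / baseline.trials;
  const pB = candidate.passes / candidate.trials;
  const pooled =
    (baseline.passes + candidate.passes) / (baseline.trials + candidate.trials);
  const se = Math.sqrt(
    pooled * (1 - pooled) * (1 / baseline.trials + 1 / candidate.trials),
  );
  // se is 0 when every case passed or every case failed in both arms.
  return se === 0 ? 0 : (pB - pA) / se;
}

// One-sided gate: promote the candidate prompt only if it is significantly better.
// 1.645 is the standard normal critical value for alpha = 0.05, one-sided.
function passesSignificanceGate(
  baseline: Arm,
  candidate: Arm,
  zCritical = 1.645,
): boolean {
  return twoProportionZ(baseline, candidate) > zCritical;
}
```

For example, 80/100 against 92/100 gives z of roughly 2.45 and clears the gate, while 80/100 against 83/100 gives roughly 0.55 and does not.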
SERVICE · 04 / 08
Feedback training & fine-tuning
When RAG isn't enough

LoRA, DPO, or full fine-tune. We pick based on data volume, not vendor pitch.

  • RAG vs fine-tuning audit before any training spend
  • Feedback capture pipeline → labeled dataset → eval gate
  • LoRA adapters you can hot-swap per customer
Python · PyTorch · LoRA · HuggingFace
See the engagement
SERVICE · 05 / 08
RAG architecture
Retrieval done right

Hybrid retrieval on Postgres + pgvector. No separate vector DB, no five-system synchronisation problem.

  • pgvector + BM25 hybrid: recall and precision both
  • Citation-first answers — every claim links to its chunk
  • Chunking strategy tuned to your corpus, not someone's blog post
PostgreSQL · pgvector · TypeScript · OpenAI
See the engagement
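Hybrid retrieval needs a rule for merging the dense and keyword result lists. The page does not say which rule is used, so the sketch below shows reciprocal rank fusion (RRF), one common option. It assumes each search has already returned an ordered list of chunk ids. The function name and the constant `k = 60` are illustrative choices.

```typescript
// Reciprocal rank fusion: each list contributes 1 / (k + rank) per item,
// where rank is 1-based. Items ranked well in several lists rise to the top.
function reciprocalRankFusion(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + index + 1));
    });
  }
  return Array.from(scores.entries())
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}

// Hypothetical chunk ids from a dense (vector) search and a keyword search.
const denseHits = ["a", "b", "c"];
const keywordHits = ["b", "d", "a"];
const fused = reciprocalRankFusion([denseHits, keywordHits]);
// fused is ["b", "a", "d", "c"]: "b" and "a" appear in both lists.
```

RRF uses ranks only, so the two searches do not need comparable score scales. That is the main reason to prefer it over a weighted sum of raw scores.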
SERVICE · 06 / 08
Production agents
Tool-using LLMs

Function-calling agents with hard guardrails. Cross-references /ai-agents: same lab, deeper engineering view.

  • Schema-validated tool calls — no JSON parse roulette
  • Per-tool rate limits + cost guardrails
  • Observable agent traces — every loop, every retry, every cost
TypeScript · Anthropic · Zod · OpenTelemetry
See the engagement
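A dependency-free sketch of schema-validated tool calls with a per-tool call limit. All names (`defineTool`, `ToolDispatcher`, `get_weather`) are hypothetical. The hand-written parser stands in for a schema library: with Zod, the `parse` argument would typically wrap a schema's `safeParse`.

```typescript
type ToolResult<T> = { ok: true; value: T } | { ok: false; error: string };

interface RegisteredTool {
  maxCallsPerRun: number;
  invoke: (raw: unknown) => ToolResult<string>;
}

// Pair a validator with a handler so the handler only ever sees checked arguments.
function defineTool<A>(
  parse: (raw: unknown) => ToolResult<A>,
  run: (args: A) => string,
  maxCallsPerRun: number,
): RegisteredTool {
  return {
    maxCallsPerRun,
    invoke: (raw) => {
      const parsed = parse(raw);
      return parsed.ok ? { ok: true, value: run(parsed.value) } : parsed;
    },
  };
}

class ToolDispatcher {
  private calls = new Map<string, number>();
  private tools: Map<string, RegisteredTool>;

  constructor(tools: Map<string, RegisteredTool>) {
    this.tools = tools;
  }

  dispatch(name: string, rawJson: string): ToolResult<string> {
    const tool = this.tools.get(name);
    if (!tool) return { ok: false, error: `unknown tool: ${name}` };

    const used = this.calls.get(name) ?? 0;
    if (used >= tool.maxCallsPerRun) {
      return { ok: false, error: `call limit reached for ${name}` };
    }
    // Count the attempt even if it fails validation, so a looping agent
    // cannot spam malformed calls.
    this.calls.set(name, used + 1);

    let raw: unknown;
    try {
      raw = JSON.parse(rawJson);
    } catch {
      return { ok: false, error: "arguments are not valid JSON" };
    }
    return tool.invoke(raw);
  }
}

// Example: a hypothetical get_weather tool that accepts { city: string }.
const getWeather = defineTool(
  (raw): ToolResult<{ city: string }> => {
    const city = (raw as { city?: unknown } | null)?.city;
    return typeof city === "string"
      ? { ok: true, value: { city } }
      : { ok: false, error: "expected { city: string }" };
  },
  (args) => `forecast for ${args.city}`,
  3,
);
const dispatcher = new ToolDispatcher(new Map([["get_weather", getWeather]]));
```

Errors come back as values rather than exceptions, so the agent loop can feed the message to the model as a tool result and let it correct the call.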
SERVICE · 07 / 08
Observability & cost
Telemetry from day one

Token telemetry, drift detection, cost-per-conversation dashboards. What gets measured gets shipped.

  • Token + cost telemetry per user, per endpoint, per prompt version
  • Drift alerts before the user notices
  • Per-tenant cost caps with graceful degradation
OpenTelemetry · Grafana · PostgreSQL · Sentry
See the engagement
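As an illustration of per-tenant caps with graceful degradation, the sketch below tracks spend from token counts and returns a routing decision. The `TenantBudget` class, the 80% degrade threshold, and the price fields are assumptions. Real prices come from each provider's current price list, and "degrade" here simply means "route to something cheaper".

```typescript
interface Price {
  inputPerMTok: number;  // USD per million input tokens (placeholder value)
  outputPerMTok: number; // USD per million output tokens (placeholder value)
}

type Decision = "allow" | "degrade" | "block";

class TenantBudget {
  private spentUsd = new Map<string, number>();
  private capUsd: number;
  private degradeFraction: number;

  constructor(capUsd: number, degradeFraction = 0.8) {
    this.capUsd = capUsd;
    this.degradeFraction = degradeFraction;
  }

  // Add the cost of one completed call to the tenant's running total.
  record(tenant: string, inputTokens: number, outputTokens: number, price: Price): number {
    const cost =
      (inputTokens * price.inputPerMTok + outputTokens * price.outputPerMTok) / 1_000_000;
    const total = (this.spentUsd.get(tenant) ?? 0) + cost;
    this.spentUsd.set(tenant, total);
    return total;
  }

  // Decide how to treat the tenant's next request.
  decide(tenant: string): Decision {
    const spent = this.spentUsd.get(tenant) ?? 0;
    if (spent >= this.capUsd) return "block";
    if (spent >= this.capUsd * this.degradeFraction) return "degrade";
    return "allow";
  }
}
```

The in-memory map is only for illustration. A real proxy would persist the totals and reset them per billing window.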
SERVICE · 08 / 08
Structured output
Deterministic pipelines

JSON schema enforcement, validator loops, repair prompts. LLM as a structured component, not a chatbot.

  • Zod schemas mirror the API contract
  • Validator loop with bounded retries + repair prompts
  • Type-safe end-to-end — model output to client
TypeScript · Zod · OpenAI · Anthropic
See the engagement
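A sketch of a validator loop with bounded retries and repair prompts. The function name, the wording of the repair prompt, and the `Validation` type are assumptions for illustration. The `validate` argument is where a schema check (a Zod schema, for instance) would plug in.

```typescript
type Validation<T> = { ok: true; value: T } | { ok: false; error: string };

// Parse the model's text as JSON, then run the schema check.
function parseAndValidate<T>(
  text: string,
  validate: (raw: unknown) => Validation<T>,
): Validation<T> {
  let raw: unknown;
  try {
    raw = JSON.parse(text);
  } catch {
    return { ok: false, error: "output was not valid JSON" };
  }
  return validate(raw);
}

// Ask the model, validate, and on failure re-ask with the error attached.
// Makes at most maxRepairs + 1 model calls, then throws.
async function generateValidated<T>(
  generate: (prompt: string) => Promise<string>,
  validate: (raw: unknown) => Validation<T>,
  prompt: string,
  maxRepairs: number,
): Promise<T> {
  let currentPrompt = prompt;
  let lastError = "";
  for (let attempt = 0; attempt <= maxRepairs; attempt++) {
    const text = await generate(currentPrompt);
    const result = parseAndValidate(text, validate);
    if (result.ok) return result.value;
    lastError = result.error;
    // Repair prompt: original task plus the concrete validation error.
    currentPrompt =
      `${prompt}\n\nYour previous reply was rejected: ${lastError}\n` +
      "Reply with corrected JSON only.";
  }
  throw new Error(
    `validation failed after ${maxRepairs + 1} attempts: ${lastError}`,
  );
}
```

The bound matters: without it, a prompt the model cannot satisfy becomes an unbounded cost loop. Throwing after the last attempt lets the caller decide whether to fall back or surface the failure.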
[WHAT WE HAND OVER]

Six artifacts. All yours at week eight.

A1

Eval harness

TypeScript test runner with golden sets, hard assertions, and LLM-as-judge soft scoring. Runs locally, in CI, and in production against live traffic.

A2

Inference proxy

Cloudflare Worker in front of every model call. Vendor abstraction, retry/fallback, cost caps, PII redaction, and OpenTelemetry traces.

A3

Retrieval layer

Postgres + pgvector with hybrid BM25 + dense search. Chunking pipeline tuned to your corpus. Citation surface in every answer.

A4

Cost dashboard

Per-user, per-endpoint, per-prompt-version token + cost rollups. Drift alerts, daily anomaly reports, per-tenant caps.

A5

Prompt registry

Versioned prompts with eval results attached. Deploy a prompt the way you deploy a service — review, test, ship, rollback.

A6

Printed runbook

Twenty pages, paper-printed at handoff. How to debug, what to monitor, when to call us. Your team owns operations from day one.

[EIGHT-WEEK PROCESS]

The process, not a pitch deck.

Same five-step cadence on every engagement. Aligned to the K-Framework loop — Build, Measure, Reflect, Improve.

  1. 01 · Week 1
    Discovery

    Find the real problem.

    Two-day workshop. We map the use case to the K-Framework, write the golden eval set with you, and decide direct-API vs RAG vs fine-tune. Output: a one-page engagement contract.

  2. 02 · Weeks 2–3
    Build (Foundations)

    Stand up the spine.

    Inference proxy, retrieval layer if needed, eval harness, observability. The boring infra goes first so the interesting work has a place to land.

  3. 03 · Weeks 4–5
    Build (Amplification)

    Iterate on the prompt + retrieval.

    Daily eval runs against the golden set. Hard assertions close the gate. We tune chunking, prompt structure, model choice — measured, not guessed.

  4. 04 · Weeks 6–7
    Build (Judgment)

    Guardrails + drift detection.

    Cost caps, PII redaction, schema validation, rate limits per tool. Drift-detection pipeline against production traffic. Runbook drafted alongside the engineering.

  5. 05 · Week 8
    Ship

    Handoff, not abandonment.

    Code review with your team. Printed runbook. Eval suite walkthrough. 90 days of warranty support. You own everything from handoff, including the right to extend it without us.

[NUMBERS · NOT ADJECTIVES]

Lead with a number.
The rest is noise.

8 wk
From problem to production
0
LLM frameworks in our stack
99.7%
Best eval pass rate shipped (Affidavit Mapp)
+18 pt
Activation lift on AICoach onboarding
16
Named K-Framework layers
100%
Source ownership at handoff
DIRECT LLM · APPLIED K

Bring the real LLM problem.
We’ll bring the build.

Eight weeks, fixed price, eval suite at handoff. Pick one of the eight engagements or bring a problem and we’ll scope it against the K. Two Q3 slots remain.