Inference proxy
Cloudflare Worker in front of every model call. Captures token counts, latency, cost, user identity, prompt version, model version, retries. Single source of truth.
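A minimal sketch of the per-call record such a proxy might capture, assuming the model call can be wrapped as an async function. The field names, the `meteredCall` and `buildRecord` helpers, and the per-1,000-token prices are illustrative assumptions, not the product's actual schema or pricing.

```typescript
// Hypothetical record shape; field names are assumptions for illustration.
interface CallRecord {
  userId: string;
  promptVersion: string;
  modelVersion: string;
  inputTokens: number;
  outputTokens: number;
  latencyMs: number;
  costUsd: number;
  retries: number;
}

interface ModelResult {
  text: string;
  inputTokens: number;
  outputTokens: number;
}

interface CallContext {
  userId: string;
  promptVersion: string;
  modelVersion: string;
  // Assumed prices in USD per 1,000 tokens; real prices vary by provider and model.
  inputPricePer1k: number;
  outputPricePer1k: number;
}

// Turn one finished call into a record, deriving cost from token counts.
function buildRecord(
  ctx: CallContext,
  result: ModelResult,
  latencyMs: number,
  retries: number,
): CallRecord {
  const costUsd =
    (result.inputTokens / 1000) * ctx.inputPricePer1k +
    (result.outputTokens / 1000) * ctx.outputPricePer1k;
  return {
    userId: ctx.userId,
    promptVersion: ctx.promptVersion,
    modelVersion: ctx.modelVersion,
    inputTokens: result.inputTokens,
    outputTokens: result.outputTokens,
    latencyMs,
    costUsd,
    retries,
  };
}

// Wrap a model call: time it, retry on failure, and return the record with the result.
async function meteredCall(
  ctx: CallContext,
  callModel: () => Promise<ModelResult>,
  maxRetries: number = 2,
  now: () => number = Date.now,
): Promise<{ result: ModelResult; record: CallRecord }> {
  const start = now();
  let retries = 0;
  for (;;) {
    try {
      const result = await callModel();
      return { result, record: buildRecord(ctx, result, now() - start, retries) };
    } catch (err) {
      // Give up once the retry budget is spent; otherwise count the retry and loop.
      if (retries >= maxRetries) throw err;
      retries += 1;
    }
  }
}
```

In a Worker, a wrapper like this would presumably be called from the fetch handler, with the record shipped to storage after the response is returned.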
Per-user, per-endpoint, per-prompt-version cost rollups. Drift alerts before users notice. Per-tenant caps with graceful degradation. What gets measured gets shipped; what doesn't gets a surprise invoice.
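One way a per-tenant cap with graceful degradation could work, sketched under assumptions: below a soft limit the tenant gets the preferred model, between the soft limit and the cap it is routed to a cheaper fallback, and at the cap requests are rejected. The 80% soft limit, the `decideForTenant` name, and the model labels are hypothetical.

```typescript
type CapDecision =
  | { action: "allow"; model: string }
  | { action: "degrade"; model: string }
  | { action: "reject"; reason: string };

interface TenantBudget {
  capUsd: number;   // hard cap for the billing period
  spentUsd: number; // spend recorded so far in the period
}

// Decide how to serve a request given the tenant's spend.
// softLimitRatio is an assumed default, not a documented product setting.
function decideForTenant(
  budget: TenantBudget,
  preferredModel: string,
  fallbackModel: string,
  softLimitRatio: number = 0.8,
): CapDecision {
  if (budget.spentUsd >= budget.capUsd) {
    return { action: "reject", reason: "tenant cap reached" };
  }
  if (budget.spentUsd >= budget.capUsd * softLimitRatio) {
    // Past the soft limit: keep serving, but on the cheaper model.
    return { action: "degrade", model: fallbackModel };
  }
  return { action: "allow", model: preferredModel };
}
```

The point of the middle tier is that a tenant approaching its cap sees lower quality or slower answers before it sees errors.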
Teams discover their LLM spend at month-end, can't attribute it to a feature, can't see drift before users complain, and can't set a cap without engineering panic. Observability is the difference between a controllable line item and a runaway expense.
Every request emits a span tree to your existing APM (Datadog, Honeycomb, Grafana Tempo). Filter, group, and alert with the tools you already use.
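To make "span tree" concrete, here is a small sketch that assembles flat spans into a tree by parent ID. The `Span` shape and the span names are assumptions for illustration; a real integration would use the APM vendor's or OpenTelemetry's own span types and exporters.

```typescript
// Hypothetical flat span as a proxy might record it.
interface Span {
  spanId: string;
  parentId: string | null; // null marks the root span of a request
  name: string;
  startMs: number;
  endMs: number;
}

interface SpanNode extends Span {
  children: SpanNode[];
}

// Link each span to its parent; spans with no known parent become roots.
function buildSpanTree(spans: Span[]): SpanNode[] {
  const nodes: Record<string, SpanNode> = {};
  for (const s of spans) {
    nodes[s.spanId] = { ...s, children: [] };
  }
  const roots: SpanNode[] = [];
  for (const s of spans) {
    const node = nodes[s.spanId];
    const parent = s.parentId === null ? undefined : nodes[s.parentId];
    if (parent) {
      parent.children.push(node);
    } else {
      roots.push(node);
    }
  }
  return roots;
}

// Example: one request whose first upstream attempt failed and was retried.
const exampleSpans: Span[] = [
  { spanId: "a", parentId: null, name: "proxy.request", startMs: 0, endMs: 900 },
  { spanId: "b", parentId: "a", name: "upstream.attempt", startMs: 5, endMs: 300 },
  { spanId: "c", parentId: "a", name: "upstream.attempt", startMs: 310, endMs: 890 },
];
const exampleTree = buildSpanTree(exampleSpans);
```

Here `exampleTree` holds a single root with two child spans, which is how a retry would show up as sibling attempts under one request.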
Per-user, per-endpoint, per-prompt-version rollups. Daily, weekly, monthly anomaly reports. Set a budget, get an alert before you blow through it.
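A rough sketch of how rollups and a budget alert might be computed from per-call records. The `CostRecord` shape and function names are hypothetical, and the linear month-end projection is an assumed, deliberately simple alerting rule rather than the product's documented method.

```typescript
// Hypothetical per-call cost record; field names are assumptions.
interface CostRecord {
  userId: string;
  endpoint: string;
  promptVersion: string;
  costUsd: number;
}

// Sum cost per group; the key function picks the rollup dimension.
function rollupCost(
  records: CostRecord[],
  keyOf: (r: CostRecord) => string,
): Record<string, number> {
  const totals: Record<string, number> = {};
  for (const r of records) {
    const key = keyOf(r);
    totals[key] = (totals[key] ?? 0) + r.costUsd;
  }
  return totals;
}

// Project month-end spend by assuming the daily average so far continues.
function projectedMonthEnd(spentUsd: number, dayOfMonth: number, daysInMonth: number): number {
  return (spentUsd / dayOfMonth) * daysInMonth;
}

// Alert when the projection exceeds the budget, before the budget is actually spent.
function shouldAlert(
  spentUsd: number,
  budgetUsd: number,
  dayOfMonth: number,
  daysInMonth: number,
): boolean {
  return projectedMonthEnd(spentUsd, dayOfMonth, daysInMonth) > budgetUsd;
}
```

For example, `rollupCost(records, (r) => r.promptVersion)` would give cost per prompt version, and $600 spent by day 10 of a 30-day month projects to $1,800, which would trip an alert on a $1,500 budget.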
A daily sample of production traffic is re-run through the eval suite. A shift in the score distribution triggers an alert. Catch regressions before users open tickets.
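As an illustration of detecting a score distribution shift, the sketch below compares the mean of today's eval scores with a baseline, measured in standard errors. This z-score test and the threshold of 3 are assumptions chosen for simplicity; the text does not say which statistic is used, and a production check might compare full distributions instead.

```typescript
function mean(xs: number[]): number {
  return xs.reduce((a, b) => a + b, 0) / xs.length;
}

// Sample standard deviation (n - 1 denominator); needs at least two values.
function sampleStdDev(xs: number[]): number {
  const m = mean(xs);
  const variance = xs.reduce((acc, x) => acc + (x - m) ** 2, 0) / (xs.length - 1);
  return Math.sqrt(variance);
}

// How many standard errors today's mean sits from the baseline mean.
function driftZScore(baseline: number[], today: number[]): number {
  const diff = mean(today) - mean(baseline);
  const stdErr = sampleStdDev(baseline) / Math.sqrt(today.length);
  if (stdErr === 0) {
    // Baseline has no spread: any difference at all counts as an infinite shift.
    return diff === 0 ? 0 : Math.sign(diff) * Infinity;
  }
  return diff / stdErr;
}

// threshold = 3 is an assumed default, not a documented product setting.
function hasDrifted(baseline: number[], today: number[], threshold: number = 3): boolean {
  return Math.abs(driftZScore(baseline, today)) > threshold;
}
```

With a baseline of scores around 0.85, a day whose sample averages 0.55 would be flagged, while a day that matches the baseline would not.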