LLM Middleware
internal/llm/middleware.go composes the full call stack that every LLM request flows through. The layers all share the paymaster.LLMCaller signature so they compose as plain function wrappers. The returned Provider preserves the original Name() for routing and fan-out.
Composition
The request context must carry a lookout.Scope (set by the HTTP handler chain). Without it, paymaster rejects the call because WorkspaceID is empty: calls without a workspace are not billable.
Layer order rationale
Layer order is deliberate and the comment in middleware.go is the source of truth. Getting it wrong produces subtle bugs that only surface under load:
1. Telemetry outermost
An SRE looking at a slow trace must see every contributor: budget check, guardrails, cache lookup, network hop. If telemetry sat inside paymaster, the trace would start at “provider call” and hide the time spent on enforcement, which is often where slowness lives.
2. Paymaster outside lookout
Load-bearing. A pre-call budget check must refuse an over-budget request before we’ve done any guardrail work; otherwise a workspace out of budget still pays in sanitization time. And the cost ledger row is written here, outside the guardrail layer, so sanitize latency is not counted toward “provider latency”.
3. Lookout inside paymaster
Load-bearing. Running Lookout INSIDE Paymaster means: if Lookout blocks the call, no cost_ledger row is written because next.Call is never invoked. A blocked call is not a billable call. If the order were reversed, Paymaster would record a ledger row for work that never reached the provider.
4. Caching (provider-side) below lookout
A future request-level cache layer would sit here. Anthropic and OpenAI prompt caching is handled wire-side today:
- Anthropic: internal/llm/anthropic.go ships anthropic-beta: prompt-caching-2024-07-31 by default and stamps cache_control: ephemeral on the system prompt and the last tool definition (tool schemas are usually large and stable across turns, which makes this the single highest-leverage breakpoint). Response usage parses cache_read_input_tokens + cache_creation_input_tokens.
- OpenAI: auto-activates for prompts ≥1024 tokens (Sept 2025). Response usage parses prompt_tokens_details.cached_tokens (no separate creation counter).
Both sets of counters flow through Response.CachedInputToks + .CacheCreationToks into paymaster.CallResponse and onto the cost_ledger row + the OTel llm.call span. See Paymaster and Tracing for the downstream details.
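As a rough illustration of the usage parsing described above, the following sketch normalizes both providers' cache counters into one struct. The JSON field names are the ones named in the text; CacheUsage, parseAnthropicUsage, and parseOpenAIUsage are hypothetical names chosen for this example, not the functions in internal/llm.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// CacheUsage is a hypothetical normalized view, loosely mirroring
// Response.CachedInputToks and Response.CacheCreationToks.
type CacheUsage struct {
	CachedInputToks   int
	CacheCreationToks int
}

// parseAnthropicUsage reads both cache counters from an Anthropic-style body.
func parseAnthropicUsage(body []byte) (CacheUsage, error) {
	var r struct {
		Usage struct {
			CacheRead     int `json:"cache_read_input_tokens"`
			CacheCreation int `json:"cache_creation_input_tokens"`
		} `json:"usage"`
	}
	if err := json.Unmarshal(body, &r); err != nil {
		return CacheUsage{}, err
	}
	return CacheUsage{
		CachedInputToks:   r.Usage.CacheRead,
		CacheCreationToks: r.Usage.CacheCreation,
	}, nil
}

// parseOpenAIUsage reads the single cached counter from an OpenAI-style body.
// There is no separate creation counter, so CacheCreationToks stays zero.
func parseOpenAIUsage(body []byte) (CacheUsage, error) {
	var r struct {
		Usage struct {
			Details struct {
				Cached int `json:"cached_tokens"`
			} `json:"prompt_tokens_details"`
		} `json:"usage"`
	}
	if err := json.Unmarshal(body, &r); err != nil {
		return CacheUsage{}, err
	}
	return CacheUsage{CachedInputToks: r.Usage.Details.Cached}, nil
}

func main() {
	a, _ := parseAnthropicUsage([]byte(
		`{"usage":{"input_tokens":12,"cache_read_input_tokens":2048,"cache_creation_input_tokens":512}}`))
	o, _ := parseOpenAIUsage([]byte(
		`{"usage":{"prompt_tokens":1500,"prompt_tokens_details":{"cached_tokens":1024}}}`))
	fmt.Printf("anthropic=%+v openai=%+v\n", a, o)
}
```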
5. Base provider innermost
The innermost providerCaller unpacks the opaque CallRequest.Inputs back into a typed llm.Request. It trusts that guardrails have already scanned the prompt and paymaster has green-lit the spend.
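The unpacking step amounts to a checked type assertion. A minimal sketch, assuming Inputs is an interface-typed field: Request and CallRequest here are simplified stand-ins for llm.Request and paymaster.CallRequest, and unpack is a hypothetical helper. The error text is the one this page quotes for providerCaller.

```go
package main

import "fmt"

// Request is a simplified stand-in for llm.Request.
type Request struct{ Prompt string }

// CallRequest is a simplified stand-in for paymaster.CallRequest,
// whose Inputs field is opaque to the outer layers.
type CallRequest struct{ Inputs any }

// unpack recovers the typed request or fails fast with the observed type.
func unpack(cr CallRequest) (Request, error) {
	req, ok := cr.Inputs.(Request)
	if !ok {
		return Request{}, fmt.Errorf("inputs not llm.Request (got %T)", cr.Inputs)
	}
	return req, nil
}

func main() {
	req, err := unpack(CallRequest{Inputs: Request{Prompt: "hi"}})
	fmt.Println(req.Prompt, err)

	_, err = unpack(CallRequest{Inputs: 42}) // wrong shape: a wiring bug
	fmt.Println(err)
}
```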
Write-path order beyond LLM calls
The same principle shows up elsewhere:
- Keeper: SecretStore -> Gatekeeper LLM -> Decision. Journal emit is outermost so even rejected requests are audited.
- Harbor Master: Enqueue -> (optional sync poll) -> Decide. Journal emit fires on each state transition.
- Hooks: Dispatch -> blocking handlers (sequential, stop on Block) -> non-blocking goroutines. The hook.fired entry lands regardless of outcome.
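The Hooks ordering in the list above might be modelled along these lines. Everything here is hypothetical (Decision, Handler, dispatch, and the journal slice are illustrative names), and the sketch assumes that non-blocking handlers are skipped once a blocking handler returns Block, which the text does not state explicitly.

```go
package main

import (
	"fmt"
	"sync"
)

// Decision is a hypothetical handler verdict.
type Decision int

const (
	Allow Decision = iota
	Block
)

// Handler is a hypothetical hook handler.
type Handler func(event string) Decision

// dispatch runs blocking handlers in order and stops at the first Block,
// then fans out the non-blocking handlers on goroutines.
func dispatch(event string, blocking, async []Handler, journal *[]string) Decision {
	// Deferred so the hook.fired entry lands regardless of outcome.
	defer func() { *journal = append(*journal, "hook.fired:"+event) }()

	for _, h := range blocking {
		if h(event) == Block {
			return Block // later handlers never run
		}
	}

	var wg sync.WaitGroup
	for _, h := range async {
		wg.Add(1)
		go func(h Handler) {
			defer wg.Done()
			h(event)
		}(h)
	}
	// Waiting keeps this example deterministic; a real dispatcher
	// would probably not block on fire-and-forget handlers.
	wg.Wait()
	return Allow
}

func main() {
	var journal []string
	allow := func(string) Decision { return Allow }
	block := func(string) Decision { return Block }

	d := dispatch("pre_tool", []Handler{allow, block, allow}, nil, &journal)
	fmt.Println(d == Block, journal)
}
```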
Stream() path
Provider.Stream() currently bypasses the full middleware. Reason: the paymaster ledger row depends on final token counts which arrive in the terminal message_delta event; wiring that through the sync CallResponse shape would need a streaming variant of CallResponse. That’s deferred.
Lookout is still wired synchronously before streaming starts — wrappedProvider.Stream() scans every user/tool message with lookout.ScanInput and returns *lookout.BlockedError if any fires. Without this pre-call guard, a caller that picks Stream over Complete would silently bypass every guardrail.
Streaming callers who also want paymaster accounting today fall back to the orchestrator-level accounting that predates this middleware (see internal/orchestrator).
Custom composition
You almost never want this (use llm.Middleware instead), but the layers are individually exported so tests can swap them.
Gotchas
- Context scope is required. Without lookout.WithScope in ctx, the paymaster rejects every call. This is by design (unscoped calls are unbillable) but produces a confusing error in tests, so always attach a scope.
- Type assertion failures fail fast. providerCaller returns "inputs not llm.Request (got %T)" if something upstream handed it the wrong shape. This is always a wiring bug.
- Stream bypass is documented but not gated. Nothing prevents an existing caller from picking Stream over Complete. Audit new call sites; if the caller doesn’t need streaming, prefer Complete so the full stack runs.
- Journal emitter is shared. Both paymaster and lookout emit through the same journal.Emitter instance. A nil emitter would no-op silently; the production path always sets it.
Related
- Paymaster — layer 2. Cost + budget.
- Lookout — layer 3. Guardrails.
- Tracing — layer 1. OTel spans.
- Architecture — where this stack fits.