args — JSON Schema validation of tool-call arguments before they reach the tool implementation.
output — structured-output parser that strips markdown fences, validates against schema, and produces a corrective re-prompt.
secrets — regex-based secrets redactor for outbound text.
All layers are purely in-process, with no network calls in the default build. An optional Lakera bridge is available for the injection layer but disabled unless explicitly wired.
When a scanner blocks something it emits guardrail.input_blocked or guardrail.output_blocked into the Crew Journal so the action is auditable. The matched secret value is NEVER persisted in the entry payload; only the finding kind and a stable redacted detail string are stored.
Phrases like “ignore previous instructions”, “you are now”, “disregard system prompt”.
system_prompt_leak: Attempts to exfiltrate the system prompt.
jailbreak: Known jailbreak trope patterns (DAN, hypothetical framings, translations-as-evasion).
zero_width_unicode: ZWJ/ZWNJ/ZWSP sequences.
rtl_override_unicode: RTL override codepoints (U+202E, U+2066-2069).
lakera_detected: External Lakera Guard verdict when enabled.
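The two unicode kinds lend themselves to a small illustration. The following is a minimal sketch, not the package's detector: the `finding` type and the `scanUnicode` function are hypothetical names, and the codepoint sets are taken from the descriptions above (ZWSP U+200B, ZWNJ U+200C, ZWJ U+200D; RTL override U+202E and U+2066 through U+2069).

```go
package main

import "fmt"

// finding is a hypothetical stand-in for the package's Finding type.
type finding struct {
	Kind     string
	Position int // byte offset where the run starts
	MatchEnd int // byte offset one past the end of the run
}

// classify returns the kind name for a suspicious codepoint, or "" otherwise.
func classify(r rune) string {
	switch {
	case r == '\u200B' || r == '\u200C' || r == '\u200D':
		return "zero_width_unicode"
	case r == '\u202E' || (r >= '\u2066' && r <= '\u2069'):
		return "rtl_override_unicode"
	}
	return ""
}

// scanUnicode reports each maximal run of same-kind suspicious codepoints.
func scanUnicode(s string) []finding {
	var out []finding
	curKind, start := "", 0
	for i, r := range s { // i is a byte offset, not a rune index
		k := classify(r)
		if k == curKind {
			continue
		}
		if curKind != "" {
			out = append(out, finding{Kind: curKind, Position: start, MatchEnd: i})
		}
		curKind, start = k, i
	}
	if curKind != "" {
		out = append(out, finding{Kind: curKind, Position: start, MatchEnd: len(s)})
	}
	return out
}

func main() {
	for _, f := range scanUnicode("ab\u200B\u200Dcd\u202Ee") {
		fmt.Printf("%s [%d,%d)\n", f.Kind, f.Position, f.MatchEnd)
	}
}
```

Reporting byte offsets rather than rune indexes keeps the findings directly usable for slicing the original string.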
The guard is wired into llm.Middleware so every Complete() call is scanned. The Stream() path runs through the same lookoutCaller chain (internal/llm/middleware.go), scanning every user/tool message synchronously before the first token flows.
The default is hard-block on any high/critical finding. Routines whose upstream produces text that occasionally trips the heuristic on benign content (translations, security write-ups, adversarial-prompt research) can opt into a softer mode via DSL:
Replaces matched byte ranges with [REDACTED] and lets the (defanged) text through. Suited to noisy upstreams where false positives would block legitimate traffic.
log: Passes the original text through unchanged; emits the journal entry only. Observability-only mode; useful for tuning the heuristic against a real workload.
The journal entry fires for ALL modes; log mode is observability, not “guard disabled”. The GuardListener integration hook (below) fires for all modes too.
Sanitize uses offset-based replacement (Finding.Position + Finding.MatchEnd) so long matches and synthetic unicode findings (zero-width, RTL override) are properly redacted. An earlier substring-based implementation silently let those through.
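Offset-based replacement can be sketched as follows. This is an illustration under stated assumptions, not the package's code: the `Finding` struct here carries only the two offset fields named above plus a kind, `sanitize` is a hypothetical function name, and merging overlapping ranges into one marker is a design choice made for the sketch.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Finding mirrors only the offset fields named above; the real type has more.
type Finding struct {
	Kind     string
	Position int // byte offset of the first matched byte
	MatchEnd int // byte offset one past the last matched byte
}

// sanitize replaces every [Position, MatchEnd) range with "[REDACTED]".
func sanitize(text string, findings []Finding) string {
	fs := append([]Finding(nil), findings...) // do not reorder the caller's slice
	sort.Slice(fs, func(i, j int) bool { return fs[i].Position < fs[j].Position })

	var b strings.Builder
	cursor := 0 // everything before cursor has already been emitted
	for _, f := range fs {
		start, end := f.Position, f.MatchEnd
		if start < 0 {
			start = 0
		}
		if end > len(text) {
			end = len(text)
		}
		if start < cursor {
			// Overlaps a range that is already redacted: extend it.
			if end > cursor {
				cursor = end
			}
			continue
		}
		if start >= end {
			continue // empty or inverted range
		}
		b.WriteString(text[cursor:start])
		b.WriteString("[REDACTED]")
		cursor = end
	}
	b.WriteString(text[cursor:])
	return b.String()
}

func main() {
	fmt.Println(sanitize("key=sk-12345 ok", []Finding{{Kind: "secret", Position: 4, MatchEnd: 12}}))
}
```

Working from offsets means a zero-width or RTL-override run is removed even though it has no printable substring to search for.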
Wire a callback to a notification target (Slack, PagerDuty, the hooks subsystem) so guardrail trips don’t just land in the journal:
ctx = lookout.WithGuardListener(ctx, func(ctx context.Context, direction string, f lookout.Finding) {
    // synchronous: keep cheap or dispatch async inside
    notify(direction, f)
})
Runs synchronously after the journal entry is written, regardless of the action policy. The pipeline runner (internal/pipeline/runner_llm.go) already wires this to hooks.Dispatch(EventOnGuardrailTriggered, ...) when both the DB and a journal emitter are available — see the on_guardrail_triggered hook event.
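Because the listener runs synchronously, a slow notification target would stall the guarded call. One possible way to "dispatch async inside" is a buffered queue drained by a worker goroutine. The sketch below assumes nothing about the lookout API: `event`, `asyncNotifier`, and its methods are hypothetical names, and dropping events when the buffer is full is a deliberate trade-off chosen for the example.

```go
package main

import (
	"fmt"
	"sync"
)

// event is a hypothetical payload; a real listener receives a direction
// and a lookout.Finding.
type event struct {
	Direction string
	Kind      string
}

// asyncNotifier hands events to a single background worker.
type asyncNotifier struct {
	ch chan event
	wg sync.WaitGroup
}

func newAsyncNotifier(buf int, deliver func(event)) *asyncNotifier {
	n := &asyncNotifier{ch: make(chan event, buf)}
	n.wg.Add(1)
	go func() {
		defer n.wg.Done()
		for ev := range n.ch {
			deliver(ev) // slow work happens off the request path
		}
	}()
	return n
}

// Notify never blocks. It reports false when the buffer is full and the
// event was dropped.
func (n *asyncNotifier) Notify(ev event) bool {
	select {
	case n.ch <- ev:
		return true
	default:
		return false
	}
}

// Close stops accepting events and waits for the worker to drain the queue.
func (n *asyncNotifier) Close() {
	close(n.ch)
	n.wg.Wait()
}

func main() {
	n := newAsyncNotifier(16, func(ev event) {
		fmt.Printf("notify: %s %s\n", ev.Direction, ev.Kind)
	})
	n.Notify(event{Direction: "input", Kind: "jailbreak"})
	n.Close()
}
```

Inside a guard listener, the callback body would then reduce to a single `Notify` call, which returns immediately.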
The default output policy is sanitize-and-pass. The secrets scanner runs over every outbound text and returns a redacted copy alongside findings:
guard := lookout.OutputGuard(j)
redacted, err := guard(ctx, llmOutput)
// redacted may differ from llmOutput; err is non-nil only on journal emit failure
Why sanitize instead of block? Losing an entire response because one sk-xxxx slipped through is too disruptive; redaction preserves the response while surfacing the finding in the journal. Callers that want hard-block semantics should re-scan the returned text and refuse downstream.
Note: the output guard is NOT wired into llm.Middleware. Scanning output there would mutate text while leaving the provider-reported token counts intact, which is a desync. Output scanning lives in the orchestrator streaming pipeline where text mutations are visible to the agent loop.
lookout.ValidateArgs(schemaJSON, args) runs draft-07 JSON Schema over a tool call’s arguments. Use it before dispatching:
if err := lookout.ValidateArgs(mySchema, toolArgs); err != nil {
    // Block the tool call; err describes which field didn't satisfy the schema.
}
ValidateArgs returns error (nil on pass). Unknown keys and type mismatches produce a non-nil error with a message pointing at the offending path. An empty schema always passes.
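The validate-then-dispatch pattern can be sketched without the real validator. In the example below, `requireKeys` is a toy stand-in that checks only a top-level `required` list; it is not a draft-07 implementation and is not how ValidateArgs works. The `validator` type, the `[]byte` parameters, and the `dispatch` helper are likewise assumptions made for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// validator has the contract described above: nil on pass, error on failure.
type validator func(schemaJSON, args []byte) error

// requireKeys is a toy check of the top-level "required" list only.
func requireKeys(schemaJSON, args []byte) error {
	if len(schemaJSON) == 0 {
		return nil // an empty schema passes
	}
	var schema struct {
		Required []string `json:"required"`
	}
	if err := json.Unmarshal(schemaJSON, &schema); err != nil {
		return fmt.Errorf("schema: %w", err)
	}
	var obj map[string]json.RawMessage
	if err := json.Unmarshal(args, &obj); err != nil {
		return fmt.Errorf("args: %w", err)
	}
	for _, k := range schema.Required {
		if _, ok := obj[k]; !ok {
			return fmt.Errorf("/%s: required property missing", k)
		}
	}
	return nil
}

// dispatch runs the tool only when validation passes.
func dispatch(v validator, schemaJSON, args []byte, tool func([]byte) (string, error)) (string, error) {
	if err := v(schemaJSON, args); err != nil {
		return "", fmt.Errorf("tool call blocked: %w", err)
	}
	return tool(args)
}

func main() {
	schema := []byte(`{"required":["path"]}`)
	tool := func(args []byte) (string, error) { return "ran", nil }

	_, err := dispatch(requireKeys, schema, []byte(`{}`), tool)
	fmt.Println(err)
}
```

Keeping validation in the dispatcher, rather than inside each tool, means a tool implementation never sees arguments that failed the check.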
Add a new Kind constant in types.go, a regex/detector function in the relevant layer file (injection.go, secrets.go, output.go), and register it in the scanner’s internal rule list. Tests in lookout_test.go use table-driven cases; add one for every new kind.
Guidelines:
Detectors must be pure (no network, no state). The one exception, Lakera, is gated behind an explicit WithLakeraAPIKey option.
Never put the raw matched value in the Finding.Matched field for a secret detector. Use a prefix or a kind string.
Severity should reflect operational risk, not “how confident the regex is”. A false-positive secret_anthropic at warn is fine; one at critical would flood oncall.
Output guard is not in the LLM middleware. If you add a new streaming consumer, wire OutputGuard yourself or secrets may flow through to clients unredacted.
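The guidelines above can be illustrated with a small detector and a table of cases. This is a sketch under assumptions: the kind name `secret_example`, the `ex_` token format, and the `detectExample` function are invented for the example, and the `Finding` struct shows only the fields mentioned in this document.

```go
package main

import (
	"fmt"
	"regexp"
)

// Finding shows only the fields mentioned in this document.
type Finding struct {
	Kind     string
	Matched  string // a prefix plus a mask, never the raw secret
	Position int
	MatchEnd int
}

// examplePattern matches a made-up token format: "ex_" plus 16 alphanumerics.
var examplePattern = regexp.MustCompile(`ex_[A-Za-z0-9]{16}`)

// detectExample is pure: no network, no shared state.
func detectExample(text string) []Finding {
	var out []Finding
	for _, loc := range examplePattern.FindAllStringIndex(text, -1) {
		out = append(out, Finding{
			Kind:     "secret_example",
			Matched:  text[loc[0]:loc[0]+3] + "***", // keep only the "ex_" prefix
			Position: loc[0],
			MatchEnd: loc[1],
		})
	}
	return out
}

func main() {
	// Table-driven cases in the style the test file is described as using.
	cases := []struct {
		name string
		in   string
		want int
	}{
		{"no secret", "hello world", 0},
		{"one secret", "k=ex_0123456789abcdef.", 1},
	}
	for _, c := range cases {
		got := detectExample(c.in)
		fmt.Printf("%s: got %d findings, want %d\n", c.name, len(got), c.want)
	}
}
```

Storing offsets alongside the masked prefix leaves enough information to redact the span without the journal ever holding the secret itself.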
More edge cases
Scope is required. emitGuardEntry silently no-ops if lookout.ScopeFromContext(ctx) returns zero. Always wrap your request context with lookout.WithScope(ctx, scope) before invoking the guards; the HTTP handler chain does this for you.
A sanitize verdict is not a block. The returned text is the safe version; err is non-nil only if the journal emit failed.