Routines - Crewship

What is a Routine?

A Routine is a declarative recipe for repeatable AI work. You author it once (or an agent authors it for you when it spots a repetitive pattern), it runs the same way every time, and it lives in your workspace as a reusable asset alongside crews, skills, and credentials. Each routine bundles:

A JSON DSL definition — inputs, outputs, ordered or DAG-structured steps, validation gates, declared egress, credential requirements
Authorship metadata — which crew, which agent (or user), via which path
Triggers — cron schedules and webhook tokens that fire it autonomously
Run history — every invocation as an immutable journal trace with step-level events
Versions — every save creates a new version; rollback is one API call

Compared to other layers in Crewship:

Layer	Scope	Authoring	Use case
Routine	atomic	AI-authored OR human DSL	One repeatable AI workflow
Recipe (future)	composite	Marketplace template	Crew + agents + integrations + routines bundled
Cyclic Issue (future)	issue-tracker	recurring user issue	”Standup every Monday” without leaving Issues

The user-facing label is Routine across web UI, CLI, agent system prompts and marketing surfaces. The backend identifier (database table, Go package, HTTP route paths) remains pipelines for backwards compatibility.

Naval theme. Routines are the boring, accurate, repeated procedures a ship’s crew runs every day — the equivalent of naval drills in the Crewship metaphor. Programs your AI agents follow.

Three authoring paths

Crewship is unusual among workflow systems because agents author routines, not just execute them. Three paths converge on the same pipelines table:

Agent

The most common path. An agent that’s solved a repetitive problem twice posts to http://localhost:9119/pipelines/save from inside its container; the sidecar injects authorship and forwards to the main API. Next time [AVAILABLE ROUTINES] block in the system prompt advertises it to other crews.

UI

Open /routines, click + New routine, pick a starter template, edit the DSL JSON, click Test & Save. Test_run runs against the execution tier; on pass the routine is persisted with authored_via=user_api and the JWT user as author.

CLI

crewship routine save --name "..." --definition file.json --author-crew <crew_id>. Same test_run gate as UI. CI-friendly: validate offline first with crewship routine validate file.json, then save.

DSL anatomy

Minimal valid routine:

{
  "dsl_version": "1.0",
  "name": "summarize-text",
  "description": "Summarize input text in 3 bullets.",
  "inputs": [
    { "name": "text", "type": "string", "required": true }
  ],
  "outputs": [
    { "name": "summary", "type": "string" }
  ],
  "steps": [
    {
      "id": "summarize",
      "type": "agent_run",
      "agent_slug": "tomas",
      "complexity": "fast",
      "prompt": "Summarize: {{ inputs.text }}"
    }
  ]
}

Top-level fields

Field	Type	Notes
`dsl_version`	string	Always `"1.0"` for now. Forward-compat field.
`name`	string	Slug-friendly identifier. Workspace-unique.
`display_name`	string	Optional pretty label.
`description`	string	One-line summary; shown in lists + `[AVAILABLE ROUTINES]`.
`inputs[]`	array of InputSpec	Declared parameters.
`outputs[]`	array of OutputSpec	Read from the final step’s output by name. Documentary in MVP.
`steps[]`	array of Step	Sequential by default; DAG with `needs:`.
`execution_tier`	object	`{preferred, fallback}` overriding workspace default.
`estimated_cost_usd`	number	Author estimate; UI surfaces overrun warnings.
`egress_targets`	array of string	Declared outbound hosts. Enforced for `http` step.
`credentials_required`	array of CredReq	Declared credential needs (`type` + `scope`).
`max_cost_usd`	number	Runtime cost cap; run aborts if exceeded between steps.
`concurrency_key`	string	Template that gates how many runs can be in flight at once for the same workspace + rendered key. Typical pattern `"{{ inputs.account_id }}"` to serialise per-tenant runs. Empty (default) = no gate. See Concurrency + idempotency recipe and the fail-fast note in Troubleshooting.
`max_concurrent`	integer	Cap on simultaneous runs sharing the resolved `concurrency_key`. Defaults to `1` when `concurrency_key` is set (strict per-key serialisation); ignored otherwise.
`guardrails`	object	Per-routine Lookout policy. `guardrails.input.prompt_injection.action`: `block` (default) \| `sanitize` \| `log`.
`eval`	object	Continuous online grading. See Online eval below.

InputSpec

{
  "name": "since",
  "type": "string",
  "required": false,
  "default": "yesterday",
  "description": "ISO date or 'yesterday'/'today'",
  "min": 0,
  "max": 100
}

Template substitution

Anywhere a string is interpolated (prompt, http URL/body/headers, wait until, code, transform, conditional if), placeholder {{ ... }} resolves against:

inputs.X — declared input value
steps.Y.output — full text output of an earlier step
steps.Y.output.path — JSON path into a step’s output (when output parses as JSON)
env.AUTHOR_CREW_NAME / etc. — read-only allowlist of execution context

Save-time validator walks every template-bearing field (prompt, nested inputs, http url/body/headers, wait until, event_filter, approval_prompt, code body, code env values, transform input, transform expression, if condition) and rejects placeholders that reference unknown inputs or unseen-yet steps.

Templates are regex substitution, not expression evaluation. There is no arithmetic, no function calls, no inline conditionals. If you need logic, add another step.

Step types

`agent_run`

{
  "id": "summarize",
  "type": "agent_run",
  "agent_slug": "tomas",
  "complexity": "fast",
  "prompt": "Summarize: {{ inputs.text }}",
  "model_override": null,
  "timeout_seconds": 60,
  "validation": {
    "schema": { "type": "object", "required": ["title"] },
    "must_not_contain": ["API_KEY=", "Bearer "],
    "min_length": 30
  },
  "on_fail": "escalate_tier"
}

complexity resolves through the workspace’s execution_tiers_json mapping into (adapter, model). With complexity: "fast", the agent’s CLI gets --model claude-haiku-4-5-20251001 (or whatever the workspace mapped fast to). model_override is the explicit pin that wins over complexity. on_fail lives at the step level (not inside validation) and is one of escalate_tier | abort | retry_step — escalate_tier walks the fallback chain (e.g., Haiku → Sonnet → Opus) until validation passes or the chain exhausts.

`call_pipeline`

{
  "id": "review_summary",
  "type": "call_pipeline",
  "pipeline_slug": "human-approval-step",
  "nested_inputs": {
    "content": "{{ steps.summarize.output }}",
    "approver_role": "marketing_lead"
  }
}

Composition primitive. Save-time cycle detection rejects loops; runtime depth limit caps at 10 levels.

`http`

{
  "id": "fetch",
  "type": "http",
  "http": {
    "method": "GET",
    "url": "{{ inputs.url }}",
    "max_response_bytes": 200000,
    "success_codes": [200],
    "credential_ref": { "type": "github", "inject_as": "bearer" }
  },
  "timeout_seconds": 30
}

Egress allowlist enforced both pre-flight AND on every redirect (CheckRedirect callback). Allowlist set per-workspace. Credential injection schemes: bearer (default), header with explicit name, query with explicit name.

`wait`

{
  "id": "human_approval",
  "type": "wait",
  "wait": {
    "kind": "approval",
    "approval_prompt": "Approve summary?",
    "timeout_sec": 86400
  }
}

Three kinds: approval (HITL token), datetime (sleep until ISO timestamp), event (filter on journal events). DB-backed waitpoint store survives process restart for approval kind; the recovery scan at boot reports stranded pending waitpoints. Approve via UI Inbox, CLI crewship routine waitpoints approve <token>, or API.

`code`

{
  "id": "process",
  "type": "code",
  "code": {
    "runtime": "python",
    "code": "import json\nprint(json.dumps({'sum': sum(input_nums)}))"
  }
}

Runs in a sandboxed container with --cap-drop=ALL, no host mounts, network constrained by the same egress allowlist. Inputs auto-mapped to CREWSHIP_INPUT_<NAME> env vars.

`transform`

{
  "id": "extract_emails",
  "type": "transform",
  "transform": {
    "input": "{{ steps.fetch.output }}",
    "expression": ".items[] | .email"
  }
}

Pure-Go data reshaping with a tiny jq-flavoured subset. No LLM, no network — fully deterministic. Useful for wiring step outputs without paying for another agent_run.

Conditional `if`

Any step can carry "if": "{{ inputs.run_summary }}". The placeholder is rendered as a plain string — empty / false / 0 / no / off (case-insensitive) → step skipped and marked <skipped> in StepOutputs; anything else counts as truthy. Templates are plain substitution, not expression evaluation, so put the boolean upstream (e.g. set inputs.run_summary from the caller) rather than writing == / != inside the if value.

DAG with `needs[]`

{ "id": "fetch_a", "type": "http", "http": {...} },
{ "id": "fetch_b", "type": "http", "http": {...} },
{ "id": "merge",
  "type": "agent_run",
  "needs": ["fetch_a", "fetch_b"],
  "prompt": "Merge: {{ steps.fetch_a.output }} + {{ steps.fetch_b.output }}"
}

Steps with no overlapping needs execute in parallel (one goroutine wave per ready set). Final output picks the unique leaf node; for multi-leaf DAGs the first leaf in source order wins.

Two-tier execution

The economic value-prop: an Opus-class authoring model designs the routine, a Haiku-class executor model runs each invocation. Workspace execution_tiers_json maps complexity classes to (adapter, model):

{
  "trivial":  { "primary": {"adapter":"claude","model":"claude-haiku-4-5-20251001"} },
  "fast":     { "primary": {"adapter":"claude","model":"claude-haiku-4-5-20251001"},
                "fallback":[{"adapter":"claude","model":"claude-sonnet-4-6"}] },
  "moderate": { "primary": {"adapter":"claude","model":"claude-sonnet-4-6"} },
  "smart":    { "primary": {"adapter":"claude","model":"claude-opus-4-7"} }
}

Per-step complexity annotation drives the resolver. With on_fail: "escalate_tier", a failed validation walks the fallback chain — practically: Haiku tries first, Sonnet on validation fail, Opus on second fail.

Tier override at runtime. The CLI flag --model <model> is constructed from the resolved tier and passed to the agent’s CLI adapter, so a routine’s complexity: "fast" actually fires Haiku, not the agent’s default. CLIAdapter is preserved (so the agent’s CLAUDE_CODE / GEMINI_CLI / etc. wiring stays intact); only the model name swaps.

Test-run gate

Save endpoints (sidecar /pipelines/save, user /api/v1/workspaces/{ws}/pipelines/save, internal /api/v1/internal/pipelines/save) require a fresh passing test_run within the last 5 minutes — otherwise inline runs one against the execution tier. This is the self-improvement loop: an authoring agent that writes brittle DSL gets a structured failure report it can read and revise from. Without the gate, MVP would ship pipelines that pass schema but fail at runtime. skip_test_gate: true is honored only when the caller’s role is OWNER or ADMIN; lower roles get 403. Useful for hand-crafted DSL from known-good templates (the seed flow uses it).

Triggers

Cron schedules

crewship routine schedules create --slug summarize-text \
    --name "daily-summary" --cron "0 9 * * *" --timezone "Europe/Prague" \
    --inputs '{"text":"…default"}'

5-field cron expression. Scheduler runs in-process and ticks every 30s, so minimum resolution is 1 minute. Single-instance only — running multiple replicas would double-fire (no leader election yet).

Webhooks

crewship routine webhooks create --slug pr-review-structured \
    --hmac-secret "$(openssl rand -hex 32)" --rate-limit 30

Output reveals the public URL and signing secret once (Stripe-style). External services POST event payload to /api/v1/webhooks/{token}. With HMAC, sender includes header X-Crewship-Signature: sha256=<hex_hmac_of_body>, validated server-side via hmac.Equal (timing-safe). Rate limited per token, per minute, default 60. To rotate the secret: delete the webhook + create a new one. There is no in-place rotation by design.

Manual

crewship routine run <slug> --inputs '{...}' or click the Run button in the UI detail panel. Same execution path.

Dry-run preview

Three execution modes, distinguished by Mode in the request body and surface:

Mode	Side effects	Increments invocation_count	Cost
`run`	yes (agents called)	yes	real
`test_run`	yes (agents called)	no	real
`dry_run`	no (templates rendered, agents skipped)	no	estimated

Dry-run is the safe “what would this routine do?” preview. It walks the DSL, renders all template substitutions against the supplied inputs, resolves each step’s execution tier (adapter + model), and reports a would_execute list with per-step estimated cost. No agents are invoked, no journal entries beyond a single pipeline.dry_run audit row are written.

crewship routine dry-run email-fetch --inputs '{"since":"yesterday"}'

In the UI: click Dry run in the routine detail panel. The would_execute report renders inline above the tab bar with per-step:

Step ID + type
Resolved tier_adapter:tier_model (e.g. claude:claude-haiku-4-5)
Estimated cost in USD (order-of-magnitude only — the executor uses a flat token-density heuristic, not real pricing)
would_call_agent / would_call_pipeline target

The estimate is intentionally labelled “estimated” everywhere it surfaces — it’s a planning aid, not a quote. Real cost only lands once you switch to run or test_run.

Versioning + rollback

Every save creates a new immutable row in pipeline_versions (v79 migration). The pipelines.head_version column points at the current. Rollback creates a NEW version on top of HEAD whose definition equals the target’s:

crewship routine versions email-fetch
crewship routine rollback email-fetch --to 3

History is preserved — you can roll forward to a future version by another rollback. There is no “delete version” — if a version was bad, the trail of “v3 → v4 (rollback to v2) → v5 (fix)” is the audit story you keep.

Bundle export / import

Routines are portable across workspaces:

# Export from workspace A
crewship routine export email-fetch --include-history > email-fetch.json

# Import into workspace B
cat email-fetch.json | crewship routine import

The bundle format is crewship-pipeline-bundle/v1: routine row + (optionally) the full version chain + change_summary annotations. Author identity is rewritten on import so the importing user becomes the new author. Slug is preserved; if it conflicts in the destination workspace the existing row updates (new version), or you change the bundle’s slug before import.

HITL waitpoints

A routine that includes a wait step of kind: approval parks the run goroutine on a DB-backed waitpoint. Operators decide via three paths:

# CLI
crewship routine waitpoints list
crewship routine waitpoints approve <token> --comment "LGTM"
crewship routine waitpoints reject  <token> --comment "needs revision"

# UI: /routines → routine → Wait sub-tab → click Approve / Reject
# API: POST /api/v1/workspaces/{ws}/pipelines/waitpoints/{token}/approve

The decision comment is forwarded to the parked run as the wait step’s output, so downstream steps can read approval rationale via {{ steps.<wait_step_id>.output }}.

Online eval sampler

The online sampler watches completed routine runs and grades a configurable percentage of them through the existing rubric grader so production traffic continuously feeds the drift detector — not just on-demand replays or scheduled regression suites. Per-routine DSL:

name: nightly-summary
eval:
  online:
    sample_rate: 0.05              # grade 5% of runs; 0 disables, 1.0 grades every run
    grader_agent_slug: qa-grader   # references an agent in the AUTHOR crew with a rubric prompt

What happens on a tick (default cadence: every 1 minute):

Scan pipeline_runs WHERE status = 'completed' AND completed_at > watermark.
For each candidate, resolve the routine DSL. If eval.online is absent or sample_rate <= 0, skip.
Draw from crypto/rand. If the sample lands above sample_rate, skip.
Otherwise, INSERT into eval_runs with kind = 'online', status = 'queued'. The existing grader worker picks it up and writes the result back.

Field	Notes
`sample_rate`	Float `[0, 1]`. `0` disables — useful as an incident “pause grading” toggle. `1.0` is expensive but ok for newly-launched routines while calibrating the grader. Realistic prod default: `0.05`.
`grader_agent_slug`	Required when `sample_rate > 0`. Missing grader is treated as a deterministic skip (logged once, no retry storm).

Correctness guarantees:

Schema-layer idempotency — eval_runs has a partial UNIQUE INDEX (pipeline_run_id) WHERE kind='online'. A duplicate sampler instance or a crash-recovery watermark replay can attempt the same enqueue twice; the second collapses to a no-op rather than queueing twice and double-billing the grader.
(completed_at, id) tuple cursor — parallel fan-out steps that complete at the same nanosecond all get graded; a timestamp-only cursor would orphan siblings.
Stuck-on-error watermark — a transient per-row error (resolver outage, entropy outage, enqueue conflict) freezes the watermark at the row before the failure so the next tick retries. Deterministic skips (no eval config, sample roll missed) advance normally.
Trace correlation — the eval row carries routine_slug + pipeline_run_id so an operator clicking a low-scoring grade in the eval UI lands on the actual trace via pipeline_run_id -> trace.

Limitations:

Watermark is in-memory only. On process restart it resets to now - 1h; outages longer than that leave a gap in grading coverage. The UNIQUE index keeps reprocessing harmless.
Page cap is 500 rows/tick. A workspace processing > 500 routine runs/minute at sample_rate = 1.0 will accumulate backlog; the watermark only advances past successfully-handled rows, so the backlog drains across subsequent ticks rather than being lost.

Validation gates and credential leak guards

Each step’s validation block runs after the step output materializes. The schema is a JSON Schema draft 2020-12 subset (most of type, required, properties, items, pattern, format, enum, etc.) plus three Crewship extensions:

must_not_contain: ["API_KEY=", "Bearer "] — output must include none of these substrings
must_contain: ["##"] — output must include all of these
min_length / max_length — convenience for non-JSON outputs

The must_not_contain gate is the credential-leak tripwire: if an agent is about to leak a real API key in its output, the gate fails the step before downstream consumers see it. Pair with on_fail: "abort" (set at the step level, alongside validation) for hard stop, or escalate_tier if you want the higher model to retry without the leak.

Observability

Every routine run emits a sequence of journal entries:

pipeline.run.started — once at run begin
pipeline.step.started — per step
pipeline.step.completed / pipeline.step.failed / pipeline.step.validation_failed — per step terminal
pipeline.run.completed / pipeline.run.failed — once at run end

Plus WebSocket broadcast on the workspace channel for live UI updates (PipelineRunNode in the orchestration graph + Runs sub-tab waterfall both subscribe).

Per-step cost + duration

Both the UI Runs sub-tab waterfall and crewship routine logs <run_id> --slug X surface the cost_usd and duration_ms fields the executor stamps on every pipeline.step.completed event. Same data, two presentation surfaces:

UI: right-aligned columns next to each step row, with a footer total summing the run. An em-dash (—) renders for events that don’t carry cost (pipeline.step.started, .failed, live-only echoes) — easier to scan than $0.0000 next to real values.
CLI logs: extra DURATION and COST columns in the timeline output. Same em-dash rule for non-positive values.

Worker tier escalation is visible too — a failed validation gate that retries on a higher tier emits a second step.started+step.completed pair with its own cost on the retry, so the column-summed footer reflects the full spend (including retries), not just the first attempt. Three terminal-side observability surfaces, ordered by when you’d reach for each:

Phase	Command	Why
Before run	`crewship routine doctor <slug>`	Preflight ✓/⚠/✗ checklist — catches missing crew provisioning, agent slug typos, missing credentials, contradictory validation gates, tight cost caps. Cheap to run, returns in milliseconds, fails CI builds on FAIL.
During run	`crewship routine watch <slug>`	Polls the runs endpoint and prints events as an ANSI-coloured timeline. `--json` for JSON Lines piping into `jq`; `--once` exits after the first run terminates (CI-friendly).
After run	`crewship routine logs <run_id> --slug X`	One-shot post-mortem dump — every step’s prompt, output, validation verdict, cost. Use to investigate a specific failure surfaced by `watch` or by the UI.

For variance characterisation across many runs (is this routine production-ready at this tier?), use crewship routine bench. For matrix-level cross-tier consistency (which scenarios diverge between Haiku and Opus?), use crewship eval scenarios and crewship eval baseline diff for CI regression gates.

RBAC

Role	Read	Run	Save	Schedules / Webhooks	`skip_test_gate`
VIEWER	✓	—	—	—	—
MEMBER	✓	✓	—	—	—
MANAGER	✓	✓	✓	✓	—
ADMIN	✓	✓	✓	✓	✓
OWNER	✓	✓	✓	✓	✓

Common patterns

”Run this routine every day at 9 AM"

crewship routine schedules create --slug daily-status-digest \
    --cron "0 9 * * *" --timezone "Europe/Prague"

"Trigger this routine from a GitHub Actions hook"

crewship routine webhooks create --slug pr-review-structured \
    --hmac-secret "$(openssl rand -hex 32)" --rate-limit 30
# Use the printed token + secret as Settings → Webhooks in the GH repo.

"Validate routine DSL in CI before committing”

- run: crewship routine validate routines/*.json

Exit code 1 on any invalid file.

”Discover what an agent is about to do, without running it”

crewship routine dry-run summarize-text --inputs '{"text":"..."}'

Returns a would_execute report with which agent, which tier, the rendered prompt, and an estimated cost. Zero side effects.

”Cancel an in-flight run”

crewship routine cancel <run_id>

Signals the run goroutine; it stops at the next safe point and emits pipeline.run.failed with reason “cancelled”.

”Serialise runs per tenant / customer / repo”

Use concurrency_key with a template referencing the tenant-identifying input:

{
  "concurrency_key": "{{ inputs.account_id }}",
  "max_concurrent": 1,
  "inputs": [
    { "name": "account_id", "type": "string", "required": true }
  ]
}

Two requests for the same account_id queue rather than fan out; requests for different account_ids run in parallel. Pair with the Idempotency-Key header for webhook handlers (see the Concurrency + idempotency recipe). The platform fails fast if the rendered key is empty (missing input + no literal prefix in the template) — see Troubleshooting for why and how to fix.

Troubleshooting

Before going through this list: run crewship routine doctor <slug> first. Most “blind alley” failures (missing crew provisioning, agent slug typo, missing credential, contradictory validation gate, cost cap too tight) surface as a ✗ check on doctor before they ever cost an LLM call.

Save returns 422 “save requires a fresh, passing test_run”

The save endpoint requires the test gate cleared. Preferred path: call /test_run, capture the save_token from its response, and pass "save_token": "<token>" in the /save body. The token is a server-signed HMAC over (workspace_id | definition_hash | user_id | issued_at) and expires after 5 minutes — it cannot be forged from the client side. Alternatives:

--skip-test-gate (CLI) / "skip_test_gate": true (API) if your role is OWNER/ADMIN and you trust the DSL — bypasses the gate without running the test_run.
Legacy fallback: send "last_test_run_at" + "last_test_run_passed" in the body. Honoured for backwards compatibility, but trusts client-supplied values; new clients should always use save_token.

Run starts but no step events appear in the UI

Check that ?include_steps=1 is in the runs URL the UI fetches. The list endpoint defaults to run-level only to keep payload bounded; the detail panel passes the flag explicitly. After a refresh, the waterfall populates from journal entries plus live WebSocket events.

Schedule shows enabled=true but never fires

The scheduler ticks every 30s; with single-binary deployment, restarting the server resets the in-memory tick cursor. Pending schedules whose next_run_at has passed will fire on the next tick after restart. If you’ve edited the cron expression, next_run_at recomputes from now() — so a 0 9 * * * schedule edited at 10:30 won’t fire until 9 AM tomorrow.

Webhook returns 401 / 403 on a known-good token

HMAC mismatch: check the X-Crewship-Signature: sha256=<hex> header, computed as HMAC-SHA256(signing_secret, request_body) over the raw bytes the sender sent (not a re-serialized form). The server uses hmac.Equal for comparison so timing-safe.

Run cost is higher than `estimated_cost_usd`

Two-tier escalation walks the fallback chain. With on_fail: escalate_tier, a failed Haiku run retries on Sonnet (5-10× more expensive), then Opus (20-50× more expensive) before giving up. Tighten validation gates (loosen must_contain, raise min_length), or set max_cost_usd on the routine to abort the run between steps when a cost ceiling is hit.

”Pipeline X step Y stuck in pending state”

Probably a waitpoint without a matching listener (process restarted between the wait step starting and a decision arriving). Check crewship routine waitpoints list; if listed, decide via approve/reject. The recovery scan at server boot reports stranded waitpoints in the log line pipeline waitpoint store wired (...) stranded_pending=N.

Run returns 500 “pipeline: concurrency_key rendered to empty value”

The DSL declared a non-empty concurrency_key template but the rendered value is an empty string — typically because a referenced input was omitted at trigger time. Full error message:

pipeline: concurrency_key rendered to empty value (referenced input missing or empty): template "{{ inputs.<NAME> }}"

Why fail-fast: a routine that declares concurrency_key: "{{ inputs.account_id }}" is asking the platform to serialise runs per tenant. If account_id is missing, treating the empty key as “no gate” would silently allow unlimited parallelism for a routine the author explicitly asked us to serialise — a denial-of-self by misconfiguration. The executor refuses to start the run. Fixes, in order of preference:

Supply the input. The caller (curl / crewship routine run --inputs '{...}' / scheduler / webhook) needs to pass the referenced input. For webhooks this means the inputs_template in the webhook config must produce it from the incoming payload.
Set a default on the InputSpec. If the input is genuinely optional but you still want a tenant-style gate, add "default": "global" (or any non-empty sentinel) to the InputSpec. The platform merges defaults before rendering the key.
Use a literal prefix. A template like "vendor-alert-{{ inputs.vendor_id }}" always renders non-empty (the literal vendor-alert- survives even when vendor_id is missing); the key still gates, just less granularly.
Drop the gate. If you genuinely don’t want concurrency control, omit concurrency_key entirely (do NOT set it to "" — that’s the unset sentinel).

CLI reference

Command	Purpose
`routine list`	Workspace routine catalog
`routine get <slug>`	Detail with full DSL
`routine save`	Save from JSON file
`routine run <slug>`	Invoke against execution tier (`--tier-override fast/smart/...`)
`routine dry-run <slug>`	Preview without invoking
`routine delete <slug>`	Soft-delete
`routine doctor <slug>`	Preflight checklist (✓/⚠/✗) — catches blind alleys before run
`routine bench <slug>`	N-runs variance characterisation — pass-rate + cost/latency stats
`routine runs <slug>`	Run history from journal
`routine records <slug>`	Run history from `pipeline_runs` projection (filterable by status)
`routine logs <run_id>`	Full journal trace for one run (post-mortem)
`routine watch <slug>`	Live event stream
`routine cancel <run_id>`	Signal in-flight run
`routine versions <slug>`	Version history
`routine rollback <slug> --to N`	Roll back to v N
`routine export <slug>`	JSON bundle to stdout
`routine import [bundle.json]`	Load from file or stdin
`routine validate [file.json]`	Offline DSL check
`routine schedules ...`	Cron CRUD
`routine webhooks ...`	Webhook CRUD
`routine waitpoints ...`	HITL inbox + decisions

For batch evaluation across the eval-* fleet (matrix sweep, head-to-head tier compare, baseline regression diff), see the eval CLI. Cookbook recipe 6 walks the eval-driven promotion workflow end-to-end. The pipeline alias is preserved for back-compat on the legacy subcommands (pipeline list, pipeline run, pipeline get, etc.). The post-rename additions — schedules, webhooks, waitpoints, validate, watch, logs, records, bench, doctor — are only registered under routine, so scripts that need them must switch.

Backend reference

Migrations v78 (pipelines + execution_tiers_json), v79 (versions + waitpoints), v80 (schedules), v81 (run idempotency), v82 (webhooks)
Source: internal/pipeline/ (~10 700 LOC, 36 files)
API: internal/api/pipelines.go, pipeline_runs.go, pipeline_schedules.go, pipeline_webhooks.go
Sidecar: internal/sidecar/pipelines.go (port 9119)
Frontend: app/(dashboard)/routines/, components/features/routines/, hooks use-pipelines*

Limitations (current MVP)

Run state is in-memory — restart loses active runs. Pending waitpoints survive (DB-backed); active step execution does not. A pipeline_runs table with status enum + checkpoint is the next-step PR.
Single-instance scheduler — running multiple replicas would double-fire schedules.
DSL editor is read-only in UI — Monaco write-mode is a follow-up. Authoring through CLI / agent / API works fully.
No cross-adapter tier swap yet — same-provider model swap (Haiku→Opus) works; Claude→Gemini swap requires a shorthand→constant mapping not yet wired.
No NL→cron converter — ops still hand-type 0 9 * * *. Foundation PRD has the design; not in MVP.
No errors fingerprinting / bulk replay — one-by-one rerun via routine run only.

Get Started

Guides

Security

Configuration

Documentation Index

​What is a Routine?

​Three authoring paths

Agent

UI

CLI

​DSL anatomy

​Top-level fields

​InputSpec

​Template substitution

​Step types

​agent_run

​call_pipeline

​http

​wait

​code

​transform

​Conditional if

​DAG with needs[]

​Two-tier execution

​Test-run gate

​Triggers

​Cron schedules

​Webhooks

​Manual

​Dry-run preview

​Versioning + rollback

​Bundle export / import

​HITL waitpoints

​Online eval sampler

​Validation gates and credential leak guards

​Observability

​Per-step cost + duration

​RBAC

​Common patterns

​”Run this routine every day at 9 AM"

​"Trigger this routine from a GitHub Actions hook"

​"Validate routine DSL in CI before committing”

​”Discover what an agent is about to do, without running it”

​”Cancel an in-flight run”

​”Serialise runs per tenant / customer / repo”

​Troubleshooting

​Save returns 422 “save requires a fresh, passing test_run”

​Run starts but no step events appear in the UI

​Schedule shows enabled=true but never fires

​Webhook returns 401 / 403 on a known-good token

​Run cost is higher than estimated_cost_usd

​”Pipeline X step Y stuck in pending state”

​Run returns 500 “pipeline: concurrency_key rendered to empty value”

​CLI reference

​Backend reference

​Limitations (current MVP)

What is a Routine?

Three authoring paths

DSL anatomy

Top-level fields

InputSpec

Template substitution

Step types

`agent_run`

`call_pipeline`

`http`

`wait`

`code`

`transform`

Conditional `if`

DAG with `needs[]`

Two-tier execution

Test-run gate

Triggers

Cron schedules

Webhooks

Manual

Dry-run preview

Versioning + rollback

Bundle export / import

HITL waitpoints

Online eval sampler

Validation gates and credential leak guards

Observability

Per-step cost + duration

RBAC

Common patterns

”Run this routine every day at 9 AM"

"Trigger this routine from a GitHub Actions hook"

"Validate routine DSL in CI before committing”

”Discover what an agent is about to do, without running it”

”Cancel an in-flight run”

”Serialise runs per tenant / customer / repo”

Troubleshooting

Save returns 422 “save requires a fresh, passing test_run”

Run starts but no step events appear in the UI

Schedule shows enabled=true but never fires

Webhook returns 401 / 403 on a known-good token

Run cost is higher than `estimated_cost_usd`

”Pipeline X step Y stuck in pending state”

Run returns 500 “pipeline: concurrency_key rendered to empty value”

CLI reference

Backend reference

Limitations (current MVP)