Skip to main content

Documentation Index

Fetch the complete documentation index at: https://docs.crewship.ai/llms.txt

Use this file to discover all available pages before exploring further.

What is a Routine?

A Routine is a declarative recipe for repeatable AI work. You author it once (or an agent authors it for you when it spots a repetitive pattern), it runs the same way every time, and it lives in your workspace as a reusable asset alongside crews, skills, and credentials. Each routine bundles:
  • A JSON DSL definition — inputs, outputs, ordered or DAG-structured steps, validation gates, declared egress, credential requirements
  • Authorship metadata — which crew, which agent (or user), via which path
  • Triggers — cron schedules and webhook tokens that fire it autonomously
  • Run history — every invocation as an immutable journal trace with step-level events
  • Versions — every save creates a new version; rollback is one API call
Compared to other layers in Crewship:
LayerScopeAuthoringUse case
RoutineatomicAI-authored OR human DSLOne repeatable AI workflow
Recipe (future)compositeMarketplace templateCrew + agents + integrations + routines bundled
Cyclic Issue (future)issue-trackerrecurring user issue”Standup every Monday” without leaving Issues
The user-facing label is Routine across web UI, CLI, agent system prompts and marketing surfaces. The backend identifier (database table, Go package, HTTP route paths) remains pipelines for backwards compatibility.
Naval theme. Routines are the boring, accurate, repeated procedures a ship’s crew runs every day — the equivalent of naval drills in the Crewship metaphor. Programs your AI agents follow.

Three authoring paths

Crewship is unusual among workflow systems because agents author routines, not just execute them. Three paths converge on the same pipelines table:

Agent

The most common path. An agent that’s solved a repetitive problem twice posts to http://localhost:9119/pipelines/save from inside its container; the sidecar injects authorship and forwards to the main API. Next time [AVAILABLE ROUTINES] block in the system prompt advertises it to other crews.

UI

Open /routines, click + New routine, pick a starter template, edit the DSL JSON, click Test & Save. Test_run runs against the execution tier; on pass the routine is persisted with authored_via=user_api and the JWT user as author.

CLI

crewship routine save --name "..." --definition file.json --author-crew <crew_id>. Same test_run gate as UI. CI-friendly: validate offline first with crewship routine validate file.json, then save.

DSL anatomy

Minimal valid routine:
{
  "dsl_version": "1.0",
  "name": "summarize-text",
  "description": "Summarize input text in 3 bullets.",
  "inputs": [
    { "name": "text", "type": "string", "required": true }
  ],
  "outputs": [
    { "name": "summary", "type": "string" }
  ],
  "steps": [
    {
      "id": "summarize",
      "type": "agent_run",
      "agent_slug": "tomas",
      "complexity": "fast",
      "prompt": "Summarize: {{ inputs.text }}"
    }
  ]
}

Top-level fields

FieldTypeNotes
dsl_versionstringAlways "1.0" for now. Forward-compat field.
namestringSlug-friendly identifier. Workspace-unique.
display_namestringOptional pretty label.
descriptionstringOne-line summary; shown in lists + [AVAILABLE ROUTINES].
inputs[]array of InputSpecDeclared parameters.
outputs[]array of OutputSpecRead from the final step’s output by name. Documentary in MVP.
steps[]array of StepSequential by default; DAG with needs:.
execution_tierobject{preferred, fallback} overriding workspace default.
estimated_cost_usdnumberAuthor estimate; UI surfaces overrun warnings.
egress_targetsarray of stringDeclared outbound hosts. Enforced for http step.
credentials_requiredarray of CredReqDeclared credential needs (type + scope).
max_cost_usdnumberRuntime cost cap; run aborts if exceeded between steps.
concurrency_keystringTemplate that gates how many runs can be in flight at once for the same workspace + rendered key. Typical pattern "{{ inputs.account_id }}" to serialise per-tenant runs. Empty (default) = no gate. See Concurrency + idempotency recipe and the fail-fast note in Troubleshooting.
max_concurrentintegerCap on simultaneous runs sharing the resolved concurrency_key. Defaults to 1 when concurrency_key is set (strict per-key serialisation); ignored otherwise.
guardrailsobjectPer-routine Lookout policy. guardrails.input.prompt_injection.action: block (default) | sanitize | log.
evalobjectContinuous online grading. See Online eval below.

InputSpec

{
  "name": "since",
  "type": "string",
  "required": false,
  "default": "yesterday",
  "description": "ISO date or 'yesterday'/'today'",
  "min": 0,
  "max": 100
}
type is one of string | integer | number | boolean | array | object. min / max are *float64 so decimal constraints work for number types.

Template substitution

Anywhere a string is interpolated (prompt, http URL/body/headers, wait until, code, transform, conditional if), placeholder {{ ... }} resolves against:
  • inputs.X — declared input value
  • steps.Y.output — full text output of an earlier step
  • steps.Y.output.path — JSON path into a step’s output (when output parses as JSON)
  • env.AUTHOR_CREW_NAME / etc. — read-only allowlist of execution context
Save-time validator walks every template-bearing field (prompt, nested inputs, http url/body/headers, wait until, event_filter, approval_prompt, code body, code env values, transform input, transform expression, if condition) and rejects placeholders that reference unknown inputs or unseen-yet steps.
Templates are regex substitution, not expression evaluation. There is no arithmetic, no function calls, no inline conditionals. If you need logic, add another step.

Step types

agent_run

{
  "id": "summarize",
  "type": "agent_run",
  "agent_slug": "tomas",
  "complexity": "fast",
  "prompt": "Summarize: {{ inputs.text }}",
  "model_override": null,
  "timeout_seconds": 60,
  "validation": {
    "schema": { "type": "object", "required": ["title"] },
    "must_not_contain": ["API_KEY=", "Bearer "],
    "min_length": 30
  },
  "on_fail": "escalate_tier"
}
complexity resolves through the workspace’s execution_tiers_json mapping into (adapter, model). With complexity: "fast", the agent’s CLI gets --model claude-haiku-4-5-20251001 (or whatever the workspace mapped fast to). model_override is the explicit pin that wins over complexity. on_fail lives at the step level (not inside validation) and is one of escalate_tier | abort | retry_stepescalate_tier walks the fallback chain (e.g., Haiku → Sonnet → Opus) until validation passes or the chain exhausts.

call_pipeline

{
  "id": "review_summary",
  "type": "call_pipeline",
  "pipeline_slug": "human-approval-step",
  "nested_inputs": {
    "content": "{{ steps.summarize.output }}",
    "approver_role": "marketing_lead"
  }
}
Composition primitive. Save-time cycle detection rejects loops; runtime depth limit caps at 10 levels.

http

{
  "id": "fetch",
  "type": "http",
  "http": {
    "method": "GET",
    "url": "{{ inputs.url }}",
    "max_response_bytes": 200000,
    "success_codes": [200],
    "credential_ref": { "type": "github", "inject_as": "bearer" }
  },
  "timeout_seconds": 30
}
Egress allowlist enforced both pre-flight AND on every redirect (CheckRedirect callback). Allowlist set per-workspace. Credential injection schemes: bearer (default), header with explicit name, query with explicit name.

wait

{
  "id": "human_approval",
  "type": "wait",
  "wait": {
    "kind": "approval",
    "approval_prompt": "Approve summary?",
    "timeout_sec": 86400
  }
}
Three kinds: approval (HITL token), datetime (sleep until ISO timestamp), event (filter on journal events). DB-backed waitpoint store survives process restart for approval kind; the recovery scan at boot reports stranded pending waitpoints. Approve via UI Inbox, CLI crewship routine waitpoints approve <token>, or API.

code

{
  "id": "process",
  "type": "code",
  "code": {
    "runtime": "python",
    "code": "import json\nprint(json.dumps({'sum': sum(input_nums)}))"
  }
}
Runs in a sandboxed container with --cap-drop=ALL, no host mounts, network constrained by the same egress allowlist. Inputs auto-mapped to CREWSHIP_INPUT_<NAME> env vars.

transform

{
  "id": "extract_emails",
  "type": "transform",
  "transform": {
    "input": "{{ steps.fetch.output }}",
    "expression": ".items[] | .email"
  }
}
Pure-Go data reshaping with a tiny jq-flavoured subset. No LLM, no network — fully deterministic. Useful for wiring step outputs without paying for another agent_run.

Conditional if

Any step can carry "if": "{{ inputs.run_summary }}". The placeholder is rendered as a plain string — empty / false / 0 / no / off (case-insensitive) → step skipped and marked <skipped> in StepOutputs; anything else counts as truthy. Templates are plain substitution, not expression evaluation, so put the boolean upstream (e.g. set inputs.run_summary from the caller) rather than writing == / != inside the if value.

DAG with needs[]

{ "id": "fetch_a", "type": "http", "http": {...} },
{ "id": "fetch_b", "type": "http", "http": {...} },
{ "id": "merge",
  "type": "agent_run",
  "needs": ["fetch_a", "fetch_b"],
  "prompt": "Merge: {{ steps.fetch_a.output }} + {{ steps.fetch_b.output }}"
}
Steps with no overlapping needs execute in parallel (one goroutine wave per ready set). Final output picks the unique leaf node; for multi-leaf DAGs the first leaf in source order wins.

Two-tier execution

The economic value-prop: an Opus-class authoring model designs the routine, a Haiku-class executor model runs each invocation. Workspace execution_tiers_json maps complexity classes to (adapter, model):
{
  "trivial":  { "primary": {"adapter":"claude","model":"claude-haiku-4-5-20251001"} },
  "fast":     { "primary": {"adapter":"claude","model":"claude-haiku-4-5-20251001"},
                "fallback":[{"adapter":"claude","model":"claude-sonnet-4-6"}] },
  "moderate": { "primary": {"adapter":"claude","model":"claude-sonnet-4-6"} },
  "smart":    { "primary": {"adapter":"claude","model":"claude-opus-4-7"} }
}
Per-step complexity annotation drives the resolver. With on_fail: "escalate_tier", a failed validation walks the fallback chain — practically: Haiku tries first, Sonnet on validation fail, Opus on second fail.
Tier override at runtime. The CLI flag --model <model> is constructed from the resolved tier and passed to the agent’s CLI adapter, so a routine’s complexity: "fast" actually fires Haiku, not the agent’s default. CLIAdapter is preserved (so the agent’s CLAUDE_CODE / GEMINI_CLI / etc. wiring stays intact); only the model name swaps.

Test-run gate

Save endpoints (sidecar /pipelines/save, user /api/v1/workspaces/{ws}/pipelines/save, internal /api/v1/internal/pipelines/save) require a fresh passing test_run within the last 5 minutes — otherwise inline runs one against the execution tier. This is the self-improvement loop: an authoring agent that writes brittle DSL gets a structured failure report it can read and revise from. Without the gate, MVP would ship pipelines that pass schema but fail at runtime. skip_test_gate: true is honored only when the caller’s role is OWNER or ADMIN; lower roles get 403. Useful for hand-crafted DSL from known-good templates (the seed flow uses it).

Triggers

Cron schedules

crewship routine schedules create --slug summarize-text \
    --name "daily-summary" --cron "0 9 * * *" --timezone "Europe/Prague" \
    --inputs '{"text":"…default"}'
5-field cron expression. Scheduler runs in-process and ticks every 30s, so minimum resolution is 1 minute. Single-instance only — running multiple replicas would double-fire (no leader election yet).

Webhooks

crewship routine webhooks create --slug pr-review-structured \
    --hmac-secret "$(openssl rand -hex 32)" --rate-limit 30
Output reveals the public URL and signing secret once (Stripe-style). External services POST event payload to /api/v1/webhooks/{token}. With HMAC, sender includes header X-Crewship-Signature: sha256=<hex_hmac_of_body>, validated server-side via hmac.Equal (timing-safe). Rate limited per token, per minute, default 60. To rotate the secret: delete the webhook + create a new one. There is no in-place rotation by design.

Manual

crewship routine run <slug> --inputs '{...}' or click the Run button in the UI detail panel. Same execution path.

Dry-run preview

Three execution modes, distinguished by Mode in the request body and surface:
ModeSide effectsIncrements invocation_countCost
runyes (agents called)yesreal
test_runyes (agents called)noreal
dry_runno (templates rendered, agents skipped)noestimated
Dry-run is the safe “what would this routine do?” preview. It walks the DSL, renders all template substitutions against the supplied inputs, resolves each step’s execution tier (adapter + model), and reports a would_execute list with per-step estimated cost. No agents are invoked, no journal entries beyond a single pipeline.dry_run audit row are written.
crewship routine dry-run email-fetch --inputs '{"since":"yesterday"}'
In the UI: click Dry run in the routine detail panel. The would_execute report renders inline above the tab bar with per-step:
  • Step ID + type
  • Resolved tier_adapter:tier_model (e.g. claude:claude-haiku-4-5)
  • Estimated cost in USD (order-of-magnitude only — the executor uses a flat token-density heuristic, not real pricing)
  • would_call_agent / would_call_pipeline target
The estimate is intentionally labelled “estimated” everywhere it surfaces — it’s a planning aid, not a quote. Real cost only lands once you switch to run or test_run.

Versioning + rollback

Every save creates a new immutable row in pipeline_versions (v79 migration). The pipelines.head_version column points at the current. Rollback creates a NEW version on top of HEAD whose definition equals the target’s:
crewship routine versions email-fetch
crewship routine rollback email-fetch --to 3
History is preserved — you can roll forward to a future version by another rollback. There is no “delete version” — if a version was bad, the trail of “v3 → v4 (rollback to v2) → v5 (fix)” is the audit story you keep.

Bundle export / import

Routines are portable across workspaces:
# Export from workspace A
crewship routine export email-fetch --include-history > email-fetch.json

# Import into workspace B
cat email-fetch.json | crewship routine import
The bundle format is crewship-pipeline-bundle/v1: routine row + (optionally) the full version chain + change_summary annotations. Author identity is rewritten on import so the importing user becomes the new author. Slug is preserved; if it conflicts in the destination workspace the existing row updates (new version), or you change the bundle’s slug before import.

HITL waitpoints

A routine that includes a wait step of kind: approval parks the run goroutine on a DB-backed waitpoint. Operators decide via three paths:
# CLI
crewship routine waitpoints list
crewship routine waitpoints approve <token> --comment "LGTM"
crewship routine waitpoints reject  <token> --comment "needs revision"

# UI: /routines → routine → Wait sub-tab → click Approve / Reject
# API: POST /api/v1/workspaces/{ws}/pipelines/waitpoints/{token}/approve
The decision comment is forwarded to the parked run as the wait step’s output, so downstream steps can read approval rationale via {{ steps.<wait_step_id>.output }}.

Online eval sampler

The online sampler watches completed routine runs and grades a configurable percentage of them through the existing rubric grader so production traffic continuously feeds the drift detector — not just on-demand replays or scheduled regression suites. Per-routine DSL:
name: nightly-summary
eval:
  online:
    sample_rate: 0.05              # grade 5% of runs; 0 disables, 1.0 grades every run
    grader_agent_slug: qa-grader   # references an agent in the AUTHOR crew with a rubric prompt
What happens on a tick (default cadence: every 1 minute):
  1. Scan pipeline_runs WHERE status = 'completed' AND completed_at > watermark.
  2. For each candidate, resolve the routine DSL. If eval.online is absent or sample_rate <= 0, skip.
  3. Draw from crypto/rand. If the sample lands above sample_rate, skip.
  4. Otherwise, INSERT into eval_runs with kind = 'online', status = 'queued'. The existing grader worker picks it up and writes the result back.
FieldNotes
sample_rateFloat [0, 1]. 0 disables — useful as an incident “pause grading” toggle. 1.0 is expensive but ok for newly-launched routines while calibrating the grader. Realistic prod default: 0.05.
grader_agent_slugRequired when sample_rate > 0. Missing grader is treated as a deterministic skip (logged once, no retry storm).
Correctness guarantees:
  • Schema-layer idempotencyeval_runs has a partial UNIQUE INDEX (pipeline_run_id) WHERE kind='online'. A duplicate sampler instance or a crash-recovery watermark replay can attempt the same enqueue twice; the second collapses to a no-op rather than queueing twice and double-billing the grader.
  • (completed_at, id) tuple cursor — parallel fan-out steps that complete at the same nanosecond all get graded; a timestamp-only cursor would orphan siblings.
  • Stuck-on-error watermark — a transient per-row error (resolver outage, entropy outage, enqueue conflict) freezes the watermark at the row before the failure so the next tick retries. Deterministic skips (no eval config, sample roll missed) advance normally.
  • Trace correlation — the eval row carries routine_slug + pipeline_run_id so an operator clicking a low-scoring grade in the eval UI lands on the actual trace via pipeline_run_id -> trace.
Limitations:
  • Watermark is in-memory only. On process restart it resets to now - 1h; outages longer than that leave a gap in grading coverage. The UNIQUE index keeps reprocessing harmless.
  • Page cap is 500 rows/tick. A workspace processing > 500 routine runs/minute at sample_rate = 1.0 will accumulate backlog; the watermark only advances past successfully-handled rows, so the backlog drains across subsequent ticks rather than being lost.

Validation gates and credential leak guards

Each step’s validation block runs after the step output materializes. The schema is a JSON Schema draft 2020-12 subset (most of type, required, properties, items, pattern, format, enum, etc.) plus three Crewship extensions:
  • must_not_contain: ["API_KEY=", "Bearer "] — output must include none of these substrings
  • must_contain: ["##"] — output must include all of these
  • min_length / max_length — convenience for non-JSON outputs
The must_not_contain gate is the credential-leak tripwire: if an agent is about to leak a real API key in its output, the gate fails the step before downstream consumers see it. Pair with on_fail: "abort" (set at the step level, alongside validation) for hard stop, or escalate_tier if you want the higher model to retry without the leak.

Observability

Every routine run emits a sequence of journal entries:
  • pipeline.run.started — once at run begin
  • pipeline.step.started — per step
  • pipeline.step.completed / pipeline.step.failed / pipeline.step.validation_failed — per step terminal
  • pipeline.run.completed / pipeline.run.failed — once at run end
Plus WebSocket broadcast on the workspace channel for live UI updates (PipelineRunNode in the orchestration graph + Runs sub-tab waterfall both subscribe).

Per-step cost + duration

Both the UI Runs sub-tab waterfall and crewship routine logs <run_id> --slug X surface the cost_usd and duration_ms fields the executor stamps on every pipeline.step.completed event. Same data, two presentation surfaces:
  • UI: right-aligned columns next to each step row, with a footer total summing the run. An em-dash () renders for events that don’t carry cost (pipeline.step.started, .failed, live-only echoes) — easier to scan than $0.0000 next to real values.
  • CLI logs: extra DURATION and COST columns in the timeline output. Same em-dash rule for non-positive values.
Worker tier escalation is visible too — a failed validation gate that retries on a higher tier emits a second step.started+step.completed pair with its own cost on the retry, so the column-summed footer reflects the full spend (including retries), not just the first attempt. Three terminal-side observability surfaces, ordered by when you’d reach for each:
PhaseCommandWhy
Before runcrewship routine doctor <slug>Preflight ✓/⚠/✗ checklist — catches missing crew provisioning, agent slug typos, missing credentials, contradictory validation gates, tight cost caps. Cheap to run, returns in milliseconds, fails CI builds on FAIL.
During runcrewship routine watch <slug>Polls the runs endpoint and prints events as an ANSI-coloured timeline. --json for JSON Lines piping into jq; --once exits after the first run terminates (CI-friendly).
After runcrewship routine logs <run_id> --slug XOne-shot post-mortem dump — every step’s prompt, output, validation verdict, cost. Use to investigate a specific failure surfaced by watch or by the UI.
For variance characterisation across many runs (is this routine production-ready at this tier?), use crewship routine bench. For matrix-level cross-tier consistency (which scenarios diverge between Haiku and Opus?), use crewship eval scenarios and crewship eval baseline diff for CI regression gates.

RBAC

RoleReadRunSaveSchedules / Webhooksskip_test_gate
VIEWER
MEMBER
MANAGER
ADMIN
OWNER

Common patterns

”Run this routine every day at 9 AM"

crewship routine schedules create --slug daily-status-digest \
    --cron "0 9 * * *" --timezone "Europe/Prague"

"Trigger this routine from a GitHub Actions hook"

crewship routine webhooks create --slug pr-review-structured \
    --hmac-secret "$(openssl rand -hex 32)" --rate-limit 30
# Use the printed token + secret as Settings → Webhooks in the GH repo.

"Validate routine DSL in CI before committing”

- run: crewship routine validate routines/*.json
Exit code 1 on any invalid file.

”Discover what an agent is about to do, without running it”

crewship routine dry-run summarize-text --inputs '{"text":"..."}'
Returns a would_execute report with which agent, which tier, the rendered prompt, and an estimated cost. Zero side effects.

”Cancel an in-flight run”

crewship routine cancel <run_id>
Signals the run goroutine; it stops at the next safe point and emits pipeline.run.failed with reason “cancelled”.

”Serialise runs per tenant / customer / repo”

Use concurrency_key with a template referencing the tenant-identifying input:
{
  "concurrency_key": "{{ inputs.account_id }}",
  "max_concurrent": 1,
  "inputs": [
    { "name": "account_id", "type": "string", "required": true }
  ]
}
Two requests for the same account_id queue rather than fan out; requests for different account_ids run in parallel. Pair with the Idempotency-Key header for webhook handlers (see the Concurrency + idempotency recipe). The platform fails fast if the rendered key is empty (missing input + no literal prefix in the template) — see Troubleshooting for why and how to fix.

Troubleshooting

Before going through this list: run crewship routine doctor <slug> first. Most “blind alley” failures (missing crew provisioning, agent slug typo, missing credential, contradictory validation gate, cost cap too tight) surface as a ✗ check on doctor before they ever cost an LLM call.

Save returns 422 “save requires a fresh, passing test_run”

The save endpoint requires the test gate cleared. Preferred path: call /test_run, capture the save_token from its response, and pass "save_token": "<token>" in the /save body. The token is a server-signed HMAC over (workspace_id | definition_hash | user_id | issued_at) and expires after 5 minutes — it cannot be forged from the client side. Alternatives:
  • --skip-test-gate (CLI) / "skip_test_gate": true (API) if your role is OWNER/ADMIN and you trust the DSL — bypasses the gate without running the test_run.
  • Legacy fallback: send "last_test_run_at" + "last_test_run_passed" in the body. Honoured for backwards compatibility, but trusts client-supplied values; new clients should always use save_token.

Run starts but no step events appear in the UI

Check that ?include_steps=1 is in the runs URL the UI fetches. The list endpoint defaults to run-level only to keep payload bounded; the detail panel passes the flag explicitly. After a refresh, the waterfall populates from journal entries plus live WebSocket events.

Schedule shows enabled=true but never fires

The scheduler ticks every 30s; with single-binary deployment, restarting the server resets the in-memory tick cursor. Pending schedules whose next_run_at has passed will fire on the next tick after restart. If you’ve edited the cron expression, next_run_at recomputes from now() — so a 0 9 * * * schedule edited at 10:30 won’t fire until 9 AM tomorrow.

Webhook returns 401 / 403 on a known-good token

HMAC mismatch: check the X-Crewship-Signature: sha256=<hex> header, computed as HMAC-SHA256(signing_secret, request_body) over the raw bytes the sender sent (not a re-serialized form). The server uses hmac.Equal for comparison so timing-safe.

Run cost is higher than estimated_cost_usd

Two-tier escalation walks the fallback chain. With on_fail: escalate_tier, a failed Haiku run retries on Sonnet (5-10× more expensive), then Opus (20-50× more expensive) before giving up. Tighten validation gates (loosen must_contain, raise min_length), or set max_cost_usd on the routine to abort the run between steps when a cost ceiling is hit.

”Pipeline X step Y stuck in pending state”

Probably a waitpoint without a matching listener (process restarted between the wait step starting and a decision arriving). Check crewship routine waitpoints list; if listed, decide via approve/reject. The recovery scan at server boot reports stranded waitpoints in the log line pipeline waitpoint store wired (...) stranded_pending=N.

Run returns 500 “pipeline: concurrency_key rendered to empty value”

The DSL declared a non-empty concurrency_key template but the rendered value is an empty string — typically because a referenced input was omitted at trigger time. Full error message:
pipeline: concurrency_key rendered to empty value (referenced input missing or empty): template "{{ inputs.<NAME> }}"
Why fail-fast: a routine that declares concurrency_key: "{{ inputs.account_id }}" is asking the platform to serialise runs per tenant. If account_id is missing, treating the empty key as “no gate” would silently allow unlimited parallelism for a routine the author explicitly asked us to serialise — a denial-of-self by misconfiguration. The executor refuses to start the run. Fixes, in order of preference:
  1. Supply the input. The caller (curl / crewship routine run --inputs '{...}' / scheduler / webhook) needs to pass the referenced input. For webhooks this means the inputs_template in the webhook config must produce it from the incoming payload.
  2. Set a default on the InputSpec. If the input is genuinely optional but you still want a tenant-style gate, add "default": "global" (or any non-empty sentinel) to the InputSpec. The platform merges defaults before rendering the key.
  3. Use a literal prefix. A template like "vendor-alert-{{ inputs.vendor_id }}" always renders non-empty (the literal vendor-alert- survives even when vendor_id is missing); the key still gates, just less granularly.
  4. Drop the gate. If you genuinely don’t want concurrency control, omit concurrency_key entirely (do NOT set it to "" — that’s the unset sentinel).

CLI reference

CommandPurpose
routine listWorkspace routine catalog
routine get <slug>Detail with full DSL
routine saveSave from JSON file
routine run <slug>Invoke against execution tier (--tier-override fast/smart/...)
routine dry-run <slug>Preview without invoking
routine delete <slug>Soft-delete
routine doctor <slug>Preflight checklist (✓/⚠/✗) — catches blind alleys before run
routine bench <slug>N-runs variance characterisation — pass-rate + cost/latency stats
routine runs <slug>Run history from journal
routine records <slug>Run history from pipeline_runs projection (filterable by status)
routine logs <run_id>Full journal trace for one run (post-mortem)
routine watch <slug>Live event stream
routine cancel <run_id>Signal in-flight run
routine versions <slug>Version history
routine rollback <slug> --to NRoll back to v N
routine export <slug>JSON bundle to stdout
routine import [bundle.json]Load from file or stdin
routine validate [file.json]Offline DSL check
routine schedules ...Cron CRUD
routine webhooks ...Webhook CRUD
routine waitpoints ...HITL inbox + decisions
For batch evaluation across the eval-* fleet (matrix sweep, head-to-head tier compare, baseline regression diff), see the eval CLI. Cookbook recipe 6 walks the eval-driven promotion workflow end-to-end. The pipeline alias is preserved for back-compat on the legacy subcommands (pipeline list, pipeline run, pipeline get, etc.). The post-rename additions — schedules, webhooks, waitpoints, validate, watch, logs, records, bench, doctor — are only registered under routine, so scripts that need them must switch.

Backend reference

  • Migrations v78 (pipelines + execution_tiers_json), v79 (versions + waitpoints), v80 (schedules), v81 (run idempotency), v82 (webhooks)
  • Source: internal/pipeline/ (~10 700 LOC, 36 files)
  • API: internal/api/pipelines.go, pipeline_runs.go, pipeline_schedules.go, pipeline_webhooks.go
  • Sidecar: internal/sidecar/pipelines.go (port 9119)
  • Frontend: app/(dashboard)/routines/, components/features/routines/, hooks use-pipelines*

Limitations (current MVP)

  • Run state is in-memory — restart loses active runs. Pending waitpoints survive (DB-backed); active step execution does not. A pipeline_runs table with status enum + checkpoint is the next-step PR.
  • Single-instance scheduler — running multiple replicas would double-fire schedules.
  • DSL editor is read-only in UI — Monaco write-mode is a follow-up. Authoring through CLI / agent / API works fully.
  • No cross-adapter tier swap yet — same-provider model swap (Haiku→Opus) works; Claude→Gemini swap requires a shorthand→constant mapping not yet wired.
  • No NL→cron converter — ops still hand-type 0 9 * * *. Foundation PRD has the design; not in MVP.
  • No errors fingerprinting / bulk replay — one-by-one rerun via routine run only.