> ## Documentation Index
> Fetch the complete documentation index at: https://docs.crewship.ai/llms.txt
> Use this file to discover all available pages before exploring further.

# Routines

> Declarative AI workflow recipes — AI-authored, repeatable, with cron/webhook/manual triggers, two-tier execution, HITL waitpoints, immutable versions.

## What is a Routine?

A **Routine** is a declarative recipe for repeatable AI work. You author it once (or an agent authors it for you when it spots a repetitive pattern), it runs the same way every time, and it lives in your workspace as a reusable asset alongside crews, skills, and credentials.

Each routine bundles:

* A **JSON DSL definition** — inputs, outputs, ordered or DAG-structured steps, validation gates, declared egress, credential requirements
* **Authorship metadata** — which crew, which agent (or user), via which path
* **Triggers** — cron schedules and webhook tokens that fire it autonomously
* **Run history** — every invocation as an immutable journal trace with step-level events
* **Versions** — every save creates a new version; rollback is one API call

Compared to other layers in Crewship:

| Layer                       | Scope         | Authoring                | Use case                                        |
| --------------------------- | ------------- | ------------------------ | ----------------------------------------------- |
| **Routine**                 | atomic        | AI-authored OR human DSL | One repeatable AI workflow                      |
| **Recipe** *(future)*       | composite     | Marketplace template     | Crew + agents + integrations + routines bundled |
| **Cyclic Issue** *(future)* | issue-tracker | recurring user issue     | "Standup every Monday" without leaving Issues   |

The user-facing label is **Routine** across web UI, CLI, agent system prompts and marketing surfaces. The backend identifier (database table, Go package, HTTP route paths) remains `pipelines` for backwards compatibility.

<Note>
  **Naval theme.** Routines are the boring, accurate, repeated procedures a ship's crew runs every day — the equivalent of [naval drills](https://en.wikipedia.org/wiki/Drill_\(naval\)) in the Crewship metaphor. Programs your AI agents follow.
</Note>

## Three authoring paths

Crewship is unusual among workflow systems because **agents author routines**, not just execute them. Three paths converge on the same `pipelines` table:

<CardGroup cols={3}>
  <Card title="Agent" icon="robot">
    The most common path. An agent that's solved a repetitive problem twice posts to `http://localhost:9119/pipelines/save` from inside its container; the sidecar injects authorship and forwards to the main API. Next time `[AVAILABLE ROUTINES]` block in the system prompt advertises it to other crews.

    You can also just **describe a routine in chat** — "make a routine that summarizes yesterday's commits and posts to Slack." A crew **Lead** carries the bundled **`routine-author`** skill (an authoring playbook): it clarifies the essentials, grounds the DSL in the crew's `[CONNECTED INTEGRATIONS]` and `[AVAILABLE ROUTINES]`, writes and test-runs the routine, then tells you whether it went live or landed as a **proposed** routine for a Manager to approve (see [Governance](#governance--agent-proposes-human-approves-the-risky-ones)).
  </Card>

  <Card title="UI" icon="browser">
    Open `/routines`, click **+ New routine**, pick a starter template, edit the DSL JSON, click **Test & Save**. Test\_run runs against the execution tier; on pass the routine is persisted with `authored_via=user_api` and the JWT user as author. An **existing** routine's DSL is also fully editable in-place from its detail page's **Editor / JSON** tab (CodeMirror, format + revert + copy) — see the note below on how that path's Save differs from Test & Save.
  </Card>

  <Card title="CLI" icon="terminal">
    `crewship routine save --name "..." --definition file.json --author-crew <crew_id>`. The server validates the DSL on save (same gate as the UI). CI-friendly: validate offline first with `crewship routine validate file.json`, then save.
  </Card>
</CardGroup>

<Note>
  **Editing an existing routine's DSL bypasses the test-run gate.** The **Editor / JSON** tab's Save button posts straight to `/pipelines/save` with `skip_test_gate: true` — it does not send the definition through `test_run` first. The server only honors that flag for **OWNER/ADMIN** (lower roles get `403`), so this is a fast lane for trusted operators, not a hole in the gate: a MEMBER/MANAGER can view and copy the DSL from this tab but can't save an edit without going through the create flow's Test & Save. A follow-up will chain `test_run → save_token → save` behind one button so any MANAGER+ role can edit safely.
</Note>

## DSL anatomy

Minimal valid routine:

```json theme={null}
{
  "dsl_version": "1.0",
  "name": "summarize-text",
  "description": "Summarize input text in 3 bullets.",
  "inputs": [
    { "name": "text", "type": "string", "required": true }
  ],
  "outputs": [
    { "name": "summary", "type": "string" }
  ],
  "steps": [
    {
      "id": "summarize",
      "type": "agent_run",
      "agent_slug": "tomas",
      "complexity": "fast",
      "prompt": "Summarize: {{ inputs.text }}"
    }
  ]
}
```

### Top-level fields

| Field                   | Type                             | Notes                                                                                                                                                                                                                                                                                                                                                                                                                                     |
| ----------------------- | -------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `dsl_version`           | string                           | Always `"1.0"` for now. Forward-compat field.                                                                                                                                                                                                                                                                                                                                                                                             |
| `name`                  | string                           | Slug-friendly identifier. Workspace-unique.                                                                                                                                                                                                                                                                                                                                                                                               |
| `display_name`          | string                           | Optional pretty label.                                                                                                                                                                                                                                                                                                                                                                                                                    |
| `description`           | string                           | One-line summary; shown in lists + `[AVAILABLE ROUTINES]`.                                                                                                                                                                                                                                                                                                                                                                                |
| `inputs[]`              | array of [InputSpec](#inputspec) | Declared parameters.                                                                                                                                                                                                                                                                                                                                                                                                                      |
| `outputs[]`             | array of OutputSpec              | Read from the final step's output by name. Documentary in MVP.                                                                                                                                                                                                                                                                                                                                                                            |
| `steps[]`               | array of [Step](#step-types)     | Sequential by default; DAG with `needs:`.                                                                                                                                                                                                                                                                                                                                                                                                 |
| `execution_tier`        | object                           | `{preferred, fallback}` overriding workspace default.                                                                                                                                                                                                                                                                                                                                                                                     |
| `estimated_cost_usd`    | number                           | Author estimate; UI surfaces overrun warnings.                                                                                                                                                                                                                                                                                                                                                                                            |
| `egress_targets`        | array of string                  | Declared outbound hosts. Enforced at run time for every `http` step and `http` hook, including redirect hops; empty/omitted = unrestricted at the routine layer (the crew network policy and SSRF guards still apply). See [`http`](#http).                                                                                                                                                                                               |
| `integrations_required` | array of string                  | Declared third-party integrations (Composio connector slugs like `"github"`, `"slack"`). **Declared AND enforced at run time** — see [Required integrations](#required-integrations) below. Slugs are lowercased/trimmed.                                                                                                                                                                                                                 |
| `credentials_required`  | array of CredReq                 | Declared credential needs (`type` + `scope`). Declared-only today (not yet enforced at run time).                                                                                                                                                                                                                                                                                                                                         |
| `resources`             | object                           | Agent-declared capability manifest — `datastores[]` and `tools[]` the routine touches that **can't** be inferred from the step graph. **Declared AND enforced at run time** against the crew's container resources — see [Required resources](#required-resources) below. See [Capability manifest](#capability-manifest) below for the shape.                                                                                            |
| `max_cost_usd`          | number                           | Runtime cost cap; run aborts if exceeded between steps.                                                                                                                                                                                                                                                                                                                                                                                   |
| `concurrency_key`       | string                           | Template that gates how many runs can be in flight at once for the same workspace + rendered key. Typical pattern `"{{ inputs.account_id }}"` to serialise per-tenant runs. Empty (default) = no gate. See [Concurrency + idempotency](/guides/routines-cookbook#recipe-5-idempotent-routine-with-concurrency-key) recipe and the fail-fast note in [Troubleshooting](#run-returns-500-pipeline-concurrency_key-rendered-to-empty-value). |
| `max_concurrent`        | integer                          | Cap on simultaneous runs sharing the resolved `concurrency_key`. Defaults to `1` when `concurrency_key` is set (strict per-key serialisation); ignored otherwise.                                                                                                                                                                                                                                                                         |
| `guardrails`            | object                           | Per-routine [Lookout](/guides/lookout) policy. `guardrails.input.prompt_injection.action`: `block` (default) \| `sanitize` \| `log`.                                                                                                                                                                                                                                                                                                      |
| `eval`                  | object                           | Continuous online grading. See [Online eval](#online-eval-sampler) below.                                                                                                                                                                                                                                                                                                                                                                 |
| `agentless`             | boolean                          | Token-zero guarantee: the routine may only contain `http` / `code` / `wait` / `transform` steps and can never invoke an LLM. See [Agentless routines](#agentless-routines).                                                                                                                                                                                                                                                               |

### InputSpec

```json theme={null}
{
  "name": "since",
  "type": "string",
  "required": false,
  "default": "yesterday",
  "description": "ISO date or 'yesterday'/'today'",
  "min": 0,
  "max": 100
}
```

`type` is one of `string | integer | number | boolean | array | object`. `min` / `max` are `*float64` so decimal constraints work for `number` types.

### Template substitution

Anywhere a string is interpolated (prompt, http URL/body/headers, wait until, code, transform, conditional `if`), placeholder `{{ ... }}` resolves against:

* `inputs.X` — declared input value
* `steps.Y.output` — full text output of an earlier step
* `steps.Y.output.path` — JSON path into a step's output (when output parses as JSON)
* `env.AUTHOR_CREW_NAME` / etc. — read-only allowlist of execution context

Save-time validator walks **every** template-bearing field (prompt, nested inputs, http url/body/headers, wait until, event\_filter, approval\_prompt, code body, code env values, transform input, transform expression, if condition) and rejects placeholders that reference unknown inputs or unseen-yet steps.

<Warning>
  Templates are **regex substitution, not expression evaluation**. There is no arithmetic, no function calls, no inline conditionals. If you need logic, add another step.
</Warning>

<Info>
  Save-time validation walks **every** template-bearing field and rejects placeholders that reference unknown inputs or unseen-yet steps — so a typo'd `{{ inputs.tyop }}` fails at save, not at 3am in production.
</Info>

## Required integrations

A routine can DECLARE the third-party integrations (Composio connectors) it
needs with the top-level `integrations_required` array, and the run path will
**block** a run when the executing crew hasn't connected one of them. This
closes the "integration forgotten" gap — the same shape as `egress_targets`,
but for connectors instead of hosts.

```json theme={null}
{
  "name": "triage-incident",
  "integrations_required": ["slack", "github"],
  "steps": [ ... ]
}
```

Semantics:

* **Declared is always allowed.** Saving a routine that names an integration
  the crew lacks is fine — declaring is a contract, not a connection. Only the
  *run* enforces. (Save-time validation only checks the list is well-formed:
  non-empty slugs, lowercased/trimmed, within a sane count cap.)
* **Enforced at run time.** Before a run starts, Crewship resolves the
  integrations the routine's **author crew** has connected and compares them to
  `integrations_required`. If any are missing the run is **blocked** with an
  RFC 7807 Problem Details response, HTTP **422**, carrying a machine-readable
  `missing_integrations: string[]` member and a human `detail` like
  `routine requires integration "slack" not connected for crew "Marketing"`.
  The UI uses `missing_integrations` to render a **Connect** action. The run
  never starts — no tokens spent.
* **No-op fast path.** An empty / absent `integrations_required` does zero
  resolution work — no overhead for routines that don't use it.
* **Fail-open.** If integration-availability resolution itself errors, the run
  is **allowed** (a warning is logged). A bug in resolution must never wedge
  every run of every routine — a forgotten integration is a soft, recoverable
  failure; a hard block on all runs would be a self-inflicted outage.
* **`run` is gated; `dry_run` is not.** A live `run` executes against the
  author crew's agents, so it's gated (fail fast rather than land an
  unrunnable routine). The internal save gate's draft validation applies the
  same integration check. `dry_run` is a preview that invokes nothing, so it's
  left ungated — it shows what the routine *would* need, even integrations the
  crew hasn't connected yet.

<Note>
  Integration availability is resolved from the crew's connected Composio
  connectors (the MCP server rows the bind flow writes). Two limitations:
  under the workspace **default connector** (every agent inherits all connected
  apps), the gate treats integrations as available without enumerating them; and
  resolution reflects what's **wired**, not live connection health (a revoked
  account still reads as available until its binding is removed).
  `credentials_required` remains **declared-only** for now — it is not yet
  enforced at run time the way `integrations_required` is.
</Note>

## Capability manifest

Every routine has a derived **capability manifest** — the full "what this
routine touches" blast radius. The detail API (`GET .../routines/{id}`) returns
it under a `manifest` member so the UI can render a data-flow diagram and
governance can reason about the whole footprint of a run, not just its visible
steps.

Most of the manifest is **auto-derived** from the DSL — you don't declare it:

| Manifest field          | Derived from                                                                                            |
| ----------------------- | ------------------------------------------------------------------------------------------------------- |
| `integrations`          | `integrations_required` (normalized)                                                                    |
| `egress`                | `egress_targets` **plus** any host parseable from `http` step URLs (templated `{{ }}` URLs are skipped) |
| `credentials`           | `credentials_required`                                                                                  |
| `agents`                | `agent_slug` of every `agent_run` step                                                                  |
| `routines`              | `pipeline_slug` of every `call_pipeline` step                                                           |
| `tools`                 | `code` step runtimes **plus** declared `resources.tools`                                                |
| `datastores`            | declared `resources.datastores`                                                                         |
| `has_http` / `has_code` | whether any `http` / `code` step exists                                                                 |

The walk covers routine-level (`before_all` / `after_all` / `on_failure`) and
per-step (`before` / `after`) lifecycle hooks too — a capability hidden in an
`on_failure` hook is still part of the blast radius. Every list is deduped +
sorted and always rendered as `[]` (never `null`), so the diagram is stable.

### The `resources` block

Two things can't be inferred from the step graph: **datastores** a routine
reads/writes, and **CLI tools/scripts** it runs. In production, `code` steps
aren't wired — agents run scripts (`ansible`, `kubectl`, …) via an `agent_run`
step that shells out, so static analysis can't see them. Declare them so they
show up in the manifest:

```json theme={null}
{
  "name": "deploy-service",
  "resources": {
    "datastores": [
      { "type": "postgres", "name": "main", "note": "writes table runs" },
      { "type": "redis",    "name": "cache" }
    ],
    "tools": [
      { "type": "ansible",  "name": "deploy.yml" },
      { "type": "kubectl" }
    ]
  },
  "steps": [ ... ]
}
```

* `datastores[].type` — **any string**, but use the canonical vocabulary
  `redis | postgres | mysql | mongodb | other` so it matches what the crew's
  container catalog reports (the precondition gate compares `type`
  case-insensitively; a non-canonical type simply won't match a crew resource).
* `tools[].type` — **any string**; canonical examples `ansible | terraform |
  kubectl | bash | python | other`. Same matching rule as datastores.
* `name` / `note` are free-form and optional.

Validation is **lenient**: declaring a resource is always allowed at **save**
time (declaring is a contract, not a provisioning request). Save only rejects
malformed entries — a blank or non-slug `type`, or more than 32 of either kind.
The *run* path enforces availability — see [Required resources](#required-resources).

### Required resources

The `resources` block is a **run-time precondition gate**, the resource sibling
of [Required integrations](#required-integrations): it states what the routine
**requires** (datastores like Postgres/Redis, CLI tools like `ansible`), and the
run path checks it against what the executing crew's container actually **has**
(`[CONTAINER RESOURCES]` — the crew's sidecar datastores + installed tools).

Semantics:

* **Enforced at run time.** Before a run starts, Crewship resolves the **author
  crew's** container resources and compares them to the declared
  `resources.datastores` + `resources.tools`. If the routine requires a
  datastore or tool the crew doesn't have, the run is **blocked** with an
  RFC 7807 Problem Details response, HTTP **422**, carrying a machine-readable
  `missing_resources: [{ kind, type, name }]` member (`kind` is `"datastore"` or
  `"tool"`) and a human `detail` like
  `routine needs datastore postgres, tool ansible, not available to crew "Ops"`.
  The run never starts — no tokens spent — until the resource is provisioned on
  the crew.
* **Matching.** A required **datastore** is satisfied when the crew has a
  datastore of the same engine `type` (case-insensitive; the service `name` is
  advisory and not matched). A required **tool** is satisfied when the crew has
  an installed tool whose name matches the required `type` (the tool's `name`,
  e.g. `deploy.yml`, is the concrete artifact and isn't matched against the
  container).
* **No-op fast path.** A routine that declares no `resources` block does zero
  resolution work. Only the *declared* resources are gated — `code`-step
  runtimes folded into the manifest's `tools` (e.g. `cel`) are internal
  executor runtimes, not crew CLIs, and are never gated.
* **Fail-open.** If the routine has no author crew, or resource resolution
  itself errors, the run is **allowed** (a warning is logged) — same reasoning
  as the integration gate: a resolver bug must never wedge every run.
* **`run` is gated; `dry_run` is not.** (The internal save gate's draft
  validation applies the same resource check.)

### Agents already know what the container has

You don't have to tell an agent that its crew runs Postgres or ships `kubectl` —
Crewship surfaces it automatically. Every agent's system prompt now includes a
`[CONTAINER RESOURCES]` block listing the crew's **datastores** (derived from the
crew's sidecar services — a service named `postgres` is reachable at host
`postgres` on its declared port) and **installed CLI tools** (derived from the
crew's devcontainer features + mise toolchain, e.g. `ansible`, `kubectl`, `git`,
`python`, `node`). The agent is instructed to use these directly instead of
probing or trying to install them. So when you author a routine that needs a
datastore, declare it under `resources.datastores` and connect via the host/port
the agent already sees in that block. The block is omitted entirely when a crew
has no services and no notable tools.

## Step types

### `agent_run`

```json theme={null}
{
  "id": "summarize",
  "type": "agent_run",
  "agent_slug": "tomas",
  "complexity": "fast",
  "prompt": "Summarize: {{ inputs.text }}",
  "model_override": null,
  "timeout_seconds": 60,
  "validation": {
    "schema": { "type": "object", "required": ["title"] },
    "must_not_contain": ["API_KEY=", "Bearer "],
    "min_length": 30
  },
  "on_fail": "escalate_tier"
}
```

`complexity` resolves through the workspace's `execution_tiers_json` mapping into `(adapter, model)`. With `complexity: "fast"`, the agent's CLI gets `--model claude-haiku-4-5-20251001` (or whatever the workspace mapped `fast` to). `model_override` is the explicit pin that wins over `complexity`.

`on_fail` lives at the step level (not inside `validation`) and is one of `escalate_tier | abort | retry_step` — `escalate_tier` walks the fallback chain (e.g., Haiku → Sonnet → Opus) until validation passes or the chain exhausts.

### `call_pipeline`

```json theme={null}
{
  "id": "review_summary",
  "type": "call_pipeline",
  "pipeline_slug": "human-approval-step",
  "nested_inputs": {
    "content": "{{ steps.summarize.output }}",
    "approver_role": "marketing_lead"
  }
}
```

Composition primitive. Save-time cycle detection rejects loops; runtime depth limit caps at 10 levels.

### `http`

```json theme={null}
{
  "id": "fetch",
  "type": "http",
  "http": {
    "method": "GET",
    "url": "{{ inputs.url }}",
    "max_response_bytes": 200000,
    "success_codes": [200],
    "credential_ref": { "type": "API_KEY", "inject_as": "bearer" }
  },
  "timeout_seconds": 30
}
```

**Egress enforcement.** Every `http` step (and `http` hook) passes two host
gates at run time, checked pre-flight AND on every redirect hop
(CheckRedirect callback):

1. **Routine layer — `egress_targets`.** When the routine declares
   `egress_targets`, the request host must be one of the declared hosts or a
   subdomain of one (`api.x.com` matches target `x.com`; `evilx.com` does
   not). A routine that declares **no** `egress_targets` is unrestricted at
   this layer — backward-compatible with every routine that predates the
   field. The SSRF guard (private/link-local IPs, DNS-rebind-safe dialing)
   applies regardless.
2. **Crew layer — network policy.** The authoring crew's network policy
   (`network_mode` + allowed domains — the same dial that governs the crew's
   agent containers) also applies to direct `http` steps. Crews on the
   default `free` mode are unaffected; a `restricted` crew's routines can
   only reach the crew's allowed domains (exact host match, same as the
   container proxy).

A blocked request fails the step with a structured error naming the step,
the host, and which layer denied it — before any bytes leave the server.

**Credential injection.** `credential_ref.type` is resolved at run time
against the workspace credential vault by **type** (case-insensitive match
on the vault type, e.g. `API_KEY`, `GENERIC_SECRET`), never by ID — so a
shared routine runs against any workspace holding a credential of the right
type. Only `ACTIVE` credentials resolve; credentials pinned to another crew
are invisible; when several match, the authoring crew's own credential wins
over workspace-shared ones and the newest wins within each group (rotation).
The decrypted value goes into the outbound request only — never into logs,
the journal, or step output. If nothing matches, the request is sent without
credentials (public endpoints keep working). Injection schemes: `bearer`
(default), `header` with explicit name, `query` with explicit name.

### `wait`

```json theme={null}
{
  "id": "human_approval",
  "type": "wait",
  "wait": {
    "kind": "approval",
    "approval_prompt": "Approve summary?",
    "timeout_sec": 86400
  }
}
```

Three kinds: `approval` (HITL token), `datetime` (sleep until ISO timestamp), `event` (filter on journal events). The DB-backed waitpoint store survives process restart for the `approval` kind: at boot, the run that was parked on the wait step is resumed and re-attaches to the original pending token, so the approval stays answerable (see [Durability and restart recovery](#durability-and-restart-recovery)).

**Async, non-blocking.** Hitting a `kind: approval` gate does **not** hold a goroutine or fail the run. A foreground `crewship routine run` returns promptly with `status: WAITING` (exit 0) and a waitpoint token, and **releases its execution slot** while it waits. Approving **or** rejecting resumes the run from the gate (already-completed steps are restored/skipped); a rejection resolves the run to `FAILED` cleanly rather than stranding it. Parked approvals whose `timeout_sec` elapsed are reconciled at the next boot scan.

Approve via UI Inbox, CLI `crewship routine waitpoints approve <token>`, or API.

### `code`

A `code` step has a `runtime`. Two runtimes are wired today — **`expr`** and **`cel`** — both in-process, pure-Go, token-zero (no container, no LLM, no filesystem, no network). Use `expr` for a single boolean comparison (wake-gate probes); reach for `cel` as soon as you need real logic (booleans, arithmetic, string/list ops) — its own code comments call it out as the general-purpose deterministic primitive.

#### `runtime: expr` (wired, token-zero)

```yaml theme={null}
- id: probe
  type: code
  code:
    runtime: expr
    code: "{{ inputs.spend_usd }} > {{ inputs.threshold_usd }}"
```

`expr` evaluates a **single comparison** and emits `true` or `false`:

* Operators: `>`  `>=`  `<`  `<=`  `==`  `!=`.
* The body is `Render()`-ed first, so `{{ inputs.x }}` / `{{ steps.y.output }}` placeholders substitute before evaluation.
* Anything that isn't a single comparison (multiple operators, function calls, arbitrary code) **fails closed** with a clear error — `expr` is deliberately not a scripting language.

This is the canonical primitive for **agentless** probes and **schedule wake-gates** (emit `true` only when work is needed). See [Wake gates](#wake-gates).

#### `runtime: cel` (wired, token-zero, general logic)

```yaml theme={null}
- id: spike
  type: code
  code:
    runtime: cel
    # Real logic in one deterministic, token-zero step — no {{ }} needed,
    # inputs are exposed as a typed `inputs` map:
    code: 'inputs.spend_usd > inputs.threshold_usd && inputs.region in ["eu", "us"]'
```

`cel` evaluates a [Google CEL](https://github.com/google/cel-go) expression — **non-Turing-complete** (every expression provably terminates), so it keeps the token-zero / no-execution-surface guarantee of `expr` while adding boolean operators (`&&`, `||`, `!`), arithmetic, string ops, list/map membership, ternaries, and field access. Reach for it when `expr`'s single comparison isn't enough. A `bool` result emits `true`/`false`; numeric and string results emit their canonical string form. Compile/eval errors fail closed.

#### `runtime: bash | python | go` (rejected at author time)

```yaml theme={null}
- id: process
  type: code
  code:
    runtime: python
    code: "import json; print(json.dumps({'sum': sum(input_nums)}))"
```

These are schema-legal runtime names but have **no sandboxed runner wired**. As of PR #710 they're no longer "saves-cleanly-then-fails-at-3am": a routine using one is **rejected at save / apply / test\_run time** with a message pointing at the fix (`runtime: expr` or `cel`, or convert the step to `agent_run`).

<Warning>
  `runtime: bash`, `python`, and `go` are **not executable today** and are **rejected when you save, apply, or test-run a routine that uses them** — the error names the offending step and suggests `expr`/`cel` or an `agent_run` conversion. Only `expr` and `cel` are wired.
</Warning>

### `transform`

```json theme={null}
{
  "id": "extract_emails",
  "type": "transform",
  "transform": {
    "input": "{{ steps.fetch.output }}",
    "expression": ".items[] | .email"
  }
}
```

Pure-Go data reshaping with a tiny jq-flavoured subset. No LLM, no network — fully deterministic. Useful for wiring step outputs without paying for another agent\_run.

### Conditional `if`

Any step can carry `"if": "{{ inputs.run_summary }}"`. The placeholder is rendered as a plain string — empty / `false` / `0` / `no` / `off` (case-insensitive) → step skipped and marked `<skipped>` in StepOutputs; anything else counts as truthy. Templates are plain substitution, not expression evaluation, so put the boolean upstream (e.g. set `inputs.run_summary` from the caller) rather than writing `==` / `!=` inside the `if` value.

### DAG with `needs[]`

```json theme={null}
{ "id": "fetch_a", "type": "http", "http": {...} },
{ "id": "fetch_b", "type": "http", "http": {...} },
{ "id": "merge",
  "type": "agent_run",
  "needs": ["fetch_a", "fetch_b"],
  "prompt": "Merge: {{ steps.fetch_a.output }} + {{ steps.fetch_b.output }}"
}
```

Steps with no overlapping `needs` execute in parallel (one goroutine wave per ready set). Final output picks the unique leaf node; for multi-leaf DAGs the first leaf in source order wins.

## Lifecycle hooks (`before_all` / `after_all` / `on_failure`)

Routine-level `hooks` run deterministic side-channel steps around the main execution — a clean home for setup/teardown that isn't part of the visible step graph. Hook steps must be `code` / `http` / `transform` (no `agent_run` — a hook must not recurse or spend tokens):

```yaml theme={null}
hooks:
  before_all:   # setup — its FAILURE aborts the run before any step executes
    id: claim
    type: http
    http: { method: POST, url: "https://ops.example.com/claim" }
  after_all:    # runs after a COMPLETED run — best-effort
    id: release
    type: http
    http: { method: POST, url: "https://ops.example.com/release" }
  on_failure:   # runs after a FAILED run — best-effort cleanup
    id: alert
    type: http
    http: { method: POST, url: "https://ops.example.com/alert" }
```

`after_all` and `on_failure` are best-effort — logged, but they never change the run's outcome. Hooks fire only on the **top-level** run (not nested `call_pipeline` expansions) and are skipped on resume re-entry and dry-run. Per-step `before` / `after` hooks also exist and are included in the [capability manifest](#capability-manifest) walk. Full reference: [Lifecycle hooks](/manifest/routine).

## Per-step overrides (no version bump)

Tweak a single step's `prompt` or `model` without bumping the routine version — the override is applied at run start over the versioned DSL, so the durable, authored definition stays the source of truth while an operator can patch and clear a live behavior quickly:

```bash theme={null}
crewship routine step-override set my-routine summarize \
  --prompt "Summarize in 3 bullets, lead with the risk." --model smart
crewship routine step-override list my-routine
crewship routine step-override clear my-routine summarize
```

Only non-empty fields win — a prompt-only override leaves the authored model untouched. Full reference: [Per-step prompt/model override](/manifest/routine).

## Agentless routines

Declare `"agentless": true` to get a **token-zero guarantee**: the routine can never invoke an LLM, no matter who edits it later. This is what makes high-frequency automation (health checks, metric probes, TLS expiry watches) free to run on a tight cron.

```json theme={null}
{
  "dsl_version": "1.0",
  "name": "cost-spike-probe",
  "agentless": true,
  "egress_targets": ["billing.example.com"],
  "steps": [
    { "id": "fetch", "type": "http",
      "http": { "method": "GET", "url": "https://billing.example.com/today" } },
    { "id": "amount", "type": "transform",
      "transform": { "input": "{{ steps.fetch.output }}", "expression": ".spend_usd" } },
    { "id": "judge", "type": "code",
      "code": { "runtime": "expr", "code": "{{ steps.amount.output }} > 100" } }
  ]
}
```

The probe's verdict comes from data reshaping plus a single comparison, not scripting: the `http` step fetches a JSON status, `transform` projects the number out of it, and the [`expr` code step](#code) emits the boolean `true`/`false` — all token-zero. (A pure `http` + `transform` projection of an already-boolean field works too; reach for `expr` when you need the comparison.)

Enforced at two layers:

* **Save time** — validation rejects `agent_run` (direct LLM spend), `call_pipeline` (the target resolves by slug at runtime, so a nested routine could gain an agent step later and silently break the guarantee), and `eval.online` with `sample_rate > 0` (online grading runs a grader agent against the routine's runs).
* **Run time** — the executor independently refuses to dispatch an LLM-capable step inside an agentless run, so even a definition written before the validator existed fails closed.

Everything else works as usual: egress allowlist, credentials, versioning, dry-run, schedules, webhooks. Agentless routines are also the only valid probes for [wake gates](#wake-gates).

## Two-tier execution

The economic value-prop: an Opus-class authoring model designs the routine, a Haiku-class executor model runs each invocation. Workspace `execution_tiers_json` maps complexity classes to `(adapter, model)`:

```json theme={null}
{
  "trivial":  { "primary": {"adapter":"claude","model":"claude-haiku-4-5-20251001"} },
  "fast":     { "primary": {"adapter":"claude","model":"claude-haiku-4-5-20251001"},
                "fallback":[{"adapter":"claude","model":"claude-sonnet-4-6"}] },
  "moderate": { "primary": {"adapter":"claude","model":"claude-sonnet-4-6"} },
  "smart":    { "primary": {"adapter":"claude","model":"claude-opus-4-7"} }
}
```

Per-step `complexity` annotation drives the resolver. With `on_fail: "escalate_tier"`, a failed validation walks the fallback chain — practically: Haiku tries first, Sonnet on validation fail, Opus on second fail.

<Info>
  **Tier override at runtime.** The CLI flag `--model <model>` is constructed from the resolved tier and passed to the agent's CLI adapter, so a routine's `complexity: "fast"` actually fires Haiku, not the agent's default. `CLIAdapter` is preserved (so the agent's CLAUDE\_CODE / GEMINI\_CLI / etc. wiring stays intact); only the model name swaps.
</Info>

## Save validation gate

Save endpoints (sidecar `/pipelines/save`, user `/api/v1/workspaces/{ws}/pipelines/save`, internal `/api/v1/internal/pipelines/save`) require the routine to clear a validation gate before it persists. The gate is a **dry-run validation** of the draft, not a real execution — there is no "test run" mode (you cannot run an agent dry). The sidecar agent-authoring flow forwards the draft to `/api/v1/internal/pipelines/test_run`, which parses, schema-validates, and dry-runs it (rendering every template, invoking no agent); on success it sets `last_test_run_passed` and forwards to save.

This is the **self-improvement loop**: an authoring agent that writes brittle DSL gets a structured failure report it can read and revise from. Without the gate, MVP would ship pipelines that pass schema but fail at runtime. Real execution happens on the first live `run`, and risky routines are human-reviewed (governance) before they go live.

<Tip>
  `skip_test_gate: true` is honored only when the caller's role is OWNER or ADMIN; lower roles get 403. Useful for hand-crafted DSL from known-good templates (the seed flow uses it).
</Tip>

## Governance — agent proposes, human approves the risky ones

Routines have a lifecycle `status` (migration v128): **`active`** (live + runnable), **`proposed`** (awaiting approval), or **`disabled`** (admin airbag). The save validation gate still applies on top — `status` is an *additional* gate.

### Maker-checker on save

When a routine is saved (by an agent via the sidecar, or by a user via the UI/CLI), Crewship classifies it. A save is **risky** if **any** of these hold:

* it declares an `integrations_required` the author crew can't currently satisfy (the same resolver the [run gate](#required-integrations) uses);
* it has any `http`/egress step (or routine-level `egress_targets`);
* it has any `code`-runtime step;
* it declares `credentials_required`.

Otherwise it's **safe** — only `agent_run` / `transform` / `call_pipeline` / `wait` steps over satisfiable integrations, no egress, no credentials.

* **Safe → `active`.** Goes live immediately, exactly as before.
* **Risky → `proposed`.** The routine is persisted but **not runnable**, and a **blocking inbox item** is raised for `MANAGER+` (the same Inbox surface as proposed skills). Approve it to go live, or reject it.

A `proposed` (or `disabled`) routine refuses `run` / `run_batch` with `409 Conflict` — `"routine is awaiting approval"` or `"routine is disabled"`. `dry_run` always previews a saved routine, so it's never blocked.

<Note>
  **OWNER/ADMIN escape hatch — `skip_governance_gate`.** Symmetric with `skip_test_gate`: passing `"skip_governance_gate": true` on the user save (`POST /api/v1/workspaces/{ws}/pipelines/save`) forces a risky definition live as `active` and raises **no** review item. Honored **only** for OWNER/ADMIN (lower roles get `403`); it is deliberately **not** available on the agent/sidecar save path (`InternalSave`), so agent-authored risky routines are always reviewed. This is what the `crewship seed` flow uses so a freshly-seeded workspace's hand-curated starter routines are immediately runnable instead of stuck "awaiting approval". Use it only for DSL you trust.
</Note>

### Approve / reject (MANAGER+)

```bash theme={null}
crewship routine approve <slug>   # status → active, resolves the inbox item
crewship routine reject  <slug>   # removes the routine, resolves the inbox item
```

* `POST /api/v1/workspaces/{ws}/pipelines/{slug}/approve` — `MANAGER+`. Flips to `active`.
* `POST /api/v1/workspaces/{ws}/pipelines/{slug}/reject` — `MANAGER+`. Soft-deletes the proposed routine.

### Disable / enable (OWNER/ADMIN airbag)

```bash theme={null}
crewship routine disable <slug>   # status → disabled; cancels in-flight runs
crewship routine enable  <slug>   # status → active
```

* `POST /api/v1/workspaces/{ws}/pipelines/{slug}/disable` — `OWNER/ADMIN`. Flips to `disabled` **and cancels any in-flight runs** of that routine immediately.
* `POST /api/v1/workspaces/{ws}/pipelines/{slug}/enable` — `OWNER/ADMIN`. Returns it to `active`.

List responses (and `crewship routine list`) carry `status`; filter the queue with:

```bash theme={null}
crewship routine list --status proposed
```

## Triggers

### Cron schedules

```bash theme={null}
crewship routine schedules create --slug summarize-text \
    --name "daily-summary" --cron "0 9 * * *" --timezone "Europe/Prague" \
    --inputs '{"text":"…default"}'
```

5-field cron expression. Scheduler runs in-process and ticks every 30s, so minimum resolution is 1 minute.

<Warning>
  Single-instance only — running multiple replicas would double-fire (no leader election yet).
</Warning>

### Wake gates

A plain cron fires the full routine — including its LLM steps — on every tick, even when there is nothing worth the model's attention. A **wake gate** fixes that: the schedule references an [agentless](#agentless-routines) probe routine, the scheduler runs the probe first on each tick (free of LLM spend by the agentless guarantee), and the main routine fires only when the probe's **final output is truthy**. Same falsy rule as step `if:` conditions — empty, `false`, `0`, `null`, `nil`, `no`, `off` (case-insensitive) skip the tick; anything else wakes the routine.

```bash theme={null}
# LLM only runs when the probe prints something truthy:
crewship routine schedules create --slug cost-report \
    --cron "*/15 * * * *" \
    --wake-slug cost-spike-probe --wake-inputs '{"threshold":"100"}'

# Drop the gate later (fire on every tick again):
crewship routine schedules update <schedule_id> --no-wake
```

This gives schedules a cost ladder: **agentless schedule** (always free) → **wake-gated schedule** (probe free, LLM only on signal) → **plain schedule** (today's default, unchanged).

A freshly-seeded workspace ships a working example: the *Demo: feed watch — wake on change* schedule runs the agentless `feed-watch-probe` every 15 minutes and only wakes `feed-change-report` (an agent routine) when the watched feed drifts from its baseline. Point the probe's `url`/`expected_items` inputs at your own endpoint to make it real.

Semantics worth knowing:

* The probe must be `agentless: true`, live in the same workspace, and can't be the schedule's own routine — all validated when the schedule is saved.
* **Probe errors fail open**: a broken or deleted probe wakes the main routine instead of going silently blind, and records `last_wake_status: ERROR` so you can see the probe needs fixing.
* A skipped tick advances `next_run_at` but leaves `last_run_*` untouched — run telemetry stays strictly about main runs. Wake telemetry lives in `wake_check_count` / `wake_fire_count` / `last_wake_at` / `last_wake_status`, and `routine schedules list` shows it as `<probe> woke/checked` in the WAKE column.
* Probe executions are regular runs with `triggered_via: wake_check`, so they're auditable in run history and filterable out of dashboards.

### Webhooks

```bash theme={null}
crewship routine webhooks create --slug pr-review-structured \
    --hmac-secret "$(openssl rand -hex 32)" --rate-limit 30
```

Output reveals the public URL and signing secret **once** (Stripe-style). External services POST event payload to `/api/v1/webhooks/{token}`. With HMAC, sender includes header `X-Crewship-Signature: sha256=<hex_hmac_of_body>`, validated server-side via `hmac.Equal` (timing-safe). Rate limited per token, per minute, default 60.

**Delivery is asynchronous.** The endpoint verifies the signature, rate limit, governance status, and the routine's `concurrency_key` gate synchronously, reserves a run id, then answers immediately:

```json theme={null}
HTTP/1.1 202 Accepted
{"run_id": "run_c...", "status": "PENDING", "deduped": false}
```

The routine executes in the background under the returned `run_id`, so senders with short delivery timeouts (GitHub \~10s, Stripe \~5s) never time out on long agent runs — and a sender closing the connection early **cannot** cancel an in-flight run: the run's context derives from the server lifecycle, not the HTTP request. Poll the handle for the outcome:

```bash theme={null}
crewship routine logs <run_id>          # persisted run state (status, step outputs, error)
crewship routine logs <run_id> --full   # full step-by-step journal timeline
# or: GET /api/v1/workspaces/{wsId}/pipeline-runs/{runId}
```

Redelivered events dedupe **synchronously**: a replay (same `Idempotency-Key` / `X-Crewship-Event-ID`, or identical bytes within the dedupe window) answers `202` with `"status": "DEDUPED"` and the **original** run's id — the routine executes exactly once. A `proposed`/`disabled` routine answers `409` (policy block, nothing dispatched). A delivery arriving while the routine's `concurrency_key` gate is at capacity answers `429` with a `Retry-After` header **before** anything is dispatched — a 429 never consumes the idempotency key, so redelivering the same event later executes it normally. Runs that hit a `wait: approval` step park as `WAITING` and resume once the waitpoint is approved, exactly as with other triggers.

<Warning>
  The signing secret is shown **once**. To rotate it: delete the webhook + create a new one. There is no in-place rotation by design.
</Warning>

### Manual

`crewship routine run <slug> --inputs '{...}'` or click the Run button in the UI detail panel. Same execution path.

After you click **Run** or **Test run** in the UI, a live **Run activity** rail appears inline in the detail panel showing the just-started run step by step (started → each step → completed/failed) — so status is visible immediately without switching to the Runs tab. Full run history stays in the **Runs** tab; see the [Activity](/guides/activity#run-activity-rail) guide for the rail, and the toolbar Activity Bar for a workspace-wide "what's running now" view.

<Note>
  The **Test run** button calls the public, JWT-authed `test_run` endpoint (`POST /api/v1/workspaces/{workspaceId}/pipelines/test_run`). It **validates a draft** — parse + `Validate` + the integration and resource preconditions + a `dry_run` pass (no agent is invoked; you can't run an agent "dry") — and on success mints an HMAC **`save_token`** bound to (workspace, definition hash, user). **Save** verifies that token, so a draft can't be saved as "test passed" unless it actually passed `test_run`. The UI button and the CLI both use this endpoint. To preview a *saved* routine instead, use **Dry run** — it walks the saved definition, renders templates, and returns the declared **manifest** (the blast radius) without invoking anything.
</Note>

### Deferred dispatch: delay, ttl, debounce, priority

A triggered run can be parked instead of firing immediately — useful for "run 60s from now" scheduling or for coalescing a burst of near-duplicate triggers into one run:

```bash theme={null}
# fire 60s from now, expire if not dispatched within 5 min, high priority
crewship routine run my-routine --delay 60 --ttl 300 --priority 9

# coalesce a burst: repeated triggers sharing a key collapse into one run
crewship routine run my-routine --debounce-key vendor-42 --debounce-window 30 --debounce-max 300

crewship routine pending list            # not-yet-fired deferred triggers
crewship routine pending cancel <id>     # cancel before it fires
```

An in-process dispatcher (5s tick) fires due rows highest-priority-first and expires rows past their `ttl`. Immediate runs (no `--delay` / `--debounce-key`) are unaffected. Full reference, including the underlying API fields: [Deferred dispatch](/manifest/routine).

## Dry-run preview

Two execution modes, distinguished by `Mode` in the request body and surface:

| Mode      | Side effects                            | Increments invocation\_count | Cost      |
| --------- | --------------------------------------- | ---------------------------- | --------- |
| `run`     | yes (agents called) — production        | yes                          | real      |
| `dry_run` | no (templates rendered, agents skipped) | no                           | estimated |

There is **no `test_run` mode**: you cannot run an agent "dry" — it executes arbitrary scripts (bash, ansible, curl) whose side effects can't be intercepted — so a real run is always `run`. The agent-authoring save gate validates a draft via a `dry_run` (structure + templates), not a real execution.

Dry-run is the safe "what would this routine do?" preview, and an honest **static plan — not a proof the run will succeed**. It walks the DSL, renders all template substitutions against the supplied inputs, resolves each step's execution tier (adapter + model), and reports a `would_execute` list with per-step estimated cost. It also returns the routine's declared **`manifest`** — the full blast radius (`integrations`, `egress`, `credentials`, `agents`, `routines`, `datastores`, `tools`, `has_http`, `has_code`) — so the UI can show "would use: ansible, Postgres, discord.com, agent jordan". (A definition that no longer parses leaves `manifest` null and still returns the report.) No agents are invoked; no journal entries beyond a single `pipeline.dry_run` audit row are written.

```bash theme={null}
crewship routine dry-run email-fetch --inputs '{"since":"yesterday"}'
```

In the UI: click **Dry run** in the routine detail panel. The `would_execute` report renders inline above the tab bar with per-step:

* Step ID + type
* Resolved `tier_adapter:tier_model` (e.g. `claude:claude-haiku-4-5`)
* Estimated cost in USD (order-of-magnitude only — the executor uses a flat token-density heuristic, not real pricing)
* `would_call_agent` / `would_call_pipeline` target

The estimate is intentionally labelled "estimated" everywhere it surfaces — it's a planning aid, not a quote. Real cost only lands once you switch to `run`.

## Versioning + rollback

Every save creates a new immutable row in `pipeline_versions` (v79 migration). The `pipelines.head_version` column points at the current. Rollback creates a NEW version on top of HEAD whose definition equals the target's:

```bash theme={null}
crewship routine versions email-fetch
crewship routine rollback email-fetch --to 3
```

History is preserved — you can roll forward to a future version by another rollback. There is no "delete version" — if a version was bad, the trail of "v3 → v4 (rollback to v2) → v5 (fix)" is the audit story you keep.

## Bundle export / import

Routines are portable across workspaces:

```bash theme={null}
# Export from workspace A
crewship routine export email-fetch --include-history > email-fetch.json

# Import into workspace B
cat email-fetch.json | crewship routine import
```

The bundle format is `crewship-pipeline-bundle/v1`: routine row + (optionally) the full version chain + change\_summary annotations. Author identity is rewritten on import so the importing user becomes the new author. Slug is preserved; if it conflicts in the destination workspace the existing row updates (new version), or you change the bundle's slug before import.

## HITL waitpoints

A routine that includes a `wait` step of `kind: approval` parks the run on a DB-backed waitpoint — **without** holding a goroutine or an execution slot. The triggering `crewship routine run` returns immediately with `status: WAITING` and the token (see the [`wait` step](#wait)). Operators decide via three paths:

```bash theme={null}
# CLI
crewship routine waitpoints list
crewship routine waitpoints approve <token> --comment "LGTM"
crewship routine waitpoints reject  <token> --comment "needs revision"

# UI: /routines → routine → Wait sub-tab → click Approve / Reject
# API: POST /api/v1/workspaces/{ws}/pipelines/waitpoints/{token}/approve
```

When you trigger a run from the routine detail page and it parks on an approval
gate, you don't have to leave the page: the **Run activity** panel switches its
status to amber **"Waiting for approval"**, pins the parked step in the
timeline, and shows an inline **Approve / Reject** banner (with an optional
comment) for that run's waitpoint. The same pending item also surfaces in the
top-bar Inbox bell and the left-sidebar Inbox badge, so an approval waiting on
you is visible whether or not you're looking at the routine. The workspace-wide
**Wait points** tab and `/inbox` remain the places to act on approvals for runs
you didn't just start.

The decision comment is forwarded to the parked run as the wait step's output, so downstream steps can read approval rationale via `{{ steps.<wait_step_id>.output }}`.

## Durability and restart recovery

Run state is persisted to the `pipeline_runs` table at every step boundary: when a step starts, `current_step_id` is stamped; when it completes, the full step-outputs map and accumulated cost are flushed. A hard kill (crash, OOM, `kill -9`) therefore loses at most the step that was in flight.

At boot, the server scans for runs left in `queued`/`running` from the previous process lifetime and **resumes them from the next unfinished step**:

* **Completed steps are restored, not re-executed.** Their outputs feed downstream `{{ steps.X.output }}` templates exactly as if the process had never died.
* **The in-flight step re-executes from scratch** — at-least-once semantics. For an `agent_run` step this means the agent call is re-issued (and re-billed); `http`/`code` steps with external side effects may fire twice. Design steps to be idempotent where that matters.
* **Runs parked on a `wait` approval step re-attach to the original waitpoint token.** No duplicate approval card is created; the pending inbox item stays answerable across the restart, and approving it resumes the run.
* **DAG runs (`needs:`) resume at wave granularity** — the parallel scheduler flushes state when each wave completes, so a kill mid-wave re-executes that wave's unfinished steps.
* **`call_pipeline` boundaries are NOT persisted.** A kill while a nested pipeline is executing re-runs the **entire nested pipeline** on resume — the parent's `call_pipeline` step is the unit of recovery, and the nested run's own per-step progress is not checkpointed. Keep nested pipelines short or idempotent if a mid-flight kill matters to you.
* The accumulated cost is restored too, so `max_cost_usd` keeps counting across the restart instead of resetting. **Caveat:** cost is flushed at step boundaries, so whatever the killed in-flight step had already spent before the kill is **not** in the restored total — the cap under-counts the true spend by up to one step's worth (and the re-executed step is billed again on top).
* **A resumed run that finds its concurrency slot occupied waits and retries** with capped exponential backoff (2s doubling up to 60s) instead of failing — losing the slot race to a freshly-fired scheduled run is a timing collision, not a reason to abandon hours of restored work. If the server shuts down while a run is still waiting for its slot, the row stays in-flight and the next boot resumes it again.
* **A waitpoint that timed out while the process was down** resumes, observes the expired token, and fails with `wait step "X" (approval) timed out` — distinct from an operator clicking deny (`… denied`).

When persisted state is insufficient to resume safely, the run is stamped `interrupted` instead — never silently dropped, never wrongly resumed. Fallback triggers: the pipeline row is gone or no longer parses, the definition changed since the run started (content-hash mismatch — this catches in-place edits even when every step id survives, not just renamed/removed steps), unreadable persisted inputs/outputs, or a non-resumable mode (only live `run` rows resume; a `dry_run` preview row from a previous lifetime is never re-run). The reason lands in the run's `error_message`.

Graceful shutdowns are different: an in-flight run cancelled by shutdown is finalized as `cancelled` (a terminal state) and is **not** resumed at next boot. Resume targets hard kills, where no terminal write could land.

Set `CREWSHIP_PIPELINE_RESUME=off` to disable resume and restore the older stamp-everything-`interrupted` behaviour — useful if a crash loop would otherwise re-burn the in-flight agent step's tokens on every restart. Default is on.

## Online eval sampler

The online sampler watches completed routine runs and grades a configurable percentage of them through the existing rubric grader so production traffic continuously feeds the drift detector — not just on-demand replays or scheduled regression suites.

Per-routine DSL:

```yaml theme={null}
name: nightly-summary
eval:
  online:
    sample_rate: 0.05              # grade 5% of runs; 0 disables, 1.0 grades every run
    grader_agent_slug: qa-grader   # references an agent in the AUTHOR crew with a rubric prompt
```

What happens on a tick (default cadence: every 1 minute):

1. Scan `pipeline_runs WHERE status = 'completed' AND completed_at > watermark`.
2. For each candidate, resolve the routine DSL. If `eval.online` is absent or `sample_rate <= 0`, skip.
3. Draw from `crypto/rand`. If the sample lands above `sample_rate`, skip.
4. Otherwise, INSERT into `eval_runs` with `kind = 'online'`, `status = 'queued'`. The existing grader worker picks it up and writes the result back.

| Field               | Notes                                                                                                                                                                                            |
| ------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `sample_rate`       | Float `[0, 1]`. `0` disables — useful as an incident "pause grading" toggle. `1.0` is expensive but ok for newly-launched routines while calibrating the grader. Realistic prod default: `0.05`. |
| `grader_agent_slug` | Required when `sample_rate > 0`. Missing grader is treated as a deterministic skip (logged once, no retry storm).                                                                                |

**Correctness guarantees:**

* **Schema-layer idempotency** — `eval_runs` has a partial `UNIQUE INDEX (pipeline_run_id) WHERE kind='online'`. A duplicate sampler instance or a crash-recovery watermark replay can attempt the same enqueue twice; the second collapses to a no-op rather than queueing twice and double-billing the grader.
* **`(completed_at, id)` tuple cursor** — parallel fan-out steps that complete at the same nanosecond all get graded; a timestamp-only cursor would orphan siblings.
* **Stuck-on-error watermark** — a transient per-row error (resolver outage, entropy outage, enqueue conflict) freezes the watermark at the row before the failure so the next tick retries. Deterministic skips (no eval config, sample roll missed) advance normally.
* **Trace correlation** — the eval row carries `routine_slug` + `pipeline_run_id` so an operator clicking a low-scoring grade in the eval UI lands on the actual trace via `pipeline_run_id` -> [trace](/guides/tracing).

**Limitations:**

* Watermark is in-memory only. On process restart it resets to `now - 1h`; outages longer than that leave a gap in grading coverage. The UNIQUE index keeps reprocessing harmless.
* Page cap is 500 rows/tick. A workspace processing > 500 routine runs/minute at `sample_rate = 1.0` will accumulate backlog; the watermark only advances past successfully-handled rows, so the backlog drains across subsequent ticks rather than being lost.

## Validation gates and credential leak guards

Each step's `validation` block runs after the step output materializes. The schema is a JSON Schema draft 2020-12 subset (most of `type`, `required`, `properties`, `items`, `pattern`, `format`, `enum`, etc.) plus three Crewship extensions:

* `must_not_contain: ["API_KEY=", "Bearer "]` — output must include none of these substrings
* `must_contain: ["##"]` — output must include all of these
* `min_length` / `max_length` — convenience for non-JSON outputs

The `must_not_contain` gate is the credential-leak tripwire: if an agent is about to leak a real API key in its output, the gate fails the step before downstream consumers see it. Pair with `on_fail: "abort"` (set at the step level, alongside `validation`) for hard stop, or `escalate_tier` if you want the higher model to retry without the leak.

## Observability

Every routine run emits a sequence of journal entries:

* `pipeline.run.started` — once at run begin
* `pipeline.step.started` — per step
* `pipeline.step.completed` / `pipeline.step.failed` / `pipeline.step.validation_failed` — per step terminal
* `pipeline.run.completed` / `pipeline.run.failed` — once at run end

Plus WebSocket broadcast on the workspace channel for live UI updates (PipelineRunNode in the orchestration graph + Runs sub-tab waterfall both subscribe).

### Live visibility — what is a routine doing right now?

While at least one routine run is in flight, the dashboard surfaces it from anywhere in the app:

* **Header chip** — a pulsing "N routines running" pill appears in the toolbar next to the Online / Crews pills (hidden when nothing is active). If any run is parked on a human approval it turns amber and appends "· M awaiting approval". Clicking opens a popover with the six newest active runs — routine name, short run id, elapsed time, cost so far, current step — each with **Open trace ↗** (deep-link to `/activity?run=<id>`), **Cancel** (same manage-tier RBAC as the Runs tab), and a **Review →** shortcut into the routine for parked runs. With more than six active runs a "View all N running →" footer jumps to the Activity rail pre-filtered to the active bucket (`/activity?status=active`).
* **`/routines` sidebar** — a routine with an active run gets a pulsing blue dot and a sub-line showing the current step and elapsed time (`▶ ask-casey · 0:12`); a parked run shows the amber `⏸ awaiting approval` variant.
* **`/routines` list table** — the status cell swaps the historical "last run" pill for a live **Running** (or amber **Awaiting approval**) pill with current step · elapsed · cost, and live routines bubble to the top of the table regardless of the chosen column sort.

All three surfaces share one workspace-scoped subscription (`GET /api/v1/workspaces/{ws}/pipeline-runs?status=active` + the `pipeline.run.*`/`pipeline.step.started` events, 3s poll while anything is active). `status=active` bundles `running`, `queued`, `paused` **and `waiting`** — the status a run parks in while a [HITL waitpoint](#hitl-waitpoints) awaits a decision.

### Run warnings

`before_all`/`after_all`/`on_failure` lifecycle hooks run best-effort: a failing `after_all` or `on_failure` hook (a teardown step like credential-release or cost-meter-close) never flips the run's terminal status — a `before_all` failure is different and fails the run outright, since nothing downstream ran. A failed `after_all`/`on_failure` hook is instead recorded as a structured **warning** on the run so it doesn't silently vanish into server logs while the run reports `completed`.

Fetch it via `GET /api/v1/workspaces/{ws}/pipeline-runs/{runId}` — the response's `warnings` array (always present, empty when there are none) has one entry per failed hook:

```json theme={null}
{ "stage": "hook after_all", "message": "credential release step timed out", "at": "2026-07-02T12:01:05Z" }
```

`crewship routine logs <run_id>` (the slug-free state lookup) prints a `Warnings:` section when the run has any, alongside the existing `Error:` line for the run's own terminal status.

### Per-step cost + duration

Both the UI Runs sub-tab waterfall and `crewship routine logs <run_id> --slug X` surface the `cost_usd` and `duration_ms` fields the executor stamps on every `pipeline.step.completed` event. Same data, two presentation surfaces:

* **UI**: right-aligned columns next to each step row, with a footer total summing the run. An em-dash (`—`) renders for events that don't carry cost (`pipeline.step.started`, `.failed`, live-only echoes) — easier to scan than `$0.0000` next to real values.
* **CLI logs**: extra `DURATION` and `COST` columns in the timeline output. Same em-dash rule for non-positive values.

Worker tier escalation is visible too — a failed validation gate that retries on a higher tier emits a second `step.started`+`step.completed` pair with its own cost on the retry, so the column-summed footer reflects the *full* spend (including retries), not just the first attempt.

### Run tags & metadata

Tag and annotate a run at invoke time to make it filterable later:

```bash theme={null}
crewship routine run cost-spike-probe \
  --tag prod --tag billing \
  --metadata '{"source":"manual","ticket":"OPS-42"}'
```

**Tags** are workspace-scoped labels (max 10/run, lowercased) surfaced on the run detail and usable as a filter (`crewship routine runs <slug> --tag prod`); replays inherit the source run's tags. **Metadata** is a free-form JSON object stored on the run and returned by `GET /pipeline-runs/{id}` — set at invoke time today (mid-run mutation and `{{ run.metadata.X }}` templating are a follow-up).

### Replay & error fingerprinting

Failed runs are bucketed by a stable **error fingerprint** (failing step + normalized message), so recurring failures group together instead of scrolling past one-by-one in run history:

```bash theme={null}
crewship routine replay <run_id>                   # re-invoke with the run's captured inputs
crewship routine errors                             # list fingerprint groups
crewship routine bulk-replay --fingerprint <fp>      # replay every run in a fingerprint group after a fix
```

A replay is stamped `is_replay=true` + `replay_of=<run_id>`; gate a step on `{{ env.is_replay }}` to skip a side effect (e.g. a notification) on replay. Full reference: [Run observability: tags, metadata, replay, errors](/manifest/routine).

Three terminal-side observability surfaces, ordered by when you'd reach for each:

| Phase          | Command                                   | Why                                                                                                                                                                                                                    |
| -------------- | ----------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Before run** | `crewship routine doctor <slug>`          | Preflight ✓/⚠/✗ checklist — catches missing crew provisioning, agent slug typos, missing credentials, contradictory validation gates, tight cost caps. Cheap to run, returns in milliseconds, fails CI builds on FAIL. |
| **During run** | `crewship routine watch <slug>`           | Polls the runs endpoint and prints events as an ANSI-coloured timeline. `--json` for JSON Lines piping into `jq`; `--once` exits after the first run terminates (CI-friendly).                                         |
| **After run**  | `crewship routine logs <run_id> --slug X` | One-shot post-mortem dump — every step's prompt, output, validation verdict, cost. Use to investigate a specific failure surfaced by `watch` or by the UI.                                                             |

For variance characterisation across many runs (is this routine production-ready at this tier?), use [`crewship routine bench`](/cli/routine#crewship-routine-bench-slug). For matrix-level cross-tier consistency (which scenarios diverge between Haiku and Opus?), use [`crewship eval scenarios`](/cli/eval#crewship-eval-scenarios) and [`crewship eval baseline diff`](/cli/eval#baseline-diff-name) for CI regression gates.

## RBAC

| Role    | Read | Run | Save | Schedules / Webhooks | `skip_test_gate` |
| ------- | :--: | :-: | :--: | :------------------: | :--------------: |
| VIEWER  |   ✓  |  —  |   —  |           —          |         —        |
| MEMBER  |   ✓  |  ✓  |   —  |           —          |         —        |
| MANAGER |   ✓  |  ✓  |   ✓  |           ✓          |         —        |
| ADMIN   |   ✓  |  ✓  |   ✓  |           ✓          |         ✓        |
| OWNER   |   ✓  |  ✓  |   ✓  |           ✓          |         ✓        |

## Common patterns

### "Run this routine every day at 9 AM"

```bash theme={null}
crewship routine schedules create --slug daily-status-digest \
    --cron "0 9 * * *" --timezone "Europe/Prague"
```

### "Trigger this routine from a GitHub Actions hook"

```bash theme={null}
crewship routine webhooks create --slug pr-review-structured \
    --hmac-secret "$(openssl rand -hex 32)" --rate-limit 30
# Use the printed token + secret as Settings → Webhooks in the GH repo.
```

### "Validate routine DSL in CI before committing"

```yaml theme={null}
- run: crewship routine validate routines/*.json
```

Exit code 1 on any invalid file.

### "Discover what an agent is about to do, without running it"

```bash theme={null}
crewship routine dry-run summarize-text --inputs '{"text":"..."}'
```

Returns a `would_execute` report with which agent, which tier, the rendered prompt, and an estimated cost. Zero side effects.

### "Watch a metric on a tight cron, spend tokens only on a spike"

Author an [agentless](#agentless-routines) probe that prints `true`/`false`, then attach it as a [wake gate](#wake-gates):

```bash theme={null}
crewship routine schedules create --slug incident-summary \
    --cron "*/5 * * * *" --wake-slug error-rate-probe
```

The probe runs every 5 minutes for free; the LLM routine fires only on the ticks where the probe's output is truthy. `routine schedules list` shows how often the gate woke vs. checked.

### "Cancel an in-flight run"

```bash theme={null}
crewship routine cancel <run_id>
```

Signals the run goroutine; it stops at the next safe point and emits `pipeline.run.failed` with reason "cancelled".

### "Serialise runs per tenant / customer / repo"

Use `concurrency_key` with a template referencing the tenant-identifying input:

```json theme={null}
{
  "concurrency_key": "{{ inputs.account_id }}",
  "max_concurrent": 1,
  "inputs": [
    { "name": "account_id", "type": "string", "required": true }
  ]
}
```

Two requests for the same `account_id` queue rather than fan out; requests for different `account_id`s run in parallel. Pair with the `Idempotency-Key` header for webhook handlers (see the [Concurrency + idempotency recipe](/guides/routines-cookbook#recipe-5-idempotent-routine-with-concurrency-key)).

The platform **fails fast** if the rendered key is empty (missing input + no literal prefix in the template) — see [Troubleshooting](#run-returns-500-pipeline-concurrency_key-rendered-to-empty-value) for why and how to fix.

## Troubleshooting

<Tip>
  **Before going through this list:** run `crewship routine doctor <slug>` first. Most "blind alley" failures (missing crew provisioning, agent slug typo, missing credential, contradictory validation gate, cost cap too tight) surface as a ✗ check on doctor before they ever cost an LLM call.
</Tip>

<AccordionGroup>
  <Accordion title="Save returns 422 &#x22;save requires a fresh, passing test_run&#x22;">
    The save endpoint requires the validation gate cleared. The server validates the DSL on save (parse + schema + cycle detection); to clear the residual gate, send `"last_test_run_at"` (RFC3339, within the last 5 minutes) + `"last_test_run_passed": true` in the `/save` body. This is the body-trust path `crewship routine save` uses, and it mirrors the sidecar agent-authoring flow which sets the same fields after the internal dry-run validation. There is no public `test_run` endpoint to mint a token from — a real run can't be done "dry", so a real run is reserved for the first live `crewship routine run`.

    Alternative: `crewship apply --skip-test-gate` (CLI — the flag lives on `apply`, not `routine save`) / `"skip_test_gate": true` (API) if your role is OWNER/ADMIN and you trust the DSL — bypasses the gate explicitly.
  </Accordion>

  <Accordion title="Run starts but no step events appear in the UI">
    Check that `?include_steps=1` is in the runs URL the UI fetches. The list endpoint defaults to run-level only to keep payload bounded; the detail panel passes the flag explicitly. After a refresh, the waterfall populates from journal entries plus live WebSocket events.
  </Accordion>

  <Accordion title="Schedule shows enabled=true but never fires">
    The scheduler ticks every 30s; with single-binary deployment, restarting the server resets the in-memory tick cursor. Pending schedules whose `next_run_at` has passed will fire on the next tick after restart. If you've edited the cron expression, `next_run_at` recomputes from `now()` — so a `0 9 * * *` schedule edited at 10:30 won't fire until 9 AM tomorrow.
  </Accordion>

  <Accordion title="Webhook returns 401 / 403 on a known-good token">
    HMAC mismatch: check the `X-Crewship-Signature: sha256=<hex>` header, computed as `HMAC-SHA256(signing_secret, request_body)` over the **raw bytes** the sender sent (not a re-serialized form). The server uses `hmac.Equal` for comparison so timing-safe.
  </Accordion>

  <Accordion title="Run cost is higher than estimated_cost_usd">
    Two-tier escalation walks the fallback chain. With `on_fail: escalate_tier`, a failed Haiku run retries on Sonnet (5-10× more expensive), then Opus (20-50× more expensive) before giving up. Tighten validation gates (loosen `must_contain`, raise `min_length`), or set `max_cost_usd` on the routine to abort the run between steps when a cost ceiling is hit.
  </Accordion>

  <Accordion title="&#x22;Pipeline X step Y stuck in pending state&#x22;">
    Since resume-from-step landed, a restart between the `wait` step starting and a decision arriving re-attaches the run to its pending waitpoint at boot — approving via `crewship routine waitpoints approve <token>` resumes it. If the run shows `interrupted` instead, its persisted state was insufficient to resume (the reason is in the run's `error_message`); the orphaned waitpoint can still be listed and rejected to clear the inbox. The boot log lines `pipeline boot recovery done (resume-from-step) resumed=N interrupted=M` and `pipeline waitpoint store wired (...) stranded_pending=N` show what recovery did.
  </Accordion>

  <Accordion title="Run returns 500 &#x22;pipeline: concurrency_key rendered to empty value&#x22;">
    The DSL declared a non-empty `concurrency_key` template but the rendered value is an empty string — typically because a referenced input was omitted at trigger time. Full error message:

    ```
    pipeline: concurrency_key rendered to empty value (referenced input missing or empty): template "{{ inputs.<NAME> }}"
    ```

    Why fail-fast: a routine that declares `concurrency_key: "{{ inputs.account_id }}"` is asking the platform to serialise runs per tenant. If `account_id` is missing, treating the empty key as "no gate" would silently allow unlimited parallelism for a routine the author explicitly asked us to serialise — a denial-of-self by misconfiguration. The executor refuses to start the run.

    Fixes, in order of preference:

    1. **Supply the input.** The caller (curl / `crewship routine run --inputs '{...}'` / scheduler / webhook) needs to pass the referenced input. For webhooks this means the `inputs_template` in the webhook config must produce it from the incoming payload.
    2. **Set a default on the InputSpec.** If the input is genuinely optional but you still want a tenant-style gate, add `"default": "global"` (or any non-empty sentinel) to the `InputSpec`. The platform merges defaults before rendering the key.
    3. **Use a literal prefix.** A template like `"vendor-alert-{{ inputs.vendor_id }}"` always renders non-empty (the literal `vendor-alert-` survives even when `vendor_id` is missing); the key still gates, just less granularly.
    4. **Drop the gate.** If you genuinely don't want concurrency control, omit `concurrency_key` entirely (do NOT set it to `""` — that's the *unset* sentinel).
  </Accordion>
</AccordionGroup>

## CLI reference

| Command                                  | Purpose                                                                                                                                                                                                                                                                                                                                                                        |
| ---------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `routine list`                           | Workspace routine catalog                                                                                                                                                                                                                                                                                                                                                      |
| `routine get <slug>`                     | Detail with full DSL                                                                                                                                                                                                                                                                                                                                                           |
| `routine save`                           | Save from JSON file                                                                                                                                                                                                                                                                                                                                                            |
| `routine run <slug>`                     | Invoke against execution tier (`--tier-override fast/smart/...`). Deferred dispatch via `--delay/--ttl/--debounce-key/--priority`; exactly-once retries via `--idempotency-key <key>` (+ `--idempotency-ttl <s>`) — a duplicate key inside the window returns the original run as `DEDUPED` instead of executing again (same contract as the webhook `Idempotency-Key` header) |
| `routine dry-run <slug>`                 | Preview without invoking                                                                                                                                                                                                                                                                                                                                                       |
| `routine delete <slug>`                  | Soft-delete                                                                                                                                                                                                                                                                                                                                                                    |
| `routine doctor <slug>`                  | Preflight checklist (✓/⚠/✗) — catches blind alleys before run                                                                                                                                                                                                                                                                                                                  |
| `routine bench <slug>`                   | N-runs variance characterisation — pass-rate + cost/latency stats                                                                                                                                                                                                                                                                                                              |
| `routine runs <slug>`                    | Run history from journal                                                                                                                                                                                                                                                                                                                                                       |
| `routine records <slug>`                 | Run history from `pipeline_runs` projection (filterable by status)                                                                                                                                                                                                                                                                                                             |
| `routine logs <run_id>`                  | Full journal trace for one run (post-mortem)                                                                                                                                                                                                                                                                                                                                   |
| `routine watch <slug>`                   | Live event stream                                                                                                                                                                                                                                                                                                                                                              |
| `routine cancel <run_id>`                | Signal in-flight run                                                                                                                                                                                                                                                                                                                                                           |
| `routine replay <run_id>`                | Re-invoke a run with its captured inputs                                                                                                                                                                                                                                                                                                                                       |
| `routine errors`                         | List failed-run error fingerprint groups                                                                                                                                                                                                                                                                                                                                       |
| `routine bulk-replay --fingerprint <fp>` | Replay every run in a fingerprint group                                                                                                                                                                                                                                                                                                                                        |
| `routine step-override set/list/clear`   | Live prompt/model override for one step, no version bump                                                                                                                                                                                                                                                                                                                       |
| `routine pending list/cancel <id>`       | Inspect / cancel deferred (`--delay` / `--debounce-key`) triggers                                                                                                                                                                                                                                                                                                              |
| `routine versions <slug>`                | Version history                                                                                                                                                                                                                                                                                                                                                                |
| `routine rollback <slug> --to N`         | Roll back to v N                                                                                                                                                                                                                                                                                                                                                               |
| `routine export <slug>`                  | JSON bundle to stdout                                                                                                                                                                                                                                                                                                                                                          |
| `routine import [bundle.json]`           | Load from file or stdin                                                                                                                                                                                                                                                                                                                                                        |
| `routine validate [file.json]`           | Offline DSL check                                                                                                                                                                                                                                                                                                                                                              |
| `routine schedules ...`                  | Cron CRUD                                                                                                                                                                                                                                                                                                                                                                      |
| `routine webhooks ...`                   | Webhook CRUD                                                                                                                                                                                                                                                                                                                                                                   |
| `routine waitpoints ...`                 | HITL inbox + decisions                                                                                                                                                                                                                                                                                                                                                         |

For batch evaluation across the eval-\* fleet (matrix sweep, head-to-head tier compare, baseline regression diff), see the [eval CLI](/cli/eval). Cookbook recipe 6 walks the eval-driven promotion workflow end-to-end.

The `pipeline` alias is preserved for back-compat on the legacy subcommands (`pipeline list`, `pipeline run`, `pipeline get`, etc.). The post-rename additions — `schedules`, `webhooks`, `waitpoints`, `validate`, `watch`, `logs`, `records`, `bench`, `doctor` — are only registered under `routine`, so scripts that need them must switch.

## Backend reference

* Migrations v78 (pipelines + execution\_tiers\_json), v79 (versions + waitpoints), v80 (schedules), v81 (run idempotency), v82 (webhooks), v115 (schedule wake gates)
* Source: `internal/pipeline/` (\~10 700 LOC, 36 files)
* API: `internal/api/pipelines.go`, `pipeline_runs.go`, `pipeline_schedules.go`, `pipeline_webhooks.go`
* Sidecar: `internal/sidecar/pipelines.go` (port 9119)
* Frontend: `app/(dashboard)/routines/`, `components/features/routines/`, hooks `use-pipelines*`

## Production notes

Two caveats worth internalizing before you lean on routines for anything time- or cost-sensitive — both are permanent architectural properties of the current single-binary deployment, not bugs waiting on a fix:

<Warning>
  **Single-instance only.** The scheduler, run registry, and online eval watermark all assume one process. Running multiple replicas would double-fire cron schedules and webhooks (no leader election yet) — see [Cron schedules](#cron-schedules).
</Warning>

<Warning>
  **Crash recovery is at-least-once, step-granular, not exact.** A hard kill loses at most the in-flight step, which re-executes (and re-bills) on resume. `call_pipeline` has no nested checkpointing — a kill mid-nested-run re-executes the **entire** nested pipeline. `max_cost_usd` under-counts after a crash, since whatever the killed step had already spent isn't in the restored total. See [Durability and restart recovery](#durability-and-restart-recovery) for the full resume matrix.
</Warning>

## Limitations (current MVP)

These are known gaps in the current MVP — none block production use, but they shape what you can rely on today.

* **Resume is at-least-once, step-granular** — restart recovery re-enters runs from the last persisted step boundary (see [Durability and restart recovery](#durability-and-restart-recovery)). The step that was in flight at the kill re-executes from scratch; there is no sub-step checkpointing, nested `call_pipeline` runs re-execute in full, and DAG runs recover at wave granularity. Runs whose definition changed since the run started (content-hash mismatch) fall back to `interrupted`. `max_cost_usd` under-counts true spend after a crash — see [Production notes](#production-notes).
* **Single-instance scheduler** — running multiple replicas would double-fire schedules.
* **`credentials_required` is declared-only** — unlike `integrations_required` and `resources`, it is not yet enforced at run time (see [Required integrations](#required-integrations)).
* **No cross-adapter tier swap yet** — same-provider model swap (Haiku→Opus) works; Claude→Gemini swap requires a shorthand→constant mapping not yet wired.
* **No NL→cron converter** — ops still hand-type `0 9 * * *`. Foundation PRD has the design; not in MVP.