crewship routine

Manage workspace routines — AI-authored, repeatable workflow recipes that any crew can invoke.
crewship routine <subcommand> [flags]
Alias: crewship pipeline (back-compat: every routine X invocation also works as pipeline X). See the Routines guide for the conceptual overview.

Subcommands

| Group | Command | Purpose |
| --- | --- | --- |
| Core | list | List workspace routines |
| Core | get <slug> | Show routine detail with full DSL |
| Core | save | Save a new routine from a JSON file |
| Core | run <slug> | Invoke a routine against the live execution tier |
| Core | dry-run <slug> | Preview what a routine would do without invoking agents |
| Core | delete <slug> | Soft-delete a routine |
| Diagnostics | doctor <slug> | Preflight checklist: catches blind alleys before run |
| Diagnostics | bench <slug> | Variance characterisation: N runs, pass-rate + cost/latency stats |
| Runs | runs <slug> | List recent runs from journal |
| Runs | records <slug> | List runs from pipeline_runs projection (v83) |
| Runs | logs <run_id> | Full journal trace for a run (post-mortem) |
| Runs | cancel <run_id> | Cancel an in-flight run |
| Runs | watch <slug> | Stream live run + step events |
| Versions | versions <slug> | List version history |
| Versions | rollback <slug> --to N | Roll back to version N |
| Bundles | export <slug> | Export bundle JSON to stdout |
| Bundles | import [bundle.json] | Import bundle from file or stdin |
| Authoring | validate [file.json] | Validate DSL offline (no server call) |
| Triggers | schedules ... | Cron schedule CRUD |
| Triggers | webhooks ... | Webhook CRUD |
| HITL | waitpoints ... | Approval inbox + decisions |
For batch evaluation across the eval-* routine fleet (matrix sweep, head-to-head tier compare, regression baselines), see the eval CLI.

crewship routine list

List workspace routines, sorted by usage by default.
crewship routine list
crewship routine list --order recent
crewship routine list --order name
| Flag | Default | Description |
| --- | --- | --- |
| --order | popularity | Sort: popularity (invocation_count DESC), recent (updated_at DESC), name (alpha) |
Output columns: SLUG, INVOC, LAST STATUS, AUTHOR CREW, DESCRIPTION

crewship routine get <slug>

Show full routine detail including the DSL definition.
crewship routine get summarize-text
Output: human header (slug, name, author, invocations) + pretty-printed DSL JSON.

crewship routine save

Save a new routine from a JSON DSL file. A test_run gate runs first; if it fails, the save aborts with a structured error pointing at the failing step.
crewship routine save \
    --name "email-fetch-summarize" \
    --description "Fetch new emails since a date and summarize." \
    --definition email-fetch.json \
    --author-crew crew_marketing \
    --author-agent agent_lead
| Flag | Required | Description |
| --- | --- | --- |
| --name | | Human-readable name; slug derived from this |
| --description | | One-line summary shown in [AVAILABLE ROUTINES] |
| --definition | | Path to JSON DSL file |
| --author-crew | | crew_id that owns this routine |
| --author-agent | | agent_id that authored (optional but recommended) |
| --sample-inputs | | JSON inputs the test_run uses |
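
One way to script the save flow is to run the offline validator first, so DSL errors surface locally before the server-side test_run gate. The following is a minimal sketch, assuming crewship is on PATH; the function name save_routine and its argument order are illustrative and not part of the CLI.

```shell
# Hypothetical helper (not part of crewship): validate a DSL file offline,
# then save it. Only flags documented above are used.
# Usage: save_routine <name> <description> <definition.json> <author_crew>
save_routine() {
    if [ "$#" -ne 4 ]; then
        echo "usage: save_routine <name> <description> <definition.json> <author_crew>" >&2
        return 2
    fi
    # Offline check first: exit code 0 = valid, 1 = invalid.
    crewship routine validate "$3" || return 1
    # Server-side save; the test_run gate still runs before the routine is stored.
    crewship routine save \
        --name "$1" \
        --description "$2" \
        --definition "$3" \
        --author-crew "$4"
}
```

The helper stops before contacting the server when the file is invalid, which keeps typos from reaching the test_run gate.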

crewship routine run <slug>

Invoke a saved routine against the live execution tier.
crewship routine run summarize-text --inputs '{"text":"..."}'
crewship routine run pr-review-structured --inputs '{"diff":"..."}' --invoking-crew crew_eng

# Force every agent_run step onto a specific tier (eval / debug)
crewship routine run eval-extract-emails --tier-override fast
crewship routine run eval-extract-emails --tier-override smart
| Flag | Description |
| --- | --- |
| --inputs | JSON inputs for the run |
| --invoking-crew | crew_id recorded as the invoker (cross-crew reuse audit) |
| --tier-override | Force every agent_run step onto trivial/fast/moderate/smart. Step-level model_override still wins. Empty (default) = use authored complexity. Used by the eval suite to diff weak vs strong worker tiers without DSL changes. |
Output: run_id, status, output, duration_ms, cost_usd.
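
For a quick manual comparison across tiers, the --tier-override flag can be driven from a loop. This is a rough sketch only: run_across_tiers is a hypothetical helper name, and the eval CLI mentioned earlier is presumably the better tool for systematic sweeps.

```shell
# Hypothetical helper (not part of crewship): run one routine once per tier
# with identical inputs. Tier names are the four listed for --tier-override.
# Usage: run_across_tiers <slug> <inputs_json>
run_across_tiers() {
    for tier in trivial fast moderate smart; do
        echo "== tier: $tier =="
        # A failed run is reported but does not stop the sweep.
        crewship routine run "$1" --inputs "$2" --tier-override "$tier" \
            || echo "run failed on tier $tier" >&2
    done
}
```

Each iteration is a real invocation against the live execution tier, so the sweep costs four runs.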

Common errors

  • API error (500): pipeline: concurrency_key rendered to empty value (referenced input missing or empty) — the DSL declared a concurrency_key template that references an input you didn’t supply, and the template has no literal text to fall back on. See Troubleshooting in the routines guide for the four fixes.

crewship routine doctor <slug>

Preflight checklist that walks every checkable precondition before run and emits a ✓/⚠/✗ report. Catches the blind alleys (missing crew provisioning, agent slug typo, missing credential, contradictory validation gate, cost cap too tight) in a single round-trip instead of one failure per run.
crewship routine doctor eval-extract-emails
crewship routine doctor my-routine -f json   # CI-friendly
Checks performed:
  • routine_exists: routine present in the workspace
  • author_crew: author crew exists, has a devcontainer config, provisioning status is completed (not in_progress or failed)
  • agent_slugs: every step’s agent_slug and every outcomes.grader_agent_slug resolves to an agent in the author crew (runtime resolution is crew-scoped)
  • credential:<TYPE>: every entry in credentials_required has an active workspace credential of that type
  • egress_allowlist: egress_targets declared when http steps are present; no wildcards or loopback hosts (*, localhost, 127.0.0.0/8)
  • cost_cap: max_cost_usd is set and is at least 1.5× estimated_cost_usd
  • validation:<step_id>: gates are structurally satisfiable (min_length ≤ max_length, no string in both must_contain and must_not_contain)
Exit code is non-zero if any check FAILed. WARN does not propagate to the exit code — it’s an operator hint, not a blocker.
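
Because the exit code reflects only FAIL results, doctor can gate a run in a script. A minimal sketch, assuming crewship is on PATH; preflight_then_run is a hypothetical wrapper name.

```shell
# Hypothetical helper (not part of crewship): only invoke the routine when
# doctor reports no FAIL.
# Usage: preflight_then_run <slug> <inputs_json>
preflight_then_run() {
    # doctor exits non-zero on any FAIL; WARN alone leaves the exit code at 0.
    if ! crewship routine doctor "$1"; then
        echo "preflight failed for $1; skipping run" >&2
        return 1
    fi
    crewship routine run "$1" --inputs "$2"
}
```

Warnings still appear in the doctor report but do not block the run, which matches the exit-code behaviour described above.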

crewship routine bench <slug>

Run a routine N times with the same inputs and report pass-rate, cost stats, and latency distribution. Trigger.dev-style stability characterisation — use to decide whether a routine is production-ready at its current tier.
# 10 runs at the routine's authored tier
crewship routine bench eval-extract-emails --runs 10

# 5 runs with explicit tier override and JSON output for CI
crewship routine bench eval-extract-emails --runs 5 --tier-override fast -f json

# Bench with custom inputs + spaced cooldown for rate-limited credentials
crewship routine bench summarize-text --runs 5 --cooldown-ms 1000 \
    --inputs '{"text":"specific text to bench"}'
| Flag | Description |
| --- | --- |
| --runs | Number of runs (default 10) |
| --tier-override | Force every agent_run step onto a tier (trivial/fast/moderate/smart). Empty = authored complexity. |
| --inputs | JSON inputs forwarded to every run |
| --fail-fast | Abort on the first non-pass instead of completing every run |
| --cooldown-ms | Delay between runs (useful when stress-testing rate-limited credentials) |
The output reports total / mean / p95 / max cost; p50 / p95 / max duration; and a fail-reason breakdown bucketed into cost-cap / rubric-fail / gate-fail / auth-fail / timeout / other. A headline verdict label spares the operator from interpreting the stats:

| Verdict | When |
| --- | --- |
| PRODUCTION_READY | ≥90% pass rate |
| FLAKY | 70–90% pass rate (investigate top fail reason) |
| UNRELIABLE | <70% pass rate (escalate tier or rewrite gates) |
| BROKEN | 0% pass rate (auth issue, unsatisfiable gate, or cost guardrail trips every time) |
| INSUFFICIENT_DATA | 0 runs |
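
The verdict thresholds can be read as a simple decision ladder. The sketch below is one interpretation, not the CLI's own code: it assumes exactly 90% counts as PRODUCTION_READY, exactly 70% counts as FLAKY, and the 0% case is checked before the <70% case. The name bench_verdict is hypothetical.

```shell
# Hypothetical helper mirroring the verdict table; boundary handling is an
# assumption (90% -> PRODUCTION_READY, 70% -> FLAKY, 0% -> BROKEN).
# Usage: bench_verdict <passed_runs> <total_runs>
bench_verdict() {
    passed=$1
    total=$2
    if [ "$total" -eq 0 ]; then
        echo INSUFFICIENT_DATA
    elif [ "$passed" -eq 0 ]; then
        echo BROKEN
    elif [ $((passed * 100)) -ge $((total * 90)) ]; then
        # passed/total >= 90%, compared in integers to avoid float math
        echo PRODUCTION_READY
    elif [ $((passed * 100)) -ge $((total * 70)) ]; then
        echo FLAKY
    else
        echo UNRELIABLE
    fi
}
```

Under these assumptions, bench_verdict 9 10 prints PRODUCTION_READY, bench_verdict 7 10 prints FLAKY, and bench_verdict 6 10 prints UNRELIABLE.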

crewship routine records <slug>

List runs for a routine using the pipeline_runs projection (introduced in migration v83). It is backed by indexed columns, so lookups are O(log n), whereas runs performs a LIKE scan over journal_entries. Prefer records for filterable / scriptable run history.
crewship routine records eval-extract-emails
crewship routine records eval-extract-emails --status failed --limit 100
crewship routine records eval-extract-emails --json | jq '.[] | select(.cost_usd > 0.05)'
| Flag | Description |
| --- | --- |
| --limit | Max records to return, 1–500 (default 50) |
| --status | Filter: queued/running/completed/failed/cancelled/interrupted/dry_run |
| --json | Emit JSON for scripting |
Falls back with a clear message to crewship routine runs <slug> when the server predates v83.

crewship routine logs <run_id>

Fetch the full journal trace for a single run. Post-mortem and CI-diagnostics surface — every step’s prompt, output, validation verdict, and cost in one document.
crewship routine logs run_cmox10oci00016b48aacd
crewship routine logs run_cmox10oci00016b48aacd --slug eval-extract-emails
crewship routine logs run_cmox... --slug eval-... --json | jq '.events[] | select(.type=="pipeline.step.failed")'
Example output (table mode, with --slug):
TIME          EVENT             SEVERITY  DURATION  COST     SUMMARY
18:42:01.103  run.started       info      —         —        Pipeline pr-review starting…
18:42:01.110  step.started      info      —         —        Pipeline pr-review step extract started
18:42:03.421  step.completed    info      2.31s     $0.0021  Pipeline pr-review step extract completed
18:42:03.428  step.started      info      —         —        Pipeline pr-review step summarise started
18:42:05.812  step.completed    info      2.38s     $0.0013  Pipeline pr-review step summarise completed
18:42:05.820  run.completed     info      —         —        Pipeline pr-review completed

Completed in 4.7s · cost $0.0034
The DURATION and COST columns surface the per-step duration_ms / cost_usd from the journal payload (em-dash means “not applicable” — started / failed events don’t carry them). Matches the Runs-tab waterfall in the UI so the same numbers reconcile across surfaces.
| Flag | Default | Description |
| --- | --- | --- |
| --slug | | Routine slug the run belongs to (optional; enables full journal timeline) |
| --json | false | JSON output for scripting |
Pair with routine watch for live observation and routine logs for post-hoc forensics.
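
To check that per-step costs reconcile with the footer total, the COST column of the table output can be summed. This sketch parses the table layout shown in the example above, so it assumes that column order stays stable and that colour codes are absent (for instance via --no-color); sum_step_costs is a hypothetical name.

```shell
# Hypothetical helper (not part of crewship): add up the COST column of
# step.completed rows read from stdin (table-mode `routine logs` output).
# Assumes columns: TIME EVENT SEVERITY DURATION COST SUMMARY.
sum_step_costs() {
    awk '$2 == "step.completed" && $5 ~ /^\$[0-9.]+$/ {
             total += substr($5, 2)      # drop the leading "$"
         }
         END { printf "%.4f\n", total }'
}
```

Fed the example output above, the two step.completed rows ($0.0021 and $0.0013) sum to 0.0034, matching the footer. A typical invocation would be crewship routine logs run_cmox10oci00016b48aacd --slug eval-extract-emails --no-color | sum_step_costs.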

crewship routine dry-run <slug>

Preview what the routine WOULD execute without invoking agents. Returns a structured would_execute report with each step’s resolved tier, rendered prompt, and estimated cost.
crewship routine dry-run summarize-text --inputs '{"text":"..."}'
Example output:
Dry run: DRY_RUN_OK (estimated 0ms, $0.0034 total)

Step 1 [extract] (agent_run):
  would call agent: viktor
  resolved tier: claude/claude-haiku-4-5
  estimated cost: $0.0021
  rendered prompt:
    Extract the key entities from the following text...

Step 2 [summarise] (agent_run):
  would call agent: tomas
  resolved tier: claude/claude-sonnet-4-6
  estimated cost: $0.0013
  rendered prompt:
    Write a 2-sentence summary of...
Zero side effects — useful for “is this routine wired correctly?” without burning tokens. The cost figure is an order-of-magnitude estimate (flat token-density heuristic, not real pricing). The same report renders inline in the UI: click Dry run in the routine detail panel and the report appears above the tab bar.

crewship routine validate [file.json]

Offline DSL validation with no server call. Reads from the file argument or stdin if absent.
crewship routine validate routine.json
cat routine.json | crewship routine validate
crewship routine validate routine.json --json   # machine-readable
Exit code 0 = valid, 1 = invalid. CI-friendly:
- name: Validate routine DSL
  run: crewship routine validate routines/*.json
Runs the same parse + validate + step-id-uniqueness checks the server would, except cross-routine cycle detection (which needs the workspace’s full call graph) and slug uniqueness (which needs the DB).
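
The usage line shows a single optional file argument, so whether a glob that expands to several files is accepted in one call is not stated here. A cautious sketch that validates each file separately follows; validate_all is a hypothetical helper name.

```shell
# Hypothetical helper (not part of crewship): validate each JSON file in a
# directory with its own `routine validate` call and count failures.
# Usage: validate_all <dir>
validate_all() {
    failed=0
    for f in "$1"/*.json; do
        [ -e "$f" ] || continue          # glob matched nothing
        if ! crewship routine validate "$f"; then
            echo "invalid: $f" >&2
            failed=$((failed + 1))
        fi
    done
    echo "$failed file(s) failed validation"
    # Exit status: 0 only when every file validated.
    [ "$failed" -eq 0 ]
}
```

Unlike a single multi-file call, this reports every invalid file in one pass instead of stopping at the first.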

crewship routine watch <slug>

Stream live run + step events for a routine. Polls every 2s, dedupes events, prints in chronological order with ANSI colour by status.
crewship routine watch summarize-text
crewship routine watch summarize-text --json | jq '.entry_type'
crewship routine watch summarize-text --once     # exit on first run terminal
crewship routine watch summarize-text --run-id <id>
| Flag | Default | Description |
| --- | --- | --- |
| --json | false | JSON Lines instead of coloured text |
| --once | false | Exit after the first run completes / fails |
| --run-id | | Filter to events for one run_id only |
| --interval | 2s | Poll interval (Go duration format) |
Ctrl-C / SIGTERM exit cleanly.

crewship routine versions <slug>

crewship routine versions summarize-text
Output columns: VERSION, HEAD, PARENT, HASH, AUTHOR, CREATED, SUMMARY. The * marker tags the current HEAD.

crewship routine rollback <slug>

crewship routine rollback summarize-text --to 3
| Flag | Required | Description |
| --- | --- | --- |
| --to | | Target version number to roll back to |
Creates a new version on top of HEAD whose definition equals v3’s. History is preserved; you can roll forward by another rollback.

crewship routine export <slug>

crewship routine export summarize-text > bundle.json
crewship routine export summarize-text --include-history > bundle-with-versions.json
| Flag | Description |
| --- | --- |
| --include-history | Include all version chain entries (default: HEAD only) |
Output: crewship-pipeline-bundle/v1 JSON to stdout. Suitable for piping to routine import or committing to git.
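
When exporting into a git working tree, a failed export should not leave a truncated bundle behind. A minimal sketch under that assumption; export_routine_to and the <dir>/<slug>.json layout are illustrative choices, not CLI behaviour.

```shell
# Hypothetical helper (not part of crewship): export a routine with history
# to <dir>/<slug>.json, writing via a temp file so a failed export leaves
# no partial bundle.
# Usage: export_routine_to <slug> <dir>
export_routine_to() {
    mkdir -p "$2" || return 1
    tmp="$2/$1.json.tmp"
    if crewship routine export "$1" --include-history > "$tmp"; then
        mv "$tmp" "$2/$1.json"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

The resulting file can then be committed, or fed back with crewship routine import.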

crewship routine import [bundle.json]

crewship routine import bundle.json
cat bundle.json | crewship routine import
crewship routine export src-routine | crewship routine import   # cross-workspace copy
Pre-validates JSON locally before sending — typos at the keyboard get a local error rather than a 400 from the server.

crewship routine cancel <run_id>

crewship routine cancel run_cmovn40h30001c57bb667
Signals the run goroutine. The run terminates at the next safe point and emits pipeline.run.failed with reason “cancelled”. Already-terminal runs return 409.

crewship routine schedules

Manage cron-driven triggers. Scheduler runs in-process and ticks every 30s.

schedules list

crewship routine schedules list
crewship routine schedules list --slug summarize-text
crewship routine schedules list --json
Output columns: ID, NAME, ROUTINE, CRON, TZ, ENABLED, NEXT

schedules create

crewship routine schedules create \
    --slug summarize-text \
    --name "daily-summary" \
    --cron "0 9 * * *" \
    --timezone "Europe/Prague" \
    --inputs '{"text":"daily input"}'
| Flag | Required | Description |
| --- | --- | --- |
| --slug | | Target routine slug |
| --cron | | 5-field cron expression |
| --name | | Schedule name (default: "<slug> schedule") |
| --timezone | | IANA timezone (default: UTC; for host-local, pass an IANA zone such as America/Los_Angeles or read it from cat /etc/timezone / timedatectl show -p Timezone --value) |
| --inputs | | JSON object passed as inputs on each tick |
| --enabled | | Create enabled (default true) |
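
Since --cron expects a 5-field expression (minute, hour, day of month, month, day of week), a 6-field expression with a leading seconds field is an easy mistake to make. The sketch below is a local pre-check on field count only; it does not validate the values, and is_five_field_cron is a hypothetical name.

```shell
# Hypothetical pre-check (not part of crewship): succeed only if the
# expression splits into exactly 5 whitespace-separated fields.
# Usage: is_five_field_cron "<expr>"
is_five_field_cron() {
    set -f                # keep "*" literal while splitting
    set -- $1             # re-split the expression into positional params
    set +f
    [ "$#" -eq 5 ]
}
```

For example, is_five_field_cron "0 9 * * *" succeeds (09:00 every day in the schedule's timezone), while "0 0 9 * * *" is rejected before any server call is made.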

schedules update <id>

Pass any subset of --cron / --timezone / --name / --enabled / --inputs to update.

schedules enable <id> / schedules disable <id>

Toggle without altering the cron expression.

schedules now <id>

Force-fire ad-hoc (out-of-cycle invocation for testing).

schedules delete <id>

crewship routine schedules delete sch_abc123 --yes
--yes skips the confirmation prompt.

crewship routine webhooks

Manage event-driven triggers. Each webhook is token-addressed (POST to /api/v1/webhooks/{token}), optionally HMAC-signed for delivery integrity, and rate-limited per token.

webhooks list

crewship routine webhooks list
crewship routine webhooks list --slug pr-review-structured
Output columns: ID, NAME, ROUTINE, HMAC, FIRES, LAST STATUS, RATE/MIN, ENABLED

webhooks create

crewship routine webhooks create \
    --slug pr-review-structured \
    --name "github-pr-reviews" \
    --hmac-secret "$(openssl rand -hex 32)" \
    --rate-limit 30
Stripe-style secret reveal: the signing secret is shown ONCE in the response. Save it now or rotate via delete+recreate.
| Flag | Required | Description |
| --- | --- | --- |
| --slug | | Target routine slug |
| --name | | Webhook name (default: "<slug> webhook") |
| --hmac-secret | | HMAC signing secret; empty means no signature verification |
| --rate-limit | | Max fires per minute (default 60) |
| --inputs-template | | JSON template merged with the request body |
| --base-url | | Override the public base URL printed |
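
On the sender side, an HMAC signature is computed over the request body with the shared secret. The sketch below shows only that computation and rests on assumptions this page does not confirm: that the digest is SHA-256 and hex-encoded. The header that carries the signature is not documented here, so the sketch stops short of sending a request; hmac_sha256_hex is a hypothetical name.

```shell
# Hypothetical helper (not part of crewship): hex HMAC of a request body.
# ASSUMPTION: SHA-256 and hex encoding are guesses; check the Routines guide
# for what the server actually verifies.
# Usage: hmac_sha256_hex <secret> <body>
hmac_sha256_hex() {
    # printf avoids the trailing newline that echo would add to the body;
    # the digest is the last field of the openssl output line.
    printf '%s' "$2" | openssl dgst -sha256 -hmac "$1" | awk '{ print $NF }'
}
```

The body must be signed byte-for-byte as sent, since any difference in whitespace or encoding changes the digest.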

webhooks url <id>

Print the public URL without re-revealing the signing secret.

webhooks delete <id>

crewship routine webhooks delete wh_abc123 --yes
Existing senders using the deleted token will start getting 404s.

crewship routine waitpoints

Inspect + decide on pending HITL approval waitpoints.

waitpoints list

crewship routine waitpoints list
crewship routine waitpoints list --json
Output columns: TOKEN, RUN ID, STEP, KIND, CREATED, TIMEOUT, PROMPT

waitpoints show <token>

Full prompt + metadata for a single waitpoint.

waitpoints approve <token> / waitpoints reject <token>

crewship routine waitpoints approve wpt_abc123 --comment "LGTM"
crewship routine waitpoints reject  wpt_abc123 --comment "needs revision"
The decision comment is forwarded to the parked run as the wait step’s output, so downstream steps can read approval rationale via {{ steps.<wait_step_id>.output }}.

Global flags

All routine subcommands accept these global flags:
| Flag | Env | Description |
| --- | --- | --- |
| --server | CREWSHIP_SERVER | Server URL (default http://localhost:8080) |
| --workspace | CREWSHIP_WORKSPACE | Workspace ID or slug |
| --format | | One of table, json, yaml, quiet |
| --no-color | | Disable ANSI colour output |
| --verbose | | Verbose request/response logging |
