crewship doctor
Runs a battery of checks against the local Crewship install and prints one row per check with a status. Designed for the “is everything OK before I file an issue?” workflow.

| Flag | Type | Default | Description |
|---|---|---|---|
| --fix | bool | false | Attempt safe auto-repairs while running the checks. Currently creates the missing data directory; future checks may opt in. |
| --json | bool | false | Emit machine-readable JSON to stdout instead of the colored human table. Each check lands as a snake_case-keyed object, ideal for piping into jq or a healthcheck. |
| --no-color | bool | false | Disable ANSI colors. Global root flag (not doctor-specific), available on every command. Automatically enabled when stdout is not a TTY, so CI and log-collection scripts don't need to pass it. |
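
To consume --json from a script, a minimal Python sketch like the following keeps only the checks that did not PASS. This page does not spell out the JSON schema. The shape assumed here is an assumption: a list of check objects with hypothetical snake_case keys name, status, and detail. Adjust the keys to whatever your build actually emits.

```python
import json
import subprocess


def run_doctor_json(cmd=("crewship", "doctor", "--json")) -> str:
    """Run the doctor in JSON mode and return its stdout."""
    # check=False: a FAIL exits 1, and we still want whatever report was printed.
    proc = subprocess.run(list(cmd), capture_output=True, text=True, check=False)
    return proc.stdout


def non_passing_checks(raw_json: str) -> list:
    """Return every check object whose status is not PASS.

    Assumes (hypothetically) a JSON list of objects with snake_case keys
    such as "name", "status" and "detail".
    """
    return [c for c in json.loads(raw_json) if c.get("status") != "PASS"]
```

Typical use: `for c in non_passing_checks(run_doctor_json()): print(c.get("status"), c.get("name"))`.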
Exit codes
| Code | Meaning |
|---|---|
| 0 | Every check reported PASS, INFO, or WARN. Safe to start. WARNs indicate degraded features (read the remediation column) but do not block startup. |
| 1 | At least one check reported FAIL. The server cannot start, or a critical feature is broken. |
Because WARN still exits 0, you can wire crewship doctor into a healthcheck without false alarms on transient external state (for example, a Sentry endpoint that is briefly unreachable). FAIL means a core requirement for starting Crewship is missing or broken; treat the non-zero exit as a hard gate.
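
A minimal Python sketch of that gate. It relies only on the documented exit-code contract: 0 for PASS/INFO/WARN, non-zero for FAIL. The function name doctor_gate is hypothetical. The command is a parameter, so the gate can be exercised without a real install.

```python
import subprocess
import sys
from typing import Sequence

DOCTOR_CMD = ("crewship", "doctor", "--no-color")


def doctor_gate(cmd: Sequence[str] = DOCTOR_CMD) -> bool:
    """True when the doctor exits 0 (only PASS/INFO/WARN); False otherwise.

    Any non-zero code (1 = at least one FAIL, or the binary crashed) is
    treated as unhealthy.
    """
    try:
        proc = subprocess.run(list(cmd), capture_output=True, text=True, check=False)
    except FileNotFoundError:
        sys.stderr.write("crewship binary not found on PATH\n")
        return False
    if proc.returncode != 0:
        # Echo the report so the operator can see which check failed.
        sys.stderr.write(proc.stdout)
    return proc.returncode == 0
```

In a healthcheck script, finish with `sys.exit(0 if doctor_gate() else 1)`.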
Check categories
Container runtime
Detects which Docker-compatible runtime is available: Docker, Podman, Colima, OrbStack, Apple Containers, or Rancher Desktop. Auto-probes each candidate socket; the first reachable one wins.
- PASS — runtime detected; agents can run.
- FAIL — no runtime found. Install one; see Install — container runtime requirement.
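
The "first reachable socket wins" probe can be sketched in Python as below. The candidate paths are illustrative defaults for Docker, Colima, OrbStack, and Rancher Desktop. They are not Crewship's actual probe list or order. A stale socket file (one whose daemon has exited) is treated as unreachable.

```python
import os
import socket
from typing import Iterable, Optional

# Illustrative candidates only; not Crewship's real probe list or order.
CANDIDATE_SOCKETS = [
    "/var/run/docker.sock",
    os.path.expanduser("~/.colima/default/docker.sock"),
    os.path.expanduser("~/.orbstack/run/docker.sock"),
    os.path.expanduser("~/.rd/docker.sock"),
]


def first_reachable_socket(candidates: Iterable[str], timeout: float = 1.0) -> Optional[str]:
    """Return the first Unix socket path that accepts a connection, else None."""
    for path in candidates:
        if not os.path.exists(path):
            continue
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(timeout)
        try:
            sock.connect(path)
            return path
        except OSError:
            continue  # stale socket file or permission denied: try the next one
        finally:
            sock.close()
    return None
```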
Port binding
Confirms the HTTP port (default 8080, or CREWSHIP_PORT if set) is free.
- PASS — port is bindable.
- FAIL — port in use. Run lsof -i :8080 to find the squatter, or change CREWSHIP_PORT.
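
A bindability check can be sketched by attempting the same bind a server would. This is stricter than scanning for listeners. The Python sketch relies only on the documented default (8080, overridable by CREWSHIP_PORT). The function names are hypothetical.

```python
import os
import socket


def crewship_port() -> int:
    # Default 8080 unless CREWSHIP_PORT overrides it.
    return int(os.environ.get("CREWSHIP_PORT", "8080"))


def port_is_bindable(port: int, host: str = "") -> bool:
    """Try to bind the port on all interfaces; True if nothing else holds it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:  # EADDRINUSE (or EACCES for privileged ports)
            return False
    return True
```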
Database file permissions
Stats the ~/.crewship/ directory and the SQLite DB file. Expects:
- Parent directory: 0700
- DB file: 0600
- WAL + SHM sidecars (if present): 0600
tar without
preserving mode bits. Fix:
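
The mode expectations listed above can be verified with a short Python sketch. It only reports mismatches and changes nothing. The DB filename is a parameter because this page does not name it. Requiring an exact mode match is an assumption about how strict the real check is.

```python
import os
import stat


def permission_problems(data_dir: str, db_name: str) -> list:
    """List paths whose mode differs from the expected 0700/0600 layout."""
    expected = {data_dir: 0o700, os.path.join(data_dir, db_name): 0o600}
    # SQLite WAL sidecars are named <db>-wal and <db>-shm; check them only if present.
    for suffix in ("-wal", "-shm"):
        sidecar = os.path.join(data_dir, db_name + suffix)
        if os.path.exists(sidecar):
            expected[sidecar] = 0o600
    problems = []
    for path, want in expected.items():
        try:
            have = stat.S_IMODE(os.stat(path).st_mode)
        except FileNotFoundError:
            problems.append(f"{path}: missing")
            continue
        if have != want:
            problems.append(f"{path}: mode {have:04o}, expected {want:04o}")
    return problems
```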
NEXTAUTH_SECRET
Surfaces where the JWT signing secret lives (env var vs <dataDir>/secrets.env) and validates that the value is at least 32 characters. As of PR #446 the secret is auto-bootstrapped on first start, so a missing value is no longer a failure: doctor reports INFO and the next crewship start generates it.
| Result | Trigger |
|---|---|
| PASS — env-provided (N chars) | NEXTAUTH_SECRET is in the process env and ≥ 32 chars. |
| PASS — auto-managed in <path> (N chars) | Not in env; the persisted secrets.env has a valid entry. |
| INFO — not yet bootstrapped | Not in env and secrets.env doesn’t exist yet. The next crewship start will generate and persist it. |
| WARN — env-provided value is short (N chars) | In env but below 32 chars. Regenerate with openssl rand -hex 32. |
| WARN — persisted value invalid: … | The persisted entry exists but failed ValidateNextAuthSecret. Replace the line in secrets.env. |
| WARN — persisted secret file exists but NEXTAUTH_SECRET entry is missing | The file was hand-edited or is partial; delete it so the next start regenerates it (any credentials encrypted under the missing key are unrecoverable). |
| WARN — cannot inspect persisted secret file: … | Usually a permission/ownership problem on the data dir. Check ~/.crewship (or $CREWSHIP_DATA_DIR) permissions. |
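
The decision table above maps onto a small classifier. This Python sketch rests on two assumptions. First, secrets.env uses plain KEY=VALUE lines. Second, the ≥ 32-character length is the only validation rule; the real ValidateNextAuthSecret may check more. The environment is passed in as a dict to keep it testable. secrets.token_hex(32) yields 64 hex characters, the same shape as openssl rand -hex 32.

```python
import secrets
from typing import Dict, Optional

MIN_LEN = 32  # documented minimum; other ValidateNextAuthSecret rules are unknown here
KEY = "NEXTAUTH_SECRET"


def new_secret() -> str:
    """64 hex chars, the same shape as `openssl rand -hex 32`."""
    return secrets.token_hex(32)


def read_env_file(path: str) -> Optional[Dict[str, str]]:
    """Parse KEY=VALUE lines; return None only if the file does not exist."""
    try:
        fh = open(path, encoding="utf-8")
    except FileNotFoundError:
        return None  # other OSErrors (e.g. permissions) propagate to the caller
    values = {}
    with fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            k, _, v = line.partition("=")
            values[k.strip()] = v.strip()
    return values


def classify_secret(env: Dict[str, str], secrets_path: str) -> str:
    if KEY in env:
        n = len(env[KEY])
        if n >= MIN_LEN:
            return f"PASS: env-provided ({n} chars)"
        return f"WARN: env-provided value is short ({n} chars)"
    try:
        persisted = read_env_file(secrets_path)
    except OSError as exc:  # e.g. a permission problem on the data dir
        return f"WARN: cannot inspect persisted secret file: {exc}"
    if persisted is None:
        return "INFO: not yet bootstrapped"
    if KEY not in persisted:
        return "WARN: persisted secret file exists but entry is missing"
    n = len(persisted[KEY])
    if n < MIN_LEN:
        return f"WARN: persisted value invalid ({n} chars)"
    return f"PASS: auto-managed in {secrets_path} ({n} chars)"
```

Against a live process, call `classify_secret(dict(os.environ), path_to_secrets_env)`.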
Episodic recall mode
Reads the episodic field from the running server’s /healthz and reports whether episodic memory recall runs with a vector embedder or has degraded to keyword/FTS only.
- PASS — vector + sparse recall: an embedder is configured (KEEPER_OLLAMA_URL); the boot-time indexer sweeper is embedding journal entries and recall serves vector + BM25 results.
- WARN — sparse-only: no embedder configured. Recall still works on keywords, but vector similarity is off. Set KEEPER_OLLAMA_URL to an Ollama host serving nomic-embed-text.
- INFO — server not reachable / older server: the daemon is down (the server reachable check already FAILs for that) or predates the episodic health field.
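
A sketch of the episodic classification in Python. The /healthz path and the default port 8080 come from this page. The encoding of the episodic field is a hypothetical stand-in for whatever the server actually returns; this sketch assumes a plain string such as "vector+sparse" or "sparse-only".

```python
import json
import urllib.request
from typing import Optional


def fetch_healthz(base_url: str = "http://localhost:8080", timeout: float = 5.0) -> Optional[dict]:
    """GET /healthz; None when the server is down or the body isn't JSON."""
    try:
        with urllib.request.urlopen(f"{base_url}/healthz", timeout=timeout) as resp:
            return json.load(resp)
    except (OSError, ValueError):  # URLError and timeouts are OSError subclasses
        return None


def classify_episodic(healthz: Optional[dict]) -> str:
    if not isinstance(healthz, dict) or "episodic" not in healthz:
        return "INFO: server not reachable / older server"
    # Hypothetical encoding: the field's value names the recall mode.
    if healthz["episodic"] == "vector+sparse":
        return "PASS: vector + sparse recall"
    return "WARN: sparse-only (set KEEPER_OLLAMA_URL)"
```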
Legacy crew resources
Calls the authenticated GET /api/v1/admin/legacy-resources endpoint to detect orphaned pre-C1 (slug-only) crew Docker resources. These resources survive crewship seed --nuke and make every agent in the affected crew fail to start, which users see only as a generic “failed to start agent container” error. (Detection runs on this admin endpoint rather than the unauthenticated /healthz path, so a slow Docker daemon can’t stall health probes.)
- PASS — clean: no orphaned legacy volumes/containers for any current crew slug.
- WARN — present: at least one orphaned pre-C1 resource exists. Agents in the affected crew(s) will fail to start. Run crewship admin prune-legacy to remove them.
- INFO — not logged in / unreachable / non-docker server: the check needs an authenticated session (crewship login), the daemon is down (the server reachable check covers that), or the container provider isn’t Docker.
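
The three outcomes can be sketched in Python as below. The endpoint path comes from this page. The Bearer-token header and the {"resources": [...]} response shape are hypothetical placeholders, because this page does not document the session format written by crewship login. Detecting a non-docker provider is not modeled.

```python
import json
import urllib.error
import urllib.request
from typing import Any, Optional, Tuple

PATH = "/api/v1/admin/legacy-resources"


def fetch_legacy(base_url: str, token: Optional[str], timeout: float = 10.0) -> Tuple[Optional[int], Any]:
    """Return (http_status, parsed_body); (None, None) if the server is unreachable."""
    req = urllib.request.Request(base_url + PATH)
    if token:
        # Hypothetical auth header; the real session format is not documented here.
        req.add_header("Authorization", f"Bearer {token}")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status, json.load(resp)
    except urllib.error.HTTPError as exc:  # must precede OSError: it is a subclass
        return exc.code, None
    except (OSError, ValueError):
        return None, None


def classify_legacy(status: Optional[int], body: Any) -> str:
    if status is None:
        return "INFO: unreachable"
    if status in (401, 403):
        return "INFO: not logged in (run crewship login)"
    if status != 200 or not isinstance(body, dict):
        return "INFO: unexpected response"
    # Hypothetical body shape: {"resources": [...orphaned pre-C1 items...]}
    if body.get("resources"):
        return "WARN: present (run crewship admin prune-legacy)"
    return "PASS: clean"
```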
Telemetry status
Reads app_settings.telemetry_opt_in from the local DB. Reports:
- PASS — ENABLED + DSN: telemetry is on and a DSN is wired in. Shows the endpoint host (vendor default vs CREWSHIP_SENTRY_DSN override).
- PASS — DISABLED: operator opted out via crewship telemetry off.
- WARN — ENABLED but no DSN: consent recorded but no DSN available. Typical of local dev builds without ldflag injection; not an error, just informational.
- WARN — not configured: fresh DB, never asked. Will default to ENABLED on the next crewship start (v0.1 beta behavior).
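
The four telemetry states reduce to two inputs: the stored opt-in and whether a DSN is available. This Python sketch assumes app_settings is a key/value table with key and value columns; the real schema may store telemetry_opt_in as a column instead. A missing row is treated as "not configured".

```python
import sqlite3
from typing import Optional


def read_opt_in(conn: sqlite3.Connection) -> Optional[bool]:
    # Hypothetical schema: key/value rows in app_settings.
    row = conn.execute(
        "SELECT value FROM app_settings WHERE key = ?", ("telemetry_opt_in",)
    ).fetchone()
    if row is None:
        return None  # fresh DB, never asked
    return str(row[0]).lower() in ("1", "true")


def classify_telemetry(opt_in: Optional[bool], dsn: str) -> str:
    if opt_in is None:
        return "WARN: not configured"
    if not opt_in:
        return "PASS: DISABLED"
    return "PASS: ENABLED + DSN" if dsn else "WARN: ENABLED but no DSN"
```

To inspect a live install without risking writes, open the DB read-only with `sqlite3.connect(f"file:{path}?mode=ro", uri=True)`.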
DSN reachability
Only runs when telemetry is enabled AND a DSN is set. Best-effort TCP connect (5s timeout) to the resolved endpoint on port 443.
- PASS — endpoint reachable; events will ship.
- WARN — endpoint unreachable. Crashes won’t ship until the network heals. Not a Crewship health signal; the cause could be a local firewall, a DNS issue, or a Sentry outage.
crewship doctor does not fail on this; you’d be silently angry at the wrong layer.
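
The best-effort probe can be sketched with a plain TCP connect. Parse the host out of the DSN, then try port 443 with a 5-second timeout. The function names are hypothetical. Any connection error counts as "unreachable", in line with the WARN-not-FAIL stance above.

```python
import socket
from urllib.parse import urlsplit


def dsn_host(dsn: str) -> str:
    # Sentry DSNs look like https://<public_key>@<host>/<project_id>
    host = urlsplit(dsn).hostname
    if not host:
        raise ValueError(f"no host in DSN: {dsn!r}")
    return host


def endpoint_reachable(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Best-effort TCP connect; any failure counts as unreachable (WARN, not FAIL)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # DNS failure, refusal, and timeout all land here
        return False
```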
Update available
Calls the GitHub Releases API for the project (cached 24h locally).
- PASS — running the latest stable release, or a newer pre-release.
- WARN — newer stable available; output shows current → latest with an install hint (brew upgrade crewship or docker pull).
- INFO — running a dev build, version check skipped.
Set CREWSHIP_SKIP_UPDATE_CHECK=1 to skip the check.
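
A sketch of the version comparison and 24-hour cache in Python. It assumes semver-style tags (v1.2.3, optionally with a pre-release suffix such as -rc.1) and treats anything unparseable as a dev build. The latest stable tag would come from GitHub's GET /repos/{owner}/{repo}/releases/latest, which skips drafts and pre-releases. This page does not give the repository path, so fetching is left out.

```python
import os
import re
import time
from typing import Optional, Tuple

CACHE_TTL = 24 * 60 * 60  # documented 24h cache, in seconds
_VERSION = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)(?:-(\S+))?$")


def should_check() -> bool:
    return os.environ.get("CREWSHIP_SKIP_UPDATE_CHECK") != "1"


def cache_is_fresh(path: str, now: Optional[float] = None) -> bool:
    """True if the cached release info was written less than 24h ago."""
    now = time.time() if now is None else now
    try:
        return now - os.path.getmtime(path) < CACHE_TTL
    except OSError:  # no cache yet
        return False


def _sort_key(tag: str) -> Optional[Tuple[int, int, int, int]]:
    m = _VERSION.match(tag.strip())
    if m is None:
        return None
    # A pre-release sorts below the stable release with the same base version.
    stable = 0 if m.group(4) else 1
    return (int(m.group(1)), int(m.group(2)), int(m.group(3)), stable)


def update_status(current: str, latest_stable: str) -> str:
    cur = _sort_key(current)
    if cur is None:
        return "INFO: dev build, version check skipped"
    latest = _sort_key(latest_stable)
    if latest is None:
        return "INFO: could not parse latest release tag"
    if cur >= latest:
        return "PASS"  # up to date, or running a newer pre-release
    return f"WARN: {current} -> {latest_stable}"
```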
Example output
PR #441 added a version + OS/arch header banner and a “Next steps” footer that points the operator at the three commands that actually move them forward (or at the troubleshooting page on FAIL). Presentation-only; check behavior is unchanged.
Output formats
- --no-color: disable ANSI colors. Auto-detected when stdout is not a TTY (so log-collection scripts get plain text without any flag).
- Status codes (PASS/WARN/FAIL/INFO) are always in left-padded brackets and parseable with awk '{print $1}'.
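
For scripts that prefer Python over awk, the bracketed status column can be tallied with a regex. This sketch assumes plain-text (--no-color) output in which each check line starts with the bracketed status, optionally preceded by padding. The use_color helper is hypothetical; it mirrors the documented TTY auto-detection.

```python
import re
import sys
from collections import Counter
from typing import Optional, TextIO

_STATUS = re.compile(r"^\s*\[\s*(PASS|WARN|FAIL|INFO)\s*\]")


def count_statuses(report: str) -> Counter:
    """Tally status tags from plain-text doctor output; other lines are ignored."""
    counts = Counter()
    for line in report.splitlines():
        m = _STATUS.match(line)
        if m:
            counts[m.group(1)] += 1
    return counts


def use_color(no_color_flag: bool, stream: Optional[TextIO] = None) -> bool:
    # Colors only when --no-color is absent AND the stream is a terminal.
    stream = sys.stdout if stream is None else stream
    return not no_color_flag and stream.isatty()
```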
Auto-repair (--fix)
--fix opts into safe, narrowly scoped repair side effects during the check run. Currently scoped to:
- Database directory missing: creates $CREWSHIP_DATA_DIR (or ~/.crewship) with 0700 perms instead of failing the check. The detail column then reads (created via --fix) so you can see what changed.
--fix is for first-run onboarding, not running surgery on a populated install.
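
A sketch of the documented --fix behavior for a missing data directory. Two details are assumptions: returning PASS after a successful repair, and the helper names. The explicit chmod matters because os.makedirs applies the process umask to the mode it is given.

```python
import os
from typing import Tuple


def data_dir() -> str:
    # $CREWSHIP_DATA_DIR if set, else ~/.crewship
    return os.environ.get("CREWSHIP_DATA_DIR") or os.path.expanduser("~/.crewship")


def check_data_dir(path: str, fix: bool) -> Tuple[str, str]:
    """Return (status, detail) for the database-directory check."""
    if os.path.isdir(path):
        return "PASS", path
    if not fix:
        return "FAIL", f"{path} missing"
    os.makedirs(path, mode=0o700, exist_ok=True)
    os.chmod(path, 0o700)  # makedirs honors the umask, so enforce 0700 explicitly
    return "PASS", f"{path} (created via --fix)"
```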
When to run
- Before opening a GitHub issue: attach crewship doctor --no-color output. It saves a diagnostic back-and-forth.
- After upgrading the binary: the pre-migration snapshot has run by this point; doctor confirms the new binary is happy with the existing DB.
- In CI smoke tests: the .github/workflows/smoke-test.yml workflow runs crewship doctor against the freshly released binary on each release tag to catch broken binaries before they reach users.
Related
- crewship start: the command doctor is checking for. If doctor is green, start should succeed.
- crewship telemetry: flips the consent state that doctor reports.
- crewship version: the same version info doctor uses for the update-availability check.
- Troubleshooting: symptom-keyed catalog of common failures.