Skip to main content

crewship doctor

Runs a battery of checks against the local Crewship install and prints one row per check with a status. Designed for the “is everything OK before I file an issue?” workflow.
crewship doctor [--fix] [--json] [--no-color]
FlagTypeDefaultDescription
--fixboolfalseAttempt safe auto-repairs while running the checks. Currently creates the missing data directory; future checks may opt in.
--jsonboolfalseEmit machine-readable JSON to stdout instead of the colored human table. Each check lands as a snake_case-keyed object — ideal for piping into jq or a healthcheck.
--no-colorboolfalseDisable ANSI colours. Global root flag (not doctor-specific), available on every command. Automatically enabled when stdout is not a TTY (no need to pass the flag in CI / log-collection scripts).

Exit codes

CodeMeaning
0All checks PASS/INFO/WARN. Safe to run. WARNs indicate degraded features (read the remediation column) but do not block startup.
1One or more checks FAIL. Server cannot start (or some critical feature is broken).
WARN does not fail the command — you can wire crewship doctor into a healthcheck without false alarms on transient external state (e.g. Sentry endpoint briefly unreachable). FAIL means a core requirement for starting Crewship is missing or broken; treat the non-zero exit as a hard gate.

Check categories

Container runtime

Detects which Docker-compatible runtime is available — Docker, Podman, Colima, OrbStack, Apple Containers, or Rancher Desktop. Auto-probes each candidate socket; the first reachable one wins.

Port binding

Confirms the HTTP port (default 8080 or CREWSHIP_PORT) is free.
  • PASS — port is bindable.
  • FAIL — port in use. lsof -i :8080 to find the squatter, or change CREWSHIP_PORT.

Database file permissions

Stats ~/.crewship/ directory and the SQLite DB file. Expects:
  • Parent directory: 0700
  • DB file: 0600
  • WAL + SHM sidecars (if present): 0600
Drift here usually means a backup was restored via tar without preserving mode bits. Fix:
chmod 0700 ~/.crewship
chmod 0600 ~/.crewship/crewship.db ~/.crewship/crewship.db-wal ~/.crewship/crewship.db-shm 2>/dev/null

NEXTAUTH_SECRET

Surfaces where the JWT signing secret lives (env var vs <dataDir>/secrets.env) and validates the value is at least 32 characters. As of PR #446 the secret is auto-bootstrapped on first start, so a missing value is no longer a failure — doctor reports INFO and the next crewship start generates it.
ResultTrigger
PASS — env-provided (N chars)NEXTAUTH_SECRET is in the process env and ≥ 32 chars.
PASS — auto-managed in <path> (N chars)Not in env; the persisted secrets.env has a valid entry.
INFO — not yet bootstrappedNot in env and secrets.env doesn’t exist yet. The next crewship start will generate and persist it.
WARN — env-provided value is short (N chars)In env but below 32 chars. Regenerate with openssl rand -hex 32.
WARN — persisted value invalid: …The persisted entry exists but failed ValidateNextAuthSecret. Replace the line in secrets.env.
WARN — persisted secret file exists but NEXTAUTH_SECRET entry is missingThe file was hand-edited or partial; delete it so the next start regenerates (any credentials encrypted under the missing key are unrecoverable).
WARN — cannot inspect persisted secret file: …Usually a permission/ownership problem on the data dir. Check ~/.crewship (or $CREWSHIP_DATA_DIR) permissions.

Episodic recall mode

Reads the episodic field from the running server’s /healthz. Reports whether episodic memory recall runs with a vector embedder or degraded to keyword/FTS only.
  • PASS — vector + sparse recall — an embedder is configured (KEEPER_OLLAMA_URL); the boot-time indexer sweeper is embedding journal entries and recall serves vector + BM25 results.
  • WARN — sparse-only — no embedder configured. Recall still works on keywords, but vector similarity is off. Set KEEPER_OLLAMA_URL to an Ollama host serving nomic-embed-text.
  • INFO — server not reachable / older server — the daemon is down (the server reachable check already FAILs for that) or predates the episodic health field.

Legacy crew resources

Calls the authenticated GET /api/v1/admin/legacy-resources endpoint to detect orphaned pre-C1 (slug-only) crew docker resources that survive crewship seed --nuke and make every agent in the affected crew fail to start — surfaced to users only as a generic “failed to start agent container”. (Detection runs on this admin endpoint rather than the unauthenticated /healthz path, so a slow docker daemon can’t stall health probes.)
  • PASS — clean — no orphaned legacy volumes/containers for any current crew slug.
  • WARN — present — at least one orphaned pre-C1 resource exists. Agents in the affected crew(s) will fail to start. Run crewship admin prune-legacy to remove them.
  • INFO — not logged in / unreachable / non-docker server — the check needs an authenticated session (crewship login); or the daemon is down (the server reachable check covers that) or the container provider isn’t docker.

Telemetry status

Reads app_settings.telemetry_opt_in from the local DB. Reports:
  • PASS — ENABLED + DSN — telemetry is on and a DSN is wired in. Shows the endpoint host (vendor default vs CREWSHIP_SENTRY_DSN override).
  • PASS — DISABLED — operator opted out via crewship telemetry off.
  • WARN — ENABLED but no DSN — consent recorded but no DSN available. Local dev builds without ldflag injection; not an error, just informational.
  • WARN — not configured — fresh DB, never asked. Will default to ENABLED on the next crewship start (v0.1 beta behaviour).
See Telemetry for opt-out and routing-override recipes.

DSN reachability

Only runs when telemetry is enabled AND a DSN is set. Best-effort TCP connect (5s timeout) to the resolved endpoint at port 443.
  • PASS — endpoint reachable; events will ship.
  • WARN — endpoint unreachable. Crashes won’t ship until network heals. Not a Crewship health signal — could be local firewall, DNS issue, or Sentry outage. crewship doctor does not fail on this; you’d be silently angry at the wrong layer.

Update available

Calls the GitHub Releases API for the project (cached 24h locally).
  • PASS — running latest stable, or running a newer pre-release.
  • WARN — newer stable available; output shows current → latest with install hint (brew upgrade crewship or docker pull).
  • INFO — running a dev build, version check skipped.
Suppress with CREWSHIP_SKIP_UPDATE_CHECK=1.

Example output

PR #441 added a version + OS/arch header banner and a “Next steps” footer that points the operator at the three commands that actually move them forward (or at the troubleshooting page on FAIL). Presentation-only — no check behaviour changes.
$ crewship doctor

crewship v0.1.0-beta.4 · darwin/arm64

  [PASS]  container runtime              docker (Docker Desktop 4.35.1) at /var/run/docker.sock
  [PASS]  port binding                   :8080 is bindable
  [PASS]  database permissions           ~/.crewship (0700) + crewship.db (0600)
  [PASS]  NEXTAUTH_SECRET                auto-managed in /Users/you/.crewship/secrets.env (64 chars)
  [PASS]  telemetry status               ENABLED, endpoint o4511387688108032.ingest.de.sentry.io (vendor default)
  [PASS]  DSN reachability               connected to o4511387688108032.ingest.de.sentry.io:443 (94ms)
  [WARN]  update available               v0.1.0 stable is out; you're on v0.1.0-beta.1
                                         → brew upgrade crewship

Next steps:
  crewship start                 (boot the daemon)
  open http://localhost:8080     (then complete the browser onboarding wizard)
  https://docs.crewship.ai/guides/troubleshooting  (shown on FAIL only)

Output formats

  • --no-color — disable ANSI colors. Auto-detected when stdout is not a TTY (so log-collection scripts get plain text without any flag).
  • Status codes (PASS/WARN/FAIL/INFO) are always in left-padded brackets and parseable with awk '{print $1}'.

Auto-repair (--fix)

--fix opts into safe, narrowly-scoped repair side-effects during the check run. Currently scoped to:
  • Database directory missing — creates $CREWSHIP_DATA_DIR (or ~/.crewship) with 0700 perms instead of failing the check. The detail column then reads (created via --fix) so you can see what changed.
Anything destructive (file overwrites, mode-bit flips on existing files, DB schema repairs) is intentionally out of scope — --fix is for first-run onboarding, not running surgery on a populated install.

When to run

  • Before opening a GitHub issue — attach crewship doctor --no-color output. Saves a back-and-forth diagnostic round.
  • After upgrading the binary — pre-migration snapshot has run by this point; doctor confirms the new binary is happy with the existing DB.
  • In CI smoke tests — the .github/workflows/smoke-test.yml workflow runs crewship doctor against the freshly-released binary on each release tag to catch broken binaries before they reach users.