Troubleshooting
Most beta-era problems fall into a small set of buckets. Start here before opening an issue.
First step: crewship doctor
doctor runs a battery of checks and prints one row per check with a
status: PASS / WARN / FAIL / INFO. Read each WARN/FAIL — the
hint text in the second column is the remediation.
doctor covers (as of v0.1 beta):
- Container runtime detected: Docker / Podman / Colima / OrbStack / Apple Containers. If FAIL, install one (see Install: container runtime requirement).
- Server reachable: TCP-dials the configured server host:port (default localhost:8080) so you can tell “server down / wrong port” apart from “auth issue”.
- Database file permissions: confirms ~/.crewship/crewship.db is 0600 (owner-only). Drift here usually means a backup was restored via tar without preserving the mode bit.
- Telemetry status: shows the current opt-in state and the resolved endpoint host (vendor default vs CREWSHIP_SENTRY_DSN override).
- DSN reachability: best-effort TCP connect to the Sentry endpoint on port 443 (5s timeout). Sentry being unreachable is not a Crewship health signal; WARN here just tells you that crashes won’t ship until the network heals.
- Update available: checks the GitHub Releases API for a newer version. WARN if a stable release is newer than your installed build.
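To reproduce the two reachability checks by hand, a TCP dial can be sketched in bash. This is not how doctor itself is implemented; it is a minimal stand-in that assumes bash (for its /dev/tcp redirection) and the coreutils timeout command, which stock macOS does not ship. The function name tcp_reachable is hypothetical.

```shell
#!/usr/bin/env bash
# Best-effort TCP dial: succeeds if a connection opens within the timeout.
# Usage: tcp_reachable HOST PORT [TIMEOUT_SECONDS]
tcp_reachable() {
  local host="$1" port="$2" secs="${3:-5}"
  # bash opens a TCP socket when redirecting to /dev/tcp/HOST/PORT;
  # `timeout` bounds the attempt so a silently dropped packet cannot hang the check.
  timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Default server address from the check list above.
if tcp_reachable localhost 8080; then
  echo "server port is open: look at auth next"
else
  echo "server down or wrong port"
fi
```

The same function covers the DSN check if you pass your Sentry host and port 443.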
Symptom catalog
Find your symptom below; each panel expands to its diagnosis and fix.
`crewship start` exits immediately, no error
Almost always means migrations failed silently or the data directory isn’t writable. Re-run crewship start with --verbose, which switches the logger to debug level. Look for failed to run migrations or failed to open database. Common fixes:
- Permission error: chmod 700 ~/.crewship && chmod 600 ~/.crewship/crewship.db. The migrate path enforces 0600 on the DB file but can’t reset 0700 on the parent.
- Disk full: ~/.crewship/crewship.db lives next to a pre-migration snapshot (*.pre-migrate-*.bak). If the disk is full, the snapshot can’t be written and the boot aborts to prevent a half-migrated DB. Free space, then re-run.
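Before changing anything, it can help to see which of the two fixes applies. The sketch below reports the modes and free space for the data directory. The helper names (mode_of, has_mode, check_data_dir) are hypothetical, and the stat fallback assumes either GNU or BSD/macOS stat is installed.

```shell
#!/usr/bin/env bash
# Print the octal permission bits of a path.
# Tries GNU stat first, then the BSD/macOS form.
mode_of() {
  stat -c '%a' "$1" 2>/dev/null || stat -f '%Lp' "$1" 2>/dev/null
}

# Succeed only if the path exists and has exactly the expected mode.
has_mode() {
  [ -e "$1" ] && [ "$(mode_of "$1")" = "$2" ]
}

# Report what needs fixing under a data directory (default: ~/.crewship).
check_data_dir() {
  local dir="${1:-$HOME/.crewship}"
  has_mode "$dir" 700 || echo "missing or wrong mode: expected 700 on $dir"
  has_mode "$dir/crewship.db" 600 || echo "missing or wrong mode: expected 600 on $dir/crewship.db"
  # Free space on the volume that holds the DB and its pre-migration snapshot.
  df -Pk "$dir"
}

# Usage: check_data_dir
```

If a mode line is printed, apply the chmod fix above; if the df output shows the volume is full, free space first.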
Agent containers won't start
crewship doctor will already have flagged this: no container runtime was found. On macOS, install one of the supported runtimes (see Install: container runtime requirement). On Linux, systemctl start docker or install podman. Crewship picks whatever socket it finds; no further config is needed.
"Agent exited with code 123"
Specific to the Claude Code adapter. Almost always a missing or invalid CLAUDE_CODE_OAUTH_TOKEN: the agent CLI failed its startup auth check. Fix: provide a valid token.
Other non-zero exits (1, 2, 127) usually mean the agent CLI binary isn’t installed in the container image; check crewship agent debug <id> for the container’s resolved PATH.
`curl … install.sh | bash` fails with "Bad substitution"
Wrong shell. Use bash, not sh: the script uses set -euo pipefail and array-expansion guards that Debian/Ubuntu dash (the default /bin/sh) does not implement. See the bash note in Install.
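For context on why the shell matters, here is a small illustration of the two bash features named above. It is not taken from install.sh; count_args and extra_args are made-up names. Run under dash, the array syntax is rejected outright, which is the same class of failure as “Bad substitution”.

```shell
#!/usr/bin/env bash
set -euo pipefail   # pipefail is a bash option; not every /bin/sh supports it

# Print how many arguments were received.
count_args() { echo "$#"; }

extra_args=()       # array syntax: rejected by dash

# Guarded expansion: expands to nothing when the array is empty, instead of
# tripping "unbound variable" under `set -u` on older bash releases.
count_args ${extra_args[@]+"${extra_args[@]}"}   # prints 0

extra_args=(--flag "two words")
count_args ${extra_args[@]+"${extra_args[@]}"}   # prints 2
```

The guard keeps each element as a single word, so "two words" is still passed as one argument.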
`brew install crewship-ai/tap/crewship` fails with "tap not found"
Update brew and try again. If the tap repo (https://github.com/crewship-ai/homebrew-tap) lists a formula version different from what you’d expect, the goreleaser auto-push for the latest release may have failed; file an issue and fall back to the curl | bash install.
macOS Gatekeeper blocks the binary
Releases are cosign-signed but not Apple-notarized in v0.1 (Apple Developer Program enrollment is a separate workstream). After the first download, macOS blocks the binary on launch. Unblock it once by removing the quarantine attribute, or right-click → Open → Open Anyway in Finder. The brew install path strips quarantine automatically; only direct downloads need this.
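One way to unblock from a terminal is to clear the com.apple.quarantine extended attribute with the macOS xattr tool. The wrapper below is a sketch; the function name unquarantine is hypothetical, and the path is wherever you saved the download.

```shell
#!/usr/bin/env bash
# Remove the Gatekeeper quarantine flag from a downloaded binary (macOS only).
unquarantine() {
  local bin="$1"
  if [ "$(uname -s)" != "Darwin" ]; then
    echo "not macOS, nothing to do"
    return 0
  fi
  # xattr -p exits non-zero when the attribute is absent, so re-running is harmless.
  if xattr -p com.apple.quarantine "$bin" >/dev/null 2>&1; then
    xattr -d com.apple.quarantine "$bin"
  fi
}

# Usage: unquarantine ./crewship
```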
Linux: daemon won't survive logout
crewship start, run as a systemctl --user service (or in a tmux-less SSH session), is killed when the user logs out, because systemd-user sessions are tied to the login by default. Enable lingering with loginctl enable-linger so the user manager keeps running; loginctl show-user "$USER" | grep Linger= reports Linger=yes once it’s on. Headless servers (no interactive shell) are the common case; desktop installs that always have a logged-in graphical session can skip this step.
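The check and the fix can be combined so the step is safe to repeat. This sketch assumes a systemd host with loginctl on PATH; the helper names linger_is_on and ensure_linger are hypothetical, and enabling lingering may prompt for authorization depending on your polkit policy.

```shell
#!/usr/bin/env bash
# Reads `loginctl show-user` output on stdin; succeeds only if lingering is on.
linger_is_on() {
  grep -qx 'Linger=yes'
}

# Enable lingering for the current user unless it is already on.
ensure_linger() {
  local user="${USER:-$(id -un)}"
  if loginctl show-user "$user" 2>/dev/null | linger_is_on; then
    echo "lingering already enabled for $user"
  else
    loginctl enable-linger "$user"
  fi
}

# Usage: ensure_linger
```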
Windows: SmartScreen blocks the binary
Windows builds aren’t Authenticode-signed in v0.1 (we sign later in the beta). On first launch you’ll see a SmartScreen “Don’t run” prompt. Right-click crewship.exe → Properties → check Unblock → OK. This is needed once per binary; subsequent runs skip the prompt. The Homebrew and Docker Compose install paths bypass this entirely.
"Migration version N collision"
This means your local DB has a migration applied under one name but the binary expects a different name at that version. Caused by upgrading from a fork or a pre-release build that shipped a renamed migration.
Maintainer-side: the migration-lint CI workflow now catches this class of bug pre-merge, so it should not recur in v0.1+ releases.
Web UI loads but API requests fail with 401
Token expired or never set. Two paths:
In the UI: log out and log back in. If the bootstrap user was reset (rare; only happens after crewship.db deletion), visit /bootstrap to recreate the initial admin account.
A delegated assignment shows "working on the task…" forever
When a lead delegates work ([Assignment] @agent is working on the task… in the lead’s chat), the sub-agent run lives in the server process. If the server crashes or restarts mid-run, that run is gone. Before v0.1.x the assignment row then stayed RUNNING forever, which also blocked one of the crew’s concurrency slots, so every later delegation for that crew queued indefinitely.
This is now self-healing, on two layers:
- On restart, the server fails every assignment that was RUNNING under the previous process with the reason interrupted by server restart. The lead’s chat resolves with the failure, the slot is freed, and any queued assignments behind it are dispatched immediately.
- While running, a background sweeper (every 5 minutes) fails any assignment stuck in RUNNING with no completion past its staleness bound: the target agent’s configured timeout_seconds plus a 15-minute grace margin, with a 2-hour floor. Because the bound tracks each agent’s own timeout, a long-but-healthy run is never touched, even one configured to run longer than the floor; the sweeper only reclaims slots leaked by abnormal conditions (e.g. a force-killed container that never reported back).
Both layers record assignment.failed entries attributed to assignment_recovery, so a post-mortem can tell recovery-failures apart from real agent failures.
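The staleness bound described above works out to max(timeout_seconds + 15 minutes, 2 hours). As a worked example, the arithmetic can be sketched as follows; staleness_bound is a hypothetical helper for illustration, not part of the crewship CLI.

```shell
#!/usr/bin/env bash
# Staleness bound in seconds for one assignment:
# bound = max(timeout_seconds + 15 min grace, 2 h floor)
staleness_bound() {
  local timeout_seconds="$1"
  local grace=$((15 * 60))        # 900 s
  local floor=$((2 * 60 * 60))    # 7200 s
  local bound=$((timeout_seconds + grace))
  if [ "$bound" -lt "$floor" ]; then
    bound="$floor"
  fi
  echo "$bound"
}

staleness_bound 1800    # 30-minute timeout: 2700 < 7200, so the floor applies and this prints 7200
staleness_bound 10800   # 3-hour timeout: 10800 + 900, prints 11700
```

The second call shows why a run configured for longer than the floor is left alone: its bound grows with its own timeout.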
"Sentry endpoint unreachable" in doctor
WARN, not FAIL. Telemetry won’t ship until the network heals; nothing else breaks. To silence the warning, turn telemetry off or point CREWSHIP_SENTRY_DSN at a Sentry instance you can reach.
Where to go next
If crewship doctor is green and you still hit issues:
- Open a GitHub issue: attach crewship doctor --no-color output and the relevant log section.
- Discord community: real-time help, especially for “is this expected?” questions.
- Crash-report dashboard: if telemetry is enabled and the crash has a Sentry event ID, paste it into the issue; the maintainer can correlate immediately.
- Source-map view: for frontend issues, the Sentry stack trace has a “View on GitHub” link per frame, which makes for useful screenshots to attach to reports.