Documentation Index
Fetch the complete documentation index at: https://docs.crewship.ai/llms.txt
Use this file to discover all available pages before exploring further.
Architecture
Crewship is a self-hosted runtime that gives every crew of AI coding agents a shared Linux container — a real machine where they can install services, run databases, and build a complete working system. The runtime is shipped as a single Go binary with the Next.js UI embedded in it, an embedded SQLite database (PostgreSQL on the v0.2 roadmap), and a sidecar proxy that handles credential access without exposing keys to the agent process.
High-Level Architecture
+------------------+
| User Browser |
+--------+---------+
|
HTTP / WebSocket
|
+----------------------------------+
| crewship binary |
| +----------------------------+ |
| | Go HTTP Server (port 8080) | |
| +---+-----+-----+-----+-----+ |
| | | | | |
| REST WS SSE IPC Static |
| API Hub (Unix) Files |
| | | | | |
| +---v-----v-----v-----v-----+ |
| | SQLite (embedded) | |
| +----------------------------+ |
+--------+---------+----------------+
| |
Docker API Unix Socket IPC
| (/tmp/crewship.sock)
+--------v---------v--------+
| Crew Container |
| +---------+ +---------+ |
| | Sidecar | | Agent | |
| | (1002) | | (1001) | |
| +---------+ +---------+ |
+----------------------------+
Single Binary Design
The make build process creates a single self-contained binary:
- Frontend:
pnpm build creates a Next.js static export
- Embedding: The
out/ directory is copied to web/out/ and embedded via //go:embed
- Compilation:
go build produces a statically linked binary with version metadata
crewship binary
├── Go HTTP server (Gorilla mux, chi router)
├── WebSocket hub (real-time updates)
├── SQLite database (modernc.org/sqlite, driver: "sqlite")
├── Container orchestrator (Docker/Apple provider)
├── Sidecar proxy binary (embedded)
├── Embedded static frontend (web/out/*)
└── Seed data (crew/agent templates)
The SQLite driver registers as "sqlite" (not "sqlite3"). This is because Crewship uses modernc.org/sqlite, a pure-Go SQLite implementation.
Container Model
Crewship uses one container per crew (not per agent). All agents in a crew share a container and filesystem.
Crew Container (crewship-team-{slug})
├── /crew/
│ ├── agents/{slug}/ # Per-agent home directory
│ │ ├── .memory/ # Persistent memory files
│ │ └── .mcp.json # MCP server config
│ └── shared/ # Cross-agent shared space
├── /output/{slug}/ # Agent output (visible in UI)
├── /secrets/{slug}/ # Read-only credentials
├── /workspace/ # Temporary scratch
└── Processes:
├── Sidecar proxy (UID 1002, port 9119)
└── Agent CLI exec (UID 1001, via docker exec)
UID Security Boundary
| UID | Process | Purpose |
|---|
| 1001 | Agent process | Runs coding CLI (Claude Code, OpenCode, etc.) |
| 1002 | Sidecar proxy | Handles credential injection, network policy |
The sidecar runs as UID 1002 so agents (UID 1001) cannot directly access the credential store or manipulate the proxy.
IPC (Inter-Process Communication)
Communication between crewshipd and containers uses HTTP-over-Unix-socket at /tmp/crewship.sock. Authentication uses the X-Internal-Token header.
Sidecar (in container)
|
| HTTP POST to crewshipd base URL
| Header: X-Internal-Token: {auto-generated-token}
v
crewshipd (/api/v1/internal/*)
|
| Process request (assignment, keeper, query)
v
Response back to sidecar
The internal token is auto-generated at startup if not configured:
token, err := generateRandomToken(32) // 32 bytes -> 64 hex chars
Orchestrator
The Orchestrator (internal/orchestrator/orchestrator.go) manages agent lifecycle:
- Container management: Ensures crew containers exist via
GetOrCreateContainer
- Agent execution: Runs agents as
docker exec commands inside crew containers
- System prompt assembly: Builds the full prompt from preamble, persona, context, memory, and history
- Credential selection: Picks the best credential via priority and round-robin
- Activity tracking: Refreshes crew TTL on each agent run
System Prompt Assembly Order
The system prompt is built in two stages:
Stage 1 — Static prompt (internal/api/agent_config_resolver.go):
1. [CREWSHIP ETHOS] — role-specific adventure context
2. [LANGUAGE PREFERENCE] — workspace preferred language (optional)
3. [AGENT IDENTITY] — name, role, crew
4. [PERSONA] — user-defined system prompt
5. [SKILLS AVAILABLE] — injected skill playbooks
Stage 2 — Runtime context (internal/orchestrator/orchestrator.go):
6. Conversation history (60% of remaining token budget)
7. [CREW CONTEXT] (Lead) OR [PEER CONTEXT]
8. [AGENT MEMORY] (40% of remaining token budget)
9. [MEMORY INSTRUCTIONS] — how to read/write memory files
Sidecar Proxy
The sidecar (internal/sidecar/) is an HTTP proxy running inside each crew container. It provides:
Core Functions
| Function | Description |
|---|
| Credential injection | Intercepts HTTP requests and adds API keys |
| Domain allowlist | Blocks requests to non-allowed domains |
| Reverse proxy | Handles ANTHROPIC_BASE_URL=http://127.0.0.1:9119 |
| Memory API | Search, status, reindex operations |
| Assignment routing | Forward /assign, /results, /query to crewshipd |
| Keeper bridge | Forward /keeper/request, /keeper/execute |
| MCP gateway | Connect to MCP servers, proxy tool calls |
| Network policy | Enforce free/restricted network modes |
Network Modes
| Mode | Behavior |
|---|
| free (default) | Allow all outbound connections |
| restricted | Only allow connections to allowlisted domains |
Default allowed domains (from internal/sidecar/allowlist.go — each entry exists for a specific CLI):
- Anthropic —
api.anthropic.com (Claude Code API key + OAuth), console.anthropic.com (OAuth refresh callback)
- OpenAI / Codex —
api.openai.com (API key), auth.openai.com (ChatGPT-subscription login), chatgpt.com (subscription routing)
- Google / Gemini —
generativelanguage.googleapis.com (AI Studio path), oauth2.googleapis.com (OAuth), accounts.google.com (OAuth UI redirect)
- Cursor —
api.cursor.sh (auth/billing), api2.cursor.sh (model gateway, since 2026-Q1)
- Factory Droid —
api.factory.ai (legacy), app.factory.ai (CLI installer + API base)
Unknown network modes default to restricted (fail closed). Keep this list in sync with components/features/crews/crew-network-policy.tsx on the frontend.
Sidecar Endpoints (36 paths, 39 routes)
The sidecar exposes a local API on port 9119 (internal/sidecar/server.go). The path count below is 36; the 39-route total reflects three paths that register multiple HTTP verbs (/manifest GET+PATCH, /crew-connections GET+POST, /connections/{id}/files GET+POST):
| Category | Endpoints |
|---|
| Memory | /memory/search, /memory/status, /memory/reindex |
| Assignment | /assign, /results/{id}, /query, /standup, /escalate, /report-confidence |
| Missions | /mission/create, /mission/{id}, /mission/{id}/start, /mission/templates |
| Keeper | /keeper/request, /keeper/execute |
| MCP | /mcp/tools, /mcp/call, /mcp/status |
| Connections | /connections, /connections/{id}/message, /connections/{id}/messages, /connections/{id}/files, /crew-connections |
| Management | /crews, /crew/create, /agent/create, /credentials, /agent-credentials, /issue/create |
| Pipelines | /pipelines, /pipelines/{slug}, /pipelines/save, /pipelines/{slug}/run, /pipelines/{slug}/dry_run |
| Other | /manifest (GET/PATCH), /expose-port |
Keeper Request Lifecycle
The agent never holds a credential directly. Anything sensitive goes through one of two endpoints on the sidecar:
| Endpoint | What the agent gets back | When to use |
|---|
POST /keeper/request | The credential value, in cleartext | Tools that genuinely need the value (e.g. an SDK constructor that takes an API key) |
POST /keeper/execute | Stdout / stderr from a command, scrubbed | Anything that can be expressed as a shell command — preferred, because the agent never sees the value |
Decision flow
The sidecar forwards both endpoints over /tmp/crewship.sock to crewshipd, which calls into internal/keeper/gatekeeper:
- L1 fast-path — for
request only (never execute), an L1 credential plus an intent ≥ 10 non-whitespace characters AND ≥ 3 distinct non-whitespace characters is auto-allowed without an LLM round-trip. Single-character, whitespace-only, or trivial-filler intents like "aaaaaaaaaa" are rejected.
- LLM evaluation — every other request is scored by
Gatekeeper.Evaluate, which prompts a local Ollama model with the request, the agent’s recent conversation, and the credential’s SecurityLevel. The model returns a structured { decision, reason, risk } JSON object.
- Fail-closed default — if no LLM provider is configured, the Gatekeeper returns
DENY. The same applies if the LLM response cannot be parsed as JSON.
- Execute branch only — on
ALLOW, crewshipd runs the command via docker exec inside the crew container with the credential injected as an environment variable. Stdout and stderr pass through the scrubber before being returned. The credential value never crosses back into UID 1001.
Sequence diagram
Permission Scope
The trust boundary lives in the crew container, not in a per-call session token. Three components define what an agent can do, and all three are tied to the container’s lifetime:
| Component | Lifetime | Mutable at runtime? |
|---|
DomainAllowlist | Container | Yes — s.Allowlist().Add(domain) (used by the MCP gateway when it discovers a new server URL) |
CredStore | Container | Loaded once at sidecar start; an agent can only read credentials its agent_id was granted |
| Keeper decision (per request) | Single request | Not memoized — a previous ALLOW does not count toward the next |
The crew journal is the audit trail and is persisted across container restarts: every keeper decision, allowlist change, and command execution emits a typed entry scoped to (workspace_id, crew_id, agent_id, mission_id). Tearing down the container resets the in-memory components above; the journal record survives.
Threat Model
What Crewship protects against
- Agent leaking a credential into chat or logs.
/keeper/execute runs the command in crewshipd, scrubs stdout/stderr, and only the redacted result reaches UID 1001.
- Unauthorised egress in restricted network mode. The proxy refuses non-allowlisted hosts. Unknown network modes default to
restricted (fail-closed) rather than free, so a misconfigured workspace cannot accidentally open up egress.
- Credential dump from the agent process. Credentials are held in crewshipd’s process memory; UID 1001 can talk to UID 1002 over loopback but cannot read the sidecar’s process memory or files. Filesystem credentials at
/secrets/{slug}/ are mounted read-only.
- Prompt injection in tool output.
internal/lookout scans inputs and outputs; on a finding it either sanitises (default for secret-shaped output) or blocks (default for input). Every action is logged to the journal.
- Trivial fast-path bypasses. L1 auto-allow is gated to
request only — an attacker cannot smuggle a command through the L1 path because execute always pays the LLM round-trip. Empty or whitespace-only intents are rejected before they reach the LLM.
What Crewship does NOT protect against
- A compromised host. Docker is the bottom of the trust stack. If the host kernel, Docker daemon, or root account is compromised, every credential and every journal entry is exposed. Run crewshipd on a hardened host; do not expose port 8080 publicly without an authenticating reverse proxy.
- A malicious or misconfigured Keeper LLM. If the Ollama model used by the Gatekeeper has been swapped for an adversarial one,
ALLOW decisions cannot be trusted. Pin model digests, audit the model out-of-band, and treat the Keeper as a defense-in-depth layer rather than a primary control.
- Side channels in scrubbed output. The scrubber redacts known credential shapes. A sufficiently creative agent can still leak partial information through timing, exit codes, or format-string tricks. Treat
/keeper/request as a weaker contract than /keeper/execute for genuinely sensitive secrets.
- Cross-tenant isolation on shared infrastructure. A single crewshipd process serves a single workspace database at a time. Hosting multiple tenants requires separate crewshipd instances with separate
ENCRYPTION_KEYs and separate Docker daemons (or namespaces).
- Supply-chain compromise of the agent CLIs. Crewship invokes
claude, opencode, codex, and similar binaries as-is. Pin their versions in any Dockerfile you control, and review their changelogs before bumping.
CLI Adapters
Crewship supports multiple AI coding CLIs, configured per agent:
| Adapter | CLI Command | System Prompt | Notes |
|---|
CLAUDE_CODE | claude --print --output-format stream-json | Via --system-prompt flag | Supports --model, --tools, --mcp-config |
CODEX_CLI | codex --quiet | N/A | Supports --sandbox for CODING profile |
GEMINI_CLI | gemini | Via --system-instruction | Uses -p for prompt |
OPENCODE | opencode run | Via AGENTS.md file in CWD | Reads system prompt from file |
CURSOR_CLI | cursor-agent -p --output-format stream-json --stream-partial-output --force | Folded into user message (no flag) | MCP listed but not invoked in headless mode (Cursor forum #143045/#148397); SupportsMCP() returns false |
FACTORY_DROID | droid exec --auto <low|medium|high> -o stream-json | Folded into user message (no flag) | Autonomy mapped from tool_profile (MINIMAL→low, CODING→medium, FULL→high); --mission avoided (Factory issue #794) |
WebSocket Hub
Real-time updates (agent status, mission progress, chat messages) are delivered via WebSocket connections managed by the ws.Hub. The hub broadcasts events to connected clients based on workspace and crew subscriptions.
Database
Crewship uses SQLite in single-binary mode. Key design decisions:
- Driver:
modernc.org/sqlite (pure Go, no CGo)
- Migrations: Go-only in
internal/database/migrate.go (never Prisma migrate)
- Prisma: Used only for TypeScript type generation (
pnpm db:generate)
- State: Agent run states stored in BoltDB (
bbolt); a PostgreSQL state provider is on the v0.2 roadmap for high-throughput deployments
Manifest System
Each crew container maintains a manifest file at /crew/manifest.json that tracks installed packages, credentials used, and setup commands. The manifest enables container reproducibility — when a container is recreated, the manifest records what was installed.
| Endpoint | Method | Description |
|---|
/manifest | GET | Retrieve the current manifest |
/manifest | PATCH | Update the manifest (merge fields) |
The manifest tracks:
- Installed packages: apt, npm, and pip packages installed during agent sessions
- Credentials used: Which credentials were active during setup
- Setup commands: Commands run to configure the container environment
Output Processing
Agent CLI output uses the stream-JSON format (Claude Code --output-format stream-json). Each line is a JSON object with a type field indicating the content block type:
| Block Type | Description |
|---|
text | Text content from the agent |
tool_use | Agent invoking a tool (with name and input) |
tool_result | Result returned from a tool invocation |
thinking | Internal reasoning (when extended thinking is enabled) |
The final result message includes metadata:
duration_ms — total execution time in milliseconds
total_cost_usd — API cost for the run
num_turns — number of conversation turns
Credential scrubbing: All agent output passes through a Scrubber that detects and redacts credential values (API keys, tokens) before they reach WebSocket clients or logs.
Execution Wrapping
Agent CLI commands are wrapped for reliable streaming and process management:
-
Line-buffered stdout:
stdbuf -oL forces line-buffered output so JSON lines are flushed immediately for real-time streaming in the UI.
-
tmux session wrapping: When tmux is available, agent commands run inside a named tmux session (
agent-{slug}). This provides:
- Resilience against SSH disconnects
- Named sessions for debugging (
tmux attach -t agent-{slug})
- FIFO-based output streaming to the orchestrator
-
FIFO for output streaming: A named pipe is created for stdout, allowing the orchestrator to read output asynchronously while the agent runs in a tmux session.
-
Exit code tracking: The exit code is written to a file so the orchestrator can determine success/failure even when the process runs inside tmux.
If tmux setup fails, the orchestrator falls back to direct sh -c execution with stdbuf.
Background Services
Three background services run alongside the main HTTP server:
| Service | Interval | Purpose |
|---|
| StatsCollector | 5 seconds | Polls container metrics (CPU, memory, network, PIDs) via the container provider and broadcasts container.stats events over WebSocket |
| TokenSyncer | Configurable | Periodically fetches OAuth tokens from the token pool and syncs them to the credential store |
| CredentialMonitor | Configurable | Validates provider credentials (e.g., Anthropic API key status) and detects status changes (active, expired, rate-limited) |
The StatsCollector uses up to 10 concurrent workers with a 3-second timeout per container to avoid blocking the polling loop on unresponsive containers.
PR #204 introduced the Crew Journal platform — a set of packages built around a single append-only event stream that is the canonical source of truth for every observable action.
Single-event-stream design
The journal_entries table (migration 52) is the one write target. Every platform surface is either a read-model over that stream or a middleware that emits into it:
+-------------------+
| journal_entries |
| (append-only) |
+---------+---------+
|
+-------------------+---------+------------------+----------------+
| | | | |
Paymaster Watch Episodic Cartographer Quartermaster
(cost_ledger + Roster Memory (checkpoints) (eval_runs +
budget_limits (agent_ (journal_ regression
are the write status row embeddings report)
side; reads is the BLOB index)
come from the live
journal + projection)
rollups)
- Paymaster writes
cost_ledger rows and emits llm.call / cost.incurred / budget.* entries.
- Watch Roster upserts
agent_status and emits agent.status_change.
- Episodic memory selectively embeds a subset of journal entries into
journal_embeddings.
- Cartographer anchors named checkpoints at journal cursors.
- Harbor Master queues approvals; transitions emit
approval.*.
- Hooks fire on lifecycle events and emit
hook.fired / hook.blocked.
- Quartermaster derives typed trajectories and metrics from the journal.
Write-path order (LLM call stack)
The full LLM middleware stack composes as:
telemetry -> paymaster -> lookout -> raw provider
The order is load-bearing:
- Telemetry outermost so the span covers budget enforcement + guardrails + the network hop.
- Paymaster outside lookout so a blocked call still records a partial ledger row when appropriate (attempted-but-blocked audit) and a cache-eligible call doesn’t waste time on guardrails before being short-circuited.
- Lookout outside the raw provider so bad inputs never reach the LLM.
See internal/llm/middleware.go for the composition function and the rationale comment block.
Adapter pattern: orchestrator decoupling
The orchestrator package stays decoupled from internal/api to avoid an import cycle. It depends only on narrow interfaces (HookDispatcher, ApprovalGate, EpisodicRecaller, PresenceTracker) defined next to it. The concrete adapters live in internal/server/orchestrator_adapters.go:
| Adapter | Interface | Wraps |
|---|
hooksAdapter | orchestrator.HookDispatcher | hooks.Dispatch |
approvalGateAdapter | orchestrator.ApprovalGate | harbormaster.Gate (with NewEvaluatorWithDefaults) |
episodicRecallAdapter | orchestrator.EpisodicRecaller | episodic.Recall + RenderInjection |
presenceAdapter | orchestrator.PresenceTracker | presence.Upsert |
The server package is the one place that imports all of them — exactly because it owns top-level wiring. Any new feature that needs to be invoked from the orchestrator should follow the same pattern: define a narrow interface next to the orchestrator, write the adapter in server/, keep internal/api and internal/<feature> mutually unaware.
- Migration 52 added 8 tables:
journal_entries, journal_embeddings, agent_status, approvals_queue, checkpoints, hooks_config, cost_ledger, budget_limits.
- Migration 53 added
eval_runs (Quartermaster’s durable replay/regression index with status + metrics columns).
- Migration 55 added
journal_entries_fts (FTS5 mirror of summary + payload), journal_entries_archived (compaction sink with truncated payloads), memory_relations (A-Mem-style edge graph), and memory_health_snapshots (daily 5-metric scoring) — see Episodic memory.
- Migration 60 added a partial index on
journal_entries (workspace_id, trace_id) WHERE trace_id IS NOT NULL to make run aggregation GROUP BY trace_id cheap.
- Migration 61 backfilled the legacy
agent_runs table into journal entries, snapshotted the original rows to agent_runs_archive, and dropped agent_runs — journal_entries is now the single source of truth for runs.
- Migration 62 added 9 columns to
cost_ledger for billing-mode-aware accounting (billing_mode, quota_remaining_pct, quota_window, subscription_plan, four rate_*_per_m snapshot columns, cost_confidence) plus a partial index on (workspace_id, billing_mode) — see Paymaster.
A consolidated catalog of recent migrations lives in Migrations.
Runs as journal projections
Since migration 61, the legacy agent_runs row no longer exists; a “run” is a logical aggregate over journal entries that share a trace_id (which equals the run id). The lifecycle emits five typed entries — run.started, run.completed, run.failed, run.cancelled, run.timeout — at every transition; journal.ListRuns reconstructs the row shape via GROUP BY trace_id, and journal.RunStats rolls up KPIs the same way. Trace-id propagation through goroutines is explicit: handlers attach the id to context with journal.WithRunID(ctx, runID) and emitters extract it via RunIDFromContext(ctx). The noop emitter loudly errors on run.* types, so a misconfigured wiring fails immediately rather than silently dropping observability.
The /journal UI exposes this via a tab strip — Timeline | Runs | Stats — backed by the same SSE stream and the same FTS5 index (?q= does a server-side MATCH against summary + payload). Card enrichment (palette colours and lucide icons for crew/agent/mission chips) comes from a workspace-scoped lookup map at GET /api/v1/journal/lookup, fetched once and invalidated by realtime events. See Crew Journal.
Container snapshots — declared intent vs actual state
devcontainer.json is what the operator declares the container should look like. container.snapshot journal entries record what the container actually looks like after agents have finished a session running apt-get install, pip install, or npm install. The internal/containerstate package probes dpkg-query -W, pip freeze, npm ls -g --json, and /etc/os-release after every successful exec; the snapshot is hashed, and the orchestrator emits an entry only when the hash changes — so a quiet session is free. Missing probes (e.g. no pip in a Node-only image) are soft-fail and produce empty package lists rather than errors. See Devcontainers.
Journal emitter wiring
The production *journal.Writer is built once at server boot and exposed via Router.Journal(). Handlers take it through a SetJournal setter (collapses nil to noopEmitter{}) so they never nil-panic on audit emits. Call sites emit without checking:
_, _ = h.journal.Emit(ctx, journal.Entry{...})
The telemetry package’s RegisterJournalResolver() ties OTel span context into the emit path so every entry carries trace_id / span_id when telemetry is configured.