Crew Journal
The Crew Journal is an append-only event stream backed by the journal_entries table (migration 52). Every observable action in Crewship lands here as one row: peer conversations, mission transitions, keeper decisions, LLM calls, approvals, checkpoints, hook fires, exec/network/file events.
Downstream features — Paymaster, Watch Roster, Episodic Memory, Cartographer — are read-models or middleware over this one stream. If it happened in the platform and nobody can see it in the journal, it didn’t happen.
Schema
CREATE TABLE journal_entries (
id TEXT PRIMARY KEY, -- j_<16-hex>
workspace_id TEXT NOT NULL,
crew_id TEXT,
agent_id TEXT,
mission_id TEXT,
ts TEXT NOT NULL, -- RFC3339Nano, milli precision
entry_type TEXT NOT NULL, -- see catalog below
severity TEXT NOT NULL DEFAULT 'info',
priority TEXT NOT NULL DEFAULT 'normal', -- normal|high|pin|permanent
actor_type TEXT NOT NULL,
actor_id TEXT,
summary TEXT NOT NULL, -- one-line human string
payload TEXT NOT NULL DEFAULT '{}', -- typed JSON by entry_type
refs TEXT NOT NULL DEFAULT '{}', -- parent_entry_id, approval_id, ...
trace_id TEXT,
span_id TEXT,
expires_at TEXT
);
A contentless FTS5 virtual table journal_entries_fts (migration 55) mirrors summary and payload, with insert/update/delete triggers that keep it in sync with the base table.
Entries are immutable — corrections are new entries with refs.parent_entry_id. IDs are 64-bit random hex (j_a1b2c3d4e5f60718).
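A minimal sketch of how an ID in this shape could be generated, assuming 8 bytes from crypto/rand are hex-encoded behind the j_ prefix. newEntryID is a hypothetical helper name for illustration, not necessarily what internal/journal uses.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// newEntryID returns "j_" followed by 16 lowercase hex characters,
// i.e. 64 random bits. Hypothetical helper, for illustration only.
func newEntryID() (string, error) {
	var b [8]byte
	if _, err := rand.Read(b[:]); err != nil {
		return "", fmt.Errorf("read random bytes: %w", err)
	}
	return "j_" + hex.EncodeToString(b[:]), nil
}

func main() {
	id, err := newEntryID()
	if err != nil {
		panic(err)
	}
	fmt.Println(id) // random on every run, same shape as j_a1b2c3d4e5f60718
}
```

With only 64 bits of randomness, the PRIMARY KEY constraint is what ultimately rejects a collision, so a caller of such a helper would probably want to retry on a uniqueness error.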
Entry type catalog
Defined in internal/journal/types.go. Every type is a stable string — renames require a backfill migration.
| Bucket | Entry types |
|---|---|
| Communication | peer.conversation, peer.escalation, message.broadcast, agent.mentioned |
| Mission | mission.status_change, mission.comment, assignment.created/running/completed/failed, crew.action, task.delegated |
| Runs (since PR #234) | run.started, run.completed, run.failed, run.cancelled, run.timeout |
| Security | keeper.request, keeper.decision, keeper.rule_auto_tuned, guardrail.input_blocked, guardrail.output_blocked, approval.requested/granted/denied/timeout/cancelled |
| Cost | llm.call, llm.cache_hit, cost.incurred, budget.exceeded, budget.warning |
| Memory | memory.updated, memory.consolidated, memory.priority_changed, summary.generated |
| Observability | exec.command, exec.output_chunk, network.port_opened/closed, network.egress, file.written, container.metrics, container.snapshot |
| Presence | agent.status_change |
| Checkpointing | checkpoint.created, checkpoint.restored, fork.created |
| Eval | eval.run_started, eval.metric, eval.regression_detected |
| Hooks | hook.fired, hook.blocked |
| Pipeline | pipeline.run.started/completed/failed, pipeline.step.started/completed/failed/skipped/retry/validation_failed, pipeline.dry_run |
| Credentials | credential.auto_assign_failed, credential.auto_assign_empty |
| Skills | skill.imported, skill.deleted, skill.assigned, skill.unassigned |
| Audit | audit.entity_created/updated/deleted/restored (mirrors audit_logs) |
| Provisioning | provisioning.queued/building/complete/failed (crew runtime container build) |
| Chat | chat.user_message, chat.agent_response (user↔agent turns) |
| Agent error | agent.error (panic recoveries + stream-handling failures) |
| System | system.compaction, system.migration, system.hook_toggled, system.consolidation_triggered/completed |
run.* — agent run lifecycle
Since PR #234 (migration 61) the legacy agent_runs table no longer exists; a “run” is reconstructed by grouping journal entries on trace_id (which equals the run id). Five typed entries cover every transition:
| Entry type | When |
|---|---|
| run.started | Orchestrator picked up the assignment and exec’d the CLI. |
| run.completed | CLI returned exit 0 within budget. |
| run.failed | Non-zero exit, panic, or upstream API error. |
| run.cancelled | Operator or hook called stop. |
| run.timeout | Run TTL elapsed without a terminal entry. |
Trace-id propagation is explicit: emit sites attach the id to the context with journal.WithRunID(ctx, runID), and downstream emitters read it back via RunIDFromContext(ctx). The noop emitter returns a loud error on run.* types, so misconfigured wiring fails immediately instead of silently dropping observability.
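The propagation pattern can be sketched with the standard context package. Only the names WithRunID and RunIDFromContext come from the text above; the unexported key type, the two-value return, and the startRun helper are assumptions made for this example.

```go
package main

import (
	"context"
	"fmt"
)

// runIDKey is an unexported type, so no other package can collide with it.
type runIDKey struct{}

// WithRunID returns a child context that carries runID.
func WithRunID(ctx context.Context, runID string) context.Context {
	return context.WithValue(ctx, runIDKey{}, runID)
}

// RunIDFromContext returns the run id and whether a non-empty one was attached.
func RunIDFromContext(ctx context.Context) (string, bool) {
	id, ok := ctx.Value(runIDKey{}).(string)
	return id, ok && id != ""
}

// startRun stands in for an emit site (hypothetical): it attaches the id
// once, and everything downstream receives it through ctx.
func startRun(runID string) context.Context {
	return WithRunID(context.Background(), runID)
}

func main() {
	id, ok := RunIDFromContext(startRun("run_123"))
	fmt.Println(id, ok) // run_123 true

	_, ok = RunIDFromContext(context.Background())
	fmt.Println(ok) // false
}
```

Returning a boolean alongside the id lets an emitter tell "no run attached" apart from a real id, which is the situation in which a run.* emit should fail loudly.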
journal.ListRuns and journal.RunStats reconstruct the row shape and KPIs that used to come from agent_runs. The /journal UI’s Runs tab is a thin view over these.
container.snapshot — container actuals
Emitted by internal/containerstate after every successful agent exec. The package probes dpkg-query -W, pip freeze, npm ls -g --json, and /etc/os-release, hashes the combined result, and emits an entry only when the hash changes, so a quiet session costs nothing. Missing probes (e.g. no pip in a Node-only image) soft-fail to empty lists. The payload structure:
{
"os_name": "Debian GNU/Linux 12 (bookworm)",
"packages": {
"apt": [{"name": "curl", "version": "7.88.1-10+deb12u5"}, ...],
"pip": [{"name": "requests", "version": "2.31.0"}, ...],
"npm": [{"name": "typescript", "version": "5.4.5"}, ...]
},
"hash": "sha256:..."
}
This is what the container actually has after agents ran apt-get install / pip install / npm install during a session. Compare with declared intent in devcontainer.json.
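The "emit only when the hash changes" rule can be sketched as follows. This is an illustration under assumptions: the type names, the choice to sort each package list before hashing, and the use of the JSON encoding as the hashed form are not taken from internal/containerstate.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
	"sort"
)

type Package struct {
	Name    string `json:"name"`
	Version string `json:"version"`
}

type Snapshot struct {
	OSName   string               `json:"os_name"`
	Packages map[string][]Package `json:"packages"`
}

// hashSnapshot sorts a copy of each package list so probe ordering does not
// affect the digest, then hashes the JSON encoding. encoding/json writes map
// keys in sorted order, so the encoding is deterministic.
func hashSnapshot(s Snapshot) (string, error) {
	canon := Snapshot{OSName: s.OSName, Packages: map[string][]Package{}}
	for mgr, pkgs := range s.Packages {
		sorted := append([]Package(nil), pkgs...)
		sort.Slice(sorted, func(i, j int) bool {
			if sorted[i].Name != sorted[j].Name {
				return sorted[i].Name < sorted[j].Name
			}
			return sorted[i].Version < sorted[j].Version
		})
		canon.Packages[mgr] = sorted
	}
	raw, err := json.Marshal(canon)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(raw)
	return "sha256:" + hex.EncodeToString(sum[:]), nil
}

// shouldEmit returns the new hash and whether it differs from prevHash.
func shouldEmit(prevHash string, s Snapshot) (string, bool, error) {
	h, err := hashSnapshot(s)
	if err != nil {
		return "", false, err
	}
	return h, h != prevHash, nil
}

func main() {
	snap := Snapshot{
		OSName:   "Debian GNU/Linux 12 (bookworm)",
		Packages: map[string][]Package{"apt": {{Name: "curl", Version: "7.88.1-10+deb12u5"}}},
	}
	h, emit, _ := shouldEmit("", snap)
	fmt.Println(emit) // true: nothing recorded yet
	_, emit, _ = shouldEmit(h, snap)
	fmt.Println(emit) // false: same packages, same hash
}
```

Canonicalising before hashing matters because package managers do not promise a stable listing order; without it, a reordered but identical package set would look like a change.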
Across all entry types, severity is one of info, notice, warn, error, and actor_type is one of agent, user, system, keeper, sidecar, orchestrator.
Priority markers
Priority is an operator-facing importance marker orthogonal to Severity.
Severity answers “how alarming?”; Priority answers “how long do we
remember it and how prominently should it surface at recall time?”.
| Marker | Effect |
|---|---|
| normal (default) | No special treatment. |
| high | Recall importance floors at 0.85; subject to normal compaction. |
| pin | Snapshot to /crew/shared/.memory/pins.md at next consolidate run; recall importance floors at 0.80. |
| permanent | Never compacted. Extracted to learned-*.md on the next consolidator run, skipping the 10-entry threshold (extraction happens on the cadence, not the instant the marker is set). Recall importance floors at 0.95. |
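One way to read "floors at" is that the marker sets a lower bound on whatever importance score recall computes. The sketch below assumes scores in the range 0 to 1 and applies the floor with a simple max; recallFloor and effectiveImportance are hypothetical names, and the real recall scoring may combine the values differently.

```go
package main

import (
	"fmt"
	"math"
)

// recallFloor returns the minimum recall importance for a priority marker,
// using the values from the table above.
func recallFloor(priority string) float64 {
	switch priority {
	case "permanent":
		return 0.95
	case "high":
		return 0.85
	case "pin":
		return 0.80
	default: // "normal" or unknown: no floor
		return 0
	}
}

// effectiveImportance raises score to the marker's floor when it is lower.
func effectiveImportance(score float64, priority string) float64 {
	return math.Max(score, recallFloor(priority))
}

func main() {
	fmt.Println(effectiveImportance(0.30, "normal"))    // 0.3
	fmt.Println(effectiveImportance(0.30, "pin"))       // 0.8
	fmt.Println(effectiveImportance(0.90, "high"))      // 0.9
	fmt.Println(effectiveImportance(0.30, "permanent")) // 0.95
}
```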
Markers are set via the HTTP endpoint or CLI — agents cannot mark their
own outputs (keeps automation from unilaterally promoting its own
memory). OWNER and ADMIN roles only.
# HTTP
curl -X POST https://host/api/v1/journal/j_abc/priority \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"priority":"permanent","reason":"FX compliance constraint"}'
# CLI
crewship journal priority j_abc --mark permanent --reason "FX compliance constraint"
The consolidator (internal/consolidate) honours permanent as a
fast-path signal: a single permanent entry triggers rule extraction
on the next run regardless of volume. The compactor (internal/consolidate/compact.go) excludes permanent from the 30-day rollup
via a WHERE priority != 'permanent' clause, so knowledge deliberately
marked permanent survives for the life of the DB.
Inspired by OpenClaw Auto-Dream’s ⚠️ PERMANENT / 🔥 HIGH / 📌 PIN
markers — the semantics are deliberately close so a reader who knows
one system can reason about the other.
Writing
The write surface is journal.Emitter:
type Emitter interface {
Emit(ctx context.Context, e Entry) (string, error)
Flush(ctx context.Context) error
}
The production journal.Writer is constructed once at server start and reached via Router.Journal(). Handlers emit without nil checking (noopEmitter is the default):
_, _ = h.journal.Emit(ctx, journal.Entry{
WorkspaceID: ws, CrewID: crew, AgentID: agent,
Type: journal.EntryKeeperDecision,
Severity: journal.SeverityWarn,
ActorType: journal.ActorKeeper,
Summary: "keeper denied production SSH",
Payload: map[string]any{"credential_id": credID, "risk_score": 8},
})
Emit is asynchronous: entries are queued (buffer 1024), and a background goroutine writes them in batches of up to 64 rows, flushing at least every 100ms. When the queue is saturated, the call falls back to a synchronous write, trading latency for durability. Call Flush before tearing down a long-running test to guarantee all prior emits are on disk.
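The queue, batch, and fallback behaviour described above can be sketched as below. This is a simplified illustration, not the production journal.Writer: the entry has one field, an in-memory slice stands in for the database, errors and context are omitted, and the background goroutine has no shutdown path.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

const (
	queueSize  = 1024
	batchSize  = 64
	flushEvery = 100 * time.Millisecond
)

type Entry struct{ Summary string }

// Writer queues entries and commits them in batches from one goroutine.
type Writer struct {
	queue   chan Entry
	flushCh chan chan struct{}
	mu      sync.Mutex
	batches [][]Entry // stand-in for the database: one element per commit
}

func NewWriter(buf, maxBatch int, every time.Duration) *Writer {
	w := &Writer{queue: make(chan Entry, buf), flushCh: make(chan chan struct{})}
	go w.loop(maxBatch, every)
	return w
}

// commit stores a copy of one batch; an empty batch is a no-op.
func (w *Writer) commit(batch []Entry) {
	if len(batch) == 0 {
		return
	}
	w.mu.Lock()
	w.batches = append(w.batches, append([]Entry(nil), batch...))
	w.mu.Unlock()
}

// Emit enqueues e, or writes it synchronously when the queue is saturated.
func (w *Writer) Emit(e Entry) {
	select {
	case w.queue <- e:
	default:
		w.commit([]Entry{e})
	}
}

// Flush blocks until every entry queued before the call has been committed.
func (w *Writer) Flush() {
	done := make(chan struct{})
	w.flushCh <- done
	<-done
}

// Count returns the number of committed entries.
func (w *Writer) Count() int {
	w.mu.Lock()
	defer w.mu.Unlock()
	n := 0
	for _, b := range w.batches {
		n += len(b)
	}
	return n
}

func (w *Writer) loop(maxBatch int, every time.Duration) {
	tick := time.NewTicker(every)
	batch := make([]Entry, 0, maxBatch)
	flush := func() { w.commit(batch); batch = batch[:0] }
	add := func(e Entry) {
		if batch = append(batch, e); len(batch) >= maxBatch {
			flush()
		}
	}
	for {
		select {
		case e := <-w.queue:
			add(e)
		case <-tick.C:
			flush()
		case done := <-w.flushCh:
			for n := len(w.queue); n > 0; n-- {
				add(<-w.queue) // drain what is queued right now
			}
			flush()
			close(done)
		}
	}
}

func main() {
	w := NewWriter(queueSize, batchSize, flushEvery)
	for i := 0; i < 200; i++ {
		w.Emit(Entry{Summary: "example"})
	}
	w.Flush()
	fmt.Println(w.Count()) // 200
}
```

The non-blocking select in Emit is what produces the fallback: when the channel buffer is full, the default branch runs and the caller pays for the write itself instead of dropping the entry.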
Reading
HTTP
- GET /api/v1/journal: paginated list, newest first. See the API reference.
- GET /api/v1/journal/stream: SSE live tail. Seeds with the most recent 50 entries, polls every 1s, emits event: entry frames, and heartbeats every 15s. Reconnect with Last-Event-ID to skip already-seen rows.
- GET /api/v1/journal/count: total matching count for a filter set. It ignores cursor/limit so the badge stays honest.
- GET /api/v1/journal/{id}: single entry (scoped to workspace; cross-tenant IDs return 404).
- POST /api/v1/journal/{id}/priority: annotate an entry with normal/high/pin/permanent. OWNER or ADMIN only; emits a memory.priority_changed audit row.
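For the stream endpoint, a client has to split the text/event-stream body into frames and remember the last id it saw so it can send Last-Event-ID on reconnect. The parser below is a minimal sketch that follows the generic SSE wire format. It assumes heartbeats arrive as comment lines (starting with a colon) and that each frame's id is the journal entry id; neither detail is confirmed by this page.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

type Frame struct{ ID, Event, Data string }

// parseSSE splits an event-stream body into frames and returns them together
// with the last id seen, which a client would send back as Last-Event-ID.
func parseSSE(body string) ([]Frame, string) {
	var (
		frames []Frame
		data   []string
		event  string
		lastID string
	)
	sc := bufio.NewScanner(strings.NewReader(body))
	for sc.Scan() {
		line := sc.Text()
		switch {
		case line == "": // a blank line dispatches the pending frame
			if len(data) > 0 {
				frames = append(frames, Frame{ID: lastID, Event: event, Data: strings.Join(data, "\n")})
			}
			event, data = "", nil
		case strings.HasPrefix(line, ":"):
			// comment line (assumed heartbeat format): ignore
		default:
			field, value, _ := strings.Cut(line, ":")
			value = strings.TrimPrefix(value, " ")
			switch field {
			case "id":
				lastID = value
			case "event":
				event = value
			case "data":
				data = append(data, value)
			}
		}
	}
	return frames, lastID
}

func main() {
	body := "id: j_01\nevent: entry\ndata: {\"summary\":\"a\"}\n\n" +
		": heartbeat\n\n" +
		"id: j_02\nevent: entry\ndata: {\"summary\":\"b\"}\n\n"
	frames, last := parseSSE(body)
	fmt.Println(len(frames), last) // 2 j_02
}
```

On reconnect, a client would set the Last-Event-ID request header to the returned id so that the server can skip rows that were already delivered.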
CLI
crewship journal # last 50 entries
crewship journal --crew backend-team --since 24h
crewship journal --severity warn,error
crewship journal --type peer.escalation,keeper.decision --lines 100
crewship journal --mission MIS-42 --format json
crewship journal --follow # live tail via SSE
crewship journal get j_a1b2c3d4 # single entry
crewship journal count --severity error --since 24h
crewship journal priority j_a1b2c3d4 --mark permanent --reason "FX rule"
See crewship journal for the full flag reference. The CLI implements live tail via SSE (--follow) with bounded reconnect backoff and Last-Event-ID resume.
Filtering
Every filter is AND-combined and indexed at the DB level:
| Query param | CLI flag | SQL predicate |
|---|---|---|
| crew_id | --crew | crew_id = ? |
| agent_id | --agent | agent_id = ? |
| mission_id | --mission | mission_id = ? |
| trace_id | --trace-id | trace_id = ? (one run’s spans) |
| crew_ids | (none) | CSV crew_id IN (?,?,...) (takes precedence over crew_id) |
| agent_ids | (none) | CSV agent_id IN (?,?,...) (takes precedence over agent_id) |
| entry_type | --type | CSV entry_type IN (?,?,...) |
| exclude_entry_type | --exclude-type | CSV entry_type NOT IN (?,?,...) |
| severity | --severity | CSV IN (?,?,...) |
| actor_type | --actor-type | CSV IN (?,?,...) |
| priority | --priority | CSV IN (?,?,...) |
| since, until | --since (--until on count) | ts >= ? / ts <= ? |
| q | --query / -q | FTS5 MATCH on summary + payload (see below) |
| cursor | (none) | keyset (ts, id) < prior page |
Pagination is keyset (compound ts, id), not offset, so deep paging stays O(log n). limit is 1-500, default 100 for list, 50 for the SSE seed.
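A sketch of how AND-combined filters and a keyset cursor could compile to one parameterised query. The Filter struct, the buildQuery name, and the decision to clamp an out-of-range limit (where the real handler might reject it instead) are assumptions for this example. The row-value comparison (ts, id) < (?, ?) requires SQLite 3.15 or later.

```go
package main

import (
	"fmt"
	"strings"
)

// Filter holds a subset of the documented query parameters.
type Filter struct {
	WorkspaceID string
	CrewID      string
	Severities  []string
	CursorTS    string // ts of the last row on the previous page
	CursorID    string // id of the last row on the previous page
	Limit       int
}

// buildQuery returns the SQL text and its positional arguments.
func buildQuery(f Filter) (string, []any) {
	where := []string{"workspace_id = ?"} // tenancy filter is always present
	args := []any{f.WorkspaceID}
	if f.CrewID != "" {
		where = append(where, "crew_id = ?")
		args = append(args, f.CrewID)
	}
	if n := len(f.Severities); n > 0 {
		where = append(where, "severity IN (?"+strings.Repeat(",?", n-1)+")")
		for _, s := range f.Severities {
			args = append(args, s)
		}
	}
	if f.CursorID != "" {
		// Keyset step: continue strictly after the last row already seen.
		where = append(where, "(ts, id) < (?, ?)")
		args = append(args, f.CursorTS, f.CursorID)
	}
	limit := f.Limit
	switch {
	case limit <= 0:
		limit = 100 // documented default for list
	case limit > 500:
		limit = 500 // documented maximum
	}
	args = append(args, limit)
	query := "SELECT id, ts, summary FROM journal_entries WHERE " +
		strings.Join(where, " AND ") +
		" ORDER BY ts DESC, id DESC LIMIT ?"
	return query, args
}

func main() {
	q, args := buildQuery(Filter{
		WorkspaceID: "ws_1",
		Severities:  []string{"warn", "error"},
		CursorTS:    "2026-04-29T00:00:00.000Z",
		CursorID:    "j_a1b2c3d4e5f60718",
	})
	fmt.Println(q)
	fmt.Println(len(args)) // 6
}
```

Because the cursor predicate and the ORDER BY use the same (ts, id) pair, each page is an index range scan that starts where the previous page ended. Comparing ts as text is only chronological if every stored value shares the same fixed-width format.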
Full-text search
The ?q= query parameter (CLI: --query / -q) compiles to a phrase-wrapped MATCH against the journal_entries_fts index (migration 55, described under Schema):
crewship journal --query "OOM" --since 24h
crewship journal --query "google.com" --crew backend-team
GET /api/v1/journal?q=ratelimit&since=2026-04-29T00:00:00Z
GET /api/v1/journal/stream?q=approval # live tail filtered by FTS
The same parameter works on the SSE stream: the seed slice and every subsequent poll apply the FTS filter, so a “watch for OOM” tab stays cheap. q is length-bounded (over-long values are rejected) and is AND-combined with the structural filters (crew_id, severity, …).
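Phrase-wrapping can be sketched as turning the raw input into a single FTS5 string literal: wrap it in double quotes and double any embedded quote. phraseQuery and the 256-rune cap are hypothetical; the actual length bound is not stated on this page.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"unicode/utf8"
)

// maxQueryRunes is an assumed bound, chosen only for this example.
const maxQueryRunes = 256

// phraseQuery wraps user input as one FTS5 string so that characters such as
// . : * - and keywords such as AND or NEAR are matched as text, not parsed
// as query syntax.
func phraseQuery(q string) (string, error) {
	q = strings.TrimSpace(q)
	if q == "" {
		return "", errors.New("empty query")
	}
	if n := utf8.RuneCountInString(q); n > maxQueryRunes {
		return "", fmt.Errorf("query too long: %d runes (max %d)", n, maxQueryRunes)
	}
	return `"` + strings.ReplaceAll(q, `"`, `""`) + `"`, nil
}

func main() {
	for _, in := range []string{"OOM", "google.com", `exit "1"`} {
		out, err := phraseQuery(in)
		if err != nil {
			panic(err)
		}
		fmt.Println(out)
	}
}
```

The result would be bound as the parameter of a MATCH expression rather than concatenated into SQL, which is why an input like google.com can be searched without tripping the FTS5 query parser.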
Lookup table (card enrichment)
GET /api/v1/journal/lookup
Returns a workspace-scoped map of crews, agents, and missions so the UI can render entry cards with palette-coloured chips and lucide icons without joining on the streaming path. The lookup payload is fetched once on page mount (the React JournalLookupProvider caches it) and invalidated by realtime events (new crew, renamed agent, …). useJournalLookup is the consumer hook; backend handler is internal/api/journal_lookup.go.
The endpoint returns:
{
"crews": [{"id":"crw_…","slug":"backend","name":"Backend","icon":"server","color":"emerald"}, ...],
"agents": [{"id":"agt_…","slug":"viktor","name":"Viktor","crew_id":"crw_…","avatar":"…"}, ...],
"missions": [{"id":"MIS-42","title":"Migrate auth","status":"in_progress","crew_id":"crw_…"}, ...]
}
Entries themselves never carry display strings; the journal stores stable IDs only. Renaming a crew updates the lookup on the next fetch, so historical journal rows also render with the new name.
Unified runs surface
Since PR #234 the standalone /runs page is folded into /journal as a preset tab. The /runs URL serves a redirect to /journal?tab=runs. The tab strip is Timeline | Runs | Stats:
- Timeline: chronological event stream with FTS, severity, type, and crew filters.
- Runs: KPI tiles (total / completed / failed / cancelled / running) plus a table grouped by trace_id. Same data the old /runs page rendered, now sourced via journal.ListRuns.
- Stats: workspace KPIs (calls/day, top entry types, top error types).
All three share the same SSE stream and FTS index — selecting a tab does not retrigger a fetch, only a client-side filter shift. Audit-tab navigation is dropped (it was always a redirect placeholder); the /audit page remains in the sidebar for security review.
Tenancy
Workspace isolation is enforced at the store level (journal.List/Get/Count take a workspace filter and refuse to run without one). The handler additionally pulls workspace_id from the session context — there is no way for a caller to pass a foreign workspace id through a query parameter.
Cross-tenant existence is never leaked: unknown IDs return 404 with the same shape as “not in your workspace”. The shared crewBelongsToWorkspace / missionBelongsToWorkspace helpers (defined in internal/api/paymaster_handler.go and reused across every read handler) enforce the same contract — see the Paymaster API reference for the endpoints that exercise them.
Retention
- expires_at is an optional TTL; compaction skips rows past it.
- The daily Compactor (see Consolidate) rolls up info/notice rows older than 30 days into one system.compaction entry and deletes the originals. warn/error rows are kept indefinitely.
- exec.output_chunk, container.metrics, and network.* are never embedded into Episodic memory because they would drown the signal.
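The rollup rule, combined with the permanent exclusion described under Priority markers, can be written as a single predicate. This sketch ignores expires_at and uses hypothetical names (Row, compactable, daysBefore); the real compactor expresses the same conditions in SQL.

```go
package main

import (
	"fmt"
	"time"
)

// Row carries only the columns the rule looks at.
type Row struct {
	Severity string
	Priority string
	TS       time.Time
}

// now is a fixed reference instant so the example is deterministic.
var now = time.Date(2026, 5, 30, 0, 0, 0, 0, time.UTC)

// daysBefore returns t shifted back by the given number of days.
func daysBefore(t time.Time, days int) time.Time {
	return t.AddDate(0, 0, -days)
}

// compactable reports whether the daily rollup may fold r into a
// system.compaction entry.
func compactable(r Row, at time.Time) bool {
	if r.Priority == "permanent" {
		return false // never compacted
	}
	if r.Severity != "info" && r.Severity != "notice" {
		return false // warn/error rows are kept indefinitely
	}
	return at.Sub(r.TS) > 30*24*time.Hour
}

func main() {
	old := daysBefore(now, 31)
	fmt.Println(compactable(Row{"info", "normal", old}, now))                 // true
	fmt.Println(compactable(Row{"error", "normal", old}, now))                // false
	fmt.Println(compactable(Row{"info", "permanent", old}, now))              // false
	fmt.Println(compactable(Row{"notice", "high", daysBefore(now, 5)}, now)) // false
}
```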
Gotchas
- TS precision. Writes serialise as 2006-01-02T15:04:05.000Z (milli). Reads also accept RFC3339Nano and second-precision strings so backfilled rows don’t fail to parse.
- Empty trace/span. The trace_id/span_id columns are populated by the tracing package’s SetTraceResolver. If OpenTelemetry is not initialised the columns stay NULL. This is fine, not a bug.
- Do not rename entry types. A rename breaks every existing row. Add a new type and dual-write during the transition instead.
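The timestamp gotcha can be illustrated with the standard time package. formatTS and parseTS are hypothetical helper names. The sketch relies on a documented property of time.Parse: fractional seconds in the input are accepted whether or not the value has them, so one layout covers second, millisecond, and nanosecond precision.

```go
package main

import (
	"fmt"
	"time"
)

// writeLayout is the millisecond layout quoted above. The trailing Z is a
// literal, so the time must be converted to UTC before formatting.
const writeLayout = "2006-01-02T15:04:05.000Z"

// formatTS serialises t the way writes are described: UTC, millisecond
// precision (finer digits are truncated, not rounded).
func formatTS(t time.Time) string {
	return t.UTC().Format(writeLayout)
}

// parseTS accepts second, millisecond, and nanosecond precision strings.
func parseTS(s string) (time.Time, error) {
	return time.Parse(time.RFC3339Nano, s)
}

func main() {
	for _, in := range []string{
		"2026-04-29T12:00:00Z",           // second precision (backfilled row)
		"2026-04-29T12:00:00.123Z",       // what writes produce
		"2026-04-29T12:00:00.123456789Z", // full nanosecond precision
	} {
		t, err := parseTS(in)
		if err != nil {
			panic(err)
		}
		fmt.Println(formatTS(t))
	}
}
```

Normalising every value through one formatter on write keeps the stored text fixed-width, which is what makes string comparison on ts agree with chronological order.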