Crew Journal

The Crew Journal is an append-only event stream backed by the journal_entries table (migration 52). Every observable action in Crewship lands here as one row: peer conversations, mission transitions, keeper decisions, LLM calls, approvals, checkpoints, hook fires, exec/network/file events. Downstream features — Paymaster, Watch Roster, Episodic Memory, Cartographer — are read-models or middleware over this one stream. If it happened in the platform and nobody can see it in the journal, it didn’t happen.

Schema

CREATE TABLE journal_entries (
  id           TEXT PRIMARY KEY,          -- j_<16-hex>
  workspace_id TEXT NOT NULL,
  crew_id      TEXT,
  agent_id     TEXT,
  mission_id   TEXT,
  ts           TEXT NOT NULL,             -- RFC3339, millisecond precision
  entry_type   TEXT NOT NULL,             -- see catalog below
  severity     TEXT NOT NULL DEFAULT 'info',
  priority     TEXT NOT NULL DEFAULT 'normal', -- normal|high|pin|permanent
  actor_type   TEXT NOT NULL,
  actor_id     TEXT,
  summary      TEXT NOT NULL,             -- one-line human string
  payload      TEXT NOT NULL DEFAULT '{}',-- typed JSON by entry_type
  refs         TEXT NOT NULL DEFAULT '{}',-- parent_entry_id, approval_id, ...
  trace_id     TEXT,
  span_id      TEXT,
  expires_at   TEXT
);

A contentless FTS5 virtual table journal_entries_fts (migration 55) mirrors summary and payload, with insert/update/delete triggers that keep it in sync with the base table. Entries are immutable: a correction is a new entry that points at the original through refs.parent_entry_id. IDs are 64 random bits rendered as 16 hex characters behind a j_ prefix (j_a1b2c3d4e5f60718).
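
The ID shape described above can be sketched as follows. This is a minimal illustration, assuming the random bits come from crypto/rand; the function name newEntryID is hypothetical and not taken from the codebase.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// newEntryID returns "j_" followed by 16 hex characters (8 random bytes).
// Hypothetical helper: only the ID shape comes from the schema comment.
func newEntryID() (string, error) {
	var b [8]byte
	if _, err := rand.Read(b[:]); err != nil {
		return "", err
	}
	return "j_" + hex.EncodeToString(b[:]), nil
}

func main() {
	id, err := newEntryID()
	if err != nil {
		panic(err)
	}
	fmt.Println(id, len(id)) // the length is always 18: "j_" plus 16 hex characters
}
```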

Entry type catalog

Defined in internal/journal/types.go. Every type is a stable string — renames require a backfill migration.

Full entry-type catalog by bucket

Bucket	Entry types
Communication	`peer.conversation`, `peer.escalation`, `message.broadcast`, `agent.mentioned`
Mission	`mission.status_change`, `mission.comment`, `assignment.created/running/completed/failed`, `crew.action`, `task.delegated`
Runs (since PR #234)	`run.started`, `run.completed`, `run.failed`, `run.cancelled`, `run.timeout`
Security	`keeper.request`, `keeper.decision`, `keeper.rule_auto_tuned`, `guardrail.input_blocked`, `guardrail.output_blocked`, `approval.requested/granted/denied/timeout/cancelled`
Cost	`llm.call`, `llm.cache_hit`, `cost.incurred`, `budget.exceeded`, `budget.warning`
Memory	`memory.updated`, `memory.consolidated`, `memory.priority_changed`, `summary.generated`
Observability	`exec.command`, `exec.output_chunk`, `network.port_opened/closed`, `network.egress`, `file.written`, `container.metrics`, `container.snapshot`
Presence	`agent.status_change`
Checkpointing	`checkpoint.created`, `checkpoint.restored`, `fork.created`
Eval	`eval.run_started`, `eval.metric`, `eval.regression_detected`
Hooks	`hook.fired`, `hook.blocked`
Pipeline	`pipeline.run.started/completed/failed`, `pipeline.step.started/completed/failed/validation_failed`, `pipeline.dry_run`
Credentials	`credential.auto_assign_failed`, `credential.auto_assign_empty`
Skills	`skill.imported`, `skill.deleted`, `skill.assigned`, `skill.unassigned`, `skill.invoked`
Audit	`audit.entity_created/updated/deleted/restored` (mirrors `audit_logs`)
Provisioning	`provisioning.queued/building/complete/failed` (crew runtime container build)
Chat	`chat.user_message`, `chat.agent_response` (user↔agent turns)
Agent error	`agent.error` (panic recoveries + stream-handling failures)
System	`system.compaction`, `system.migration`, `system.hook_toggled`, `system.consolidation_triggered/completed`

`run.*` — agent run lifecycle

Since PR #234 (migration 61) the legacy agent_runs table no longer exists; a “run” is reconstructed by grouping journal entries on trace_id (which equals the run id). Five typed entries cover every transition:

Entry type	When
`run.started`	Orchestrator picked up the assignment and exec’d the CLI.
`run.completed`	CLI returned exit 0 within budget.
`run.failed`	Non-zero exit, panic, or upstream API error.
`run.cancelled`	Operator or hook called stop.
`run.timeout`	Run TTL elapsed without a terminal entry.

Trace-id propagation is explicit: emit sites attach the id to context with journal.WithRunID(ctx, runID), and downstream emitters read it back via RunIDFromContext(ctx). The noop emitter loudly errors on run.* types so a misconfigured wiring fails immediately rather than silently dropping observability. journal.ListRuns, journal.RunStats and journal.RunInsights reconstruct the row shape, KPIs and windowed operations aggregate that used to come from agent_runs. The /journal UI’s Runs tab (a fleet operations overview — see below) reads all three.

`container.snapshot` — container actuals

Emitted by internal/containerstate after every successful agent exec. The package probes dpkg-query -W, pip freeze, npm ls -g --json, and /etc/os-release; the result is hashed; the entry is emitted only when the hash changes — so a quiet session is free. Missing probes (e.g. no pip in a Node-only image) soft-fail to empty lists. The payload structure:

{
  "os_name": "Debian GNU/Linux 12 (bookworm)",
  "packages": {
    "apt": [{"name": "curl", "version": "7.88.1-10+deb12u5"}, ...],
    "pip": [{"name": "requests", "version": "2.31.0"}, ...],
    "npm": [{"name": "typescript", "version": "5.4.5"}, ...]
  },
  "hash": "sha256:..."
}

This is what the container actually has after agents ran apt-get install / pip install / npm install during a session. Compare it with the declared intent in devcontainer.json.

Across all entry types, severity is one of info, notice, warn, error. Actor types are agent, user, system, keeper, sidecar, orchestrator.
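
The hash-gated emission for container.snapshot can be sketched like this. It is an illustration only: the struct names, the canonicalisation by sorting, and the choice to hash the JSON encoding are assumptions, not the internal/containerstate implementation.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
	"sort"
)

type Pkg struct {
	Name    string `json:"name"`
	Version string `json:"version"`
}

type Snapshot struct {
	OSName   string           `json:"os_name"`
	Packages map[string][]Pkg `json:"packages"` // keyed by apt, pip, npm
}

// snapshotHash hashes a canonical form of the snapshot so that probe
// ordering alone never looks like a change.
func snapshotHash(s Snapshot) (string, error) {
	canon := Snapshot{OSName: s.OSName, Packages: map[string][]Pkg{}}
	for mgr, pkgs := range s.Packages {
		sorted := append([]Pkg(nil), pkgs...) // copy before sorting
		sort.Slice(sorted, func(i, j int) bool {
			if sorted[i].Name != sorted[j].Name {
				return sorted[i].Name < sorted[j].Name
			}
			return sorted[i].Version < sorted[j].Version
		})
		canon.Packages[mgr] = sorted
	}
	raw, err := json.Marshal(canon) // encoding/json writes map keys in sorted order
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(raw)
	return "sha256:" + hex.EncodeToString(sum[:]), nil
}

func main() {
	snap := Snapshot{OSName: "Debian GNU/Linux 12 (bookworm)", Packages: map[string][]Pkg{
		"apt": {{Name: "curl", Version: "7.88.1-10+deb12u5"}},
	}}
	prev, _ := snapshotHash(snap)
	snap.Packages["pip"] = []Pkg{{Name: "requests", Version: "2.31.0"}}
	next, _ := snapshotHash(snap)
	fmt.Println("emit container.snapshot:", prev != next) // true: the package set changed
}
```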

Priority markers

Priority is an operator-facing importance marker orthogonal to Severity. Severity answers “how alarming?”; Priority answers “how long do we remember it and how prominently should it surface at recall time?”.

Marker	Effect
`normal` (default)	No special treatment.
`high`	Recall importance floors at 0.85; subject to normal compaction.
`pin`	Snapshot to `/crew/shared/.memory/pins.md` at next consolidate run; recall importance floors at 0.80.
`permanent`	Never compacted. Extracted to `learned-*.md` on the next consolidator run, skipping the 10-entry threshold (extraction happens on the cadence, not the instant the marker is set). Recall importance floors at 0.95.
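
The recall floors in the table amount to a lower bound on whatever importance score recall computes. A minimal sketch, assuming a score in the range 0 to 1; the function and map names are hypothetical.

```go
package main

import "fmt"

// recallFloor holds the floors from the marker table; "normal" has none.
var recallFloor = map[string]float64{
	"high":      0.85,
	"pin":       0.80,
	"permanent": 0.95,
}

// flooredImportance raises score to the marker's floor when it falls below it.
func flooredImportance(priority string, score float64) float64 {
	if floor, ok := recallFloor[priority]; ok && score < floor {
		return floor
	}
	return score
}

func main() {
	fmt.Println(flooredImportance("normal", 0.2))     // 0.2
	fmt.Println(flooredImportance("pin", 0.2))        // 0.8
	fmt.Println(flooredImportance("permanent", 0.99)) // 0.99
}
```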

Markers are set via the HTTP endpoint or CLI — agents cannot mark their own outputs (keeps automation from unilaterally promoting its own memory). OWNER and ADMIN roles only.

# HTTP
curl -X POST https://host/api/v1/journal/j_abc/priority \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"priority":"permanent","reason":"FX compliance constraint"}'

# CLI
crewship journal priority j_abc --mark permanent --reason "FX compliance constraint"

The consolidator (internal/consolidate) honours permanent as a fast-path signal: a single permanent entry triggers rule extraction on the next run regardless of volume. The compactor (internal/consolidate/compact.go) excludes permanent from the 30-day rollup via a WHERE priority != 'permanent' clause, so deliberately marked knowledge survives the life of the DB. The set of non-default markers (permanent / high / pin) is deliberately small so operator intent is unambiguous and the consolidator can act on a single field.

Writing

The write surface is journal.Emitter:

type Emitter interface {
    Emit(ctx context.Context, e Entry) (string, error)
    Flush(ctx context.Context) error
}

The production journal.Writer is constructed once at server start and reached via Router.Journal(). Handlers emit without nil checks because the default emitter is a noopEmitter:

_, _ = h.journal.Emit(ctx, journal.Entry{
    WorkspaceID: ws, CrewID: crew, AgentID: agent,
    Type:        journal.EntryKeeperDecision,
    Severity:    journal.SeverityWarn,
    ActorType:   journal.ActorKeeper,
    Summary:     "keeper denied production SSH",
    Payload:     map[string]any{"credential_id": credID, "risk_score": 8},
})

Emit is asynchronous: entries are queued (buffer 1024) and a background goroutine writes them in batches, flushing when 64 rows have accumulated or every 100ms. When the queue is saturated the call falls back to a synchronous write, trading latency for durability. Call Flush before tearing down a long-running test to guarantee all prior emits are on disk.

Reading

HTTP

GET /api/v1/journal — paginated list, newest first. See the API reference.
GET /api/v1/journal/stream — SSE live tail; seeds with the most recent 50 entries, polls every 1s, emits event: entry frames, and heartbeats every 15s. Reconnect with Last-Event-ID to skip already-seen rows.
GET /api/v1/journal/count — total matching count for a filter set; ignores cursor/limit so the badge stays honest.
GET /api/v1/journal/{id} — single entry (scoped to workspace; cross-tenant IDs return 404).
POST /api/v1/journal/{id}/priority — annotate an entry with normal/high/pin/permanent. OWNER or ADMIN only; emits a memory.priority_changed audit row.

CLI

crewship journal                                 # last 50 entries
crewship journal --crew backend-team --since 24h
crewship journal --severity warn,error
crewship journal --type peer.escalation,keeper.decision --lines 100
crewship journal --mission MIS-42 --format json
crewship journal --follow                        # live tail via SSE
crewship journal get j_a1b2c3d4                  # single entry
crewship journal count --severity error --since 24h
crewship journal priority j_a1b2c3d4 --mark permanent --reason "FX rule"

See crewship journal for the full flag reference. The CLI implements live tail via SSE (--follow) with bounded reconnect backoff and Last-Event-ID resume.

Filtering

Every filter is AND-combined and indexed at the DB level:

Query param	CLI flag	SQL predicate
`crew_id`	`--crew`	`crew_id = ?`
`agent_id`	`--agent`	`agent_id = ?`
`mission_id`	`--mission`	`mission_id = ?`
`trace_id`	`--trace-id`	`trace_id = ?` (one run’s spans)
`crew_ids`	—	CSV `crew_id IN (?,?,...)` (takes precedence over `crew_id`)
`agent_ids`	—	CSV `agent_id IN (?,?,...)` (takes precedence over `agent_id`)
`entry_type`	`--type`	CSV `entry_type IN (?,?,...)`
`exclude_entry_type`	`--exclude-type`	CSV `entry_type NOT IN (?,?,...)`
`severity`	`--severity`	CSV `IN (?,?,...)`
`actor_type`	`--actor-type`	CSV `IN (?,?,...)`
`priority`	`--priority`	CSV `IN (?,?,...)`
`since`, `until`	`--since` (`--until` on `count`)	`ts >= ?` / `ts <= ?`
`q`	`--query` / `-q`	FTS5 MATCH on `summary` + `payload` (see below)
`cursor`	—	keyset `(ts, id) <` prior page

Pagination is keyset (compound ts, id), not offset, so deep paging stays O(log n). limit is 1-500, default 100 for list, 50 for the SSE seed.
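
How AND-combined filters and the keyset cursor might compile to SQL is sketched below. The Filter struct, buildQuery, and the selected columns are hypothetical; only the predicates and the limit bounds come from the table above. Falling back to the default for an out-of-range limit is an assumption (the real handler may reject it instead), and the row-value comparison relies on SQLite 3.15 or later.

```go
package main

import (
	"fmt"
	"strings"
)

// Filter carries a small subset of the documented filters.
type Filter struct {
	CrewID     string
	Severities []string
	CursorTS   string // ts of the last row on the previous page
	CursorID   string // id of the last row on the previous page
	Limit      int
}

// buildQuery returns a parameterised query and its arguments.
func buildQuery(workspaceID string, f Filter) (string, []any) {
	where := []string{"workspace_id = ?"}
	args := []any{workspaceID}
	if f.CrewID != "" {
		where = append(where, "crew_id = ?")
		args = append(args, f.CrewID)
	}
	if n := len(f.Severities); n > 0 {
		where = append(where, "severity IN (?"+strings.Repeat(",?", n-1)+")")
		for _, s := range f.Severities {
			args = append(args, s)
		}
	}
	if f.CursorTS != "" {
		// Keyset pagination: continue strictly after the last row already seen.
		where = append(where, "(ts, id) < (?, ?)")
		args = append(args, f.CursorTS, f.CursorID)
	}
	limit := f.Limit
	if limit < 1 || limit > 500 {
		limit = 100 // assumed fallback to the list default
	}
	q := "SELECT id, ts, summary FROM journal_entries WHERE " +
		strings.Join(where, " AND ") +
		" ORDER BY ts DESC, id DESC LIMIT ?"
	return q, append(args, limit)
}

func main() {
	q, args := buildQuery("ws_1", Filter{CrewID: "crw_1", Severities: []string{"warn", "error"}})
	fmt.Println(q)
	fmt.Println(len(args)) // 4: workspace, crew, two severities... plus the limit makes 5
}
```

Because the cursor compares the compound (ts, id) key against an index rather than skipping rows by offset, page 1,000 costs about the same as page 1.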

Full-text search

Migration 55 adds a contentless FTS5 virtual table journal_entries_fts mirroring summary and payload, with insert / update / delete triggers that keep it in sync with the base table. The ?q= query parameter (CLI: --query / -q) compiles to a phrase-wrapped MATCH against this index:

crewship journal --query "OOM" --since 24h
crewship journal --query "google.com" --crew backend-team

GET /api/v1/journal?q=ratelimit&since=2026-04-29T00:00:00Z
GET /api/v1/journal/stream?q=approval         # live tail filtered by FTS

The same parameter works on the SSE stream: the seed slice and every subsequent poll apply the FTS filter, so a “watch for OOM” tab stays cheap. q is length-bounded (over-long queries are rejected) and is AND-combined with the structural filters (crew_id, severity, …).

Lookup table (card enrichment)

GET /api/v1/journal/lookup

Returns a workspace-scoped map of crews, agents, and missions so the UI can render entry cards with palette-coloured chips and lucide icons without joining on the streaming path. The lookup payload is fetched once on page mount (the React JournalLookupProvider caches it) and invalidated by realtime events (new crew, renamed agent, …). useJournalLookup is the consumer hook; backend handler is internal/api/journal_lookup.go. The endpoint returns:

{
  "crews":    [{"id":"crw_…","slug":"backend","name":"Backend","icon":"server","color":"emerald"}, ...],
  "agents":   [{"id":"agt_…","slug":"viktor","name":"Viktor","crew_id":"crw_…","avatar":"…"}, ...],
  "missions": [{"id":"MIS-42","title":"Migrate auth","status":"in_progress","crew_id":"crw_…"}, ...]
}

Entries themselves never carry display strings; the journal stores stable IDs only. Renaming a crew updates the lookup on the next fetch, so historical journal rows render with the new name as well.
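
The ID-only design means name resolution happens at render time. A minimal sketch of that resolution, written in Go for consistency with the rest of the page even though the real consumer is the React hook; the types and the fallback to the raw id are assumptions.

```go
package main

import "fmt"

type Crew struct{ ID, Name string }

// Lookup is a client-side cache of the lookup payload (crews only in this sketch).
type Lookup struct{ crews map[string]Crew }

// CrewName resolves a stable id to its current display name,
// falling back to the raw id when the crew is unknown.
func (l Lookup) CrewName(id string) string {
	if c, ok := l.crews[id]; ok {
		return c.Name
	}
	return id
}

func main() {
	l := Lookup{crews: map[string]Crew{"crw_1": {ID: "crw_1", Name: "Backend"}}}
	fmt.Println(l.CrewName("crw_1")) // Backend

	// After a rename the cache is refreshed; old rows need no rewrite.
	l.crews["crw_1"] = Crew{ID: "crw_1", Name: "Platform"}
	fmt.Println(l.CrewName("crw_1")) // Platform
}
```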

Unified runs surface

Since PR #234 the standalone /runs page is folded into /journal as a preset tab. The /runs URL serves a redirect to /journal?tab=runs. The tab strip is Timeline | Runs | Stats:

Timeline — chronological event stream with FTS, severity, type, and crew filters.
Runs — a fleet operations overview (not just a list). Because it reads the run superset — every run in the workspace, including ad-hoc agent/chat/delegation runs that never touch a routine — it surfaces breakdowns the routine-scoped Routines → Insights view structurally can’t. See below.
Stats — workspace KPIs (calls/day, top entry types, top error types).

All three share the same SSE stream and FTS index — selecting a tab does not retrigger a fetch, only a client-side filter shift. Audit-tab navigation is dropped (it was always a redirect placeholder); the /audit page remains in the sidebar for security review.

Runs — fleet operations overview

The Runs tab has four sections. The KPI row and breakdowns (2-3) are scoped by a 24h / 7d / 30d window selector; the live pulse and recent-runs table (1, 4) are not:

1. Live pulse: every currently-running execution across the fleet, with live-ticking elapsed time. Sourced from /api/v1/runs?status=RUNNING.
2. KPI row: total runs in the window with an outcome split-bar (succeeded vs failed), success rate, failure count, and median / p95 duration.
3. Breakdowns: by trigger (schedule / agent / user / webhook / system), top crews (volume + fail rate), and by model (e.g. Opus vs Sonnet, using the resolved model recorded on each run).
4. Recent runs table: a filterable list (status + trigger); each row deep-links to that run’s trace in the Timeline. Includes the resolved Model column.

Sections 2 and 3 are backed by a dedicated aggregate:

GET /api/v1/runs/insights?window=24h    # 24h | 7d | 30d

The response has the shape:

{ window, totals{total,succeeded,failed,running}, duration{p50_ms,p95_ms}, by_trigger[], by_model[], by_crew[], top_agents[], truncated }

The endpoint reconstructs runs by grouping journal_entries on trace_id over the window (journal.RunInsights), folds the outcome / duration / breakdown counters in Go, and the API layer resolves agent_id → crew + display names. Aggregation is bounded to the most-recent maxInsightRows runs in the window; when that cap is hit, truncated is true and the UI says so rather than presenting a partial total as complete. CLI parity (drive it the same way an agent would):

crewship run insights                # last 24h
crewship run insights --window 7d
crewship run insights -o json        # machine-readable for scripting
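
The grouping and folding step behind the insights aggregate can be sketched as follows. This is not journal.RunInsights: the types are hypothetical, timestamps are simplified to Unix milliseconds, cancelled and timed-out runs are counted as failed (an assumption), and percentiles use the nearest-rank method, which may differ from the real computation.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

type Entry struct {
	TraceID string
	Type    string // run.started, run.completed, run.failed, ...
	TSMilli int64
}

type Insights struct {
	Total, Succeeded, Failed, Running int
	P50MS, P95MS                      int64
}

type run struct {
	start, end int64
	started    bool
	outcome    string
}

// percentile uses nearest-rank on an ascending slice; p is a fraction such as 0.95.
func percentile(sorted []int64, p float64) int64 {
	if len(sorted) == 0 {
		return 0
	}
	rank := int(math.Ceil(p * float64(len(sorted))))
	if rank < 1 {
		rank = 1
	}
	return sorted[rank-1]
}

// Summarize groups entries by trace id and folds the counters.
func Summarize(entries []Entry) Insights {
	runs := map[string]*run{}
	for _, e := range entries {
		if e.TraceID == "" {
			continue
		}
		r := runs[e.TraceID]
		if r == nil {
			r = &run{}
			runs[e.TraceID] = r
		}
		switch e.Type {
		case "run.started":
			r.start, r.started = e.TSMilli, true
		case "run.completed", "run.failed", "run.cancelled", "run.timeout":
			r.end, r.outcome = e.TSMilli, e.Type
		}
	}
	var out Insights
	var durations []int64
	for _, r := range runs {
		if !r.started {
			continue // trace without a run.started in the window
		}
		out.Total++
		switch r.outcome {
		case "":
			out.Running++
			continue
		case "run.completed":
			out.Succeeded++
		default:
			out.Failed++
		}
		durations = append(durations, r.end-r.start)
	}
	sort.Slice(durations, func(i, j int) bool { return durations[i] < durations[j] })
	out.P50MS, out.P95MS = percentile(durations, 0.50), percentile(durations, 0.95)
	return out
}

func main() {
	fmt.Printf("%+v\n", Summarize([]Entry{
		{"t1", "run.started", 0}, {"t1", "run.completed", 100},
		{"t2", "run.started", 0}, {"t2", "run.failed", 300},
		{"t3", "run.started", 0},
	}))
}
```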

Tenancy

Workspace isolation is enforced at the store level (journal.List/Get/Count take a workspace filter and refuse to run without one). The handler additionally pulls workspace_id from the session context — there is no way for a caller to pass a foreign workspace id through a query parameter. Cross-tenant existence is never leaked: unknown IDs return 404 with the same shape as “not in your workspace”. The shared crewBelongsToWorkspace / missionBelongsToWorkspace helpers (defined in internal/api/paymaster_handler.go and reused across every read handler) enforce the same contract — see the Paymaster API reference for the endpoints that exercise them.

Retention

expires_at is an optional TTL; compaction skips rows past it.
The daily Compactor (see Consolidate) rolls up info/notice rows older than 30 days into one system.compaction entry and deletes the originals. warn/error rows are kept indefinitely.
exec.output_chunk, container.metrics, and network.* are never embedded into Episodic memory because they would drown the signal.

Gotchas

TS precision. Writes serialise as 2006-01-02T15:04:05.000Z (milli). Reads also accept RFC3339Nano and second-precision strings so backfilled rows don’t fail to parse.
Empty trace/span. The trace_id/span_id columns are populated by the tracing package’s SetTraceResolver. If OpenTelemetry is not initialised the columns stay NULL — this is fine, not a bug.

Do not rename entry types. A rename breaks every existing row. Add a new type and dual-write during the transition instead.

Related

Paymaster — reads llm.call and cost.incurred.
Watch Roster — emits agent.status_change on transitions.
Cartographer — anchors checkpoints to the journal cursor.
Episodic memory — selectively embeds high-signal entry types.
crewship journal CLI, Journal API reference.
