

Crew Journal

The Crew Journal is an append-only event stream backed by the journal_entries table (migration 52). Every observable action in Crewship lands here as one row: peer conversations, mission transitions, keeper decisions, LLM calls, approvals, checkpoints, hook fires, exec/network/file events. Downstream features — Paymaster, Watch Roster, Episodic Memory, Cartographer — are read-models or middleware over this one stream. If it happened in the platform and nobody can see it in the journal, it didn’t happen.

Schema

CREATE TABLE journal_entries (
  id           TEXT PRIMARY KEY,          -- j_<16-hex>
  workspace_id TEXT NOT NULL,
  crew_id      TEXT,
  agent_id     TEXT,
  mission_id   TEXT,
  ts           TEXT NOT NULL,             -- RFC3339 UTC, millisecond precision
  entry_type   TEXT NOT NULL,             -- see catalog below
  severity     TEXT NOT NULL DEFAULT 'info',
  priority     TEXT NOT NULL DEFAULT 'normal', -- normal|high|pin|permanent
  actor_type   TEXT NOT NULL,
  actor_id     TEXT,
  summary      TEXT NOT NULL,             -- one-line human string
  payload      TEXT NOT NULL DEFAULT '{}',-- typed JSON by entry_type
  refs         TEXT NOT NULL DEFAULT '{}',-- parent_entry_id, approval_id, ...
  trace_id     TEXT,
  span_id      TEXT,
  expires_at   TEXT
);
A contentless FTS5 virtual table, journal_entries_fts (migration 55), mirrors summary and payload, with insert/update/delete triggers that keep it in sync with the base table. Entries are immutable: a correction is a new entry that points at the original via refs.parent_entry_id. IDs are j_ followed by 64 random bits in hex (e.g. j_a1b2c3d4e5f60718).

Entry type catalog

Defined in internal/journal/types.go. Every type is a stable string — renames require a backfill migration.
| Bucket | Entry types |
|---|---|
| Communication | peer.conversation, peer.escalation, message.broadcast, agent.mentioned |
| Mission | mission.status_change, mission.comment, assignment.created/running/completed/failed, crew.action, task.delegated |
| Runs (since PR #234) | run.started, run.completed, run.failed, run.cancelled, run.timeout |
| Security | keeper.request, keeper.decision, keeper.rule_auto_tuned, guardrail.input_blocked, guardrail.output_blocked, approval.requested/granted/denied/timeout/cancelled |
| Cost | llm.call, llm.cache_hit, cost.incurred, budget.exceeded, budget.warning |
| Memory | memory.updated, memory.consolidated, memory.priority_changed, summary.generated |
| Observability | exec.command, exec.output_chunk, network.port_opened/closed, network.egress, file.written, container.metrics, container.snapshot |
| Presence | agent.status_change |
| Checkpointing | checkpoint.created, checkpoint.restored, fork.created |
| Eval | eval.run_started, eval.metric, eval.regression_detected |
| Hooks | hook.fired, hook.blocked |
| Pipeline | pipeline.run.started/completed/failed, pipeline.step.started/completed/failed/skipped/retry/validation_failed, pipeline.dry_run |
| Credentials | credential.auto_assign_failed, credential.auto_assign_empty |
| Skills | skill.imported, skill.deleted, skill.assigned, skill.unassigned |
| Audit | audit.entity_created/updated/deleted/restored (mirrors audit_logs) |
| Provisioning | provisioning.queued/building/complete/failed (crew runtime container build) |
| Chat | chat.user_message, chat.agent_response (user↔agent turns) |
| Agent error | agent.error (panic recoveries + stream-handling failures) |
| System | system.compaction, system.migration, system.hook_toggled, system.consolidation_triggered/completed |

run.* — agent run lifecycle

Since PR #234 (migration 61) the legacy agent_runs table no longer exists; a “run” is reconstructed by grouping journal entries on trace_id (which equals the run id). Five typed entries cover every transition:
| Entry type | When |
|---|---|
| run.started | Orchestrator picked up the assignment and exec’d the CLI. |
| run.completed | CLI returned exit 0 within budget. |
| run.failed | Non-zero exit, panic, or upstream API error. |
| run.cancelled | Operator or hook called stop. |
| run.timeout | Run TTL elapsed without a terminal entry. |
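Grouping on trace_id can be illustrated with a small sketch. It assumes a simplified entry struct with TraceID, Type and TS fields, and derives each run's status from its latest terminal run.* entry. groupRuns, kpis and runSummary are hypothetical names; this is not journal.ListRuns itself:

```go
package main

import "sort"

// entry is a simplified stand-in for a journal row.
type entry struct {
	TraceID string
	Type    string
	TS      string // fixed-width millisecond RFC3339, so string order is time order
}

// runSummary is a hypothetical reconstructed run row.
type runSummary struct {
	RunID   string
	Started string
	Ended   string
	Status  string // running, completed, failed, cancelled or timeout
}

var terminalStatus = map[string]string{
	"run.completed": "completed",
	"run.failed":    "failed",
	"run.cancelled": "cancelled",
	"run.timeout":   "timeout",
}

// groupRuns folds entries into one summary per trace_id, newest run first.
// A run with no terminal run.* entry is reported as running.
func groupRuns(entries []entry) []runSummary {
	byID := map[string]*runSummary{}
	for _, e := range entries {
		if e.TraceID == "" {
			continue // not part of any run
		}
		r := byID[e.TraceID]
		if r == nil {
			r = &runSummary{RunID: e.TraceID, Status: "running"}
			byID[e.TraceID] = r
		}
		if e.Type == "run.started" && (r.Started == "" || e.TS < r.Started) {
			r.Started = e.TS
		}
		if s, ok := terminalStatus[e.Type]; ok && e.TS >= r.Ended {
			r.Status, r.Ended = s, e.TS
		}
	}
	out := make([]runSummary, 0, len(byID))
	for _, r := range byID {
		out = append(out, *r)
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Started > out[j].Started })
	return out
}

// kpis counts runs per status plus a total, like the Runs tab tiles.
func kpis(runs []runSummary) map[string]int {
	k := map[string]int{"total": len(runs)}
	for _, r := range runs {
		k[r.Status]++
	}
	return k
}
```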
Trace-id propagation is explicit: emit sites attach the id to the context with journal.WithRunID(ctx, runID), and downstream emitters read it back via RunIDFromContext(ctx). The noop emitter errors loudly on run.* types, so misconfigured wiring fails immediately rather than silently dropping observability. journal.ListRuns and journal.RunStats reconstruct the row shape and KPIs that used to come from agent_runs; the /journal UI’s Runs tab is a thin view over them.

container.snapshot — container actuals

Emitted by internal/containerstate after every successful agent exec. The package probes dpkg-query -W, pip freeze, npm ls -g --json, and /etc/os-release; the result is hashed; the entry is emitted only when the hash changes — so a quiet session is free. Missing probes (e.g. no pip in a Node-only image) soft-fail to empty lists. The payload structure:
{
  "os_name": "Debian GNU/Linux 12 (bookworm)",
  "packages": {
    "apt": [{"name": "curl", "version": "7.88.1-10+deb12u5"}, ...],
    "pip": [{"name": "requests", "version": "2.31.0"}, ...],
    "npm": [{"name": "typescript", "version": "5.4.5"}, ...]
  },
  "hash": "sha256:..."
}
This is what the container actually has after agents ran apt-get install / pip install / npm install during a session; compare it with the declared intent in devcontainer.json.

Across all entry types, severity is one of info, notice, warn, or error, and actor_type is one of agent, user, system, keeper, sidecar, or orchestrator.
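The emit-only-on-change rule for container.snapshot can be sketched as follows. The sketch assumes each package list is sorted before hashing so that probe output order cannot cause spurious changes; the pkg, snapshot and snapshotTracker types are hypothetical, and internal/containerstate may canonicalise differently:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"sort"
)

type pkg struct {
	Name    string `json:"name"`
	Version string `json:"version"`
}

type snapshot struct {
	OSName   string           `json:"os_name"`
	Packages map[string][]pkg `json:"packages"` // "apt", "pip", "npm"
}

// hashSnapshot hashes a canonical form: sorted copies of each package list,
// marshalled with encoding/json (which writes map keys in sorted order).
func hashSnapshot(s snapshot) (string, error) {
	canon := snapshot{OSName: s.OSName, Packages: map[string][]pkg{}}
	for mgr, list := range s.Packages {
		cp := append([]pkg(nil), list...) // do not reorder the caller's slice
		sort.Slice(cp, func(i, j int) bool {
			if cp[i].Name != cp[j].Name {
				return cp[i].Name < cp[j].Name
			}
			return cp[i].Version < cp[j].Version
		})
		canon.Packages[mgr] = cp
	}
	b, err := json.Marshal(canon)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(b)
	return "sha256:" + hex.EncodeToString(sum[:]), nil
}

// snapshotTracker remembers the hash of the last emitted snapshot.
type snapshotTracker struct{ last string }

// shouldEmit reports whether s differs from the last emitted snapshot
// and, if so, records its hash.
func (t *snapshotTracker) shouldEmit(s snapshot) (bool, string, error) {
	h, err := hashSnapshot(s)
	if err != nil {
		return false, "", err
	}
	if h == t.last {
		return false, h, nil // unchanged: a quiet session emits nothing
	}
	t.last = h
	return true, h, nil
}
```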

Priority markers

Priority is an operator-facing importance marker orthogonal to Severity. Severity answers “how alarming?”; Priority answers “how long do we remember it and how prominently should it surface at recall time?”.
| Marker | Effect |
|---|---|
| normal (default) | No special treatment. |
| high | Recall importance is floored at 0.85; subject to normal compaction. |
| pin | Snapshot to /crew/shared/.memory/pins.md at the next consolidate run; recall importance is floored at 0.80. |
| permanent | Never compacted. Extracted to learned-*.md on the next consolidator run, skipping the 10-entry threshold (extraction happens on the cadence, not the instant the marker is set). Recall importance is floored at 0.95. |
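Read literally, a floor means a marker raises a low recall score but never lowers a high one. A sketch assuming recall scores in [0, 1]; recallImportance and priorityFloor are illustrative names, and how the base score is computed is outside this sketch:

```go
package main

import "math"

// priorityFloor maps each marker to its minimum recall importance,
// using the values from the table above; normal has no floor.
var priorityFloor = map[string]float64{
	"normal":    0,
	"high":      0.85,
	"pin":       0.80,
	"permanent": 0.95,
}

// recallImportance lifts a base score in [0, 1] up to the marker's floor.
// Unknown markers fall back to a floor of 0 (the map's zero value).
func recallImportance(base float64, priority string) float64 {
	return math.Max(base, priorityFloor[priority])
}
```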
Markers can be set only through the HTTP endpoint or the CLI, and only by users with the OWNER or ADMIN role. Agents cannot mark their own outputs, which keeps automation from unilaterally promoting its own memory.
# HTTP
curl -X POST https://host/api/v1/journal/j_abc/priority \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"priority":"permanent","reason":"FX compliance constraint"}'

# CLI
crewship journal priority j_abc --mark permanent --reason "FX compliance constraint"
The consolidator (internal/consolidate) honours permanent as a fast-path signal: a single permanent entry triggers rule extraction on the next run regardless of volume. The compactor (internal/consolidate/compact.go) excludes permanent entries from the 30-day rollup via a WHERE priority != 'permanent' clause, so deliberately pinned knowledge survives for the life of the DB. The markers are inspired by OpenClaw Auto-Dream’s ⚠️ PERMANENT / 🔥 HIGH / 📌 PIN markers, and the semantics are deliberately close so a reader who knows one system can reason about the other.

Writing

The write surface is journal.Emitter:
type Emitter interface {
    Emit(ctx context.Context, e Entry) (string, error)
    Flush(ctx context.Context) error
}
The production journal.Writer is constructed once at server start and reached via Router.Journal(). Handlers emit without nil checking (noopEmitter is the default):
_, _ = h.journal.Emit(ctx, journal.Entry{
    WorkspaceID: ws, CrewID: crew, AgentID: agent,
    Type:        journal.EntryKeeperDecision,
    Severity:    journal.SeverityWarn,
    ActorType:   journal.ActorKeeper,
    Summary:     "keeper denied production SSH",
    Payload:     map[string]any{"credential_id": credID, "risk_score": 8},
})
Emit is asynchronous: entries go into a queue (buffer size 1024), and a background goroutine writes them in batches, either when 64 rows have accumulated or every 100 ms, whichever comes first. When the queue is saturated, Emit falls back to a synchronous write, trading latency for durability. Call Flush before tearing down a long-running test to guarantee that all prior emits are on disk.

Reading

HTTP

  • GET /api/v1/journal — paginated list, newest first. See the API reference.
  • GET /api/v1/journal/stream — SSE live tail; seeds with the most recent 50 entries, polls every 1s, emits event: entry frames, and heartbeats every 15s. Reconnect with Last-Event-ID to skip already-seen rows.
  • GET /api/v1/journal/count — total matching count for a filter set; ignores cursor/limit so the badge stays honest.
  • GET /api/v1/journal/{id} — single entry (scoped to workspace; cross-tenant IDs return 404).
  • POST /api/v1/journal/{id}/priority — annotate an entry with normal/high/pin/permanent. OWNER or ADMIN only; emits a memory.priority_changed audit row.

CLI

crewship journal                                 # last 50 entries
crewship journal --crew backend-team --since 24h
crewship journal --severity warn,error
crewship journal --type peer.escalation,keeper.decision --lines 100
crewship journal --mission MIS-42 --format json
crewship journal --follow                        # live tail via SSE
crewship journal get j_a1b2c3d4                  # single entry
crewship journal count --severity error --since 24h
crewship journal priority j_a1b2c3d4 --mark permanent --reason "FX rule"
See crewship journal for the full flag reference. The CLI implements live tail via SSE (--follow) with bounded reconnect backoff and Last-Event-ID resume.

Filtering

Every filter is AND-combined and indexed at the DB level:
| Query param | CLI flag | SQL predicate |
|---|---|---|
| crew_id | --crew | crew_id = ? |
| agent_id | --agent | agent_id = ? |
| mission_id | --mission | mission_id = ? |
| trace_id | --trace-id | trace_id = ? (one run’s spans) |
| crew_ids | | CSV crew_id IN (?,?,...) (takes precedence over crew_id) |
| agent_ids | | CSV agent_id IN (?,?,...) (takes precedence over agent_id) |
| entry_type | --type | CSV entry_type IN (?,?,...) |
| exclude_entry_type | --exclude-type | CSV entry_type NOT IN (?,?,...) |
| severity | --severity | CSV severity IN (?,?,...) |
| actor_type | --actor-type | CSV actor_type IN (?,?,...) |
| priority | --priority | CSV priority IN (?,?,...) |
| since, until | --since (--until on count) | ts >= ? / ts <= ? |
| q | --query / -q | FTS5 MATCH on summary + payload (see below) |
| cursor | | keyset (ts, id) < prior page |
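A sketch of how such a filter set can compile to one parameterised, AND-combined WHERE clause with a keyset cursor. It covers only a subset of the parameters, uses hypothetical type and function names, and clamps out-of-range limits (an assumption; the server may reject them instead). It is not the store's actual query builder:

```go
package main

import "strings"

type filter struct {
	WorkspaceID string
	CrewID      string
	CrewIDs     []string // takes precedence over CrewID
	Types       []string
	Severities  []string
	Since       string
	CursorTS    string // keyset cursor: (ts, id) of the last row already shown
	CursorID    string
	Limit       int
}

// in renders "col IN (?,?,...)" and appends the values to args.
func in(col string, vals []string, args *[]any) string {
	for _, v := range vals {
		*args = append(*args, v)
	}
	return col + " IN (" + strings.TrimSuffix(strings.Repeat("?,", len(vals)), ",") + ")"
}

// buildQuery AND-combines every set filter and orders newest first.
func buildQuery(f filter) (string, []any) {
	args := []any{f.WorkspaceID}
	where := []string{"workspace_id = ?"} // tenancy filter is always present
	switch {
	case len(f.CrewIDs) > 0:
		where = append(where, in("crew_id", f.CrewIDs, &args))
	case f.CrewID != "":
		where = append(where, "crew_id = ?")
		args = append(args, f.CrewID)
	}
	if len(f.Types) > 0 {
		where = append(where, in("entry_type", f.Types, &args))
	}
	if len(f.Severities) > 0 {
		where = append(where, in("severity", f.Severities, &args))
	}
	if f.Since != "" {
		where = append(where, "ts >= ?")
		args = append(args, f.Since)
	}
	if f.CursorTS != "" {
		// Row-value comparison: strictly older than the previous page's last row.
		where = append(where, "(ts, id) < (?, ?)")
		args = append(args, f.CursorTS, f.CursorID)
	}
	limit := f.Limit
	switch {
	case limit <= 0:
		limit = 100 // documented default for the list endpoint
	case limit > 500:
		limit = 500 // assumption: clamp rather than reject
	}
	args = append(args, limit)
	return "SELECT * FROM journal_entries WHERE " + strings.Join(where, " AND ") +
		" ORDER BY ts DESC, id DESC LIMIT ?", args
}
```

Because the cursor compares against the last (ts, id) pair rather than skipping N rows, an index on (ts, id) can seek straight to the next page.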
Pagination is keyset (compound ts, id), not offset, so deep paging stays O(log n). limit ranges from 1 to 500, with a default of 100 for the list and 50 for the SSE seed. The ?q= query parameter (CLI: --query / -q) compiles to a phrase-wrapped MATCH against the journal_entries_fts index described under Schema:
crewship journal --query "OOM" --since 24h
crewship journal --query "google.com" --crew backend-team
GET /api/v1/journal?q=ratelimit&since=2026-04-29T00:00:00Z
GET /api/v1/journal/stream?q=approval         # live tail filtered by FTS
The same parameter works on the SSE stream: both the seed slice and every subsequent poll apply the FTS filter, so a “watch for OOM” tab stays cheap. q has a maximum length (overlong queries are rejected) and is AND-combined with the structural filters (crew_id, severity, …).

Lookup table (card enrichment)

GET /api/v1/journal/lookup
Returns a workspace-scoped map of crews, agents, and missions so the UI can render entry cards with palette-coloured chips and lucide icons without joining on the streaming path. The lookup payload is fetched once on page mount (the React JournalLookupProvider caches it) and invalidated by realtime events (new crew, renamed agent, …). useJournalLookup is the consumer hook; backend handler is internal/api/journal_lookup.go. The endpoint returns:
{
  "crews":    [{"id":"crw_…","slug":"backend","name":"Backend","icon":"server","color":"emerald"}, ...],
  "agents":   [{"id":"agt_…","slug":"viktor","name":"Viktor","crew_id":"crw_…","avatar":"…"}, ...],
  "missions": [{"id":"MIS-42","title":"Migrate auth","status":"in_progress","crew_id":"crw_…"}, ...]
}
Entries themselves never carry display strings; the journal stores stable IDs only. Renaming a crew updates the lookup on the next fetch, after which historical journal rows also show the new name.
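Because rows store only IDs, names are resolved at render time, which is why a rename shows up on old rows. The provider itself is React, but the resolution step is simple enough to sketch in Go. The sketch assumes IDs are unique across crews, agents and missions (their prefixes suggest so) and uses hypothetical lookup, parseLookup and label names:

```go
package main

import "encoding/json"

// lookup mirrors the subset of the /api/v1/journal/lookup payload used here.
type lookup struct {
	Crews []struct {
		ID   string `json:"id"`
		Name string `json:"name"`
	} `json:"crews"`
	Agents []struct {
		ID     string `json:"id"`
		Name   string `json:"name"`
		CrewID string `json:"crew_id"`
	} `json:"agents"`
	Missions []struct {
		ID    string `json:"id"`
		Title string `json:"title"`
	} `json:"missions"`
}

func parseLookup(b []byte) (lookup, error) {
	var l lookup
	err := json.Unmarshal(b, &l)
	return l, err
}

// names flattens the lookup into id -> display string.
func (l lookup) names() map[string]string {
	m := map[string]string{}
	for _, c := range l.Crews {
		m[c.ID] = c.Name
	}
	for _, a := range l.Agents {
		m[a.ID] = a.Name
	}
	for _, ms := range l.Missions {
		m[ms.ID] = ms.Title
	}
	return m
}

// label resolves an id at render time, falling back to the raw id.
func label(names map[string]string, id string) string {
	if n, ok := names[id]; ok && n != "" {
		return n
	}
	return id
}
```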

Unified runs surface

Since PR #234 the standalone /runs page is folded into /journal as a preset tab. The /runs URL serves a redirect to /journal?tab=runs. The tab strip is Timeline | Runs | Stats:
  • Timeline — chronological event stream with FTS, severity, type, and crew filters.
  • Runs — KPI tiles (total / completed / failed / cancelled / running) plus a table grouped by trace_id. Same data the old /runs page rendered, now sourced via journal.ListRuns.
  • Stats — workspace KPIs (calls/day, top entry types, top error types).
All three share the same SSE stream and FTS index, so switching tabs does not trigger a refetch, only a client-side filter change. The Audit tab has been dropped (it was only ever a redirect placeholder); the /audit page remains in the sidebar for security review.

Tenancy

Workspace isolation is enforced at the store level (journal.List/Get/Count take a workspace filter and refuse to run without one). The handler additionally pulls workspace_id from the session context — there is no way for a caller to pass a foreign workspace id through a query parameter. Cross-tenant existence is never leaked: unknown IDs return 404 with the same shape as “not in your workspace”. The shared crewBelongsToWorkspace / missionBelongsToWorkspace helpers (defined in internal/api/paymaster_handler.go and reused across every read handler) enforce the same contract — see the Paymaster API reference for the endpoints that exercise them.
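The store-level contract can be sketched with an in-memory map: refuse to run without a workspace, and return one indistinguishable error for "does not exist" and "belongs to another workspace", which a handler would then map to 404. The store, row and error names are hypothetical:

```go
package main

import "errors"

var (
	errNoWorkspace = errors.New("journal: workspace filter required")
	errNotFound    = errors.New("journal: entry not found")
)

type row struct{ ID, WorkspaceID, Summary string }

type store struct{ rows map[string]row }

// get takes the workspace from the caller's session, never from the query.
func (s store) get(workspaceID, id string) (row, error) {
	if workspaceID == "" {
		return row{}, errNoWorkspace // refuse unscoped reads outright
	}
	r, ok := s.rows[id]
	if !ok || r.WorkspaceID != workspaceID {
		return row{}, errNotFound // same error either way: no existence leak
	}
	return r, nil
}
```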

Retention

  • expires_at is an optional TTL; compaction skips rows past it.
  • The daily Compactor (see Consolidate) rolls up info/notice rows older than 30 days into one system.compaction entry and deletes the originals. warn/error rows are kept indefinitely.
  • exec.output_chunk, container.metrics, and network.* are never embedded into Episodic memory because they would drown the signal.

Gotchas

  • TS precision. Writes serialise as 2006-01-02T15:04:05.000Z (milli). Reads also accept RFC3339Nano and second-precision strings so backfilled rows don’t fail to parse.
  • Empty trace/span. The trace_id/span_id columns are populated by the tracing package’s SetTraceResolver. If OpenTelemetry is not initialised the columns stay NULL — this is fine, not a bug.
  • Do not rename entry types. A rename breaks every existing row. Add a new type and dual-write during the transition instead.