Episodic Memory
Roadmap (v0.2). The episodic package ships in v0.1, but the embed-on-write hook into the journal writer is not wired yet, so journal entries are not indexed automatically. Manual indexing via the API works today for evaluation; until the auto-wire ships in v0.2, vector recall queries return results only for entries that were indexed manually.
Episodic is a narrow vector-recall layer over the Crew Journal. It embeds a selective subset of journal entries and serves top-K similarity queries to agents planning new work.
The “selective” part is load-bearing. Recent (2025-2026) multi-agent memory research indicates that indexing every event causes catastrophic drift: high-volume, low-signal types (exec.output_chunk, container.metrics, network.*) drown the embedding space and dilute recall. Episodic therefore refuses to embed those types and ingests only escalations, summaries, memory consolidations, approval denials, eval regressions, keeper decisions and mission status changes at warn/error severity, and operator-tagged entries.
Embedding criteria
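// internal/episodic/types.go
package episodic

// shouldEmbed reports whether a journal entry of the given type and severity
// belongs in the embedding index. Every type not listed here, including
// peer.conversation, exec.output_chunk, container.metrics, and network.*,
// falls through to false. The refs.episodic override is applied by the
// indexer, not here.
func shouldEmbed(entryType string, severity string) bool {
	switch entryType {
	case "peer.escalation",
		"summary.generated",
		"memory.consolidated",
		"approval.denied",
		"eval.regression_detected":
		// High-signal types are always embedded.
		return true
	case "keeper.decision", "mission.status_change":
		// Only denied/risky keeper decisions and failed/problematic status changes.
		return severity == "warn" || severity == "error"
	}
	return false
}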
| Entry type | When embedded |
|---|---|
| peer.escalation | Always |
| summary.generated | Always |
| memory.consolidated | Always |
| approval.denied | Always |
| eval.regression_detected | Always |
| keeper.decision | Only at severity warn/error (denied or risky) |
| mission.status_change | Only at severity warn/error (failed or problematic) |
| peer.conversation | Never by the type filter. It is listed in EmbeddableEntryTypes, but shouldEmbed always returns false because plain Q&A is too high-volume to embed. The indexer is expected to apply its own escalation-aware filter and force the embedding with the refs.episodic override (see below) when the question ends in an escalation. |
| exec.output_chunk, container.metrics, network.* | Never |
To force an embedding on an otherwise-filtered entry, tag it with refs.episodic = true in the journal payload. The indexer checks this flag before applying the type/severity filter.
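The override and the type/severity filter can be combined roughly as below. This is a minimal sketch, not the indexer's code: forcedByOperator and shouldIndex are hypothetical names, and it assumes the payload is JSON decoded into map[string]any with the flag at refs.episodic. The filter is passed in as a function so any shouldEmbed-style predicate can be plugged in.

```go
package main

// forcedByOperator reports whether a decoded journal payload carries the
// refs.episodic = true override. Assumption: the payload is JSON decoded into
// map[string]any, so "refs" is itself a map[string]any.
func forcedByOperator(payload map[string]any) bool {
	refs, ok := payload["refs"].(map[string]any)
	if !ok {
		return false
	}
	flag, ok := refs["episodic"].(bool)
	return ok && flag
}

// shouldIndex is a hypothetical indexer-side gate: the override is checked
// first, and only untagged entries fall through to the type/severity filter
// (for example shouldEmbed from internal/episodic/types.go).
func shouldIndex(entryType, severity string, payload map[string]any,
	filter func(entryType, severity string) bool) bool {
	if forcedByOperator(payload) {
		return true
	}
	return filter(entryType, severity)
}
```

Requiring a real boolean (not the string "true") keeps a stray payload field from forcing embeddings by accident; whether the actual indexer is that strict is not documented.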
Storage
journal_embeddings (added in migration 52 alongside journal_entries):
-- ON DELETE CASCADE only fires on connections with PRAGMA foreign_keys = ON.
CREATE TABLE journal_embeddings (
  entry_id           TEXT PRIMARY KEY REFERENCES journal_entries(id) ON DELETE CASCADE,
  vector             BLOB NOT NULL,      -- raw little-endian float32 bytes
  dim                INTEGER NOT NULL,
  model              TEXT NOT NULL,      -- "nomic-embed-text", etc.
  indexed_at         TEXT NOT NULL,
  importance_score   REAL DEFAULT 0.5,   -- fed into recall sort (see "Importance scoring")
  reference_count    INTEGER DEFAULT 0,  -- bumped when a hit lands in a prompt
  last_referenced_at TEXT                -- stamped at the same time
);
Vectors are stored as little-endian float32 BLOBs: each of the dim components is written as its 4-byte IEEE-754 bit pattern, which matches the in-memory layout of a Go []float32 on little-endian hosts. SQLite has no built-in vector type or index (pgvector is Postgres-only), so recall is a brute-force cosine scan over the scope-filtered rows. At the expected scale (~1% of entries embedded, low thousands per agent) the scan finishes in low milliseconds. If this outgrows SQLite, the right move is an external vector store rather than a SQLite extension, which is why storage sits behind an interface.
Migration 55 additions
Added in PR #212 alongside the memory uplift:
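-- FTS5 external-content mirror of journal_entries.summary + payload
-- (content='journal_entries': the text itself lives in the base table).
-- Insert/update/delete triggers keep it in lockstep with the base table.
-- Powers the ?q= parameter on /api/v1/journal and the BM25 leg of HybridRecall.
-- Caution: if journal_entries has no INTEGER PRIMARY KEY, its implicit rowid can
-- change on VACUUM; rebuild the index afterwards with
-- INSERT INTO journal_entries_fts(journal_entries_fts) VALUES('rebuild');
CREATE VIRTUAL TABLE journal_entries_fts USING fts5(
  summary, payload, content='journal_entries', content_rowid='rowid'
);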
-- Compaction sink. Aged rows from journal_entries land here with payload
-- truncated to 400 chars before being deleted from the base table.
CREATE TABLE journal_entries_archived (
  id           TEXT PRIMARY KEY,
  workspace_id TEXT NOT NULL,
  ts           TEXT NOT NULL,
  entry_type   TEXT NOT NULL,
  severity     TEXT NOT NULL,
  summary      TEXT NOT NULL,
  payload      TEXT NOT NULL,  -- 400-char truncation
  archived_at  TEXT NOT NULL
);
-- A-Mem-style edge graph linking related embeddings.
CREATE TABLE memory_relations (
  src_entry_id TEXT NOT NULL,
  dst_entry_id TEXT NOT NULL,
  relation     TEXT NOT NULL,             -- 'similar' | 'supports' | 'contradicts'
  weight       REAL NOT NULL DEFAULT 0.0, -- cosine for 'similar', 1.0 for 'supports'
  created_at   TEXT NOT NULL,
  PRIMARY KEY (src_entry_id, dst_entry_id, relation)
);
-- Daily 5-metric health score per workspace (and optionally per crew).
CREATE TABLE memory_health_snapshots (
  workspace_id  TEXT NOT NULL,
  crew_id       TEXT,           -- NULL = workspace-aggregate
  snapshot_date TEXT NOT NULL,  -- YYYY-MM-DD UTC
  freshness     REAL NOT NULL,
  coverage      REAL NOT NULL,
  coherence     REAL NOT NULL,
  efficiency    REAL NOT NULL,
  reachability  REAL NOT NULL,
  score         REAL NOT NULL,  -- weighted sum (see below)
  PRIMARY KEY (workspace_id, crew_id, snapshot_date)
  -- SQLite treats NULLs as distinct in this key, so it does not prevent
  -- duplicate workspace-aggregate rows (crew_id NULL) for the same date.
);
Scopes
type Scope string

const (
	ScopeOwn        Scope = "own"         // agent's own past only
	ScopeCrewShared Scope = "crew_shared" // own + high-value crew entries
)
ScopeForRole(role) maps role strings to scopes:
| Role | Scope |
|---|---|
| AGENT (default) | own |
| LEAD | crew_shared |
Workspace isolation is always enforced at the query boundary — a cross-workspace recall is impossible even with a misconfigured scope.
Hybrid retrieval
Pure cosine recall can miss lexically obvious entries: a search for "OOM in checkout" does not necessarily match an entry whose summary literally says OOM, because nomic-embed-text clusters by meaning rather than surface form. PR #212 adds hybrid retrieval, which fuses dense (cosine) and sparse (BM25 over the FTS5 index) results via Reciprocal Rank Fusion.
hits, err := episodic.HybridRecall(ctx, db, embedder, episodic.Query{
	WorkspaceID: ws,
	AgentID:     agent,
	Scope:       episodic.ScopeOwn,
	QueryText:   "OOM in checkout",
	K:           5,
})
The fusion formula:
RRF_score(entry) = Σ over rankings r of: 1 / (k_RRF + rank_r(entry))
with k_RRF = 60 (the value used in the original RRF paper; further tuning had marginal returns). Each ranking contributes the reciprocal of the entry's offset rank, so an entry ranked highly in either leg rises, and an entry that appears in both legs collects two contributions and rises the most. There is no weighting parameter: both legs count equally. The constant 60 flattens the gap between neighbouring top positions (1/61 vs 1/62), so small rank differences in either leg barely move the fused order.
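Reciprocal Rank Fusion is small enough to show in full. The sketch below is a generic RRF over ranked ID lists with k_RRF = 60; fuseRRF is a hypothetical name, not HybridRecall's internals, and it assumes each leg has already been cut to its own top results with no duplicate IDs inside a leg.

```go
package main

import "sort"

// kRRF is the fusion constant from the text.
const kRRF = 60

// fuseRRF merges ranked ID lists (best first) with Reciprocal Rank Fusion.
// Ranks are 1-based, so the top entry of a list contributes 1/(60+1); an ID
// missing from a list contributes nothing for that list. Ties break by ID.
func fuseRRF(topK int, rankings ...[]string) []string {
	score := map[string]float64{}
	for _, ranking := range rankings {
		for i, id := range ranking {
			score[id] += 1.0 / float64(kRRF+i+1)
		}
	}
	ids := make([]string, 0, len(score))
	for id := range score {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(a, b int) bool {
		if score[ids[a]] != score[ids[b]] {
			return score[ids[a]] > score[ids[b]]
		}
		return ids[a] < ids[b]
	})
	if topK >= 0 && topK < len(ids) {
		ids = ids[:topK]
	}
	return ids
}
```

With dense = [a, b, c] and sparse = [c, d], fuseRRF(3, dense, sparse) returns [c, a, b]: c is third in the dense leg but first in the sparse one, so its fused score 1/63 + 1/61 ≈ 0.0323 is roughly twice a's 1/61 ≈ 0.0164. b and d tie at 1/62, and the ID tie-break picks b.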
HybridRecall is the recommended entry point for new code. The pure-cosine Recall is preserved for the orchestrator’s prompt-injection path (where rank stability across small query perturbations matters more than recall on lexical hits) and for tests that want a deterministic similarity score.
Memory relations (A-Mem-style edges)
Embeddings live in isolation by default; memory_relations lets us treat them as a graph. Two relation types are populated automatically:
similar — LinkSimilarOnIndex runs at insert time: for the freshly-indexed entry, query the existing embedding pool, take the top-3 cosine matches above 0.80, write (src=new, dst=match, relation='similar', weight=cosine) rows. Edges are unidirectional (the new entry points to its predecessors); the inverse direction is implied at read time.
supports — LinkSupports is called by the consolidator when it derives a rule from one or more journal entries. The rule entry points to every supporting evidence entry: relation='supports', weight=1.0. This is what makes consolidated rules traceable back to their evidence.
A future contradicts relation is reserved in the schema but not populated today.
Read-side use: when injecting recalled hits into a prompt, the orchestrator can optionally walk one hop along similar edges to deepen context without paying for another embedding round-trip. Disabled by default; enable per-agent by setting agent.memory_walk_depth = 1.
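The insert-time linking rule and the one-hop read can be sketched as follows. linkSimilar and oneHop are hypothetical names; the sketch assumes "above 0.80" means strictly greater, that only the new entry itself is excluded from the pool, and that ties are broken by destination ID. oneHop follows edges in both directions because only new-to-old edges are written.

```go
package main

import (
	"math"
	"sort"
)

type relation struct {
	Src, Dst, Kind string
	Weight         float64
}

// cosine returns the cosine similarity of a and b (0 for mismatched or zero vectors).
func cosine(a, b []float32) float64 {
	if len(a) != len(b) {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		x, y := float64(a[i]), float64(b[i])
		dot, na, nb = dot+x*y, na+x*x, nb+y*y
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// linkSimilar keeps pool entries with cosine strictly above minCos and emits
// at most maxEdges new -> match 'similar' edges, strongest first.
func linkSimilar(newID string, newVec []float32, pool map[string][]float32, maxEdges int, minCos float64) []relation {
	var out []relation
	for id, v := range pool {
		if id == newID {
			continue
		}
		if c := cosine(newVec, v); c > minCos {
			out = append(out, relation{Src: newID, Dst: id, Kind: "similar", Weight: c})
		}
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Weight != out[j].Weight {
			return out[i].Weight > out[j].Weight
		}
		return out[i].Dst < out[j].Dst
	})
	if maxEdges >= 0 && len(out) > maxEdges {
		out = out[:maxEdges]
	}
	return out
}

// oneHop returns neighbours of id over 'similar' edges in either direction.
func oneHop(id string, edges []relation) []string {
	seen := map[string]bool{}
	var out []string
	for _, e := range edges {
		if e.Kind != "similar" {
			continue
		}
		var other string
		switch id {
		case e.Src:
			other = e.Dst
		case e.Dst:
			other = e.Src
		default:
			continue
		}
		if !seen[other] {
			seen[other] = true
			out = append(out, other)
		}
	}
	return out
}
```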
Recall API
hits, err := episodic.Recall(ctx, db, embedder, episodic.Query{
	WorkspaceID: ws,
	CrewID:      crew,
	AgentID:     agent,
	Scope:       episodic.ScopeOwn,
	QueryText:   "production deployment keeper denied",
	K:           5,
})
- WorkspaceID is required.
- ScopeOwn requires AgentID; ScopeCrewShared requires CrewID.
- K caps results (1-50, default 5).
Returns []Hit:
type Hit struct {
	EntryID   string
	Score     float64       // cosine similarity in [0,1], 1 = identical
	Age       time.Duration // since the original entry's ts
	Summary   string
	EntryType string
	AgentID   string
	Payload   map[string]any
}
Consumers typically weight fresher hits higher when injecting them into prompts. episodic.RenderInjection(hits, maxChars) does this and returns a prompt-ready string truncated to fit a character budget; the orchestrator’s episodicRecallAdapter calls it before every agent run.
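The Query preconditions listed above translate into a small validation step. This sketch uses local stand-ins for Scope and Query, and normaliseQuery is a hypothetical helper; how the real Recall treats K = 0 or K > 50 is not documented, so the defaulting and clamping here are assumptions.

```go
package main

import "errors"

// Scope and Query are local stand-ins for the episodic types.
type Scope string

const (
	ScopeOwn        Scope = "own"
	ScopeCrewShared Scope = "crew_shared"
)

type Query struct {
	WorkspaceID string
	CrewID      string
	AgentID     string
	Scope       Scope
	QueryText   string
	K           int
}

// normaliseQuery applies the documented preconditions. Assumptions: K == 0
// means "unset" (becomes 5), K above 50 is clamped rather than rejected, and
// a negative K or an unknown scope is an error.
func normaliseQuery(q Query) (Query, error) {
	if q.WorkspaceID == "" {
		return q, errors.New("WorkspaceID is required")
	}
	switch q.Scope {
	case ScopeOwn:
		if q.AgentID == "" {
			return q, errors.New("ScopeOwn requires AgentID")
		}
	case ScopeCrewShared:
		if q.CrewID == "" {
			return q, errors.New("ScopeCrewShared requires CrewID")
		}
	default:
		return q, errors.New("unknown scope")
	}
	switch {
	case q.K == 0:
		q.K = 5
	case q.K < 0:
		return q, errors.New("K must be positive")
	case q.K > 50:
		q.K = 50
	}
	return q, nil
}
```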
Importance scoring
Each journal_embeddings row carries three additional fields that feed recall ranking and a nightly decay job:
| Column | Purpose |
|---|---|
| importance_score REAL | Multiplied into cosine at Recall sort time (range [0,1]) |
| reference_count INTEGER | Incremented whenever a hit lands in a prompt |
| last_referenced_at TEXT | Stamped at the same time |
Baseline comes from BaseImportance(entry_type, severity, priority) (see internal/episodic/importance.go): peer escalations and eval regressions seed high; info-level routine events seed 0.5; the operator-applied priority marker floors the value at 0.80 (pin), 0.85 (high), or 0.95 (permanent).
Nightly decay: episodic.DecayAndReinforce recomputes every row as
importance = BASE × RecencyFactor × (1 + ReferenceBoost/8)
where:
- RecencyFactor(indexed_at, now) = max(0.1, 1 - days/180), with days counted from indexed_at to now. The 0.1 floor keeps old-but-critical memories from decaying to literal zero.
- ReferenceBoost(refs) = log₂(refs + 1). Frequently recalled entries lift, but the /8 divisor keeps runaway recall loops from dominating rare-but-critical entries.
Recall-time use: Recall sorts candidates by cosine × importance (not cosine alone) and returns the top K. Every returned hit is passed to MarkReferenced, which increments its reference counter so the next DecayAndReinforce lifts frequently hit memories.
This is a direct port of OpenClaw Auto-Dream’s importance formula, adapted to Crewship’s entry-type catalog and multi-tenant model.
Effect: a six-month-old peer.escalation that is still recalled about once a week keeps ranking well, while a stale low-value info entry of the same age sinks to the recency floor with no reference boost and falls off.
Untrusted-hints wrapper
RenderInjection wraps its output in a <recalled-memory>…</recalled-memory> block with an explicit “UNTRUSTED HINTS” preamble. The wrapper is load-bearing: recalled entries may contain text authored by peers, tools, or agent output, so a past peer.escalation could carry an "IGNORE PREVIOUS INSTRUCTIONS" payload without anyone noticing. The wrapper instructs the model to treat everything inside as hints that the current task can override, not as authoritative instructions.
The same treatment applies to the orchestrator’s buildMemoryContext blocks (AGENT.md / CREW.md): both surfaces are agent-authored, so the model should read both as hints.
The pattern is borrowed from Hermes Agent’s sanitize_context() and Self-Evolve’s <self-evolve-memories>…(untrusted metadata)….
Embedder
The Embedder interface is provider-neutral:
type Embedder interface {
	Embed(ctx context.Context, text string) ([]float32, error)
	Dim() int
	ModelID() string
}
Production wiring uses the Ollama adapter (embedder.go) against nomic-embed-text. The adapter follows the OLLAMA_HOST env var and falls back to http://host.docker.internal:11434.
If the embedder is nil (Ollama unreachable at startup), Recall silently returns an empty slice, so agent runs don’t fail during embedding outages. This fallback is wired in server/orchestrator_adapters.go:newEpisodicRecallAdapter.
Health scoring
memory_health_snapshots captures one row per workspace per day (and optionally per crew). The score is a weighted sum of five metrics, each in [0, 100]:
| Metric | Weight | Question it answers |
|---|---|---|
| Freshness | 25% | How recent is the indexed content? Decays with median entry age. |
| Coverage | 25% | What fraction of embeddable journal entries are actually indexed? |
| Coherence | 20% | How tight is the embedding cluster structure? Higher = entries cleanly group around shared themes. |
| Efficiency | 15% | Average cosine of the top-K hit distribution — high means recalls return distinct, relevant entries; low means the index returns noise. |
| Reachability | 15% | Fraction of embeddings reachable via at least one memory_relations edge. |
The composite score is reported with three colour bands:
| Band | Range | Meaning |
|---|---|---|
| Red | < 50 | Memory is degraded; likely missing recent entries, low embedding count, or weak clustering. Operator action: trigger consolidation, check Ollama health. |
| Yellow | 50–75 | Functional but with at least one weak metric. Watch the dashboard. |
| Green | ≥ 75 | All metrics in healthy range. |
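The composite score and its bands reduce to a weighted sum and two thresholds. A minimal sketch with hypothetical names, assuming each metric has already been scaled to [0, 100].

```go
package main

// healthMetrics holds the five metrics, each already scaled to [0, 100].
type healthMetrics struct {
	Freshness, Coverage, Coherence, Efficiency, Reachability float64
}

// healthScore is the weighted sum from the metrics table (weights sum to 1).
func healthScore(m healthMetrics) float64 {
	return 0.25*m.Freshness +
		0.25*m.Coverage +
		0.20*m.Coherence +
		0.15*m.Efficiency +
		0.15*m.Reachability
}

// healthBand maps a composite score onto the colour bands: red below 50,
// yellow from 50 up to (not including) 75, green at 75 and above.
func healthBand(score float64) string {
	switch {
	case score < 50:
		return "red"
	case score < 75:
		return "yellow"
	default:
		return "green"
	}
}
```

For example, freshness 80, coverage 60, coherence 70, efficiency 40, and reachability 50 give 20 + 15 + 14 + 6 + 7.5 = 62.5, which lands in the yellow band.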
Health is exposed by internal/api/memory_health_handler.go over scores derived in internal/episodic/. The daily consolidator job recomputes it after IndexNew; operators can call it on demand via:
The endpoint returns the latest snapshot in the table — it does not recompute on read. Use crewship memory health --recompute (or POST the equivalent) to force a fresh score.
In-session memory nudge
When an agent run accumulates 30 or more new journal entries since its last memory.updated entry, the orchestrator injects a short reminder into the next system prompt:
You have 47 new journal entries since your last memory update. Consider summarising what you’ve learned with summary.generated so future sessions can recall it.
The threshold is fixed at 30 because that’s roughly when an agent’s working context starts evicting its own short-term memory. The nudge only fires once per session — once the agent emits a new summary.generated, the counter resets.
A second nudge fires opportunistically when daily cost exceeds $1 (default; configurable per workspace):
Last 24h: 142 calls, 3.4M tokens, $8.71. Watch for runaway loops — the consolidator will summarise this batch on the next tick.
Both nudges are advisory — agents can ignore them, and they are stripped from the context before the LLM call records its response payload.
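Both nudges are simple counters and thresholds. The sketch below is illustrative: nudgeTracker and costNudge are hypothetical names, the message strings are shortened versions of the examples above, and it assumes summary.generated is the entry that resets the counter (the trigger description also mentions memory.updated). Whether a reset re-arms the once-per-session nudge is not specified, so this version does not re-arm it.

```go
package main

import "fmt"

const entryNudgeThreshold = 30

// nudgeTracker counts journal entries since the last summary.generated and
// fires the memory nudge at most once per session.
type nudgeTracker struct {
	sinceSummary int
	fired        bool
}

// observe records one entry and returns the nudge text when it should be
// injected into the next system prompt, or "" otherwise.
func (t *nudgeTracker) observe(entryType string) string {
	if entryType == "summary.generated" {
		t.sinceSummary = 0 // reset only; does not re-arm fired
		return ""
	}
	t.sinceSummary++
	if t.fired || t.sinceSummary < entryNudgeThreshold {
		return ""
	}
	t.fired = true
	return fmt.Sprintf("You have %d new journal entries since your last memory update.", t.sinceSummary)
}

// costNudge returns the cost reminder when 24h spend exceeds limitUSD
// ($1 by default per the text), or "" otherwise.
func costNudge(calls int, tokens int64, spendUSD, limitUSD float64) string {
	if spendUSD <= limitUSD {
		return ""
	}
	return fmt.Sprintf("Last 24h: %d calls, %.1fM tokens, $%.2f. Watch for runaway loops.",
		calls, float64(tokens)/1e6, spendUSD)
}
```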
Indexer
episodic.IndexNew(ctx, db, embedder) walks unindexed journal entries that pass shouldEmbed, computes embeddings, and inserts them into journal_embeddings. The background consolidator runs this on its 6h tick; you can trigger it manually:
n, err := episodic.IndexNew(ctx, db, embedder)
Only entries whose type survives the coarse SQL filter (the EmbeddableEntryTypes slice) are candidates; the Go-side shouldEmbed then applies the severity-aware refinement.
Gotchas
- Ollama dependency. Without Ollama there is no episodic memory. When testing locally, set OLLAMA_MODELS="/Volumes/SSD 990 PRO/ollama-models" (external SSD) and run ollama serve before ./dev.sh start.
- Embedding is best-effort. An embedder error during IndexNew is logged and the entry is skipped; it is retried on the next run.
- Cosine on small sets is fine. Don’t pre-optimise for a vector DB until the per-agent scan latency exceeds 50ms.
- Do not add exec.output_chunk to the embeddable list. Doing so floods the embedding space with low-signal log chatter and kills recall quality. The comment in types.go is not a suggestion.