Skip to main content

Episodic Memory

Auto-indexing is wired at boot. When an embedder is configured (KEEPER_OLLAMA_URL pointing at an Ollama host serving nomic-embed-text), the server starts the background indexer sweeper automatically at boot — new embeddable journal entries become vector-searchable within one 30-second sweep, no manual indexing required.Without an embedder, episodic recall runs in sparse-only mode (keyword/FTS, no vector similarity). That degraded state is never silent: the server logs a WARN at boot, GET /healthz reports "episodic": "sparse-only" (vs "vector"), and crewship doctor surfaces it as a WARN with the fix.Roadmap (v0.2): the synchronous embed-on-write hook (Indexer.IndexOne directly from the journal writer hot path) is not yet wired — until then a freshly written entry can lag recall by up to one sweep interval.
Episodic is a narrow vector-recall layer over the Crew Journal. It embeds a selective subset of journal entries and serves top-K similarity queries to agents planning new work. The “selective” part is load-bearing. Per 2025-2026 multi-agent memory research, indexing every event causes catastrophic drift: high-volume low-signal types (exec.output_chunk, container.metrics, network.*) drown the embedding space and dilute recall. Episodic refuses to embed those types and ingests only escalations, summaries, terminal mission status, denied keeper calls, eval regressions, and operator-tagged entries.

Embedding criteria

// internal/episodic/types.go
func shouldEmbed(entryType string, severity string) bool {
    switch entryType {
    case "peer.escalation",
         "summary.generated",
         "memory.consolidated",
         "approval.denied",
         "eval.regression_detected":
        return true
    case "keeper.decision":
        return severity == "warn" || severity == "error"
    case "mission.status_change":
        return severity == "warn" || severity == "error"
    }
    return false
}
Entry typeWhen embedded
peer.escalationAlways
summary.generatedAlways
memory.consolidatedAlways
approval.deniedAlways
eval.regression_detectedAlways
keeper.decisionOnly at severity warn/error (denied or risky)
mission.status_changeOnly at severity warn/error (failed or problematic)
peer.conversationListed in EmbeddableEntryTypes but shouldEmbed always returns false — plain Q&A is too high-volume to embed. The indexer is expected to apply its own escalation-aware filter and force it via the override below when the question ends in an escalation.
exec.output_chunk, container.metrics, network.*Never
To force an embedding on an otherwise-filtered entry, tag it with refs.episodic = true in the journal payload. The indexer checks this flag before applying the type/severity filter.

Storage

journal_embeddings (added in migration 52 alongside journal_entries):
CREATE TABLE journal_embeddings (
  entry_id           TEXT PRIMARY KEY REFERENCES journal_entries(id) ON DELETE CASCADE,
  vector             BLOB NOT NULL,     -- raw float32 bytes
  dim                INTEGER NOT NULL,
  model              TEXT NOT NULL,     -- "nomic-embed-text", etc.
  indexed_at         TEXT NOT NULL,
  importance_score   REAL DEFAULT 0.5,  -- fed into recall sort (see "Importance scoring")
  reference_count    INTEGER DEFAULT 0, -- bumped when a hit lands in a prompt
  last_referenced_at TEXT               -- stamped at the same time
);
Vectors are stored as little-endian float32 BLOBs — native Go slice serialisation. SQLite has no pgvector, so recall is a brute-force cosine scan over the scope-filtered rows. For the expected scale (~1% of entries embedded, low thousands per agent) the scan finishes in low milliseconds. If this outgrows SQLite the right move is an external vector store, not a SQLite extension — so storage sits behind an interface.

Migration 55 additions

Added in PR #212 alongside the memory uplift.
-- FTS5 contentless mirror of journal_entries.summary + payload.
-- Insert/update/delete triggers keep it in lockstep with the base table.
-- Powers the ?q= parameter on /api/v1/journal and the BM25 leg of HybridRecall.
CREATE VIRTUAL TABLE journal_entries_fts USING fts5(
  summary, payload, content='journal_entries', content_rowid='rowid'
);

-- Compaction sink. Aged rows from journal_entries land here with payload
-- truncated to 400 chars before being deleted from the base table.
CREATE TABLE journal_entries_archived (
  id           TEXT PRIMARY KEY,
  workspace_id TEXT NOT NULL,
  ts           TEXT NOT NULL,
  entry_type   TEXT NOT NULL,
  severity     TEXT NOT NULL,
  summary      TEXT NOT NULL,
  payload      TEXT NOT NULL,                 -- 400-char truncation
  archived_at  TEXT NOT NULL
);

-- embedding relation graph: similarity edges linking related embeddings.
CREATE TABLE memory_relations (
  src_entry_id TEXT NOT NULL,
  dst_entry_id TEXT NOT NULL,
  relation     TEXT NOT NULL,                 -- 'similar' | 'supports' | 'contradicts'
  weight       REAL NOT NULL DEFAULT 0.0,     -- cosine for 'similar', 1.0 for hand-asserted
  created_at   TEXT NOT NULL,
  PRIMARY KEY (src_entry_id, dst_entry_id, relation)
);

-- Daily 5-metric health score per workspace (and optionally per crew).
CREATE TABLE memory_health_snapshots (
  workspace_id   TEXT NOT NULL,
  crew_id        TEXT,                        -- NULL = workspace-aggregate
  snapshot_date  TEXT NOT NULL,               -- YYYY-MM-DD UTC
  freshness      REAL NOT NULL,
  coverage       REAL NOT NULL,
  coherence      REAL NOT NULL,
  efficiency     REAL NOT NULL,
  reachability   REAL NOT NULL,
  score          REAL NOT NULL,               -- weighted sum (see below)
  PRIMARY KEY (workspace_id, crew_id, snapshot_date)
);

Scopes

type Scope string
const (
    ScopeOwn        Scope = "own"         // agent's own past only
    ScopeCrewShared Scope = "crew_shared" // own + high-value crew entries
)
ScopeForRole(role) maps role strings to scopes:
RoleScope
AGENT (default)own
LEADcrew_shared
Workspace isolation is always enforced at the query boundary — a cross-workspace recall is impossible even with a misconfigured scope.

Hybrid retrieval

Pure cosine recall misses entries that are lexically obvious — searching for "OOM in checkout" does not necessarily match an entry whose summary literally says OOM, because nomic-embed-text clusters by semantics, not surface form. PR #212 adds hybrid retrieval: dense (cosine) and sparse (BM25 over the FTS5 index) results fused via Reciprocal Rank Fusion.
hits, err := episodic.HybridRecall(ctx, db, embedder, episodic.Query{
    WorkspaceID: ws,
    AgentID:     agent,
    Scope:       episodic.ScopeOwn,
    QueryText:   "OOM in checkout",
    K:           5,
})
The fusion formula:
RRF_score(entry) = Σ over rankings r of:  1 / (k_RRF + rank_r(entry))
with k_RRF = 60 (the value used in the original RRF paper; further tuning had marginal returns). Each ranking contributes the inverse of the entry’s position; entries ranked highly in either leg float to the top, while entries that appear in both pump up most. There is no weighting parameter — both legs are treated equal, and the constant 60 dominates ranking noise at low ranks where it matters most. HybridRecall is the recommended entry point for new code. The pure-cosine Recall is preserved for the orchestrator’s prompt-injection path (where rank stability across small query perturbations matters more than recall on lexical hits) and for tests that want a deterministic similarity score.

Memory relations (embedding relation graph)

Embeddings live in isolation by default; memory_relations lets us treat them as a graph. Two relation types are populated automatically:
  • similarLinkSimilarOnIndex runs at insert time: for the freshly-indexed entry, query the existing embedding pool, take the top-3 cosine matches above 0.80, write (src=new, dst=match, relation='similar', weight=cosine) rows. Edges are unidirectional (the new entry points to its predecessors); the inverse direction is implied at read time.
  • supportsLinkSupports is called by the consolidator when it derives a rule from one or more journal entries. The rule entry points to every supporting evidence entry: relation='supports', weight=1.0. This is what makes consolidated rules traceable back to their evidence.
A future contradicts relation is reserved in the schema but not populated today. Read-side use: when injecting recalled hits into a prompt, the orchestrator can optionally walk one hop along similar edges to deepen context without paying for another embedding round-trip. Disabled by default; enable per-agent by setting agent.memory_walk_depth = 1.

Recall API

hits, err := episodic.Recall(ctx, db, embedder, episodic.Query{
    WorkspaceID: ws,
    CrewID:      crew,
    AgentID:     agent,
    Scope:       episodic.ScopeOwn,
    QueryText:   "production deployment keeper denied",
    K:           5,
})
  • WorkspaceID is required.
  • ScopeOwn requires AgentID; ScopeCrewShared requires CrewID.
  • K caps results (1-50, default 5).
Returns []Hit:
type Hit struct {
    EntryID   string
    Score     float64       // cosine similarity in [0,1], 1=identical
    Age       time.Duration // since original entry's ts
    Summary   string
    EntryType string
    AgentID   string
    Payload   map[string]any
}
Consumers typically weight fresher hits higher when injecting into prompts. episodic.RenderInjection(hits, maxChars) does this and returns a prompt-ready string, truncated to fit a char budget — used by the orchestrator’s episodicRecallAdapter before every agent run.

Importance scoring

Each journal_embeddings row carries three additional fields that feed recall ranking and a nightly decay job:
ColumnPurpose
importance_score REALMultiplied into cosine at Recall sort time (range [0,1])
reference_count INTEGERIncremented whenever a hit lands in a prompt
last_referenced_at TEXTStamped at the same time
Baseline comes from BaseImportance(entry_type, severity, priority) (see internal/episodic/importance.go): peer escalations and eval regressions seed high; info-level routine events seed 0.5; the operator-applied priority marker floors the value at 0.80 (pin) / 0.85 (high) / 0.95 (permanent). Nightly decayepisodic.DecayAndReinforce recomputes every row as:
importance = BASE × RecencyFactor × (1 + ReferenceBoost/8)
  • RecencyFactor(indexed_at, now) = max(0.1, 1 - days/180) — a 0.1 floor keeps old-but-critical memories from going to literal zero
  • ReferenceBoost(refs) = log₂(refs + 1) — frequently-recalled entries lift, but the /8 divisor prevents runaway loops from dominating over rare-but-critical ones
Recall-time useRecall sorts candidates by cosine × importance (not cosine alone), then top-K is returned. Every returned hit gets MarkReferenced which increments the reference counter so the next DecayAndReinforce lifts frequently- hit memories. The formula combines base value, recency decay, and reference-count reinforcement to keep durable signal ranking well even as it ages. Effect: a six-month-old peer.escalation keeps ranking well after one recall per week, while a stale low-value info entry falls off.

Untrusted-hints wrapper

RenderInjection wraps its output in a <recalled-memory>…</recalled-memory> block with an explicit “UNTRUSTED HINTS” preamble. The wrapper is load-bearing: recalled entries may contain text authored by peers, tools, or agent output — a past peer.escalation could carry an "IGNORE PREVIOUS INSTRUCTIONS" payload without anyone realising. The wrapper instructs the model to treat everything inside as hints the current task can override, not as authoritative instructions. Same treatment applies to the orchestrator’s buildMemoryContext blocks (AGENT.md / CREW.md) — both surfaces are agent-authored and therefore both should be read by the model as hints. The wrapper follows the “treat all recalled content as untrusted hints” pattern: anything inside a <recalled-memory> block is guidance the current task can override, never authoritative instruction.

Embedder

The Embedder interface is provider-neutral:
type Embedder interface {
    Embed(ctx context.Context, text string) ([]float32, error)
    Dim() int
    ModelID() string
}
Production wiring uses the Ollama adapter (embedder.go) against nomic-embed-text (768-dimensional vectors). The embedder shares the Keeper’s Ollama base URL — KEEPER_OLLAMA_URL, which defaults to http://localhost:11434 (the config falls back to that value when the env var is unset, so the embedder is configured by default). The embedder ends up nil only when that Ollama endpoint is unreachable at startup, which disables episodic recall — see the next paragraph. If the embedder is nil (Ollama unreachable at startup), Recall returns an empty slice silently — agent runs don’t fail on embedding outages. This is configured in server/orchestrator_adapters.go:newEpisodicRecallAdapter.

Health scoring

memory_health_snapshots captures one row per workspace per day (and optionally per crew). The score is a weighted sum of five metrics, each in [0, 100]:
MetricWeightQuestion it answers
Freshness25%How recent is the indexed content? Decays with median entry age.
Coverage25%What fraction of embeddable journal entries are actually indexed?
Coherence20%How tight is the embedding cluster structure? Higher = entries cleanly group around shared themes.
Efficiency15%Average cosine of the top-K hit distribution — high means recalls return distinct, relevant entries; low means the index returns noise.
Reachability15%Fraction of embeddings reachable via at least one memory_relations edge.
The composite score is reported with three colour bands:
BandRangeMeaning
Red< 50Memory is degraded; likely missing recent entries, low embedding count, or weak clustering. Operator action: trigger consolidation, check Ollama health.
Yellow50–75Functional but with at least one weak metric. Watch the dashboard.
Green≥ 75All metrics in healthy range.
Health is exposed by internal/api/memory_health_handler.go over scores derived in internal/episodic/. The daily consolidator job recomputes it; operators can call it on demand via: The endpoint returns the latest snapshot in the table — it does not recompute on read. Scores are refreshed by the daily consolidator; there is no on-demand recompute flag (crewship memory health accepts only --crew). To force a fresh score sooner, trigger consolidation (crewship consolidate run).

In-session memory nudge

When an agent run accumulates 60 or more new journal entries since its last memory.updated entry, the orchestrator injects a short reminder into the next system prompt:
You have 72 new journal entries since your last memory update. Consider appending any recurring pattern you’ve noticed to ~/.memory/AGENT.md before the session ends — the consolidator won’t replace your personal observations.
The threshold (nudgeThreshold in internal/orchestrator/memory.go) was raised from 30 to 60 because at 30 the nudge fired on essentially every session after a memory write. The nudge only fires once per session — once the agent emits a new memory.updated, the counter resets. The nudge is advisory — agents can ignore it, and it is stripped from the context before the LLM call records its response payload.

Indexer

The Indexer (internal/episodic/indexer.go) runs a background sweeper loop on a 30-second poll (NewIndexer(..., poll)), processing up to 64 unindexed embeddable entries per sweepOnce. The server starts this sweeper automatically at boot when an embedder is configured (startEpisodicIndexer in internal/server/server_lifecycle.go, gated on KEEPER_OLLAMA_URL). When no embedder is configured the server instead logs one WARN at boot and reports "episodic": "sparse-only" on /healthz; crewship doctor reads that field and warns with the enable hint. Hot-path callers that want an embedding ready before the next recall call Indexer.IndexOne(ctx, entry) directly — typically right after writing a summary.generated entry:
err := x.IndexOne(ctx, entry)
Only entries whose type survives the coarse SQL filter (the EmbeddableEntryTypes slice) are candidates; the Go-side shouldEmbed then applies the severity-aware refinement.

Gotchas

Do not add exec.output_chunk to the embeddable list. Doing so will flood the embedding space with low-signal log chatter and kill recall quality. The comment in types.go is not a suggestion.
  • Ollama dependency. No Ollama = sparse-only recall (no vector similarity). Set KEEPER_OLLAMA_URL, plus OLLAMA_MODELS="/Volumes/SSD 990 PRO/ollama-models" (external SSD) and ollama serve before ./dev.sh start when testing locally. Check the active mode via GET /healthz (episodic field) or crewship doctor.
  • Embedding is best-effort. An embedder error during a sweepOnce pass logs and skips; the entry is retried next sweep.
  • Cosine on small sets is fine. Don’t pre-optimise for a vector DB until the per-agent scan latency exceeds 50ms.
  • Crew Journal — source of entries.
  • Consolidate — the nightly workers that produce many of the embeddable entry types (summary.generated, memory.consolidated).