Episodic Memory

Roadmap (v0.2). The episodic package ships in v0.1 but the embed-on-write hook into the journal writer is not yet wired, so journal entries are not indexed automatically. Manual indexing via the API works today for evaluation; vector recall queries only return results for entries that were indexed manually until the auto-wire ships in v0.2.

Episodic is a narrow vector-recall layer over the Crew Journal. It embeds a selective subset of journal entries and serves top-K similarity queries to agents planning new work. The “selective” part is load-bearing. Per 2025-2026 multi-agent memory research, indexing every event causes catastrophic drift: high-volume low-signal types (exec.output_chunk, container.metrics, network.*) drown the embedding space and dilute recall. Episodic refuses to embed those types and ingests only escalations, summaries, terminal mission status, denied keeper calls, eval regressions, and operator-tagged entries.

Embedding criteria

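// internal/episodic/types.go

// shouldEmbed reports whether a journal entry of this type/severity is indexed.
// Never add exec.output_chunk, container.metrics, or network.* here: they flood
// the embedding space with low-signal chatter (see Gotchas).
func shouldEmbed(entryType string, severity string) bool {
    switch entryType {
    case "peer.escalation",
         "summary.generated",
         "memory.consolidated",
         "approval.denied",
         "eval.regression_detected":
        return true
    case "keeper.decision":
        // Only denied or risky keeper calls.
        return severity == "warn" || severity == "error"
    case "mission.status_change":
        // Only failed or problematic status changes.
        return severity == "warn" || severity == "error"
    }
    // Everything else, including peer.conversation, is filtered out here.
    return false
}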

Entry type	When embedded
`peer.escalation`	Always
`summary.generated`	Always
`memory.consolidated`	Always
`approval.denied`	Always
`eval.regression_detected`	Always
`keeper.decision`	Only at severity warn/error (denied or risky)
`mission.status_change`	Only at severity warn/error (failed or problematic)
`peer.conversation`	Never via `shouldEmbed` (plain Q&A is too high-volume), although it is listed in `EmbeddableEntryTypes`. The indexer is expected to apply its own escalation-aware filter and force embedding via the `refs.episodic` override below when the conversation ends in an escalation.
`exec.output_chunk`, `container.metrics`, `network.*`	Never

To force an embedding on an otherwise-filtered entry, tag it with refs.episodic = true in the journal payload. The indexer checks this flag before applying the type/severity filter.
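A minimal sketch of that ordering (override first, then the type/severity filter). It assumes the payload is decoded into a map[string]any with refs as a nested object; forcedByOperator and wantsEmbedding are hypothetical names, and shouldEmbed is repeated from types.go so the block compiles on its own.

```go
package main

// shouldEmbed mirrors the type/severity filter in internal/episodic/types.go.
func shouldEmbed(entryType, severity string) bool {
	switch entryType {
	case "peer.escalation", "summary.generated", "memory.consolidated",
		"approval.denied", "eval.regression_detected":
		return true
	case "keeper.decision", "mission.status_change":
		return severity == "warn" || severity == "error"
	}
	return false
}

// forcedByOperator (hypothetical) reports whether payload.refs.episodic is the boolean true.
// Any other value, including the string "true", is treated as "not forced".
func forcedByOperator(payload map[string]any) bool {
	refs, ok := payload["refs"].(map[string]any)
	if !ok {
		return false
	}
	forced, ok := refs["episodic"].(bool)
	return ok && forced
}

// wantsEmbedding (hypothetical) checks the operator override before the filter,
// so a tagged entry is embedded even if shouldEmbed would reject it.
func wantsEmbedding(entryType, severity string, payload map[string]any) bool {
	if forcedByOperator(payload) {
		return true
	}
	return shouldEmbed(entryType, severity)
}
```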

Storage

journal_embeddings (added in migration 52 alongside journal_entries):

CREATE TABLE journal_embeddings (
  entry_id           TEXT PRIMARY KEY REFERENCES journal_entries(id) ON DELETE CASCADE,
  vector             BLOB NOT NULL,     -- raw float32 bytes
  dim                INTEGER NOT NULL,
  model              TEXT NOT NULL,     -- "nomic-embed-text", etc.
  indexed_at         TEXT NOT NULL,
  importance_score   REAL DEFAULT 0.5,  -- fed into recall sort (see "Importance scoring")
  reference_count    INTEGER DEFAULT 0, -- bumped when a hit lands in a prompt
  last_referenced_at TEXT               -- stamped at the same time
);

Vectors are stored as little-endian float32 BLOBs, four bytes per element, serialised directly from the Go []float32. SQLite has no pgvector, so recall is a brute-force cosine scan over the scope-filtered rows. At the expected scale (~1% of entries embedded, low thousands per agent) the scan finishes in low milliseconds. If the workload outgrows SQLite, the right move is an external vector store rather than a SQLite extension, which is why storage sits behind an interface.
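The BLOB layout and the brute-force scan can be sketched as follows. This assumes rows are decoded after the SQL scope filter has already run; encodeVector, decodeVector, cosine, and bruteForceTopK are illustrative names, not the package's API.

```go
package main

import (
	"encoding/binary"
	"errors"
	"math"
	"sort"
)

// encodeVector packs a []float32 as little-endian bytes, 4 bytes per element.
func encodeVector(v []float32) []byte {
	buf := make([]byte, 4*len(v))
	for i, f := range v {
		binary.LittleEndian.PutUint32(buf[4*i:], math.Float32bits(f))
	}
	return buf
}

// decodeVector reverses encodeVector; dim should match the row's dim column.
func decodeVector(b []byte, dim int) ([]float32, error) {
	if len(b) != 4*dim {
		return nil, errors.New("blob length does not match dim")
	}
	v := make([]float32, dim)
	for i := range v {
		v[i] = math.Float32frombits(binary.LittleEndian.Uint32(b[4*i:]))
	}
	return v, nil
}

// cosine returns the cosine similarity of two equal-length vectors (0 if either is all zeros).
func cosine(a, b []float32) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += float64(a[i]) * float64(b[i])
		na += float64(a[i]) * float64(a[i])
		nb += float64(b[i]) * float64(b[i])
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

type scored struct {
	EntryID string
	Score   float64
}

// bruteForceTopK scores every candidate row against the query and keeps the k best.
func bruteForceTopK(query []float32, rows map[string][]float32, k int) []scored {
	out := make([]scored, 0, len(rows))
	for id, v := range rows {
		if len(v) != len(query) {
			continue // row was embedded with a different model/dim
		}
		out = append(out, scored{EntryID: id, Score: cosine(query, v)})
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Score != out[j].Score {
			return out[i].Score > out[j].Score
		}
		return out[i].EntryID < out[j].EntryID // deterministic tie-break
	})
	if len(out) > k {
		out = out[:k]
	}
	return out
}
```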

Migration 55 additions

Added in PR #212 alongside the memory uplift:

-- FTS5 external-content index over journal_entries.summary + payload.
-- Insert/update/delete triggers keep it in lockstep with the base table.
-- Powers the ?q= parameter on /api/v1/journal and the BM25 leg of HybridRecall.
CREATE VIRTUAL TABLE journal_entries_fts USING fts5(
  summary, payload, content='journal_entries', content_rowid='rowid'
);

-- Compaction sink. Aged rows from journal_entries land here with payload
-- truncated to 400 chars before being deleted from the base table.
CREATE TABLE journal_entries_archived (
  id           TEXT PRIMARY KEY,
  workspace_id TEXT NOT NULL,
  ts           TEXT NOT NULL,
  entry_type   TEXT NOT NULL,
  severity     TEXT NOT NULL,
  summary      TEXT NOT NULL,
  payload      TEXT NOT NULL,                 -- 400-char truncation
  archived_at  TEXT NOT NULL
);

-- A-Mem-style edge graph: typed edges ('similar', 'supports', 'contradicts') linking related entries.
CREATE TABLE memory_relations (
  src_entry_id TEXT NOT NULL,
  dst_entry_id TEXT NOT NULL,
  relation     TEXT NOT NULL,                 -- 'similar' | 'supports' | 'contradicts'
  weight       REAL NOT NULL DEFAULT 0.0,     -- cosine for 'similar', 1.0 for hand-asserted
  created_at   TEXT NOT NULL,
  PRIMARY KEY (src_entry_id, dst_entry_id, relation)
);

-- Daily 5-metric health score per workspace (and optionally per crew).
CREATE TABLE memory_health_snapshots (
  workspace_id   TEXT NOT NULL,
  crew_id        TEXT,                        -- NULL = workspace-aggregate; SQLite PKs treat NULLs as distinct,
                                              -- so ON CONFLICT never matches an existing aggregate row
  snapshot_date  TEXT NOT NULL,               -- YYYY-MM-DD UTC
  freshness      REAL NOT NULL,
  coverage       REAL NOT NULL,
  coherence      REAL NOT NULL,
  efficiency     REAL NOT NULL,
  reachability   REAL NOT NULL,
  score          REAL NOT NULL,               -- weighted sum (see below)
  PRIMARY KEY (workspace_id, crew_id, snapshot_date)
);

Scopes

type Scope string
const (
    ScopeOwn        Scope = "own"         // agent's own past only
    ScopeCrewShared Scope = "crew_shared" // own + high-value crew entries
)

ScopeForRole(role) maps role strings to scopes:

Role	Scope
`AGENT` (default)	`own`
`LEAD`	`crew_shared`

Workspace isolation is always enforced at the query boundary — a cross-workspace recall is impossible even with a misconfigured scope.

Hybrid retrieval

Pure cosine recall misses entries that are lexically obvious — searching for "OOM in checkout" does not necessarily match an entry whose summary literally says OOM, because nomic-embed-text clusters by semantics, not surface form. PR #212 adds hybrid retrieval: dense (cosine) and sparse (BM25 over the FTS5 index) results fused via Reciprocal Rank Fusion.

hits, err := episodic.HybridRecall(ctx, db, embedder, episodic.Query{
    WorkspaceID: ws,
    AgentID:     agent,
    Scope:       episodic.ScopeOwn,
    QueryText:   "OOM in checkout",
    K:           5,
})

The fusion formula:

RRF_score(entry) = Σ over rankings r of:  1 / (k_RRF + rank_r(entry))

with k_RRF = 60 (the value used in the original RRF paper; further tuning had marginal returns). Each ranking contributes 1 / (60 + rank), so an entry ranked highly in either leg floats to the top, and an entry that appears in both legs gets the largest boost. There is no weighting parameter: both legs are treated equally. The constant 60 flattens the gap between adjacent top ranks (1/61 versus 1/62), so a small rank difference in one leg cannot outweigh agreement between both legs.

HybridRecall is the recommended entry point for new code. The pure-cosine Recall is preserved for the orchestrator’s prompt-injection path (where rank stability across small query perturbations matters more than recall on lexical hits) and for tests that want a deterministic similarity score.
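The fusion step is small enough to sketch directly. This assumes each leg has already produced an ordered list of entry IDs (best first); fuseRRF is an illustrative name, and the alphabetical tie-break is an assumption added for determinism.

```go
package main

import "sort"

const kRRF = 60

// fuseRRF combines ranked ID lists (best first) with Reciprocal Rank Fusion.
// Ranks are 1-based, so the top entry of a list contributes 1/(60+1).
func fuseRRF(rankings ...[]string) []string {
	scores := map[string]float64{}
	for _, ranking := range rankings {
		for i, id := range ranking {
			scores[id] += 1.0 / float64(kRRF+i+1)
		}
	}
	ids := make([]string, 0, len(scores))
	for id := range scores {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(a, b int) bool {
		if scores[ids[a]] != scores[ids[b]] {
			return scores[ids[a]] > scores[ids[b]]
		}
		return ids[a] < ids[b]
	})
	return ids
}
```

For a dense ranking [a, b, c] and a sparse ranking [c, d], fuseRRF returns [c, a, b, d]: c is only third in the dense leg but appears in both, so its 1/63 + 1/61 beats a's single 1/61.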

Memory relations (A-Mem-style edges)

Embeddings live in isolation by default; memory_relations lets us treat them as a graph. Two relation types are populated automatically:

similar — LinkSimilarOnIndex runs at insert time: for the freshly-indexed entry, query the existing embedding pool, take the top-3 cosine matches above 0.80, write (src=new, dst=match, relation='similar', weight=cosine) rows. Edges are unidirectional (the new entry points to its predecessors); the inverse direction is implied at read time.
supports — LinkSupports is called by the consolidator when it derives a rule from one or more journal entries. The rule entry points to every supporting evidence entry: relation='supports', weight=1.0. This is what makes consolidated rules traceable back to their evidence.
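The selection rule behind LinkSimilarOnIndex can be sketched as a pure function over an in-memory pool. similarEdges and the Edge struct are hypothetical stand-ins; the real function reads the pool from journal_embeddings and writes rows to memory_relations.

```go
package main

import (
	"math"
	"sort"
)

// Edge mirrors a memory_relations row.
type Edge struct {
	Src, Dst, Relation string
	Weight             float64
}

func cosine(a, b []float32) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += float64(a[i]) * float64(b[i])
		na += float64(a[i]) * float64(a[i])
		nb += float64(b[i]) * float64(b[i])
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// similarEdges keeps up to 3 existing entries whose cosine with the new vector
// is above 0.80 and emits one unidirectional 'similar' edge (new -> match) each.
func similarEdges(newID string, newVec []float32, pool map[string][]float32) []Edge {
	var edges []Edge
	for id, v := range pool {
		if id == newID || len(v) != len(newVec) {
			continue
		}
		if c := cosine(newVec, v); c > 0.80 {
			edges = append(edges, Edge{Src: newID, Dst: id, Relation: "similar", Weight: c})
		}
	}
	sort.Slice(edges, func(i, j int) bool {
		if edges[i].Weight != edges[j].Weight {
			return edges[i].Weight > edges[j].Weight
		}
		return edges[i].Dst < edges[j].Dst
	})
	if len(edges) > 3 {
		edges = edges[:3]
	}
	return edges
}
```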

A future contradicts relation is reserved in the schema but not populated today.

Read-side use: when injecting recalled hits into a prompt, the orchestrator can optionally walk one hop along similar edges to deepen context without paying for another embedding round-trip. This is disabled by default; enable it per agent by setting agent.memory_walk_depth = 1.
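A sketch of the one-hop walk over similar edges, following them in both directions because the inverse is implied at read time. expandOneHop is a hypothetical name; passing memory_walk_depth in as depth and appending neighbours after the original hits in sorted order are illustrative choices.

```go
package main

import "sort"

type Edge struct {
	Src, Dst, Relation string
	Weight             float64
}

// expandOneHop adds entries one 'similar' edge away from the original hits.
// Only the original hit set is used as a starting point, so neighbours of
// neighbours (two hops) are never added.
func expandOneHop(hits []string, edges []Edge, depth int) []string {
	hitSet := make(map[string]bool, len(hits))
	for _, h := range hits {
		hitSet[h] = true
	}
	out := append([]string(nil), hits...)
	if depth < 1 {
		return out
	}
	added := map[string]bool{}
	var extra []string
	for _, e := range edges {
		if e.Relation != "similar" {
			continue
		}
		// Treat each edge as undirected: src->dst and dst->src.
		for _, pair := range [][2]string{{e.Src, e.Dst}, {e.Dst, e.Src}} {
			from, to := pair[0], pair[1]
			if hitSet[from] && !hitSet[to] && !added[to] {
				added[to] = true
				extra = append(extra, to)
			}
		}
	}
	sort.Strings(extra)
	return append(out, extra...)
}
```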

Recall API

hits, err := episodic.Recall(ctx, db, embedder, episodic.Query{
    WorkspaceID: ws,
    CrewID:      crew,
    AgentID:     agent,
    Scope:       episodic.ScopeOwn,
    QueryText:   "production deployment keeper denied",
    K:           5,
})

WorkspaceID is required.
ScopeOwn requires AgentID; ScopeCrewShared requires CrewID.
K caps results (1-50, default 5).
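These rules can be sketched as a normalisation step on the query. The Query struct mirrors the fields in the example above; normalize is a hypothetical helper, and both clamping K above 50 (rather than rejecting it) and rejecting an empty Scope are assumptions.

```go
package main

import "errors"

type Scope string

const (
	ScopeOwn        Scope = "own"
	ScopeCrewShared Scope = "crew_shared"
)

// Query mirrors the fields used in the Recall example.
type Query struct {
	WorkspaceID, CrewID, AgentID string
	Scope                        Scope
	QueryText                    string
	K                            int
}

// normalize enforces the required fields and applies the K default/cap.
func (q *Query) normalize() error {
	if q.WorkspaceID == "" {
		return errors.New("WorkspaceID is required")
	}
	switch q.Scope {
	case ScopeOwn:
		if q.AgentID == "" {
			return errors.New("ScopeOwn requires AgentID")
		}
	case ScopeCrewShared:
		if q.CrewID == "" {
			return errors.New("ScopeCrewShared requires CrewID")
		}
	default:
		return errors.New("unknown scope")
	}
	switch {
	case q.K <= 0:
		q.K = 5 // documented default
	case q.K > 50:
		q.K = 50 // documented upper bound
	}
	return nil
}
```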

Returns []Hit:

type Hit struct {
    EntryID   string
    Score     float64       // cosine similarity in [0,1], 1=identical
    Age       time.Duration // since original entry's ts
    Summary   string
    EntryType string
    AgentID   string
    Payload   map[string]any
}

Consumers typically weight fresher hits higher when injecting into prompts. episodic.RenderInjection(hits, maxChars) does this and returns a prompt-ready string, truncated to fit a char budget — used by the orchestrator’s episodicRecallAdapter before every agent run.

Importance scoring

Each journal_embeddings row carries three additional fields that feed recall ranking and a nightly decay job:

Column	Purpose
`importance_score REAL`	Multiplied into cosine at `Recall` sort time (range [0,1])
`reference_count INTEGER`	Incremented whenever a hit lands in a prompt
`last_referenced_at TEXT`	Stamped at the same time

The baseline comes from BaseImportance(entry_type, severity, priority) (see internal/episodic/importance.go): peer escalations and eval regressions seed high; info-level routine events seed 0.5; the operator-applied priority marker floors the value at 0.80 (pin), 0.85 (high), or 0.95 (permanent). The nightly decay job, episodic.DecayAndReinforce, recomputes every row as:

importance = BASE × RecencyFactor × (1 + ReferenceBoost/8)

RecencyFactor(indexed_at, now) = max(0.1, 1 - days/180), where days is the time elapsed since indexed_at; the 0.1 floor keeps old-but-critical memories from decaying to literal zero
ReferenceBoost(refs) = log₂(refs + 1) — frequently-recalled entries lift, but the /8 divisor prevents runaway loops from dominating over rare-but-critical ones

Recall-time use: Recall sorts candidates by cosine × importance (not cosine alone) and returns the top K. Every returned hit gets MarkReferenced, which increments its reference counter so the next DecayAndReinforce lifts frequently-hit memories. This is a direct port of OpenClaw Auto-Dream’s importance formula, adapted to Crewship’s entry-type catalog and multi-tenant model. Effect: a six-month-old peer.escalation keeps ranking well after one recall per week, while a stale low-value info entry falls off.
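A sketch of the recall-time ordering. rankForRecall and candidate are hypothetical; markReferenced is a callback standing in for MarkReferenced, which in the real code updates reference_count and last_referenced_at.

```go
package main

import "sort"

type candidate struct {
	EntryID    string
	Cosine     float64
	Importance float64
}

// rankForRecall sorts by cosine × importance, keeps the top k, and marks
// each returned hit as referenced exactly once.
func rankForRecall(cands []candidate, k int, markReferenced func(entryID string)) []candidate {
	sorted := append([]candidate(nil), cands...)
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].Cosine*sorted[i].Importance > sorted[j].Cosine*sorted[j].Importance
	})
	if len(sorted) > k {
		sorted = sorted[:k]
	}
	for _, c := range sorted {
		markReferenced(c.EntryID)
	}
	return sorted
}
```

With candidates a (cosine 0.9, importance 0.1), b (0.6, 0.9), and c (0.8, 0.5), K = 2 returns b then c: the closest semantic match, a, loses because its importance has decayed.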

Untrusted-hints wrapper

RenderInjection wraps its output in a <recalled-memory>…</recalled-memory> block with an explicit “UNTRUSTED HINTS” preamble. The wrapper is load-bearing: recalled entries may contain text authored by peers, tools, or agent output, so a past peer.escalation could carry an "IGNORE PREVIOUS INSTRUCTIONS" payload without anyone noticing. The wrapper instructs the model to treat everything inside as hints that the current task can override, not as authoritative instructions. The same treatment applies to the orchestrator’s buildMemoryContext blocks (AGENT.md / CREW.md): both surfaces are agent-authored, so the model should read both as hints. The pattern is borrowed from Hermes Agent’s sanitize_context() and Self-Evolve’s <self-evolve-memories>…(untrusted metadata)… wrapper.

Embedder

The Embedder interface is provider-neutral:

type Embedder interface {
    Embed(ctx context.Context, text string) ([]float32, error)
    Dim() int
    ModelID() string
}

Production wiring uses the Ollama adapter (embedder.go) against nomic-embed-text. The adapter follows the OLLAMA_HOST env var and falls back to http://host.docker.internal:11434. If the embedder is nil (Ollama unreachable at startup), Recall returns an empty slice silently — agent runs don’t fail on embedding outages. This is configured in server/orchestrator_adapters.go:newEpisodicRecallAdapter.

Health scoring

memory_health_snapshots captures one row per workspace per day (and optionally per crew). The score is a weighted sum of five metrics, each in [0, 100]:

Metric	Weight	Question it answers
Freshness	25%	How recent is the indexed content? Decays with median entry age.
Coverage	25%	What fraction of embeddable journal entries are actually indexed?
Coherence	20%	How tight is the embedding cluster structure? Higher = entries cleanly group around shared themes.
Efficiency	15%	Average cosine of the top-K hit distribution — high means recalls return distinct, relevant entries; low means the index returns noise.
Reachability	15%	Fraction of embeddings reachable via at least one `memory_relations` edge.

The composite score is reported with three colour bands:

Band	Range	Meaning
Red	`< 50`	Memory is degraded; likely missing recent entries, low embedding count, or weak clustering. Operator action: trigger consolidation, check Ollama health.
Yellow	`≥ 50` and `< 75`	Functional but with at least one weak metric. Watch the dashboard.
Green	`≥ 75`	All metrics in healthy range.
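The composite and banding reduce to a few lines. healthScore and band are illustrative names, not the package's API. For example, metrics of 80, 60, 50, 40, and 20 give 0.25·80 + 0.25·60 + 0.20·50 + 0.15·40 + 0.15·20 = 20 + 15 + 10 + 6 + 3 = 54, which lands in the yellow band.

```go
package main

// healthScore combines the five metrics (each 0-100) with the documented weights.
func healthScore(freshness, coverage, coherence, efficiency, reachability float64) float64 {
	return 0.25*freshness + 0.25*coverage + 0.20*coherence + 0.15*efficiency + 0.15*reachability
}

// band maps a composite score onto the colour bands:
// red below 50, yellow from 50 up to (not including) 75, green at 75 and above.
func band(score float64) string {
	switch {
	case score < 50:
		return "red"
	case score < 75:
		return "yellow"
	default:
		return "green"
	}
}
```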

Health is exposed by internal/api/memory_health_handler.go over scores derived in internal/episodic/. The daily consolidator job recomputes it after IndexNew; operators can query it on demand via:

HTTP — GET /api/v1/memory/health[?crew_id=<slug>]. See Memory API.
CLI — crewship memory health [--crew <slug>]. See crewship memory health.

The endpoint returns the latest snapshot in the table — it does not recompute on read. Use crewship memory health --recompute (or POST the equivalent) to force a fresh score.

In-session memory nudge

When an agent run accumulates 30 or more new journal entries since its last memory.updated entry, the orchestrator injects a short reminder into the next system prompt:

You have 47 new journal entries since your last memory update. Consider summarising what you’ve learned with summary.generated so future sessions can recall it.

The threshold is fixed at 30 because that’s roughly when an agent’s working context starts evicting its own short-term memory. The nudge only fires once per session — once the agent emits a new summary.generated, the counter resets. A second nudge fires opportunistically when daily cost exceeds $1 (default; configurable per workspace):

Last 24h: 142 calls, 3.4M tokens, $8.71. Watch for runaway loops — the consolidator will summarise this batch on the next tick.

Both nudges are advisory — agents can ignore them, and they are stripped from the context before the LLM call records its response payload.
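A sketch of the entry-count nudge as a small per-session state machine. nudgeState is hypothetical. Because the text above names both memory.updated and summary.generated as the reset point, this sketch resets the counter on either; keeping the fired flag set for the rest of the session follows "only fires once per session". The cost nudge is omitted.

```go
package main

import "fmt"

const nudgeThreshold = 30

// nudgeState tracks journal entries since the agent last recorded memory.
type nudgeState struct {
	entriesSinceMemory int
	fired              bool // stays true for the rest of the session
}

// observe records one new journal entry of the given type.
func (s *nudgeState) observe(entryType string) {
	if entryType == "summary.generated" || entryType == "memory.updated" {
		s.entriesSinceMemory = 0
		return
	}
	s.entriesSinceMemory++
}

// nudge returns the reminder the first time the threshold is reached, else "".
func (s *nudgeState) nudge() string {
	if s.fired || s.entriesSinceMemory < nudgeThreshold {
		return ""
	}
	s.fired = true
	return fmt.Sprintf("You have %d new journal entries since your last memory update. "+
		"Consider summarising what you've learned with summary.generated so future sessions can recall it.",
		s.entriesSinceMemory)
}
```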

Indexer

episodic.IndexNew(ctx, db, embedder) walks unindexed journal entries that pass shouldEmbed, computes embeddings, and inserts them into journal_embeddings. The background consolidator runs this on its 6h tick; you can trigger it manually:

n, err := episodic.IndexNew(ctx, db, embedder)

Only entries whose type survives the coarse SQL filter (the EmbeddableEntryTypes slice) are candidates; the Go-side shouldEmbed then applies the severity-aware refinement.

Gotchas

Ollama dependency. No Ollama = no episodic memory. When testing locally, point OLLAMA_MODELS at your model directory (for example OLLAMA_MODELS="/Volumes/SSD 990 PRO/ollama-models" on an external SSD) and run ollama serve before ./dev.sh start.
Embedding is best-effort. An embedder error during IndexNew logs and skips; the entry is retried next run.
Cosine on small sets is fine. Don’t pre-optimise for a vector DB until the per-agent scan latency exceeds 50ms.
Do not add exec.output_chunk to the embeddable list. Doing so will flood the embedding space with low-signal log chatter and kill recall quality. The comment in types.go is not a suggestion.

Related

Crew Journal: source of entries.
Consolidate: runs IndexNew on the daily tick.
