Auto-indexing is wired at boot. When an embedder is configured (KEEPER_OLLAMA_URL pointing at an Ollama host serving nomic-embed-text), the server starts the background indexer sweeper automatically at boot — new embeddable journal entries become vector-searchable within one 30-second sweep, no manual indexing required.Without an embedder, episodic recall runs in sparse-only mode (keyword/FTS, no vector similarity). That degraded state is never silent: the server logs a WARN at boot, GET /healthz reports "episodic": "sparse-only" (vs "vector"), and crewship doctor surfaces it as a WARN with the fix.Roadmap (v0.2): the synchronous embed-on-write hook (Indexer.IndexOne directly from the journal writer hot path) is not yet wired — until then a freshly written entry can lag recall by up to one sweep interval.
Episodic is a narrow vector-recall layer over the Crew Journal. It embeds a selective subset of journal entries and serves top-K similarity queries to agents planning new work.The “selective” part is load-bearing. Per 2025-2026 multi-agent memory research, indexing every event causes catastrophic drift: high-volume low-signal types (exec.output_chunk, container.metrics, network.*) drown the embedding space and dilute recall. Episodic refuses to embed those types and ingests only escalations, summaries, terminal mission status, denied keeper calls, eval regressions, and operator-tagged entries.
Only at severity warn/error (failed or problematic)
peer.conversation
Listed in EmbeddableEntryTypes but shouldEmbed always returns false — plain Q&A is too high-volume to embed. The indexer is expected to apply its own escalation-aware filter and force it via the override below when the question ends in an escalation.
exec.output_chunk, container.metrics, network.*
Never
To force an embedding on an otherwise-filtered entry, tag it with refs.episodic = true in the journal payload. The indexer checks this flag before applying the type/severity filter.
journal_embeddings (added in migration 52 alongside journal_entries):
CREATE TABLE journal_embeddings ( entry_id TEXT PRIMARY KEY REFERENCES journal_entries(id) ON DELETE CASCADE, vector BLOB NOT NULL, -- raw float32 bytes dim INTEGER NOT NULL, model TEXT NOT NULL, -- "nomic-embed-text", etc. indexed_at TEXT NOT NULL, importance_score REAL DEFAULT 0.5, -- fed into recall sort (see "Importance scoring") reference_count INTEGER DEFAULT 0, -- bumped when a hit lands in a prompt last_referenced_at TEXT -- stamped at the same time);
Vectors are stored as little-endian float32 BLOBs — native Go slice serialisation. SQLite has no pgvector, so recall is a brute-force cosine scan over the scope-filtered rows. For the expected scale (~1% of entries embedded, low thousands per agent) the scan finishes in low milliseconds. If this outgrows SQLite the right move is an external vector store, not a SQLite extension — so storage sits behind an interface.
-- FTS5 contentless mirror of journal_entries.summary + payload.-- Insert/update/delete triggers keep it in lockstep with the base table.-- Powers the ?q= parameter on /api/v1/journal and the BM25 leg of HybridRecall.CREATE VIRTUAL TABLE journal_entries_fts USING fts5( summary, payload, content='journal_entries', content_rowid='rowid');-- Compaction sink. Aged rows from journal_entries land here with payload-- truncated to 400 chars before being deleted from the base table.CREATE TABLE journal_entries_archived ( id TEXT PRIMARY KEY, workspace_id TEXT NOT NULL, ts TEXT NOT NULL, entry_type TEXT NOT NULL, severity TEXT NOT NULL, summary TEXT NOT NULL, payload TEXT NOT NULL, -- 400-char truncation archived_at TEXT NOT NULL);-- embedding relation graph: similarity edges linking related embeddings.CREATE TABLE memory_relations ( src_entry_id TEXT NOT NULL, dst_entry_id TEXT NOT NULL, relation TEXT NOT NULL, -- 'similar' | 'supports' | 'contradicts' weight REAL NOT NULL DEFAULT 0.0, -- cosine for 'similar', 1.0 for hand-asserted created_at TEXT NOT NULL, PRIMARY KEY (src_entry_id, dst_entry_id, relation));-- Daily 5-metric health score per workspace (and optionally per crew).CREATE TABLE memory_health_snapshots ( workspace_id TEXT NOT NULL, crew_id TEXT, -- NULL = workspace-aggregate snapshot_date TEXT NOT NULL, -- YYYY-MM-DD UTC freshness REAL NOT NULL, coverage REAL NOT NULL, coherence REAL NOT NULL, efficiency REAL NOT NULL, reachability REAL NOT NULL, score REAL NOT NULL, -- weighted sum (see below) PRIMARY KEY (workspace_id, crew_id, snapshot_date));
Pure cosine recall misses entries that are lexically obvious — searching for "OOM in checkout" does not necessarily match an entry whose summary literally says OOM, because nomic-embed-text clusters by semantics, not surface form. PR #212 adds hybrid retrieval: dense (cosine) and sparse (BM25 over the FTS5 index) results fused via Reciprocal Rank Fusion.
RRF_score(entry) = Σ over rankings r of: 1 / (k_RRF + rank_r(entry))
with k_RRF = 60 (the value used in the original RRF paper; further tuning had marginal returns). Each ranking contributes the inverse of the entry’s position; entries ranked highly in either leg float to the top, while entries that appear in both pump up most. There is no weighting parameter — both legs are treated equal, and the constant 60 dominates ranking noise at low ranks where it matters most.HybridRecall is the recommended entry point for new code. The pure-cosine Recall is preserved for the orchestrator’s prompt-injection path (where rank stability across small query perturbations matters more than recall on lexical hits) and for tests that want a deterministic similarity score.
Embeddings live in isolation by default; memory_relations lets us treat them as a graph. Two relation types are populated automatically:
similar — LinkSimilarOnIndex runs at insert time: for the freshly-indexed entry, query the existing embedding pool, take the top-3 cosine matches above 0.80, write (src=new, dst=match, relation='similar', weight=cosine) rows. Edges are unidirectional (the new entry points to its predecessors); the inverse direction is implied at read time.
supports — LinkSupports is called by the consolidator when it derives a rule from one or more journal entries. The rule entry points to every supporting evidence entry: relation='supports', weight=1.0. This is what makes consolidated rules traceable back to their evidence.
A future contradicts relation is reserved in the schema but not populated today.Read-side use: when injecting recalled hits into a prompt, the orchestrator can optionally walk one hop along similar edges to deepen context without paying for another embedding round-trip. Disabled by default; enable per-agent by setting agent.memory_walk_depth = 1.
type Hit struct { EntryID string Score float64 // cosine similarity in [0,1], 1=identical Age time.Duration // since original entry's ts Summary string EntryType string AgentID string Payload map[string]any}
Consumers typically weight fresher hits higher when injecting into prompts. episodic.RenderInjection(hits, maxChars) does this and returns a prompt-ready string, truncated to fit a char budget — used by the orchestrator’s episodicRecallAdapter before every agent run.
Each journal_embeddings row carries three additional fields that feed
recall ranking and a nightly decay job:
Column
Purpose
importance_score REAL
Multiplied into cosine at Recall sort time (range [0,1])
reference_count INTEGER
Incremented whenever a hit lands in a prompt
last_referenced_at TEXT
Stamped at the same time
Baseline comes from BaseImportance(entry_type, severity, priority)
(see internal/episodic/importance.go): peer escalations and eval
regressions seed high; info-level routine events seed 0.5; the
operator-applied priority marker floors the value at 0.80 (pin) /
0.85 (high) / 0.95 (permanent).Nightly decay — episodic.DecayAndReinforce recomputes every
row as:
importance = BASE × RecencyFactor × (1 + ReferenceBoost/8)
RecencyFactor(indexed_at, now) = max(0.1, 1 - days/180) — a 0.1
floor keeps old-but-critical memories from going to literal zero
ReferenceBoost(refs) = log₂(refs + 1) — frequently-recalled
entries lift, but the /8 divisor prevents runaway loops from
dominating over rare-but-critical ones
Recall-time use — Recall sorts candidates by
cosine × importance (not cosine alone), then top-K is returned.
Every returned hit gets MarkReferenced which increments the
reference counter so the next DecayAndReinforce lifts frequently-
hit memories.The formula combines base value, recency decay, and reference-count
reinforcement to keep durable signal ranking well even as it ages.
Effect: a six-month-old peer.escalation keeps ranking well after
one recall per week, while a stale low-value info entry falls off.
RenderInjection wraps its output in a <recalled-memory>…</recalled-memory>
block with an explicit “UNTRUSTED HINTS” preamble. The wrapper is
load-bearing: recalled entries may contain text authored by peers,
tools, or agent output — a past peer.escalation could carry an
"IGNORE PREVIOUS INSTRUCTIONS" payload without anyone realising.
The wrapper instructs the model to treat everything inside as hints
the current task can override, not as authoritative instructions.Same treatment applies to the orchestrator’s buildMemoryContext
blocks (AGENT.md / CREW.md) — both surfaces are agent-authored and
therefore both should be read by the model as hints.The wrapper follows the “treat all recalled content as untrusted
hints” pattern: anything inside a <recalled-memory> block is
guidance the current task can override, never authoritative
instruction.
type Embedder interface { Embed(ctx context.Context, text string) ([]float32, error) Dim() int ModelID() string}
Production wiring uses the Ollama adapter (embedder.go) against nomic-embed-text (768-dimensional vectors). The embedder shares the Keeper’s Ollama base URL — KEEPER_OLLAMA_URL, which defaults to http://localhost:11434 (the config falls back to that value when the env var is unset, so the embedder is configured by default). The embedder ends up nil only when that Ollama endpoint is unreachable at startup, which disables episodic recall — see the next paragraph.If the embedder is nil (Ollama unreachable at startup), Recall returns an empty slice silently — agent runs don’t fail on embedding outages. This is configured in server/orchestrator_adapters.go:newEpisodicRecallAdapter.
memory_health_snapshots captures one row per workspace per day (and optionally per crew). The score is a weighted sum of five metrics, each in [0, 100]:
Metric
Weight
Question it answers
Freshness
25%
How recent is the indexed content? Decays with median entry age.
Coverage
25%
What fraction of embeddable journal entries are actually indexed?
Coherence
20%
How tight is the embedding cluster structure? Higher = entries cleanly group around shared themes.
Efficiency
15%
Average cosine of the top-K hit distribution — high means recalls return distinct, relevant entries; low means the index returns noise.
Reachability
15%
Fraction of embeddings reachable via at least one memory_relations edge.
The composite score is reported with three colour bands:
Band
Range
Meaning
Red
< 50
Memory is degraded; likely missing recent entries, low embedding count, or weak clustering. Operator action: trigger consolidation, check Ollama health.
Yellow
50–75
Functional but with at least one weak metric. Watch the dashboard.
Green
≥ 75
All metrics in healthy range.
Health is exposed by internal/api/memory_health_handler.go over scores derived in internal/episodic/. The daily consolidator job recomputes it; operators can call it on demand via:
HTTP — GET /api/v1/memory/health[?crew_id=<slug>]. See Memory API.
The endpoint returns the latest snapshot in the table — it does not recompute on read. Scores are refreshed by the daily consolidator; there is no on-demand recompute flag (crewship memory health accepts only --crew). To force a fresh score sooner, trigger consolidation (crewship consolidate run).
When an agent run accumulates 60 or more new journal entries since its last memory.updated entry, the orchestrator injects a short reminder into the next system prompt:
You have 72 new journal entries since your last memory update. Consider appending any recurring pattern you’ve noticed to ~/.memory/AGENT.md before the session ends — the consolidator won’t replace your personal observations.
The threshold (nudgeThreshold in internal/orchestrator/memory.go) was raised from 30 to 60 because at 30 the nudge fired on essentially every session after a memory write. The nudge only fires once per session — once the agent emits a new memory.updated, the counter resets.The nudge is advisory — agents can ignore it, and it is stripped from the context before the LLM call records its response payload.
The Indexer (internal/episodic/indexer.go) runs a background sweeper loop on a 30-second poll (NewIndexer(..., poll)), processing up to 64 unindexed embeddable entries per sweepOnce.The server starts this sweeper automatically at boot when an embedder is configured (startEpisodicIndexer in internal/server/server_lifecycle.go, gated on KEEPER_OLLAMA_URL). When no embedder is configured the server instead logs one WARN at boot and reports "episodic": "sparse-only" on /healthz; crewship doctor reads that field and warns with the enable hint.Hot-path callers that want an embedding ready before the next recall call Indexer.IndexOne(ctx, entry) directly — typically right after writing a summary.generated entry:
err := x.IndexOne(ctx, entry)
Only entries whose type survives the coarse SQL filter (the EmbeddableEntryTypes slice) are candidates; the Go-side shouldEmbed then applies the severity-aware refinement.
Do not add exec.output_chunk to the embeddable list. Doing so will flood the embedding space with low-signal log chatter and kill recall quality. The comment in types.go is not a suggestion.
Ollama dependency. No Ollama = sparse-only recall (no vector similarity). Set KEEPER_OLLAMA_URL, plus OLLAMA_MODELS="/Volumes/SSD 990 PRO/ollama-models" (external SSD) and ollama serve before ./dev.sh start when testing locally. Check the active mode via GET /healthz (episodic field) or crewship doctor.
Embedding is best-effort. An embedder error during a sweepOnce pass logs and skips; the entry is retried next sweep.
Cosine on small sets is fine. Don’t pre-optimise for a vector DB until the per-agent scan latency exceeds 50ms.