Orchestration

Crewship’s orchestration system manages multi-agent missions through the MissionEngine (internal/orchestrator/mission.go). It handles task scheduling, dependency resolution, failure recovery, and cross-crew coordination.
“Orchestration” here means the engine subsystem in internal/orchestrator/, not a navigable page. After the Plan/Run/Build/System IA refactor, the user-facing surfaces are split: Routines for reusable recipes, Issues for the work-item tracker, Inbox for your actionable feed, and Activity for the live trace canvas. The legacy /orchestration route now soft-redirects to /activity.

Mission Lifecycle

PLANNING --> IN_PROGRESS --> REVIEW --> COMPLETED
                |              |
                v              v
             FAILED        CANCELLED
A mission progresses through these states:
  1. PLANNING — Mission created, tasks defined (or waiting for Lead to plan)
  2. IN_PROGRESS — Tasks being scheduled and executed
  3. REVIEW — All tasks finished (none failed). The mission enters review before final completion, allowing humans to inspect results
  4. COMPLETED — Mission accepted after review
  5. FAILED — A task failed and could not recover, or deadlock/timeout detected
  6. CANCELLED — Manually stopped by user or system
The REVIEW state is inserted between IN_PROGRESS and COMPLETED. When all tasks reach a terminal state (COMPLETED, FAILED, or SKIPPED) and none have failed, the mission transitions to REVIEW rather than directly to COMPLETED. If any task failed, the mission transitions to FAILED instead.
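
The transition rule above can be sketched as a small pure function. This is an illustrative sketch, not the engine's code: the names TaskStatus, isTerminal, and resolveMissionStatus are hypothetical, and the empty-string return for "stay IN_PROGRESS" is an assumption made for the example.

```go
package main

// TaskStatus and the constants below are hypothetical names for illustration.
type TaskStatus string

const (
	TaskPending   TaskStatus = "PENDING"
	TaskRunning   TaskStatus = "RUNNING"
	TaskCompleted TaskStatus = "COMPLETED"
	TaskFailed    TaskStatus = "FAILED"
	TaskSkipped   TaskStatus = "SKIPPED"
)

// isTerminal reports whether a task can no longer change state.
func isTerminal(s TaskStatus) bool {
	return s == TaskCompleted || s == TaskFailed || s == TaskSkipped
}

// resolveMissionStatus returns the next mission status, or "" when the
// mission should stay IN_PROGRESS (no tasks yet, or a task is not terminal).
func resolveMissionStatus(tasks []TaskStatus) string {
	if len(tasks) == 0 {
		return ""
	}
	anyFailed := false
	for _, s := range tasks {
		if !isTerminal(s) {
			return ""
		}
		if s == TaskFailed {
			anyFailed = true
		}
	}
	if anyFailed {
		return "FAILED"
	}
	return "REVIEW"
}
```

With this sketch, a mission whose tasks are COMPLETED and SKIPPED resolves to REVIEW, while a single FAILED task among terminal tasks resolves to FAILED.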

The Mission Engine

The MissionEngine is the central orchestrator. Key parameters:
| Parameter | Value | Source |
| --- | --- | --- |
| Polling interval | 3 seconds | time.NewTicker(3 * time.Second) |
| Circuit breaker threshold | 3 consecutive failures | circuitBreakerThreshold = 3 |
| Mission timeout | 2 hours | missionTimeoutDefault = 2 * time.Hour |
| Max result summary | 8,000 chars | maxResultSummaryLen = 8000 |
| Max brief total | 32,000 bytes | maxBriefTotalLen = 32000 |
| Per-dependency output truncation | 4,000 chars | maxDepOutputLen = 4000 |

Mission Loop

The runMissionLoop function runs as a goroutine for each active mission. Every 3 seconds it:
1. Check mission status (still IN_PROGRESS?)
2. Lead planning phase: if 0 tasks, dispatch Lead to create plan
3. Schedule ready tasks (dependencies met, status PENDING)
4. Check mission completion (all tasks done?)
5. Detect deadlocks (all tasks BLOCKED with no progress)
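
The polling structure can be sketched with a standard time.Ticker. This is a minimal sketch, not the actual runMissionLoop: the stop channel and the step callback are assumptions, with step standing in for the five checks listed above.

```go
package main

import "time"

// pollInterval mirrors the 3-second ticker from the parameter table.
const pollInterval = 3 * time.Second

// runMissionLoop calls step on every tick until step reports that the
// mission is finished or stop is closed. Both parameters are hypothetical:
// step stands in for the status, planning, scheduling, completion, and
// deadlock checks.
func runMissionLoop(stop <-chan struct{}, interval time.Duration, step func() (done bool)) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-stop:
			return
		case <-ticker.C:
			if step() {
				return
			}
		}
	}
}
```

In this sketch, one such loop would be started per active mission with `go runMissionLoop(stop, pollInterval, step)`.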

Task States

PENDING --> RUNNING --> COMPLETED
   ^          |             |
   |          v             v
   +------ FAILED    AWAITING_APPROVAL ---> COMPLETED
              |                         |
              v                         v
           BLOCKED                    FAILED (rejected)
           SKIPPED
| State | Description |
| --- | --- |
| PENDING | Ready to be scheduled |
| RUNNING | Currently being executed by an agent |
| COMPLETED | Finished successfully |
| FAILED | Execution failed |
| BLOCKED | Waiting for dependent tasks to complete |
| AWAITING_APPROVAL | Task completed but held for human review before proceeding |
| SKIPPED | Task was intentionally skipped (counts as terminal, does not cause mission failure) |
SKIPPED tasks are treated as terminal alongside COMPLETED and FAILED when checking mission completion. A skipped task does not block downstream dependencies and does not cause mission failure.

Token Budget Calculation

The orchestrator allocates system prompt space using a token budget system defined in internal/tokenutil:
| Constant | Value | Description |
| --- | --- | --- |
| MaxSystemPromptTokens | 32,000 | Total conservative budget for the system prompt |
| ConversationBudgetPct | 60% | Percentage of remaining budget for conversation history |
| MemoryBudgetPct | 40% | Percentage of remaining budget for agent memory |
The allocation works as follows:
1. Estimate base system prompt tokens
2. remaining = MaxSystemPromptTokens - baseTokens (min 2,000)
3. convTokenBudget  = remaining * 60 / 100
4. memTokenBudget   = remaining * 40 / 100
5. Inject conversation history (up to convTokenBudget)
6. Inject memory context (up to memTokenBudget)
After conversation and memory injection, the orchestrator appends additional context blocks in order: lead crew context (for LEAD agents), peer communication context (for crew AGENT members).
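
As a worked example of the allocation arithmetic, the sketch below reproduces steps 2 to 4. The constant values come from the table above; the function name splitBudget and the name minRemainingTokens for the 2,000-token floor are assumptions.

```go
package main

const (
	MaxSystemPromptTokens = 32000
	ConversationBudgetPct = 60
	MemoryBudgetPct       = 40
	minRemainingTokens    = 2000 // floor from step 2 (hypothetical name)
)

// splitBudget divides whatever is left after the base system prompt
// between conversation history and agent memory.
func splitBudget(baseTokens int) (convTokenBudget, memTokenBudget int) {
	remaining := MaxSystemPromptTokens - baseTokens
	if remaining < minRemainingTokens {
		remaining = minRemainingTokens
	}
	convTokenBudget = remaining * ConversationBudgetPct / 100
	memTokenBudget = remaining * MemoryBudgetPct / 100
	return convTokenBudget, memTokenBudget
}
```

For a base prompt of 7,000 tokens, remaining is 25,000, giving 15,000 tokens for conversation and 10,000 for memory. For a base prompt of 31,000 tokens, the floor applies: remaining is 2,000, giving 1,200 and 800.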

Mission Brief Construction

When an agent is dispatched for a mission task, the buildMissionBrief function constructs a rich context prompt with five sections:

1. IMPORTANT Preamble

Only included when dependency outputs exist. Instructs the agent not to ask clarifying questions:
IMPORTANT: You are part of a multi-agent mission pipeline.
Previous tasks have already been completed and their outputs are provided below.
DO NOT ask for additional information or clarification -- everything you need is in this prompt.
Use the dependency outputs below as your input and execute your task immediately.

2. [MISSION]

Mission title, goal, and a DAG overview listing all tasks with their status markers:
  • + COMPLETED
  • > IN_PROGRESS
  • x FAILED
  • PENDING/BLOCKED

3. [INPUT FROM PREVIOUS TASKS]

Outputs from completed dependency tasks, injected before the assignment so agents read context first. When a task produced a structured handoff block, only the handoff summary, artifacts, and confidence are included (more concise). Otherwise the full result summary is included, truncated to 4,000 characters per dependency.

4. [YOUR ASSIGNMENT]

The specific task title, description, and iteration number (if this is a retry).

5. [OUTPUT FORMAT]

Structured handoff instructions requiring the agent to produce a ---HANDOFF--- block with summary, confidence, and artifacts. The total brief is capped at 32KB (maxBriefTotalLen). If exceeded, the brief is truncated with a note.
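
The two size limits can be sketched as follows. This is an assumption-laden illustration: the helper names truncateChars and capBrief, the wording of the truncation note, and the choice to count "characters" as runes are not taken from the source. The sketch also assumes maxBytes is at least as large as the note.

```go
package main

import "unicode/utf8"

const (
	maxDepOutputLen  = 4000  // characters per dependency output
	maxBriefTotalLen = 32000 // bytes for the whole brief
)

// truncateChars keeps at most n characters (runes) of s.
func truncateChars(s string, n int) string {
	r := []rune(s)
	if len(r) <= n {
		return s
	}
	return string(r[:n])
}

// capBrief limits brief to maxBytes bytes, note included, without
// splitting a multi-byte UTF-8 character.
func capBrief(brief string, maxBytes int) string {
	const note = "\n[brief truncated]" // hypothetical wording
	if len(brief) <= maxBytes {
		return brief
	}
	cut := maxBytes - len(note)
	if cut < 0 {
		cut = 0
	}
	// Back up to the start of a rune so the cut is valid UTF-8.
	for cut > 0 && !utf8.RuneStart(brief[cut]) {
		cut--
	}
	return brief[:cut] + note
}
```

In this sketch each dependency output would pass through truncateChars(output, maxDepOutputLen) before assembly, and the assembled brief through capBrief(brief, maxBriefTotalLen).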

Lead Planning Phase

When a mission starts with 0 tasks, the engine dispatches the Lead agent to create a plan. The Lead uses its crew context to understand available agents and creates tasks via the sidecar /mission/create endpoint.
Mission (0 tasks)
    |
    v
Lead agent dispatched (LEAD role, with sidecar)
    |
    v
Lead creates tasks via curl to localhost:9119/mission/create
    |
    v
Mission engine detects new tasks -> begins scheduling

LeadPlanning Flag

The DispatchRequest includes a LeadPlanning flag that tells the API layer to dispatch the agent as a LEAD with sidecar access. This is essential because Lead agents need access to the mission management API (/mission/create, /mission/{id}) to define tasks, while regular AGENT tasks skip the sidecar for security.

TOCTOU Prevention

A time-of-check-to-time-of-use race is prevented by inserting a sentinel missionState into the active map before loading the mission from the database. The planningDispatched flag on the mission state prevents re-dispatching the Lead if it is still working. This flag is only set to true after dispatchLeadPlanning succeeds.
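
The sentinel pattern can be sketched like this. Only missionState, the active map, and planningDispatched are named in the text above; the engine type, the claim and dispatchLeadOnce methods, and errNotActive are hypothetical. The sketch assumes dispatchLeadOnce is called only from the mission's own loop goroutine.

```go
package main

import (
	"errors"
	"sync"
)

var errNotActive = errors.New("mission not active")

type missionState struct {
	planningDispatched bool
}

type engine struct {
	mu     sync.Mutex
	active map[string]*missionState
}

// claim inserts a sentinel state for missionID before any database load.
// Check and insert happen under one lock, so only one caller can win.
func (e *engine) claim(missionID string) bool {
	e.mu.Lock()
	defer e.mu.Unlock()
	if _, exists := e.active[missionID]; exists {
		return false
	}
	e.active[missionID] = &missionState{}
	return true
}

// dispatchLeadOnce runs dispatch unless the Lead was already dispatched.
// The flag is set only after dispatch succeeds, so a failed dispatch is
// retried on the next tick.
func (e *engine) dispatchLeadOnce(missionID string, dispatch func() error) error {
	e.mu.Lock()
	st, ok := e.active[missionID]
	e.mu.Unlock()
	if !ok {
		return errNotActive
	}
	if st.planningDispatched {
		return nil
	}
	if err := dispatch(); err != nil {
		return err
	}
	st.planningDispatched = true
	return nil
}
```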

Scaling Rules

The Lead agent follows complexity-based scaling rules injected via the system prompt:
| Complexity | Agents | Tool Calls | Duration | Tokens |
| --- | --- | --- | --- | --- |
| SIMPLE | 1 | 3-10 | ~5 min | ~10K |
| MEDIUM | 1-2 | 10-15 | ~15 min | ~50K |
| COMPLEX | 2-4 | 15+ | ~30 min | ~100K |

Workflow Templates

Four built-in workflow templates are defined in internal/orchestrator/workflow.go. In the sequential case, tasks execute one after another in order:
step-1 --> step-2 --> step-3

The Ralph Loop Pattern

The LoopController (internal/orchestrator/loop.go) manages task retry logic:
  1. When a task fails and has max_iterations > 1, the controller increments the iteration counter and resets the task to PENDING
  2. For loop-back patterns (dev-test-loop), when a downstream task fails, the upstream task is reset to restart the cycle
  3. Previous failure context from the progress log is injected so the agent learns from mistakes
The ShouldRetry method checks if a failed task has remaining iterations. If yes, it resets the task:
  • Status back to PENDING
  • Iteration counter incremented
  • All execution fields cleared (assignment_id, result_summary, error_message, started_at, completed_at, duration_ms)
The RetryLoopBack method handles the upstream reset pattern: when a downstream task (e.g., “test”) fails, it checks the dependency chain. If an upstream task (e.g., “develop”) has remaining iterations, that task is reset to PENDING and the failed downstream task is set to BLOCKED — ready to run again once the upstream completes.
Tasks without max_iterations set (or max_iterations <= 1) are never retried. A failed task without retry configuration causes the mission to fail.
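
The retry check and reset can be sketched as below. This is not the LoopController itself: the task struct and function names are hypothetical, and the sketch assumes the iteration counter starts at 1, so max_iterations = 3 allows three attempts in total.

```go
package main

// task carries only the fields needed for the retry decision;
// the field names are hypothetical.
type task struct {
	Status        string
	Iteration     int // assumed to start at 1
	MaxIterations int
	AssignmentID  string
	ResultSummary string
	ErrorMessage  string
	DurationMs    int64
}

// shouldRetry reports whether a failed task has iterations left.
// Tasks with max_iterations unset or <= 1 are never retried.
func shouldRetry(t *task) bool {
	return t.Status == "FAILED" && t.MaxIterations > 1 && t.Iteration < t.MaxIterations
}

// resetForRetry puts the task back in the queue with a clean slate.
func resetForRetry(t *task) {
	t.Status = "PENDING"
	t.Iteration++
	t.AssignmentID = ""
	t.ResultSummary = ""
	t.ErrorMessage = ""
	t.DurationMs = 0
}
```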

Task Approval Gate

The checkApprovalGate function determines whether a completed task should be held for human review. The gate evaluates three inputs:
  1. Explicit flag — if approval_required = 1 on the task, it is always held
  2. Confidence threshold — the agent’s self-reported confidence from the handoff block
  3. Escalation config — per-crew configuration with tiered thresholds

Escalation Config

Each crew can define an escalation_config JSON object with three thresholds:
{
  "auto_approve_threshold": 0.9,
  "notify_threshold": 0.7,
  "require_approval_below": 0.5
}
| Threshold | Behavior |
| --- | --- |
| auto_approve_threshold | Confidence at or above this value: auto-approve (task goes to COMPLETED) |
| notify_threshold | Confidence below this value: send a confidence.low WebSocket notification |
| require_approval_below | Confidence below this value: hold the task in AWAITING_APPROVAL |
The evaluation order is:
  1. If confidence >= auto_approve_threshold, return COMPLETED
  2. If approval_required is explicitly set, return AWAITING_APPROVAL
  3. If no config or no confidence data, return COMPLETED
  4. If confidence < require_approval_below, return AWAITING_APPROVAL
  5. If confidence < notify_threshold, send notification but return COMPLETED
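
The evaluation order can be sketched as a single function. This sketch follows the five steps exactly as listed, so the auto-approve check runs before the explicit flag. The function signature, the struct name, and the use of a nil pointer for "no confidence data" are assumptions, not the signature of checkApprovalGate.

```go
package main

// escalationConfig mirrors the JSON keys of escalation_config.
type escalationConfig struct {
	AutoApproveThreshold float64 `json:"auto_approve_threshold"`
	NotifyThreshold      float64 `json:"notify_threshold"`
	RequireApprovalBelow float64 `json:"require_approval_below"`
}

// evaluateGate returns the resulting task status and whether a
// confidence.low notification should be sent. A nil confidence or nil cfg
// means the corresponding data is missing.
func evaluateGate(approvalRequired bool, confidence *float64, cfg *escalationConfig) (status string, notify bool) {
	// Step 1: high confidence auto-approves.
	if cfg != nil && confidence != nil && *confidence >= cfg.AutoApproveThreshold {
		return "COMPLETED", false
	}
	// Step 2: explicit flag on the task.
	if approvalRequired {
		return "AWAITING_APPROVAL", false
	}
	// Step 3: nothing to evaluate against.
	if cfg == nil || confidence == nil {
		return "COMPLETED", false
	}
	// Step 4: too low, hold for review.
	if *confidence < cfg.RequireApprovalBelow {
		return "AWAITING_APPROVAL", false
	}
	// Step 5: low enough to notify, but still completes.
	if *confidence < cfg.NotifyThreshold {
		return "COMPLETED", true
	}
	return "COMPLETED", false
}
```

With the example config above (0.9 / 0.7 / 0.5), a confidence of 0.6 completes the task and triggers a notification, while 0.4 holds it in AWAITING_APPROVAL.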

Approving or Rejecting Tasks

The ApproveTask method transitions a task from AWAITING_APPROVAL:
  • Approved: task moves to COMPLETED, dependent BLOCKED tasks are unblocked
  • Rejected: task moves to FAILED, all downstream dependent tasks are recursively failed with reason “upstream task rejected”
Approval requires a userID for the audit trail. The approval status (APPROVED or REJECTED), approver, timestamp, and evaluation notes are persisted on the task.
When a task is held in AWAITING_APPROVAL, the mission engine sends an approval.required WebSocket message to the workspace so dashboards can display a badge or notification.
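
The recursive failure on rejection can be sketched as a walk over the dependency graph. This is a minimal sketch, not the ApproveTask implementation: the task struct, the in-memory map, and the function names are hypothetical, and only the reason string comes from the text above.

```go
package main

type task struct {
	ID           string
	Status       string
	DependsOn    []string
	ErrorMessage string
}

// rejectTask fails the rejected task and everything downstream of it.
func rejectTask(tasks map[string]*task, id string) {
	if t, ok := tasks[id]; ok {
		t.Status = "FAILED"
	}
	failDownstream(tasks, id)
}

// failDownstream marks every task that depends on failedID as FAILED and
// recurses into its dependents. Already-failed tasks are skipped, which
// also guarantees termination.
func failDownstream(tasks map[string]*task, failedID string) {
	for _, t := range tasks {
		if t.Status == "FAILED" {
			continue
		}
		for _, dep := range t.DependsOn {
			if dep == failedID {
				t.Status = "FAILED"
				t.ErrorMessage = "upstream task rejected"
				failDownstream(tasks, t.ID)
				break
			}
		}
	}
}
```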

Circular Dependency Detection

The ValidateDAG method checks all mission tasks for:
  1. References to nonexistent task IDs — any depends_on entry that does not match an existing task ID causes validation to fail
  2. Circular dependencies — detected using Kahn’s algorithm (topological sort)

Kahn’s Algorithm

The implementation builds an adjacency list and computes in-degrees for each task:
1. Initialize in-degree for each task based on depends_on count
2. Enqueue all tasks with in-degree 0 (no dependencies)
3. For each dequeued task, decrement in-degree of tasks that depend on it
4. If a task's in-degree reaches 0, enqueue it
5. If visited count != total tasks, a cycle exists
The error message reports the number of tasks involved in the cycle: "circular dependency detected: N tasks involved in cycle". DAG validation runs before the mission loop begins scheduling, preventing tasks from being dispatched into an unresolvable dependency graph.
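
The five steps above can be sketched as follows. This is an illustration of Kahn's algorithm rather than the ValidateDAG source: the task struct and function name are hypothetical, task IDs are assumed to be unique, and computing N as the number of unvisited tasks is an assumption about how the count in the error message is derived.

```go
package main

import "fmt"

type task struct {
	ID        string
	DependsOn []string
}

// validateDAG rejects unknown dependency IDs and dependency cycles.
func validateDAG(tasks []task) error {
	inDegree := make(map[string]int, len(tasks))
	dependents := make(map[string][]string) // dependency ID -> tasks waiting on it
	for _, t := range tasks {
		inDegree[t.ID] = 0
	}
	for _, t := range tasks {
		for _, dep := range t.DependsOn {
			if _, ok := inDegree[dep]; !ok {
				return fmt.Errorf("task %q depends on nonexistent task %q", t.ID, dep)
			}
			inDegree[t.ID]++
			dependents[dep] = append(dependents[dep], t.ID)
		}
	}

	// Start with tasks that have no dependencies.
	var queue []string
	for _, t := range tasks {
		if inDegree[t.ID] == 0 {
			queue = append(queue, t.ID)
		}
	}

	visited := 0
	for len(queue) > 0 {
		id := queue[0]
		queue = queue[1:]
		visited++
		for _, next := range dependents[id] {
			inDegree[next]--
			if inDegree[next] == 0 {
				queue = append(queue, next)
			}
		}
	}

	if visited != len(tasks) {
		return fmt.Errorf("circular dependency detected: %d tasks involved in cycle", len(tasks)-visited)
	}
	return nil
}
```

Tasks that are never dequeued either sit on a cycle or depend on one, which is why a mismatch between the visited count and the task count is enough to prove a cycle exists.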

Deadlock Detection

The mission engine detects deadlocks when all remaining tasks are BLOCKED with no task currently IN_PROGRESS, PENDING, or AWAITING_APPROVAL. The detection logic:
  1. If any task is PENDING, IN_PROGRESS, or AWAITING_APPROVAL — not deadlocked (progress is still possible)
  2. COMPLETED, SKIPPED, and FAILED tasks are terminal — they cannot contribute to progress
  3. If all non-terminal tasks are BLOCKED — deadlock confirmed
When a deadlock is detected:
  1. The mission is marked as FAILED
  2. A mission_deadlock progress event is emitted
  3. All AWAITING_APPROVAL tasks are failed with “mission timed out”
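
The detection logic can be sketched as a scan over task statuses. The function name is hypothetical. Because this page uses both RUNNING (in the task state table) and IN_PROGRESS (in this section) for an executing task, the sketch treats both as "progress still possible"; that is an assumption.

```go
package main

// isDeadlocked reports whether every non-terminal task is BLOCKED.
func isDeadlocked(statuses []string) bool {
	blocked := 0
	for _, s := range statuses {
		switch s {
		case "PENDING", "RUNNING", "IN_PROGRESS", "AWAITING_APPROVAL":
			// Something can still move, so this is not a deadlock.
			return false
		case "BLOCKED":
			blocked++
		}
		// COMPLETED, SKIPPED, and FAILED are terminal and ignored.
	}
	// With no blocked tasks left the mission is finished, not deadlocked.
	return blocked > 0
}
```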

Circuit Breaker

The circuit breaker tracks consecutive failures per agent. After 3 consecutive failures (circuitBreakerThreshold), the agent is considered unhealthy and tasks are not dispatched to it.
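
A per-agent consecutive-failure counter can be sketched as below. The type and method names are hypothetical. Resetting the counter on success follows from "consecutive"; how an unhealthy agent is later readmitted is not described on this page and is left out.

```go
package main

import "sync"

const circuitBreakerThreshold = 3

type circuitBreaker struct {
	mu       sync.Mutex
	failures map[string]int // agent ID -> consecutive failures
}

func newCircuitBreaker() *circuitBreaker {
	return &circuitBreaker{failures: make(map[string]int)}
}

// RecordResult updates the streak: a success resets it, a failure extends it.
func (cb *circuitBreaker) RecordResult(agentID string, success bool) {
	cb.mu.Lock()
	defer cb.mu.Unlock()
	if success {
		delete(cb.failures, agentID)
		return
	}
	cb.failures[agentID]++
}

// IsOpen reports whether the agent should be skipped when dispatching.
func (cb *circuitBreaker) IsOpen(agentID string) bool {
	cb.mu.Lock()
	defer cb.mu.Unlock()
	return cb.failures[agentID] >= circuitBreakerThreshold
}
```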

CooldownManager

The CooldownManager (internal/orchestrator/failover.go) handles rate limit detection and credential cooldown. When an agent run fails due to a rate limit, the associated credential is placed in a cooldown period to avoid hammering the provider.

Rate Limit Detection

The IsRateLimitError function checks stderr output against known patterns:
| Pattern | Example |
| --- | --- |
| rate limit | "Rate limit exceeded" |
| rate_limit | "rate_limit_error" |
| 429 | "HTTP 429" |
| too many requests | "Too many requests" |
| quota exceeded | "Quota exceeded for model" |
| insufficient_quota | "insufficient_quota" |
| billing_hard_limit | "billing_hard_limit_reached" |
Detection requires exit code 1 and a case-insensitive match against any of these patterns.
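
The matching rule can be sketched with a lowercase substring scan. The pattern list comes from the table above; the function signature (exit code plus stderr text) is an assumption and may differ from the real IsRateLimitError.

```go
package main

import "strings"

// rateLimitPatterns are matched case-insensitively against stderr.
var rateLimitPatterns = []string{
	"rate limit",
	"rate_limit",
	"429",
	"too many requests",
	"quota exceeded",
	"insufficient_quota",
	"billing_hard_limit",
}

// isRateLimitError requires exit code 1 and at least one pattern match.
func isRateLimitError(exitCode int, stderr string) bool {
	if exitCode != 1 {
		return false
	}
	lower := strings.ToLower(stderr)
	for _, p := range rateLimitPatterns {
		if strings.Contains(lower, p) {
			return true
		}
	}
	return false
}
```

A plain substring match on "429" also fires on unrelated text that happens to contain those digits, which is a trade-off of this simple approach.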

Cooldown Behavior

When a rate limit is detected:
  1. MarkCooldown(credentialID, 5*time.Minute) places the credential in a 5-minute cooldown
  2. IsInCooldown(credentialID) returns true during this period, causing the orchestrator to skip that credential
  3. ClearExpired() removes stale entries
The cooldown is per-credential, not per-agent. If an agent has multiple credentials assigned, only the rate-limited credential is paused — the orchestrator can fall back to an alternate credential.
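
A per-credential cooldown map can be sketched as follows. The three method names match the ones listed above, but their bodies, the struct layout, the defaultCooldown constant, and the pickCredential helper are assumptions.

```go
package main

import (
	"sync"
	"time"
)

const defaultCooldown = 5 * time.Minute

type cooldownManager struct {
	mu    sync.Mutex
	until map[string]time.Time // credential ID -> cooldown expiry
}

func newCooldownManager() *cooldownManager {
	return &cooldownManager{until: make(map[string]time.Time)}
}

// MarkCooldown pauses a credential for duration d.
func (c *cooldownManager) MarkCooldown(credentialID string, d time.Duration) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.until[credentialID] = time.Now().Add(d)
}

// IsInCooldown reports whether the credential is still paused.
func (c *cooldownManager) IsInCooldown(credentialID string) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	expiry, ok := c.until[credentialID]
	return ok && time.Now().Before(expiry)
}

// ClearExpired drops entries whose cooldown has ended.
func (c *cooldownManager) ClearExpired() {
	c.mu.Lock()
	defer c.mu.Unlock()
	now := time.Now()
	for id, expiry := range c.until {
		if !now.Before(expiry) {
			delete(c.until, id)
		}
	}
}

// pickCredential returns the first credential that is not cooling down
// (hypothetical helper showing the fallback to an alternate credential).
func pickCredential(c *cooldownManager, credentialIDs []string) (string, bool) {
	for _, id := range credentialIDs {
		if !c.IsInCooldown(id) {
			return id, true
		}
	}
	return "", false
}
```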

Progress Logging

The ProgressWriter (internal/orchestrator/progress.go) appends structured JSONL events to a per-mission progress file at data/crews/{crewSlug}/missions/{traceID}/progress.jsonl.

Event Types

| Event | Fields | When |
| --- | --- | --- |
| mission_started | mission_id | Mission loop begins |
| task_started | task_id, agent, title | Task dispatched to agent |
| task_COMPLETED | task_id, agent, summary | Task finished successfully |
| task_FAILED | task_id, agent, error | Task execution failed |
| task_retry | task_id, agent | LoopController resets a task for retry |
| mission_deadlock | mission_id | All tasks BLOCKED with no progress |
| mission_REVIEW | mission_id | All tasks terminal, mission entering review |
| mission_timeout | mission_id | Mission exceeded 2-hour timeout |
Each event includes a UTC timestamp. The progress file is append-only and agents can read it during retry iterations to understand what happened in previous attempts (the Ralph Loop “external state” pattern). The BuildProgressContext method formats the JSONL into a human-readable text block suitable for injection into an agent’s system prompt.
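
Appending and reading JSONL events can be sketched with the standard library. The struct, the function names, and the JSON keys "timestamp" and "event" are assumptions; the remaining keys follow the Fields column above. The real ProgressWriter may use a different schema.

```go
package main

import (
	"encoding/json"
	"os"
	"time"
)

// progressEvent is a hypothetical shape for one line of progress.jsonl.
type progressEvent struct {
	Timestamp time.Time `json:"timestamp"`
	Event     string    `json:"event"`
	MissionID string    `json:"mission_id,omitempty"`
	TaskID    string    `json:"task_id,omitempty"`
	Agent     string    `json:"agent,omitempty"`
	Title     string    `json:"title,omitempty"`
	Summary   string    `json:"summary,omitempty"`
	Error     string    `json:"error,omitempty"`
}

// encodeEvent renders one event as a single newline-terminated JSON line,
// stamping it with the current UTC time if no timestamp is set.
func encodeEvent(ev progressEvent) ([]byte, error) {
	if ev.Timestamp.IsZero() {
		ev.Timestamp = time.Now().UTC()
	}
	line, err := json.Marshal(ev)
	if err != nil {
		return nil, err
	}
	return append(line, '\n'), nil
}

// decodeEvent parses one line back into an event.
func decodeEvent(line []byte) (progressEvent, error) {
	var ev progressEvent
	err := json.Unmarshal(line, &ev)
	return ev, err
}

// appendEvent opens the file in append-only mode and writes one line.
func appendEvent(path string, ev progressEvent) error {
	line, err := encodeEvent(ev)
	if err != nil {
		return err
	}
	f, err := os.OpenFile(path, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o644)
	if err != nil {
		return err
	}
	defer f.Close()
	_, err = f.Write(line)
	return err
}
```

Opening with O_APPEND keeps the file append-only from the writer's side, and one JSON object per line lets a reader process the log line by line.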

Structured Handoff

Agents produce structured handoff data at the end of tasks:
---HANDOFF---
summary: Created the REST API endpoints for user management
confidence: high
artifacts: internal/api/users.go, internal/api/users_test.go
---END HANDOFF---
The parseHandoff function extracts this structure from agent output. Both summary and confidence are required for a valid handoff — partial blocks are treated as unparsed. The confidence value (low, medium, high) feeds into the approval gate. When parsed as a float (via escalation config), it determines whether the task auto-approves or requires human review.
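
Extracting the block can be sketched with simple string handling. This is not the actual parseHandoff: the struct, the return shape, and the handling of a missing end marker (treated as unparsed) are assumptions. The requirement that both summary and confidence be present comes from the text above.

```go
package main

import "strings"

type handoff struct {
	Summary    string
	Confidence string
	Artifacts  []string
}

// parseHandoffBlock finds the handoff block in agent output and reads its
// key: value lines. It returns false for missing or partial blocks.
func parseHandoffBlock(output string) (handoff, bool) {
	const startMarker = "---HANDOFF---"
	const endMarker = "---END HANDOFF---"

	start := strings.Index(output, startMarker)
	if start < 0 {
		return handoff{}, false
	}
	body := output[start+len(startMarker):]
	end := strings.Index(body, endMarker)
	if end < 0 {
		return handoff{}, false
	}

	var h handoff
	for _, line := range strings.Split(body[:end], "\n") {
		key, value, found := strings.Cut(line, ":")
		if !found {
			continue
		}
		value = strings.TrimSpace(value)
		switch strings.ToLower(strings.TrimSpace(key)) {
		case "summary":
			h.Summary = value
		case "confidence":
			h.Confidence = value
		case "artifacts":
			for _, a := range strings.Split(value, ",") {
				if a = strings.TrimSpace(a); a != "" {
					h.Artifacts = append(h.Artifacts, a)
				}
			}
		}
	}
	// Both fields are required; a partial block counts as unparsed.
	if h.Summary == "" || h.Confidence == "" {
		return handoff{}, false
	}
	return h, true
}
```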

Cross-Crew Missions

Mission tasks can reference agents from connected crews. The system auto-routes assignments to the correct crew container. Crew connections must be established by workspace admins before use.
Crew-to-crew handoff with critique exchange (e.g. backend crew hands a draft to a testing crew for review) is on the v0.2 roadmap.

Sidecar API for Orchestration

Lead agents interact with the orchestration system through the sidecar proxy at localhost:9119:
| Endpoint | Method | Description |
| --- | --- | --- |
| /assign | POST | Assign a task to a crew member |
| /results/{id} | GET | Poll for assignment result |
| /query | POST | Ask a crew member a quick question |
| /standup | GET | Get crew standup summary |
| /escalate | POST | Escalate an issue to humans |
| /mission/create | POST | Create a multi-task mission |
| /mission/{id} | GET | Check mission status |
| /mission/{id}/start | POST | Start a mission |
| /mission/templates | GET | List available workflow templates |

What’s Next

  • Keeper — persistent agent memory across sessions
  • Scheduling — cron-based automated agent runs