All endpoints require authentication and are workspace-scoped. Mutating endpoints (replay, regression) require OWNER or ADMIN role. Mission IDs must belong to the caller’s workspace — cross-tenant IDs return 404 with the same shape as “not found”. See the Quartermaster guide.
Replay and regression both return 202 Accepted immediately and run the work in a background goroutine bounded to 10 minutes. Poll List Runs to observe the outcome.
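The queue-then-poll flow can be sketched as a small helper. This is a minimal sketch, not a client shipped with the API: `wait_for_run` and its `list_runs` callable are hypothetical names, the 660-second default is an assumption (the 10-minute worker window plus a margin), and the terminal statuses are taken from the `rows[].status` values documented under List runs.

```python
import time
from typing import Callable, Optional

# Statuses after which a run will not change again (per the List runs field table).
TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_run(
    list_runs: Callable[[], dict],
    run_id: str,
    timeout_s: float = 660.0,   # assumption: 10-minute worker window plus a margin
    interval_s: float = 5.0,    # assumption: polling interval chosen by the client
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[dict]:
    """Poll List Runs until `run_id` is terminal; return its row, or None on timeout.

    `list_runs` is a hypothetical callable that performs the authenticated
    GET /api/v1/eval/runs request and returns the decoded JSON body.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        payload = list_runs()
        for row in payload.get("rows", []):
            if row.get("id") == run_id and row.get("status") in TERMINAL_STATUSES:
                return row
        if time.monotonic() >= deadline:
            return None
        sleep(interval_s)
```

Passing the HTTP call in as a callable keeps the loop independent of any particular HTTP library and makes it easy to exercise with a stub.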
Queue a replay
Request body:
{
  "mission_id": "MIS-42",
  "seed": 42
}
| Field | Type | Required | Description |
|---|---|---|---|
| mission_id | string | Yes | Target mission. Must be in the caller’s workspace. |
| seed | integer | No | Deterministic seed recorded in the run row. 0 = server default. |
Response: 202 Accepted
{
  "run_id": "er_a1b2c3d4e5f60718",
  "status": "queued"
}
Errors:
| Status | Condition |
|---|---|
| 400 | Invalid JSON or missing mission_id. |
| 401 | No workspace. |
| 403 | Not OWNER/ADMIN. |
| 404 | mission_id not in your workspace. |
| 500 | DB / token generation failure. |
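A client might translate these statuses into a typed error as follows. This is a sketch under stated assumptions: `EvalApiError` and `check_replay_response` are hypothetical names, and treating only 5xx responses as retryable is a client-side policy chosen for illustration, not something the API specifies.

```python
# Reasons copied from the error table above.
REPLAY_ERRORS = {
    400: "Invalid JSON or missing mission_id.",
    401: "No workspace.",
    403: "Not OWNER/ADMIN.",
    404: "mission_id not in your workspace.",
    500: "DB / token generation failure.",
}


class EvalApiError(Exception):
    """Hypothetical client-side error carrying the HTTP status and documented reason."""

    def __init__(self, status: int, reason: str, retryable: bool) -> None:
        super().__init__(f"{status}: {reason}")
        self.status = status
        self.reason = reason
        self.retryable = retryable


def check_replay_response(status: int) -> None:
    """Return quietly on 202 Accepted; raise EvalApiError for anything else."""
    if status == 202:
        return
    reason = REPLAY_ERRORS.get(status, "Unexpected status.")
    # Assumption: only server-side failures (5xx) are worth retrying.
    raise EvalApiError(status, reason, retryable=status >= 500)
```

Because a 404 is returned both for unknown missions and for missions in another workspace, the client cannot tell the two apart and should surface the same message for both.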
Queue a regression
POST /api/v1/eval/regression
Request body:
{
  "baseline_mission_id": "MIS-41",
  "candidate_mission_id": "MIS-42"
}
| Field | Type | Required | Description |
|---|---|---|---|
| baseline_mission_id | string | Yes | The reference mission. |
| candidate_mission_id | string | Yes | The mission under test. |
Both must be in the caller’s workspace. The handler checks them independently so a partial spoof still 404s.
Response: 202 Accepted
{
  "run_id": "er_b2c3d4e5f6071829",
  "status": "queued"
}
Errors: Same as replay, plus 400 if either mission ID is empty.
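As an illustration, a regression request could be assembled with the Python standard library like this. The function name, the `base_url` parameter, and the Bearer `Authorization` header are assumptions made for the sketch; this page only states that authentication is required, so substitute whatever credential scheme your deployment uses. The empty-ID check mirrors the 400 condition above so the mistake is caught before a round trip.

```python
import json
import urllib.request

REGRESSION_PATH = "/api/v1/eval/regression"


def build_regression_request(
    base_url: str,
    token: str,
    baseline_mission_id: str,
    candidate_mission_id: str,
) -> urllib.request.Request:
    """Build (but do not send) a POST request that queues a regression run."""
    # The server answers 400 when either ID is empty; fail early on the client.
    if not baseline_mission_id or not candidate_mission_id:
        raise ValueError("baseline_mission_id and candidate_mission_id are both required")
    body = {
        "baseline_mission_id": baseline_mission_id,
        "candidate_mission_id": candidate_mission_id,
    }
    return urllib.request.Request(
        base_url.rstrip("/") + REGRESSION_PATH,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: Bearer token auth; the real scheme may differ.
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
```

Sending the request with `urllib.request.urlopen(req)` should yield the 202 response shown above, whose `run_id` can then be looked up through List runs.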
List runs
GET /api/v1/eval/runs?limit=50
Query parameters:
| Param | Type | Default | Description |
|---|---|---|---|
| limit | integer | 50 | Maximum number of rows to return. Allowed range: 1-200. |
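A small helper can build the List runs URL and enforce the documented range before the request is sent. This is a sketch: `runs_url` and `base_url` are hypothetical names, and rejecting out-of-range values on the client is an assumption, since this page does not say how the server treats them.

```python
from urllib.parse import urlencode

RUNS_PATH = "/api/v1/eval/runs"


def runs_url(base_url: str, limit: int = 50) -> str:
    """Return the List runs URL for `limit`, which must be within 1-200."""
    if not 1 <= limit <= 200:
        raise ValueError("limit must be between 1 and 200")
    return f"{base_url.rstrip('/')}{RUNS_PATH}?{urlencode({'limit': limit})}"
```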
Response: 200 OK
{
  "rows": [
    {
      "id": "er_a1b2c3d4e5f60718",
      "workspace_id": "ws_123",
      "kind": "replay",
      "mission_id": "MIS-42",
      "baseline_mission_id": "",
      "candidate_mission_id": "",
      "seed": 42,
      "seed_signature": "sha256:7c1b...",
      "status": "completed",
      "result": "ok",
      "tokens": 184251,
      "cost_usd": 0.8421,
      "regressed": false,
      "created_by": "user_123",
      "created_at": "2026-04-17T10:00:00Z",
      "updated_at": "2026-04-17T10:02:41Z"
    },
    {
      "id": "er_b2c3d4e5f6071829",
      "kind": "regression",
      "baseline_mission_id": "MIS-41",
      "candidate_mission_id": "MIS-42",
      "status": "completed",
      "result": "regressed: tool success -8% cost +22%",
      "regressed": true,
      "created_at": "2026-04-17T10:15:00Z"
    }
  ],
  "count": 2,
  "limit": 50
}
| Field | Type | Description |
|---|---|---|
| rows[].kind | string | replay or regression. |
| rows[].status | string | queued, running, completed, failed. |
| rows[].result | string | Human-readable outcome. On failure, the error message. |
| rows[].seed_signature | string | sha256 over step-type + tool-name sequence; stable across deterministic replays. |
| rows[].regressed | boolean | For regression kind only; true if at least one metric crossed the threshold. |
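To show how these fields combine, the sketch below picks out completed regression runs that were flagged as regressed. `regressed_runs` is a hypothetical helper; it reads fields with `.get` because, as the sample response shows, rows may omit fields that do not apply to their kind.

```python
from typing import Dict, List


def regressed_runs(payload: Dict) -> List[Dict]:
    """Return completed regression rows whose `regressed` flag is true."""
    return [
        row
        for row in payload.get("rows", [])
        if row.get("kind") == "regression"
        and row.get("status") == "completed"
        and row.get("regressed") is True
    ]
```

Filtering on `kind` first matters because `regressed` is only meaningful for regression runs; a replay row always reports `false`.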
Tenancy and role gates
- All reads and writes are scoped to the session’s workspace.
- Queuing a replay or regression requires the OWNER or ADMIN role.
- Cross-tenant mission IDs return 404, not 403, to avoid leaking cross-workspace existence.
Journal side-effects
The background worker emits eval.run_started at the start, eval.metric for each computed metric, and eval.regression_detected when a regression run crosses a threshold. Correlate by run_id in the payload.
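Correlating journal entries by run could look like the sketch below. The event shape is an assumption made for illustration: each entry is taken to be a dict with a `type` string and a `payload` dict, which this page does not specify beyond stating that `run_id` appears in the payload. `group_by_run` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

# Event types emitted by the background worker, as listed above.
EVAL_EVENT_TYPES = {"eval.run_started", "eval.metric", "eval.regression_detected"}


def group_by_run(events: Iterable[dict]) -> Dict[str, List[str]]:
    """Map each run_id to the ordered list of eval event types seen for it."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for event in events:
        # Assumed shape: {"type": "...", "payload": {"run_id": "...", ...}}
        if event.get("type") not in EVAL_EVENT_TYPES:
            continue
        run_id = event.get("payload", {}).get("run_id")
        if run_id:
            grouped[run_id].append(event["type"])
    return dict(grouped)
```

A run whose list contains `eval.regression_detected` corresponds to a List runs row with `regressed` set to true.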