Diff
cmd/crewship/cmd_diff.go takes two existing run IDs, fetches their state plus the last assistant message of each linked chat, and renders a terse before/after view. Distinct from eval compare, which re-runs an eval scenario on two tiers — diff works strictly on what already happened.
Use cases:
- “Did v2 of my routine actually fix the bug?” → diff before/after
- “What changed between Friday’s prod run and Monday’s regression?”
- “Did the agent give two different answers to the same prompt?”
run-a <id>: <err> or run-b <id>: <err>.
crewship diff <run-a> <run-b>
| Flag | Type | Default | Effect |
|---|---|---|---|
--full | bool | false | Show full assistant output. Without it, each side is truncated to 1 KiB with an ellipsis. |
Examples
Output
Side-by-side headers (agent, status, started, finished) with status colours (green/red/yellow), then a minimal line-by-line diff of the assistant output:See also
crewship eval compare— re-run an eval scenario on two model tiers.crewship history— find the two run IDs.crewship explain <run-id>— explain what one run did, with journal context.