Assertions Reference
Assertions validate that the LLM used your CLI correctly. Each task requires at least one assertion. Multiple assertions are ANDed — all must pass for the task to pass.
Use verify to run any shell command as an assertion. If you can check it in a terminal, you can assert on it.
1. exit_code
Check the process exit code.
assert:
- exit_code: 0 # success
- exit_code: 1 # expected failure
When to use: Almost every task should assert exit_code: 0 unless you're testing error handling.
2. output_contains
Check that stdout includes a substring.
assert:
- output_contains: "Successfully created"
- output_contains: "user: alice"
When to use: Verify the command produced expected output. Case-sensitive. Multiple output_contains assertions are ANDed.
3. output_equals
Check that stdout exactly matches a string (after trimming whitespace).
assert:
- output_equals: "OK"
When to use: When the exact output matters (e.g., machine-parseable output). Both the actual output and expected string are trimmed of leading/trailing whitespace before comparison. Use sparingly — most CLIs have variable output.
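The difference between the two stdout checks can be sketched in a few lines of Python. This is only an illustration of the semantics described above, not the tool's actual implementation; the helper names `output_contains` and `output_equals` are hypothetical and simply mirror the assertion keys.

```python
def output_contains(stdout: str, expected: str) -> bool:
    # Case-sensitive substring check (hypothetical helper).
    return expected in stdout


def output_equals(stdout: str, expected: str) -> bool:
    # Both sides lose leading/trailing whitespace before the comparison.
    return stdout.strip() == expected.strip()


stdout = "OK\n"
print(output_equals(stdout, "OK"))    # True: the trailing newline is trimmed
print(output_contains(stdout, "ok"))  # False: the check is case-sensitive
```

Under these assumptions, output_equals still fails as soon as the command prints anything extra, such as a timestamp or a progress line, which is why output_contains is usually the safer choice.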
4. error_contains
Check that stderr includes a substring.
assert:
- error_contains: "warning: deprecated"
- error_contains: "file not found"
When to use: Verify error messages, warnings, or diagnostic output written to stderr.
5. file_exists
Check that a file was created.
assert:
- file_exists: "output.txt"
- file_exists: "build/dist/bundle.js"
When to use: Verify file-creating commands (init, build, export, save). Paths are relative to the task working directory.
6. file_contains
Check that a file contains specific text.
assert:
- file_contains:
    path: "config.json"
    text: '"name": "my-app"'
When to use: Verify file contents after generation or modification. The text is checked as a substring, not an exact match.
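As a rough sketch of how the two file checks behave, the Python below resolves paths against a working directory and does a plain substring search. It is not the tool's implementation: the helper names are hypothetical, and treating directories as "not a file" is an assumption made here for illustration.

```python
import tempfile
from pathlib import Path


def file_exists(workdir: Path, path: str) -> bool:
    # Paths are resolved relative to the task working directory.
    # Assumption: only regular files count, not directories.
    return (workdir / path).is_file()


def file_contains(workdir: Path, path: str, text: str) -> bool:
    # Substring check against the file contents, not an exact match.
    target = workdir / path
    return target.is_file() and text in target.read_text()


with tempfile.TemporaryDirectory() as tmp:
    workdir = Path(tmp)
    (workdir / "config.json").write_text('{\n  "name": "my-app",\n  "version": "1.0.0"\n}\n')

    exists = file_exists(workdir, "config.json")
    has_name = file_contains(workdir, "config.json", '"name": "my-app"')
    has_compact = file_contains(workdir, "config.json", '"name":"my-app"')

print(exists, has_name, has_compact)  # True True False
```

The last check fails because a substring match is sensitive to whitespace, so the expected text has to follow the formatting your CLI actually writes.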
7. ran
Check that a command matching a regex was executed.
assert:
- ran: "docker build"
- ran: "git commit.*-m"
- ran: "npm install.*react"
When to use: Verify the LLM ran the right command. The pattern is a regular expression matched against the full command string. If the regex is invalid, it falls back to substring matching. This is one of the most useful assertions.
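The matching rule, regex first with a substring fallback, can be sketched as follows. This is an illustration under stated assumptions rather than the tool's code: the helper names are hypothetical, the match is assumed to be unanchored (it may occur anywhere in the command), and Python's `re` dialect may differ from the one the tool uses.

```python
import re


def command_matches(pattern: str, command: str) -> bool:
    """Regex search over the full command; substring match if the regex is invalid."""
    try:
        return re.search(pattern, command) is not None
    except re.error:
        return pattern in command


def ran(pattern: str, commands: list[str]) -> bool:
    # Passes if at least one executed command matches the pattern.
    return any(command_matches(pattern, c) for c in commands)


history = ["git add .", 'git commit -a -m "init"', "npm install --save react"]
print(ran("git commit.*-m", history))      # True
print(ran("npm install.*react", history))  # True
print(ran("docker build", history))        # False

# "foo(" is not a valid regex, so the check falls back to a substring match.
print(command_matches("foo(", "echo foo(bar)"))  # True
```

With unanchored matching, a short pattern such as "rm" would also match inside a longer word (for example "npm run format"), so more specific patterns are less likely to match by accident.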
8. not_ran
Check that a command matching a regex was NOT executed.
assert:
- not_ran: "rm -rf /"
- not_ran: "sudo"
- not_ran: "DROP TABLE"
When to use: Safety checks to ensure dangerous commands were not run. Same regex behavior as ran — invalid patterns fall back to substring matching.
9. run_count
Check how many times a matching command was executed.
assert:
- run_count:
    pattern: "curl"
    min: 2
    max: 5
- run_count:
    pattern: "docker pull"
    min: 1
When to use: Verify that a command ran a specific number of times. Counts are inclusive (min means "at least", max means "at most"). Both min and max are optional — use either or both.
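The inclusive bounds can be expressed as a short Python sketch. The function `run_count_ok` and its parameter names are hypothetical, and the sketch assumes a plain unanchored regex search; it only illustrates the counting rule described above.

```python
import re
from typing import Optional


def run_count_ok(pattern: str, commands: list[str],
                 min_count: Optional[int] = None,
                 max_count: Optional[int] = None) -> bool:
    # Count every executed command that matches the pattern.
    count = sum(1 for c in commands if re.search(pattern, c))
    # Each bound is optional and inclusive.
    if min_count is not None and count < min_count:
        return False
    if max_count is not None and count > max_count:
        return False
    return True


history = ["curl https://a.example", "curl https://b.example", "docker pull nginx"]
print(run_count_ok("curl", history, min_count=2, max_count=5))  # True: 2 is within [2, 5]
print(run_count_ok("docker pull", history, min_count=1))        # True: at least one
print(run_count_ok("curl", history, max_count=1))               # False: 2 exceeds the maximum
```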
10. verify
Run any command after the task completes and assert on its result. This is the most flexible assertion — if you can check it with a shell command, you can assert on it.
assert:
# Just check a command succeeds (exit code 0)
- verify:
    run: "docker ps | grep my-container"
# Check command output contains a substring
- verify:
    run: "cat output.txt"
    output_contains: "expected data"
# Check command output exactly matches (redirect the file so wc prints only the count)
- verify:
    run: "wc -l < result.csv"
    output_equals: "100"
| Field | Required | Description |
|---|---|---|
| run | Yes | Shell command to execute after the task |
| output_contains | No | Substring that must appear in stdout |
| output_equals | No | Exact string stdout must match (trimmed) |
If you omit both output_contains and output_equals, verify just checks that the command exits with code 0. The exit code check always runs first — if the command fails, output checks are skipped.
When to use: Whenever the built-in assertions aren't enough. Common cases:
- Checking database state after a migration
- Verifying a container is running
- Inspecting generated file structure (find, ls, tree)
- Validating JSON/YAML output with jq/yq
- Running your own test suite against the result
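The order of checks in verify (exit code first, output checks only if the command succeeded) can be sketched in Python. This is a minimal illustration, not the tool's implementation: the function below is hypothetical, its keyword arguments just mirror the documented fields, and it assumes a POSIX shell is available.

```python
import subprocess
from typing import Optional


def verify(run: str, output_contains: Optional[str] = None,
           output_equals: Optional[str] = None) -> bool:
    result = subprocess.run(run, shell=True, capture_output=True, text=True)
    # The exit code is checked first; a failing command skips the output checks.
    if result.returncode != 0:
        return False
    if output_contains is not None and output_contains not in result.stdout:
        return False
    if output_equals is not None and result.stdout.strip() != output_equals.strip():
        return False
    return True


print(verify("echo 'expected data here'", output_contains="expected data"))  # True
print(verify("printf ' 100\\n'", output_equals="100"))                       # True: trimmed
print(verify("echo ok; exit 3", output_contains="ok"))                       # False: non-zero exit
```

The last call shows why the ordering matters: the expected text is present in stdout, but the non-zero exit code already fails the assertion.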
Combining Assertions
All assertions in a task are ANDed — every assertion must pass:
- id: create-and-verify
  intent: "Create a new configuration file with default settings"
  assert:
    - exit_code: 0
    - output_contains: "Created config.yaml"
    - file_exists: "config.yaml"
    - file_contains:
        path: "config.yaml"
        text: "version: 1"
    - not_ran: "rm"
Important: What Gets Checked
- exit_code, output_contains, output_equals, and error_contains all check the last command the LLM executed, not accumulated output from all turns
- ran, not_ran, and run_count check the full list of all commands run across all turns
- file_exists, file_contains, and verify run after the task completes, independent of command history
- Command output is truncated to 2,000 characters in assertion results (the full output is still checked, but error messages show truncated values)
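The practical consequence of the first two points is easiest to see with a small command history. The sketch below is illustrative only: `CommandResult` and the `mytool` commands are hypothetical, and the history is hard-coded rather than captured from a real run.

```python
from dataclasses import dataclass


@dataclass
class CommandResult:
    command: str
    exit_code: int
    stdout: str
    stderr: str


# Hypothetical history: the LLM created the file, then ran a follow-up command.
history = [
    CommandResult("mytool init", 0, "Created config.yaml\n", ""),
    CommandResult("mytool status", 0, "ready\n", ""),
]

last = history[-1]

# Output-style checks see only the last command's stdout.
print("Created config.yaml" in last.stdout)              # False
# ran-style checks see every command in the history.
print(any("mytool init" in r.command for r in history))  # True
```

Here an output_contains: "Created config.yaml" assertion would fail even though the file was created, because the matching output came from an earlier command. A ran or file_exists assertion does not depend on which command ran last.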
Tips
- Always include exit_code: 0 unless testing error cases
- Prefer ran over output_contains for checking which commands were used: ran checks all turns, output_contains only checks the last command
- Use verify when stdout doesn't capture the result (e.g., file operations)
- Keep regex patterns in ran/not_ran broad enough to allow flag reordering