Assertions Reference

Assertions validate that the LLM used your CLI correctly. Each task requires at least one assertion. Multiple assertions are ANDed — all must pass for the task to pass.

Need a custom check?

Use verify to run any shell command as an assertion. If you can check it in a terminal, you can assert on it.

1. exit_code

Check the process exit code.

assert:
- exit_code: 0 # success
- exit_code: 1 # expected failure

When to use: Almost every task should assert exit_code: 0 unless you're testing error handling.

2. output_contains

Check that stdout includes a substring.

assert:
- output_contains: "Successfully created"
- output_contains: "user: alice"

When to use: Verify the command produced the expected output. The match is case-sensitive. Multiple output_contains assertions are ANDed.

3. output_equals

Check that stdout exactly matches a string (after trimming whitespace).

assert:
- output_equals: "OK"

When to use: When the exact output matters (e.g., machine-parseable output). Both the actual output and expected string are trimmed of leading/trailing whitespace before comparison. Use sparingly — most CLIs have variable output.
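The trimming rule can be pictured with a short Python sketch. It only illustrates the behavior described above and is not the tool's actual code. The function name is borrowed from the assertion for readability, and the sketch assumes that "trimming" means stripping leading and trailing whitespace, newlines included.

```python
def output_equals(actual: str, expected: str) -> bool:
    """Compare two strings after stripping leading/trailing whitespace."""
    return actual.strip() == expected.strip()


# A trailing newline from the CLI does not break the match...
print(output_equals("OK\n", "OK"))   # True
# ...but case and interior whitespace still matter.
print(output_equals("ok", "OK"))     # False
print(output_equals("O K", "OK"))    # False
```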

4. error_contains

Check that stderr includes a substring.

assert:
- error_contains: "warning: deprecated"
- error_contains: "file not found"

When to use: Verify error messages, warnings, or diagnostic output written to stderr.

5. file_exists

Check that a file was created.

assert:
- file_exists: "output.txt"
- file_exists: "build/dist/bundle.js"

When to use: Verify file-creating commands (init, build, export, save). Paths are relative to the task working directory.

6. file_contains

Check that a file contains specific text.

assert:
- file_contains:
    path: "config.json"
    text: '"name": "my-app"'

When to use: Verify file contents after generation or modification. The text is checked as a substring, not an exact match.

7. ran

Check that a command matching a regex was executed.

assert:
- ran: "docker build"
- ran: "git commit.*-m"
- ran: "npm install.*react"

When to use: Verify the LLM ran the right command. The pattern is a regular expression matched against the full command string. If the regex is invalid, it falls back to substring matching. This is one of the most useful assertions.
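The regex-with-fallback behavior might look roughly like the following Python sketch. It is an illustration under assumptions, not the tool's implementation: command_matches is a hypothetical helper, and the sketch assumes an unanchored search (the pattern may match anywhere in the command string).

```python
import re


def command_matches(pattern: str, command: str) -> bool:
    """Return True if `pattern` matches anywhere in `command`.

    Assumption: matching is an unanchored regex search. If the pattern
    is not a valid regex, fall back to a plain substring test.
    """
    try:
        return re.search(pattern, command) is not None
    except re.error:
        return pattern in command


print(command_matches("docker build", "docker build -t app ."))      # True
print(command_matches("npm install.*react", "npm install lodash"))   # False
# "echo (" is an invalid regex (unclosed group), so it is treated
# as a literal substring instead.
print(command_matches("echo (", "echo (hello)"))                     # True
```

Under the same assumption, not_ran would simply be the negation of this check applied across every recorded command.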

8. not_ran

Check that a command matching a regex was NOT executed.

assert:
- not_ran: "rm -rf /"
- not_ran: "sudo"
- not_ran: "DROP TABLE"

When to use: Safety checks to ensure dangerous commands were not run. Same regex behavior as ran — invalid patterns fall back to substring matching.

9. run_count

Check how many times a matching command was executed.

assert:
- run_count:
    pattern: "curl"
    min: 2
    max: 5
- run_count:
    pattern: "docker pull"
    min: 1

When to use: Verify that a command ran an expected number of times. Both bounds are inclusive: min means "at least" and max means "at most". Both min and max are optional, so you can use either or both.
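As a rough illustration of the inclusive bounds, here is a small Python sketch. The helper run_count_ok and its parameter names are hypothetical, and the sketch assumes the count is the number of recorded commands that match the pattern (not the number of matches inside a single command).

```python
import re
from typing import List, Optional


def run_count_ok(commands: List[str], pattern: str,
                 min_count: Optional[int] = None,
                 max_count: Optional[int] = None) -> bool:
    """Check that the number of matching commands lies within the bounds."""
    count = sum(1 for cmd in commands if re.search(pattern, cmd))
    if min_count is not None and count < min_count:
        return False
    if max_count is not None and count > max_count:
        return False
    return True


history = ["curl https://a.example", "ls", "curl https://b.example"]
print(run_count_ok(history, "curl", min_count=2, max_count=5))  # True
print(run_count_ok(history, "curl", max_count=1))               # False
print(run_count_ok(history, "docker pull", min_count=1))        # False
```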

10. verify

Run any command after the task completes and assert on its result. This is the most flexible assertion — if you can check it with a shell command, you can assert on it.

assert:
# Just check a command succeeds (exit code 0)
- verify:
    run: "docker ps | grep my-container"

# Check command output contains a substring
- verify:
    run: "cat output.txt"
    output_contains: "expected data"

# Check command output exactly matches
# (redirect the file so wc prints only the count, not the filename)
- verify:
    run: "wc -l < result.csv"
    output_equals: "100"

Field            Required  Description
run              Yes       Shell command to execute after the task
output_contains  No        Substring that must appear in stdout
output_equals    No        Exact string stdout must match (trimmed)

If you omit both output_contains and output_equals, verify just checks that the command exits with code 0. The exit code check always runs first — if the command fails, output checks are skipped.
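The order of checks described above can be sketched in Python as follows. This is a minimal illustration, not the tool's code: check_verify and run_verify are hypothetical helpers, and the sketch assumes that when both output fields are given, both must pass.

```python
import subprocess
from typing import Optional


def check_verify(exit_code: int, stdout: str,
                 output_contains: Optional[str] = None,
                 output_equals: Optional[str] = None) -> bool:
    """Evaluate a verify assertion from a finished command's result."""
    # The exit code is checked first; a failing command skips output checks.
    if exit_code != 0:
        return False
    if output_contains is not None and output_contains not in stdout:
        return False
    if output_equals is not None and stdout.strip() != output_equals.strip():
        return False
    return True


def run_verify(run: str, **checks: Optional[str]) -> bool:
    """Execute `run` in a shell and evaluate it with check_verify."""
    result = subprocess.run(run, shell=True, capture_output=True, text=True)
    return check_verify(result.returncode, result.stdout, **checks)


print(check_verify(0, "100\n", output_equals="100"))               # True
print(check_verify(1, "expected data",
                   output_contains="expected data"))               # False
print(check_verify(0, ""))                                         # True
```

With these helpers, a call such as run_verify("cat output.txt", output_contains="expected data") would mirror the second example above.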

When to use: Whenever the built-in assertions aren't enough. Common cases:

  • Checking database state after a migration
  • Verifying a container is running
  • Inspecting generated file structure (find, ls, tree)
  • Validating JSON/YAML output with jq / yq
  • Running your own test suite against the result

Combining Assertions

All assertions in a task are ANDed — every assertion must pass:

- id: create-and-verify
  intent: "Create a new configuration file with default settings"
  assert:
  - exit_code: 0
  - output_contains: "Created config.yaml"
  - file_exists: "config.yaml"
  - file_contains:
      path: "config.yaml"
      text: "version: 1"
  - not_ran: "rm"

Important: What Gets Checked

  • exit_code, output_contains, output_equals, error_contains all check the last command the LLM executed — not accumulated output from all turns
  • ran, not_ran, run_count check the full list of all commands run across all turns
  • file_exists, file_contains, verify run after the task completes, independent of command history
  • Command output shown in assertion results is truncated to 2,000 characters. Assertions are still evaluated against the full output; only the values displayed in error messages are truncated
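The difference between last-command and full-history assertions is easy to miss, so here is a small Python sketch of it. The CommandRecord type and the mycli commands are hypothetical stand-ins chosen for illustration; the tool's real data model may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CommandRecord:
    command: str
    exit_code: int
    stdout: str
    stderr: str


# Two turns: the message we care about was printed by the FIRST command.
history: List[CommandRecord] = [
    CommandRecord("mycli init", 0, "Created config.yaml\n", ""),
    CommandRecord("mycli status", 0, "All good\n", ""),
]

last = history[-1]

# exit_code / output_contains style checks see only the final record.
print(last.exit_code == 0)                                # True
print("Created config.yaml" in last.stdout)               # False

# ran / not_ran / run_count style checks see every command.
print(any("mycli init" in r.command for r in history))    # True
```

In a situation like this, an output_contains assertion on "Created config.yaml" would fail even though the file was created, while ran or file_exists would still pass.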

Tips

  • Always include exit_code: 0 unless testing error cases
  • Prefer ran over output_contains for checking which commands were used — ran checks all turns, output_contains only checks the last command
  • Use verify when stdout doesn't capture the result (e.g., file operations)
  • Keep regex patterns in ran/not_ran broad enough to allow flag reordering
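To see why broad patterns matter, the following Python sketch compares a strict pattern with the broader git commit.*-m pattern from the ran examples. It assumes an unanchored regex search, and the two command variants are made up for illustration.

```python
import re

# The same commit expressed with flags in a different order.
variants = [
    'git commit -m "init" --no-verify',
    'git commit --no-verify -m "init"',
]

strict = r"git commit -m"
broad = r"git commit.*-m"

print([bool(re.search(strict, v)) for v in variants])  # [True, False]
print([bool(re.search(broad, v)) for v in variants])   # [True, True]
```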