Assertions Reference

Assertions validate that the LLM used your CLI correctly. Each task requires at least one assertion. Multiple assertions are ANDed — all must pass for the task to pass.

Need a custom check?

Use verify to run any shell command as an assertion. If you can check it in a terminal, you can assert on it.

1. exit_code

Check the process exit code.

assert:
  - exit_code: 0       # success
  - exit_code: 1       # expected failure

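When to use: Almost every task should assert exit_code: 0 unless you're testing error handling. The two values above are alternatives: because assertions are ANDed, a single task that lists both can never pass.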

2. output_contains

Check that stdout includes a substring.

assert:
  - output_contains: "Successfully created"
  - output_contains: "user: alice"

When to use: Verify the command produced expected output. Case-sensitive. Multiple output_contains assertions are ANDed.

3. output_equals

Check that stdout exactly matches a string (after trimming whitespace).

assert:
  - output_equals: "OK"

When to use: When the exact output matters (e.g., machine-parseable output). Both the actual output and expected string are trimmed of leading/trailing whitespace before comparison. Use sparingly — most CLIs have variable output.
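The trimmed comparison can be sketched in Python. This is a minimal illustration, not the tool's actual implementation: the function name `output_equals` is borrowed from the assertion key, and it assumes "trimming" behaves like `str.strip()`.

```python
def output_equals(actual: str, expected: str) -> bool:
    """Compare stdout to the expected string, ignoring surrounding whitespace."""
    # Assumption: trimming removes leading/trailing spaces, tabs, and newlines.
    return actual.strip() == expected.strip()


# A trailing newline from the CLI does not break the match.
print(output_equals("OK\n", "OK"))        # True
# Interior differences still fail: this is an exact match, not a substring check.
print(output_equals("OK: done\n", "OK"))  # False
```

If the second case is what your CLI prints, output_contains is likely the better fit.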

4. error_contains

Check that stderr includes a substring.

assert:
  - error_contains: "warning: deprecated"
  - error_contains: "file not found"

When to use: Verify error messages, warnings, or diagnostic output written to stderr.

5. file_exists

Check that a file was created.

assert:
  - file_exists: "output.txt"
  - file_exists: "build/dist/bundle.js"

When to use: Verify file-creating commands (init, build, export, save). Paths are relative to the task working directory.

6. file_contains

Check that a file contains specific text.

assert:
  - file_contains:
      path: "config.json"
      text: '"name": "my-app"'

When to use: Verify file contents after generation or modification. The text is checked as a substring, not an exact match.
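As a rough sketch of the substring semantics, the check might look like the Python below. The helper `file_contains` and its `workdir` parameter are hypothetical, and the sketch assumes `path` resolves relative to the task working directory in the same way as file_exists.

```python
import tempfile
from pathlib import Path


def file_contains(workdir: Path, path: str, text: str) -> bool:
    """True if the file exists under workdir and includes text as a substring."""
    target = workdir / path  # assumption: relative to the task working directory
    return target.is_file() and text in target.read_text()


with tempfile.TemporaryDirectory() as tmp:
    workdir = Path(tmp)
    (workdir / "config.json").write_text('{\n  "name": "my-app",\n  "version": "1.0.0"\n}\n')
    print(file_contains(workdir, "config.json", '"name": "my-app"'))  # True
    print(file_contains(workdir, "config.json", '"name":"my-app"'))   # False: spacing differs
    print(file_contains(workdir, "missing.json", "anything"))         # False
```

Because the match is a literal substring, whitespace inside `text` has to agree with what the command actually writes.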

7. ran

Check that a command matching a regex was executed.

assert:
  - ran: "docker build"
  - ran: "git commit.*-m"
  - ran: "npm install.*react"

When to use: Verify the LLM ran the right command. The pattern is a regular expression matched against the full command string. If the regex is invalid, it falls back to substring matching. This is one of the most useful assertions.
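The regex-with-fallback behavior described above can be sketched as follows. This is an illustration under assumptions, not the tool's source: the helper names are hypothetical, and it assumes the pattern may match anywhere in the command string (a search rather than a full match), which is what the examples above imply.

```python
import re


def command_matches(pattern: str, command: str) -> bool:
    """Return True if the pattern matches anywhere in the command string."""
    try:
        # Assumption: search semantics, so "docker build" matches "docker build -t x ."
        return re.search(pattern, command) is not None
    except re.error:
        # Invalid regex: fall back to a plain substring check.
        return pattern in command


def ran(pattern: str, commands: list[str]) -> bool:
    """True if any executed command matches the pattern."""
    return any(command_matches(pattern, c) for c in commands)


history = ["git add .", 'git commit -a -m "init"', "npm install --save react"]
print(ran("git commit.*-m", history))          # True
print(ran("npm install.*react", history))      # True
print(ran("docker build", history))            # False
print(command_matches("echo [", "echo [ok]"))  # True: "echo [" is not a valid regex
```

Under the same assumptions, not_ran would simply be the negation of `ran`.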

8. not_ran

Check that a command matching a regex was NOT executed.

assert:
  - not_ran: "rm -rf /"
  - not_ran: "sudo"
  - not_ran: "DROP TABLE"

When to use: Safety checks to ensure dangerous commands were not run. Same regex behavior as ran — invalid patterns fall back to substring matching.

9. run_count

Check how many times a matching command was executed.

assert:
  - run_count:
      pattern: "curl"
      min: 2
      max: 5
  - run_count:
      pattern: "docker pull"
      min: 1

When to use: Verify that a command ran a specific number of times. Counts are inclusive (min means "at least", max means "at most"). Both min and max are optional — use either or both.
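The inclusive bounds can be sketched in Python. The function `run_count_ok` and its parameter names are hypothetical, and the sketch assumes `pattern` is a valid regex matched the same way as in ran.

```python
import re
from typing import Optional


def run_count_ok(pattern: str, commands: list[str],
                 min_count: Optional[int] = None,
                 max_count: Optional[int] = None) -> bool:
    """Check that the number of matching commands lies within [min, max]."""
    # Assumption: pattern is a valid regex; the substring fallback is omitted here.
    count = sum(1 for c in commands if re.search(pattern, c))
    if min_count is not None and count < min_count:
        return False
    if max_count is not None and count > max_count:
        return False
    return True


history = ["curl https://example.com/a", "curl https://example.com/b", "ls"]
print(run_count_ok("curl", history, min_count=2, max_count=5))  # True: 2 is within 2..5
print(run_count_ok("curl", history, min_count=3))               # False: only 2 matches
print(run_count_ok("docker pull", history, max_count=0))        # True: zero matches allowed
```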

10. verify

Run any command after the task completes and assert on its result. This is the most flexible assertion — if you can check it with a shell command, you can assert on it.

assert:
  # Just check a command succeeds (exit code 0)
  - verify:
      run: "docker ps | grep my-container"

  # Check command output contains a substring
  - verify:
      run: "cat output.txt"
      output_contains: "expected data"

  # Check command output exactly matches
  - verify:
      run: "wc -l < result.csv"
      output_equals: "100"

| Field | Required | Description |
| --- | --- | --- |
| `run` | Yes | Shell command to execute after the task |
| `output_contains` | No | Substring that must appear in stdout |
| `output_equals` | No | Exact string stdout must match (trimmed) |

If you omit both output_contains and output_equals, verify just checks that the command exits with code 0. The exit code check always runs first — if the command fails, output checks are skipped.
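The order of checks can be sketched in Python. This is a hypothetical helper, not the tool's implementation: it assumes a POSIX shell is available, and that when both output fields are supplied both must hold (the text above does not say).

```python
import subprocess
from typing import Optional


def verify(run: str,
           output_contains: Optional[str] = None,
           output_equals: Optional[str] = None) -> bool:
    """Run a shell command and apply the optional output checks."""
    result = subprocess.run(run, shell=True, capture_output=True, text=True)
    # The exit code check comes first; a failing command skips the output checks.
    if result.returncode != 0:
        return False
    if output_contains is not None and output_contains not in result.stdout:
        return False
    if output_equals is not None and result.stdout.strip() != output_equals.strip():
        return False
    return True


print(verify("echo hello"))                                                  # True
print(verify("echo 'expected data here'", output_contains="expected data"))  # True
print(verify("echo 100", output_equals="100"))                               # True
print(verify("echo 100; exit 3", output_equals="100"))                       # False
```

The last call shows why the ordering matters: stdout is exactly what was asked for, but the non-zero exit code fails the assertion before the output is compared.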

When to use: Whenever the built-in assertions aren't enough. Common cases:

Checking database state after a migration
Verifying a container is running
Inspecting generated file structure (find, ls, tree)
Validating JSON/YAML output with jq / yq
Running your own test suite against the result

Combining Assertions

All assertions in a task are ANDed — every assertion must pass:

- id: create-and-verify
  intent: "Create a new configuration file with default settings"
  assert:
    - exit_code: 0
    - output_contains: "Created config.yaml"
    - file_exists: "config.yaml"
    - file_contains:
        path: "config.yaml"
        text: "version: 1"
    - not_ran: "rm"

Important: What Gets Checked

exit_code, output_contains, output_equals, error_contains all check the last command the LLM executed — not accumulated output from all turns
ran, not_ran, run_count check the full list of all commands run across all turns
file_exists, file_contains, verify run after the task completes, independent of command history
Assertions always check the full command output. Only the values shown in assertion results and error messages are truncated to 2,000 characters.
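The difference between "last command" and "all commands" is easy to trip over, so here is a small Python sketch of it. The `CommandRecord` shape and the `mytool` CLI are hypothetical, and plain substring checks stand in for the real matching logic.

```python
from dataclasses import dataclass


@dataclass
class CommandRecord:
    """One command the LLM executed (hypothetical record shape)."""
    command: str
    exit_code: int
    stdout: str


history = [
    CommandRecord("mytool init", 0, "Created config.yaml\n"),
    CommandRecord("mytool status", 0, "All good\n"),
]

last = history[-1]

# exit_code / output_contains look only at the last command.
exit_ok = last.exit_code == 0
output_ok = "Created config.yaml" in last.stdout

# ran / not_ran / run_count look at every command from every turn.
ran_ok = any("mytool init" in r.command for r in history)

print(exit_ok)    # True
print(output_ok)  # False: that text came from an earlier command
print(ran_ok)     # True

# Assertions are ANDed, so one failing check fails the whole task.
print(all([exit_ok, output_ok, ran_ok]))  # False
```

In a situation like this one, asserting with ran, or with verify against the resulting file, would be more robust than output_contains.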

Tips

Always include exit_code: 0 unless testing error cases
Prefer ran over output_contains for checking which commands were used — ran checks all turns, output_contains only checks the last command
Use verify when stdout doesn't capture the result (e.g., file operations)
Keep regex patterns in ran/not_ran broad enough to allow flag reordering
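To illustrate the last tip, the Python snippet below compares a strict pattern with a broader one. The image name `myapp` is a placeholder, and `re.search` is assumed to approximate how ran matches patterns.

```python
import re

# Too strict: requires -t to come immediately after "docker build".
strict = r"docker build -t myapp"
# Broader: allows other flags before -t.
broad = r"docker build.*-t\s+myapp"

commands = [
    "docker build -t myapp .",
    "docker build --no-cache -t myapp .",
]

for cmd in commands:
    print(bool(re.search(strict, cmd)), bool(re.search(broad, cmd)))
# True True
# False True
```

The broader pattern accepts both invocations, so the task does not fail just because the LLM added or reordered a flag.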
