Troubleshooting

Common issues and how to fix them.

Config Issues

"Could not find cli-bench.yaml"

cli-bench looks for cli-bench.yaml in the current directory.

Fix: Run from the directory containing your config, or use:

cliwatch validate --file path/to/cli-bench.yaml

"Invalid config" / Validation Errors

Run cliwatch validate for detailed error information:

cliwatch validate

Common schema errors:

Missing required fields (cli, tasks, id, intent, assert)
Empty assert array (at least one assertion required)
Invalid difficulty value (must be easy, medium, or hard)
Invalid max_turns (must be 1-20)
Invalid repeat (must be 1-100)
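If you generate configs from a script, you may want to catch these mistakes before running cliwatch validate at all. Below is a minimal TypeScript sketch of the rules listed above. It is not cli-bench's actual validator. The Config and Task shapes are hypothetical, and the sketch makes several assumptions: cli and tasks are top-level fields, id, intent, assert, difficulty, max_turns, and repeat are per-task fields, and max_turns and repeat must be whole numbers. cliwatch validate remains the source of truth.

```typescript
// Hypothetical shapes built from the fields named on this page;
// the real cli-bench schema may contain more fields or nest them differently.
interface Task {
  id?: string;
  intent?: string;
  difficulty?: string;
  max_turns?: number;
  repeat?: number;
  assert?: unknown[];
}

interface Config {
  cli?: string;
  tasks?: Task[];
}

const DIFFICULTIES = ["easy", "medium", "hard"];

// Assumes max_turns and repeat must be whole numbers.
function isIntInRange(n: number, min: number, max: number): boolean {
  return Number.isInteger(n) && n >= min && n <= max;
}

// Returns a list of problems; an empty list means these particular checks passed.
function precheck(config: Config): string[] {
  const errors: string[] = [];
  if (config.cli === undefined) errors.push("missing required field: cli");
  if (config.tasks === undefined) {
    errors.push("missing required field: tasks");
    return errors;
  }
  config.tasks.forEach((task, i) => {
    const where = `tasks[${i}]`;
    for (const field of ["id", "intent", "assert"] as const) {
      if (task[field] === undefined) errors.push(`${where}: missing required field: ${field}`);
    }
    if (task.assert !== undefined && task.assert.length === 0) {
      errors.push(`${where}: assert needs at least one assertion`);
    }
    if (task.difficulty !== undefined && !DIFFICULTIES.includes(task.difficulty)) {
      errors.push(`${where}: difficulty must be easy, medium, or hard`);
    }
    if (task.max_turns !== undefined && !isIntInRange(task.max_turns, 1, 20)) {
      errors.push(`${where}: max_turns must be 1-20`);
    }
    if (task.repeat !== undefined && !isIntInRange(task.repeat, 1, 100)) {
      errors.push(`${where}: repeat must be 1-100`);
    }
  });
  return errors;
}

// Example: an empty assert array and an out-of-range max_turns.
const problems = precheck({
  cli: "mycli",
  tasks: [{ id: "list-files", intent: "List the files in the current directory", assert: [], max_turns: 25 }],
});
problems.forEach((p) => console.log(p));
```

For the example config, the sketch reports two problems: the empty assert array and max_turns: 25.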

Duplicate Task IDs

All task IDs must be unique across the entire config, including tasks loaded from external file:// references.
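When IDs come from several files, it helps to know where each duplicate came from. This is a minimal TypeScript sketch that assumes you have already collected the task IDs from the main config and from each file:// include. Loading those files is out of scope here, and findDuplicateIds and the source labels are hypothetical.

```typescript
// Hypothetical helper: maps each duplicated task ID to every source it appears in.
// `sources` maps a label (e.g. a file name) to the task IDs found in that source.
function findDuplicateIds(sources: Record<string, string[]>): Map<string, string[]> {
  const where = new Map<string, string[]>();
  for (const [source, ids] of Object.entries(sources)) {
    for (const id of ids) {
      const list = where.get(id) ?? [];
      list.push(source);
      where.set(id, list);
    }
  }
  // Keep only IDs seen more than once (across sources or within one source).
  const dupes = new Map<string, string[]>();
  where.forEach((list, id) => {
    if (list.length > 1) dupes.set(id, list);
  });
  return dupes;
}

const dupes = findDuplicateIds({
  "cli-bench.yaml": ["list-files", "build-image"],
  "file://tasks/docker.yaml": ["build-image"],
});
dupes.forEach((found, id) => console.log(`${id}: ${found.join(", ")}`));
```

Here the only duplicate reported is build-image, found in both cli-bench.yaml and file://tasks/docker.yaml.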

Provider Issues

"Missing AI_GATEWAY_API_KEY"

All model calls go through the Vercel AI Gateway. Set the key:

export AI_GATEWAY_API_KEY="vck_..."

In CI, add AI_GATEWAY_API_KEY as a repository secret and expose it to the benchmark step as an environment variable. See Providers & Models for details.
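To fail fast before a long run, a preflight step can report which required variables are unset without ever printing their values. This matters for this key and, if you upload results, for CLIWATCH_API_KEY. Below is a minimal TypeScript sketch. missingEnv is a hypothetical helper and is not part of cliwatch.

```typescript
// Hypothetical preflight helper: returns the names of variables that are
// unset or blank. It only reports names and never prints values.
function missingEnv(
  names: string[],
  env: Record<string, string | undefined>,
): string[] {
  return names.filter((name) => (env[name] ?? "").trim() === "");
}

// In Node you would pass process.env; a literal object keeps this self-contained.
const missing = missingEnv(
  ["AI_GATEWAY_API_KEY", "CLIWATCH_API_KEY"],
  { AI_GATEWAY_API_KEY: "vck_example", CLIWATCH_API_KEY: "  " },
);
if (missing.length > 0) {
  console.error(`Missing environment variables: ${missing.join(", ")}`);
}
```

In a real script you would call missingEnv(..., process.env) and exit with a non-zero status when the result is non-empty.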

"Model not found" / API Errors

Check the model ID format: provider/model-id (for example, openai/gpt-4o-mini)
Verify that the model ID is one the gateway supports (see Providers & Models)
Check that your API key has access to the specified model
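You can check the provider/model-id shape locally before any API call. This sketch assumes a valid ID contains exactly one "/" with non-empty parts on both sides and no whitespace. It cannot tell whether the model exists or whether your key can reach it, so the other two checks still apply. modelIdProblem is a hypothetical name.

```typescript
// Shape check only, assuming exactly one "/" separating provider and model.
// Returns null when the ID looks well-formed, otherwise a short reason.
function modelIdProblem(id: string): string | null {
  if (/\s/.test(id)) return "contains whitespace";
  const parts = id.split("/");
  if (parts.length !== 2) return 'expected exactly one "/" (provider/model-id)';
  const [provider, model] = parts;
  if (!provider || !model) return "provider and model-id must both be non-empty";
  return null;
}

for (const id of ["openai/gpt-4o-mini", "gpt-4o-mini", "openai/"]) {
  console.log(`${id}: ${modelIdProblem(id) ?? "ok"}`);
}
```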

CLI Issues

"CLI not found in PATH"

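The cli value must name a command on your PATH or give the path to an executable file. Check whether the command resolves:

which mycli          # prints the binary's location if mycli is on PATH

For a local binary that isn't on PATH, use its path in your config instead:

cli: ./bin/mycli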
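To see how a bare command name such as mycli gets resolved, here is a sketch of a POSIX-style PATH lookup in TypeScript for Node. It approximates what which does, and cli-bench may resolve cli differently. It treats any value containing "/" as a path, skips empty PATH entries, and ignores Windows PATHEXT. resolveCli and isExecutable are hypothetical names.

```typescript
import { accessSync, constants, statSync } from "node:fs";
import { delimiter, join } from "node:path";

// True if `file` is a regular file the current user may execute.
function isExecutable(file: string): boolean {
  try {
    accessSync(file, constants.X_OK);
    return statSync(file).isFile();
  } catch {
    return false;
  }
}

// Hypothetical helper approximating a POSIX `which` lookup.
// Values containing "/" (such as ./bin/mycli) are treated as paths and not searched on PATH.
// Empty PATH entries are skipped and Windows PATHEXT handling is omitted.
function resolveCli(
  cli: string,
  pathVar: string,
  canExec: (file: string) => boolean = isExecutable,
): string | null {
  if (cli.includes("/")) return canExec(cli) ? cli : null;
  for (const dir of pathVar.split(delimiter)) {
    if (dir === "") continue;
    const candidate = join(dir, cli);
    if (canExec(candidate)) return candidate;
  }
  return null;
}

console.log(resolveCli("mycli", process.env.PATH ?? "") ?? "mycli was not found on PATH");
```

The canExec parameter lets you swap in a fake check for testing without touching the filesystem.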

"version_command failed"

If your CLI doesn't have a version command, remove the version_command field from your config.

Task Issues

Tasks Failing Unexpectedly

Check your assertions: is the regex in ran too strict? It should allow for flags appearing in a different order.
Increase max_turns: the LLM might need more interaction rounds.
Check your intent: is it clear and specific?
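Flag order is a common reason a ran assertion misses a correct run. The sketch below assumes that a ran pattern is tested against each executed command string. It compares a pattern that hard-codes one flag order with one that uses a lookahead to require the tag in any position. The commands and the myapp tag are made-up examples.

```typescript
// Three equivalent commands an agent might run for the same intent.
const commands = [
  "docker build -t myapp .",
  "docker build . -t myapp",
  "docker build --tag=myapp .",
];

// Strict: hard-codes one flag order and the short flag.
const strict = /docker build -t myapp \./;
// Tolerant: requires the subcommand, then -t or --tag with myapp anywhere after it.
const tolerant = /^docker build\b(?=.*\s(?:-t|--tag)[ =]myapp\b)/;

for (const cmd of commands) {
  console.log(`${cmd}  strict=${strict.test(cmd)}  tolerant=${tolerant.test(cmd)}`);
}
```

Only the first command matches the strict pattern; the tolerant one matches all three. If you put a pattern like this in cli-bench.yaml, use a single-quoted YAML string. In double-quoted YAML, \b is read as a backspace escape and \s is not a valid escape, whereas single-quoted strings keep backslashes literally.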

"Invalid regex" in ran/not_ran

The pattern must be a valid JavaScript regex:

# Wrong: the unescaped "(" opens a group that is never closed
- ran: "docker build (tag"

# Right
- ran: "docker build.*tag"
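Because the pattern must be a valid JavaScript regex, you can check it locally by compiling it yourself. Below is a minimal TypeScript sketch. It assumes patterns are compiled without flags; flags such as u make the syntax stricter. regexError is a hypothetical helper.

```typescript
// Returns null if `pattern` compiles as a JavaScript RegExp (no flags),
// otherwise the engine's error message.
function regexError(pattern: string): string | null {
  try {
    new RegExp(pattern);
    return null;
  } catch (err) {
    return err instanceof Error ? err.message : String(err);
  }
}

for (const p of ["docker build (tag", "docker build.*tag"]) {
  console.log(`${JSON.stringify(p)}: ${regexError(p) ?? "ok"}`);
}
```

Check the pattern as it looks after YAML parsing, that is, with any YAML escaping already removed.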

Upload Issues

"Failed to upload results"

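Check that your CLIWatch API key is set. This command prints only whether the variable exists, not the secret itself:

[ -n "$CLIWATCH_API_KEY" ] && echo "CLIWATCH_API_KEY is set" || echo "CLIWATCH_API_KEY is not set"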

If you don't want to upload, set upload: never in your config.

Performance

Benchmarks Are Slow

Start with help_modes: [injected] (fastest mode)
Lower max_turns on easy tasks
Use fewer providers during development
Test with smaller models first (claude-haiku-4-5, gpt-4o-mini)

High Token Usage

Use simpler intents that require fewer turns
Use injected help mode (avoids discovery turns)
Reduce max_turns to limit conversation length

Getting Help

Run cliwatch skills <topic> for built-in guides
File issues at github.com/cliwatch/cli-bench
