Troubleshooting
Common issues and how to fix them.
Config Issues
"Could not find cli-bench.yaml"
cli-bench looks for cli-bench.yaml in the current directory.
Fix: Run from the directory containing your config, or use:
cliwatch validate --file path/to/cli-bench.yaml
"Invalid config" / Validation Errors
Run cliwatch validate for detailed error information:
cliwatch validate
Common schema errors:
- Missing required fields (cli, tasks, id, intent, assert)
- Empty assert array (at least one assertion required)
- Invalid difficulty value (must be easy, medium, or hard)
- Invalid max_turns (must be 1-20)
- Invalid repeat (must be 1-100)
Duplicate Task IDs
All task IDs must be unique across the entire config, including tasks loaded from external file:// references.
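A quick way to spot duplicates before validating is to list every ID that appears more than once across the main config and any externally referenced files. The sketch below is a rough text scan, not a YAML parser. It assumes each task ID sits on its own line as `id: value` or `- id: value`, so it also counts any unrelated key named `id`. The function name `find_duplicate_ids` is hypothetical and not part of the tool.

```shell
# Print every task ID that appears more than once across the given YAML files.
# Assumption: IDs are written as `id: value` or `- id: value` on their own line.
find_duplicate_ids() {
  grep -hE '^[[:space:]]*(-[[:space:]]+)?id:[[:space:]]*' "$@" |
    sed -E \
      -e 's/^[[:space:]]*(-[[:space:]]+)?id:[[:space:]]*//' \
      -e 's/[[:space:]]+#.*$//' \
      -e 's/[[:space:]]+$//' \
      -e "s/^[\"']//" \
      -e "s/[\"']\$//" |
    sort | uniq -d
}
```

Pass the main config plus each file it references, for example `find_duplicate_ids cli-bench.yaml path/to/extra-tasks.yaml` (the second path is a placeholder). Empty output means no duplicates were found.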
Provider Issues
"Missing AI_GATEWAY_API_KEY"
All model calls go through the Vercel AI Gateway. Set the key:
export AI_GATEWAY_API_KEY="vck_..."
In CI, add AI_GATEWAY_API_KEY as a repository secret. See Providers & Models for details.
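In CI it can help to fail fast with a clear message before any benchmark starts. This is a minimal sketch. The function name `check_gateway_key` is hypothetical, and it only checks that the variable is non-empty, not that the key is valid.

```shell
# Return 0 if AI_GATEWAY_API_KEY is set and non-empty.
# Otherwise print a hint to stderr and return 1.
check_gateway_key() {
  if [ -z "${AI_GATEWAY_API_KEY:-}" ]; then
    echo "AI_GATEWAY_API_KEY is not set; export it or add it as a CI secret" >&2
    return 1
  fi
  echo "AI_GATEWAY_API_KEY is set"
}
```

A CI step could call `check_gateway_key || exit 1` before running the benchmark.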
"Model not found" / API Errors
- Check the model ID format: provider/model-id
- Verify the model ID is valid (see Providers)
- Check that your API key has access to the specified model
CLI Issues
"CLI not found in PATH"
The cli value must resolve to an executable, either a command on your PATH or a path to a binary:
which mycli # Check if available
cli: ./bin/mycli # Use a path for local binaries
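The same check can be scripted for both forms of the cli value. This sketch assumes that a value containing a slash is treated as a file path and a bare name is looked up in PATH. That mirrors ordinary shell behavior and has not been confirmed for cli-bench. The function name `cli_is_runnable` is hypothetical.

```shell
# Return 0 if the given cli value resolves to something executable.
# Assumption: values with a slash are paths; bare names are searched in PATH.
cli_is_runnable() {
  case "$1" in
    */*) [ -f "$1" ] && [ -x "$1" ] ;;
    *)   command -v "$1" >/dev/null 2>&1 ;;
  esac
}
```

For example, `cli_is_runnable ./bin/mycli || echo "not executable"` reports a missing binary or a missing execute bit before a benchmark run.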
"version_command failed"
If your CLI doesn't have a version command, remove the version_command field from your config.
Task Issues
Tasks Failing Unexpectedly
- Check your assertions: is the regex in ran too strict? Allow for flag reordering
- Increase max_turns: the LLM might need more interaction rounds
- Check your intent: is it clear and specific?
"Invalid regex" in ran/not_ran
The pattern must be a valid JavaScript regex:
# Wrong: the parentheses are regex special characters (a group), not literal text
- ran: "docker build (tag)"
# Right
- ran: "docker build.*tag"
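A pattern can be tried against a sample command line before it goes into the config. The sketch below uses `grep -E`, which implements POSIX extended regular expressions rather than JavaScript regex. The two agree on simple patterns like the ones above but differ on features such as `\d` and lookaheads, so treat the result as approximate. The function name `pattern_matches` is hypothetical.

```shell
# Return 0 if the pattern matches the sample command line, 1 if it does not,
# and 2 if grep rejects the pattern as malformed.
# Caveat: grep -E is POSIX ERE, only an approximation of JavaScript regex.
pattern_matches() {
  printf '%s\n' "$2" | grep -Eq -- "$1" 2>/dev/null
}
```

For example, `pattern_matches 'docker build.*tag' 'docker build --no-cache --tag app:1 .'` succeeds. The pattern `docker build (tag)` does not match the same command line, because the parentheses form a group rather than matching literal characters.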
Upload Issues
"Failed to upload results"
Check your CLIWatch API key:
echo $CLIWATCH_API_KEY
If you don't want to upload, set upload: never in your config.
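Echoing the key prints the full secret, which is undesirable in shared terminals or CI logs. As an alternative, this sketch prints only a short prefix so you can confirm the variable is set and looks like the right key. The function name `masked_cliwatch_key` is hypothetical.

```shell
# Print a masked form of CLIWATCH_API_KEY: its first 4 characters followed by "...".
# Prints "(unset)" when the variable is missing or empty.
masked_cliwatch_key() {
  if [ -z "${CLIWATCH_API_KEY:-}" ]; then
    echo "(unset)"
  else
    printf '%.4s...\n' "$CLIWATCH_API_KEY"
  fi
}
```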
Performance
Benchmarks Are Slow
- Start with help_modes: [injected] (fastest mode)
- Lower max_turns on easy tasks
- Use fewer providers during development
- Test with smaller models first (claude-haiku-4-5, gpt-4o-mini)
High Token Usage
- Use simpler intents that require fewer turns
- Use injected help mode (avoids discovery turns)
- Reduce max_turns to limit conversation length
Getting Help
- Run cliwatch skills <topic> for built-in guides
- File issues at github.com/cliwatch/cli-bench