Troubleshooting

Common issues and how to fix them.

Config Issues

"Could not find cli-bench.yaml"

cli-bench looks for cli-bench.yaml in the current directory.

Fix: Run from the directory containing your config, or point to the file with --file:

cliwatch validate --file path/to/cli-bench.yaml
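If you often run from a subdirectory, a small shell helper can locate the config for you. This is only a sketch: find_cli_bench_config is a hypothetical function, not part of cliwatch, and it assumes the file is named cli-bench.yaml and lives in the current directory or one of its parents.

```shell
# Hypothetical helper (not part of cliwatch): walk up from the current
# directory until a cli-bench.yaml is found, then print its path.
find_cli_bench_config() {
  _dir=$(pwd)
  while :; do
    if [ -f "$_dir/cli-bench.yaml" ]; then
      printf '%s\n' "$_dir/cli-bench.yaml"
      return 0
    fi
    # Stop once the filesystem root has been checked.
    [ "$_dir" = "/" ] && return 1
    _dir=$(dirname "$_dir")
  done
}

# Usage (only the --file flag is taken from the text above):
#   config=$(find_cli_bench_config) && cliwatch validate --file "$config"
```

The function returns a non-zero status when nothing is found, so the validate command in the usage line is skipped rather than run with an empty path.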

"Invalid config" / Validation Errors

Run cliwatch validate for detailed error information:

cliwatch validate

Common schema errors:

  • Missing required fields (cli, tasks, id, intent, assert)
  • Empty assert array (at least one assertion required)
  • Invalid difficulty value (must be easy, medium, or hard)
  • Invalid max_turns (must be 1-20)
  • Invalid repeat (must be 1-100)

Duplicate Task IDs

All task IDs must be unique across the entire config, including tasks loaded from external file:// references.
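To spot duplicates before validating, a quick text scan across the main config and any referenced files can help. The sketch below assumes each task declares its ID on its own line as "id: value" or "- id: value"; it is a plain-text scan rather than a YAML parser, so unusual layouts (inline comments, flow-style mappings) are not handled. duplicate_task_ids is a hypothetical helper name.

```shell
# Hypothetical helper: print every task id that appears more than once
# across the given YAML files.
# Assumption: ids are written one per line as "id: value" or "- id: value".
duplicate_task_ids() {
  # Keep only the value after "id:", then strip trailing spaces and quotes.
  sed -n 's/^[[:space:]]*\(-[[:space:]]*\)\{0,1\}id:[[:space:]]*//p' "$@" |
    sed 's/[[:space:]]*$//; s/^["'\'']//; s/["'\'']$//' |
    sort | uniq -d
}

# Usage (file names are illustrative):
#   duplicate_task_ids cli-bench.yaml tasks/extra.yaml
```

An empty result means no ID was repeated in the files you passed in.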

Provider Issues

"Missing AI_GATEWAY_API_KEY"

All model calls go through the Vercel AI Gateway. Set the key:

export AI_GATEWAY_API_KEY="vck_..."

In CI, add AI_GATEWAY_API_KEY as a repository secret. See Providers & Models for details.
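In scripts and CI jobs it can be useful to fail early with a readable message instead of waiting for the first model call to fail. A minimal sketch, where require_env is a hypothetical helper and not something cliwatch provides:

```shell
# Hypothetical helper: fail fast when a required variable is unset or empty.
require_env() {
  # $1 is the variable name. POSIX sh has no ${!name}, so eval reads the
  # value indirectly; only pass trusted, literal variable names here.
  eval "_value=\${$1:-}"
  if [ -z "$_value" ]; then
    echo "error: $1 is not set" >&2
    return 1
  fi
}

# Usage at the top of a benchmark script:
#   require_env AI_GATEWAY_API_KEY || exit 1
```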

"Model not found" / API Errors

  • Check the model ID format: provider/model-id
  • Verify the model ID is valid (see Providers)
  • Check that your API key has access to the specified model
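The first check in the list above, the provider/model-id shape, can be approximated in the shell. This sketch only tests the shape of the string (a non-empty provider, a slash, a non-empty model, no whitespace); it cannot tell whether the model exists or whether your key can access it. is_model_id is a hypothetical helper.

```shell
# Hypothetical helper: succeed when the argument looks like provider/model-id.
is_model_id() {
  case "$1" in
    *[[:space:]]*) return 1 ;;  # whitespace is never part of an ID
    ?*/?*)         return 0 ;;  # something/something
    *)             return 1 ;;  # missing slash, provider, or model
  esac
}

# Usage:
#   is_model_id "provider/model-id" || echo "unexpected model ID format" >&2
```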

CLI Issues

"CLI not found in PATH"

The cli value must be a command on your PATH or a path to an executable file:

which mycli          # Shell: check whether mycli is on your PATH
cli: ./bin/mycli     # cli-bench.yaml: use a path for local binaries
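Both cases, a bare command name and a path to a local binary, can be checked with one function. A minimal sketch; cli_is_runnable is a hypothetical helper, and it assumes a value containing a slash is meant as a file path while anything else is looked up on PATH.

```shell
# Hypothetical helper: succeed when the given cli value can be executed.
cli_is_runnable() {
  case "$1" in
    # A value with a slash (such as ./bin/mycli) is treated as a file path.
    */*) [ -f "$1" ] && [ -x "$1" ] ;;
    # Anything else is looked up on PATH.
    *)   command -v "$1" >/dev/null 2>&1 ;;
  esac
}

# Usage:
#   cli_is_runnable ./bin/mycli || echo "not executable: ./bin/mycli" >&2
```

If the path check fails for a file that exists, the execute bit is usually missing and chmod +x on the file fixes it.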

"version_command failed"

If your CLI doesn't have a version command, remove the version_command field from your config.

Task Issues

Tasks Failing Unexpectedly

  1. Check your assertions. Is the regex in ran too strict? Allow for flags appearing in a different order.
  2. Increase max_turns. The LLM might need more interaction rounds.
  3. Check your intent. Is it clear and specific?

"Invalid regex" in ran/not_ran

The pattern must be a valid JavaScript regex:

# Wrong: unbalanced parenthesis, so the pattern is not a valid regex
- ran: "docker build (tag"

# Right
- ran: "docker build.*tag"

Upload Issues

"Failed to upload results"

Check that your CLIWatch API key is set (an empty line means it is not):

echo "$CLIWATCH_API_KEY"

If you don't want to upload, set upload: never in your config.
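On shared terminals or in CI logs you may prefer not to print the full key. A minimal sketch that shows only the first four characters; mask_secret is a hypothetical helper and the masking format is an arbitrary choice.

```shell
# Hypothetical helper: show whether a secret is set without revealing it.
mask_secret() {
  if [ -z "${1:-}" ]; then
    echo "(not set)"
  else
    # %.4s prints at most the first four characters of the value.
    printf '%.4s***\n' "$1"
  fi
}

# Usage:
#   mask_secret "$CLIWATCH_API_KEY"
```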

Performance

Benchmarks Are Slow

  • Start with help_modes: [injected] (fastest mode)
  • Lower max_turns on easy tasks
  • Use fewer providers during development
  • Test with smaller models first (claude-haiku-4-5, gpt-4o-mini)

High Token Usage

  • Use simpler intents that require fewer turns
  • Use injected help mode (avoids discovery turns)
  • Reduce max_turns to limit conversation length

Getting Help