CLIWatch Docs

Agent-readiness testing for CLIs — benchmark how well AI coding agents can use your command-line tool, catch regressions in CI, and get PR comments with pass rates.

Quick Start

Option 1: Let your AI assistant set it up

Paste this into Claude Code, Cursor, or Codex:

Install @cliwatch/cli globally, then run cliwatch skills to read the setup docs. Use cliwatch init --ci to scaffold the benchmark config and a GitHub Actions workflow. Make sure CLIWATCH_API_KEY and AI_GATEWAY_API_KEY are set as GitHub secrets so results upload and I get PR comments.

Option 2: Set up manually

1. Create a task suite

Create a cli-bench.yaml in your project root (next to package.json); cli-bench looks for this file in the directory you run it from:

cli: mycli
version_command: "mycli --version"

tasks:
  - id: show-help
    intent: "Show the help information for mycli"
    difficulty: easy
    assert:
      - exit_code: 0
      - output_contains: "usage"

  - id: create-project
    intent: "Create a new project called my-app"
    difficulty: medium
    assert:
      - exit_code: 0
      - file_exists: "my-app/package.json"

See the full cli-bench.yaml Reference for all options.
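Assertions compose, so a single task can check the exit code, the command output, and the files the agent leaves behind. The sketch below is another entry for the tasks list using only the keys shown above; the task itself, the output string, and the file name are illustrative, not taken from the reference:

  # Illustrative task -- the command behavior, output text, and file name are assumptions.
  - id: init-config
    intent: "Generate a default config file for mycli"
    difficulty: medium
    assert:
      - exit_code: 0                        # the command must succeed
      - output_contains: "created"          # and report what it did
      - file_exists: "mycli.config.json"    # and actually write the file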

2. Run the benchmark

npm install -g @cliwatch/cli-bench

export AI_GATEWAY_API_KEY="vck_..."
export CLIWATCH_API_KEY="cw_..."

# Run from the directory containing cli-bench.yaml
cli-bench --upload

All model calls go through the Vercel AI Gateway — one key for all providers. Create a CLIWatch API key at app.cliwatch.com.
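Because the gateway addresses models with provider/model style IDs, switching or adding models is a config change rather than a new credential. This page does not show how the model list is declared; the models key below is an assumption for illustration, so check the cli-bench.yaml Reference for the real option:

# Hypothetical: the models key is an assumption, not confirmed by this page.
# Gateway model IDs follow a provider/model pattern.
models:
  - anthropic/claude-sonnet-4
  - openai/gpt-4o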

3. Add to CI

Create .github/workflows/cliwatch.yml:

name: CLIWatch Benchmarks
on:
pull_request:
push:
branches: [main]

jobs:
benchmark:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-node@v4
with:
node-version: 22
- run: npm install -g @cliwatch/cli-bench
- run: cli-bench
env:
AI_GATEWAY_API_KEY: ${{ secrets.AI_GATEWAY_API_KEY }}
CLIWATCH_API_KEY: ${{ secrets.CLIWATCH_API_KEY }}

See the full GitHub Actions guide for caching, thresholds, and PR comments.
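As a starting point, here is a sketch of two of those additions: npm download caching via setup-node's built-in cache option (a real setup-node feature, but it needs a lockfile committed to the repo), and a pass-rate gate on the cli-bench run. The --fail-under flag is an assumption for illustration; the GitHub Actions guide documents the actual threshold option.

      - uses: actions/setup-node@v4
        with:
          node-version: 22
          cache: npm   # caches npm's download cache between runs (requires a package-lock.json)
      - run: npm install -g @cliwatch/cli-bench
      - run: cli-bench --fail-under 80   # hypothetical flag: fail the job if the pass rate drops below 80%
        env:
          AI_GATEWAY_API_KEY: ${{ secrets.AI_GATEWAY_API_KEY }}
          CLIWATCH_API_KEY: ${{ secrets.CLIWATCH_API_KEY }}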

4. View results

Results appear at app.cliwatch.com with:

  • Pass rate matrix — which tasks pass on which models
  • Trend charts — track pass rates across releases
  • PR comments — benchmark results posted on every pull request

What's Next?